Skip to content

Documentation for Unix.[in/out]_channel_of_descr is misleading and leads to unwanted closing of file descriptor  #9786

@jhjourdan

Description

@jhjourdan

A common pattern for using sockets in OCaml is to transform them in two channels: one for input and one for output:

  let sock =
    Unix.socket [...] Unix.SOCK_STREAM 0 in
  let inchan = Unix.in_channel_of_descr sock in
  let outchan = Unix.out_channel_of_descr sock in

However, at some point, inchan and outchan will be closed, and this will close the underlying file descriptor. The documentation of Unix.[in/out]_channel_of_descr says:

Closing the channel also closes the underlying file descriptor (unless it was already closed).

In other words, it says implicitly that it is fine to call both close_in and close_out on inchan and outchan for closing the socket: what says the documentation says that the closing of the second channel will simply not close the underlying file descriptor, since it has already been closed.

However, things actually happens differently in the current implementation: close_in and close_out both always call the close system call, and simply ignore the return value. This is problematic, since the POSIX file descriptor may very well be reused between the closing of the two channels, resulting in the closing of a possibly completely unrelated file descriptor.

This is exemplified with the following OCaml program:

let () =
  (* Create a socket, and convert it in two channels *)
  let addr = Unix.ADDR_INET(Unix.inet_addr_loopback, 0) in
  let sock =
    Unix.socket (Unix.domain_of_sockaddr addr) Unix.SOCK_STREAM 0 in
  let inchan = Unix.in_channel_of_descr sock in
  let outchan = Unix.out_channel_of_descr sock in

  (* Close the input channel (and thus the underlying socket) *)
  close_in inchan;

  (* Open a completely unrelated file. This will reuse the file descriptor or the socket. *)
  let file = Unix.(openfile "file.tmp" [O_WRONLY;O_CREAT;O_TRUNC] 0o666) in

  (* Close the output channel of the socket. This will close the file. *)
  close_out outchan;

  (* Try to write to the file. *)
  ignore(Unix.write_substring file "aa" 0 2)
  (* Its file descriptor is closed, but we never asked so. This crashes with "exception Unix.Unix_error(Unix.EBADF, "write", "")" *)

This pattern is rather common. It is used in several tutorials for using the OCaml socket API. For example, see https://caml.inria.fr/pub/docs/oreilly-book/html/book-ora187.html, https://rosettacode.org/wiki/Sockets, https://ocaml.github.io/ocamlunix/sockets.html, ... Moreover, this bug is already present in ocamldebug implementation (not using the Unix interface, but by using the internal C API, but the idea is the same).

The reason nobody discovered the bug in real code is probably that the input and output channels are most often closed at the same time. The impression of soundness is, however, misleading: indeed, between the two closing calls, many things can happen, including a context switch to another thread, which may very well open e.g., another socket.

I can see two fixes to this bug:

  • Say that this is not a bug: make more precise the documentation and explain that one should not call both Unix.in_channel_of_descr and Unix.out_channel_of_descr on a given file descriptor. There is a simple workaround: simply call Unix.dup on the file descriptor. However, the various books, tutorials, etc... will remain there and the pattern will very probably will still be used in the wild for long.
  • Say what the docs implicitly says. The simplest is perhaps to change the internal implementation of the Unix.file_descr type, and to make it a record of two fields: the first one contains the actual POSIX file descriptor number, while the second is a mutable flag indicating whether the file descriptor has been closed. But this is a massive change in the Unix implementation, since pretty much all its functions use the file_descr type.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions