Easily reproducible leaking of file descriptors, show stopper. #208

@fxfactorial

I have boiled my usage down to this code sample, which leaks file descriptors.

Say you have:

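#require "lwt.unix"

open Lwt.Infix

(* Copy everything readable on ic to oc until ic reaches end of input. *)
let echo ic oc = Lwt_io.(write_chars oc (read_chars ic))

let program =
  let server_address = Unix.(ADDR_INET (inet_addr_loopback, 2000)) in
  let other_addr = Unix.(ADDR_INET (inet_addr_loopback, 2001)) in
  (* For each client on port 2000, connect to port 2001 and relay both ways. *)
  let _server = Lwt_io.establish_server server_address begin fun (tcp_ic, tcp_oc) ->
      Lwt_io.with_connection other_addr begin fun (nc_ic, nc_oc) ->
        Lwt_io.printl "Created connection" >>= fun () ->
        (* <&> waits for BOTH directions to finish before continuing. *)
        echo tcp_ic nc_oc <&> echo nc_ic tcp_oc >>= fun () ->
        Lwt_io.printl "finished"
      end
      |> Lwt.ignore_result
    end
  in
  (* This promise never resolves, so the server runs until the process is killed. *)
  fst (Lwt.wait ())

let () =
  Lwt_main.run program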

Then create a simple server for the outgoing connection with:

nc -l 2001

Start up the OCaml code with:

utop example.ml

Then open a client, type something, and interrupt it with Ctrl-C:

nc localhost 2000
blah blah
^c

Looking at the connections on port 2000 with lsof (which prints port 2000 under its service name, callbook), we see:

ocamlrun 71109 Edgar    6u  IPv4 0x7ff3e309cb80aead      0t0  TCP 127.0.0.1:callbook (LISTEN)
ocamlrun 71109 Edgar    7u  IPv4 0x7ff3e309c9dc8ead      0t0  TCP 127.0.0.1:callbook->127.0.0.1:54872 (CLOSE_WAIT)

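In fact, every run of nc localhost 2000 leaves another CLOSE_WAIT entry in the lsof output. CLOSE_WAIT means the client has closed its end of the connection while our process still holds its own end open.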
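The CLOSE_WAIT state itself can be reproduced without Lwt. The following is a minimal sketch using only OCaml's Unix library; the function name eof_then_close and the use of an ephemeral port are illustrative assumptions and not part of the program above. It shows that once the peer has closed, a read returns 0 (end of input), and the accepted socket stays in CLOSE_WAIT until the program itself closes it.

```ocaml
(* Needs the unix library (for example `#require "unix"` in utop). *)

(* Hypothetical helper: connect to ourselves over loopback, let the client
   side close first, and return what the server side reads afterwards. *)
let eof_then_close () =
  let listener = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  (* Port 0 asks the kernel for any free port. *)
  Unix.bind listener (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listener 1;
  let addr = Unix.getsockname listener in
  let client = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect client addr;
  let accepted, _peer = Unix.accept listener in
  (* The client goes away first, like pressing ^C in nc. *)
  Unix.close client;
  let buf = Bytes.create 16 in
  (* A read of 0 bytes signals that the peer has closed its end. *)
  let n = Unix.read accepted buf 0 (Bytes.length buf) in
  (* From here until the next line, `accepted` is in CLOSE_WAIT. *)
  Unix.close accepted;
  Unix.close listener;
  n

let () = Printf.printf "bytes read after peer closed: %d\n" (eof_then_close ())
```

If the Unix.close accepted line were left out, the descriptor would stay open and show up in lsof just like the entries above.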

Eventually the program runs out of file descriptors, which, MOST annoyingly, does not crash it but makes Lwt just hang.

I can't tell whether I am doing something wrong or this is a genuine bug. Either way it is a serious problem for me: I run out of file descriptors within 10 hours.
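To watch the leak from inside the program rather than with lsof, one option is to count the entries of /dev/fd. This is only a sketch: it assumes a system that exposes /dev/fd (macOS and Linux do), and open_fd_count is a hypothetical helper, not an Lwt or standard library function.

```ocaml
(* Needs the unix library for the small demonstration below. *)

(* Hypothetical helper: number of descriptors currently open in this process.
   Assumes /dev/fd exists. The listing itself may use a descriptor, so
   compare successive values instead of treating one as an absolute count. *)
let open_fd_count () = Array.length (Sys.readdir "/dev/fd")

let () =
  let before = open_fd_count () in
  let fd = Unix.openfile "/dev/null" [Unix.O_RDONLY] 0 in
  let during = open_fd_count () in
  Unix.close fd;
  Printf.printf "before: %d, with one extra descriptor: %d\n" before during
```

Logging this number after each client disconnects would show whether the count keeps growing.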

EDIT: It seems to me that the problem is that one side of the connection is closed but the other isn't. I would have thought that with_connection should clean up and close everything whenever either side closes.
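To make that expectation concrete, here is a minimal sketch of "stop when either side closes, then close everything". The helpers close_quietly and relay are hypothetical names introduced for illustration, not part of Lwt. Whether this removes the CLOSE_WAIT entries in the setup above has not been verified.

```ocaml
(* Assumes the lwt.unix package (for example `#require "lwt.unix"` in utop). *)
open Lwt.Infix

let echo ic oc = Lwt_io.(write_chars oc (read_chars ic))

(* Hypothetical helper: close a channel and ignore any error from doing so. *)
let close_quietly ch =
  Lwt.catch (fun () -> Lwt_io.close ch) (fun _ -> Lwt.return_unit)

(* Hypothetical helper: relay data both ways between two connections.
   Lwt.pick resolves as soon as EITHER direction ends and cancels the other;
   the finalizer then closes all four channels, whatever the outcome. *)
let relay (a_ic, a_oc) (b_ic, b_oc) =
  Lwt.finalize
    (fun () -> Lwt.pick [echo a_ic b_oc; echo b_ic a_oc])
    (fun () ->
       (* Inputs and outputs have different channel types, hence two lists. *)
       Lwt.join (List.map close_quietly [a_ic; b_ic]) >>= fun () ->
       Lwt.join (List.map close_quietly [a_oc; b_oc]))
```

In the program above this would stand in for the `<&>` line, as relay (tcp_ic, tcp_oc) (nc_ic, nc_oc). The trade-off of Lwt.pick is that data still in flight in the other direction is dropped once one side ends.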

EDIT II: I have tried every way I can think of to close the channels manually with Lwt_io.close, but I still get the CLOSE_WAIT entries.

EDIT III: I even used Lwt_unix.close on a raw fd passed as with_connection's optional fd argument, with similarly bad results.

EDIT IV: Most insidious of all, if I use Lwt_daemon.daemonize, the problem seemingly goes away.
