Skip to content

eio_luv: allow Net.connect to be cancelled#311

Merged
talex5 merged 1 commit intoocaml-multicore:mainfrom
talex5:cancel-connect
Sep 20, 2022
Merged

eio_luv: allow Net.connect to be cancelled#311
talex5 merged 1 commit intoocaml-multicore:mainfrom
talex5:cancel-connect

Conversation

@talex5
Copy link
Copy Markdown
Collaborator

@talex5 talex5 commented Sep 8, 2022

Fixes #303.

/cc @haesbaert @nojb

@haesbaert
Copy link
Copy Markdown
Contributor

haesbaert commented Sep 8, 2022

I think we have to call the uv_handle_close on the TCP handle, and then wait for the connect to fail, not do a manual shutdown on the underlying descriptor, I wrote this to demonstrate what I mean:

#include <stdio.h>
#include <stdlib.h>
#include <uv.h>

static void
on_connect_cb(uv_connect_t *uv_connect, int status)
{
	printf("%s: %p status=%d (%s)\n", __func__, uv_connect, status, uv_strerror(status));
}

static void
on_timer_cb(uv_timer_t *uv_timer)
{
	uv_tcp_t *uv_socket = uv_handle_get_data((uv_handle_t *)uv_timer);
	
	printf("%s: uv_timer=%p uv_socket=%p\n", __func__, uv_socket,
	    uv_socket);
	uv_close((uv_handle_t *)uv_socket, (uv_close_cb)NULL);
}

int
main(void)
{
	uv_loop_t *loop = uv_default_loop();
	uv_tcp_t uv_socket;
	uv_connect_t uv_connect;
	struct sockaddr_in sin;
	uv_timer_t uv_timer;

	uv_tcp_init(loop, &uv_socket);
	uv_ip4_addr("142.250.186.174", 8080, &sin);

	uv_tcp_connect(&uv_connect, &uv_socket, (const struct sockaddr *)&sin,
	    on_connect_cb);

	uv_timer_init(loop, &uv_timer);
	uv_handle_set_data((uv_handle_t *)&uv_timer, &uv_socket);
	uv_timer_start(&uv_timer, on_timer_cb, 1000, 0);

	printf("Default loop.\n");
	uv_run(loop, UV_RUN_DEFAULT);

	uv_loop_close(loop);
	return (0);
}

This seems to be correct as the callback is called with UV_CANCELED:

sam:tmp: cc -o uv_connect_cancel uv_connect_cancel.c -Wall -luv && ./uv_connect_cancel; echo $?
Default loop.
on_timer_cb: uv_timer=0x7ffdd04f15b0 uv_socket=0x7ffdd04f15b0
on_connect_cb: 0x7ffdd04f14b0 status=-125 (operation canceled)
0

@haesbaert
Copy link
Copy Markdown
Contributor

The manual describes the semantics:

       void uv_close(uv_handle_t *handle, uv_close_cb close_cb)
              Request  handle  to  be  closed.  close_cb will be called asynchronously after this call. This MUST be
              called on each handle before memory is released.   Moreover,  the  memory  can  only  be  released  in
              close_cb or after it has returned.

              Handles  that  wrap file descriptors are closed immediately but close_cb will still be deferred to the
              next iteration of the event loop.  It gives you a chance to free up any resources associated with  the
              handle.

              In-progress  requests,  like uv_connect_t or uv_write_t, are cancelled and have their callbacks called
              asynchronously with status=UV_ECANCELED.

Comment on lines +503 to +507
Fiber_context.set_cancel_fn k.fiber (fun _ex ->
match to_unix_opt `Peek sock with
| Some fd -> Unix.shutdown fd Unix.SHUTDOWN_ALL
| None -> ()
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Fiber_context.set_cancel_fn k.fiber (fun _ex ->
match to_unix_opt `Peek sock with
| Some fd -> Unix.shutdown fd Unix.SHUTDOWN_ALL
| None -> ()
)
Fiber_context.set_cancel_fn k.fiber (fun _ex -> Handle.close sock)

| None -> ()
)
with ex ->
Handle.close sock;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Handle.close sock;
Handle.ensure_closed sock;

@talex5
Copy link
Copy Markdown
Collaborator Author

talex5 commented Sep 8, 2022

@haesbaert : that sounds good. However, the tests hang if I do that!

I moved things around a bit and added a commented-out close call. Does it work for you if you uncomment it and do make test_luv?

@haesbaert
Copy link
Copy Markdown
Contributor

@haesbaert : that sounds good. However, the tests hang if I do that!

I moved things around a bit and added a commented-out close call. Does it work for you if you uncomment it and do make test_luv?

You're right, and no, it's still hangs with/without the commented line :(

@haesbaert
Copy link
Copy Markdown
Contributor

I noticed that make test_luv hangs once in a while in main/HEAD (355478d) as well, I didn't try tracking it yet.

@talex5
Copy link
Copy Markdown
Collaborator Author

talex5 commented Sep 9, 2022

Yeah, I'm not sure why the tests hang. Would be good to find out.

It would also be good to make it easier to find out, maybe by improving MDX, or by having Eio dump out some state on a signal or something. If you do EIO_BACKEND=luv dune runtest after a hanging build, the dune build log will at least tell you which test file it was (because it tries to rebuild only that one).

If you can capture a hanging case with rr record, that would be great!

@haesbaert
Copy link
Copy Markdown
Contributor

haesbaert commented Sep 10, 2022

Did some minor digging, it's not in MDX, it seems to hang on lib_eio_linux/tests/test.ml on the very first test:

(* Write a string to a pipe and read it out again. *)
let test_copy () =
  Printf.eprintf "COPY A\n";
  Eio_linux.run ~queue_depth:3 @@ fun _stdenv ->
  Switch.run @@ fun sw ->
  Printf.eprintf "COPY B\n";
  let msg = "Hello!" in
  let from_pipe, to_pipe = Eio_linux.pipe sw in
  let buffer = Buffer.create 20 in
  Printf.eprintf "COPY C\n";
  Fiber.both
    (fun () ->
       Printf.eprintf "COPY D1\n";
       Eio.Flow.copy from_pipe (Eio.Flow.buffer_sink buffer); (* XXXXX hangs here XXXXX *)
       Printf.eprintf "COPY D1A\n";
    )
    (fun () ->
       Printf.eprintf "COPY D2\n";
       Eio.Flow.copy (Eio.Flow.string_source msg) to_pipe;
       Printf.eprintf "COPY D2A\n";
       Eio.Flow.copy (Eio.Flow.string_source msg) to_pipe;
       Eio.Flow.close to_pipe;
       Printf.eprintf "COPY D2B\n";
    );
  Printf.eprintf "COPY D3\n";
  Alcotest.(check string) "Copy correct" (msg ^ msg) (Buffer.contents buffer);
  Printf.eprintf "COPY D4\n";
  Eio.Flow.close from_pipe;
  Printf.eprintf "COPY D5\n";
  Printf.eprintf "\nCOPY FINISHED!!!\n\n"

gives:

sam:eio_linux: cd $(pwd);  grep COPY io.000.output
COPY A
COPY B
COPY C
COPY D1
COPY D2
COPY D2A
COPY D2B

I'm running

        rm -rf _build
        EIO_BACKEND=luv dune runtest --always-show-command-line --verbose --no-buffer

Funny thing is I can't trigger with runtest -j1.

@talex5
Copy link
Copy Markdown
Collaborator Author

talex5 commented Sep 12, 2022

Be carefuil: Printf is buffered:

utop # Printf.eprintf "hi\n"; Unix.sleep 2;;
[ waits 2 seconds ]
hi
- : unit = ()

You can use %! to force a flush, or use Eio's traceln (which always flushes).

@haesbaert
Copy link
Copy Markdown
Contributor

Be carefuil: Printf is buffered:

utop # Printf.eprintf "hi\n"; Unix.sleep 2;;
[ waits 2 seconds ]
hi
- : unit = ()

You can use %! to force a flush, or use Eio's traceln (which always flushes).

Ah yes, stupid me, I automatically assumed that eprintf was unbuffered, turns out I got lucky anyway, so it hangs because somehow the EOF is not seen by the reader Fiber, it reads "Hello" twice from the input pipe and then hangs forever waiting for the the EOF, if you put a sleep in the start of the writer, somehow it fixes, I didn't look much further.

@talex5
Copy link
Copy Markdown
Collaborator Author

talex5 commented Sep 13, 2022

That is odd! What happens if you just run ./_build/default/lib_eio_linux/tests/test.exe in a loop? It doesn't hang for me.

@talex5
Copy link
Copy Markdown
Collaborator Author

talex5 commented Sep 20, 2022

Given that the test hang is in code not affected by this (eio_luv) PR, and was occurring before, I propose to merge this as-is and open a new issue for the eio_linux test hang.

@haesbaert
Copy link
Copy Markdown
Contributor

Given that the test hang is in code not affected by this (eio_luv) PR, and was occurring before, I propose to merge this as-is and open a new issue for the eio_linux test hang.

Yes please, sorry I got sidetracked.

Fixes ocaml-multicore#303.

Reported by Nicolás Ojeda Bär.
@talex5 talex5 merged commit 97248d9 into ocaml-multicore:main Sep 20, 2022
@talex5 talex5 deleted the cancel-connect branch September 20, 2022 09:11
@talex5 talex5 mentioned this pull request Sep 20, 2022
talex5 added a commit to talex5/opam-repository that referenced this pull request Oct 12, 2022
CHANGES:

Changes:

- Update to OCaml 5.0.0~beta1 (@anmonteiro @talex5 ocaml-multicore/eio#346).

- Add API for seekable read/writes (@nojb ocaml-multicore/eio#307).

- Add `Flow.write` (@haesbaert ocaml-multicore/eio#318).
  This provides an optimised alternative to `copy` in the case where you are writing from a buffer.

- Add `Net.with_tcp_connect` (@bikallem ocaml-multicore/eio#302).
  Convenience function for opening a TCP connection.

- Add `Eio.Time.Timeout` (@talex5 ocaml-multicore/eio#320).
  Makes it easier to pass timeouts around.

- Add `Eio_mock.Clock` (@talex5 ocaml-multicore/eio#328).
  Control time in tests.

- Add `Buf_read.take_while1` and `skip_while1` (@bikallem ocaml-multicore/eio#309).
  These fail if no characters match.

- Make the type parameter for `Promise.t` covariant (@anmonteiro ocaml-multicore/eio#300).

- Move list functions into a dedicated submodule (@raphael-proust ocaml-multicore/eio#315).

- Direct implementation of `Flow.source_string` (@c-cube ocaml-multicore/eio#317).
  Slightly faster.

Bug fixes:

- `Condition.broadcast` must interlock as well (@haesbaert ocaml-multicore/eio#324).

- Split the reads into no more than 2^32-1 for luv (@haesbaert @talex5 @EduardoRFS ocaml-multicore/eio#343).
  Luv uses a 32 bit int for buffer sizes and wraps if the value passed is too big.

- eio_luv: allow `Net.connect` to be cancelled (@talex5 @nojb ocaml-multicore/eio#311).

- eio_main: Use dedicated log source (@anmonteiro ocaml-multicore/eio#326).

- linux_eio: fix kernel version number in log message (@talex5 @nojb ocaml-multicore/eio#314).

- Account for stack differences in the socketpair test (issue ocaml-multicore/eio#312) (@haesbaert ocaml-multicore/eio#313).

Documentation:

- Add Best Practices section to README (@talex5 ocaml-multicore/eio#299).

- Documentation improvements (@talex5 ocaml-multicore/eio#295 ocaml-multicore/eio#337).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Eio.Net.connect cannot be cancelled when using luv backend

2 participants