Conversation
bin/exec.ml
Outdated
| killed process stops. The pid of the killed process will be reaped. *) | ||
| let kill_and_reap_process pid = | ||
| let pid_int = Pid.to_int pid in | ||
| Unix.kill pid_int Sys.sigkill; |
There was a problem hiding this comment.
sigkill is a bit aggressive. sigstop is gentler.
There was a problem hiding this comment.
do you mean sigterm? sigstop will "pause" the process.
There was a problem hiding this comment.
I initially used sigkill here because on windows it's the only signal emulated. On unix I think we should send sigterm and then if the process is still running after some period, send sigkill. Scheduler.wait_for_process waits for a process with a timeout and sends sigkill if the timeout expires so that might be helpful, but it looks like it raises Build_cancelled if the build is restarted while it's waiting.
There was a problem hiding this comment.
I initially used sigkill here because on windows it's the only signal emulated
You can keep using Sygkill on Windows.
Scheduler.wait_for_process waits for a process with a timeout and sends sigkill if the timeout expires so that might be helpful, but it looks like it raises Build_cancelled if the build is restarted while it's waiting.
The behavior that we would like would be:
- Send sigterm
- Wait for a timeout
- Send sigkill if the process isn't dead
It would of course be nice if we could sure that with the rest of the code somehow.
There was a problem hiding this comment.
Updated to wait send sigterm and then wait for a timeout before sending sigkill if the process is still running. I implemented this with periodic calls to wait [ WNOHANG ] because it seems simpler than a blocking call to wait in a separate thread.
bin/exec.ml
Outdated
| let kill_and_reap_process pid = | ||
| let pid_int = Pid.to_int pid in | ||
| Unix.kill pid_int Sys.sigkill; | ||
| let _stopped_pid, _process_status = Unix.waitpid [] pid_int in |
There was a problem hiding this comment.
Why not use the Scheduler primitives for launching and waiting for processes?
There was a problem hiding this comment.
Does Scheduler provide a way of starting a process? The only relevant function I see is:
val wait_for_process :
?timeout:float
-> ?is_process_group_leader:bool
-> Pid.t
-> Proc.Process_info.t Fiber.tfor waiting on a process. My goal is to start a process which runs concurrently to dune, and when a file is changed, to kill the child process if it's still running and then wait on it so its pid can be reaped. I don't see a way of doing this with the Scheduler module.
There was a problem hiding this comment.
There's Process for launching the process. But actually, you don't need that and you're free to use your existing function. You can definitely use Scheduler.wait_for_process though.
The other operations you've mentioned can be done outside of Scheduler.
| let build_and_run_in_child_process | ||
| { get_path_and_build_if_necessary; args; env } = | ||
| get_path_and_build_if_necessary () | ||
| |> Fiber.map ~f:(Result.map ~f:(spawn_process ~args ~env)) |
There was a problem hiding this comment.
We should release the global lock before running the process and then acquire it again. So that the user can start builds while their program is running.
There was a problem hiding this comment.
Updated to take the lock around running the build system
46bbcd1 to
4290314
Compare
7fb56ed to
7b48f1c
Compare
|
@rgrinberg do you see any more changes that need to be made to this PR? |
snowleopard
left a comment
There was a problem hiding this comment.
Thanks, this looks pretty cool! I'll let Rudi approve once his comments are addressed.
990b614 to
488e5cc
Compare
|
Looks like something is hanging on macos. I'll dig deeper tomorrow. |
|
On macos it looks like we miss filesystem events if they happen to close together in time. The tests wait until the program has run before modifying its source code to trigger a rebuild, and on macos sometimes a rebuild is not triggered by the change. Adding a delay fixes the problem but I don't want to depend on that in a test so I've disabled the tests on macos. I'll make an issue for this once this change is merged. |
|
Yeah, the issue on macos is well known. I think we have an assortment of hacks to deal with it in the test suite. I added some cosmetic changes to make it easier to review the code. The old code nested to the right a little too much. I removed releasing the lock. Unfortunately, it's not correct because we aren't unloading and reloading the in memory databases. |
bin/exec.ml
Outdated
| restore_cwd_and_execve common prog argv env | ||
| (* This will prevent status line printing from interfering with the output of | ||
| the running program. *) | ||
| Console.Backend.set Console.Backend.dumb; |
There was a problem hiding this comment.
I don't think this is right because we aren't calling finish on the backend. I'll remove it for now and this fix can come independently since it also affects normal dune exec.
|
Okay, the PR looks good to me. There's still some issues, but they can be addressed separately. |
Signed-off-by: Stephen Sherratt <stephen@sherra.tt>
|
@rgrinberg thanks for the fixes :) I've squashed all the commits into a single commit and rebased |
This reverts commit 5ff9a4f. This was causing occasional segfaults on macos when running `dune exec` so reverting this until we figure out what's causing that.
This reverts commit 5ff9a4f. This was causing occasional segfaults on macos when running `dune exec` so reverting this until we figure out what's causing that. Signed-off-by: Stephen Sherratt <stephen@sherra.tt>
Fixes #2934
Signed-off-by: Stephen Sherratt stephen@sherra.tt