feat(cram): cleanup subprocesses on exit by Alizter · Pull Request #11827 · ocaml/dune

Alizter · 2025-05-21T08:51:45Z

This PR adds a reproduction case for #11820 where a subprocess is started in a cram test but becomes orphaned once the cram test terminates.

We then introduce the following fix. We use a more sophisticated trap at the start of the underlying shell script in order to terminate all processes in the process group. In order to avoid the main shell being terminated, we use a second trap to exit gracefully.

Fix #11820

This also allows us to re-enable some previously flakey tests in the CI.

A future improvement would be to run cram tests entirely as dune actions themselves. That way we can delegate process handling to the engine and have better cross-platform support. That will require a lot more work, however, so in the meantime we offer this improvement.

Signed-off-by: Ali Caglayan <alizter@gmail.com>

This should fix a common problem where users run subprocesses, for example using `./script.sh &`, inside their cram tests. If these subprocesses do not terminate by themselves, they become orphaned and have to be manually killed. A common use case is testing a web server. We add a more sophisticated trap for the underlying shell script of a cram test so that it can successfully terminate all of its children.

Signed-off-by: Ali Caglayan <alizter@gmail.com>
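The orphaning problem and the effect of the trap can be tried outside of Dune with a small sketch. Everything here other than the trap line itself is an assumption made for illustration: the file names, the `MARKER` variable, and the use of `setsid` (from util-linux or busybox) as a stand-in for whatever gives the script its own process group, so that `kill -- -$$` has a group to target. The background job leaves a marker file behind only if it is still alive two seconds after its script has finished.

```shell
#!/bin/sh
# Sketch only: compares a script that backgrounds a job with and without the
# nested trap from this PR. File names and MARKER are hypothetical.
command -v setsid >/dev/null 2>&1 || { echo "setsid not found; skipping"; exit 0; }

workdir=$(mktemp -d)

# Shared body: a background job that touches $MARKER after two seconds,
# but only if nothing has killed it in the meantime.
cat > "$workdir/body.sh" <<'EOF'
(sleep 2; touch "$MARKER") >/dev/null 2>&1 &
EOF

# Case 1: the body on its own, with no cleanup.
cp "$workdir/body.sh" "$workdir/plain.sh"

# Case 2: the trap line from the PR, followed by the same body.
{
  cat <<'EOF'
trap "trap 'exit 0' TERM && kill -- -$$" EXIT
EOF
  cat "$workdir/body.sh"
} > "$workdir/trapped.sh"

# setsid makes each script the leader of a fresh process group, so the
# script's $$ is also its process group ID.
plain_status=0
MARKER="$workdir/plain.marker" setsid sh "$workdir/plain.sh" || plain_status=$?
trapped_status=0
MARKER="$workdir/trapped.marker" setsid sh "$workdir/trapped.sh" || trapped_status=$?

sleep 3   # long enough for a surviving background job to write its marker

if [ -e "$workdir/plain.marker" ]; then plain_job=survived; else plain_job=killed; fi
if [ -e "$workdir/trapped.marker" ]; then trapped_job=survived; else trapped_job=killed; fi

echo "without trap: exit=$plain_status job=$plain_job"
echo "with trap:    exit=$trapped_status job=$trapped_job"
rm -rf "$workdir"
```

A marker file is used instead of probing the PID with `kill -0`, because a terminated but not yet reaped process can still answer such a probe.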

gridbugs · 2025-05-22T01:04:47Z

src/dune_rules/cram/cram_exec.ml

+     subprocess PIDs as not to orphan them. The first "trap" will kill all
+     processes in the process group and the second "trap" will make sure the main
+     shell process exits gracefully. *)
+  fprln oc {|trap "trap 'exit 0' TERM && kill -- -$$" EXIT|};


I'd like a little more detail in the comment here for future readers, specifically to understand the reason for registering a trap handler inside another trap handler. Here's my guess at why it's written this way:

The `kill -- -$$` sends TERM to all processes in the process group. Is the reason for the "inner" trap that, when said TERM is received by the main shell process, it exits with 0? I guess there's no easy way to send TERM to all processes in the current process's group except for the current process? And if you were to move the inner trap outside of the EXIT trap handler (and just do `kill -- -$$` on EXIT), then the process could exit 0 in response to an unexpected TERM, which we don't want.

Alizter · May 22, 2025

Exactly, this was my solution to avoid killing the main process. If we end up killing the main process, then Dune will complain that the process it was running received a kill signal. What we want instead is to preserve the old trap behaviour, which was to exit 0.

By switching the trapping behaviour just before we kill the process group, we can signal to all the processes in the group that they should exit when encountering TERM, which they all promptly do.
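The concern about an unexpected TERM being turned into a successful exit can be seen in isolation with a minimal sketch. It only exercises the rejected alternative, an `exit 0` handler installed up front, and does not test the nested form. The file names and the `run_and_term` helper are hypothetical.

```shell
#!/bin/sh
# Sketch only: shows how an up-front `trap 'exit 0' TERM` hides a TERM that
# arrives from outside. File names and run_and_term are hypothetical.
workdir=$(mktemp -d)

# A script with no TERM handler at all.
cat > "$workdir/plain.sh" <<'EOF'
while :; do sleep 1; done
EOF

# The same script with the "exit 0 on TERM" handler installed at the top
# instead of from inside an EXIT trap.
cat > "$workdir/masked.sh" <<'EOF'
trap 'exit 0' TERM
while :; do sleep 1; done
EOF

# Start a script, send it TERM from outside, and leave its exit status in
# the global variable `status`.
run_and_term() {
  sh "$1" >/dev/null 2>&1 &
  pid=$!
  sleep 1              # give the script time to start and install its trap
  kill -TERM "$pid"
  status=0
  wait "$pid" || status=$?
}

run_and_term "$workdir/plain.sh";  plain_status=$status
run_and_term "$workdir/masked.sh"; masked_status=$status

echo "no handler:       exit status $plain_status"
echo "up-front handler: exit status $masked_status"
rm -rf "$workdir"
```

In bash and dash, a script killed by a signal is reported as 128 plus the signal number, so the first status signals a failure to the caller, while the second script reports success even though it was terminated from outside.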

gridbugs

Very useful! Minor comment to clarify the nested trap but otherwise happy with this.

rgrinberg

I don't think this is the right fix. We can use the same mechanism for cleanup that we do for regular actions.

rgrinberg · 2025-06-29T18:28:01Z

Thinking about this again, it's probably fine to have some sort of behavior for cram tests, but it needs to provide some sort of improvement over the regular behavior of killing the process group, and a way to disable this behavior for when the user wants to write their own signal handler. One way to improve the behavior could be to tell the user which PIDs are being leaked and make the test fail if there's leakage.

rgrinberg · 2025-07-01T23:02:58Z

Closing as it's not useful in its current state.

Alizter force-pushed the cram-terminate-children branch from 4cabe9e to 9573983 Compare May 21, 2025 08:52

Alizter mentioned this pull request May 21, 2025

Background tasks in Cram tests are not terminated #11820

Closed

Alizter changed the title ~~test: crams not terminating subprocesses~~ feat(cram): cleanup subprocesses on exit May 21, 2025

Alizter marked this pull request as draft May 21, 2025 09:13

Alizter force-pushed the cram-terminate-children branch from 9573983 to eff4020 Compare May 21, 2025 09:18

Alizter added 2 commits May 21, 2025 10:45

test: crams not terminating subprocesses

3d5f610

Signed-off-by: Ali Caglayan <alizter@gmail.com>

chore: cleanup unused function

7946146

Signed-off-by: Ali Caglayan <alizter@gmail.com>

Alizter force-pushed the cram-terminate-children branch 2 times, most recently from adbbb35 to 49cc1b4 Compare May 21, 2025 10:53

Alizter marked this pull request as ready for review May 21, 2025 10:53

Alizter force-pushed the cram-terminate-children branch 3 times, most recently from 6595f3f to 5889ad5 Compare May 21, 2025 12:06

fixup: renable previously flakey tests

883b219

Signed-off-by: Ali Caglayan <alizter@gmail.com>

Alizter force-pushed the cram-terminate-children branch from 5889ad5 to 883b219 Compare May 21, 2025 14:30

gridbugs reviewed May 22, 2025

gridbugs approved these changes May 22, 2025

rgrinberg requested changes May 22, 2025

Leonidas-from-XIV mentioned this pull request May 23, 2025

Terminate process group after cram execution #11841

Merged

Alizter mentioned this pull request Jun 12, 2025

Allow concurrent exec with watch mode #11840

Merged

rgrinberg closed this Jul 1, 2025

Alizter wants to merge 4 commits into ocaml:main from Alizter:cram-terminate-children
