Skip to content

feat(cram): cleanup subprocesses on exit#11827

Closed
Alizter wants to merge 4 commits intoocaml:mainfrom
Alizter:cram-terminate-children
Closed

feat(cram): cleanup subprocesses on exit#11827
Alizter wants to merge 4 commits intoocaml:mainfrom
Alizter:cram-terminate-children

Conversation

@Alizter
Copy link
Copy Markdown
Collaborator

@Alizter Alizter commented May 21, 2025

This PR adds a reproduction case for #11820 where a subprocess is started in a cram test but becomes orphaned once the cram test terminates.

We then introduce the following fix. We use a more sophisticated trap at the start of the underlying shell script in order to terminate all processes in the process group. In order to avoid the main shell being terminated, we use a second trap to exit gracefully.

Fix #11820

This also allows us to re-enable some previously flakey tests in the CI.

A future improvement would be to run cram tests entirely as dune actions themselves, that way we can delegate process handling to the engine and have better cross platform support. That will require a lot more work however, so in the meantime we offer this improvement.

@Alizter Alizter force-pushed the cram-terminate-children branch from 4cabe9e to 9573983 Compare May 21, 2025 08:52
@Alizter Alizter changed the title test: crams not terminating subprocesses feat(cram): cleanup subprocesses on exit May 21, 2025
@Alizter Alizter marked this pull request as draft May 21, 2025 09:13
@Alizter Alizter force-pushed the cram-terminate-children branch from 9573983 to eff4020 Compare May 21, 2025 09:18
Alizter added 2 commits May 21, 2025 10:45
Signed-off-by: Ali Caglayan <alizter@gmail.com>
Signed-off-by: Ali Caglayan <alizter@gmail.com>
@Alizter Alizter force-pushed the cram-terminate-children branch 2 times, most recently from adbbb35 to 49cc1b4 Compare May 21, 2025 10:53
@Alizter Alizter marked this pull request as ready for review May 21, 2025 10:53
This should fix a common problem where users run subprocesses, for
example using `./script.sh &` inside their cram tests. If these
subprocesses do not terminate by themselves, they become orphaned and
have to be manually killed. A common use case is testing a web-server.

We add a more sophisticated trap for the underlying shell script of a
cram test so that it can successfully terminate all of its children.

Signed-off-by: Ali Caglayan <alizter@gmail.com>
@Alizter Alizter force-pushed the cram-terminate-children branch 3 times, most recently from 6595f3f to 5889ad5 Compare May 21, 2025 12:06
Signed-off-by: Ali Caglayan <alizter@gmail.com>
@Alizter Alizter force-pushed the cram-terminate-children branch from 5889ad5 to 883b219 Compare May 21, 2025 14:30
subprocess PIDs as not to orphan them. The first "trap" will kill all
processes in the process group and the second "trap" will make sure the main
shell process exits gracefully. *)
fprln oc {|trap "trap 'exit 0' TERM && kill -- -$$" EXIT|};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a little more detail in the comment here for future readers, specifically to understand the reason for registering a trap handler inside another trap handler. Here's my guess at why it's written this way:

The kill -- -$$ sends TERM to all processes in the process group. Is the reason for the "inner" trap so that when said TERM is received by the main shell process that it exits with 0? I guess there's no easy way to send TERM to all processes in the current process's group except for the current process? And if you were to move the inner trap outside of the EXIT trap handler (and just do kill -- -$$ on EXIT) then the process could exit 0 in response to an unexpected TERM which we don't want.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, this was my solution to avoiding killing the main process. If we end up killing the main process, then Dune will complain that the process is was running received a kill signal, what we want instead is to preserve the old trap behaviour which was to exit 0.

By switching the trapping behaviour just before we kill the process group, we can signal to all the processes in the group that they should exit when encountering TERM, which they all promptly do.

Copy link
Copy Markdown
Collaborator

@gridbugs gridbugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very useful! Minor comment to clarify the nested trap but otherwise happy with this.

Copy link
Copy Markdown
Member

@rgrinberg rgrinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right fix. We can use the same mechanism for cleanup that we do for regular actions.

@rgrinberg
Copy link
Copy Markdown
Member

Thinking about this again, it's probably fine to have some sort of behavior for cram tests, but it needs to provide some sort of an improvement over the regular behavior of killing the process group, and a way to disable this behavior for when the user wants to write their own signal handler. One way to improve the behavior could be to tell the user which pids are being leaked and making the test fail if there's leakage.

@rgrinberg
Copy link
Copy Markdown
Member

Closing as it's not useful in its current state.

@rgrinberg rgrinberg closed this Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Background tasks in Cram tests are not terminated

3 participants