Skip to content

Run GC after tests finish before process/worker terminated to prevent timeout on termination#13139

Closed
tjenkinson wants to merge 3 commits into
jestjs:mainfrom
tjenkinson:gc-after-running-test
Closed

Run GC after tests finish before process/worker terminated to prevent timeout on termination#13139
tjenkinson wants to merge 3 commits into
jestjs:mainfrom
tjenkinson:gc-after-running-test

Conversation

@tjenkinson

@tjenkinson tjenkinson commented Aug 15, 2022

Copy link
Copy Markdown
Contributor

Summary

I was getting the

A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --detectOpenHandles to find leaks. Active timers can also cause this, ensure that .unref() was called on them.

error at the end of my tests, and increasing
https://github.com/facebook/jest/blob/7b52a2edd3c2dfffa547000d2d0a8942b4e8fc5a/packages/jest-worker/src/base/BaseWorkerPool.ts#L19
fixed it.

I also noticed that triggering a manual gc in an afterAll for all tests fixed the issue too.

https://github.com/tjenkinson/redos-detector/blob/74c9eb9cc31c8399289de7aab22ef509660c0d5a/src/jest-setup.js#L1-L9

My theory for what's happening is that before the process exits node is triggering GC automatically, and this is taking longer than the FORCE_EXIT_DELAY.

So this PR triggers GC when the tests have finished, and this seems to fix the issue.

I initially thought this logic should go in the jest-worker package and started by adding a new message that would be sent from the child process when GC had completed, which the main process would wait for, and this would be what then started the force exit timer. The solution in this PR felt simpler though so I went with that for now.

Might fix #11354

Note --detectOpenHandles also "fixes" it, but I think that's because doing that also forces a globalThis.gc before the process is requested to end as part of https://github.com/facebook/jest/blob/7b52a2edd3c2dfffa547000d2d0a8942b4e8fc5a/packages/jest-runner/src/runTest.ts#L382

Test plan

I started writing a unit test but that got complicated because currently the tests in testRunner.test.ts work, but internally an error is thrown because none of the test files actually exist and can be read from the file system. If you think a test for this is needed we'll need to figure out how to mock the files so that it actually runs to the logic that then triggers the GC.

I tested this with my project where the issue can be seen.

To do that yourself you can:

You should see the error quoted above.

Then if you replace jest-runner with the version in this PR the error will go away.

because if a test triggers a lot of allocations that are no longer referenced it can take the garbage collector a while to run.

Without this change GC happens automatically later when we are waiting for the worker/process to close, and that can result in the worker/process being killed (`forceExit()`) due to hitting a timeout (`FORCE_EXIT_DELAY`)
@SimenB

SimenB commented Aug 16, 2022

Copy link
Copy Markdown
Member

I'd love to have a test for this! An integration tests seems reasonable as I don't think a unit test can capture the gc need. Ideally there's no need to mock files.

Tests that might serve as examples: https://github.com/facebook/jest/blob/812763dcfea924e098fa0f490131e42e3a4c0011/e2e/__tests__/detectOpenHandles.ts (which then runs the tests within https://github.com/facebook/jest/tree/main/e2e/detect-open-handles)

@tjenkinson

tjenkinson commented Aug 16, 2022

Copy link
Copy Markdown
Contributor Author

Having a go at creating a functional test but finding it difficult to actually trigger the issue :/

The gc call takes around 1779 ms in my project and not managing to hit anything like that so far

@tjenkinson

Copy link
Copy Markdown
Contributor Author

Ok spent more time on this. This consistently fixes my issue but I'm not super sure what the root cause is. From trial and error though here's what I found

  • not creating this WeakRef causes a manual GC call in an afterAll to go from multiple seconds down to a few milliseconds
  • when doing multiple gc calls in a row in afterAll the first is multiple seconds but the second is still > 1 second. So if it's GC blocking the process exit that doesn't really make sense because if it's still > 1 second it should still trigger the same error
  • adding a return new Promise((resolve) => setTimeout(resolve, 500)); in the afterAll also makes the error go away, so it might be something other than GC that needs to finish before the process can exit and the gc call just gives that more time
  • adding a return new Promise((resolve) => setTimeout(resolve, 500)); in the afterAll and doing the gc call after results in the gc taking less time

So I'm not convinced the gc call is really helping and it might just be a side effect of waiting a bit longer for something else to calm down (which only happens when there are a lot of WeakRef's).

Maybe increasing FORCE_EXIT_DELAY a bit (maybe 1 second), would be better?

@github-actions

Copy link
Copy Markdown

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions Bot added the Stale label Nov 15, 2022
@github-actions

Copy link
Copy Markdown

This PR was closed because it has been stalled for 30 days with no activity. Please open a new PR if the issue is still relevant, linking to this one.

@github-actions github-actions Bot closed this Dec 15, 2022
@github-actions

Copy link
Copy Markdown

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jan 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Spurious error: "A worker process has failed to exit gracefully and has been force exited."

3 participants