Skip to content

add proper shutdown for the hypercorn server that serves the ASF gateway#7609

Merged
thrau merged 4 commits intomasterfrom
fix-gateway-shutdown
Feb 14, 2023
Merged

add proper shutdown for the hypercorn server that serves the ASF gateway#7609
thrau merged 4 commits intomasterfrom
fix-gateway-shutdown

Conversation

@thrau
Copy link
Member

@thrau thrau commented Feb 3, 2023

Following an inquiry by @dfangl, I noticed that shutdown is never called on the HypercornServer serving the ASF gateway. This may be one of the causes for some of the exceptions we've been seeing at the end of test runs. I also found an error in the code that shuts down the event loop (shutdown_asyncgens is an async function)

This change will affect the shutdown procedure and make it take - worst case - 10 seconds, because of this line:

self._closed.wait(timeout=10)

I know the mantra has been making localstack shut down fast. With this change, the shutdown procedure will take at least 10 seconds if an HTTP request is currently in progress that is blocking the gateway shutdown. There's no practical way of forcefully cancelling request threads that are spawned by the gateway.

We could add a check for config.FORCE_SHUTDOWN, but that is turned off by default, and doubt there's anyone actually turning it on, so the change would have no effect. Although it could be useful for tests, since the added shutdown routine should have no other effect on the user experience (other than maybe removing some of the exceptions when localstack shuts down).

Please share your thoughts @alexrashed @dfangl

/add 2023-02-07
following further investigation, it became quite obvious that the shutdown procedure wouldn't have any effect on the interpreter shutdown, because ThreadPoolExecutor threads are not daemon threads. So I implemented an Executor that spawns daemon threads. It's a bit simpler than the default implementation, since we don't need to globally keep track of threads any more, and the reference keeping goes away.

Now the interpreter will shut down correctly, with the caveat that threads that claim resources may not free those correctly.

/add 2023-02-13
as @dfangl pointed out, the problem is not actually threads being not being daemon threads (which they are if they are spawned from another daemon thread), but actually the global atexit handler that joins all threads created in a thread pool. i changed the custom thread pool to a version that simply deletes the threads it creates from the global list of threads. with our current way of starting the asgi gateway (from another daemon thread), the interpreter should now shut down correctly.

@thrau thrau temporarily deployed to localstack-ext-tests February 3, 2023 23:59 — with GitHub Actions Inactive
@thrau thrau requested review from alexrashed and dfangl February 3, 2023 23:59
@github-actions
Copy link

github-actions bot commented Feb 4, 2023

LocalStack integration with Pro

       3 files  ±0         3 suites  ±0   1h 27m 49s ⏱️ -4s
1 690 tests ±0  1 357 ✔️ ±0  333 💤 ±0  0 ±0 
2 402 runs  ±0  1 731 ✔️ ±0  671 💤 ±0  0 ±0 

Results for commit 067930e. ± Comparison against base commit 4ee16a2.

♻️ This comment has been updated with latest results.

@thrau thrau force-pushed the fix-gateway-shutdown branch from 555cd17 to c62c089 Compare February 4, 2023 01:39
@thrau thrau temporarily deployed to localstack-ext-tests February 4, 2023 01:39 — with GitHub Actions Inactive
Copy link
Member

@alexrashed alexrashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only takes the 10 seconds if a client is actively waiting for a response by LocalStack.
So, I think it is reasonable to have this shutdown grace period for ongoing HTTP requests, and 10s doesn't seem to be too much.
I am just a bit concerned about all the other spawned gateways and other server instances.
I do agree that a component's lifecycle should be controlled by the component starting it, but it would also be nice to have a safety net which properly shuts down components which are not properly handled.

Copy link
Member

@dfangl dfangl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the 10s are reasonable.
For customers, if a connection hangs, it won't change the behavior, as docker usually kills the container after 10s anyway.
For us as developers, this change should at least stop any new incoming requests which may pollute the logs.
I agree however, and am a big fan, of letting the components handle their own lifecycle, but it needs a certain discipline.

@thrau thrau force-pushed the fix-gateway-shutdown branch from c62c089 to 0352580 Compare February 7, 2023 16:20
@thrau thrau temporarily deployed to localstack-ext-tests February 7, 2023 16:20 — with GitHub Actions Inactive
@thrau thrau force-pushed the fix-gateway-shutdown branch from 0352580 to 067930e Compare February 7, 2023 16:22
@thrau thrau temporarily deployed to localstack-ext-tests February 7, 2023 16:23 — with GitHub Actions Inactive
@thrau thrau requested review from alexrashed and dfangl February 7, 2023 16:28
@thrau thrau marked this pull request as draft February 13, 2023 19:37
@thrau thrau force-pushed the fix-gateway-shutdown branch from 067930e to 6d401c6 Compare February 13, 2023 22:54
@thrau thrau temporarily deployed to localstack-ext-tests February 13, 2023 22:54 — with GitHub Actions Inactive
@thrau thrau force-pushed the fix-gateway-shutdown branch from 6d401c6 to 9b1ad24 Compare February 13, 2023 22:56
@thrau thrau temporarily deployed to localstack-ext-tests February 13, 2023 22:56 — with GitHub Actions Inactive
@thrau thrau force-pushed the fix-gateway-shutdown branch from 9b1ad24 to 9253de8 Compare February 13, 2023 22:58
@thrau thrau marked this pull request as ready for review February 13, 2023 22:58
@thrau
Copy link
Member Author

thrau commented Feb 13, 2023

i added a new version of the thread pool based on our discussion @alexrashed @dfangl

@thrau thrau temporarily deployed to localstack-ext-tests February 13, 2023 22:58 — with GitHub Actions Inactive
@thrau
Copy link
Member Author

thrau commented Feb 14, 2023

I'm going to merge this for now to see whether this fixes the pro test timeout issue! Let's revert if we see that it's not a good solution.

@thrau thrau merged commit cf29e24 into master Feb 14, 2023
@thrau thrau deleted the fix-gateway-shutdown branch February 14, 2023 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants