add proper shutdown for the hypercorn server that serves the ASF gateway by thrau · Pull Request #7609 · localstack/localstack

thrau · 2023-02-03T23:59:35Z

Following an inquiry by @dfangl, I noticed that shutdown is never called on the HypercornServer serving the ASF gateway. This may be one of the causes for some of the exceptions we've been seeing at the end of test runs. I also found an error in the code that shuts down the event loop (shutdown_asyncgens is an async function)

This change will affect the shutdown procedure and make it take - worst case - 10 seconds, because of this line:

localstack/localstack/http/hypercorn.py

Line 62 in d74b355

self._closed.wait(timeout=10)

I know the mantra has been making localstack shut down fast. With this change, the shutdown procedure will take at least 10 seconds if an HTTP request is currently in progress that is blocking the gateway shutdown. There's no practical way of forcefully cancelling request threads that are spawned by the gateway.

We could add a check for config.FORCE_SHUTDOWN, but that is turned off by default, and doubt there's anyone actually turning it on, so the change would have no effect. Although it could be useful for tests, since the added shutdown routine should have no other effect on the user experience (other than maybe removing some of the exceptions when localstack shuts down).

Please share your thoughts @alexrashed @dfangl

/add 2023-02-07
following further investigation, it became quite obvious that the shutdown procedure wouldn't have any effect on the interpreter shutdown, because ThreadPoolExecutor threads are not daemon threads. So I implemented an Executor that spawns daemon threads. It's a bit simpler than the default implementation, since we don't need to globally keep track of threads any more, and the reference keeping goes away.

Now the interpreter will shut down correctly, with the caveat that threads that claim resources may not free those correctly.

/add 2023-02-13
as @dfangl pointed out, the problem is not actually threads being not being daemon threads (which they are if they are spawned from another daemon thread), but actually the global atexit handler that joins all threads created in a thread pool. i changed the custom thread pool to a version that simply deletes the threads it creates from the global list of threads. with our current way of starting the asgi gateway (from another daemon thread), the interpreter should now shut down correctly.

github-actions · 2023-02-04T01:27:06Z

LocalStack integration with Pro

      3 files ±0       3 suites ±0 1h 27m 49s ⏱️ -4s
1 690 tests ±0 1 357 ✔️ ±0 333 💤 ±0 0 ❌ ±0
2 402 runs ±0 1 731 ✔️ ±0 671 💤 ±0 0 ❌ ±0

Results for commit 067930e. ± Comparison against base commit 4ee16a2.

♻️ This comment has been updated with latest results.

alexrashed

It only takes the 10 seconds if a client is actively waiting for a response by LocalStack.
So, I think it is reasonable to have this shutdown grace period for ongoing HTTP requests, and 10s doesn't seem to be too much.
I am just a bit concerned about all the other spawned gateways and other server instances.
I do agree that a component's lifecycle should be controlled by the component starting it, but it would also be nice to have a safety net which properly shuts down components which are not properly handled.

localstack/aws/serving/edge.py

dfangl

I also think the 10s are reasonable.
For customers, if a connection hangs, it won't change the behavior, as docker usually kills the container after 10s anyway.
For us as developers, this change should at least stop any new incoming requests which may pollute the logs.
I agree however, and am a big fan, of letting the components handle their own lifecycle, but it needs a certain discipline.

thrau · 2023-02-13T22:58:42Z

i added a new version of the thread pool based on our discussion @alexrashed @dfangl

thrau · 2023-02-14T14:20:42Z

I'm going to merge this for now to see whether this fixes the pro test timeout issue! Let's revert if we see that it's not a good solution.

thrau temporarily deployed to localstack-ext-tests February 3, 2023 23:59 — with GitHub Actions Inactive

thrau requested review from alexrashed and dfangl February 3, 2023 23:59

thrau force-pushed the fix-gateway-shutdown branch from 555cd17 to c62c089 Compare February 4, 2023 01:39

thrau temporarily deployed to localstack-ext-tests February 4, 2023 01:39 — with GitHub Actions Inactive

alexrashed reviewed Feb 6, 2023

View reviewed changes

localstack/aws/serving/edge.py Show resolved Hide resolved

dfangl reviewed Feb 6, 2023

View reviewed changes

thrau force-pushed the fix-gateway-shutdown branch from c62c089 to 0352580 Compare February 7, 2023 16:20

thrau temporarily deployed to localstack-ext-tests February 7, 2023 16:20 — with GitHub Actions Inactive

thrau force-pushed the fix-gateway-shutdown branch from 0352580 to 067930e Compare February 7, 2023 16:22

thrau temporarily deployed to localstack-ext-tests February 7, 2023 16:23 — with GitHub Actions Inactive

thrau requested review from alexrashed and dfangl February 7, 2023 16:28

thrau marked this pull request as draft February 13, 2023 19:37

thrau added 2 commits February 13, 2023 23:22

add proper shutdown for the hypercorn server that serves the ASF gateway

7add044

add daemon thread pool to work around interpreter shutdown issues

b5d936d

thrau force-pushed the fix-gateway-shutdown branch from 067930e to 6d401c6 Compare February 13, 2023 22:54

thrau temporarily deployed to localstack-ext-tests February 13, 2023 22:54 — with GitHub Actions Inactive

thrau force-pushed the fix-gateway-shutdown branch from 6d401c6 to 9b1ad24 Compare February 13, 2023 22:56

thrau temporarily deployed to localstack-ext-tests February 13, 2023 22:56 — with GitHub Actions Inactive

simplify thread pool

9253de8

thrau force-pushed the fix-gateway-shutdown branch from 9b1ad24 to 9253de8 Compare February 13, 2023 22:58

thrau marked this pull request as ready for review February 13, 2023 22:58

thrau temporarily deployed to localstack-ext-tests February 13, 2023 22:58 — with GitHub Actions Inactive

remove logging

8047598

thrau merged commit cf29e24 into master Feb 14, 2023

thrau deleted the fix-gateway-shutdown branch February 14, 2023 14:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

add proper shutdown for the hypercorn server that serves the ASF gateway#7609

add proper shutdown for the hypercorn server that serves the ASF gateway#7609
thrau merged 4 commits intomasterfrom
fix-gateway-shutdown

thrau commented Feb 3, 2023 •

edited

Loading

Uh oh!

github-actions bot commented Feb 4, 2023 •

edited

Loading

Uh oh!

alexrashed left a comment

Uh oh!

Uh oh!

dfangl left a comment

Uh oh!

thrau commented Feb 13, 2023

Uh oh!

thrau commented Feb 14, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

thrau commented Feb 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Feb 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

LocalStack integration with Pro

Uh oh!

alexrashed left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dfangl left a comment

Choose a reason for hiding this comment

Uh oh!

thrau commented Feb 13, 2023

Uh oh!

thrau commented Feb 14, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

thrau commented Feb 3, 2023 •

edited

Loading

github-actions bot commented Feb 4, 2023 •

edited

Loading