add proper shutdown for the hypercorn server that serves the ASF gateway#7609
add proper shutdown for the hypercorn server that serves the ASF gateway#7609
Conversation
555cd17 to
c62c089
Compare
alexrashed
left a comment
There was a problem hiding this comment.
It only takes the 10 seconds if a client is actively waiting for a response by LocalStack.
So, I think it is reasonable to have this shutdown grace period for ongoing HTTP requests, and 10s doesn't seem to be too much.
I am just a bit concerned about all the other spawned gateways and other server instances.
I do agree that a component's lifecycle should be controlled by the component starting it, but it would also be nice to have a safety net which properly shuts down components which are not properly handled.
dfangl
left a comment
There was a problem hiding this comment.
I also think the 10s are reasonable.
For customers, if a connection hangs, it won't change the behavior, as docker usually kills the container after 10s anyway.
For us as developers, this change should at least stop any new incoming requests which may pollute the logs.
I agree however, and am a big fan, of letting the components handle their own lifecycle, but it needs a certain discipline.
c62c089 to
0352580
Compare
0352580 to
067930e
Compare
067930e to
6d401c6
Compare
6d401c6 to
9b1ad24
Compare
9b1ad24 to
9253de8
Compare
|
i added a new version of the thread pool based on our discussion @alexrashed @dfangl |
|
I'm going to merge this for now to see whether this fixes the pro test timeout issue! Let's revert if we see that it's not a good solution. |
Following an inquiry by @dfangl, I noticed that
shutdownis never called on theHypercornServerserving the ASF gateway. This may be one of the causes for some of the exceptions we've been seeing at the end of test runs. I also found an error in the code that shuts down the event loop (shutdown_asyncgensis an async function)This change will affect the shutdown procedure and make it take - worst case - 10 seconds, because of this line:
localstack/localstack/http/hypercorn.py
Line 62 in d74b355
I know the mantra has been making localstack shut down fast. With this change, the shutdown procedure will take at least 10 seconds if an HTTP request is currently in progress that is blocking the gateway shutdown. There's no practical way of forcefully cancelling request threads that are spawned by the gateway.
We could add a check for
config.FORCE_SHUTDOWN, but that is turned off by default, and doubt there's anyone actually turning it on, so the change would have no effect. Although it could be useful for tests, since the added shutdown routine should have no other effect on the user experience (other than maybe removing some of the exceptions when localstack shuts down).Please share your thoughts @alexrashed @dfangl
/add 2023-02-07
following further investigation, it became quite obvious that the shutdown procedure wouldn't have any effect on the interpreter shutdown, because
ThreadPoolExecutorthreads are not daemon threads. So I implemented an Executor that spawns daemon threads. It's a bit simpler than the default implementation, since we don't need to globally keep track of threads any more, and the reference keeping goes away.Now the interpreter will shut down correctly, with the caveat that threads that claim resources may not free those correctly.
/add 2023-02-13
as @dfangl pointed out, the problem is not actually threads being not being daemon threads (which they are if they are spawned from another daemon thread), but actually the global atexit handler that joins all threads created in a thread pool. i changed the custom thread pool to a version that simply deletes the threads it creates from the global list of threads. with our current way of starting the asgi gateway (from another daemon thread), the interpreter should now shut down correctly.