Context and brief summary
While working on fractal-server, we recently noticed an unexpected behavior: when running parsl (with the HTEX executor) within a uvicorn web server, part of the `DataFlowKernel.cleanup` method would shut down our active server. We discovered that the error is not related to parsl, but rather to the use of multiprocessing from within a uvicorn server. Below you can find a minimal reproducible example, and our fix based on a small change to `parsl/executors/high_throughput/interchange.py`. Since this "bug" is not related to parsl, we are not asking for any change here. This issue documents our findings/tests, and can be closed right away.
This is mostly work done with @mfranzon, and the corresponding issue in our repo is here.
Minimal reproducible example
We set up a fresh Python 3.8.13 environment with fastapi 0.85.0, uvicorn 0.18.3, and parsl 2022.10.10.
We use the following `main.py` script:

```python
from fastapi import FastAPI
from parsl.dataflow.dflow import DataFlowKernel
from parsl.config import Config
from parsl.app.python import PythonApp
from parsl.executors import HighThroughputExecutor
import math

app = FastAPI()

@app.get("/")
async def root():
    htex = HighThroughputExecutor()
    conf = Config(executors=[htex])
    dfk = DataFlowKernel(config=conf)
    # Named cos_app to avoid shadowing the FastAPI `app` above
    cos_app = PythonApp(math.cos, data_flow_kernel=dfk)
    res = cos_app(1).result()
    dfk.cleanup()
    return {"message": f"{res}"}
```
which we run from the command line via

```
uvicorn main:app
```
Now our web server is active, and any GET request to `/` will trigger one execution of `root()`. When we call this endpoint (e.g. via `curl http://127.0.0.1:8000/`), the uvicorn server receives a signal that shuts it down, and the logs (in the terminal) show
```
INFO:     127.0.0.1:52626 - "GET / HTTP/1.1" 200 OK
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [71313]
```
The server is no longer reachable, which is a critical problem for our use case (fractal-server should open and close several DFKs with HTEX executors, and obviously it should remain active while doing so).
Source of the problem
We realized that the problem comes from using multiprocessing from within a uvicorn server. This is explained by @Mixser in comments on related fastapi/uvicorn issues. Quoting from those comments:
> We have a master process (uvicorn) that listens for and accepts requests on the port. During server initialization, uvicorn sets up signal listeners (via `asyncio.add_signal_handler` calls) for graceful shutdown of the application. But it uses file-descriptor event notification instead of the default approach (see https://docs.python.org/3/library/signal.html#signal.set_wakeup_fd).
>
> After initialization, we have an open socket (the one passed to `signal.set_wakeup_fd`) and our main process waits for data from this socket. Any received data will be interpreted as a signal.
>
> Next, we create a new process in our HTTP handler; this new process inherits the open socket (open file descriptor), and we wait for signals from this socket.
>
> When we send a signal to the child, it goes to this open socket. But our parent process listens to this socket too, so it receives a signal to terminate and shuts down the application.
>
> How to avoid it: we need to restore the default signal-handler behavior for the child process and not use the fd inherited from the parent. We can achieve this by calling `signal.set_wakeup_fd(-1)`.
The reason this appears in our example is that the parsl HTEX executor uses a `ForkProcess` for the interchange, in `parsl/executors/high_throughput/executor.py` (line 477 at `dfcbff8`):

```python
self.interchange_proc = ForkProcess(target=interchange.starter,
```

where `ForkProcess` is simply (`parsl/multiprocessing.py`, line 15 at `dfcbff8`):

```python
ForkProcess: Type = multiprocessing.get_context('fork').Process
```
Workaround (by patching parsl)
Based on the fastapi/uvicorn issues above, we modified the parsl HTEX interchange by adding the lines

```python
import signal
signal.set_wakeup_fd(-1)
signal.signal(signal.SIGTERM, signal.SIG_DFL)
signal.signal(signal.SIGINT, signal.SIG_DFL)
```
at the beginning of the `starter` function. This is visible in our fork: https://github.com/fractal-analytics-platform/parsl/blob/1.3.1-dev/parsl/executors/high_throughput/interchange.py. So far we have not noticed any side effects of this change in our use of parsl, but we cannot say whether it affects parsl in other ways.
Tests with recent parsl functionalities
As of #2433, we can specify the `start_method` parameter for an HTEX. This concerns the workers rather than the interchange, and does not affect the problem here. For completeness, we also tested the same example with the `start_method` argument in `HighThroughputExecutor`: independently of our choice (`fork`, `spawn`, `thread`), the problem remained. This is not surprising, since that PR only modifies what takes place in `process_worker_pool.py`.
One could also try replacing `ForkProcess` with `SpawnProcess` or `Thread` in `parsl/executors/high_throughput/executor.py` (line 477 at `dfcbff8`):

```python
self.interchange_proc = ForkProcess(target=interchange.starter,
```

Preliminary tests led to a frozen server (with `SpawnProcess`) and to other compatibility issues with `Thread` (`AttributeError: 'Thread' object has no attribute 'terminate'`); we did not investigate further.
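The `Thread` incompatibility is easy to confirm without parsl: `multiprocessing` process objects expose `terminate()`, while `threading.Thread` objects do not, so any shutdown code that calls `terminate()` on the interchange handle breaks if the handle is a thread.

```python
import multiprocessing
import threading

# Neither object needs to be started just to inspect its interface.
p = multiprocessing.get_context("fork").Process(target=print)
t = threading.Thread(target=lambda: None)

print(hasattr(p, "terminate"), hasattr(t, "terminate"))  # True False
```

There is no way to forcibly kill a Python thread from outside, which is why the `Thread` API has no `terminate()` in the first place.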
Summary
We found a way to use parsl/HTEX from within a uvicorn server, and for now we are using it via our parsl fork. If this issue is solved upstream (in uvicorn), we'll gladly switch back to parsl mainline. This issue is here just in case others hit the same problem.