With the initiatives in #1598, we are very flexible and offer our users quite a lot of options. Writing a guide on this could be really helpful. We could also cover the technical side of each option.
Possible overall layout:
- General information about the FastAPI server and how sync and async functions are run in it (sync functions in a thread pool, async functions in the event loop, see here).
- Using the default launch (without a queue) for low traffic. Also note that the queue is enabled by default in Spaces.
- Increasing the number of threads in the Starlette server, see here. This is especially useful for remote servers with high throughput and parallelization.
example: demo/concurrency-without-queue
- Using the default queue, which processes requests sequentially, one at a time.
- Configuring the queue parameters, especially concurrency_count. The best value depends on how much concurrency the server can handle, which in turn depends on GPU power, ML model size, etc. See here.
example: demo/concurrency-with-queue
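
To make the technical side concrete, the guide could include a small standalone sketch along these lines. It is not Gradio's or Starlette's actual implementation, only an illustration with plain `asyncio`: async handlers are awaited on the event loop, sync handlers are sent to the default thread pool, and a queue is drained by a fixed number of workers. The names `sync_predict`, `async_predict`, `run_handler`, and `process_queue` are hypothetical; `concurrency_count` is borrowed from the outline above and here simply means the number of workers.

```python
import asyncio
import inspect
import threading
import time


def sync_predict(x):
    """Hypothetical blocking handler; returns the result and the thread it ran on."""
    time.sleep(0.05)  # stands in for blocking model inference
    return x * 2, threading.get_ident()


async def async_predict(x):
    """Hypothetical non-blocking handler; runs on the event-loop thread."""
    await asyncio.sleep(0.05)  # stands in for awaiting I/O
    return x * 2, threading.get_ident()


async def run_handler(fn, arg):
    """Await async handlers directly; run sync handlers in the default thread pool."""
    if inspect.iscoroutinefunction(fn):
        return await fn(arg)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, fn, arg)


async def process_queue(jobs, concurrency_count=1):
    """Drain a queue of (handler, argument) jobs with `concurrency_count` workers.

    Returns the results in submission order and the peak number of jobs
    that were running at the same time.
    """
    queue = asyncio.Queue()
    for index, job in enumerate(jobs):
        queue.put_nowait((index, job))

    results = [None] * len(jobs)
    running = 0
    peak = 0

    async def worker():
        nonlocal running, peak
        while not queue.empty():
            index, (fn, arg) = queue.get_nowait()
            running += 1
            peak = max(peak, running)
            try:
                results[index] = await run_handler(fn, arg)
            finally:
                running -= 1

    await asyncio.gather(*(worker() for _ in range(concurrency_count)))
    return results, peak


if __name__ == "__main__":
    jobs = [(sync_predict, 1), (async_predict, 2), (sync_predict, 3), (async_predict, 4)]
    for count in (1, 2):
        start = time.perf_counter()
        _, peak = asyncio.run(process_queue(jobs, concurrency_count=count))
        elapsed = time.perf_counter() - start
        print(f"concurrency_count={count}: peak={peak}, elapsed={elapsed:.2f}s")
```

With a single worker the jobs run strictly one after another, which mirrors the sequential default queue. Raising the worker count lets jobs overlap, and whether that actually helps would depend on the same factors listed above (GPU power, model size, and so on).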