Write an Advanced Guide about Queue, Concurrency, Batching in Gradio #2016

With the initiatives in #1598, we are very flexible and offer our users quite a lot of options. Writing a guide on this could be really helpful. We could also cover the technical side of each option.

Possible overall layout:

1. General information about the FastAPI server and how sync and async functions are run in it (sync in a threadpool, async in the event loop; see [here](https://fastapi.tiangolo.com/async/)).
2. Using the default launch (without the queue) for low traffic. Also note that the queue is enabled by default in Spaces.
3. Increasing the number of threads in the Starlette server; see [here](https://github.com/gradio-app/gradio/blob/d7e5251c5ec5d581d7fdf6d0391f4a783e4f1171/gradio/blocks.py#L838). This is especially useful for remote servers that need high throughput and parallelism.
   **example: demo/concurrency-without-queue**
4. Using the default queue, which will run every request sequentially.
5. Configuring the queue parameters, especially `concurrency_count`. The best value depends on how much concurrent work the server can handle, which in turn depends on GPU power, ML model size, etc. See [here](https://github.com/gradio-app/gradio/blob/d7e5251c5ec5d581d7fdf6d0391f4a783e4f1171/gradio/blocks.py#L771).
   **example: demo/concurrency-with-queue**
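
To make the sync/async and thread-count points in the list concrete, here is a minimal standard-library sketch. It is not Gradio or Starlette code: `sync_handler`, `async_handler`, `serve`, and the `max_threads` argument are hypothetical names chosen for illustration. It only assumes the behaviour described in the FastAPI link, namely that blocking functions are handed to a threadpool while coroutines run on the event loop.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def sync_handler(name: str) -> str:
    # Blocking handler: executed in a worker thread so the event loop stays free.
    time.sleep(0.1)
    return f"{name} ran in {threading.current_thread().name}"


async def async_handler(name: str) -> str:
    # Coroutine handler: executed directly on the event loop thread.
    await asyncio.sleep(0.1)
    return f"{name} ran in {threading.current_thread().name}"


async def serve(max_threads: int) -> tuple:
    # Hypothetical stand-in for a server handling 4 sync and 4 async requests at once.
    loop = asyncio.get_running_loop()
    pool = ThreadPoolExecutor(max_workers=max_threads, thread_name_prefix="worker")
    start = time.perf_counter()
    results = await asyncio.gather(
        *(loop.run_in_executor(pool, sync_handler, f"sync-{i}") for i in range(4)),
        *(async_handler(f"async-{i}") for i in range(4)),
    )
    elapsed = time.perf_counter() - start
    pool.shutdown()
    return list(results), elapsed


if __name__ == "__main__":
    for n in (1, 4):
        results, elapsed = asyncio.run(serve(max_threads=n))
        print(f"max_threads={n}: {elapsed:.2f}s")
        for line in results:
            print("  ", line)
```

With a single worker thread the four blocking calls run one after another, while the four coroutines overlap regardless of the thread count. Raising the thread count only helps the sync handlers, which is the case the guide would need to explain for launches without the queue.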

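The effect of a sequential queue versus a queue with several workers can also be sketched without Gradio. The simulation below uses only `asyncio`; `predict`, `worker`, and `run_queue` are hypothetical helpers, and the `concurrency_count` argument merely borrows the name of the parameter mentioned in the list. It is an illustration of the idea, not Gradio's queue implementation.

```python
import asyncio
import time


async def predict(x: int) -> int:
    # Stand-in for a model call that takes about 0.1 s.
    await asyncio.sleep(0.1)
    return x * 2


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker pulls one request at a time and processes it to completion.
    while True:
        index, x = await queue.get()
        try:
            results[index] = await predict(x)
        finally:
            queue.task_done()


async def run_queue(inputs, concurrency_count: int = 1) -> tuple:
    # Simulated queue: concurrency_count workers drain a shared request queue.
    queue = asyncio.Queue()
    results = {}
    for index, x in enumerate(inputs):
        queue.put_nowait((index, x))
    start = time.perf_counter()
    workers = [
        asyncio.create_task(worker(queue, results))
        for _ in range(concurrency_count)
    ]
    await queue.join()  # wait until every queued request has been processed
    elapsed = time.perf_counter() - start
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return [results[i] for i in range(len(inputs))], elapsed


if __name__ == "__main__":
    for n in (1, 4):
        outputs, elapsed = asyncio.run(run_queue([1, 2, 3, 4], concurrency_count=n))
        print(f"concurrency_count={n}: {outputs} in {elapsed:.2f}s")
```

With one worker the requests are served strictly in sequence, so total time grows linearly with the number of requests. With more workers the requests overlap, which only pays off if the underlying hardware and model can actually handle that many calls at once.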
