Write an Advanced Guide about Queue, Concurrency, Batching in Gradio #2016

Description

@omerXfaruq

The initiatives in #1598 give our users a lot of flexibility and quite a few options. Writing a Guide on this could be really helpful. It could also cover the technical side of each option.

Possible overall layout:

  1. General information about the FastAPI server and how sync and async functions are run in it (sync in a threadpool, async in the event loop, see here).
  2. Using the default launch (without queue) for low traffic. Also note that the queue is enabled by default in Spaces.
  3. Increasing the number of threads in the Starlette server, see here. This is especially useful for remote servers that need high throughput and parallelization.
    example: demo/concurrency-without-queue
  4. Using the default queue, which will run every request sequentially.
  5. Configuring the queue parameters, especially concurrency_count. The right value depends on how much concurrency the server can handle efficiently, which in turn depends on GPU power, ML model size, etc. See here.
    example: demo/concurrency-with-queue
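
To make the difference between points 1, 3, 4 and 5 concrete, here is a rough stdlib-only sketch that could seed the guide. It is not Gradio's or Starlette's actual implementation. All function names (`blocking_predict`, `async_predict`, `run_without_queue`, `run_with_queue`, `timed`) are hypothetical, and `max_threads` and `concurrency_count` are plain Python arguments here that only mimic the options discussed above.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_predict(x: int) -> int:
    """Hypothetical sync handler: blocks its thread while it 'computes'."""
    time.sleep(0.05)
    return x * 2


async def async_predict(x: int) -> int:
    """Hypothetical async handler: yields to the event loop while waiting."""
    await asyncio.sleep(0.05)
    return x * 2


async def run_without_queue(inputs, max_threads: int):
    """Sync handlers are offloaded to a thread pool with max_threads threads."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [loop.run_in_executor(pool, blocking_predict, x) for x in inputs]
        return await asyncio.gather(*futures)


async def run_with_queue(inputs, concurrency_count: int):
    """A FIFO queue drained by concurrency_count workers.

    With concurrency_count=1 the requests run strictly one after another.
    """
    queue: asyncio.Queue = asyncio.Queue()
    for index, x in enumerate(inputs):
        queue.put_nowait((index, x))
    results = [None] * len(inputs)

    async def worker():
        while True:
            try:
                index, x = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to process
            results[index] = await async_predict(x)

    await asyncio.gather(*(worker() for _ in range(concurrency_count)))
    return results


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    requests = [1, 2, 3, 4]
    _, t_pool = timed(run_without_queue(requests, max_threads=4))
    _, t_seq = timed(run_with_queue(requests, concurrency_count=1))
    _, t_par = timed(run_with_queue(requests, concurrency_count=4))
    print(f"thread pool: {t_pool:.2f}s, queue x1: {t_seq:.2f}s, queue x4: {t_par:.2f}s")
```

In this toy setup the single-worker queue takes roughly the sum of all request times, while more workers or threads let the requests overlap. On a real server the best value would still depend on the hardware and model, as noted in point 5.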
