Conversation
|
All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2218-all-demos |
|
makes sense. Should |
|
No I don't think batched functions typically support a single batch size. Usually, you use the largest batch size that you expect to fit in memory. So there would be no real problem if a batch of size 5 samples were to get fed into a batch_fn where we've set that batch_size to be 8) Would you agree with that @apolinario @pcuenca? |
|
Yes, except when using JAX, where a multiple of 8 is often required for parallelization and batch sizes are therefore fixed, usually. But this could be handled by the user in the |
|
@abidlabs This looks great! I agree with @aliabid94 that it may be better to tie the batch fn with the event listener as opposed to the queue. With this approach, it seems only one function can be batched per demo. |
|
@freddyaboulton yes for sure The
The event handlers will have the following parameters:
|
|
@abidlabs why is there a |
|
@abidlabs Thank you for the summary! Another question - Can we avoid adding another api endpoint? I think it will make it easier for api "consumers" to use the api, e.g. people loading interfaces, the front-end, people using the api manually, if we don't add another end point and it will make it easier to avoid bugs like #1316 |
|
Agree with @farukozderim, why do we need a separate batch fn? And with @freddyaboulton that we shouldn't add another API endpoint. Here's the syntax I have in mind: |
|
Thanks for all of the suggestions guys! This makes a lot of sense, we don't actually need a separate User API
Backend changes
|
I'm not sure if adding those parameters to the queue method is the best because I don't think it's a good idea to apply the same batching behavior to all backend functions. For Interfaces, for example, I don't think we want to batch the flagging or at least not the same way we batch the main prediction function. What if we control the batching behavior for interfaces via the constructor? |
I had originally avoided that because (a) I didn't want to add too many parameters to the But now that I think about it, if batch mode is enabled, we require the user to pass in a So yeah I think this is a better approach, thanks @freddyaboulton |
|
Thinking aloud here... When
|
|
All righty, this is finally ready for another review! @freddyaboulton would you be able to take a look? I've merged in your changes as well as added support for examples in batch mode and error handling. |
|
Merged in Run the following: import gradio as gr
import time
def trim_words(words, lens):
trimmed_words = []
time.sleep(5)
for w, l in zip(words, lens):
trimmed_words.append(w[:l])
return [trimmed_words]
with gr.Blocks() as demo:
with gr.Row():
word = gr.Textbox(label="word", value="abc")
leng = gr.Number(label="leng", precision=0, value=1)
output = gr.Textbox(label="Output")
with gr.Row():
run = gr.Button()
canc = gr.Button("Cancel", variant="stop")
event = run.click(trim_words, [word, leng], output, batch=True, max_batch_size=16)
canc.click(None, None, None, cancels=event)
demo.queue()
demo.launch()Open up 4 separate tabs. Then:
|
freddyaboulton
left a comment
There was a problem hiding this comment.
Looks good @abidlabs ! Awesome PR.
Unit tests and code looks good to me. I tried with some examples locally and wasn't able to break anything.
I also tested and cancelling does seem to work as expected. If the batch is already being processed, there's no way to remove the cancelled event from the batch but the frontend will disconnect the queue so the result won't be shown.
Before merge I'd like to:
- Make sure gr.Blocks.Load works with batched functions
- Run the benchmark script on a space with a batched fn to make sure it's ok
Also for some reason stream_frames and stream_audio are broken on the deployed space for this PR?
Is it introduced in this PR? Don't see it for PR 2488: https://huggingface.co/spaces/gradio-pr-deploys/pr-2488-all-demos
| else: | ||
| processed_input = raw_input | ||
| return processed_input | ||
| if inspect.isasyncgenfunction(block_fn.fn): |
There was a problem hiding this comment.
Do we support asyncgenfunctions?
There was a problem hiding this comment.
We do not, which is why the function returns False. Or did you mean something else?
| rest_of_batch = [ | ||
| event for event in self.event_queue if event.fn_index == event_fn_index | ||
| ][: batch_size - 1] | ||
| events.extend(rest_of_batch) |
There was a problem hiding this comment.
Calling remove may be inefficient cause I think python has to move all the elements of the list whenever you remove something but I suspect it would only be noticed for very large batches (which are rare)
|
Thanks for the detailed review @freddyaboulton! Let me go through the issues you identified and fix this up. Another thing we need to do is add batched functions to our guides and show example usage. But I'll do that as part of a separate PR (maybe one tackling #2016) so that we don't keep this open for too much longer and accumulate more conflicts. |
|
@abidlabs Good news I just did some benchmarking on batching and I think the results are favorable! What I didI created this space: https://huggingface.co/spaces/gradio/queue-batch-benchmark which is a copy of our existing queue benchmark but I added a batching. I also created this space: https://huggingface.co/spaces/gradio/queue-batch-benchmark-max-size-50 which uses a max_batch_size=20 (I titled the space incorrectly lol) for each function. ResultsMain branch (no batching)
This PR (max_batch_size 4)
This PR (max_batch_size 20)
I think it's interesting that a higher max_batch_size did not uniformly bring down the latency. I tried again, only sending requests for the image function:
Using a max_batch_size of 20 is still faster than no batching but not faster than a batch size of 4 🤔 I think this is evidence that we can optimize the batch logic (maybe gathering data for each event in the batch is adding overhead?) but we can do so in a future PR. IMO this can go out and we can get user feedback. Other thinggr.Blocks.load does not work if the space using batching. I tried with the queue benchmark and you'll see that the websocket returns "['foo']" and then we take the first element of it to return "[". Other than that (and the comments I already made) I think this is good to merge! Feel free to merge without fixing |
|
Amaaazing thank you so much @freddyaboulton for the benchmarking! Have a few meetings, but then I'll parse through this more fully and address the comments above |
|
Also, just noticed that a demo without batching built off this PR is showing higher latency than the same demo built off main. Would be good to look into this as well!
|
How are you running |
|
My bad for confusing terminology. There are two batches here: —batch_size 50 means send 50 requests to the space at once. The other batch size is max_batch_size in the event listener. I’ll edit my previous comment to make that clearer. |
|
I’m also going to run the two benchmark apps locally (not hosted on spaces) and run the benchmark to make sure it’s not something weird with my internet connection. |
|
Thanks @freddyaboulton this is super helpful |
|
Actually disregard that last comment about not-batching being slower in this branch. I can't reproduce locally (while I still reproduce the fact batching is faster than not batching). I did a factory reset on https://huggingface.co/spaces/gradio/queue-benchmark-batch-branch and the times are now pretty much in line with https://huggingface.co/spaces/gradio/queue-benchmark 🤷♂️ . False alarm I think. |
|
All righty, this is finally ready to be merged. I've added documentation about how to use batched functions as well. There are further optimizations that can be added, like @farukozderim pointed out, but I'll save that for a future PR (probably when I write the Guide on batched functions and concurrency) @freddyaboulton I didn't understand your question about I'll merge in in a couple of hours -- thanks all! |


Ignore this giant thread and go all the way to the bottom: #2218 (comment)
⬇️ ⬇️ ⬇️ ⬇️ ⬇️ ⬇️ ⬇️