Enable non-streaming mode in transformers serve #41446
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
21cf126 to 5a99194
Remove typos Remove typos Remove typos
af65c33 to dca9e05
ArthurZucker left a comment
I am missing a lot of context, but let's make sure funcs are not nested if possible, that func names are helpful, and that we document why we need them
from huggingface_hub import model_info
from huggingface_hub.constants import HF_HUB_OFFLINE
from openai.types.chat.chat_completion import Choice
from starlette.responses import StreamingResponse
IDK what starlette is, prob a new deps?
(also do we really need it?)
I switched the import from starlette to the FastAPI one. The FastAPI import is a re-export of the starlette one, but it's more coherent from an import-shielding perspective
return stream_chat_completion(generation_streamer, request_id)
if req.get("stream"):

def sse(_generator):
not a very helpful func name
Correct, replaced it with map so that it's cleaner
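For readers following the thread, here is a minimal sketch of what replacing a nested `sse` helper with `map` could look like. It is not the code from this PR: `format_sse_event`, `to_sse_stream`, and `fake_chunks` are hypothetical names, and the chunk shape is made up for illustration. It only shows the general idea of framing each generated chunk as a Server-Sent Events message lazily.

```python
import json
from typing import Iterable, Iterator


def format_sse_event(chunk: dict) -> str:
    """Frame one JSON-serializable chunk as a Server-Sent Events message."""
    # An SSE message is a "data:" line terminated by a blank line.
    return f"data: {json.dumps(chunk)}\n\n"


def to_sse_stream(chunks: Iterable[dict]) -> Iterator[str]:
    # map is lazy: each chunk is framed only when the consumer pulls it,
    # so no nested generator function is needed.
    return map(format_sse_event, chunks)


def fake_chunks() -> Iterator[dict]:
    # Hypothetical stand-in for the real generation streamer.
    for token in ["Hel", "lo"]:
        yield {"delta": token}


events = list(to_sse_stream(fake_chunks()))
```

A top-level, descriptively named formatter plus `map` keeps the framing logic testable on its own, which is in line with the review request to avoid nested functions.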
load_and_register_attn_kernel(applicable_attn_implementation)
# log that we used kernel fallback if successful
if attn_implementation.startswith("flash_attention"):
if attn_implementation.startswith("flash_"):
we should check "flash_" in attn_implementation because if we fall back / use a kernel, it starts with kernel_community/....
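The difference between the two checks can be sketched as follows. The kernel-style string below is a hypothetical placeholder invented for illustration, not a real kernel name; only the `kernel_community/` prefix comes from the comment above.

```python
def is_flash_by_prefix(attn_implementation: str) -> bool:
    # Prefix check, as in the diff: misses names that carry a repo-style prefix.
    return attn_implementation.startswith("flash_")


def is_flash_by_substring(attn_implementation: str) -> bool:
    # Substring check, as suggested in the review comment.
    return "flash_" in attn_implementation


direct = "flash_attention_2"
# Hypothetical kernel-style name, assumed only for this example.
hub_kernel = "kernel_community/hypothetical_flash_kernel"

prefix_results = (is_flash_by_prefix(direct), is_flash_by_prefix(hub_kernel))
substring_results = (is_flash_by_substring(direct), is_flash_by_substring(hub_kernel))
```

With these inputs the prefix check accepts only the direct name, while the substring check accepts both.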
I cleaned it up a bit @ArthurZucker; IMO we might want to take a look at simplifying the stream/non-stream path in the future, depending on how the tool handling is done (still needs to be implemented for non-streaming generate and everything CB).
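One possible shape for a simplified stream/non-stream path is sketched below. It is an assumption, not what this PR implements: `generate_tokens` and `respond` are hypothetical names, and tool handling is left out entirely. The only element taken from the diff is the `req.get("stream")` switch.

```python
from typing import Iterator, Union


def generate_tokens(prompt: str) -> Iterator[str]:
    # Hypothetical stand-in for model generation: yields one token per word.
    for word in prompt.split():
        yield word.upper()


def respond(req: dict) -> Union[Iterator[str], dict]:
    # Both modes share a single token generator.
    tokens = generate_tokens(req["prompt"])
    if req.get("stream"):
        # Streaming: hand the lazy iterator to the caller as-is.
        return tokens
    # Non-streaming: drain the same iterator and return one complete payload.
    return {"content": " ".join(tokens)}


full = respond({"prompt": "hello world"})
streamed = list(respond({"prompt": "hello world", "stream": True}))
```

Sharing one generator means any fix to generation applies to both modes, at the cost of the non-streaming path paying the per-token iteration overhead.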
* Enable non-streaming in transformers serve
  Remove typos
  Remove typos
  Remove typos
* Fix tests
* Arthur review
Needs this to be merged first: #41444
Tests and docs need to be added before undraft