Skip to content

Enable non-streaming mode in transformers serve#41446

Merged
LysandreJik merged 3 commits into
mainfrom
serving-lighteval
Oct 15, 2025
Merged

Enable non-streaming mode in transformers serve#41446
LysandreJik merged 3 commits into
mainfrom
serving-lighteval

Conversation

@LysandreJik

Copy link
Copy Markdown
Member

Needs this to be merged first: #41444

Tests and docs need to be added before undraft

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Base automatically changed from streaming-requests-cb to main October 10, 2025 08:24
@LysandreJik LysandreJik marked this pull request as ready for review October 14, 2025 08:27
@LysandreJik

Copy link
Copy Markdown
Member Author

Test need a bit of a refactor, which will land once we merge @Wauplin's PR here: #41487

@ArthurZucker ArthurZucker left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am missing a lot of context, but let's make sure funcs are not nested if possible, and func names are helpful + document why we need themn

Comment thread src/transformers/commands/serving.py Outdated
from huggingface_hub import model_info
from huggingface_hub.constants import HF_HUB_OFFLINE
from openai.types.chat.chat_completion import Choice
from starlette.responses import StreamingResponse

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK what starlette is, prob a new deps?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also do we really need it?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the import from starlette to the one from FastAPI -> the FastAPI one is a re-export of the starlette one, but it's more coherent from an import shielding perspective

Comment thread src/transformers/commands/serving.py Outdated
return stream_chat_completion(generation_streamer, request_id)
if req.get("stream"):

def sse(_generator):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a very helpful func name

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, replaced it by map so that it's cleaner

Comment thread src/transformers/modeling_utils.py Outdated
load_and_register_attn_kernel(applicable_attn_implementation)
# log that we used kernel fallback if successful
if attn_implementation.startswith("flash_attention"):
if attn_implementation.startswith("flash_"):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check "flash_" in attn_implementation" because if we fallback / use a kernel, it starts with kernel_community/....

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

@LysandreJik

Copy link
Copy Markdown
Member Author

I cleaned it up a bit @ArthurZucker; IMO we might want to take a look at simplifying the stream/non-stream path in the future, depending on how the tool handling is done (still needs to be implemented for non-streaming generate and everything CB).

@ArthurZucker ArthurZucker left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@LysandreJik LysandreJik merged commit 13a35a5 into main Oct 15, 2025
26 checks passed
@LysandreJik LysandreJik deleted the serving-lighteval branch October 15, 2025 07:37
i3hz pushed a commit to i3hz/transformers that referenced this pull request Oct 15, 2025
* Enable non-streaming in transformers serve

Remove typos

Remove typos

Remove typos

* Fix tests

* Arthur review
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* Enable non-streaming in transformers serve

Remove typos

Remove typos

Remove typos

* Fix tests

* Arthur review
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* Enable non-streaming in transformers serve

Remove typos

Remove typos

Remove typos

* Fix tests

* Arthur review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants