[Bugfix] [Frontend] responses api, refactored simple event streaming#38227
[Bugfix] [Frontend] responses api, refactored simple event streaming#38227bfroemel wants to merge 17 commits into
Conversation
…e when streaming responses that include tool calls Signed-off-by: Bernhard Froemel <bf@ctsw.at>
- refactored simple streaming events out into its own source file Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
…ixes Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
There was a problem hiding this comment.
Code Review
This pull request refactors the simple streaming event processing logic into a new dedicated module, simple_streaming_events.py. It introduces new OpenAI response types and handles complex transitions between reasoning, content, and tool calls, including validation of the SSE event sequence. A critical bug was identified where state.previous_token_ids was not updated correctly, which could lead to incorrect parsing in subsequent iterations.
|
@qandrew @chaunceyjiang Kindly requesting initial feedback. I'll try to merge with master every couple of days and might add additional fixes, as I continue to work on #37294 Many thanks for any guidance/input! btw: already, the current state works well with the OpenAI |
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
…ixes Signed-off-by: Bernhard Froemel <bf@ctsw.at>
chaunceyjiang
left a comment
There was a problem hiding this comment.
I suggest splitting the bug fix and the refactor into two separate PRs.
We can prioritize reviewing the bug fix first. For the refactor, I’d recommend getting familiar with the response API first, and then leveraging AI to help with it.
|
Uhm - so "the bug fix" is a bit more involved; without the refactor (essentially splitting up I can ofc try to move everything back to
Could you point out what the current PR is lacking that you feel I am not familiar (enough) with the responses API and/or should have leveraged AI to help with differently? ;) Apologies for pushing back + appreciating your patience!! |
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
… passing on prompt token ids to parser Signed-off-by: Bernhard Froemel <bf@ctsw.at>
|
This is now on top of #38755 ([Parser] Migrate response api streaming to unified parser) which allowed to further slim |
|
This pull request has merge conflicts that must be resolved before it can be |
Purpose
This PR originally started to just collect fixes related to streaming issues uncovered during the development of #37294 Now it has evolved to a refactored simple (non-harmony code path) event streaming implementation that gives:
Current limitations (might uncover more)
(?:reasoning)?(?:output text)?(?:tool call)*"\n\n").vllm/entrypoints/openai/responses/serving.py::responses_full_generator()Test Plan and Results
I have validated this refactor with the openrouter responses API (type of events, sequence of related events; although the openrouter API delivers a different sequence of unrelated events probably due to differences in the implementation details).
Streaming event example with reasoning and tool calls.
Additionally I have tested the codex coding-agent with the Qwen3.5-27B model. So far streaming of events is flawless.
Unit tests have been extended to cover main code paths, incl. transitions from reasoning to output_text to tool calls.
Use of LLMs
Note, for this PR I have used LLM tools (codex TUI client with Qwen3.5-27B) for the following tasks (for the first time seriously - hence my verbosity):
I did at first attempt to only use LLMs to do the refactor. It failed :) Apparently, the models I can run locally aren't good at implementing streaming events code - always introduced subtle bugs, but also made kind-of "progress" that kept up hopes of a successful implementation for several days; but ultimately that experiment ended in a mess/good learning experience.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.