[Bugfix] [Frontend] responses api, refactored simple event streaming#38227

Closed
bfroemel wants to merge 17 commits into
vllm-project:mainfrom
bfroemel-ai:pr-respapi-streamingfixes

Conversation

@bfroemel bfroemel commented Mar 26, 2026

Purpose

This PR originally started as a collection of fixes for streaming issues uncovered during the development of #37294. It has since evolved into a refactored simple (non-harmony code path) event streaming implementation that provides:

  • correct accumulation and emission of response.output_item.done events when streaming responses with tool calls
  • tool call parsing improvements (passing request.tools)
  • reasoning delta preservation
  • various streaming event ordering/format fixes
  • stronger separation of concerns (by moving the simple streaming event functionality into its own source code file)
  • more internal structure related to the delta message parsing, internal state tracking, etc.
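To illustrate the first bullet, here is a minimal sketch of accumulating function call argument deltas and emitting the matching done events. It is not the code from this PR: the `ToolCallAccumulator` class and its methods are hypothetical names, `sequence_number` handling is left out, and only the event types and field names are taken from the streaming example further down.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCallAccumulator:
    """Hypothetical helper: collects argument deltas for one function call."""

    item_id: str
    call_id: str
    name: str
    output_index: int
    parts: list[str] = field(default_factory=list)

    def add_delta(self, delta: str) -> dict:
        # Remember the fragment and mirror it to the client as a delta event.
        self.parts.append(delta)
        return {
            "type": "response.function_call_arguments.delta",
            "item_id": self.item_id,
            "output_index": self.output_index,
            "delta": delta,
        }

    def finish(self) -> list[dict]:
        # Both done events carry the full accumulated argument string.
        arguments = "".join(self.parts)
        return [
            {
                "type": "response.function_call_arguments.done",
                "item_id": self.item_id,
                "output_index": self.output_index,
                "name": self.name,
                "arguments": arguments,
            },
            {
                "type": "response.output_item.done",
                "output_index": self.output_index,
                "item": {
                    "type": "function_call",
                    "id": self.item_id,
                    "call_id": self.call_id,
                    "name": self.name,
                    "arguments": arguments,
                    "status": "completed",
                },
            },
        ]


acc = ToolCallAccumulator("b46585bc2bf3c79d", "call_9f7d1283cd384953", "get_weather", 1)
for fragment in ["{", '"location": "Paris"', "}"]:
    acc.add_delta(fragment)
done_events = acc.finish()
```

Keeping one accumulator per output item means a second tool call starts with an empty buffer, so arguments from parallel calls cannot leak into each other.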

Current limitations (more may surface)

  • it's a simple event streamer that supports only the sequence pattern (?:reasoning)?(?:output text)?(?:tool call)*
  • model/tool parser quirks made it necessary to filter spurious model output to avoid phantom output_text messages (for example, messages consisting only of "\n\n")
  • tool call ids and other identifiers in the streamed events do not (yet?) match those in the final response.completed event; apparently the model output is parsed twice: once during streaming, and once afterwards in vllm/entrypoints/openai/responses/serving.py::responses_full_generator()
  • some code duplication and naming inconsistency (content vs. output text) remains; I plan to take another cleanup pass after initial feedback
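The first two limitations can be sketched as follows. This is an illustration, not the PR's implementation: the one-letter encoding, `is_supported_sequence`, and `is_phantom_text` are hypothetical, and treating every whitespace-only text as spurious is an assumption that is broader than the "\n\n" case mentioned above.

```python
import re

# One letter per output item kind: r = reasoning, o = output text, t = tool call.
KIND_CODES = {"reasoning": "r", "output_text": "o", "tool_call": "t"}

# Same shape as the quoted pattern: (?:reasoning)?(?:output text)?(?:tool call)*
SUPPORTED = re.compile(r"r?o?t*")


def is_supported_sequence(kinds: list[str]) -> bool:
    """Return True if the item kinds appear in the supported order."""
    encoded = "".join(KIND_CODES[kind] for kind in kinds)
    return SUPPORTED.fullmatch(encoded) is not None


def is_phantom_text(text: str) -> bool:
    """Assumption: output text made only of whitespace is spurious."""
    return text.strip() == ""


ok = is_supported_sequence(["reasoning", "tool_call", "tool_call"])
not_ok = is_supported_sequence(["tool_call", "output_text"])
```

Under this pattern, output text that arrives after a tool call is outside what the simple streamer handles, which is why the second sequence is rejected.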

Test Plan and Results

I have validated this refactor against the OpenRouter Responses API (event types and the sequence of related events; the OpenRouter API delivers unrelated events in a different order, probably due to differences in implementation details).

Streaming event example with reasoning and tool calls.
curl http://[hostname]/v1/responses -H 'Content-Type: application/json' -d '{
    "model": "qwen3-coder-next",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [ {"type": "input_text",
        "text": "Hi!" }]
      },
      {
        "type": "message",
        "role": "assistant",
        "content": [ {"type": "output_text",
        "text": "What is up my friend?" }]
      },
      {
        "type": "message",
        "role": "user",
        "content": [ {"type": "input_text",
        "text": "what'\''s the weather in Paris and London?"}]
      }
    ],
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Returns the weather of location.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          },
          "required": ["location"],
          "additionalProperties": false
        }
      }
    ],
    "stream": true,
    "parallel_tool_calls": true
  }
  '
event: response.created
data: {"response":{"id":"resp_a51e7bc805274e09","created_at":1774527787,"incomplete_details":null,"instructions":null,"metadata":null,"model":"qwen3-coder-next","object":"response","output":[],"parallel_tool_calls":true,"temperature":0.6,"tool_choice":"auto","tools":[{"name":"get_weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"],"additionalProperties":false},"strict":null,"type":"function","defer_loading":null,"description":"Returns the weather of location."}],"top_p":0.95,"background":false,"max_output_tokens":249683,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"reasoning":null,"service_tier":"auto","status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"kv_transfer_params":null,"input_messages":null,"output_messages":null},"sequence_number":0,"type":"response.created"}

event: response.in_progress
data: {"response":{"id":"resp_a51e7bc805274e09","created_at":1774527787,"incomplete_details":null,"instructions":null,"metadata":null,"model":"qwen3-coder-next","object":"response","output":[],"parallel_tool_calls":true,"temperature":0.6,"tool_choice":"auto","tools":[{"name":"get_weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"],"additionalProperties":false},"strict":null,"type":"function","defer_loading":null,"description":"Returns the weather of location."}],"top_p":0.95,"background":false,"max_output_tokens":249683,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"reasoning":null,"service_tier":"auto","status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"kv_transfer_params":null,"input_messages":null,"output_messages":null},"sequence_number":1,"type":"response.in_progress"}

event: response.output_item.added
data: {"item":{"id":"b92d294002ce3292","summary":[],"type":"reasoning","content":null,"encrypted_content":null,"status":"in_progress"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}

event: response.content_part.added
data: {"content_index":0,"item_id":"b92d294002ce3292","output_index":0,"part":{"text":"","type":"reasoning_text"},"sequence_number":3,"type":"response.content_part.added"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"The","item_id":"b92d294002ce3292","output_index":0,"sequence_number":4,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" user","item_id":"b92d294002ce3292","output_index":0,"sequence_number":5,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" is","item_id":"b92d294002ce3292","output_index":0,"sequence_number":6,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" asking","item_id":"b92d294002ce3292","output_index":0,"sequence_number":7,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" about","item_id":"b92d294002ce3292","output_index":0,"sequence_number":8,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" the","item_id":"b92d294002ce3292","output_index":0,"sequence_number":9,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" weather","item_id":"b92d294002ce3292","output_index":0,"sequence_number":10,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" in","item_id":"b92d294002ce3292","output_index":0,"sequence_number":11,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" Paris","item_id":"b92d294002ce3292","output_index":0,"sequence_number":12,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" and","item_id":"b92d294002ce3292","output_index":0,"sequence_number":13,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" London","item_id":"b92d294002ce3292","output_index":0,"sequence_number":14,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"b92d294002ce3292","output_index":0,"sequence_number":15,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" I","item_id":"b92d294002ce3292","output_index":0,"sequence_number":16,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" have","item_id":"b92d294002ce3292","output_index":0,"sequence_number":17,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" a","item_id":"b92d294002ce3292","output_index":0,"sequence_number":18,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" get","item_id":"b92d294002ce3292","output_index":0,"sequence_number":19,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"_weather","item_id":"b92d294002ce3292","output_index":0,"sequence_number":20,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" function","item_id":"b92d294002ce3292","output_index":0,"sequence_number":21,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" that","item_id":"b92d294002ce3292","output_index":0,"sequence_number":22,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" can","item_id":"b92d294002ce3292","output_index":0,"sequence_number":23,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" help","item_id":"b92d294002ce3292","output_index":0,"sequence_number":24,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" with","item_id":"b92d294002ce3292","output_index":0,"sequence_number":25,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" this","item_id":"b92d294002ce3292","output_index":0,"sequence_number":26,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"b92d294002ce3292","output_index":0,"sequence_number":27,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" I","item_id":"b92d294002ce3292","output_index":0,"sequence_number":28,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" need","item_id":"b92d294002ce3292","output_index":0,"sequence_number":29,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" to","item_id":"b92d294002ce3292","output_index":0,"sequence_number":30,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" call","item_id":"b92d294002ce3292","output_index":0,"sequence_number":31,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" it","item_id":"b92d294002ce3292","output_index":0,"sequence_number":32,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" twice","item_id":"b92d294002ce3292","output_index":0,"sequence_number":33,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":",","item_id":"b92d294002ce3292","output_index":0,"sequence_number":34,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" once","item_id":"b92d294002ce3292","output_index":0,"sequence_number":35,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" for","item_id":"b92d294002ce3292","output_index":0,"sequence_number":36,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" each","item_id":"b92d294002ce3292","output_index":0,"sequence_number":37,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" location","item_id":"b92d294002ce3292","output_index":0,"sequence_number":38,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"b92d294002ce3292","output_index":0,"sequence_number":39,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"\n\n","item_id":"b92d294002ce3292","output_index":0,"sequence_number":40,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"Let","item_id":"b92d294002ce3292","output_index":0,"sequence_number":41,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" me","item_id":"b92d294002ce3292","output_index":0,"sequence_number":42,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" make","item_id":"b92d294002ce3292","output_index":0,"sequence_number":43,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" the","item_id":"b92d294002ce3292","output_index":0,"sequence_number":44,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" function","item_id":"b92d294002ce3292","output_index":0,"sequence_number":45,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" calls","item_id":"b92d294002ce3292","output_index":0,"sequence_number":46,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" for","item_id":"b92d294002ce3292","output_index":0,"sequence_number":47,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" both","item_id":"b92d294002ce3292","output_index":0,"sequence_number":48,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" Paris","item_id":"b92d294002ce3292","output_index":0,"sequence_number":49,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" and","item_id":"b92d294002ce3292","output_index":0,"sequence_number":50,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" London","item_id":"b92d294002ce3292","output_index":0,"sequence_number":51,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"b92d294002ce3292","output_index":0,"sequence_number":52,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"\n","item_id":"b92d294002ce3292","output_index":0,"sequence_number":53,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.done
data: {"content_index":0,"item_id":"b92d294002ce3292","output_index":0,"sequence_number":54,"text":"The user is asking about the weather in Paris and London. I have a get_weather function that can help with this. I need to call it twice, once for each location.\n\nLet me make the function calls for both Paris and London.\n","type":"response.reasoning_text.done"}

event: response.content_part.done
data: {"content_index":0,"item_id":"b92d294002ce3292","output_index":0,"part":{"text":"The user is asking about the weather in Paris and London. I have a get_weather function that can help with this. I need to call it twice, once for each location.\n\nLet me make the function calls for both Paris and London.\n","type":"reasoning_text"},"sequence_number":55,"type":"response.content_part.done"}

event: response.output_item.done
data: {"item":{"id":"b92d294002ce3292","summary":[],"type":"reasoning","content":[{"text":"The user is asking about the weather in Paris and London. I have a get_weather function that can help with this. I need to call it twice, once for each location.\n\nLet me make the function calls for both Paris and London.\n","type":"reasoning_text"}],"encrypted_content":null,"status":"completed"},"output_index":0,"sequence_number":56,"type":"response.output_item.done"}

event: response.output_item.added
data: {"item":{"arguments":"","call_id":"call_9f7d1283cd384953","name":"get_weather","type":"function_call","id":"b46585bc2bf3c79d","namespace":null,"status":"in_progress"},"output_index":1,"sequence_number":57,"type":"response.output_item.added"}

event: response.function_call_arguments.delta
data: {"delta":"{","item_id":"b46585bc2bf3c79d","output_index":1,"sequence_number":58,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.delta
data: {"delta":"\"location\": \"Paris\"","item_id":"b46585bc2bf3c79d","output_index":1,"sequence_number":59,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.delta
data: {"delta":"}","item_id":"b46585bc2bf3c79d","output_index":1,"sequence_number":60,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.done
data: {"arguments":"{\"location\": \"Paris\"}","item_id":"b46585bc2bf3c79d","name":"get_weather","output_index":1,"sequence_number":61,"type":"response.function_call_arguments.done"}

event: response.output_item.done
data: {"item":{"arguments":"{\"location\": \"Paris\"}","call_id":"call_9f7d1283cd384953","name":"get_weather","type":"function_call","id":"b46585bc2bf3c79d","namespace":null,"status":"completed"},"output_index":1,"sequence_number":62,"type":"response.output_item.done"}

event: response.output_item.added
data: {"item":{"arguments":"","call_id":"call_acd05b05d5f79645","name":"get_weather","type":"function_call","id":"a3708c000fdcfedd","namespace":null,"status":"in_progress"},"output_index":2,"sequence_number":63,"type":"response.output_item.added"}

event: response.function_call_arguments.delta
data: {"delta":"{","item_id":"a3708c000fdcfedd","output_index":2,"sequence_number":64,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.delta
data: {"delta":"\"location\": \"London\"","item_id":"a3708c000fdcfedd","output_index":2,"sequence_number":65,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.delta
data: {"delta":"}","item_id":"a3708c000fdcfedd","output_index":2,"sequence_number":66,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.done
data: {"arguments":"{\"location\": \"London\"}","item_id":"a3708c000fdcfedd","name":"get_weather","output_index":2,"sequence_number":67,"type":"response.function_call_arguments.done"}

event: response.output_item.done
data: {"item":{"arguments":"{\"location\": \"London\"}","call_id":"call_acd05b05d5f79645","name":"get_weather","type":"function_call","id":"a3708c000fdcfedd","namespace":null,"status":"completed"},"output_index":2,"sequence_number":68,"type":"response.output_item.done"}

event: response.completed
data: {"response":{"id":"resp_a51e7bc805274e09","created_at":1774527787,"incomplete_details":null,"instructions":null,"metadata":null,"model":"qwen3-coder-next","object":"response","output":[{"id":"rs_a523dffe05467e64","summary":[],"type":"reasoning","content":[{"text":"The user is asking about the weather in Paris and London. I have a get_weather function that can help with this. I need to call it twice, once for each location.\n\nLet me make the function calls for both Paris and London.\n","type":"reasoning_text"}],"encrypted_content":null,"status":null},{"arguments":"{\"location\": \"Paris\"}","call_id":"chatcmpl-tool-bd94b2f768c04a4d","name":"get_weather","type":"function_call","id":"fc_a87ea0f7262df053","namespace":null,"status":"completed"},{"arguments":"{\"location\": \"London\"}","call_id":"chatcmpl-tool-a8f453d2decf1b0d","name":"get_weather","type":"function_call","id":"fc_8c0b5d07a371a03a","namespace":null,"status":"completed"}],"parallel_tool_calls":true,"temperature":0.6,"tool_choice":"auto","tools":[{"name":"get_weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"],"additionalProperties":false},"strict":null,"type":"function","defer_loading":null,"description":"Returns the weather of location."}],"top_p":0.95,"background":false,"max_output_tokens":249683,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"reasoning":null,"service_tier":"auto","status":"completed","text":null,"top_logprobs":null,"truncation":"disabled","usage":{"input_tokens":317,"input_tokens_details":{"cached_tokens":0,"input_tokens_per_turn":[],"cached_tokens_per_turn":[]},"output_tokens":104,"output_tokens_details":{"reasoning_tokens":0,"tool_output_tokens":0,"output_tokens_per_turn":[],"tool_output_tokens_per_turn":[]},"total_tokens":421},"user":null,"kv_transfer_params":null,"input_messages":null,"output_messages":null},"sequence_number":69,"type":"response.completed"}
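A stream like the one above can be checked mechanically. The following is a minimal sketch, not part of this PR or its test suite: `parse_sse` and `find_problems` are hypothetical helpers, and the sample is a four-event excerpt copied from the log above. It checks that sequence numbers are contiguous and that the concatenated argument deltas equal the arguments reported in the done event.

```python
import json

# Excerpt of the stream above (events 58 to 61), kept raw so the JSON escapes survive.
SAMPLE = r"""event: response.function_call_arguments.delta
data: {"delta":"{","item_id":"b46585bc2bf3c79d","output_index":1,"sequence_number":58,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.delta
data: {"delta":"\"location\": \"Paris\"","item_id":"b46585bc2bf3c79d","output_index":1,"sequence_number":59,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.delta
data: {"delta":"}","item_id":"b46585bc2bf3c79d","output_index":1,"sequence_number":60,"type":"response.function_call_arguments.delta"}

event: response.function_call_arguments.done
data: {"arguments":"{\"location\": \"Paris\"}","item_id":"b46585bc2bf3c79d","name":"get_weather","output_index":1,"sequence_number":61,"type":"response.function_call_arguments.done"}
"""


def parse_sse(text: str) -> list[dict]:
    """Split blank-line separated blocks and decode each block's data payload."""
    events = []
    for block in text.strip().split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in block.splitlines()
            if line.startswith("data:")
        ]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


def find_problems(events: list[dict]) -> list[str]:
    """Return human-readable descriptions of ordering or accumulation errors."""
    problems = []
    numbers = [event["sequence_number"] for event in events]
    if numbers != list(range(numbers[0], numbers[0] + len(numbers))):
        problems.append(f"sequence numbers are not contiguous: {numbers}")
    accumulated: dict[str, str] = {}
    for event in events:
        item_id = event.get("item_id", "")
        if event["type"] == "response.function_call_arguments.delta":
            accumulated[item_id] = accumulated.get(item_id, "") + event["delta"]
        elif event["type"] == "response.function_call_arguments.done":
            if accumulated.get(item_id, "") != event["arguments"]:
                problems.append(f"deltas do not match done arguments for {item_id}")
    return problems


events = parse_sse(SAMPLE)
problems = find_problems(events)
```

The same approach could be extended to compare item ids between the streamed events and the final response.completed payload, which currently differ as noted in the limitations.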

Additionally, I have tested the codex coding agent with the Qwen3.5-27B model. So far, event streaming has been flawless.
Unit tests have been extended to cover the main code paths, including transitions from reasoning to output_text to tool calls.

Use of LLMs

Note: for this PR I used LLM tools (codex TUI client with Qwen3.5-27B) for the following tasks (seriously for the first time, hence my verbosity):

  • test case generation (manually reviewed)
  • code reviews and explanations of the code base
  • assistance resolving merge conflicts (upstream moves fast)

At first I attempted to do the refactor using only LLMs. It failed :) Apparently, the models I can run locally aren't good at implementing event streaming code: they always introduced subtle bugs, yet made enough apparent "progress" to keep up hopes of a successful implementation for several days. Ultimately that experiment ended in a mess and a good learning experience.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

bfroemel added 13 commits March 18, 2026 09:41
…e when streaming responses that include tool calls

Signed-off-by: Bernhard Froemel <bf@ctsw.at>
- refactored simple streaming events out into its own source file

Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
…ixes

Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
@mergify mergify Bot added the frontend and bug labels Mar 26, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request refactors the simple streaming event processing logic into a new dedicated module, simple_streaming_events.py. It introduces new OpenAI response types and handles complex transitions between reasoning, content, and tool calls, including validation of the SSE event sequence. A critical bug was identified where state.previous_token_ids was not updated correctly, which could lead to incorrect parsing in subsequent iterations.
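The class of bug described in this review can be sketched as follows. This is not the code from simple_streaming_events.py: `StreamState` and `step` are hypothetical, and only the attribute name `previous_token_ids` comes from the review text. The sketch assumes an incremental parser that receives the previous and current token ids on every iteration, so the state has to be advanced after each delta.

```python
from dataclasses import dataclass, field


@dataclass
class StreamState:
    """Hypothetical per-request state; only previous_token_ids is named in the review."""

    previous_token_ids: list[int] = field(default_factory=list)


def step(state: StreamState, delta_token_ids: list[int]) -> tuple[list[int], list[int]]:
    """Return (previous, current) token ids for one iteration and advance the state."""
    previous = state.previous_token_ids
    current = previous + delta_token_ids
    # Without this assignment, every later iteration would see a stale prefix.
    state.previous_token_ids = current
    return previous, current


state = StreamState()
first = step(state, [1, 2])
second = step(state, [3])
```

If the assignment were skipped, the second call would report an empty prefix again, and a parser relying on that prefix could misjudge where it is in the output.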

Comment thread vllm/entrypoints/openai/responses/simple_streaming_events.py Outdated
@bfroemel

bfroemel commented Mar 26, 2026

Author

@qandrew @chaunceyjiang Kindly requesting initial feedback. I'll try to merge with master every couple of days and might add further fixes as I continue to work on #37294. Many thanks for any guidance/input!

btw: the current state already works well with the OpenAI codex coding TUI (a client that uses only the Responses API) and Qwen3.5, although integrating properly and universally for most models remains an open question. Unfortunately, the simple streaming events implementation on the master branch does not generate the correct sequence of events required by codex.

Signed-off-by: Bernhard Froemel <bf@ctsw.at>

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

…ixes

Signed-off-by: Bernhard Froemel <bf@ctsw.at>

@chaunceyjiang chaunceyjiang left a comment

Collaborator

I suggest splitting the bug fix and the refactor into two separate PRs.

We can prioritize reviewing the bug fix first. For the refactor, I’d recommend getting familiar with the response API first, and then leveraging AI to help with it.

@bfroemel

bfroemel commented Apr 9, 2026

Author

Uhm, so "the bug fix" is a bit more involved: without the refactor (essentially splitting up vllm/entrypoints/openai/responses/serving.py::_process_simple_streaming_events() into what is now in the new file vllm/entrypoints/openai/responses/simple_streaming_events.py), I would probably not have been able to implement the desired behavior.

I can of course try to move everything back to vllm/entrypoints/openai/responses/serving.py in one PR and do a second PR that splits it up again, but I expect that the first ("bugfix") PR would essentially show one large change in vllm/entrypoints/openai/responses/serving.py::_process_simple_streaming_events() that is even more difficult to review than what we already have in this PR (whose added structure should, in my opinion, simplify review). Kindly asking for final clarification.

> I’d recommend getting familiar with the response API first, and then leveraging AI to help with it.

Could you point out what the current PR is lacking that makes you feel I am not familiar (enough) with the Responses API and/or should have leveraged AI differently? ;)

Apologies for pushing back, and thank you for your patience!

bfroemel added 2 commits April 9, 2026 13:26
Signed-off-by: Bernhard Froemel <bf@ctsw.at>
… passing on prompt token ids to parser

Signed-off-by: Bernhard Froemel <bf@ctsw.at>
@bfroemel

bfroemel commented Apr 9, 2026

Author

This is now based on top of #38755 ([Parser] Migrate response api streaming to unified parser), which made it possible to slim vllm/entrypoints/openai/responses/simple_streaming_events.py down further.

 tests/entrypoints/openai/responses/test_serving_responses.py | 977 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 vllm/entrypoints/openai/responses/serving.py                 | 528 +++-------------------------------------------------------------
 vllm/entrypoints/openai/responses/simple_streaming_events.py | 665 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1654 insertions(+), 516 deletions(-)

@mergify

mergify Bot commented Apr 14, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bfroemel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 14, 2026
@bfroemel bfroemel closed this Apr 14, 2026
@bfroemel bfroemel deleted the pr-respapi-streamingfixes branch April 14, 2026 07:19
Labels

bug, frontend, needs-rebase

2 participants