
[Parser] Migrate response api streaming to unified parser #38755

Merged
DarkLight1337 merged 3 commits into vllm-project:main from sfeng33:parser on Apr 8, 2026


sfeng33 (Collaborator) commented Apr 1, 2026

Co-authored with @qandrew

Purpose

Move the reasoning/tool-call streaming orchestration logic out of OpenAIServingResponses and into a new parse_delta() method in the unified parser. No behaviour change.

  • Add a StreamState dataclass to hold per-stream mutable state (reasoning_ended, previous_text, previous_token_ids, etc.)
  • Add parse_delta() to DelegatingParser, which orchestrates reasoning extraction, reasoning-end detection, and tool-call extraction using the internal StreamState
  • Simplify _process_simple_streaming_events in serving.py by removing ~80 lines of inline parser branching and replacing them with a single parse_delta() call
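The orchestration described above can be illustrated with a small, self-contained sketch. This is not the code merged in this PR: the DeltaMessage fields, the `</think>` and `<tool_call>` markers, and the text-splitting strategy are assumptions chosen for the example, whereas the real DelegatingParser delegates to the configured reasoning and tool parsers. The sketch assumes output starts in reasoning mode and that each marker arrives whole within a single delta.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeltaMessage:
    # Hypothetical shape: one optional string per output channel.
    reasoning: Optional[str] = None
    content: Optional[str] = None
    tool_text: Optional[str] = None


@dataclass
class StreamState:
    # Per-stream mutable state, mirroring the fields named in the PR description.
    reasoning_ended: bool = False
    previous_text: str = ""
    previous_token_ids: list[int] = field(default_factory=list)


class DelegatingParser:
    """Toy orchestrator (assumption): splits text on fixed markers instead of
    calling real reasoning/tool parsers."""

    def __init__(self, think_end: str = "</think>", tool_start: str = "<tool_call>"):
        self.think_end = think_end
        self.tool_start = tool_start
        self.state = StreamState()

    def _split(self, text: str) -> tuple[str, str, str]:
        # Partition accumulated text into (reasoning, content, tool_text).
        if self.think_end not in text:
            return text, "", ""
        reasoning, _, rest = text.partition(self.think_end)
        content, _, tool_text = rest.partition(self.tool_start)
        return reasoning, content, tool_text

    def parse_delta(
        self, delta_text: str, delta_token_ids: list[int]
    ) -> Optional[DeltaMessage]:
        s = self.state
        current_text = s.previous_text + delta_text
        old_parts = self._split(s.previous_text)
        new_parts = self._split(current_text)
        # Given whole markers, each channel only grows, so the new piece
        # of each channel is the suffix not seen in the previous call.
        reasoning, content, tool_text = (
            new[len(old):] for old, new in zip(old_parts, new_parts)
        )
        s.reasoning_ended = self.think_end in current_text
        s.previous_text = current_text
        s.previous_token_ids.extend(delta_token_ids)
        if not (reasoning or content or tool_text):
            return None
        return DeltaMessage(reasoning or None, content or None, tool_text or None)
```

Because previous_text lives in StreamState, each call only emits what is new per channel, so a single chunk that closes reasoning and opens a tool call yields all three parts at once rather than one overwriting another.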

Test Plan

vllm serve Qwen/Qwen3-8B \
    --max-model-len 8192 \
    --enforce-eager \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes

Then send the same request twice, once with "stream": true and once with "stream": false:

curl -s http://localhost:8000/v1/responses \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen/Qwen3-8B",
    "stream": true,
    "input": "What is the weather in San Francisco?",
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    ]
  }'
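To check the "no behaviour change" claim, it can help to reassemble the streamed deltas and compare them with the "stream": false response. A minimal sketch, assuming the server frames each Responses API event as an `event:` line followed by a `data:` JSON line carrying a `delta` string; collect_stream_deltas is a hypothetical helper, not part of vLLM, and the event names in the test sample are illustrative.

```python
import json
from collections import defaultdict
from typing import Iterable


def collect_stream_deltas(lines: Iterable[str]) -> dict[str, str]:
    """Concatenate the `delta` strings of an SSE stream, grouped by event name."""
    deltas: dict[str, str] = defaultdict(str)
    event = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            event = None  # a blank line terminates one SSE event
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                continue  # sentinel used by some OpenAI-style streams
            payload = json.loads(data)
            # Fall back to the payload's own "type" if no event line was seen.
            name = event or payload.get("type", "unknown")
            if isinstance(payload.get("delta"), str):
                deltas[name] += payload["delta"]
    return dict(deltas)
```

For example, the output of `curl -sN ...` with "stream": true can be piped into a script that calls `collect_stream_deltas(sys.stdin)`.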

sfeng33 marked this pull request as ready for review April 1, 2026 21:20
mergify bot added the frontend label Apr 1, 2026

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the streaming event processing logic by introducing a parse_delta method in the Parser class to orchestrate reasoning and tool extraction using a new StreamState dataclass. Feedback points out that the parser initialization in serving.py lacks the required tools argument, which is necessary for tool call extraction. Additionally, the parse_delta implementation contains a bug where reasoning deltas are overwritten by tool deltas when both are present in a single chunk, leading to potential data loss.
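One way to avoid the overwrite described in this review is to merge the two intermediate results field by field instead of replacing one with the other. The sketch below is hypothetical (DeltaMessage and merge_deltas are illustrative names, not vLLM's API) and assumes the tool parser has already consumed the post-reasoning content, so content and tool calls come from the tool delta while reasoning comes from the reasoning delta.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeltaMessage:
    # Hypothetical delta shape for illustration only.
    reasoning: Optional[str] = None
    content: Optional[str] = None
    tool_calls: list[str] = field(default_factory=list)


def merge_deltas(
    reasoning_delta: Optional[DeltaMessage], tool_delta: Optional[DeltaMessage]
) -> Optional[DeltaMessage]:
    """Combine per-chunk results without dropping reasoning text.

    A plain `delta = tool_delta` would lose any reasoning emitted in the
    same chunk; here each field comes from the side that owns it.
    """
    if reasoning_delta is None:
        return tool_delta
    if tool_delta is None:
        return reasoning_delta
    return DeltaMessage(
        reasoning=reasoning_delta.reasoning,
        # The tool parser has already seen (and possibly consumed) the
        # content, so its view wins; reusing the raw content could leak markup.
        content=tool_delta.content,
        tool_calls=tool_delta.tool_calls,
    )
```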

Review thread on vllm/entrypoints/openai/responses/serving.py (outdated)
Review thread on vllm/parser/abstract_parser.py
sfeng33 (Collaborator, Author) commented Apr 1, 2026

chaunceyjiang added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 2, 2026
chaunceyjiang self-assigned this Apr 2, 2026

aarnphm (Collaborator) left a comment

LGTM. Will run the ready tag for this.

sfeng33 (Collaborator, Author) commented Apr 6, 2026

Thanks for the review!

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
sfeng33 added 2 commits April 7, 2026 18:05
Signed-off-by: sfeng33 <4florafeng@gmail.com>
DarkLight1337 merged commit 927975e into vllm-project:main Apr 8, 2026
50 checks passed
sfeng33 deleted the parser branch April 8, 2026 02:11
bfroemel commented Apr 9, 2026
Hi @sfeng33 and @chaunceyjiang, did you see/consider my bug-fixing effort in #38227? (I was hoping that vLLM would move forward with these fixes...)

mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
…ct#38755)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
…ct#38755)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…ct#38755)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
…ct#38755)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…ct#38755)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Labels: frontend, ready (ONLY add when PR is ready to merge/full CI is needed)


5 participants