
πŸ› fix: stream tool call arguments incrementally in Response API#13506

Merged
arvinxx merged 2 commits into canary from fix/response-api-tool-call-streaming on Apr 2, 2026
Conversation

@arvinxx (Member) commented Apr 2, 2026

πŸ’» Change Type

  • πŸ› fix

πŸ”— Related Issue

πŸ”€ Description of Change

Fix Response API streaming for tool calls. The internal tool_calling stream chunks carry accumulated arguments (the full string up to that point), but the Response API was treating each chunk as a complete, independent output_item β€” emitting a full lifecycle (added β†’ delta β†’ done β†’ item.done) per token delta and incrementing output_index from 0 to 90+.

Before: Each token delta creates a new output_item:

output_index=0: added β†’ delta("") β†’ done β†’ item.done
output_index=1: added β†’ delta('{"des') β†’ done β†’ item.done
output_index=2: added β†’ delta('{"description"') β†’ done β†’ item.done
... up to output_index=93

After: Single stable output_item with true incremental deltas:

output_index=1: added (in_progress)
output_index=1: delta('{"des')
output_index=1: delta('cription"')
output_index=1: delta(': "η”Ÿζˆ')
... 
output_index=1: arguments.done β†’ item.done (completed)

Implementation:

  • Track active tool calls in a Map<callId, {fcItemId, outputIndex, prevArguments}>
  • First chunk: emit output_item.added + initial delta
  • Subsequent chunks: compute incremental delta via args.slice(prevArgs.length)
  • Finalize with arguments.done + output_item.done when stream ends or tool_end arrives
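The steps above can be sketched as follows. This is a minimal sketch, not the merged code: the tracked field names (`fcItemId`, `outputIndex`, `prevArguments`) come from the PR description, while the event shapes and function names here are hypothetical.

```typescript
// Minimal sketch of the accumulated-to-incremental conversion described above.
// Field names fcItemId/outputIndex/prevArguments come from the PR; event
// shapes and function names are hypothetical.
interface ActiveToolCall {
  fcItemId: string;
  outputIndex: number;
  prevArguments: string;
}

type ResponseEvent =
  | { type: 'output_item.added'; outputIndex: number; itemId: string }
  | { type: 'function_call_arguments.delta'; outputIndex: number; delta: string }
  | { type: 'function_call_arguments.done'; outputIndex: number; arguments: string }
  | { type: 'output_item.done'; outputIndex: number };

const activeToolCalls = new Map<string, ActiveToolCall>();
let nextOutputIndex = 0;

// One tool_calling chunk arrives carrying the ACCUMULATED argument string.
function onToolCallChunk(callId: string, args: string): ResponseEvent[] {
  const events: ResponseEvent[] = [];
  let call = activeToolCalls.get(callId);
  if (!call) {
    // First chunk for this call: open a single stable output_item.
    call = { fcItemId: `fc_${callId}`, outputIndex: nextOutputIndex++, prevArguments: '' };
    activeToolCalls.set(callId, call);
    events.push({ type: 'output_item.added', outputIndex: call.outputIndex, itemId: call.fcItemId });
  }
  // Accumulated string minus what was already emitted = true incremental delta.
  const delta = args.slice(call.prevArguments.length);
  if (delta) {
    events.push({ type: 'function_call_arguments.delta', outputIndex: call.outputIndex, delta });
    call.prevArguments = args;
  }
  return events;
}

// Called when the stream ends or a tool_end chunk arrives.
function finishActiveToolCalls(): ResponseEvent[] {
  const events: ResponseEvent[] = [];
  for (const call of activeToolCalls.values()) {
    events.push({
      type: 'function_call_arguments.done',
      outputIndex: call.outputIndex,
      arguments: call.prevArguments,
    });
    events.push({ type: 'output_item.done', outputIndex: call.outputIndex });
  }
  activeToolCalls.clear();
  return events;
}
```

The `args.slice(prevArguments.length)` step is the core of the fix: because each chunk is a superset of the previous one, slicing off the already-seen prefix yields exactly the new tokens.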

πŸ§ͺ How to Test

  • Tested locally
  • No tests needed (no existing test infra for openapi package)

Test with: POST /api/v1/responses with a model that returns tool calls (e.g., code sandbox). Verify:

  1. output_index stays stable per tool call
  2. delta fields contain only incremental content
  3. Only one output_item.added / output_item.done pair per tool call
  4. response.completed output matches the streamed tool call
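Checklist items 1–3 can also be checked automatically on the consumer side. A minimal sketch β€” the `response.*` event type strings follow the OpenAI Responses streaming format, but treat the exact payload shape as an assumption:

```typescript
// Hypothetical consumer-side check for items 1-3 over the parsed SSE events
// observed for a single tool call.
interface StreamEvent {
  type: string;
  output_index?: number;
  delta?: string;
  arguments?: string;
}

function checkToolCallEvents(events: StreamEvent[]): void {
  // 1. output_index stays stable per tool call.
  const indices = new Set(events.map((e) => e.output_index));
  if (indices.size !== 1) throw new Error('output_index changed mid tool call');

  // 3. Exactly one output_item.added / output_item.done pair.
  const added = events.filter((e) => e.type === 'response.output_item.added').length;
  const done = events.filter((e) => e.type === 'response.output_item.done').length;
  if (added !== 1 || done !== 1) throw new Error('expected exactly one added/done pair');

  // 2. Deltas are incremental: concatenated, they must equal the final arguments.
  const joined = events
    .filter((e) => e.type === 'response.function_call_arguments.delta')
    .map((e) => e.delta ?? '')
    .join('');
  const final = events.find((e) => e.type === 'response.function_call_arguments.done')?.arguments;
  if (final !== undefined && joined !== final) {
    throw new Error('deltas do not reconstruct final arguments');
  }
}
```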

πŸ€– Generated with Claude Code

@vercel

vercel bot commented Apr 2, 2026

The latest updates on your projects.

Project Deployment Actions Updated (UTC)
lobehub Ready Ready Preview, Comment Apr 2, 2026 5:45pm

@sourcery-ai sourcery-ai bot left a comment

Sorry @arvinxx, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@lobehubbot
Member

@nekomeowww - This is a backend API fix for Response API streaming (tool call arguments). Please take a look.

@codecov

codecov bot commented Apr 2, 2026

Codecov Report

βœ… All modified and coverable lines are covered by tests.
βœ… Project coverage is 66.43%. Comparing base (dd7819b) to head (eef3e86).
⚠️ Report is 1 commit behind head on canary.

Additional details and impacted files
@@           Coverage Diff            @@
##           canary   #13506    +/-   ##
========================================
  Coverage   66.43%   66.43%            
========================================
  Files        1976     1976            
  Lines      163601   163601            
  Branches    18709    19473   +764     
========================================
  Hits       108695   108695            
  Misses      54784    54784            
  Partials      122      122            
| Flag | Coverage Ξ” |
| --- | --- |
| app | 58.08% <ΓΈ> (ΓΈ) |
| database | 92.57% <ΓΈ> (ΓΈ) |
| packages/agent-runtime | 88.98% <ΓΈ> (ΓΈ) |
| packages/context-engine | 86.51% <ΓΈ> (ΓΈ) |
| packages/conversation-flow | 92.36% <ΓΈ> (ΓΈ) |
| packages/file-loaders | 87.02% <ΓΈ> (ΓΈ) |
| packages/memory-user-memory | 66.68% <ΓΈ> (ΓΈ) |
| packages/model-bank | 99.85% <ΓΈ> (ΓΈ) |
| packages/model-runtime | 84.68% <ΓΈ> (ΓΈ) |
| packages/prompts | 66.48% <ΓΈ> (ΓΈ) |
| packages/python-interpreter | 92.90% <ΓΈ> (ΓΈ) |
| packages/ssrf-safe-fetch | 0.00% <ΓΈ> (ΓΈ) |
| packages/utils | 90.41% <ΓΈ> (ΓΈ) |
| packages/web-crawler | 88.82% <ΓΈ> (ΓΈ) |

Flags with carried forward coverage won't be shown.

| Components | Coverage Ξ” |
| --- | --- |
| Store | 66.55% <ΓΈ> (ΓΈ) |
| Services | 49.05% <ΓΈ> (ΓΈ) |
| Server | 65.94% <ΓΈ> (ΓΈ) |
| Libs | 51.03% <ΓΈ> (ΓΈ) |
| Utils | 91.01% <ΓΈ> (ΓΈ) |


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 179a8eae8e


arvinxx and others added 2 commits April 3, 2026 00:58
The tool_calling stream chunks contain accumulated arguments (not
deltas), but the Response API was treating each chunk as a complete
independent output_item β€” creating a new lifecycle (added β†’ delta β†’
done) per token and incrementing output_index to 90+.

Fix: track active tool calls by call_id and compute true incremental
deltas by slicing off previously-seen content. Each tool call now
gets a single stable output_item with proper streaming deltas,
finalized only when the stream ends or tool execution begins.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When call_llm retries after a failed attempt, activeToolCalls may
contain entries from the failed stream that never received a
tool_end. Without clearing, finishActiveToolCalls would emit
phantom function_call done events and misalign output_index for
the successful attempt. Reset the map on stream_retry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arvinxx arvinxx force-pushed the fix/response-api-tool-call-streaming branch from 216a2d3 to eef3e86 on April 2, 2026 16:58
@arvinxx arvinxx merged commit 126db96 into canary Apr 2, 2026
34 checks passed
@arvinxx arvinxx deleted the fix/response-api-tool-call-streaming branch April 2, 2026 17:46
arvinxx added a commit that referenced this pull request Apr 7, 2026
# πŸš€ release: 20260407

This release includes **148 commits**. Key updates are below.

- **Response API tool execution is more capable and reliable** β€” Added
hosted builtin tools + client-side function tools and improved tool-call
streaming/completion behavior.
[#13406](#13406)
[#13414](#13414)
[#13506](#13506)
[#13555](#13555)
- **Input and composition UX upgraded** β€” Added AI input auto-completion
and multiple chat-input stability fixes.
[#13458](#13458)
[#13551](#13551)
[#13481](#13481)
- **Model/provider compatibility improved** β€” Better Gemini/Google tool
schema handling and additional model updates.
[#13429](#13429)
[#13465](#13465)
[#13613](#13613)
- **Desktop and CLI reliability improved** β€” Gateway WebSocket support
and desktop runtime upgrades.
[#13608](#13608)
[#13550](#13550)
[#13557](#13557)
- **Security hardening continued** β€” Fixed auth and sanitization risks
and upgraded vulnerable dependencies.
[#13535](#13535)
[#13529](#13529)
[#13479](#13479)

### Models & Providers

- Added/updated support for `glm-5v-turbo`, GLM-5.1 updates, and
qwen3.5-omni series.
[#13487](#13487)
[#13405](#13405)
[#13422](#13422)
- Added additional ImageGen providers/models (Wanxiang 2.7 and Keling
from Qwen). [#13478](#13478)
- Improved Gemini/Google tool schema and compatibility handling across
runtime paths. [#13429](#13429)
[#13465](#13465)
[#13613](#13613)

### Response API & Runtime

- Added hosted builtin tools in Response API and client-side function
tool execution support.
[#13406](#13406)
[#13414](#13414)
- Improved stream tool-call argument handling and `response.completed`
output correctness.
[#13506](#13506)
[#13555](#13555)
- Improved runtime error/context handling for intervention and provider
edge cases. [#13420](#13420)
[#13607](#13607)

### Desktop App

- Bumped desktop dependencies and runtime integrations (`agent-browser`,
`electron`). [#13550](#13550)
[#13557](#13557)
- Simplified desktop release channel setup by removing nightly release
flow. [#13480](#13480)

### CLI

- Added OpenClaw migration command.
[#13566](#13566)
- Added local device binding support for `lh agent run`.
[#13277](#13277)
- Added WebSocket gateway support and reconnect reliability
improvements. [#13608](#13608)
[#13418](#13418)

### Security

- Removed risky `apiKey` fallback behavior in webapi auth path to
prevent bypass risk.
[#13535](#13535)
- Sanitized HTML artifact rendering and iframe sandboxing to reduce
XSS-to-RCE risk. [#13529](#13529)
- Upgraded nodemailer to v8 to address SMTP command injection advisory.
[#13479](#13479)

### Bug Fixes

- Fixed image generation model default switch issues.
[#13587](#13587)
- Fixed subtopic re-fork message scope behavior and agent panel reset
edge cases. [#13606](#13606)
[#13556](#13556)
- Fixed chat-input freeze on paste and mention plugin behavior.
[#13551](#13551)
[#13415](#13415)
- Fixed auth/social sign-in and settings UX edge cases.
[#13368](#13368)
[#13392](#13392)
[#13338](#13338)

### Credits

Huge thanks to these contributors:

@chriszf @hardy-one @Innei @lijian @neko @OctopusNote @rdmclin2
@rivertwilight @RylanCai @suyua9 @sxjeru @Tsuki @wangyk @WindSpiritSR
@yizhuo @YuTengjing @hezhijie0327 @arvinxx