fix(gateway): return real usage for OpenAI-compatible chat completions #62986
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02bd1f8d0c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Reviewed commit: b36d155812
💡 Codex Review
Reviewed commit: bb901fbf09
bb901fb to 1d5e54a
@codex review
Greptile Summary
This PR fixes two gaps in the OpenAI-compatible
Confidence Score: 5/5
Safe to merge: no critical regressions, prior concerns resolved, and comprehensive test coverage added.
All remaining findings are P2 or lower. The finalization gate is sound and mirrors the existing openresponses-http.ts pattern; the total_tokens and cacheWrite issues flagged in prior review threads have been explicitly addressed. Tests cover non-stream usage mapping, stream usage chunk emission, the lifecycle-end-before-usage race, and default stream behavior without usage gating. No files require special attention.
Reviews (4): Last reviewed commit: "Gateway: preserve aggregate usage totals"
💡 Codex Review
Reviewed commit: 2ac3692fcf
@codex review
💡 Codex Review
Reviewed commit: 678903e0ca
f61e98a to 66122d4
@codex review
💡 Codex Review
Reviewed commit: 66122d4aab
💡 Codex Review
Reviewed commit: d5c97325bb
265ffef to f1a102c
💡 Codex Review
Reviewed commit: 6b6fa1fe64
6b6fa1f to ba6a87b
💡 Codex Review
Reviewed commit: ba6a87b747
💡 Codex Review
Reviewed commit: 1901b68345
💡 Codex Review
Reviewed commit: 095c072ddc
💡 Codex Review
Reviewed commit: 15897be529
💡 Codex Review
Reviewed commit: 1debbc4071
  if (!abortController.signal.aborted) {
    abortController.abort();
  }
  maybeFinalize();
}, STREAM_USAGE_FINALIZE_GRACE_MS);
Remove fixed 2s abort from include_usage finalization
This timer turns a normal slow-tail run into an early successful close: once lifecycle:end requests finalize, it sets zero usage and aborts after 2s even if the command is still legitimately finishing. In src/agents/pi-embedded-subscribe.handlers.lifecycle.ts:92-149, phase: "end" is emitted before async flush of pending media/channel output, so runs that need more than 2 seconds for that flush will be cut off here and the SSE closes with [DONE] before trailing assistant output (or fallback payload text) can be sent, producing truncated completions and fake zero usage.
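One way to picture finalization without a fixed timer is a small gate that closes the stream only once the signals it depends on have arrived. The sketch below is illustrative only: `createFinalizeGate`, `requestFinalize`, and `markCommandSettled` are hypothetical names, and it assumes a settled command also implies that finalization may proceed. It is not the code in `openai-http.ts`.

```typescript
// Hypothetical gate; names and shape are assumptions made for illustration.
interface FinalizeGate {
  requestFinalize(): void; // call when the lifecycle "end" event arrives
  markCommandSettled(): void; // call when the ingress command promise settles
}

function createFinalizeGate(waitForCommand: boolean, close: () => void): FinalizeGate {
  let finalizeRequested = false;
  let commandSettled = false;
  let closed = false;

  const maybeFinalize = (): void => {
    if (closed || !finalizeRequested) return;
    // When usage output was requested, hold the stream open until the command settles.
    if (waitForCommand && !commandSettled) return;
    closed = true;
    close();
  };

  return {
    requestFinalize() {
      finalizeRequested = true;
      maybeFinalize();
    },
    markCommandSettled() {
      // Assumption: once the command has settled, nothing more will be streamed.
      commandSettled = true;
      finalizeRequested = true;
      maybeFinalize();
    },
  };
}
```

With `waitForCommand` set to `false` the gate closes as soon as finalization is requested, which corresponds to the default streaming path when usage output is not requested.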
1debbc4 to 591a39a
obviyus left a comment
Verified the OpenAI-compatible chat completions usage fix path on the latest head. This keeps real usage in non-stream responses, emits the final usage chunk when include_usage is requested, and moves the terminal lifecycle signal behind the final flush so the compat stream no longer needs the truncating HTTP fallback.
During landing I rebased onto latest main, kept the changelog entry in the active Unreleased fixes block, and replaced the 2s HTTP abort patch with the root-cause lifecycle fix in pi-embedded subscribe. That addressed the current-head P1 about truncating slow-tail output after lifecycle end.
Local gate: oxfmt passed on the touched files. I also ran the targeted lifecycle/openai-http test lane, but the Vitest wrapper in this worktree did not return a clean exit, so I am not claiming a fully green local test run.
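The final usage chunk mentioned in the verification note can also be checked by hand against a captured stream. The sketch below assumes the OpenAI streaming convention in which the usage chunk is a `data:` line carrying a `usage` object (with an empty `choices` array) that arrives before `data: [DONE]`. `extractUsageFromSse` is a hypothetical helper, not part of the gateway.

```typescript
interface ChatCompletionsUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Scan an SSE body and return the usage object from the last data chunk that carries one.
function extractUsageFromSse(body: string): ChatCompletionsUsage | undefined {
  let usage: ChatCompletionsUsage | undefined;
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    let chunk: { usage?: ChatCompletionsUsage | null };
    try {
      chunk = JSON.parse(payload);
    } catch {
      continue; // ignore data lines that are not JSON
    }
    if (chunk.usage) usage = chunk.usage;
  }
  return usage;
}
```

A stream requested without `stream_options.include_usage=true` would be expected to yield `undefined` here, since no chunk carries a `usage` object.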
Landed. Maintainer commits during landing:
Merged as 2ccd1839f212d84dfb78f78e3699c384a12ee091. Thanks @Lellansin.
💡 Codex Review
Reviewed commit: 40138ef572
return flushBlockReplyBufferResult
  .then(() => flushPendingMediaAndChannel())
  .then(() => emitLifecycleTerminal());
Emit lifecycle terminal even when flush callbacks fail
If either flushBlockReplyBufferResult or onBlockReplyFlush rejects, this promise chain short-circuits before emitLifecycleTerminal() runs, so the run never emits a terminal lifecycle event from handleAgentEnd. Because createEmbeddedPiSessionEventHandler catches and logs handler rejections, this failure is silent and downstream listeners can miss phase: "end"/"error" (or receive a later generic fallback), which breaks lifecycle ordering guarantees under channel flush failures.
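One possible shape for the fix suggested here is to move the terminal emission into a `finally` block so a rejected flush cannot skip it. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical stand-ins for the callbacks named in the comment, and it does not decide whether a failed flush should switch the terminal phase to `"error"`.

```typescript
// Hypothetical stand-ins for the flush callbacks named in the review comment.
type Flush = () => Promise<void>;

async function finishRunSketch(
  flushBlockReplyBuffer: Flush,
  flushPendingMediaAndChannel: Flush,
  emitLifecycleTerminal: () => void,
  onFlushError: (err: unknown) => void,
): Promise<void> {
  try {
    await flushBlockReplyBuffer();
    await flushPendingMediaAndChannel();
  } catch (err) {
    // Report the failure, but do not let it skip the terminal event.
    onFlushError(err);
  } finally {
    // Runs whether the flushes resolved or rejected.
    emitLifecycleTerminal();
  }
}
```

The ordering guarantee is preserved on the success path (both flushes complete before the terminal event), while the failure path still produces exactly one terminal event.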
…aw#62986) (thanks @Lellansin)
* Gateway: fix chat completions usage compatibility
* Gateway: clarify usage-gated stream wait
* Gateway: preserve aggregate usage totals
* Agents: clamp usage components before total
* fix(gateway): bound usage stream finalization
* fix: add OpenAI compat usage changelog (openclaw#62986) (thanks @Lellansin)
* fix(agents): emit lifecycle terminal events after flush
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
Summary
- /v1/chat/completions returned hardcoded zero usage in non-stream responses, and stream responses did not expose usage in an OpenAI-compatible way.
- […] result.meta.agentMeta.usage, returns it for non-stream responses, and emits a final usage chunk for stream responses when stream_options.include_usage=true.
- […] openresponses-http.ts (wait for the ingress command to settle when usage output is requested) instead of closing the SSE immediately from the prior lifecycle/finally paths. This is adapter-level close ordering so the optional usage chunk can precede [DONE]; it is not an HTTP-layer setTimeout removal relative to main.
- […] total_tokens, and preserves upstream aggregate totals via max(componentTotal, aggregateTotal) when an aggregate total is present.
- […] prompt_tokens = input + cacheRead (cache-write excluded intentionally).
Change Type (select all)
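The usage arithmetic in the Summary (prompt tokens as input plus cache reads, components clamped before totaling, and the larger of the component and aggregate totals) can be sketched as follows. The `NormalizedUsage` field names and the function name are assumptions made for illustration; the actual types in `src/agents/usage.ts` may differ.

```typescript
// Hypothetical input shape; field names are assumptions, not the real usage.ts types.
interface NormalizedUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number; // intentionally not counted toward prompt_tokens
  total?: number; // upstream aggregate total, when the provider reports one
}

interface ChatCompletionsUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Clamp a missing, negative, or non-finite count to a non-negative number.
function clampCount(value: number | undefined): number {
  return typeof value === "number" && Number.isFinite(value) ? Math.max(0, value) : 0;
}

function toChatCompletionsUsageSketch(usage: NormalizedUsage | undefined): ChatCompletionsUsage {
  const prompt = clampCount(usage?.input) + clampCount(usage?.cacheRead);
  const completion = clampCount(usage?.output);
  const componentTotal = prompt + completion;
  const aggregateTotal = clampCount(usage?.total);
  return {
    prompt_tokens: prompt,
    completion_tokens: completion,
    // Keep the upstream aggregate when it exceeds the sum of the known components.
    total_tokens: Math.max(componentTotal, aggregateTotal),
  };
}
```

Clamping each component before summing keeps a single negative or malformed field from dragging the total below the other components.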
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
- […] /v1/chat/completions adapter hardcoded zero usage for non-stream responses and did not implement a compatible usage-return path for streaming responses.
Regression Test Plan (if applicable)
- […] src/gateway/openai-http.test.ts (endpoint contract) and src/agents/usage.test.ts (toOpenAiChatCompletionsUsage / normalization edge cases).
- […] agentMeta.usage; stream responses emit a final usage chunk only when stream_options.include_usage=true; normal streaming finalization is not blocked when usage output is not requested.
User-visible / Behavior Changes
- […] /v1/chat/completions responses now return real usage values instead of hardcoded zeros when upstream usage metadata exists.
- […] /v1/chat/completions responses now emit a final usage chunk when stream_options.include_usage=true.
Diagram (if applicable)
Before:
flowchart LR
  A["User request"] --> B["/v1/chat/completions adapter"]
  B --> C["Hardcoded usage = 0 / no streaming usage path"]
After:
flowchart LR
  A["User request"] --> B["/v1/chat/completions adapter"]
  B --> C["Derive usage from meta.agentMeta.usage<br/>(normalizeUsage → toOpenAiChatCompletionsUsage)"]
  C --> D["Non-stream: usage in JSON body"]
  C --> E["Stream: finalize gate + optional usage chunk<br/>when stream_options.include_usage = true"]
Security Impact (required)
- […] No)
- […] No)
- […] No)
- […] No)
- […] No)
- […] Yes, explain risk + mitigation: N/A
Repro + Verification
Environment
- […] result.meta.agentMeta.usage
- […] /v1/chat/completions
- […] OPENCLAW_DEBUG_OPENAI_USAGE=1, gateway OpenAI-compatible endpoint enabled
Steps
- […] /v1/chat/completions and inspect the usage field.
- […] /v1/chat/completions with stream_options.include_usage=true and inspect the final SSE usage chunk.
Expected
Actual
- […] agentMeta.usage.
Evidence
Human Verification (required)
- […] include_usage is absent.
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
- […] Yes)
- […] No)
- […] No)
Risks and Mitigations
- […] agentMeta.usage fields with existing fallback behavior.
Trade-offs / Deferred follow-up
- […] /v1/chat/completions stream finalization semantics aligned with openresponses-http.ts (finalize request + final usage gate), instead of introducing a chat-completions-only fallback.
- […] stream_options.include_usage=true, completion latency can still depend on when the ingress command settles on very slow tail/cleanup paths.
- […] openai-http and openresponses-http together.
Evidence links
- src/gateway/openai-http.ts: OpenAI-compatible /v1/chat/completions adapter (usage mapping, streaming finalize gate, optional usage chunk when stream_options.include_usage=true).
- src/gateway/openai-http.test.ts: Contract tests for non-stream usage, streaming usage chunk, finalize/lifecycle behavior, and default stream path without usage.
- src/agents/usage.ts: normalizeUsage + toOpenAiChatCompletionsUsage (OpenAI-style prompt_tokens / completion_tokens / total_tokens, including aggregate vs component totals).
- src/agents/usage.test.ts: Unit tests for chat-completions usage mapping edge cases.
Reference (not modified in this PR):
- src/gateway/openresponses-http.ts: finalize / usage-gating pattern this endpoint mirrors in comments and behavior.
Related unresolved threads