fix: extract real token usage in /v1/chat/completions response (closes #38735)#46293
fix: extract real token usage in /v1/chat/completions response (closes #38735)#46293Br1an67 wants to merge 1 commit into
Conversation
Greptile SummaryThis PR fixes a long-standing bug (#38735) where Key changes:
Confidence Score: 2/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614
Comment:
**Streaming usage chunk is dead code in normal operation**
`emitAgentEvent` (in `agent-events.ts`) dispatches listeners **synchronously** via a `for` loop. In `agentCommandInternal` (`agent.ts` line 810), the lifecycle `"end"` event is emitted *before* the function returns — meaning the `onAgentEvent` handler in this file runs synchronously, sets `closed = true`, calls `writeDone(res)`, and calls `res.end()` **while `agentCommandFromIngress` is still resolving its Promise**.
By the time `await agentCommandFromIngress(...)` at line 581 resumes, `closed` is already `true`. The check at line 583 (`if (closed) return;`) then exits early, so the usage chunk below is never reached in any normal streaming run.
Evidence from `agent.ts`:
```
// line 810 — emitted synchronously, BEFORE agentCommandInternal returns
emitAgentEvent({ runId, stream: "lifecycle", data: { phase: "end", endedAt: Date.now() } });
// line 819 — execution continues (persist transcript, build return value) ...
// ... then agentCommandFromIngress finally resolves
```
One approach to fix this: move the usage write to *before* the `if (closed) return` guard, using `res.writableEnded` as the safety check instead of `closed`:
```
const result = await agentCommandFromIngress(commandInput, defaultRuntime, deps);
// Write usage before the early-return so it is sent even when the lifecycle
// event already closed `closed`. Guard on the actual socket state instead.
if (!res.writableEnded) {
const usage = resolveUsage(result);
writeSse(res, {
id: runId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
usage,
});
}
if (closed) {
return;
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614
Comment:
**`stream_options.include_usage` not checked**
The OpenAI streaming spec only includes a usage chunk when the request contains `stream_options: { include_usage: true }`. Emitting the usage chunk unconditionally means callers that do not set this option will receive an unexpected extra SSE event, which could confuse well-behaved OpenAI-compatible clients that follow the spec strictly.
Consider gating the streaming usage chunk on the request field:
```typescript
const includeUsage =
typeof (payload as { stream_options?: { include_usage?: unknown } }).stream_options
?.include_usage === "boolean"
? (payload as { stream_options: { include_usage: boolean } }).stream_options.include_usage
: false;
```
Then only write the usage chunk when `includeUsage` is `true`. The non-streaming path is unaffected because usage is always included there per the spec.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: f667b11 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f667b11d39
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return { | ||
| prompt_tokens: prompt, | ||
| completion_tokens: completion, | ||
| total_tokens: prompt + completion, |
There was a problem hiding this comment.
Preserve usage.total when mapping OpenAI token counts
result.meta.agentMeta.usage can carry a total value that is not equal to input + output (for example when cache tokens are counted, or when only total usage is available), but this code always recomputes total_tokens from prompt/completion only. That causes /v1/chat/completions to under-report total usage and can still emit 0 totals for runs that reported non-zero totals, so the mapping should prefer usage.total when present and only fall back to the sum.
Useful? React with 👍 / 👎.
| if (!closed) { | ||
| const usage = resolveUsage(result); | ||
| writeSse(res, { | ||
| id: runId, | ||
| object: "chat.completion.chunk", | ||
| created: Math.floor(Date.now() / 1000), | ||
| model, | ||
| choices: [{ index: 0, delta: {}, finish_reason: "stop" }], | ||
| usage, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Streaming usage chunk is dead code in normal operation
emitAgentEvent (in agent-events.ts) dispatches listeners synchronously via a for loop. In agentCommandInternal (agent.ts line 810), the lifecycle "end" event is emitted before the function returns — meaning the onAgentEvent handler in this file runs synchronously, sets closed = true, calls writeDone(res), and calls res.end() while agentCommandFromIngress is still resolving its Promise.
By the time await agentCommandFromIngress(...) at line 581 resumes, closed is already true. The check at line 583 (if (closed) return;) then exits early, so the usage chunk below is never reached in any normal streaming run.
Evidence from agent.ts:
// line 810 — emitted synchronously, BEFORE agentCommandInternal returns
emitAgentEvent({ runId, stream: "lifecycle", data: { phase: "end", endedAt: Date.now() } });
// line 819 — execution continues (persist transcript, build return value) ...
// ... then agentCommandFromIngress finally resolves
One approach to fix this: move the usage write to before the if (closed) return guard, using res.writableEnded as the safety check instead of closed:
const result = await agentCommandFromIngress(commandInput, defaultRuntime, deps);
// Write usage before the early-return so it is sent even when the lifecycle
// event already closed `closed`. Guard on the actual socket state instead.
if (!res.writableEnded) {
const usage = resolveUsage(result);
writeSse(res, {
id: runId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
usage,
});
}
if (closed) {
return;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614
Comment:
**Streaming usage chunk is dead code in normal operation**
`emitAgentEvent` (in `agent-events.ts`) dispatches listeners **synchronously** via a `for` loop. In `agentCommandInternal` (`agent.ts` line 810), the lifecycle `"end"` event is emitted *before* the function returns — meaning the `onAgentEvent` handler in this file runs synchronously, sets `closed = true`, calls `writeDone(res)`, and calls `res.end()` **while `agentCommandFromIngress` is still resolving its Promise**.
By the time `await agentCommandFromIngress(...)` at line 581 resumes, `closed` is already `true`. The check at line 583 (`if (closed) return;`) then exits early, so the usage chunk below is never reached in any normal streaming run.
Evidence from `agent.ts`:
```
// line 810 — emitted synchronously, BEFORE agentCommandInternal returns
emitAgentEvent({ runId, stream: "lifecycle", data: { phase: "end", endedAt: Date.now() } });
// line 819 — execution continues (persist transcript, build return value) ...
// ... then agentCommandFromIngress finally resolves
```
One approach to fix this: move the usage write to *before* the `if (closed) return` guard, using `res.writableEnded` as the safety check instead of `closed`:
```
const result = await agentCommandFromIngress(commandInput, defaultRuntime, deps);
// Write usage before the early-return so it is sent even when the lifecycle
// event already closed `closed`. Guard on the actual socket state instead.
if (!res.writableEnded) {
const usage = resolveUsage(result);
writeSse(res, {
id: runId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
usage,
});
}
if (closed) {
return;
}
```
How can I resolve this? If you propose a fix, please make it concise.| if (!closed) { | ||
| const usage = resolveUsage(result); | ||
| writeSse(res, { | ||
| id: runId, | ||
| object: "chat.completion.chunk", | ||
| created: Math.floor(Date.now() / 1000), | ||
| model, | ||
| choices: [{ index: 0, delta: {}, finish_reason: "stop" }], | ||
| usage, | ||
| }); | ||
| } |
There was a problem hiding this comment.
stream_options.include_usage not checked
The OpenAI streaming spec only includes a usage chunk when the request contains stream_options: { include_usage: true }. Emitting the usage chunk unconditionally means callers that do not set this option will receive an unexpected extra SSE event, which could confuse well-behaved OpenAI-compatible clients that follow the spec strictly.
Consider gating the streaming usage chunk on the request field:
const includeUsage =
typeof (payload as { stream_options?: { include_usage?: unknown } }).stream_options
?.include_usage === "boolean"
? (payload as { stream_options: { include_usage: boolean } }).stream_options.include_usage
: false;Then only write the usage chunk when includeUsage is true. The non-streaming path is unaffected because usage is always included there per the spec.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614
Comment:
**`stream_options.include_usage` not checked**
The OpenAI streaming spec only includes a usage chunk when the request contains `stream_options: { include_usage: true }`. Emitting the usage chunk unconditionally means callers that do not set this option will receive an unexpected extra SSE event, which could confuse well-behaved OpenAI-compatible clients that follow the spec strictly.
Consider gating the streaming usage chunk on the request field:
```typescript
const includeUsage =
typeof (payload as { stream_options?: { include_usage?: unknown } }).stream_options
?.include_usage === "boolean"
? (payload as { stream_options: { include_usage: boolean } }).stream_options.include_usage
: false;
```
Then only write the usage chunk when `includeUsage` is `true`. The non-streaming path is unaffected because usage is always included there per the spec.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c49a5c52db
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return { | ||
| prompt_tokens: prompt, | ||
| completion_tokens: completion, | ||
| total_tokens: meta?.total ?? prompt + completion, |
There was a problem hiding this comment.
Count cache tokens in fallback total_tokens calculation
When agentMeta.usage.total is missing, this fallback computes total_tokens as prompt + completion, but prompt is taken from usage.input only. In this codebase, usage can include cacheRead/cacheWrite separately (for example, CLI parsing in src/agents/cli-runner/helpers.ts populates cache fields while total remains optional), so cache-hit runs will still under-report usage.total_tokens on /v1/chat/completions. The fallback should include cache fields (or reuse a shared usage normalizer) so totals stay accurate when total is absent.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc1ff0d0d1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!closed) { | ||
| const usage = resolveUsage(result); |
There was a problem hiding this comment.
Emit streaming usage before lifecycle closes response
In the streaming path, this usage chunk is guarded by if (!closed), but closed is set to true as soon as a lifecycle phase: "end"/"error" event arrives, and agentCommandFromIngress emits one of those before returning in normal runs (src/commands/agent.ts). That means the new usage write is skipped for real streamed completions, so clients still won’t receive token usage despite this change. Consider deferring res.end() until after post-run usage emission or moving usage emission into the lifecycle-end handling path.
Useful? React with 👍 / 👎.
…openclaw#38735) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0fdd003b2e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| id === defaultSlotIdForKey("contextEngine") && | ||
| normalizedOwner !== CORE_CONTEXT_ENGINE_OWNER | ||
| ) { | ||
| return { ok: false, existingOwner: CORE_CONTEXT_ENGINE_OWNER }; |
There was a problem hiding this comment.
Register default context engine through privileged owner
This guard blocks registrations for defaultSlotIdForKey("contextEngine") unless the owner is core, but built-in startup still registers legacy via registerLegacyContextEngine() → registerContextEngine("legacy", ...) (public owner), and that failure result is ignored. As a result the default legacy engine never enters the registry, so resolveContextEngine() throws Context engine "legacy" is not registered on normal agent paths that call ensureContextEnginesInitialized() before resolution, turning default runs into hard failures unless a custom engine is configured.
Useful? React with 👍 / 👎.
|
Closing to manage active PR count. Will reopen when slot is available. |
Summary
Change Type
Linked Issue/PR
Security Impact
All No
Evidence
Compatibility / Migration
Failure Recovery
This PR was AI-assisted (fully tested with pnpm build/check/test).