Skip to content

fix: extract real token usage in /v1/chat/completions response (closes #38735)#46293

Closed
Br1an67 wants to merge 1 commit into
openclaw:mainfrom
Br1an67:fix/38735
Closed

fix: extract real token usage in /v1/chat/completions response (closes #38735)#46293
Br1an67 wants to merge 1 commit into
openclaw:mainfrom
Br1an67:fix/38735

Conversation

@Br1an67

@Br1an67 Br1an67 commented Mar 14, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: /v1/chat/completions returns hardcoded zero usage tokens
  • What changed: Usage now extracted from agentCommandFromIngress result.meta
  • What did NOT change: Chat completion response format, streaming behavior

Change Type

  • Bug fix

Linked Issue/PR

Security Impact

All No

Evidence

  • pnpm build + pnpm check + pnpm test all passing

Compatibility / Migration

  • Backward compatible? Yes — same response shape, just with real values

Failure Recovery

  • How to revert: revert this commit

This PR was AI-assisted (fully tested with pnpm build/check/test).

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: XS labels Mar 14, 2026
@greptile-apps

greptile-apps Bot commented Mar 14, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a long-standing bug (#38735) where /v1/chat/completions always returned hardcoded zero token counts by extracting real usage from result.meta.agentMeta.usage. The non-streaming fix is correct and straightforward. However, the streaming fix has a critical flaw that makes it effectively inoperative in normal usage, and the streaming path also deviates from the OpenAI spec around when usage should be emitted.

Key changes:

  • Adds resolveUsage(result) helper to extract prompt_tokens, completion_tokens, and total_tokens from the agent result's metadata — defensive and correctly typed.
  • Non-streaming path (lines 500–518): Works correctly. resolveUsage is called on the resolved result and the real token counts are included in the response.
  • Streaming path (lines 604–614): The new usage SSE chunk is unreachable in normal operation. emitAgentEvent dispatches listeners synchronously (agent-events.ts), so the lifecycle "end" event fires and sets closed = true while agentCommandFromIngress is still resolving. By the time the await resumes, the if (closed) return guard at line 583 exits early — the usage chunk is never written.
  • Spec deviation: The OpenAI streaming spec only includes a usage chunk when the request sets stream_options: { include_usage: true }. This PR writes the usage chunk unconditionally, which may confuse spec-compliant clients.

Confidence Score: 2/5

  • The streaming usage fix is effectively a no-op in normal operation due to a synchronous event dispatch ordering issue; merging as-is would give a false sense that streaming usage is fixed when it is not.
  • The non-streaming fix is correct and safe. However, the streaming half of the fix has a confirmed logic bug: emitAgentEvent is synchronous, so the lifecycle "end" handler runs and sets closed = true before agentCommandFromIngress returns, causing the if (closed) return early-exit to silently drop the usage chunk on every normal streaming call. The bug does not cause data corruption or a crash, but it means the stated goal of the PR (real token counts in streaming responses) is not achieved.
  • src/gateway/openai-http.ts — specifically the streaming async IIFE (lines 579–639) and the interaction between the lifecycle event handler and the closed guard.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614

Comment:
**Streaming usage chunk is dead code in normal operation**

`emitAgentEvent` (in `agent-events.ts`) dispatches listeners **synchronously** via a `for` loop. In `agentCommandInternal` (`agent.ts` line 810), the lifecycle `"end"` event is emitted *before* the function returns — meaning the `onAgentEvent` handler in this file runs synchronously, sets `closed = true`, calls `writeDone(res)`, and calls `res.end()` **while `agentCommandFromIngress` is still resolving its Promise**.

By the time `await agentCommandFromIngress(...)` at line 581 resumes, `closed` is already `true`. The check at line 583 (`if (closed) return;`) then exits early, so the usage chunk below is never reached in any normal streaming run.

Evidence from `agent.ts`:
```
// line 810 — emitted synchronously, BEFORE agentCommandInternal returns
emitAgentEvent({ runId, stream: "lifecycle", data: { phase: "end", endedAt: Date.now() } });

// line 819 — execution continues (persist transcript, build return value) ...
// ... then agentCommandFromIngress finally resolves
```

One approach to fix this: move the usage write to *before* the `if (closed) return` guard, using `res.writableEnded` as the safety check instead of `closed`:

```
const result = await agentCommandFromIngress(commandInput, defaultRuntime, deps);

// Write usage before the early-return so it is sent even when the lifecycle
// event already closed `closed`. Guard on the actual socket state instead.
if (!res.writableEnded) {
  const usage = resolveUsage(result);
  writeSse(res, {
    id: runId,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
    usage,
  });
}

if (closed) {
  return;
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614

Comment:
**`stream_options.include_usage` not checked**

The OpenAI streaming spec only includes a usage chunk when the request contains `stream_options: { include_usage: true }`. Emitting the usage chunk unconditionally means callers that do not set this option will receive an unexpected extra SSE event, which could confuse well-behaved OpenAI-compatible clients that follow the spec strictly.

Consider gating the streaming usage chunk on the request field:

```typescript
const includeUsage =
  typeof (payload as { stream_options?: { include_usage?: unknown } }).stream_options
    ?.include_usage === "boolean"
    ? (payload as { stream_options: { include_usage: boolean } }).stream_options.include_usage
    : false;
```

Then only write the usage chunk when `includeUsage` is `true`. The non-streaming path is unaffected because usage is always included there per the spec.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: f667b11

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f667b11d39

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/gateway/openai-http.ts Outdated
return {
prompt_tokens: prompt,
completion_tokens: completion,
total_tokens: prompt + completion,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve usage.total when mapping OpenAI token counts

result.meta.agentMeta.usage can carry a total value that is not equal to input + output (for example when cache tokens are counted, or when only total usage is available), but this code always recomputes total_tokens from prompt/completion only. That causes /v1/chat/completions to under-report total usage and can still emit 0 totals for runs that reported non-zero totals, so the mapping should prefer usage.total when present and only fall back to the sum.

Useful? React with 👍 / 👎.

Comment on lines +604 to +614
if (!closed) {
const usage = resolveUsage(result);
writeSse(res, {
id: runId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
usage,
});
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming usage chunk is dead code in normal operation

emitAgentEvent (in agent-events.ts) dispatches listeners synchronously via a for loop. In agentCommandInternal (agent.ts line 810), the lifecycle "end" event is emitted before the function returns — meaning the onAgentEvent handler in this file runs synchronously, sets closed = true, calls writeDone(res), and calls res.end() while agentCommandFromIngress is still resolving its Promise.

By the time await agentCommandFromIngress(...) at line 581 resumes, closed is already true. The check at line 583 (if (closed) return;) then exits early, so the usage chunk below is never reached in any normal streaming run.

Evidence from agent.ts:

// line 810 — emitted synchronously, BEFORE agentCommandInternal returns
emitAgentEvent({ runId, stream: "lifecycle", data: { phase: "end", endedAt: Date.now() } });

// line 819 — execution continues (persist transcript, build return value) ...
// ... then agentCommandFromIngress finally resolves

One approach to fix this: move the usage write to before the if (closed) return guard, using res.writableEnded as the safety check instead of closed:

const result = await agentCommandFromIngress(commandInput, defaultRuntime, deps);

// Write usage before the early-return so it is sent even when the lifecycle
// event already closed `closed`. Guard on the actual socket state instead.
if (!res.writableEnded) {
  const usage = resolveUsage(result);
  writeSse(res, {
    id: runId,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
    usage,
  });
}

if (closed) {
  return;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614

Comment:
**Streaming usage chunk is dead code in normal operation**

`emitAgentEvent` (in `agent-events.ts`) dispatches listeners **synchronously** via a `for` loop. In `agentCommandInternal` (`agent.ts` line 810), the lifecycle `"end"` event is emitted *before* the function returns — meaning the `onAgentEvent` handler in this file runs synchronously, sets `closed = true`, calls `writeDone(res)`, and calls `res.end()` **while `agentCommandFromIngress` is still resolving its Promise**.

By the time `await agentCommandFromIngress(...)` at line 581 resumes, `closed` is already `true`. The check at line 583 (`if (closed) return;`) then exits early, so the usage chunk below is never reached in any normal streaming run.

Evidence from `agent.ts`:
```
// line 810 — emitted synchronously, BEFORE agentCommandInternal returns
emitAgentEvent({ runId, stream: "lifecycle", data: { phase: "end", endedAt: Date.now() } });

// line 819 — execution continues (persist transcript, build return value) ...
// ... then agentCommandFromIngress finally resolves
```

One approach to fix this: move the usage write to *before* the `if (closed) return` guard, using `res.writableEnded` as the safety check instead of `closed`:

```
const result = await agentCommandFromIngress(commandInput, defaultRuntime, deps);

// Write usage before the early-return so it is sent even when the lifecycle
// event already closed `closed`. Guard on the actual socket state instead.
if (!res.writableEnded) {
  const usage = resolveUsage(result);
  writeSse(res, {
    id: runId,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
    usage,
  });
}

if (closed) {
  return;
}
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +604 to +614
if (!closed) {
const usage = resolveUsage(result);
writeSse(res, {
id: runId,
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model,
choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
usage,
});
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stream_options.include_usage not checked

The OpenAI streaming spec only includes a usage chunk when the request contains stream_options: { include_usage: true }. Emitting the usage chunk unconditionally means callers that do not set this option will receive an unexpected extra SSE event, which could confuse well-behaved OpenAI-compatible clients that follow the spec strictly.

Consider gating the streaming usage chunk on the request field:

const includeUsage =
  typeof (payload as { stream_options?: { include_usage?: unknown } }).stream_options
    ?.include_usage === "boolean"
    ? (payload as { stream_options: { include_usage: boolean } }).stream_options.include_usage
    : false;

Then only write the usage chunk when includeUsage is true. The non-streaming path is unaffected because usage is always included there per the spec.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 604-614

Comment:
**`stream_options.include_usage` not checked**

The OpenAI streaming spec only includes a usage chunk when the request contains `stream_options: { include_usage: true }`. Emitting the usage chunk unconditionally means callers that do not set this option will receive an unexpected extra SSE event, which could confuse well-behaved OpenAI-compatible clients that follow the spec strictly.

Consider gating the streaming usage chunk on the request field:

```typescript
const includeUsage =
  typeof (payload as { stream_options?: { include_usage?: unknown } }).stream_options
    ?.include_usage === "boolean"
    ? (payload as { stream_options: { include_usage: boolean } }).stream_options.include_usage
    : false;
```

Then only write the usage chunk when `includeUsage` is `true`. The non-streaming path is unaffected because usage is always included there per the spec.

How can I resolve this? If you propose a fix, please make it concise.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c49a5c52db

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

return {
prompt_tokens: prompt,
completion_tokens: completion,
total_tokens: meta?.total ?? prompt + completion,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count cache tokens in fallback total_tokens calculation

When agentMeta.usage.total is missing, this fallback computes total_tokens as prompt + completion, but prompt is taken from usage.input only. In this codebase, usage can include cacheRead/cacheWrite separately (for example, CLI parsing in src/agents/cli-runner/helpers.ts populates cache fields while total remains optional), so cache-hit runs will still under-report usage.total_tokens on /v1/chat/completions. The fallback should include cache fields (or reuse a shared usage normalizer) so totals stay accurate when total is absent.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc1ff0d0d1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +606 to +607
if (!closed) {
const usage = resolveUsage(result);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit streaming usage before lifecycle closes response

In the streaming path, this usage chunk is guarded by if (!closed), but closed is set to true as soon as a lifecycle phase: "end"/"error" event arrives, and agentCommandFromIngress emits one of those before returning in normal runs (src/commands/agent.ts). That means the new usage write is skipped for real streamed completions, so clients still won’t receive token usage despite this change. Consider deferring res.end() until after post-run usage emission or moving usage emission into the lifecycle-end handling path.

Useful? React with 👍 / 👎.

…openclaw#38735)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation size: S and removed size: XS labels Mar 15, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0fdd003b2e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +71 to +74
id === defaultSlotIdForKey("contextEngine") &&
normalizedOwner !== CORE_CONTEXT_ENGINE_OWNER
) {
return { ok: false, existingOwner: CORE_CONTEXT_ENGINE_OWNER };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Register default context engine through privileged owner

This guard blocks registrations for defaultSlotIdForKey("contextEngine") unless the owner is core, but built-in startup still registers legacy via registerLegacyContextEngine()registerContextEngine("legacy", ...) (public owner), and that failure result is ignored. As a result the default legacy engine never enters the registry, so resolveContextEngine() throws Context engine "legacy" is not registered on normal agent paths that call ensureContextEnginesInitialized() before resolution, turning default runs into hard failures unless a custom engine is configured.

Useful? React with 👍 / 👎.

@Br1an67

Br1an67 commented Mar 17, 2026

Copy link
Copy Markdown
Contributor Author

Closing to manage active PR count. Will reopen when slot is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation gateway Gateway runtime size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

/v1/chat/completions returns hardcoded usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }

1 participant