Skip to content

fix(openai-http): propagate token usage in /v1/chat/completions response#38893

Closed
extrasmall0 wants to merge 3 commits into
openclaw:mainfrom
extrasmall0:raymond/fix-chat-completions-usage
Closed

fix(openai-http): propagate token usage in /v1/chat/completions response#38893
extrasmall0 wants to merge 3 commits into
openclaw:mainfrom
extrasmall0:raymond/fix-chat-completions-usage

Conversation

@extrasmall0

Copy link
Copy Markdown
Contributor

Problem

The non-streaming /v1/chat/completions endpoint always returns hardcoded zero token usage:

"usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 }

This makes it impossible for consumers of the OpenAI-compatible API to track token usage for cost monitoring.

Root Cause

The handler discards the full agentCommandFromIngress() result and only extracts text via resolveAgentResponseText(). The usage data is available in result.meta.agentMeta.usage (with .input and .output fields) but was never read.

Fix

Extract the input and output token counts from the agent run result and map them to the OpenAI-compatible prompt_tokens / completion_tokens / total_tokens fields. Falls back to 0 when usage data is unavailable.

Tests

Added two test cases:

  • Verifies usage is correctly propagated when agentMeta.usage is present
  • Verifies graceful fallback to zeros when usage data is absent

All existing tests continue to pass.

Fixes #38735

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: XS labels Mar 7, 2026
@greptile-apps

greptile-apps Bot commented Mar 7, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a real issue where the non-streaming /v1/chat/completions endpoint was returning hardcoded zero token counts. The approach is correct in principle — reading result.meta.agentMeta.usage and mapping it to OpenAI's prompt_tokens/completion_tokens/total_tokens — but the implementation is incomplete compared to the sibling openresponses-http.ts handler.

  • Bug: cache tokens excluded from total_tokensEmbeddedPiAgentMeta.usage carries cacheRead, cacheWrite, and a pre-computed total field (see src/agents/pi-embedded-runner/types.ts). The new code only reads input and output, so any cache token counts are silently dropped and total_tokens will be understated. openresponses-http.ts already handles this correctly with total = value.total ?? input + output + cacheRead + cacheWrite.
  • The type cast used (result as { meta?: { agentMeta?: { usage?: { input?: number; output?: number } } } }) should be expanded to include the missing fields, mirroring the pattern in openresponses-http.ts.
  • The two new tests are a good addition but do not cover the cache-token case, leaving the incomplete computation untested.

Confidence Score: 3/5

  • Safe to merge only after fixing the incomplete cache-token handling in total_tokens.
  • The core change is straightforward and the fallback-to-zero path is correct. However, total_tokens will be wrong for any run that involves cache tokens (cacheRead/cacheWrite), because those fields are excluded from both the type cast and the arithmetic. The identical scenario is handled correctly in openresponses-http.ts, so the fix is clear, but the bug should be addressed before merging.
  • src/gateway/openai-http.ts — specifically the usage extraction and total_tokens calculation.

Last reviewed commit: 566f864

Comment thread src/gateway/openai-http.ts Outdated
Comment on lines +493 to +514
@@ -503,7 +508,11 @@ export async function handleOpenAiHttpRequest(
finish_reason: "stop",
},
],
usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
usage: {
prompt_tokens: promptTokens,
completion_tokens: completionTokens,
total_tokens: promptTokens + completionTokens,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total_tokens omits cache tokens and ignores pre-computed total

The EmbeddedPiAgentMeta.usage shape includes cacheRead, cacheWrite, and a pre-computed total field (see src/agents/pi-embedded-runner/types.ts). This PR only reads input and output, so total_tokens will be understated whenever cache tokens are present.

The sibling handler openresponses-http.ts already solves this correctly — its toUsage helper (lines 170–181) uses:

const total = value.total ?? input + output + cacheRead + cacheWrite;

The implementation here should follow the same pattern:

Suggested change
const agentUsage = (
result as {
meta?: {
agentMeta?: {
usage?: {
input?: number;
output?: number;
cacheRead?: number;
cacheWrite?: number;
total?: number;
};
};
};
}
)?.meta?.agentMeta?.usage;
const promptTokens = agentUsage?.input ?? 0;
const completionTokens = agentUsage?.output ?? 0;
const cacheReadTokens = agentUsage?.cacheRead ?? 0;
const cacheWriteTokens = agentUsage?.cacheWrite ?? 0;
const totalTokens =
agentUsage?.total ?? promptTokens + completionTokens + cacheReadTokens + cacheWriteTokens;
sendJson(res, 200, {
id: runId,
object: "chat.completion",
created: Math.floor(Date.now() / 1000),
model,
choices: [
{
index: 0,
message: { role: "assistant", content },
finish_reason: "stop",
},
],
usage: {
prompt_tokens: promptTokens,
completion_tokens: completionTokens,
total_tokens: totalTokens,
},
});
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/openai-http.ts
Line: 493-514

Comment:
**`total_tokens` omits cache tokens and ignores pre-computed `total`**

The `EmbeddedPiAgentMeta.usage` shape includes `cacheRead`, `cacheWrite`, and a pre-computed `total` field (see `src/agents/pi-embedded-runner/types.ts`). This PR only reads `input` and `output`, so `total_tokens` will be understated whenever cache tokens are present.

The sibling handler `openresponses-http.ts` already solves this correctly — its `toUsage` helper (lines 170–181) uses:
```ts
const total = value.total ?? input + output + cacheRead + cacheWrite;
```

The implementation here should follow the same pattern:

```suggestion
      const agentUsage = (
        result as {
          meta?: {
            agentMeta?: {
              usage?: {
                input?: number;
                output?: number;
                cacheRead?: number;
                cacheWrite?: number;
                total?: number;
              };
            };
          };
        }
      )?.meta?.agentMeta?.usage;
      const promptTokens = agentUsage?.input ?? 0;
      const completionTokens = agentUsage?.output ?? 0;
      const cacheReadTokens = agentUsage?.cacheRead ?? 0;
      const cacheWriteTokens = agentUsage?.cacheWrite ?? 0;
      const totalTokens =
        agentUsage?.total ?? promptTokens + completionTokens + cacheReadTokens + cacheWriteTokens;

      sendJson(res, 200, {
        id: runId,
        object: "chat.completion",
        created: Math.floor(Date.now() / 1000),
        model,
        choices: [
          {
            index: 0,
            message: { role: "assistant", content },
            finish_reason: "stop",
          },
        ],
        usage: {
          prompt_tokens: promptTokens,
          completion_tokens: completionTokens,
          total_tokens: totalTokens,
        },
      });
```

How can I resolve this? If you propose a fix, please make it concise.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 566f8643f1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +496 to +497
const promptTokens = agentUsage?.input ?? 0;
const completionTokens = agentUsage?.output ?? 0;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include cached prompt tokens in usage mapping

When result.meta.agentMeta.usage includes cacheRead/cacheWrite (or a precomputed total), mapping prompt_tokens to only usage.input undercounts prompt usage and makes total_tokens (promptTokens + completionTokens) smaller than the agent’s own total. This path is expected because run metadata computes total usage from input + cache + output in src/agents/pi-embedded-runner/run.ts:195-201, so /v1/chat/completions will still report inaccurate token accounting for providers with prompt caching enabled.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ebcd06f966

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

};
}
)?.meta?.agentMeta?.usage;
const promptTokens = agentUsage?.input ?? 0;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include cached tokens in prompt_tokens for chat usage

When agentMeta.usage has cacheRead/cacheWrite (common with prompt caching), this maps prompt_tokens to only usage.input, so prompt usage is underreported while total_tokens still includes cached tokens via the fallback sum. That produces inconsistent accounting for OpenAI-compatible clients that assume total_tokens = prompt_tokens + completion_tokens and use prompt tokens for billing/cost analytics. Fresh evidence: the newly added cache test in src/gateway/openai-http.test.ts now explicitly expects this mismatch (prompt_tokens: 10 with total_tokens: 38).

Useful? React with 👍 / 👎.

@gambletan gambletan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: propagate token usage in /v1/chat/completions response

Overall: Good fix, well-tested.

What works well

  • The usage propagation follows OpenAI's response format correctly (prompt_tokens, completion_tokens, total_tokens).
  • Smart total_tokens logic: prefers agentMeta.usage.total when available, falls back to summing input + output + cacheRead + cacheWrite.
  • Four test cases cover the key scenarios: basic usage, cache tokens in total, precomputed total, and missing usage (zeros fallback).
  • Safe defaults (?? 0) prevent NaN propagation.

Suggestions

  1. Type assertion could be cleaner: The inline type cast at lines 493-506 is quite verbose. Consider extracting a helper type or using a shared AgentMeta interface if one exists in the codebase. Something like:

    interface AgentUsage { input?: number; output?: number; cacheRead?: number; cacheWrite?: number; total?: number; }

    This would be more maintainable than the inline cast.

  2. The as never casts in tests (e.g., } as never) bypass type safety. Consider using satisfies or a proper mock builder if the test infrastructure supports it.

  3. Streaming endpoint: This PR only covers the non-streaming (/v1/chat/completions) path. Does the streaming SSE path also need usage propagation? OpenAI's streaming API includes usage in the final [DONE] chunk when stream_options.include_usage is set. Worth noting if it's out of scope.

LGTM — meaningful improvement for API consumers. 👍

@extrasmall0

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review! Good point on the type assertion — I kept it inline to avoid adding a new interface for a single use site, but if usage propagation gets added to the streaming path too it would make sense to extract it then.

Re: streaming — yeah, intentionally scoped this to the non-streaming path. The SSE handler has a different flow for the final chunk so figured that can be a follow-up if needed.

@extrasmall0 extrasmall0 force-pushed the raymond/fix-chat-completions-usage branch from fbd4b0c to 977a7f9 Compare March 7, 2026 20:46

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 977a7f94ce

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

};
}
)?.meta?.agentMeta?.usage;
const promptTokens = agentUsage?.input ?? 0;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count cache tokens in prompt_tokens to keep usage consistent

When agentMeta.usage includes cacheRead/cacheWrite, this sets prompt_tokens from usage.input only, but total_tokens is computed with cache tokens included. In cached-prompt runs this underreports prompt usage and breaks the OpenAI-compatible invariant many clients rely on (total_tokens = prompt_tokens + completion_tokens), which can skew billing/analytics for consumers of /v1/chat/completions.

Useful? React with 👍 / 👎.

@extrasmall0 extrasmall0 force-pushed the raymond/fix-chat-completions-usage branch from 977a7f9 to 3bd93c4 Compare March 9, 2026 16:07
…mpletions response

The non-streaming /v1/chat/completions handler returned hardcoded zeros
for usage (prompt_tokens, completion_tokens, total_tokens). The actual
token counts were available in result.meta.agentMeta.usage but were
never extracted.

Extract input/output token counts from the agent run result and map them
to the OpenAI-compatible usage fields.

Fixes openclaw#38735
@extrasmall0 extrasmall0 force-pushed the raymond/fix-chat-completions-usage branch from 3bd93c4 to 17c14f3 Compare March 10, 2026 16:12
@extrasmall0

Copy link
Copy Markdown
Contributor Author

Closing stale PR — will revisit if still relevant. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

/v1/chat/completions returns hardcoded usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }

2 participants