fix(openai-http): propagate token usage in /v1/chat/completions response by extrasmall0 · Pull Request #38893 · openclaw/openclaw

extrasmall0 · 2026-03-07T13:44:50Z

Problem

The non-streaming /v1/chat/completions endpoint always returns hardcoded zero token usage:

"usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 }

This makes it impossible for consumers of the OpenAI-compatible API to track token usage for cost monitoring.

Root Cause

The handler discards the full agentCommandFromIngress() result and only extracts text via resolveAgentResponseText(). The usage data is available in result.meta.agentMeta.usage (with .input and .output fields) but was never read.

Fix

Extract the input and output token counts from the agent run result and map them to the OpenAI-compatible prompt_tokens / completion_tokens / total_tokens fields. Falls back to 0 when usage data is unavailable.

Tests

Added two test cases:

Verifies usage is correctly propagated when agentMeta.usage is present
Verifies graceful fallback to zeros when usage data is absent

All existing tests continue to pass.

Fixes #38735

greptile-apps · 2026-03-07T13:46:18Z

Greptile Summary

This PR fixes a real issue where the non-streaming /v1/chat/completions endpoint was returning hardcoded zero token counts. The approach is correct in principle — reading result.meta.agentMeta.usage and mapping it to OpenAI's prompt_tokens/completion_tokens/total_tokens — but the implementation is incomplete compared to the sibling openresponses-http.ts handler.

Bug: cache tokens excluded from total_tokens — EmbeddedPiAgentMeta.usage carries cacheRead, cacheWrite, and a pre-computed total field (see src/agents/pi-embedded-runner/types.ts). The new code only reads input and output, so any cache token counts are silently dropped and total_tokens will be understated. openresponses-http.ts already handles this correctly with total = value.total ?? input + output + cacheRead + cacheWrite.
The type cast used (result as { meta?: { agentMeta?: { usage?: { input?: number; output?: number } } } }) should be expanded to include the missing fields, mirroring the pattern in openresponses-http.ts.
The two new tests are a good addition but do not cover the cache-token case, leaving the incomplete computation untested.

Confidence Score: 3/5

Safe to merge only after fixing the incomplete cache-token handling in total_tokens.
The core change is straightforward and the fallback-to-zero path is correct. However, total_tokens will be wrong for any run that involves cache tokens (cacheRead/cacheWrite), because those fields are excluded from both the type cast and the arithmetic. The identical scenario is handled correctly in openresponses-http.ts, so the fix is clear, but the bug should be addressed before merging.
src/gateway/openai-http.ts — specifically the usage extraction and total_tokens calculation.

Last reviewed commit: 566f864

greptile-apps · 2026-03-07T13:46:21Z

@@ -503,7 +508,11 @@ export async function handleOpenAiHttpRequest(
            finish_reason: "stop",
          },
        ],
-        usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
+        usage: {
+          prompt_tokens: promptTokens,
+          completion_tokens: completionTokens,
+          total_tokens: promptTokens + completionTokens,


total_tokens omits cache tokens and ignores pre-computed total

The EmbeddedPiAgentMeta.usage shape includes cacheRead, cacheWrite, and a pre-computed total field (see src/agents/pi-embedded-runner/types.ts). This PR only reads input and output, so total_tokens will be understated whenever cache tokens are present.

The sibling handler openresponses-http.ts already solves this correctly — its toUsage helper (lines 170–181) uses:

const total = value.total ?? input + output + cacheRead + cacheWrite;

The implementation here should follow the same pattern:

Suggested change

const agentUsage = (
  result as {
    meta?: {
      agentMeta?: {
        usage?: {
          input?: number;
          output?: number;
          cacheRead?: number;
          cacheWrite?: number;
          total?: number;
        };
      };
    };
  }
)?.meta?.agentMeta?.usage;
const promptTokens = agentUsage?.input ?? 0;
const completionTokens = agentUsage?.output ?? 0;
const cacheReadTokens = agentUsage?.cacheRead ?? 0;
const cacheWriteTokens = agentUsage?.cacheWrite ?? 0;
const totalTokens =
  agentUsage?.total ?? promptTokens + completionTokens + cacheReadTokens + cacheWriteTokens;
sendJson(res, 200, {
  id: runId,
  object: "chat.completion",
  created: Math.floor(Date.now() / 1000),
  model,
  choices: [
    {
      index: 0,
      message: { role: "assistant", content },
      finish_reason: "stop",
    },
  ],
  usage: {
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    total_tokens: totalTokens,
  },
});

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 566f8643f1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-07T13:48:43Z

+      const promptTokens = agentUsage?.input ?? 0;
+      const completionTokens = agentUsage?.output ?? 0;


Include cached prompt tokens in usage mapping

When result.meta.agentMeta.usage includes cacheRead/cacheWrite (or a precomputed total), mapping prompt_tokens to only usage.input undercounts prompt usage and makes total_tokens (promptTokens + completionTokens) smaller than the agent’s own total. This path is expected because run metadata computes total usage from input + cache + output in src/agents/pi-embedded-runner/run.ts:195-201, so /v1/chat/completions will still report inaccurate token accounting for providers with prompt caching enabled.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ebcd06f966

chatgpt-codex-connector · 2026-03-07T14:21:09Z

+          };
+        }
+      )?.meta?.agentMeta?.usage;
+      const promptTokens = agentUsage?.input ?? 0;


Include cached tokens in prompt_tokens for chat usage

When agentMeta.usage has cacheRead/cacheWrite (common with prompt caching), this maps prompt_tokens to only usage.input, so prompt usage is underreported while total_tokens still includes cached tokens via the fallback sum. That produces inconsistent accounting for OpenAI-compatible clients that assume total_tokens = prompt_tokens + completion_tokens and use prompt tokens for billing/cost analytics. Fresh evidence: the newly added cache test in src/gateway/openai-http.test.ts now explicitly expects this mismatch (prompt_tokens: 10 with total_tokens: 38).
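One possible way to reconcile the two views, sketched under stated assumptions rather than taken from the PR: fold cache tokens into prompt_tokens and report cached reads through prompt_tokens_details.cached_tokens, so that total_tokens equals prompt_tokens + completion_tokens by construction. The AgentUsage shape mirrors the fields named in this thread; toChatUsage, ChatUsage, and the sample numbers are hypothetical, and counting cacheWrite as prompt-side usage is an assumption.

```typescript
// Hypothetical shape, mirroring the usage fields named in this thread.
interface AgentUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
  total?: number;
}

// Subset of the OpenAI-style usage object used in this sketch.
interface ChatUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens: number };
}

// Assumption: cache reads and cache writes both count as prompt-side tokens.
function toChatUsage(usage?: AgentUsage): ChatUsage {
  const cacheRead = usage?.cacheRead ?? 0;
  const cacheWrite = usage?.cacheWrite ?? 0;
  const promptTokens = (usage?.input ?? 0) + cacheRead + cacheWrite;
  const completionTokens = usage?.output ?? 0;
  const mapped: ChatUsage = {
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    // Derived from the two parts, so the invariant cannot drift.
    total_tokens: promptTokens + completionTokens,
  };
  if (cacheRead > 0) {
    mapped.prompt_tokens_details = { cached_tokens: cacheRead };
  }
  return mapped;
}

// Illustrative numbers only, not the values used in the PR's tests.
const example = toChatUsage({ input: 10, output: 5, cacheRead: 20, cacheWrite: 3 });
console.log(example);
```

This sketch deliberately ignores a precomputed total. If the agent's total disagrees with the sum of its parts, a handler has to pick either the invariant or the agent's figure, and that choice is left open here.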

gambletan

Review: propagate token usage in /v1/chat/completions response

Overall: Good fix, well-tested. ✅

What works well

The usage propagation follows OpenAI's response format correctly (prompt_tokens, completion_tokens, total_tokens).
Smart total_tokens logic: prefers agentMeta.usage.total when available, falls back to summing input + output + cacheRead + cacheWrite.
Four test cases cover the key scenarios: basic usage, cache tokens in total, precomputed total, and missing usage (zeros fallback).
Safe defaults (?? 0) prevent NaN propagation.

Suggestions

Type assertion could be cleaner: The inline type cast at lines 493-506 is quite verbose. Consider extracting a helper type or using a shared AgentMeta interface if one exists in the codebase. Something like:
```
interface AgentUsage { input?: number; output?: number; cacheRead?: number; cacheWrite?: number; total?: number; }
```
This would be more maintainable than the inline cast.
The as never casts in tests (e.g., } as never) bypass type safety. Consider using satisfies or a proper mock builder if the test infrastructure supports it.
Streaming endpoint: This PR only covers the non-streaming (/v1/chat/completions) path. Does the streaming SSE path also need usage propagation? OpenAI's streaming API sends usage in an extra final chunk, just before the [DONE] marker, when stream_options.include_usage is set. Worth noting if it's out of scope.
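As an illustration of the first suggestion, here is a minimal sketch of a shared usage type plus a reader that narrows an unknown result step by step instead of relying on an inline cast. AgentUsage, isRecord, pickNumber, and readAgentUsage are hypothetical names; only the result.meta.agentMeta.usage path and its field names come from this thread.

```typescript
// Hypothetical shared type; field names follow the inline cast in this PR.
interface AgentUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
  total?: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns the value only when it is a finite number, otherwise undefined.
function pickNumber(record: Record<string, unknown>, key: string): number | undefined {
  const value = record[key];
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

// Walks result.meta.agentMeta.usage without a type assertion on the result.
function readAgentUsage(result: unknown): AgentUsage | undefined {
  if (!isRecord(result)) return undefined;
  const meta = result["meta"];
  if (!isRecord(meta)) return undefined;
  const agentMeta = meta["agentMeta"];
  if (!isRecord(agentMeta)) return undefined;
  const usage = agentMeta["usage"];
  if (!isRecord(usage)) return undefined;
  return {
    input: pickNumber(usage, "input"),
    output: pickNumber(usage, "output"),
    cacheRead: pickNumber(usage, "cacheRead"),
    cacheWrite: pickNumber(usage, "cacheWrite"),
    total: pickNumber(usage, "total"),
  };
}

// A malformed field (total as a string) is dropped rather than propagated.
const parsed = readAgentUsage({
  meta: { agentMeta: { usage: { input: 10, output: 5, total: "n/a" } } },
});
console.log(parsed);
```

Compared with the cast, this costs a few more lines but rejects non-numeric values at runtime, which the `?? 0` defaults alone would not catch.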

LGTM — meaningful improvement for API consumers. 👍

extrasmall0 · 2026-03-07T15:35:58Z

Thanks for the thorough review! Good point on the type assertion — I kept it inline to avoid adding a new interface for a single use site, but if usage propagation gets added to the streaming path too it would make sense to extract it then.

Re: streaming — yeah, intentionally scoped this to the non-streaming path. The SSE handler has a different flow for the final chunk so figured that can be a follow-up if needed.
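For the possible follow-up, a rough sketch of what the tail of a streamed response could look like. In OpenAI's streaming format, when stream_options.include_usage is set, an extra chunk with an empty choices array and a usage object is sent before the data: [DONE] terminator. The finalStreamFrames helper, its parameters, and the sample values are hypothetical and do not reflect the gateway's actual SSE handler.

```typescript
interface ChatUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical helper: builds the trailing SSE frames of a streamed completion.
function finalStreamFrames(
  id: string,
  model: string,
  usage: ChatUsage,
  includeUsage: boolean,
): string[] {
  const frames: string[] = [];
  if (includeUsage) {
    const usageChunk = {
      id,
      object: "chat.completion.chunk",
      created: Math.floor(Date.now() / 1000),
      model,
      choices: [], // the usage chunk carries no choices
      usage,
    };
    frames.push(`data: ${JSON.stringify(usageChunk)}\n\n`);
  }
  // The stream terminator is always the last frame.
  frames.push("data: [DONE]\n\n");
  return frames;
}

// Illustrative values only.
const frames = finalStreamFrames(
  "run-123",
  "example-model",
  { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 },
  true,
);
for (const frame of frames) {
  console.log(frame.trimEnd());
}
```

Keeping the usage chunk opt-in matches the reviewer's description of stream_options.include_usage; clients that do not request it would only see the terminator.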

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 977a7f94ce

chatgpt-codex-connector · 2026-03-07T20:50:35Z

+          };
+        }
+      )?.meta?.agentMeta?.usage;
+      const promptTokens = agentUsage?.input ?? 0;


Count cache tokens in prompt_tokens to keep usage consistent

When agentMeta.usage includes cacheRead/cacheWrite, this sets prompt_tokens from usage.input only, but total_tokens is computed with cache tokens included. In cached-prompt runs this underreports prompt usage and breaks the OpenAI-compatible invariant many clients rely on (total_tokens = prompt_tokens + completion_tokens), which can skew billing/analytics for consumers of /v1/chat/completions.

…mpletions response The non-streaming /v1/chat/completions handler returned hardcoded zeros for usage (prompt_tokens, completion_tokens, total_tokens). The actual token counts were available in result.meta.agentMeta.usage but were never extracted. Extract input/output token counts from the agent run result and map them to the OpenAI-compatible usage fields. Fixes openclaw#38735

extrasmall0 · 2026-03-30T01:42:48Z

Closing stale PR — will revisit if still relevant. Thanks!

openclaw-barnacle Bot added gateway Gateway runtime size: XS labels Mar 7, 2026

greptile-apps Bot reviewed Mar 7, 2026

chatgpt-codex-connector Bot reviewed Mar 7, 2026

openclaw-barnacle Bot added size: S and removed size: XS labels Mar 7, 2026

chatgpt-codex-connector Bot reviewed Mar 7, 2026

gambletan reviewed Mar 7, 2026

extrasmall0 force-pushed the raymond/fix-chat-completions-usage branch from fbd4b0c to 977a7f9 Compare March 7, 2026 20:46

chatgpt-codex-connector Bot reviewed Mar 7, 2026

extrasmall0 force-pushed the raymond/fix-chat-completions-usage branch from 977a7f9 to 3bd93c4 Compare March 9, 2026 16:07

extrasmall0 added 3 commits March 10, 2026 09:11

include cache tokens and precomputed total in usage mapping

b18f861

include prompt_tokens_details when cache tokens present

17c14f3

extrasmall0 force-pushed the raymond/fix-chat-completions-usage branch from 3bd93c4 to 17c14f3 Compare March 10, 2026 16:12

extrasmall0 closed this Mar 30, 2026

Lellansin mentioned this pull request Apr 8, 2026

fix(gateway): return real usage for OpenAI-compatible chat completions #62986

Merged

25 tasks

extrasmall0 wants to merge 3 commits into openclaw:main from extrasmall0:raymond/fix-chat-completions-usage
