Description
The /v1/chat/completions endpoint always returns zero token usage regardless of provider:
"usage": {
  "prompt_tokens": 0,
  "completion_tokens": 0,
  "total_tokens": 0
}
Tested with both google/gemini-3.1-pro-preview and openai/gpt-4o-mini; both return zeros.
Root cause
In the gateway CLI bundle (gateway-cli-*.js, around line 18022), the non-streaming handler discards the full agentCommandFromIngress() result and only extracts text via resolveAgentResponseText(). The response object is then constructed with hardcoded zeros.
However, the usage data is available in result.meta.agentMeta.usage (with .input and .output fields). Other handlers in the same file (e.g., the cron/session handler) already extract and use this data via hasNonzeroUsage(usage).
Suggested fix
Instead of:
const content = resolveAgentResponseText(await agentCommandFromIngress(commandInput, defaultRuntime, deps));
// ... usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }
Do:
const result = await agentCommandFromIngress(commandInput, defaultRuntime, deps);
const content = resolveAgentResponseText(result);
// Usage may be missing from the result, so each count falls back to 0.
const usage = result?.meta?.agentMeta?.usage;
const promptTokens = usage?.input ?? 0;
const completionTokens = usage?.output ?? 0;
// ... usage: { prompt_tokens: promptTokens, completion_tokens: completionTokens, total_tokens: promptTokens + completionTokens }
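The mapping in the suggested fix could also be pulled into a small helper so the streaming and non-streaming handlers share it. This is only a sketch: toOpenAiUsage is a hypothetical name, not an existing function in the bundle, and the only details taken from the report are the .input and .output fields on result.meta.agentMeta.usage.

```javascript
// Hypothetical helper (not part of the gateway bundle): maps the agent's
// usage object onto the OpenAI-style usage block. Only the .input and
// .output field names come from the report; everything else is assumed.
function toOpenAiUsage(agentUsage) {
  // Treat missing, non-numeric, or negative values as 0 so the response
  // always carries well-formed integer counts.
  const toCount = (value) =>
    Number.isFinite(value) && value > 0 ? Math.trunc(value) : 0;

  const promptTokens = toCount(agentUsage?.input);
  const completionTokens = toCount(agentUsage?.output);

  return {
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    total_tokens: promptTokens + completionTokens,
  };
}
```

In the non-streaming handler this would be called as toOpenAiUsage(result?.meta?.agentMeta?.usage), which still yields zeros when the agent reports no usage.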
The streaming handler (stream: true) also does not emit a usage chunk. Ideally it would send a final SSE chunk carrying usage before [DONE], matching what the OpenAI API does when a request sets stream_options.include_usage to true.
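A rough sketch of what the tail of the stream could look like. formatFinalUsageEvents and its parameters are hypothetical, and how the streaming handler actually writes SSE frames is not shown in this report. The chunk shape (object "chat.completion.chunk", an empty choices array, and a usage block) follows the OpenAI streaming format.

```javascript
// Hypothetical sketch: builds the last two SSE events of a streamed
// completion, a usage-only chunk followed by the [DONE] sentinel.
// The id/model/created values are assumed to be the ones already used
// for the earlier chunks in the same stream.
function formatFinalUsageEvents({ id, model, created, usage }) {
  const promptTokens = usage?.input ?? 0;
  const completionTokens = usage?.output ?? 0;

  const usageChunk = {
    id,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [], // the usage chunk carries no delta content
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
  };

  // Each SSE event is a "data: ..." line terminated by a blank line.
  return `data: ${JSON.stringify(usageChunk)}\n\ndata: [DONE]\n\n`;
}
```

Whether to send this chunk unconditionally or only when the client opts in via stream_options.include_usage is a design choice for the maintainers. Opting in is closer to OpenAI's behavior and avoids surprising clients that do not expect a chunk with empty choices.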
Environment
- OpenClaw 2026.3.2 (85377a2)
- Tested providers: Google (API key), OpenAI (API key)
- Endpoint: gateway.http.endpoints.chatCompletions.enabled: true
Impact
Anyone using the /v1/chat/completions endpoint for programmatic access cannot track token usage for cost monitoring. Related: #12299