feat(gateway): return actual token usage in /v1/chat/completions #24604
Closed
jogelin wants to merge 2 commits into
Previously the OpenAI-compatible chat completions endpoint always
returned usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }.
This extracts actual token counts from the agent command result
(result.meta.agentMeta.usage) and returns them in the response,
using the existing normalizeUsage() utility.
This enables downstream consumers (pipeline runners, scripts) to
track per-request token consumption without needing the full
WebSocket session protocol.
This pull request has been automatically marked as stale due to inactivity.
Author (Contributor): Reopen
This pull request has been automatically marked as stale due to inactivity.
This pull request has been automatically marked as stale due to inactivity.
Closing due to inactivity.
Problem
The OpenAI-compatible /v1/chat/completions endpoint always returns hardcoded zero token counts: usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }.
This makes it impossible for downstream consumers (pipeline runners, scripts, monitoring tools) to track per-request token consumption without using the full WebSocket session protocol.
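To illustrate the downstream use case, here is a minimal sketch of how a pipeline runner might accumulate token usage across several responses. The helper name sumUsage and the trimmed-down response type are hypothetical and not part of this PR; only the usage field names come from the response shape quoted above.

```typescript
// Usage block as returned in an OpenAI-compatible chat completion response.
interface CompletionUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Only the part of the response body this sketch reads.
interface ChatCompletionResponse {
  usage?: CompletionUsage;
}

// Hypothetical helper: sums usage over many responses,
// skipping any response that has no usage block.
function sumUsage(responses: ChatCompletionResponse[]): CompletionUsage {
  const total: CompletionUsage = {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
  };
  for (const response of responses) {
    const usage = response.usage;
    if (!usage) continue;
    total.prompt_tokens += usage.prompt_tokens;
    total.completion_tokens += usage.completion_tokens;
    total.total_tokens += usage.total_tokens;
  }
  return total;
}
```

With the hardcoded zeros, a helper like this always reports zero consumption regardless of how many requests were made, which is the gap this PR targets.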
Solution
Extract actual token counts from the agent command result (result.meta.agentMeta.usage) and return them in the HTTP response, using the existing normalizeUsage() utility from agents/usage.ts.
Changes
- src/gateway/openai-http.ts: Add resolveAgentUsage() function that extracts and normalizes token usage from the agent result, replacing the hardcoded zeros.
Impact
- /v1/chat/completions responses now include actual prompt_tokens and completion_tokens
Greptile Summary
Adds actual token usage reporting to the /v1/chat/completions endpoint by extracting usage data from agent command results and normalizing it using the existing normalizeUsage utility.
Changes:
- Imports normalizeUsage from agents/usage.ts
- Adds an AgentCommandResult type that captures the meta.agentMeta.usage structure
- Adds a resolveAgentUsage() function to extract and format token counts from agent results
- Uses resolveAgentUsage(result) in non-streaming responses
Note: Per PR description, streaming responses are intentionally excluded from this change.
Confidence Score: 5/5
The change relies on the existing normalizeUsage utility, with proper fallback to zero values when data is unavailable. The implementation correctly handles null/undefined values, and only affects non-streaming responses as intended. The type definitions are appropriate and consistent with existing patterns in the codebase.
Last reviewed commit: 25d2208
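The extraction and zero fallback described in this summary could look roughly like the sketch below. It is not the PR's code: the real normalizeUsage() from agents/usage.ts is not reproduced here, so a small stand-in (toCount) takes its place, and the raw field names input and output are assumptions. Only the result.meta.agentMeta.usage path and the response field names come from the PR text.

```typescript
// Assumed raw usage shape; the actual fields in agents/usage.ts may differ.
interface RawUsage {
  input?: number | null;
  output?: number | null;
}

// Minimal stand-in for the AgentCommandResult type mentioned in the summary.
interface AgentCommandResult {
  meta?: { agentMeta?: { usage?: RawUsage | null } | null } | null;
}

interface CompletionUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Stand-in for normalizeUsage(): anything that is not a finite,
// non-negative number becomes 0.
function toCount(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0
    ? Math.floor(value)
    : 0;
}

// Sketch of a resolveAgentUsage()-style helper: walks the optional
// meta.agentMeta.usage path and falls back to zeros when data is missing.
function resolveAgentUsageSketch(
  result: AgentCommandResult | null | undefined,
): CompletionUsage {
  const raw = result?.meta?.agentMeta?.usage;
  const prompt = toCount(raw?.input);
  const completion = toCount(raw?.output);
  return {
    prompt_tokens: prompt,
    completion_tokens: completion,
    total_tokens: prompt + completion,
  };
}
```

Returning zeros for a missing or partial usage object keeps the response shape identical to the previous behavior, so existing clients that ignore usage should be unaffected.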