feat(@ai-sdk/google): preserve per-modality token details in usage data #14016
Merged
felixarntz merged 4 commits into `main` on Apr 3, 2026
Add `promptTokensDetails` and `candidatesTokensDetails` to the Gemini response usage schema so per-modality token counts (text, image, audio, video) flow through to `usage.raw` instead of being stripped by Zod. This enables downstream consumers like ai-gateway to bill different input modalities at their correct rates (e.g. audio at $0.50/1M vs text/image/video at $0.25/1M).
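The following minimal TypeScript sketch shows what a downstream consumer might do with the preserved fields. It assumes that `usage.raw` carries Gemini's usage metadata with `promptTokensDetails` and `candidatesTokensDetails` as arrays of `{ modality, tokenCount }` entries, matching Gemini's `ModalityTokenCount` shape. The `GeminiRawUsage` type and the `tokensByModality` helper are hypothetical names used for illustration only. They are not part of `@ai-sdk/google`.

```typescript
// Modality strings as reported by Gemini, e.g. "TEXT", "IMAGE", "AUDIO", "VIDEO".
type Modality = string;

// Hypothetical type: one per-modality entry in Gemini usage metadata.
interface ModalityTokenCount {
  modality: Modality;
  tokenCount?: number;
}

// Hypothetical type: the subset of the raw usage object this sketch reads.
interface GeminiRawUsage {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  promptTokensDetails?: ModalityTokenCount[];
  candidatesTokensDetails?: ModalityTokenCount[];
}

// Sum token counts per modality; a missing array or count contributes nothing.
function tokensByModality(
  details: ModalityTokenCount[] | undefined,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const { modality, tokenCount } of details ?? []) {
    totals[modality] = (totals[modality] ?? 0) + (tokenCount ?? 0);
  }
  return totals;
}

// Illustrative raw usage object (made-up numbers, not real API output).
const raw: GeminiRawUsage = {
  promptTokenCount: 1290,
  candidatesTokenCount: 42,
  promptTokensDetails: [
    { modality: "TEXT", tokenCount: 32 },
    { modality: "IMAGE", tokenCount: 1258 },
  ],
  candidatesTokensDetails: [{ modality: "TEXT", tokenCount: 42 }],
};

const promptByModality = tokensByModality(raw.promptTokensDetails);
const outputByModality = tokensByModality(raw.candidatesTokensDetails);
console.log(promptByModality); // { TEXT: 32, IMAGE: 1258 }
console.log(outputByModality); // { TEXT: 42 }
```

Before this change, Zod's default object parsing dropped unknown keys. Code like this would therefore always have seen `undefined` for both detail arrays and would have reported no per-modality breakdown.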
felixarntz (Collaborator) approved these changes on Apr 3, 2026 and left a comment:
@R-Taneja Awesome, LGTM. Added 2 examples for verification.
vercel-ai-sdk bot pushed a commit that referenced this pull request on Apr 3, 2026:
…ta (#14016)

## Summary
- Add `promptTokensDetails` and `candidatesTokensDetails` to the Gemini response `usageSchema` so per-modality token counts (TEXT, IMAGE, AUDIO, VIDEO) are no longer stripped by Zod parsing
- These fields now flow through to `usage.raw`, enabling downstream consumers to distinguish token usage by modality

## Why
Gemini charges different rates for different input modalities (e.g. audio input is $0.50/1M tokens vs $0.25/1M for text/image/video). The ai-gateway needs per-modality token counts to bill correctly. Previously, `promptTokensDetails` was present in the Gemini API response but stripped during Zod schema validation, making it impossible to differentiate modalities downstream.

## Validation
- Ran the `generate-text/google/image.ts` example and confirmed `promptTokensDetails` now appears in `usage.raw` with both `TEXT` and `IMAGE` modality entries
- All 328 existing tests pass; 10 snapshots updated to include the new fields

---------

Co-authored-by: Felix Arntz <felix.arntz@vercel.com>
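The following worked example makes the billing rationale in the commit message above concrete. It uses the example input rates quoted there: $0.50 per 1M tokens for audio and $0.25 per 1M for text, image, and video. The `INPUT_RATE_PER_MILLION` table and the `inputCostUsd` helper are hypothetical. This PR does not show the ai-gateway's actual pricing logic, and real rates depend on the model.

```typescript
// Hypothetical input rates in USD per 1M tokens, taken from this PR's example
// figures; real prices depend on the model and may change.
const INPUT_RATE_PER_MILLION: Record<string, number> = {
  TEXT: 0.25,
  IMAGE: 0.25,
  VIDEO: 0.25,
  AUDIO: 0.5,
};

interface ModalityTokenCount {
  modality: string;
  tokenCount?: number;
}

// Price each modality at its own rate. Unknown modalities throw instead of
// being silently billed at a guessed rate.
function inputCostUsd(details: ModalityTokenCount[]): number {
  let cost = 0;
  for (const { modality, tokenCount = 0 } of details) {
    const rate = INPUT_RATE_PER_MILLION[modality];
    if (rate === undefined) {
      throw new Error(`No input rate configured for modality ${modality}`);
    }
    cost += (tokenCount / 1_000_000) * rate;
  }
  return cost;
}

// 200k text tokens + 400k audio tokens (illustrative numbers).
const details: ModalityTokenCount[] = [
  { modality: "TEXT", tokenCount: 200_000 },
  { modality: "AUDIO", tokenCount: 400_000 },
];

const perModality = inputCostUsd(details);
// Without the breakdown, only the combined prompt count is known, and a flat
// text rate applied to it undercharges the audio portion.
const flatRate = ((200_000 + 400_000) / 1_000_000) * 0.25;
console.log(perModality.toFixed(2), flatRate.toFixed(2)); // 0.25 0.15
```

The gap between the two totals ($0.25 vs $0.15 here) is exactly the audio surcharge that cannot be recovered once `promptTokensDetails` has been stripped.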
Contributor: ✅ Backport PR created: #14110
Contributor: 🚀 Published in: