fix(fireworks_ai): account for cache read/creation tokens in cost calculator#24860
GopalGB wants to merge 2 commits into BerriAI:main
Conversation
…culator The Fireworks AI cost_per_token function only calculated costs using prompt_tokens * input_cost_per_token, ignoring cache_read_input_tokens and cache_creation_input_tokens from the Usage object. This caused incorrect cost reporting when prompt caching was active. Now adjusts the prompt cost by applying the differential between the cache-specific rate and the standard input rate for cached tokens. Fixes BerriAI#24774
Greptile Summary
This PR fixes the Fireworks AI cost calculator so that cache read/creation tokens are priced at their cache-specific rates instead of the full input rate.
Confidence Score: 5/5. Safe to merge: the fix is a targeted, mathematically correct cost adjustment with a deterministic regression test. No P0 or P1 issues found. The differential-rate adjustment logic is correct. The test now unconditionally exercises the critical assertion using a model with verified cache pricing in the config. No auth, security, or backwards-compatibility concerns. No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/llms/fireworks_ai/cost_calculator.py | Adds differential cost adjustment for cache read/creation tokens; math is correct (already-charged base rate adjusted to the cached rate); no issues found. |
| tests/local_testing/test_completion_cost.py | Adds a pure unit test using kimi-k2p5 (which has cache_read_input_token_cost in the pricing config), no real network calls, both key assertions are always exercised. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cost_per_token called] --> B[get_model_info for fireworks_ai model]
    B --> C{Model found?}
    C -- No --> D[Fallback: get_base_model_for_pricing\ne.g. fireworks-ai-above-16b]
    D --> E[get_model_info for base tier]
    C -- Yes --> F[input_cost = prompt_tokens x input_cost_per_token]
    E --> F
    F --> G{cache_read_input_tokens > 0\nAND cache_read_input_token_cost set?}
    G -- Yes --> H[prompt_cost += cache_read_tokens\nx cache_read_cost - input_rate]
    G -- No --> I{cache_creation_input_tokens > 0\nAND cache_creation_input_token_cost set?}
    H --> I
    I -- Yes --> J[prompt_cost += cache_creation_tokens\nx cache_creation_cost - input_rate]
    I -- No --> K[completion_cost = completion_tokens\nx output_cost_per_token]
    J --> K
    K --> L[return prompt_cost, completion_cost]
```
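The branching above can be sketched in Python. This is a simplified sketch with assumed dict-shaped inputs and illustrative names; litellm's real `cost_per_token` takes a `Usage` object and resolves model info (with the base-tier fallback) internally.

```python
def cost_per_token_sketch(model_info: dict, usage: dict) -> tuple[float, float]:
    """Sketch of the fixed cost flow: cached tokens are already counted in
    prompt_tokens, so only the rate differential is applied for them."""
    input_rate = model_info["input_cost_per_token"]
    prompt_cost = usage["prompt_tokens"] * input_rate

    cache_read = usage.get("cache_read_input_tokens", 0)
    cache_read_rate = model_info.get("cache_read_input_token_cost")
    if cache_read > 0 and cache_read_rate is not None:
        # undo the full-rate charge on cached tokens, apply the cache rate
        prompt_cost += cache_read * (cache_read_rate - input_rate)

    cache_creation = usage.get("cache_creation_input_tokens", 0)
    cache_creation_rate = model_info.get("cache_creation_input_token_cost")
    if cache_creation > 0 and cache_creation_rate is not None:
        prompt_cost += cache_creation * (cache_creation_rate - input_rate)

    completion_cost = usage["completion_tokens"] * model_info["output_cost_per_token"]
    return prompt_cost, completion_cost
```

Note that when a model defines no cache rates, both guards fall through and the function reduces to the old behavior, so pricing for non-cached models is unchanged.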
Reviews (2): Last reviewed commit: "fix(test): use model with cache pricing ..."
```python
if model_info.get("cache_read_input_token_cost") is not None:
    assert prompt_cost_cached < prompt_cost_no_cache, (
        "Prompt cost with 800 cache-read tokens should be less than "
        "full-price for the same total prompt tokens"
    )
```
Critical assertion never executes — test provides zero coverage for the fix
The guard `if model_info.get("cache_read_input_token_cost") is not None:` always evaluates to `False` for `fireworks_ai/llama-v3p3-70b-instruct`. That model has no direct entry in `model_prices_and_context_window.json` (only the `accounts/fireworks/models/` long-form path exists, and it has no `cache_read_input_token_cost`), so `cost_per_token` falls back to the generic `fireworks-ai-above-16b` tier, which also lacks cache pricing.
As a result, the key assertion, `prompt_cost_cached < prompt_cost_no_cache`, is never reached: the test always passes but never validates that the fix actually works.
The model to use is `fireworks_ai/kimi-k2p5`, which does have `cache_read_input_token_cost` in the pricing config at a rate lower than `input_cost_per_token`, so the assertion is exercised deterministically. The conditional guard can then be removed entirely:
```diff
 prompt_cost_cached, completion_cost_cached = cost_per_token(
-    model="fireworks_ai/llama-v3p3-70b-instruct", usage=usage_with_cache
+    model="fireworks_ai/kimi-k2p5", usage=usage_with_cache
 )
 prompt_cost_no_cache, completion_cost_no_cache = cost_per_token(
-    model="fireworks_ai/llama-v3p3-70b-instruct", usage=usage_no_cache
+    model="fireworks_ai/kimi-k2p5", usage=usage_no_cache
 )
 assert completion_cost_cached == completion_cost_no_cache
-model_info = litellm.get_model_info(
-    model="fireworks_ai/llama-v3p3-70b-instruct",
-    custom_llm_provider="fireworks_ai",
-)
-if model_info.get("cache_read_input_token_cost") is not None:
-    assert prompt_cost_cached < prompt_cost_no_cache, (
-        "Prompt cost with 800 cache-read tokens should be less than "
-        "full-price for the same total prompt tokens"
-    )
+# kimi-k2p5 defines cache_read_input_token_cost < input_cost_per_token,
+# so 800 cache-read tokens must yield a lower prompt cost.
+assert prompt_cost_cached < prompt_cost_no_cache, (
+    "Prompt cost with 800 cache-read tokens should be less than "
+    "full-price for the same total prompt tokens"
+)
```

Without this fix the regression test from the PR description never actually runs, defeating its purpose as a safeguard against future breakage.
Rule used: "Flag any modifications to existing tests and..."
I have read the CLA Document and I hereby sign the CLA
Switch test from `fireworks_ai/llama-v3p3-70b-instruct` (no cache_read_input_token_cost) to `fireworks_ai/kimi-k2p5` (has cache pricing at 1e-07 vs input 6e-07). Remove the conditional guard so the assertion always runs. Addresses Greptile review feedback on BerriAI#24860. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Fixes `cost_per_token()` to correctly price `cache_read_input_tokens` and `cache_creation_input_tokens`
- Previously, prompt cost was computed as `prompt_tokens * input_cost_per_token`, ignoring the cache-specific rates already defined in model_info

Root Cause
`fireworks_ai/cost_calculator.py:cost_per_token` used a simple multiplication of `prompt_tokens * input_cost_per_token` without checking for `cache_read_input_tokens` or `cache_creation_input_tokens` in the Usage object. Model info already had `cache_read_input_token_cost` set correctly (e.g., `1e-07` for kimi-k2p5), but it was never read.

Approach
Cache tokens are already included in `prompt_tokens`, so the fix applies the cost differential between the cache-specific rate and the standard input rate to the cached tokens. This correctly subtracts the overcharge and applies the discounted rate.
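A quick arithmetic check of that differential, using the kimi-k2p5 rates cited in this PR (`1e-07` cache-read vs `6e-07` input); the token counts here are illustrative, not the regression test's exact values:

```python
# Rates from the PR discussion for fireworks_ai/kimi-k2p5.
input_rate = 6e-07        # input_cost_per_token
cache_read_rate = 1e-07   # cache_read_input_token_cost

prompt_tokens = 1000      # total prompt tokens, cached ones included
cache_read_tokens = 800   # illustrative count

naive_cost = prompt_tokens * input_rate  # old behavior: full rate for everything
# fixed behavior: adjust the cached tokens by the rate differential
fixed_cost = naive_cost + cache_read_tokens * (cache_read_rate - input_rate)

# Equivalent decomposition: 200 tokens at 6e-07 plus 800 tokens at 1e-07,
# i.e. 1.2e-04 + 8e-05 = 2.0e-04, versus a naive 6.0e-04.
```

The differential form avoids having to split `prompt_tokens` into cached and uncached portions: the base charge stays as-is and only the overcharge on cached tokens is corrected.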
Test Plan
- `test_fireworks_ai_cache_token_pricing` added

Fixes #24774