
fix(fireworks_ai): account for cache read/creation tokens in cost calculator#24860

Open
GopalGB wants to merge 2 commits into BerriAI:main from GopalGB:fix/fireworks-ai-cache-token-pricing

Conversation


GopalGB commented Mar 31, 2026

Summary

  • Fixes Fireworks AI cost_per_token() to correctly price cache_read_input_tokens and cache_creation_input_tokens
  • Previously only calculated prompt_tokens * input_cost_per_token, ignoring cache-specific rates already defined in model_info
  • Adds regression test validating that cached token pricing produces lower costs than full-price input tokens

Root Cause

fireworks_ai/cost_calculator.py:cost_per_token used a simple multiplication of prompt_tokens * input_cost_per_token without checking for cache_read_input_tokens or cache_creation_input_tokens in the Usage object. Model info already had cache_read_input_token_cost set correctly (e.g., 1e-07 for kimi-k2p5), but it was never read.

Approach

Cache tokens are already included in prompt_tokens, so the fix applies the cost differential:

```python
prompt_cost += cache_read_tokens * (cache_rate - input_rate)
```

For cache reads this subtracts the overcharge and bills the cached tokens at the discounted rate; for cache creation tokens the same formula applies the corresponding surcharge when the creation rate exceeds the input rate.
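The adjustment can be sketched in isolation. The function and variable names below are illustrative, not the actual ones in fireworks_ai/cost_calculator.py; the rates are the kimi-k2p5 figures quoted in this PR:

```python
from typing import Optional

def adjusted_prompt_cost(
    prompt_tokens: int,
    cache_read_tokens: int,
    input_rate: float,
    cache_read_rate: Optional[float],
) -> float:
    # Cache tokens are already counted inside prompt_tokens, so first
    # charge everything at the base input rate...
    cost = prompt_tokens * input_rate
    # ...then move the cached portion from the base rate to the cache rate.
    if cache_read_rate is not None and cache_read_tokens:
        cost += cache_read_tokens * (cache_read_rate - input_rate)
    return cost

# Rates from the PR discussion (kimi-k2p5): input 6e-07, cache read 1e-07.
full_price = adjusted_prompt_cost(1000, 0, 6e-07, 1e-07)
discounted = adjusted_prompt_cost(1000, 800, 6e-07, 1e-07)
```

With 800 of 1000 prompt tokens served from cache, the cached cost works out to roughly a third of the full price (about 2e-4 versus 6e-4), since those 800 tokens are billed at 1e-07 instead of 6e-07.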

Test Plan

  • New test test_fireworks_ai_cache_token_pricing added
  • Validates completion cost unchanged by cache tokens
  • Validates prompt cost is lower when cache tokens are present (for models with cache pricing)
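The shape of that test can be sketched self-containedly with a stand-in cost function. Nothing here imports litellm's real cost_per_token or Usage; all names and the output rate are illustrative, and only the input and cache-read rates come from the PR discussion:

```python
from dataclasses import dataclass

INPUT_RATE = 6e-07       # example: kimi-k2p5 input rate from the PR
CACHE_READ_RATE = 1e-07  # example: kimi-k2p5 cache read rate from the PR
OUTPUT_RATE = 2.5e-06    # hypothetical output rate

@dataclass
class FakeUsage:  # stand-in for litellm's Usage object
    prompt_tokens: int
    completion_tokens: int
    cache_read_input_tokens: int = 0

def fake_cost_per_token(usage: FakeUsage):
    prompt_cost = usage.prompt_tokens * INPUT_RATE
    # differential adjustment for the cached portion of prompt_tokens
    prompt_cost += usage.cache_read_input_tokens * (CACHE_READ_RATE - INPUT_RATE)
    return prompt_cost, usage.completion_tokens * OUTPUT_RATE

with_cache = fake_cost_per_token(FakeUsage(1000, 50, cache_read_input_tokens=800))
no_cache = fake_cost_per_token(FakeUsage(1000, 50))
assert with_cache[1] == no_cache[1]  # completion cost unchanged by cache tokens
assert with_cache[0] < no_cache[0]   # cached prompt must be cheaper
```

Both bullets above map onto the two assertions: completion cost is untouched by cache tokens, and prompt cost drops whenever a cache rate below the input rate is defined.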

Fixes #24774

…culator

The Fireworks AI cost_per_token function only calculated costs using
prompt_tokens * input_cost_per_token, ignoring cache_read_input_tokens
and cache_creation_input_tokens from the Usage object. This caused
incorrect cost reporting when prompt caching was active.

Now adjusts the prompt cost by applying the differential between the
cache-specific rate and the standard input rate for cached tokens.

Fixes BerriAI#24774

vercel bot commented Mar 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Mar 31, 2026 4:01pm |



CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


codspeed-hq bot commented Mar 31, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing GopalGB:fix/fireworks-ai-cache-token-pricing (17dff85) with main (50a52f6)



greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR fixes fireworks_ai/cost_calculator.py to correctly account for cache_read_input_tokens and cache_creation_input_tokens when computing prompt costs, and adds a deterministic regression test using fireworks_ai/kimi-k2p5 (a model that actually has cache_read_input_token_cost in the pricing config).

  • Core fix: The old code charged all prompt_tokens at the flat input_cost_per_token rate. The new code applies a differential adjustment: since cache tokens are already counted in prompt_tokens, it adds cache_tokens * (cache_rate - input_rate) — a negative delta for cheaper cached reads, a positive delta for more-expensive cache creation. The arithmetic is correct.
  • Test: test_fireworks_ai_cache_token_pricing uses the local model cost map (no network calls), constructs mock Usage objects, and asserts prompt_cost_cached < prompt_cost_no_cache — an assertion that is deterministically exercised because kimi-k2p5 has cache_read_input_token_cost: 1e-07 < input_cost_per_token: 6e-07. The previous version of this test (from the prior review thread) had a conditional guard that silently skipped the critical assertion; that guard is gone in this iteration.
  • No security, auth, or backwards-compatibility concerns.
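The sign behavior of the delta described above is easy to check numerically. The cache creation rate below is hypothetical; only the input and cache-read rates come from the PR:

```python
input_rate = 6e-07             # base input rate (from the PR, kimi-k2p5)
cache_read_rate = 1e-07        # cheaper cached reads (from the PR)
cache_creation_rate = 7.5e-07  # hypothetical pricier cache creation rate

# Negative delta: cached reads cost less than the base rate.
read_delta = 800 * (cache_read_rate - input_rate)
# Positive delta: cache creation costs more than the base rate.
creation_delta = 200 * (cache_creation_rate - input_rate)
```

read_delta comes out negative (about -4.0e-4) and creation_delta positive (about +3.0e-5), matching the "negative delta for cheaper cached reads, positive delta for more-expensive cache creation" description.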

Confidence Score: 5/5

Safe to merge — the fix is a targeted, mathematically correct cost adjustment with a deterministic regression test.

No P0 or P1 issues found. The differential-rate adjustment logic is correct. The test now unconditionally exercises the critical assertion using a model with verified cache pricing in the config. No auth, security, or backwards-compatibility concerns.

No files require special attention.

Important Files Changed

Filename Overview
litellm/llms/fireworks_ai/cost_calculator.py Adds differential cost adjustment for cache read/creation tokens; math is correct (already-charged base rate adjusted to the cached rate); no issues found.
tests/local_testing/test_completion_cost.py Adds a pure unit test using kimi-k2p5 (which has cache_read_input_token_cost in the pricing config), no real network calls, both key assertions are always exercised.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cost_per_token called] --> B[get_model_info for fireworks_ai model]
    B --> C{Model found?}
    C -- No --> D[Fallback: get_base_model_for_pricing\ne.g. fireworks-ai-above-16b]
    D --> E[get_model_info for base tier]
    C -- Yes --> F[input_cost = prompt_tokens x input_cost_per_token]
    E --> F
    F --> G{cache_read_input_tokens > 0\nAND cache_read_input_token_cost set?}
    G -- Yes --> H[prompt_cost += cache_read_tokens\nx cache_read_cost - input_rate]
    G -- No --> I{cache_creation_input_tokens > 0\nAND cache_creation_input_token_cost set?}
    H --> I
    I -- Yes --> J[prompt_cost += cache_creation_tokens\nx cache_creation_cost - input_rate]
    I -- No --> K[completion_cost = completion_tokens\nx output_cost_per_token]
    J --> K
    K --> L[return prompt_cost, completion_cost]
```

Reviews (2): Last reviewed commit: "fix(test): use model with cache pricing ..."

Comment on lines +1240 to +1244
```python
if model_info.get("cache_read_input_token_cost") is not None:
    assert prompt_cost_cached < prompt_cost_no_cache, (
        "Prompt cost with 800 cache-read tokens should be less than "
        "full-price for the same total prompt tokens"
    )
```

P1 Critical assertion never executes — test provides zero coverage for the fix

The guarded assertion if model_info.get("cache_read_input_token_cost") is not None: will always be False for fireworks_ai/llama-v3p3-70b-instruct. That model has no direct entry in model_prices_and_context_window.json (only the accounts/fireworks/models/ long-form path exists, and it has no cache_read_input_token_cost). cost_per_token therefore falls back to the generic fireworks-ai-above-16b tier, which also has no cache pricing.

As a result, the key assertion — prompt_cost_cached < prompt_cost_no_cache — is never reached, meaning the test always passes but never validates that the fix actually works.

The model to use is fireworks_ai/kimi-k2p5 (which does have cache_read_input_token_cost in the pricing config, at a rate lower than input_cost_per_token), making the assertion deterministically exercised. The conditional if guard can then be removed entirely:

```diff
    prompt_cost_cached, completion_cost_cached = cost_per_token(
-       model="fireworks_ai/llama-v3p3-70b-instruct", usage=usage_with_cache
+       model="fireworks_ai/kimi-k2p5", usage=usage_with_cache
    )
    prompt_cost_no_cache, completion_cost_no_cache = cost_per_token(
-       model="fireworks_ai/llama-v3p3-70b-instruct", usage=usage_no_cache
+       model="fireworks_ai/kimi-k2p5", usage=usage_no_cache
    )

    assert completion_cost_cached == completion_cost_no_cache

-   model_info = litellm.get_model_info(
-       model="fireworks_ai/llama-v3p3-70b-instruct",
-       custom_llm_provider="fireworks_ai",
-   )
-   if model_info.get("cache_read_input_token_cost") is not None:
-       assert prompt_cost_cached < prompt_cost_no_cache, (
-           "Prompt cost with 800 cache-read tokens should be less than "
-           "full-price for the same total prompt tokens"
-       )
+   # kimi-k2p5 defines cache_read_input_token_cost < input_cost_per_token,
+   # so 800 cache-read tokens must yield a lower prompt cost.
+   assert prompt_cost_cached < prompt_cost_no_cache, (
+       "Prompt cost with 800 cache-read tokens should be less than "
+       "full-price for the same total prompt tokens"
+   )
```

Without this fix the regression test from the PR description never actually runs, defeating its purpose as a safeguard against future breakage.



GopalGB commented Mar 31, 2026

I have read the CLA Document and I hereby sign the CLA

Switch test from `fireworks_ai/llama-v3p3-70b-instruct` (no
cache_read_input_token_cost) to `fireworks_ai/kimi-k2p5` (has
cache pricing at 1e-07 vs input 6e-07). Remove the conditional
guard so the assertion always runs.

Addresses Greptile review feedback on BerriAI#24860.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[Bug]: Fireworks AI cost calculator ignores cache token pricing
