fix: apply cache-read pricing in custom cost path #26893

Open
Genmin wants to merge 2 commits into BerriAI:litellm_internal_staging from Genmin:fix/custom-pricing-cache-read

Conversation

@Genmin Genmin commented Apr 30, 2026

Contributor

Summary

Fixes #26807.

Validation

  • uv run --extra proxy pytest tests/test_litellm/test_cost_calculator.py -q -> 40 passed
  • uv run --extra proxy black --check litellm/cost_calculator.py tests/test_litellm/test_cost_calculator.py
  • uv run --extra proxy ruff check litellm/cost_calculator.py
  • uv run --extra proxy mypy litellm/cost_calculator.py
  • git diff --check -- litellm/cost_calculator.py tests/test_litellm/test_cost_calculator.py

Note: a direct local ruff check on tests/test_litellm/test_cost_calculator.py reports pre-existing print statements throughout that test file; CI lint for this repo runs on the package directory, and the touched implementation file is clean.

@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@greptile-apps

greptile-apps Bot commented Apr 30, 2026

Contributor

Greptile Summary

This PR fixes the custom-pricing cost path (_cost_per_token_custom_pricing_helper) to honour cache_read_input_token_cost when set, splitting prompt tokens into uncached and cached portions and billing each at its respective rate. Existing behaviour (no cache-read rate configured) is preserved unchanged, and two regression tests covering both cases are added.
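The split described above can be illustrated with a minimal sketch. This is not the code from the PR: `split_prompt_cost` and its signature are hypothetical, and it assumes that `prompt_tokens` already includes the cached tokens and that the cached count should be clamped to the prompt total.

```python
from typing import Optional, Tuple


def split_prompt_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
    cache_read_input_tokens: int = 0,
    cache_read_input_token_cost: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (prompt_cost, completion_cost) for custom per-token rates.

    Hypothetical helper for illustration only; assumes prompt_tokens
    includes any cached tokens.
    """
    completion_cost = completion_tokens * output_cost_per_token

    # No cache-read rate configured (or nothing cached): bill every
    # prompt token at the regular input rate, as before the fix.
    if cache_read_input_token_cost is None or cache_read_input_tokens <= 0:
        return prompt_tokens * input_cost_per_token, completion_cost

    # Assumption: never bill more cached tokens than the prompt contains.
    cached = min(cache_read_input_tokens, prompt_tokens)
    uncached = prompt_tokens - cached

    prompt_cost = (
        uncached * input_cost_per_token + cached * cache_read_input_token_cost
    )
    return prompt_cost, completion_cost
```

With 1000 prompt tokens of which 400 are cached, an input rate of 2.0, and a cache-read rate of 0.5, the prompt cost under this sketch is 600 × 2.0 + 400 × 0.5 = 1400.0, compared with 2000.0 when no cache-read rate is set.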

Confidence Score: 4/5

Safe to merge; the fix is correct and well-tested for the primary completion_cost path.

The implementation is mathematically correct and both new tests pass. The only finding is a P2 gap in the usage_object fallback that only affects direct callers of cost_per_token — the main completion_cost path is unaffected.

Pay closest attention to litellm/cost_calculator.py, specifically the _parse_prompt_tokens_details fallback at line 192.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/cost_calculator.py | Adds cache_read_input_tokens + usage_object params to _cost_per_token_custom_pricing_helper; correctly splits prompt tokens into uncached/cached portions and applies the custom cache-read rate. Minor gap: fallback parsing via _parse_prompt_tokens_details misses Anthropic-style direct usage.cache_read_input_tokens. |
| tests/test_litellm/test_cost_calculator.py | Adds two regression tests: one verifying that cache_read_input_token_cost splits pricing correctly, and one confirming that omitting the rate preserves the original full input_cost_per_token behavior. No real network calls; math is correct. |

Reviews (1): Last reviewed commit: "fix: apply cached token rate for custom ..." | Re-trigger Greptile

Comment thread litellm/cost_calculator.py Outdated
Comment on lines +192 to +195

if not cache_read_input_tokens and usage_object is not None:
    cache_read_input_tokens = _parse_prompt_tokens_details(usage_object)[
        "cache_hit_tokens"
    ]

P2 Fallback misses top-level cache_read_input_tokens attribute

The fallback path calls _parse_prompt_tokens_details(usage_object), which only reads usage.prompt_tokens_details.cached_tokens. For Anthropic-style Usage objects that store cache tokens as a direct top-level attribute (usage.cache_read_input_tokens) rather than inside prompt_tokens_details, this returns 0 — so cache-read pricing is silently skipped.

This doesn't affect calls routed through completion_cost (which always extracts and explicitly passes cache_read_input_tokens at line 1211), but any direct caller of cost_per_token that supplies a usage_object with usage_object.cache_read_input_tokens > 0 but no prompt_tokens_details would miss the cache discount.

Consider also checking getattr(usage_object, "cache_read_input_tokens", None) before falling back to _parse_prompt_tokens_details:

if not cache_read_input_tokens and usage_object is not None:
    direct = getattr(usage_object, "cache_read_input_tokens", None) or 0
    cache_read_input_tokens = (
        direct or _parse_prompt_tokens_details(usage_object)["cache_hit_tokens"]
    )
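To make the lookup order in that suggestion concrete, here is a self-contained sketch using stand-in objects. `parse_cached_from_details` is a hypothetical simplification of what `_parse_prompt_tokens_details(...)["cache_hit_tokens"]` is described as reading, and `resolve_cache_read_tokens` is an illustrative name, not a function in litellm.

```python
from types import SimpleNamespace
from typing import Any, Optional


def parse_cached_from_details(usage: Any) -> int:
    """Hypothetical stand-in: read usage.prompt_tokens_details.cached_tokens."""
    details = getattr(usage, "prompt_tokens_details", None)
    return int(getattr(details, "cached_tokens", 0) or 0)


def resolve_cache_read_tokens(explicit: Optional[int], usage: Any = None) -> int:
    """Pick the cached-token count: explicit value, then top-level
    attribute, then prompt_tokens_details."""
    if explicit or usage is None:
        return explicit or 0
    # Anthropic-style: count stored directly on the usage object.
    direct = getattr(usage, "cache_read_input_tokens", None) or 0
    # OpenAI-style: count nested under prompt_tokens_details.
    return direct or parse_cached_from_details(usage)


# Stand-in usage objects (not real litellm Usage instances).
openai_style = SimpleNamespace(
    prompt_tokens_details=SimpleNamespace(cached_tokens=300)
)
anthropic_style = SimpleNamespace(
    cache_read_input_tokens=500, prompt_tokens_details=None
)
```

In this sketch the details-only parser returns 0 for `anthropic_style`, which is the gap described above, while `resolve_cache_read_tokens(0, anthropic_style)` returns 500 and `resolve_cache_read_tokens(0, openai_style)` returns 300.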

@codecov

codecov Bot commented Apr 30, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request Apr 30, 2026
… #26275

Second batch of drip-213 reviews:
- openai/codex#20465 derive PermissionProfile directly in test helper (merge-as-is)
- BerriAI/litellm#26895 preserve aiohttp raw response headers for UTF-8 (merge-as-is)
- BerriAI/litellm#26893 cache-read pricing in custom-cost path (merge-after-nits)
- google-gemini/gemini-cli#26275 preserve non-text parts through hook translator (merge-after-nits)

Development

Successfully merging this pull request may close these issues.

[Bug]: Cached prompt tokens billed as regular input in custom pricing cost path

2 participants