fix: apply cache-read pricing in custom cost path by Genmin · Pull Request #26893 · BerriAI/litellm

Genmin · 2026-04-30T16:56:51Z

Summary

- apply cache_read_input_token_cost when completion_cost uses custom_cost_per_token
- preserve existing custom-pricing behavior when no cache-read rate is configured
- add regression coverage for the reproduced custom-pricing cached-token case from [Bug]: Cached prompt tokens billed as regular input in custom pricing cost path #26807

Validation

- uv run --extra proxy pytest tests/test_litellm/test_cost_calculator.py -q -> 40 passed
- uv run --extra proxy black --check litellm/cost_calculator.py tests/test_litellm/test_cost_calculator.py
- uv run --extra proxy ruff check litellm/cost_calculator.py
- uv run --extra proxy mypy litellm/cost_calculator.py
- git diff --check -- litellm/cost_calculator.py tests/test_litellm/test_cost_calculator.py

Note: a direct local ruff check on tests/test_litellm/test_cost_calculator.py reports pre-existing print statements throughout that test file; CI lint for this repo runs on the package directory, and the touched implementation file is clean.

CLAassistant · 2026-04-30T16:58:48Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

greptile-apps · 2026-04-30T17:00:02Z

Greptile Summary

This PR fixes the custom-pricing cost path (_cost_per_token_custom_pricing_helper) to honour cache_read_input_token_cost when set, splitting prompt tokens into uncached and cached portions and billing each at its respective rate. Existing behaviour (no cache-read rate configured) is preserved unchanged, and two regression tests covering both cases are added.
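
The split described in this summary can be illustrated with a minimal, self-contained sketch. This is not the litellm implementation: the function name `sketch_custom_prompt_cost` and its signature are hypothetical, it covers only the prompt side of the cost, and it assumes that cached tokens are counted inside `prompt_tokens`, as the summary's "splitting prompt tokens into uncached and cached portions" suggests.

```python
from typing import Optional


def sketch_custom_prompt_cost(
    prompt_tokens: int,
    input_cost_per_token: float,
    cache_read_input_tokens: int = 0,
    cache_read_input_token_cost: Optional[float] = None,
) -> float:
    """Hypothetical stand-in for the prompt-side part of a custom-pricing helper."""
    if cache_read_input_token_cost is None:
        # No cache-read rate configured: bill every prompt token at the input rate,
        # which mirrors the "existing behaviour is preserved" case.
        return prompt_tokens * input_cost_per_token

    # Assumption: clamp so a malformed usage payload cannot yield a negative
    # uncached count.
    cached = min(max(cache_read_input_tokens, 0), prompt_tokens)
    uncached = prompt_tokens - cached
    return uncached * input_cost_per_token + cached * cache_read_input_token_cost


# 1000 prompt tokens, 800 of them served from cache (illustrative rates only).
with_cache_rate = sketch_custom_prompt_cost(1000, 2e-6, 800, 2e-7)
without_cache_rate = sketch_custom_prompt_cost(1000, 2e-6, 800)
```

With these illustrative numbers, `with_cache_rate` bills 200 tokens at the input rate and 800 at the cache-read rate, while `without_cache_rate` bills all 1000 tokens at the input rate.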

Confidence Score: 4/5

Safe to merge; the fix is correct and well-tested for the primary completion_cost path.

The implementation is mathematically correct and both new tests pass. The only finding is a P2 gap in the usage_object fallback that only affects direct callers of cost_per_token — the main completion_cost path is unaffected.

litellm/cost_calculator.py — specifically the _parse_prompt_tokens_details fallback at line 192.

Important Files Changed

Filename	Overview
litellm/cost_calculator.py	Adds cache_read_input_tokens + usage_object params to _cost_per_token_custom_pricing_helper; correctly splits prompt tokens into uncached/cached portions and applies the custom cache-read rate. Minor gap: fallback parsing via _parse_prompt_tokens_details misses Anthropic-style direct usage.cache_read_input_tokens.
tests/test_litellm/test_cost_calculator.py	Adds two regression tests: one verifying that cache_read_input_token_cost splits pricing correctly, and one confirming that omitting the rate preserves the original full input_cost_per_token behavior. No real network calls; math is correct.

Reviews (1): Last reviewed commit: "fix: apply cached token rate for custom ..."

greptile-apps · 2026-04-30T17:00:09Z

+            if not cache_read_input_tokens and usage_object is not None:
+                cache_read_input_tokens = _parse_prompt_tokens_details(usage_object)[
+                    "cache_hit_tokens"
+                ]


Fallback misses top-level cache_read_input_tokens attribute

The fallback path calls _parse_prompt_tokens_details(usage_object), which only reads usage.prompt_tokens_details.cached_tokens. For Anthropic-style Usage objects that store cache tokens as a direct top-level attribute (usage.cache_read_input_tokens) rather than inside prompt_tokens_details, this returns 0 — so cache-read pricing is silently skipped.

This doesn't affect calls routed through completion_cost (which always extracts and explicitly passes cache_read_input_tokens at line 1211), but any direct caller of cost_per_token that supplies a usage_object with usage_object.cache_read_input_tokens > 0 but no prompt_tokens_details would miss the cache discount.

Consider also checking getattr(usage_object, "cache_read_input_tokens", None) before falling back to _parse_prompt_tokens_details:

if not cache_read_input_tokens and usage_object is not None:
    direct = getattr(usage_object, "cache_read_input_tokens", None) or 0
    cache_read_input_tokens = (
        direct or _parse_prompt_tokens_details(usage_object)["cache_hit_tokens"]
    )
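
To make the gap concrete, here is a small sketch of the lookup order the suggestion describes. It does not call the real `_parse_prompt_tokens_details`; the helper `sketch_cache_read_tokens` is hypothetical, and the `SimpleNamespace` objects are stand-ins for usage objects, assumed to have only the attributes shown.

```python
from types import SimpleNamespace
from typing import Any


def sketch_cache_read_tokens(usage_object: Any) -> int:
    """Hypothetical lookup: top-level attribute first, then prompt_tokens_details."""
    # Anthropic-style usage: cache tokens stored directly on the usage object.
    direct = getattr(usage_object, "cache_read_input_tokens", None) or 0
    if direct:
        return direct
    # Otherwise fall back to the nested prompt_tokens_details.cached_tokens field.
    details = getattr(usage_object, "prompt_tokens_details", None)
    return getattr(details, "cached_tokens", None) or 0


# Stand-in usage objects (assumed shapes, for illustration only).
top_level_style = SimpleNamespace(prompt_tokens=1000, cache_read_input_tokens=800)
nested_style = SimpleNamespace(
    prompt_tokens=1000,
    prompt_tokens_details=SimpleNamespace(cached_tokens=600),
)
no_cache_info = SimpleNamespace(prompt_tokens=1000)
```

A lookup that only reads the nested field would report no cached tokens for `top_level_style`, which is the silent skip described above; checking the top-level attribute first covers both shapes.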

codecov · 2026-04-30T17:00:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


… #26275 Second batch of drip-213 reviews:
- openai/codex#20465 derive PermissionProfile directly in test helper (merge-as-is)
- BerriAI/litellm#26895 preserve aiohttp raw response headers for UTF-8 (merge-as-is)
- BerriAI/litellm#26893 cache-read pricing in custom-cost path (merge-after-nits)
- google-gemini/gemini-cli#26275 preserve non-text parts through hook translator (merge-after-nits)


Genmin added 2 commits May 1, 2026 14:58

- 92e8e98 fix: apply cached token rate for custom pricing
- 023e05b fix: use top-level cache read tokens in custom pricing

Genmin wants to merge 2 commits into BerriAI:litellm_internal_staging from Genmin:fix/custom-pricing-cache-read

2 participants
