fix: apply cache-read pricing in custom cost path #26893
Greptile Summary

This PR fixes the custom-pricing cost path (…

Confidence Score: 4/5

Safe to merge; the fix is correct and well-tested for the primary completion_cost path. The implementation is mathematically correct and both new tests pass. The only finding is a P2 gap in the usage_object fallback that only affects direct callers of cost_per_token; the main completion_cost path is unaffected. The file to look at is litellm/cost_calculator.py, specifically the _parse_prompt_tokens_details fallback at line 192.
| Filename | Overview |
|---|---|
| litellm/cost_calculator.py | Adds cache_read_input_tokens + usage_object params to _cost_per_token_custom_pricing_helper; correctly splits prompt tokens into uncached/cached portions and applies the custom cache-read rate. Minor gap: fallback parsing via _parse_prompt_tokens_details misses Anthropic-style direct usage.cache_read_input_tokens. |
| tests/test_litellm/test_cost_calculator.py | Adds two regression tests: one verifying that cache_read_input_token_cost splits pricing correctly, and one confirming that omitting the rate preserves the original full input_cost_per_token behavior. No real network calls; math is correct. |
Reviews (1): Last reviewed commit: "fix: apply cached token rate for custom ..."
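
The cached/uncached split described in the table can be illustrated with a small standalone function. This is a minimal sketch rather than the PR's code: custom_prompt_cost is a hypothetical name, and it assumes that prompt_tokens already includes the cached tokens and that omitting the cache-read rate bills every prompt token at input_cost_per_token.

```python
from typing import Optional


def custom_prompt_cost(
    prompt_tokens: int,
    cache_read_input_tokens: int,
    input_cost_per_token: float,
    cache_read_input_token_cost: Optional[float] = None,
) -> float:
    """Hypothetical helper; illustrates the split, not litellm's actual function."""
    # No custom cache-read rate (or no cached tokens): keep the original
    # behavior and bill every prompt token at the full input rate.
    if cache_read_input_token_cost is None or cache_read_input_tokens <= 0:
        return prompt_tokens * input_cost_per_token

    # Assumption: cached tokens are a subset of prompt_tokens, so clamp
    # to avoid a negative uncached count.
    cached = min(cache_read_input_tokens, prompt_tokens)
    uncached = prompt_tokens - cached
    return uncached * input_cost_per_token + cached * cache_read_input_token_cost
```

With 1000 prompt tokens of which 400 are cache reads, an input rate of 2e-6 and a cache-read rate of 5e-7, the sketch bills 600 tokens at the full rate and 400 at the cache-read rate.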
    if not cache_read_input_tokens and usage_object is not None:
        cache_read_input_tokens = _parse_prompt_tokens_details(usage_object)[
            "cache_hit_tokens"
        ]
Fallback misses top-level cache_read_input_tokens attribute
The fallback path calls _parse_prompt_tokens_details(usage_object), which only reads usage.prompt_tokens_details.cached_tokens. For Anthropic-style Usage objects that store cache tokens as a direct top-level attribute (usage.cache_read_input_tokens) rather than inside prompt_tokens_details, this returns 0 — so cache-read pricing is silently skipped.
This doesn't affect calls routed through completion_cost (which always extracts and explicitly passes cache_read_input_tokens at line 1211), but any direct caller of cost_per_token that supplies a usage_object with usage_object.cache_read_input_tokens > 0 but no prompt_tokens_details would miss the cache discount.
Consider also checking getattr(usage_object, "cache_read_input_tokens", None) before falling back to _parse_prompt_tokens_details:
    if not cache_read_input_tokens and usage_object is not None:
        direct = getattr(usage_object, "cache_read_input_tokens", None) or 0
        cache_read_input_tokens = (
            direct or _parse_prompt_tokens_details(usage_object)["cache_hit_tokens"]
        )
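
The lookup order suggested above can be exercised in isolation. The following is a sketch under stated assumptions: resolve_cache_read_tokens and cached_tokens_from_details are hypothetical names, the latter is only a stand-in for _parse_prompt_tokens_details(...)["cache_hit_tokens"], and SimpleNamespace objects stand in for real Usage objects.

```python
from types import SimpleNamespace
from typing import Any, Optional


def cached_tokens_from_details(usage_object: Any) -> int:
    """Stand-in (assumption) for the prompt_tokens_details.cached_tokens lookup."""
    details = getattr(usage_object, "prompt_tokens_details", None)
    return getattr(details, "cached_tokens", None) or 0


def resolve_cache_read_tokens(
    cache_read_input_tokens: Optional[int], usage_object: Any = None
) -> int:
    """Hypothetical helper showing the suggested fallback order."""
    # 1. An explicitly passed, non-zero count always wins.
    if cache_read_input_tokens:
        return cache_read_input_tokens
    if usage_object is None:
        return 0
    # 2. Anthropic-style: cache tokens stored as a top-level attribute.
    direct = getattr(usage_object, "cache_read_input_tokens", None) or 0
    # 3. Otherwise fall back to the nested prompt_tokens_details value.
    return direct or cached_tokens_from_details(usage_object)


# Illustrative stand-ins for the two usage shapes discussed in the review.
anthropic_style = SimpleNamespace(cache_read_input_tokens=400)
openai_style = SimpleNamespace(
    prompt_tokens_details=SimpleNamespace(cached_tokens=250)
)
```

Checking the top-level attribute first means a usage object that carries only cache_read_input_tokens still yields a non-zero count, while objects that carry only prompt_tokens_details behave as before.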
Codecov Report
✅ All modified and coverable lines are covered by tests.
… #26275
Second batch of drip-213 reviews:
- openai/codex#20465 derive PermissionProfile directly in test helper (merge-as-is)
- BerriAI/litellm#26895 preserve aiohttp raw response headers for UTF-8 (merge-as-is)
- BerriAI/litellm#26893 cache-read pricing in custom-cost path (merge-after-nits)
- google-gemini/gemini-cli#26275 preserve non-text parts through hook translator (merge-after-nits)
Summary
- Apply cache_read_input_token_cost when completion_cost uses custom_cost_per_token.

Fixes #26807.
Validation
- uv run --extra proxy pytest tests/test_litellm/test_cost_calculator.py -q -> 40 passed
- uv run --extra proxy black --check litellm/cost_calculator.py tests/test_litellm/test_cost_calculator.py
- uv run --extra proxy ruff check litellm/cost_calculator.py
- uv run --extra proxy mypy litellm/cost_calculator.py
- git diff --check -- litellm/cost_calculator.py tests/test_litellm/test_cost_calculator.py

Note: a direct local ruff check on tests/test_litellm/test_cost_calculator.py reports pre-existing print statements throughout that test file; CI lint for this repo runs on the package directory, and the touched implementation file is clean.