Studio: pricing follow-up to #5690 (longest-prefix match + chat-style usage keys) by danielhanchen · Pull Request #5722 · unslothai/unsloth

danielhanchen · 2026-05-22T16:43:02Z

Summary

Two P1 / High follow-ups from review on the merged #5690 pricing module.

Longest-prefix pricing match. Looking up a dated snapshot like gpt-5.4-mini-2026-04-23 returned the first matching prefix, which was the shorter gpt-5.4 entry. That overbilled by 3x+ on the mini family. Sort table keys longest-first so the most specific entry wins.
Accept OpenAI-Chat-style usage keys. calculate_cost only read input_tokens / output_tokens, but Studio's OpenAI-Chat-style usage envelope re-emits prompt_tokens / completion_tokens (the Chat Completions vocabulary). Callers handing in the chat-style shape silently got a zeroed bill. Accept either pair so the calculator works against both raw upstream usage and the Studio-translated envelope.

Reported by Codex + Gemini bots on #5690.

Test plan

cd studio && python -m pytest backend/tests/test_pricing.py -v (27 passing, 4 new):
- test_longest_prefix_match_wins_for_dated_mini_snapshot
- test_longest_prefix_match_wins_for_dated_pro_snapshot
- test_openai_chat_style_usage_keys_priced_correctly
- test_input_tokens_preferred_when_both_keys_present
All 23 prior pricing tests still green.

gemini-code-assist · 2026-05-22T16:43:06Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c2c68f969

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T16:45:15Z

+    # style usage chunk Studio re-emits (``_build_usage_chunk``) uses
+    # ``prompt_tokens`` / ``completion_tokens``. Either is fine here
+    # so a caller can hand whichever envelope landed first.
+    input_tokens = int(usage.get("input_tokens") or usage.get("prompt_tokens") or 0)


Normalize Anthropic prompt_tokens before costing

When usage comes from Studio’s chat-style envelope (_build_usage_chunk), Anthropic prompt_tokens already includes cache_creation_input_tokens and cache_read_input_tokens; assigning it directly to input_tokens here makes those cached tokens billed twice (once in input_usd, again in cache_write_usd/cache_read_usd) and also inflates billable_input_tokens in the Anthropic branch. This regression appears whenever callers pass the chat-style usage shape for Anthropic instead of raw input_tokens/output_tokens.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52ba6229d6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T16:56:36Z

+    output_tokens = int(
+        usage.get("output_tokens") or usage.get("completion_tokens") or 0
+    )


Prefer output_tokens over completion_tokens when both exist

The fallback usage.get("output_tokens") or usage.get("completion_tokens") treats 0 as missing, so a payload that includes both shapes can bill the wrong output amount whenever output_tokens is explicitly zero and completion_tokens is non-zero (or stale). This commit already enforces key-precedence for input tokens, but output tokens currently do not get the same deterministic behavior, which can inflate costs for mixed-envelope callers.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4bcd8bb9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T17:17:11Z

        details = usage.get("input_tokens_details") or {}
        if isinstance(details, dict):


Read OpenAI chat cached tokens from prompt_tokens_details

When calculate_cost receives Studio’s chat-style OpenAI usage shape (prompt_tokens/completion_tokens), cached-token counts are exposed under prompt_tokens_details.cached_tokens (see _build_usage_chunk), but this branch only checks input_tokens_details. In that case cache_read stays 0, so cached input is billed at full input_usd and cache_read_usd is never discounted, which overcharges cache-heavy turns. The new test uses cache_read_input_tokens for OpenAI, a key _build_usage_chunk does not emit, so this regression is not covered.

Useful? React with 👍 / 👎.

Two P1 / High follow-ups from PR 5690 review feedback: 1. Pricing prefix lookup returned the first key it iterated, so dated snapshots like ``gpt-5.4-mini-2026-04-23`` collided with the shorter ``gpt-5.4`` entry and overbilled by 3x+. Sort the table keys longest-first so the most specific entry wins. 2. ``calculate_cost`` only read ``input_tokens`` / ``output_tokens``, but Studio's OpenAI-Chat-style usage envelope re-emits ``prompt_tokens`` / ``completion_tokens`` (the OpenAI Chat Completions vocabulary). Callers handing in the chat-style shape silently got a zeroed bill. Accept either pair so the calculator works against both raw upstream usage and the Studio-translated envelope. Tests (4 new in test_pricing.py): dated mini/pro snapshots inherit the right rate; chat-style usage keys price correctly; raw key wins when both shapes are present.

for more information, see https://pre-commit.ci

When the caller hands in Studio's chat-style envelope (``prompt_tokens`` emitted by ``_build_usage_chunk``) for Anthropic, that value already folds ``cache_creation_input_tokens`` + ``cache_read_input_tokens`` into the total. The previous follow-up accepted the chat-style key but then re-added both cache buckets in ``billable_input_tokens`` and ``input_usd``, double-counting cache tokens on every Anthropic chat-style call. Detect which envelope landed (``input_tokens`` present = raw upstream; absent + ``prompt_tokens`` present = Studio chat-style) and peel the cache buckets off for Anthropic before the downstream math so both envelopes produce identical costs. OpenAI: ``input_tokens`` and Studio's ``prompt_tokens`` both already include ``cache_read`` and exclude any notional ``cache_creation``, so the OpenAI path stays a straight passthrough. Tests (2 new): both envelopes match for Anthropic on a triple (uncached + cache_creation + cache_read); OpenAI envelopes match on a cached-tokens fixture.

for more information, see https://pre-commit.ci

Codex flagged that the previous fallback chain 'usage.get("output_tokens") or usage.get("completion_tokens")' treats an explicit 0 as missing -- a mixed-envelope payload where 'output_tokens' is 0 but 'completion_tokens' is non-zero (or stale) bills the wrong amount. Mirror the has_input_tokens precedence pattern: when the raw key is present we use it even at 0; otherwise fall back to completion_tokens.

Codex flagged that the chat-style OpenAI envelope Studio re-emits via _build_usage_chunk surfaces cached prompt tokens under prompt_tokens_details.cached_tokens, not input_tokens_details. The OpenAI branch only checked input_tokens_details, so a cache-heavy chat-style turn billed every cached token at the full input rate instead of the 0.1x cache_read discount. Walk both keys when discovering the cached count. New regression test pins that the two envelopes price identically for a turn with 80k of 100k tokens cached.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d8bb1dc77

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T14:03:51Z

+        for key in ("input_tokens_details", "prompt_tokens_details"):
+            details = usage.get(key) or {}
+            if isinstance(details, dict):
+                cache_read = max(cache_read, int(details.get("cached_tokens") or 0))


Preserve details-key precedence for mixed OpenAI envelopes

When both usage shapes are present, this branch takes max() across input_tokens_details.cached_tokens and prompt_tokens_details.cached_tokens, so a stale chat-style value can override the authoritative raw Responses value and over-discount input cost. The same commit already enforces raw-key precedence for input_tokens/output_tokens in mixed envelopes, but cached-token details do not follow that rule, which can silently underbill whenever both detail objects are present with different counts.

Useful? React with 👍 / 👎.

Three follow-ups on the longest-prefix pricing match landed in this PR: - Prefix match now requires a dash boundary or end-of-string. The longest-key sort alone still falsely landed "claude-opus-4-15" on the "claude-opus-4-1" row, and "gpt-5.5-prod" on the "gpt-5.5-pro" row (a 6x overcharge). Demanding the next character be "-" rules out the lookalikes while keeping dated snapshots ("gpt-5.4-mini-2026-04-23", "claude-opus-4-7-20260414") landing on their canonical row. - Clamp every token count to >= 0. A corrupted upstream payload (negative cached count, off-by-one in a fixture) could previously produce a negative bill that masked real spend in the session total tooltip. - Tolerate a non-dict "cache_creation" (e.g. an upstream proxy folded the field down to a single int). The current code raised AttributeError mid-turn; now it falls back to the 5m-default bucket so the rest of the cost calculation still runs. Adds tests/test_pricing_edge.py with 20 adversarial cases covering the boundary check, negative / None / zero token values across both envelopes, cache_read > prompt corruption, the OpenAI long-context threshold crossover on cache-inflated billable input, malformed sub-objects, and unknown-provider degradation. Combined suite is 51 tests, all green.

chatgpt-codex-connector · 2026-05-24T14:08:17Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-24T14:09:12Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Two correctness gaps surfaced on the chat-style usage envelope: 1) Anthropic cache_read fell through to "uncached input" pricing when the envelope arrived without the native ``cache_read_input_tokens`` key (e.g. via a proxy that only emits the mirrored ``prompt_tokens_details.cached_tokens`` block). Studio's canonical ``_build_usage_chunk`` always sets both so production traffic was never affected, but the calculator should accept either as a defense-in-depth measure. Add a fallback to read the mirrored field when the native one is missing or zero; the native key still wins when both are present so the math stays deterministic. 2) ``_build_usage_chunk`` dropped the ``cache_creation`` 5m / 1h breakdown. Downstream ``calculate_cost`` then could not apply the 2x 1h premium and silently fell back to the 5m default, underbilling 1h cache writes by 2x on chat-style traffic. Forward the breakdown verbatim when the upstream usage carries it. Tests grow by 4 (20 -> 24): two for the prompt_tokens_details fallback (with native-precedence pin), one for the chunk shape, one for the end-to-end pricing parity check at 1h.

chatgpt-codex-connector · 2026-05-24T14:51:12Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-24T14:51:45Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

PR 5715 wires the fast-mode-2026-02-01 beta header + speed:"fast" field through to Anthropic, but the cost calculator never learnt about the matching 6x premium documented at https://platform.claude.com/docs/en/build-with-claude/fast-mode (Opus 4.7 standard $5/$25 per MTok, fast $30/$150). This adds: - ANTHROPIC_FAST_MODE_MULT = 6.0 constant. - calculate_cost(..., fast_mode=True) applies the 6x to base input AND output rates before any cache multipliers (cache mults stack on top of fast per Anthropic docs). - Provider+model gate: silently no-op on every model that is not claude-opus-4-6 / claude-opus-4-7 so a stray fast_mode=True on Sonnet/Haiku can never over-charge. - model_priced label tagged "(fast)" so the cost tooltip can surface which rate fired. - pricing_snapshot now exposes fast_mode_mult so the frontend cost panel doesn't have to hard-code 6. 7 new edge tests pin the math; existing 55 still pass.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d459691f4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-25T13:34:13Z

+    if cache_read == 0:
+        details = usage.get("prompt_tokens_details") or {}
        if isinstance(details, dict):
-            cache_read = max(cache_read, int(details.get("cached_tokens") or 0))
+            cache_read = max(0, int(details.get("cached_tokens") or 0))


Respect explicit zero cache_read_input_tokens

The new fallback treats cache_read_input_tokens == 0 as “missing” and replaces it from prompt_tokens_details.cached_tokens, which breaks mixed-envelope Anthropic payloads when the mirrored chat-style details are stale. In that case a real zero cache-read gets reinterpreted as non-zero, inflating billable_input_tokens and cache_read_usd (and therefore total_usd) even though the authoritative native key explicitly said no cached reads.

Useful? React with 👍 / 👎.

The previous follow-up fell back to ``prompt_tokens_details.cached_tokens`` whenever the native ``cache_read_input_tokens`` was missing OR equal to 0, even though the commit message stated the native key always wins when present. A proxy that forwards a stale ``prompt_tokens_details`` block alongside an authoritative ``cache_read_input_tokens: 0`` would then inflate cache_read past the real native count, posting a false cache_read line and bumping billable_input_tokens. Switch the gate to native-key presence so an explicit zero stays authoritative; the mirror only kicks in when the native key is absent. Add a regression test pinning the explicit-zero precedence.

danielhanchen · 2026-05-25T13:40:04Z

Audit follow-up: pushed 0f3c302 fixing one real precedence bug introduced by 6a939b4.

The mirrored prompt_tokens_details.cached_tokens fallback was gated on cache_read == 0, which conflated the missing-key case with an explicit native zero. A proxy forwarding a stale details block alongside an authoritative cache_read_input_tokens: 0 therefore inflated cache_read past the native count, posting a false cache-read line and bumping billable_input_tokens. Reproed at $5.00 expected vs $5.50 billed on a 1M-input Anthropic prompt.

Switched the gate to native-key presence so explicit zero stays authoritative; mirror only kicks in when the native key is absent. Added a regression test pinning that precedence. pytest studio/backend/tests/test_pricing*.py -> 62 passed.

Other reviewer findings (negative server-tool clamp, OpenAI dual-key explicit-zero asymmetry, _build_usage_chunk not clamping, gpt-5.x-pro lacking long-context rows, bare int() on malformed strings) are pre-existing in pricing.py and not introduced by this PR's diff; leaving them out of scope here.

Rates cross-checked against platform.claude.com and developers.openai.com on 2026-05-25: Opus 4.7 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5, gpt-5.5 $5/$30 (272k long-context 2x/1.5x), gpt-5.4 $2.5/$15, gpt-5.4-mini $0.75/$4.50, cache 1.25x/2.0x/0.1x. Dated snapshot resolution against claude-opus-4-7-20260201 and gpt-5.5-2026-04-23 verified.

The fast_mode 6x multiplier landed in two places at once -- here (f66df7b) and on #5715 (4f1afdb) -- since both audits ran in parallel. Drop the duplicate from this branch so the change lives in its natural home (#5715, which introduces fast_mode itself); this PR stays focused on the cache-read fallback + 1h breakdown.

for more information, see https://pre-commit.ci

…5690

chatgpt-codex-connector · 2026-05-26T01:09:44Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen · 2026-05-26T01:10:25Z

Round 4 cross-cutting fix: merged origin/main into this branch (no conflicts) to bring in PR #5735 (orphan tool_call XML strip widening + 263-line test_tool_xml_strip.py). All 8 PRs in this audit cohort had been forked off a pre-#5735 main, so a squash-merge of any of them would have silently reverted the widened _TOOL_XML_RE regex and deleted the dedicated test file. Verified: diff against origin/main now shows zero unintended changes to routes/inference.py and test_tool_xml_strip.py outside the actual PR scope.

chatgpt-codex-connector · 2026-05-26T05:55:44Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

…5690 # Conflicts: # studio/backend/core/inference/external_provider.py

chatgpt-codex-connector · 2026-05-26T06:39:50Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen requested a review from rolandtannous as a code owner May 22, 2026 16:43

chatgpt-codex-connector Bot reviewed May 22, 2026

View reviewed changes

danielhanchen and others added 7 commits May 23, 2026 14:00

[pre-commit.ci] auto fixes from pre-commit.com hooks

72211d1

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

d069466

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

8d8bb1d

for more information, see https://pre-commit.ci

danielhanchen force-pushed the fix/pricing-followup-5690 branch from cb56e28 to 8d8bb1d Compare May 23, 2026 14:00

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

[pre-commit.ci] auto fixes from pre-commit.com hooks

da82a68

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

df1b980

for more information, see https://pre-commit.ci

danielhanchen and others added 2 commits May 25, 2026 13:27

[pre-commit.ci] auto fixes from pre-commit.com hooks

8d45969

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 25, 2026

View reviewed changes

danielhanchen and others added 3 commits May 26, 2026 00:35

[pre-commit.ci] auto fixes from pre-commit.com hooks

e1f5597

for more information, see https://pre-commit.ci

Merge remote-tracking branch 'origin/main' into fix/pricing-followup-…

b74e6b4

…5690

Shorten pricing comments for PR #5722

32fc68f

Merge remote-tracking branch 'origin/main' into fix/pricing-followup-…

1e6310f

…5690 # Conflicts: # studio/backend/core/inference/external_provider.py

danielhanchen merged commit 542d743 into main May 26, 2026
28 of 30 checks passed

danielhanchen deleted the fix/pricing-followup-5690 branch May 26, 2026 06:40

		details = usage.get("input_tokens_details") or {}
		if isinstance(details, dict):

Uh oh!

Conversation

danielhanchen commented May 22, 2026

Summary

Test plan

Uh oh!

gemini-code-assist Bot commented May 22, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 25, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

danielhanchen commented May 26, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant