
Studio: surface prompt-cache token counts in /v1/chat/completions usage chunk #5670

Merged
danielhanchen merged 4 commits into main from feat/surface-prompt-cache-usage on May 22, 2026

Conversation

@danielhanchen

Member

Summary

_stream_anthropic and _stream_openai_responses already capture cache_creation_input_tokens, cache_read_input_tokens (Anthropic) and input_tokens_details.cached_tokens (OpenAI Responses) on last_usage but only write them to the structlog stream. Browser and SDK clients had no way to see how many tokens the prompt cache absorbed, so the chat UI cannot show users their per-turn cache savings without scraping the server log.

This change emits one extra OpenAI include_usage-style chunk (choices: [] with a populated usage block) just before [DONE] for Anthropic and after the final finish_reason chunk for OpenAI Responses (both response.completed and response.incomplete).

Chunk shape

// Anthropic
{
  "id": "chatcmpl-anthropic-claude-opus-4-7",
  "object": "chat.completion.chunk",
  "choices": [],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 862,
    "total_tokens": 870,
    "prompt_tokens_details": { "cached_tokens": 18901 },
    "cache_creation_input_tokens": 1367,
    "cache_read_input_tokens": 18901
  }
}
// OpenAI Responses
{
  "id": "chatcmpl-openai-gpt-5.5",
  "object": "chat.completion.chunk",
  "choices": [],
  "usage": {
    "prompt_tokens": 5507,
    "completion_tokens": 252,
    "total_tokens": 5759,
    "prompt_tokens_details": { "cached_tokens": 4736 }
  }
}

prompt_tokens_details.cached_tokens is the normalised cross-provider key so existing OpenAI SDK callers that already read prompt_tokens_details keep working. The Anthropic-only cache_creation_input_tokens / cache_read_input_tokens keys are kept on the same usage dict for callers that already key off the native Anthropic names.
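As an illustration of how a client might consume this chunk, the sketch below reads the cache-hit count from a usage-only chunk. It prefers the normalised `prompt_tokens_details.cached_tokens` key and falls back to the Anthropic-native `cache_read_input_tokens`. The function name `cached_prompt_tokens` is hypothetical and not part of Studio or any SDK.

```python
from typing import Any, Optional


def cached_prompt_tokens(chunk: dict[str, Any]) -> Optional[int]:
    """Return the prompt-cache hit count from a usage-only chunk, else None."""
    usage = chunk.get("usage")
    # Usage-only chunks carry an empty choices list and a populated usage dict.
    if chunk.get("choices") != [] or not isinstance(usage, dict):
        return None
    details = usage.get("prompt_tokens_details")
    if isinstance(details, dict) and "cached_tokens" in details:
        return details["cached_tokens"]  # normalised key, both providers
    return usage.get("cache_read_input_tokens")  # Anthropic-native fallback


# Abbreviated copy of the Anthropic example above (other usage fields omitted).
anthropic_chunk = {
    "id": "chatcmpl-anthropic-claude-opus-4-7",
    "object": "chat.completion.chunk",
    "choices": [],
    "usage": {
        "prompt_tokens_details": {"cached_tokens": 18901},
        "cache_read_input_tokens": 18901,
    },
}
print(cached_prompt_tokens(anthropic_chunk))  # 18901
```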

The helper returns None (suppresses the chunk) when upstream errored before any usage event arrived, so failed turns do not show a misleading "0 tokens" line.
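A minimal sketch of the end of the stream under the behaviour described above, assuming the Anthropic ordering (usage chunk immediately before `[DONE]`) and standard `data: ...` SSE framing. `stream_tail` is a hypothetical name, not the actual Studio code; when the builder returns None, only `[DONE]` is sent.

```python
import json
from typing import Any, Iterator, Optional


def stream_tail(usage_chunk: Optional[dict[str, Any]]) -> Iterator[str]:
    """Yield the closing SSE frames: optional usage chunk, then [DONE]."""
    if usage_chunk is not None:
        # Emitted only when the builder produced usage numbers.
        yield f"data: {json.dumps(usage_chunk)}\n\n"
    yield "data: [DONE]\n\n"


print(list(stream_tail(None)))  # ['data: [DONE]\n\n']
```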

Verification

End-to-end against a live Studio routed to api.anthropic.com and api.openai.com:

anthropic / claude-haiku-4-5
  total SSE lines: 5; usage chunks: 1
  usage: {"prompt_tokens": 14, "completion_tokens": 6, "total_tokens": 20,
          "prompt_tokens_details": {"cached_tokens": 0},
          "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0}

openai / gpt-4o-mini
  total SSE lines: 6; usage chunks: 1
  usage: {"prompt_tokens": 14, "completion_tokens": 4, "total_tokens": 18,
          "prompt_tokens_details": {"cached_tokens": 0}}

A 3-turn Opus 4.7 (xhigh) run with enabled_tools=["web_search","code_execution"] returned cache_read_input_tokens=18901 on turn 2, matching the value already logged by the existing structlog line ("Anthropic stream complete ... cache_read_input_tokens=18901").
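Line and usage-chunk counts like the ones above could be reproduced from a captured stream with something like the following. This is a hypothetical helper, not the script used for the verification, and the transcript in the usage example is made up for illustration.

```python
import json


def summarize_sse(raw: str) -> tuple[int, list[dict]]:
    """Return (number of SSE data lines, usage blocks from usage-only chunks)."""
    lines = [ln for ln in raw.splitlines() if ln.startswith("data: ")]
    usages = []
    for line in lines:
        payload = line[len("data: "):]
        if payload == "[DONE]":
            continue  # stream terminator, not JSON
        chunk = json.loads(payload)
        if chunk.get("choices") == [] and chunk.get("usage"):
            usages.append(chunk["usage"])
    return len(lines), usages


# Made-up transcript: a content delta, a finish chunk, the usage chunk, [DONE].
raw = (
    'data: {"choices": [{"delta": {"content": "Hi"}}]}\n\n'
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}\n\n'
    'data: {"choices": [], "usage": {"prompt_tokens": 14, '
    '"completion_tokens": 6, "total_tokens": 20}}\n\n'
    "data: [DONE]\n\n"
)
count, usages = summarize_sse(raw)
print(f"total SSE lines: {count}; usage chunks: {len(usages)}")
# total SSE lines: 4; usage chunks: 1
```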

Test plan

  • 7 new unit tests in tests/test_external_provider_usage_chunk.py covering the helper alone and the two streaming integrations (completed + incomplete for OpenAI).
  • Existing 218 tests across test_anthropic_code_execution.py, test_openai_code_execution.py, test_openai_responses_translation.py, test_anthropic_messages.py, test_anthropic_thinking_translation.py, test_openai_tool_passthrough.py, test_responses_tool_passthrough.py, test_responses_api.py still pass.
  • Verified against a live Studio that the chunk appears in real SSE streams for both providers.

…ge chunk

Studio's Anthropic and OpenAI Responses proxies already capture
cache_creation_input_tokens, cache_read_input_tokens (Anthropic) and
input_tokens_details.cached_tokens (OpenAI), but they were only written
to the structlog stream. Browser and SDK clients had no way to compute
"how many tokens hit the prompt cache" without scraping the server log,
so the chat UI could not show users how much money the cache was
saving on each turn.

This change emits one extra OpenAI include_usage-style chunk
(choices: [] with a populated usage block) just before the existing
[DONE] for Anthropic and after the final finish_reason chunk for
OpenAI Responses (both response.completed and response.incomplete).
The chunk shape:

  usage.prompt_tokens_details.cached_tokens
      normalised cache-read count, present for both providers.
  usage.cache_creation_input_tokens
      Anthropic-only; tokens billed at the cache-write premium.
  usage.cache_read_input_tokens
      Anthropic-only; same value as cached_tokens, kept for callers
      that already key off the native Anthropic name.

Smoke verified end to end against a live Studio (claude-haiku-4-5
and gpt-4o-mini) plus 7 new unit tests on the helper and the two
streaming paths.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 03500d34be


Comment on lines +3248 to +3250
    "prompt_tokens": prompt_tokens,
    "completion_tokens": completion_tokens,
    "total_tokens": prompt_tokens + completion_tokens,


P1: Include Anthropic cache tokens in prompt/total usage counts

prompt_tokens is populated from usage.input_tokens alone, and total_tokens is derived from that value plus output_tokens; for Anthropic this undercounts cached turns because cache_creation_input_tokens and cache_read_input_tokens are separate input buckets that still belong to total prompt usage. In cache-heavy conversations this will report much smaller prompt/total numbers than actually used, so downstream context/cost displays fed by this chunk become inaccurate. Compute Anthropic prompt/total with all three input components (while still exposing the native cache fields).


@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces a mechanism to surface prompt-cache accounting to clients by emitting OpenAI-style usage SSE chunks during streaming for both Anthropic and OpenAI providers. A new helper function, _build_usage_chunk, was implemented and integrated into the streaming response paths, supported by a comprehensive suite of unit and integration tests. Feedback was provided regarding logic duplication within the _build_usage_chunk function, suggesting a refactor to extract common token fields and improve maintainability.

Comment on lines +3234 to +3263
    Returns ``None`` when there are no usage numbers to report (e.g. an
    upstream error before ``message_start`` / ``response.completed``).
    """
    if not isinstance(last_usage, dict):
        return None

    if provider == "anthropic":
        prompt_tokens = last_usage.get("input_tokens") or 0
        completion_tokens = last_usage.get("output_tokens") or 0
        cache_creation = last_usage.get("cache_creation_input_tokens") or 0
        cache_read = last_usage.get("cache_read_input_tokens") or 0
        if not (prompt_tokens or completion_tokens or cache_creation or cache_read):
            return None
        usage_block: dict[str, Any] = {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
            "prompt_tokens_details": {"cached_tokens": cache_read},
            "cache_creation_input_tokens": cache_creation,
            "cache_read_input_tokens": cache_read,
        }
    else:
        prompt_tokens = last_usage.get("input_tokens") or 0
        completion_tokens = last_usage.get("output_tokens") or 0
        cached = 0
        details = last_usage.get("input_tokens_details")
        if isinstance(details, dict):
            cached = details.get("cached_tokens") or 0
        if not (prompt_tokens or completion_tokens or cached):
            return None

Contributor


medium

The _build_usage_chunk function contains significant logic duplication between the anthropic and openai branches. Specifically, the extraction of prompt_tokens and completion_tokens, as well as the calculation of total_tokens, are identical. Refactoring this to extract common fields first would improve maintainability and reduce the risk of future inconsistencies.

    prompt_tokens = last_usage.get("input_tokens") or 0
    completion_tokens = last_usage.get("output_tokens") or 0

    usage_block: dict[str, Any] = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

    if provider == "anthropic":
        cache_creation = last_usage.get("cache_creation_input_tokens") or 0
        cache_read = last_usage.get("cache_read_input_tokens") or 0
        if not (prompt_tokens or completion_tokens or cache_creation or cache_read):
            return None
        usage_block.update({
            "prompt_tokens_details": {"cached_tokens": cache_read},
            "cache_creation_input_tokens": cache_creation,
            "cache_read_input_tokens": cache_read,
        })
    else:
        cached = 0
        details = last_usage.get("input_tokens_details")
        if isinstance(details, dict):
            cached = details.get("cached_tokens") or 0
        if not (prompt_tokens or completion_tokens or cached):
            return None
        usage_block["prompt_tokens_details"] = {"cached_tokens": cached}
References
  1. When a condition or calculated value is used across multiple conditional branches, compute it once and reuse the result to ensure consistency and improve maintainability.

Anthropic's `input_tokens` field excludes the cache buckets -- the
real prompt size is `input_tokens + cache_creation_input_tokens +
cache_read_input_tokens`. Previously the new usage chunk reported
only `input_tokens` as `prompt_tokens`, which heavily undercounted
cache-hit turns (e.g. an 18.9k-token cache_read turn looked like an
8-token prompt) and broke any downstream context / cost display fed
by `prompt_tokens` or `total_tokens`.

Fix `_build_usage_chunk` to sum all three input buckets for the
Anthropic provider while keeping the OpenAI Responses path unchanged
(OpenAI already folds cached tokens into `input_tokens`). The native
`cache_creation_input_tokens` / `cache_read_input_tokens` keys and
`prompt_tokens_details.cached_tokens` mirror are still emitted, so
clients keep full visibility of the cache split.

Tests updated to assert the summed shape.
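Combining the Gemini refactor suggestion with this fix, the post-fix accounting can be sketched as below. `build_usage_block` is a hypothetical name and signature; the full signature of the real `_build_usage_chunk` is not shown in this PR. Fed the numbers from the Anthropic example chunk in the description (8 input, 1367 cache-write, 18901 cache-read, 862 output tokens), it reports `prompt_tokens` 20276 and `total_tokens` 21138 instead of 8 and 870.

```python
from typing import Any, Optional


def build_usage_block(provider: str, last_usage: Any) -> Optional[dict[str, Any]]:
    """Hypothetical sketch of the post-fix usage block for one turn."""
    if not isinstance(last_usage, dict):
        return None  # upstream failed before any usage event
    input_tokens = last_usage.get("input_tokens") or 0
    completion_tokens = last_usage.get("output_tokens") or 0

    if provider == "anthropic":
        # Anthropic reports cache buckets outside input_tokens: sum all three.
        cache_creation = last_usage.get("cache_creation_input_tokens") or 0
        cached = last_usage.get("cache_read_input_tokens") or 0
        prompt_tokens = input_tokens + cache_creation + cached
        native = {
            "cache_creation_input_tokens": cache_creation,
            "cache_read_input_tokens": cached,
        }
    else:
        # OpenAI Responses already folds cached tokens into input_tokens.
        details = last_usage.get("input_tokens_details")
        cached = (details.get("cached_tokens") or 0) if isinstance(details, dict) else 0
        prompt_tokens = input_tokens
        native = {}

    if not (prompt_tokens or completion_tokens or cached):
        return None  # nothing to report: suppress the chunk
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "prompt_tokens_details": {"cached_tokens": cached},
        **native,
    }


usage = build_usage_block("anthropic", {
    "input_tokens": 8, "output_tokens": 862,
    "cache_creation_input_tokens": 1367, "cache_read_input_tokens": 18901,
})
print(usage["prompt_tokens"], usage["total_tokens"])  # 20276 21138
```

Keeping the native cache keys on the Anthropic branch only mirrors the description: OpenAI callers see just the normalised `prompt_tokens_details.cached_tokens`.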
@danielhanchen danielhanchen merged commit ba9405b into main May 22, 2026
32 checks passed
@danielhanchen danielhanchen deleted the feat/surface-prompt-cache-usage branch May 22, 2026 13:02
danielhanchen added a commit to Imagineer99/unsloth that referenced this pull request May 22, 2026
…unslothai#5690)

* Studio: per-session cost calculator + /api/providers/pricing endpoint

Neither the Anthropic Messages API nor the OpenAI Responses API
reports a `cost` field on the response. Both expose detailed token
counts (input, output, cache hits, server-tool invocations); pricing
multipliers live in the provider docs. The frontend's "cost so far"
display was impossible without scraping the server log.

Land the math + a snapshot endpoint so the cost calculator can run
client-side from the existing usage chunk plumbing. The actual UI
hookup belongs in a frontend follow-up (and is gated on PR unslothai#5670's
usage-chunk emission landing so the frontend sees the usage block
in the first place).

Changes:

- New `core/inference/pricing.py` with:
  - Per-MTok base pricing tables for every active Anthropic and
    gpt-5.x family member. Dated snapshots inherit the canonical-id
    price via prefix match so future snapshots cost the same as the
    canonical id until pricing changes.
  - Shared multipliers for Anthropic cache writes (5m: 1.25x, 1h: 2x)
    and reads (0.1x); OpenAI cache reads (0.1x); Anthropic server
    tool surcharges ($10 / 1k web_search, $0.05 / hour code_exec
    beyond the 50-hour daily free tier).
  - `calculate_cost(provider, model, usage)` returns a per-turn USD
    breakdown plus billable token counts, with priced=False for
    unknown models so the UI can still render token counts.
  - `pricing_snapshot()` returns the whole table for the frontend
    so it doesn't re-implement the multipliers.
- New `GET /api/providers/pricing` returning the snapshot, scoped
  behind the existing auth dependency.
- New `backend/tests/test_pricing.py` with 12 cases pinning the
  math against documented values: base input/output multiplication,
  5m / 1h / read multipliers, default-to-5m fallback when the
  breakdown is absent, web_search per-1k pricing, code_execution
  per-hour pricing, dated-snapshot fallback, OpenAI cache-read
  discount accounting (cached tokens subtracted from full-price
  bucket and re-billed at 0.1x), unknown model graceful-degrade,
  and the snapshot endpoint shape.
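To make the Anthropic multipliers concrete, here is a sketch of a per-turn cost under the rules listed above: cache writes at 1.25x (5m) or 2x (1h) of the input rate, reads at 0.1x, 5m as the default when no TTL breakdown is present, and $10 per 1k web searches. The function is hypothetical, the per-MTok prices are caller-supplied placeholders, and the `cache_creation` / `server_tool_use` sub-dict keys are assumed to follow Anthropic's Messages API usage object. The code-execution hour surcharge is omitted.

```python
from typing import Any

# Multipliers as listed in the commit message above.
CACHE_WRITE_5M = 1.25
CACHE_WRITE_1H = 2.0
CACHE_READ = 0.1
WEB_SEARCH_USD_PER_1K = 10.0


def anthropic_turn_cost(usage: dict[str, Any], input_per_mtok: float,
                        output_per_mtok: float) -> float:
    """Hypothetical per-turn USD cost; prices are caller-supplied placeholders."""
    in_usd = input_per_mtok / 1_000_000  # USD per input token
    out_usd = output_per_mtok / 1_000_000  # USD per output token
    writes = usage.get("cache_creation_input_tokens") or 0
    ttl = usage.get("cache_creation")  # assumed TTL breakdown sub-dict
    if isinstance(ttl, dict):
        w5m = ttl.get("ephemeral_5m_input_tokens") or 0
        w1h = ttl.get("ephemeral_1h_input_tokens") or 0
    else:
        w5m, w1h = writes, 0  # no breakdown: bill every write at the 5m rate
    searches = (usage.get("server_tool_use") or {}).get("web_search_requests") or 0
    return (
        (usage.get("input_tokens") or 0) * in_usd
        + w5m * in_usd * CACHE_WRITE_5M
        + w1h * in_usd * CACHE_WRITE_1H
        + (usage.get("cache_read_input_tokens") or 0) * in_usd * CACHE_READ
        + (usage.get("output_tokens") or 0) * out_usd
        + searches * WEB_SEARCH_USD_PER_1K / 1000
    )


# Placeholder rates of $1 / $5 per MTok, chosen only to keep the arithmetic easy.
cost = anthropic_turn_cost({
    "input_tokens": 1_000_000,
    "cache_creation_input_tokens": 1_000_000,
    "cache_read_input_tokens": 1_000_000,
    "output_tokens": 200_000,
}, input_per_mtok=1.0, output_per_mtok=5.0)
print(round(cost, 2))  # 3.35
```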

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: verified OpenAI pricing + fix billable input double-count

Address the cost-calculator review:

- OpenAI prices were 2-6x under the actual published rates.
  Cross-checked the live developers.openai.com/api/docs/pricing page
  and replaced every entry. gpt-5.5 is 5/30, gpt-5.5-pro is 30/180,
  gpt-5.4 is 2.5/15, gpt-5.4-mini 0.75/4.5, gpt-5.4-nano 0.20/1.25,
  gpt-5.3-codex 1.75/14. Added chat-latest alias to the canonical
  chat-snapshot rate. Dropped o3 / o4 / gpt-4.5 rows that are no
  longer listed on the page; calculator returns priced=False instead
  of silently billing at zero.

- billable_input_tokens was double-counting cached tokens for
  OpenAI. Anthropic excludes cache_* buckets from input_tokens so
  we add them; OpenAI folds cache_read_input_tokens into
  input_tokens already, so the tooltip read 1.8M for a 1.0M bill.
  Branched the math by provider and added a regression test.

Sourcing notes in the module docstring updated.
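The provider split for billable input, plus the OpenAI cache-read discount described in the test list (cached tokens moved out of the full-price bucket and re-billed at 0.1x), can be sketched as follows. Both functions are hypothetical illustrations, not the code in `core/inference/pricing.py`; the usage example reuses the gpt-5.5 $5/MTok input rate quoted above.

```python
def billable_input_tokens(provider: str, usage: dict) -> int:
    """Prompt tokens actually processed, counting each cached token once."""
    base = usage.get("input_tokens") or 0
    if provider == "anthropic":
        # Anthropic reports cache writes and reads outside input_tokens.
        return (base
                + (usage.get("cache_creation_input_tokens") or 0)
                + (usage.get("cache_read_input_tokens") or 0))
    # OpenAI already includes cached tokens in input_tokens; adding them
    # again is the double count described above.
    return base


def openai_input_usd(input_tokens: int, cached_tokens: int,
                     input_per_mtok: float, cache_read_multiplier: float = 0.1) -> float:
    """Input cost with cached tokens moved out of the full-price bucket."""
    full_price = input_tokens - cached_tokens
    discounted = cached_tokens * cache_read_multiplier
    return (full_price + discounted) * input_per_mtok / 1_000_000


usage = {"input_tokens": 1_000_000, "cache_read_input_tokens": 800_000}
print(billable_input_tokens("openai", usage))  # 1000000
print(round(openai_input_usd(1_000_000, 800_000, input_per_mtok=5.0), 2))  # 1.4
```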

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: canonical 4.5 ids, long-context tier, OpenAI tool fees

Three Codex P1 follow-ups on the cost calculator:

1. Canonical Anthropic 4.5 ids missing from ANTHROPIC_PRICING.
   claude-opus-4-5 / claude-sonnet-4-5 / claude-haiku-4-5 (no date
   suffix) are the ids used by backend defaults
   (PROVIDER_REGISTRY['anthropic'].default_models), but the table
   only had the dated forms. _lookup's prefix fallback doesn't help
   because the canonical id is SHORTER than the dated key, so
   str.startswith goes the wrong way and the calculator returned
   priced=False + zero cost. Added the canonical aliases for
   opus-4-5, sonnet-4-5, haiku-4-5, and opus-4-1.

2. OpenAI long-context tier. gpt-5.5 and gpt-5.4 cross over at
   272k input tokens to a 2x input / 1.5x output rate (gpt-5.5:
   $5/$30 -> $10/$45; gpt-5.4: $2.50/$15 -> $5/$22.50). Turns past
   the threshold were systematically undercounted at headline
   rates. Added long_context_threshold / long_context_input_per_mtok /
   long_context_output_per_mtok columns and a tier-selection step
   in calculate_cost; model_priced gains a "(long-context >272000)"
   suffix when the higher tier applies so the tooltip can show
   which rate was used. gpt-5.5-pro / gpt-5.4-pro / mini / nano /
   codex have no published long-context tier today, so they keep a
   single rate.

3. OpenAI server-tool surcharges. web_search is $10/1000 calls and
   the hosted shell container is $0.03 per 20-minute session on the
   default 1g tier (~$0.09/hr). server_tools_usd was previously
   stuck at 0.0 for OpenAI even when web_search and shell tools
   fired, so sessions with tool use understated cost. Added
   OPENAI_WEB_SEARCH_USD_PER_1K and OPENAI_CONTAINER_USD_PER_HOUR
   constants plus a parallel of the Anthropic surcharge block that
   reads counts from usage["openai_tool_use"]. The SSE translator
   wires the counts in a follow-up commit; the calculator is now
   ready for them. pricing_snapshot also exposes both constants so
   the frontend tooltip can render the per-call rate.
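Item 1 is easier to see with a toy lookup. The sketch below assumes an exact match followed by a prefix match, tried longest key first so that `gpt-5.5-pro` snapshots do not fall back to the `gpt-5.5` price; that ordering is an assumption of this sketch, not necessarily what `_lookup` does. The snapshot date `2026-05-01`, the dated key `claude-opus-4-5-20990101`, and its prices are made-up placeholders.

```python
from typing import Optional

# Per-MTok (input, output) USD; OpenAI values quoted in the commits above.
PRICING: dict[str, tuple[float, float]] = {
    "gpt-5.5": (5.0, 30.0),
    "gpt-5.5-pro": (30.0, 180.0),
    "gpt-5.4-mini": (0.75, 4.5),
}


def lookup(model: str, table: dict[str, tuple[float, float]] = PRICING
           ) -> Optional[tuple[float, float]]:
    """Exact match first, then the longest table key that prefixes the id."""
    if model in table:
        return table[model]
    for key in sorted(table, key=len, reverse=True):
        if model.startswith(key):  # dated snapshot -> canonical price
            return table[key]
    return None  # unknown model: caller reports priced=False


print(lookup("gpt-5.5-pro-2026-05-01"))  # (30.0, 180.0)
# A table holding only a dated key cannot serve the shorter canonical id,
# because "claude-opus-4-5".startswith("claude-opus-4-5-20990101") is False.
dated_only = {"claude-opus-4-5-20990101": (1.0, 1.0)}  # placeholder entry
print(lookup("claude-opus-4-5", dated_only))  # None
```

Adding the canonical id as its own key, as the commit does, makes the exact-match branch cover it and lets dated ids resolve through the prefix branch.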

Existing tests updated to stay in the short-context tier where they
were testing base rates; new tests pin canonical 4.5 lookups,
long-context crossover on gpt-5.5/gpt-5.4, the absence of crossover
on mini/nano/codex, and OpenAI tool surcharges (web_search,
container hours, combined total).
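Items 2 and 3 can be combined into one sketch of OpenAI tier selection and tool surcharges. Assumptions: the higher tier applies to the whole turn once input exceeds 272,000 tokens, cache discounts are ignored for brevity, and the container rate is the ~$0.09/hr figure quoted above. The `Rate` class and `openai_turn_cost` are hypothetical; the two constants borrow their names from the commit message but are not imported from the module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rate:
    input_per_mtok: float
    output_per_mtok: float
    long_context_threshold: Optional[int] = None  # None: single rate only
    long_context_input_per_mtok: Optional[float] = None
    long_context_output_per_mtok: Optional[float] = None


# Rates quoted in item 2 above.
GPT_5_5 = Rate(5.0, 30.0, 272_000, 10.0, 45.0)
GPT_5_4_MINI = Rate(0.75, 4.5)  # no published long-context tier

OPENAI_WEB_SEARCH_USD_PER_1K = 10.0
OPENAI_CONTAINER_USD_PER_HOUR = 0.09  # $0.03 per 20-minute session


def openai_turn_cost(rate: Rate, input_tokens: int, output_tokens: int,
                     web_searches: int = 0, container_hours: float = 0.0
                     ) -> tuple[float, bool]:
    """Return (USD, long_context_applied) for one turn; cache discount ignored."""
    long_ctx = (rate.long_context_threshold is not None
                and input_tokens > rate.long_context_threshold)
    in_rate = rate.long_context_input_per_mtok if long_ctx else rate.input_per_mtok
    out_rate = rate.long_context_output_per_mtok if long_ctx else rate.output_per_mtok
    tokens_usd = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    tools_usd = (web_searches * OPENAI_WEB_SEARCH_USD_PER_1K / 1000
                 + container_hours * OPENAI_CONTAINER_USD_PER_HOUR)
    return tokens_usd + tools_usd, long_ctx


print(openai_turn_cost(GPT_5_5, 300_000, 10_000))  # (3.45, True)
```

Returning the tier flag alongside the cost mirrors the "(long-context >272000)" suffix on `model_priced`, so a tooltip can say which rate was used.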

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
rsd-darshan pushed a commit to rsd-darshan/unsloth that referenced this pull request Jun 3, 2026
…ge chunk (unslothai#5670)

* Studio: surface prompt-cache token counts in /v1/chat/completions usage chunk

Studio's Anthropic and OpenAI Responses proxies already capture
cache_creation_input_tokens, cache_read_input_tokens (Anthropic) and
input_tokens_details.cached_tokens (OpenAI), but they were only written
to the structlog stream. Browser and SDK clients had no way to compute
"how many tokens hit the prompt cache" without scraping the server log,
so the chat UI could not show users how much money the cache was
saving on each turn.

This change emits one extra OpenAI include_usage-style chunk
(choices: [] with a populated usage block) just before the existing
[DONE] for Anthropic and after the final finish_reason chunk for
OpenAI Responses (both response.completed and response.incomplete).
The chunk shape:

  usage.prompt_tokens_details.cached_tokens
      normalised cache-read count, present for both providers.
  usage.cache_creation_input_tokens
      Anthropic-only; tokens billed at the cache-write premium.
  usage.cache_read_input_tokens
      Anthropic-only; same value as cached_tokens, kept for callers
      that already key off the native Anthropic name.

Smoke verified end to end against a live Studio (claude-haiku-4-5
and gpt-4o-mini) plus 7 new unit tests on the helper and the two
streaming paths.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Anthropic: include cache buckets in prompt_tokens / total_tokens

Anthropic's `input_tokens` field excludes the cache buckets -- the
real prompt size is `input_tokens + cache_creation_input_tokens +
cache_read_input_tokens`. Previously the new usage chunk reported
only `input_tokens` as `prompt_tokens`, which heavily undercounted
cache-hit turns (e.g. an 18.9k-token cache_read turn looked like an
8-token prompt) and broke any downstream context / cost display fed
by `prompt_tokens` or `total_tokens`.

Fix `_build_usage_chunk` to sum all three input buckets for the
Anthropic provider while keeping the OpenAI Responses path unchanged
(OpenAI already folds cached tokens into `input_tokens`). The native
`cache_creation_input_tokens` / `cache_read_input_tokens` keys and
`prompt_tokens_details.cached_tokens` mirror are still emitted, so
clients keep full visibility of the cache split.

Tests updated to assert the summed shape.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
rsd-darshan pushed a commit to rsd-darshan/unsloth that referenced this pull request Jun 3, 2026
…unslothai#5690)

