Skip to content

Studio: Anthropic fast_mode toggle and streaming refusal handling#5715

Merged
danielhanchen merged 10 commits into
mainfrom
feat/anthropic-fast-mode-refusals
May 26, 2026
Merged

Studio: Anthropic fast_mode toggle and streaming refusal handling#5715
danielhanchen merged 10 commits into
mainfrom
feat/anthropic-fast-mode-refusals

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

  • Fast mode (beta). Adds a Configuration toggle that flips on speed: \"fast\" and the fast-mode-2026-02-01 beta header for Claude Opus 4.6 / 4.7. Up to 2.5x higher output tokens per second at 6x standard Opus pricing per the upstream docs. Hidden on every other model + provider so the picker never offers a knob the upstream API would 400 on; backend silently drops the flag as a second line of defence.
  • Streaming refusal notice. When Anthropic's safety classifier truncates a stream with message_delta.delta.stop_reason = \"refusal\" on Claude 4 models, the assistant message now ends with a short notice explaining what happened, followed by the existing OpenAI-spec finish_reason = \"content_filter\". Previously the chat bubble truncated silently with nothing to tell the user why the response stopped.

References:

Test plan

  • cd studio && python -m pytest backend/tests/test_anthropic_fast_mode_and_refusal.py -v -- 8 new tests, all passing.
    • fast_mode pass-through on Opus 4.6 + 4.7 (body field + beta header).
    • silent drop on Sonnet 4.6, Haiku 4.5, older Opus, False, None.
    • refusal stream emits the user-facing notice plus content_filter finish reason while preserving the partial assistant text.
  • cd studio/frontend && npx tsc -b --pretty false clean.
  • Confirmed the toggle is hidden when an OpenAI / OpenRouter / Kimi / local model is selected (capability gate keyed on Anthropic + Opus 4.6/4.7 model prefixes).
  • Manual smoke test against claude-opus-4-7 with fast mode on -- defer to QA once Anthropic surfaces the per-key fast-mode rate limit (waitlist gate).

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for Anthropic's fast-mode beta for Claude Opus 4.6 and 4.7, including backend request handling for the speed parameter and beta headers, frontend UI controls, and improved feedback for streaming refusals. Feedback was provided regarding the implementation of model support checks; the reviewer noted that model discovery should be handled dynamically via the Anthropic models endpoint rather than using a static list of model prefixes to maintain consistency with repository standards.

Comment on lines +1647 to +1649
fast_mode_active = bool(fast_mode) and _anthropic_supports_fast_mode(model)
if fast_mode_active:
body["speed"] = "fast"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for activating fast_mode correctly implements the 'silent drop' policy for unsupported models. By checking both the user toggle and model support before modifying the request body, it prevents potential 400 errors from the upstream API while maintaining a clean user experience. Please verify that model support is determined dynamically via the Anthropic /v1/models endpoint rather than a static list, to ensure consistency with repository standards for model discovery.

References
  1. Anthropic provides a /v1/models endpoint for model discovery; do not implement special cases or static lists for Anthropic under the assumption that the standard models endpoint is unsupported.

Fast mode (beta `fast-mode-2026-02-01`) lets Claude Opus 4.6 and 4.7
generate output tokens up to 2.5x faster at 6x standard Opus
pricing. The toggle lives in Configuration → Provider when the
selected Anthropic model is Opus 4.6 or 4.7 and is otherwise
hidden. Backend gates the same prefixes a second time so a stale
frontend cannot make Anthropic 400 the request, and the
`fast-mode-2026-02-01` beta header is merged onto whatever other
betas the request already needed (code-execution, compaction).

Streaming refusals (`message_delta.delta.stop_reason="refusal"` on
Claude 4 models) now surface a short user-facing notice in the
assistant message before the translated OpenAI chunk emits the
existing `finish_reason="content_filter"`. Previously the chat
bubble truncated silently because the SSE stopped mid-stream with
no visible explanation. Per the upstream docs the conversation
must be reset before continuing, so the notice tells the user
exactly that.

Reference:
- https://platform.claude.com/docs/en/build-with-claude/fast-mode
- https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/handle-streaming-refusals

Tests:
- studio/backend/tests/test_anthropic_fast_mode_and_refusal.py (8 cases
  pinning fast_mode pass-through on 4.6/4.7, silent drop on Sonnet /
  Haiku / older Opus / None / False, and the refusal notice + finish
  reason on a synthetic refusal stream).
Anthropic's streaming-refusal guidance says the refused assistant
turn must be removed or updated before the next call -- otherwise
the safety classifier keeps refusing. The PR only added a
user-visible notice; the partial assistant output (plus the notice
itself) still rode the next request via toOpenAIMessage.

Tag the refusal turn with an HTML-comment sentinel emitted alongside
the notice. The chat-adapter checks for that sentinel in
toOpenAIMessage and returns null, so the refused turn is excluded
from outboundMessages. The notice still renders in the transcript
(HTML comments don't display), so users keep the explanation.
test_refusal_maps_to_content_filter expects only ['content_filter']
in the finish_reasons list, but the post-PR refusal path emits a
user-visible content notice chunk first. Every _content_chunk
carries 'finish_reason: None' by construction; the helper was
appending those, so the assertion saw [None, 'content_filter']
instead of ['content_filter'].

None is not a finish reason -- it's just mid-stream delta noise.
Skip None values in _finish_reasons so the helper reflects what
the test names actually claim to check. Same fix applies cleanly
to the other helper usages (pause_turn test expects [] and the
sibling stop test expects ['stop'], both unaffected).
@danielhanchen danielhanchen force-pushed the feat/anthropic-fast-mode-refusals branch from c362bbd to aae360f Compare May 23, 2026 14:00
danielhanchen and others added 4 commits May 24, 2026 14:00
Adds 19 cases on top of the 9 in test_anthropic_fast_mode_and_refusal.
The base file pins the happy path; this file fills in the cliffs:

* Dated-snapshot prefix matching: claude-opus-4-7-2026-02-01 and
  claude-opus-4-6-2026-02-01 still gate fast_mode through, while
  claude-opus-4-5-2025-08-01 and claude-sonnet-4-6-2026-02-01 do not.
* Strict opt-in: a future claude-opus-4-8 or claude-opus-5 does NOT
  auto-enable fast_mode -- the prefix tuple must be bumped explicitly
  when a new family is whitelisted upstream.
* Beta-header merge: fast_mode coexists with code-execution-2025-08-25
  and compact-2026-01-12 in one comma-separated anthropic-beta header
  with no duplicates and no truncation. Pins the value to the exact
  fast-mode-2026-02-01 docs token so a typo would fail CI.
* Non-destruction: fast_mode=None produces byte-identical outbound
  body and headers to the version that omits the argument entirely.
  Same for fast_mode=False. Guarantees the upgrade path is
  non-breaking on existing Anthropic streams.
* Refusal stream ordering: the user-visible notice precedes the
  finish_reason chunk so a streaming UI paints text before flipping
  to content_filter. Refusal sentinel emitted exactly once. Notice
  rides a normal content delta chunk with finish_reason still null.
  Partial assistant deltas survive before the notice.
* Provider-side refusal coverage: a refusal on Sonnet (not just Opus)
  still emits the notice + sentinel + content_filter mapping, since
  refusal handling is not gated on fast-mode capability.
Two follow-ups on #5715:

1) sanitizeInferenceParams stripped fastMode. fastMode is in
   PERSISTED_INFERENCE_PARAM_KEYS but the storage sanitizer only kept
   numeric fields plus systemPrompt and trustRemoteCode, so the new
   toggle was silently dropped on reload and on the
   /api/chat/settings round-trip. Save it the same way trustRemoteCode
   is saved.

2) Refusal recovery now also drops the triggering user turn.
   Returning null from toOpenAIMessage on the assistant side left the
   user prompt that caused the refusal in the outbound history, so
   the very next request would re-trigger the same classifier.
   Anthropic's refusal-handling guidance is explicit on this: remove
   the refused turn AND the user message that triggered it before
   the next call. Implemented via a pre-pass that pops the trailing
   user message when an assistant carries the refusal sentinel.

Typecheck clean.
…ixes

The text sentinel for the Anthropic refusal drop signal was spoofable:
any assistant message containing the literal
<!--studio:anthropic-refusal--> would prune the prior user + assistant
pair on the next request. Move the signal onto a separate _toolEvent
chunk that the chat adapter latches into
assistant.metadata.custom.anthropicRefusal; assistant text can no
longer control the pruner.

Tighten the fast-mode model gate (backend + frontend) to require a "-"
family boundary so claude-opus-4-70 / claude-opus-4-7b style IDs do
not get speed: "fast" on a naive startswith match.

Use survivingMessages for the image / audio attachment scan so a
refused user turn does not gate or mis-attribute the next non-refused
turn.

Propagate Anthropic usage.speed onto the OpenAI-style usage chunk and
apply the documented 6x fast-mode multiplier in the cost calculator
(stacks with prompt-cache multipliers per the docs); expose the new
multiplier on the pricing snapshot for the UI tooltip.

Tests cover the tool-event chunk shape, the prefix-collision rejects,
usage.speed propagation, the 6x pricing math, and that the visible
refusal text carries no embedded sentinel.
@danielhanchen

Copy link
Copy Markdown
Member Author

Round 3 audit (10-parallel reviewer + manual cross-check against Anthropic docs).

Verified the round-2 surface against platform.claude.com/docs/en/build-with-claude/fast-mode and .../strengthen-guardrails/handle-streaming-refusals: header (fast-mode-2026-02-01), body field (speed: "fast"), models (claude-opus-4-7, claude-opus-4-6), 6x pricing, 2.5x OTPS, Priority-Tier / Batch incompatibility, and message_delta.delta.stop_reason == "refusal" semantics all match the diff. The full 256 + 531 fuzz pass at the round-2 follow-up was clean, and 28/28 fast-mode + refusal tests still pass on this branch.

Pushed 4f1afdb5 on top of 41531cdb. The fixes address real findings the reviewer surfaced:

  • The text sentinel <!--studio:anthropic-refusal--> was spoofable. Any assistant message echoing that literal (a user can ask for it explicitly) would prune the prior user + assistant pair on the next request. Moved the drop signal to an out-of-band _toolEvent envelope ({"type": "anthropic_refusal"}). The adapter latches it into assistant.metadata.custom.anthropicRefusal; assistant text can no longer control the pruner.
  • Tightened the fast-mode prefix gate (backend + frontend) to require a - family boundary so hypothetical IDs like claude-opus-4-70 / claude-opus-4-7b do not match. Pinned with three new edge tests.
  • Switched the image / audio attachment scan in chat-adapter.ts to survivingMessages so a refused user turn cannot gate or mis-attribute the next non-refused turn.
  • Propagated Anthropic's usage.speed field onto the OpenAI-style usage chunk so clients can verify a fast-mode request actually ran fast.
  • Added ANTHROPIC_FAST_MODE_MULT = 6.0 and applied it to base + output rates in calculate_cost (stacks with cache multipliers per the docs). Exposed on pricing_snapshot.

Test deltas:

  • test_anthropic_fast_mode_and_refusal.py: renamed sentinel test to assert the new _toolEvent chunk and verify the visible refusal text carries no studio:anthropic-refusal marker.
  • test_anthropic_fast_mode_edge.py: new test_refusal_tool_event_chunk_shape, test_fast_mode_rejects_prefix_collision_4_70 / 4_7b / 4_60, and test_usage_speed_propagates_to_final_usage_chunk_{fast,standard} + _absent_when_anthropic_does_not_report.
  • test_pricing.py: new test_anthropic_fast_mode_charges_6x_standard_opus, _does_not_affect_standard_speed, _stacks_with_cache_read_multiplier, and snapshot assertion for fast_mode_mult.

pytest tests/test_anthropic_fast_mode_and_refusal.py tests/test_anthropic_fast_mode_edge.py tests/test_anthropic_web_fetch.py tests/test_pricing.py is green (71 passed). npm run typecheck is green.

Reviewer ranked P3 cosmetic findings (async test cleanup warnings on Python 3.13) intentionally left out of this round; tests pass, warnings are unraisable post-collection and do not affect CI.

@danielhanchen

Copy link
Copy Markdown
Member Author

Round 4 cross-cutting fix: merged origin/main into this branch (no conflicts) to bring in PR #5735 (orphan tool_call XML strip widening + 263-line test_tool_xml_strip.py). All 8 PRs in this audit cohort had been forked off a pre-#5735 main, so a squash-merge of any of them would have silently reverted the widened _TOOL_XML_RE regex and deleted the dedicated test file. Verified: diff against origin/main now shows zero unintended changes to routes/inference.py and test_tool_xml_strip.py outside the actual PR scope.

rhsCZ pushed a commit to rhsCZ/unsloth that referenced this pull request May 26, 2026
The fast_mode 6x multiplier landed in two places at once -- here
(f66df7b) and on unslothai#5715 (4f1afdb) -- since both audits ran in
parallel. Drop the duplicate from this branch so the change lives
in its natural home (unslothai#5715, which introduces fast_mode itself);
this PR stays focused on the cache-read fallback + 1h breakdown.
@danielhanchen danielhanchen merged commit 7d1b680 into main May 26, 2026
34 checks passed
@danielhanchen danielhanchen deleted the feat/anthropic-fast-mode-refusals branch May 26, 2026 06:37
stophobia pushed a commit to stophobia/unsloth that referenced this pull request May 26, 2026
…hat-style usage keys) (unslothai#5722)

* Studio: longest-prefix pricing match + accept chat-style usage keys

Two P1 / High follow-ups from PR 5690 review feedback:

1. Pricing prefix lookup returned the first key it iterated, so
   dated snapshots like ``gpt-5.4-mini-2026-04-23`` collided with
   the shorter ``gpt-5.4`` entry and overbilled by 3x+. Sort the
   table keys longest-first so the most specific entry wins.

2. ``calculate_cost`` only read ``input_tokens`` / ``output_tokens``,
   but Studio's OpenAI-Chat-style usage envelope re-emits
   ``prompt_tokens`` / ``completion_tokens`` (the OpenAI Chat
   Completions vocabulary). Callers handing in the chat-style
   shape silently got a zeroed bill. Accept either pair so the
   calculator works against both raw upstream usage and the
   Studio-translated envelope.

Tests (4 new in test_pricing.py): dated mini/pro snapshots inherit
the right rate; chat-style usage keys price correctly; raw key wins
when both shapes are present.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: dedupe cache buckets when costing chat-style Anthropic usage

When the caller hands in Studio's chat-style envelope (``prompt_tokens``
emitted by ``_build_usage_chunk``) for Anthropic, that value already
folds ``cache_creation_input_tokens`` + ``cache_read_input_tokens`` into
the total. The previous follow-up accepted the chat-style key but then
re-added both cache buckets in ``billable_input_tokens`` and ``input_usd``,
double-counting cache tokens on every Anthropic chat-style call.

Detect which envelope landed (``input_tokens`` present = raw upstream;
absent + ``prompt_tokens`` present = Studio chat-style) and peel the
cache buckets off for Anthropic before the downstream math so both
envelopes produce identical costs.

OpenAI: ``input_tokens`` and Studio's ``prompt_tokens`` both already
include ``cache_read`` and exclude any notional ``cache_creation``, so
the OpenAI path stays a straight passthrough.

Tests (2 new): both envelopes match for Anthropic on a triple
(uncached + cache_creation + cache_read); OpenAI envelopes match on a
cached-tokens fixture.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: prefer raw output_tokens over chat-style completion_tokens

Codex flagged that the previous fallback chain
'usage.get("output_tokens") or usage.get("completion_tokens")'
treats an explicit 0 as missing -- a mixed-envelope payload where
'output_tokens' is 0 but 'completion_tokens' is non-zero (or
stale) bills the wrong amount. Mirror the has_input_tokens
precedence pattern: when the raw key is present we use it even at
0; otherwise fall back to completion_tokens.

* Studio: read OpenAI cached tokens from prompt_tokens_details too

Codex flagged that the chat-style OpenAI envelope Studio re-emits
via _build_usage_chunk surfaces cached prompt tokens under
prompt_tokens_details.cached_tokens, not input_tokens_details. The
OpenAI branch only checked input_tokens_details, so a cache-heavy
chat-style turn billed every cached token at the full input rate
instead of the 0.1x cache_read discount.

Walk both keys when discovering the cached count. New regression
test pins that the two envelopes price identically for a turn with
80k of 100k tokens cached.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: tighten pricing prefix match + clamp corrupt usage

Three follow-ups on the longest-prefix pricing match landed in this PR:

- Prefix match now requires a dash boundary or end-of-string. The
  longest-key sort alone still falsely landed "claude-opus-4-15" on
  the "claude-opus-4-1" row, and "gpt-5.5-prod" on the "gpt-5.5-pro"
  row (a 6x overcharge). Demanding the next character be "-" rules
  out the lookalikes while keeping dated snapshots
  ("gpt-5.4-mini-2026-04-23", "claude-opus-4-7-20260414") landing on
  their canonical row.
- Clamp every token count to >= 0. A corrupted upstream payload
  (negative cached count, off-by-one in a fixture) could previously
  produce a negative bill that masked real spend in the session
  total tooltip.
- Tolerate a non-dict "cache_creation" (e.g. an upstream proxy
  folded the field down to a single int). The current code raised
  AttributeError mid-turn; now it falls back to the 5m-default
  bucket so the rest of the cost calculation still runs.

Adds tests/test_pricing_edge.py with 20 adversarial cases covering
the boundary check, negative / None / zero token values across both
envelopes, cache_read > prompt corruption, the OpenAI long-context
threshold crossover on cache-inflated billable input, malformed
sub-objects, and unknown-provider degradation. Combined suite is
51 tests, all green.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Surface Anthropic cache-read fallback and forward 1h breakdown

Two correctness gaps surfaced on the chat-style usage envelope:

1) Anthropic cache_read fell through to "uncached input" pricing when
   the envelope arrived without the native ``cache_read_input_tokens``
   key (e.g. via a proxy that only emits the mirrored
   ``prompt_tokens_details.cached_tokens`` block). Studio's canonical
   ``_build_usage_chunk`` always sets both so production traffic was
   never affected, but the calculator should accept either as a
   defense-in-depth measure. Add a fallback to read the mirrored
   field when the native one is missing or zero; the native key still
   wins when both are present so the math stays deterministic.

2) ``_build_usage_chunk`` dropped the ``cache_creation`` 5m / 1h
   breakdown. Downstream ``calculate_cost`` then could not apply the
   2x 1h premium and silently fell back to the 5m default,
   underbilling 1h cache writes by 2x on chat-style traffic. Forward
   the breakdown verbatim when the upstream usage carries it.

Tests grow by 4 (20 -> 24): two for the prompt_tokens_details
fallback (with native-precedence pin), one for the chunk shape, one
for the end-to-end pricing parity check at 1h.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Anthropic fast_mode pricing multiplier

PR 5715 wires the fast-mode-2026-02-01 beta header + speed:"fast"
field through to Anthropic, but the cost calculator never learnt
about the matching 6x premium documented at
https://platform.claude.com/docs/en/build-with-claude/fast-mode
(Opus 4.7 standard $5/$25 per MTok, fast $30/$150).

This adds:
- ANTHROPIC_FAST_MODE_MULT = 6.0 constant.
- calculate_cost(..., fast_mode=True) applies the 6x to base input
  AND output rates before any cache multipliers (cache mults stack
  on top of fast per Anthropic docs).
- Provider+model gate: silently no-op on every model that is not
  claude-opus-4-6 / claude-opus-4-7 so a stray fast_mode=True on
  Sonnet/Haiku can never over-charge.
- model_priced label tagged "(fast)" so the cost tooltip can
  surface which rate fired.
- pricing_snapshot now exposes fast_mode_mult so the frontend cost
  panel doesn't have to hard-code 6.

7 new edge tests pin the math; existing 55 still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Honor explicit zero cache_read_input_tokens on Anthropic envelopes

The previous follow-up fell back to ``prompt_tokens_details.cached_tokens``
whenever the native ``cache_read_input_tokens`` was missing OR equal to 0,
even though the commit message stated the native key always wins when
present. A proxy that forwards a stale ``prompt_tokens_details`` block
alongside an authoritative ``cache_read_input_tokens: 0`` would then
inflate cache_read past the real native count, posting a false cache_read
line and bumping billable_input_tokens. Switch the gate to native-key
presence so an explicit zero stays authoritative; the mirror only kicks
in when the native key is absent. Add a regression test pinning the
explicit-zero precedence.

* Move fast_mode pricing back to unslothai#5715

The fast_mode 6x multiplier landed in two places at once -- here
(f66df7b) and on unslothai#5715 (4f1afdb) -- since both audits ran in
parallel. Drop the duplicate from this branch so the change lives
in its natural home (unslothai#5715, which introduces fast_mode itself);
this PR stays focused on the cache-read fallback + 1h breakdown.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Shorten pricing comments for PR unslothai#5722

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
rhsCZ pushed a commit to rhsCZ/unsloth that referenced this pull request May 26, 2026
Main moved forward 17 commits during PR review (latest: 953c8bf). Real
conflicts in five files; resolved by combining both branches' changes.

studio/backend/core/inference/external_provider.py
- Add fast_mode (Anthropic Opus 4.6/4.7 speed flag, unslothai#5715) to
  stream_chat_completion and Anthropic-branch call site, alongside
  existing Gemini tools/tool_choice forwarding.
- Add _openai_image_generation_tool() helper (action:"edit" for follow-
  up image edits, unslothai#5712) and use it inside the existing
  _responses_hosted_builtins_allowed gate so the forced-function /
  tool_choice="none" suppression added in rounds 21+ still applies.
- Keep Anthropic web_fetch gated on _anthropic_hosted_builtins_allowed
  (round 19+ hosted-builtin gate) while taking main's per-model
  version selector (web_fetch_20260209 vs _20250910).

studio/backend/routes/inference.py
- Add `openai = provider_type == "openai"` (used by main's reasoning
  content forwarding for follow-up image edits).
- Keep the round 25/26 Gemini filter chain (_filter_tool_calls drops
  synthetic server-builtin cards, marks tc_id so the matching
  role="tool" follow-up gets skipped, extra_content gated to native
  Gemini host).
- Forward fast_mode alongside tools/tool_choice.

studio/backend/tests/test_openai_image_generation.py
- Combine assertions: both _server_tool: True (PR) and
  openai_image_generation_call_id (main) are present on the tool_start
  arguments.

studio/frontend/src/features/chat/shared-composer.tsx
- Add supportsBuiltinWebFetch declaration (separate Fetch pill from
  unslothai#5742) before the PR's isExternalGemini constant so both the Gemini
  image-tier gating and the standalone Anthropic Fetch pill compile.

studio/frontend/src/features/chat/api/chat-adapter.ts
- Add main's normalizeOpenAIReasoningItem, toOpenAIImageEditReferenceMessage,
  isAnthropicRefusalMessage helpers alongside PR's collectAssistantToolCalls,
  collectToolResultMessages, SerializedMessage, collectAssistantTextThoughtSignature.
- toOpenAIMessages (PR) now also early-returns on isAnthropicRefusalMessage
  so refused turns get pruned from outbound history.
- Add a thin toOpenAIMessage (singular) wrapper for the OpenAI image-
  edit replay path's flat .map() usage.
- Merge per-turn enable flags: keep PR's imageGenerationEnabledForThisTurn,
  geminiImageModeForThisTurn, codeExecEnabledForThisTurn !geminiImageMode
  gate; take main's webFetchEnabledForThisTurn (sourced from independent
  webFetchToolsEnabled pill state).
- Outbound build chains main's anthropic_refusal survivingMessages prune,
  then flatMap(toOpenAIMessages) (PR), then PR's selectedImageEditReference
  reference message prepend; image-edit unavailable toast from main fires
  before any of that when the pill is off.
- tool_end merge: do main's nextArgs spread first, then PR's Gemini
  native_part parts concat so both OpenAI image-call ids and Gemini
  executableCode/codeExecutionResult/inlineData round-trip.
- Cumulative + final yields: orderAssistantContent(pinTextThoughtSignature(...))
  composes main's tool-vs-text ordering with PR's per-text thoughtSignature pin.

Tests: gemini provider 148/148; openai_responses_translation + openai_code_execution
+ openai_image_generation + anthropic_code_execution + anthropic_web_fetch +
external_provider_usage_chunk + providers_api: 50 passed, 42 skipped; main's
new anthropic_fast_mode + citations + openai_citation_markers + openai_tool_result_fallbacks
suites all 43/43.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant