test(2/N): fix bedrock 1M-context beta + max_tokens retry hardening #8

Merged: Bartok9 merged 1 commit into main from fix/test-suite-green-2n-agent on May 30, 2026

Bartok9 (Owner) commented May 30, 2026

Context

Second PR in the test-suite-green series. Follows PR #6 (1/N) which made the
timezone-dependent flow-peak scheduler tests timezone-independent. This PR turns
5 more pre-existing failures green in the agent/ model-API cluster.

Tests fixed (5)

Bedrock 1M-context beta cluster:

  1. tests/agent/test_bedrock_1m_context.py::TestBedrockContext1MBeta::test_common_betas_includes_1m
  2. tests/agent/test_bedrock_1m_context.py::TestBedrockContext1MBeta::test_common_betas_for_native_anthropic_includes_1m
  3. tests/agent/test_bedrock_1m_context.py::TestBedrockContext1MBeta::test_build_anthropic_kwargs_includes_1m_for_bedrock_fastmode

max_tokens retry-hardening cluster:
  4. tests/agent/test_unsupported_parameter_retry.py::TestMaxTokensRetryHardening::test_sync_max_tokens_retry_matches_generic_phrasing
  5. tests/agent/test_unsupported_parameter_retry.py::TestMaxTokensRetryHardening::test_async_max_tokens_retry_matches_generic_phrasing

Root cause

Bedrock cluster: context-1m-2025-08-07 was deliberately kept OUT of
_COMMON_BETAS and only appended for Azure, so neither the common-betas set nor
the non-OAuth fast-mode request path carried it — Bedrock/native Opus 4.6/4.7
silently capped at 200K. The test suite now requires the beta to live in
_COMMON_BETAS (and flow to Bedrock + non-OAuth fast-mode) while still being
absent from the client-level native/custom default header and the OAuth
fast-mode path (subscriptions without long-context 400 on it).
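The requirement above can be sketched as a small selection function. This is a minimal illustration, not the code in agent/anthropic_adapter.py: `client_default_betas`, `beta_header`, the endpoint labels, and `OTHER_BETA` are hypothetical names, and the real _COMMON_BETAS holds entries not shown here. Only the context-1m-2025-08-07 string comes from this PR.

```python
# Beta name taken from the PR text.
CONTEXT_1M_BETA = "context-1m-2025-08-07"
# Hypothetical stand-in for the other entries of the real _COMMON_BETAS.
OTHER_BETA = "example-beta-placeholder"

# After this PR the 1M beta is part of the common set.
COMMON_BETAS = (OTHER_BETA, CONTEXT_1M_BETA)


def client_default_betas(endpoint_kind: str) -> list:
    """Betas for the client-level default header (assumed endpoint labels).

    Native and custom endpoints omit the 1M beta because subscriptions
    without long-context return 400 on it; other kinds keep the full set.
    """
    if endpoint_kind in ("native", "custom"):
        return [b for b in COMMON_BETAS if b != CONTEXT_1M_BETA]
    return list(COMMON_BETAS)


def beta_header(betas) -> str:
    """Join betas into one comma-separated value, dropping duplicates."""
    return ",".join(dict.fromkeys(betas))


if __name__ == "__main__":
    for kind in ("native", "custom", "azure", "bedrock"):
        print(kind, "->", beta_header(client_default_betas(kind)))
```

The order-preserving dedupe in `beta_header` mirrors the "dedupe the header join" idea so that a beta listed twice is sent once.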

max_tokens cluster: the unsupported-parameter retry branch did
kwargs.pop("max_tokens"); kwargs.pop("max_completion_tokens") but never
re-added the cap under the renamed key. Providers that reject the parameter as a
rename ("Unknown parameter: max_tokens — use max_completion_tokens") lost their
token cap entirely, and the test (which asserts max_completion_tokens == 512)
hit KeyError.

Fix

agent/anthropic_adapter.py

  • Add context-1m-2025-08-07 to _COMMON_BETAS.
  • _common_betas_for_base_url: keep the MiniMax bearer-auth strip; rely on the
    common set for everything else.
  • build_anthropic_client: compute an effective drop flag so the client-level
    default header omits 1M for native/custom endpoints but keeps it for Azure
    (and MiniMax strips it regardless). Dedupe the Bedrock client header join.
  • build_anthropic_kwargs fast-mode: always re-emit the common betas in
    per-request extra_headers (they override client defaults), drop 1M only on
    the OAuth path, and gate speed=fast + the fast-mode beta to models that
    actually support Fast Mode (Opus 4.6) — so Opus 4.7 fast-mode still carries 1M
    without 400'ing on speed.
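A rough sketch of the fast-mode branch described in the last bullet, under stated assumptions: `fast_mode_overrides`, `FAST_MODE_BETA`, `FAST_MODE_MODELS`, the model ids, and the placement of `speed` at the top level of the returned dict are all hypothetical and chosen for illustration. The real build_anthropic_kwargs may structure this differently.

```python
CONTEXT_1M_BETA = "context-1m-2025-08-07"  # from the PR text
FAST_MODE_BETA = "fast-mode-placeholder"   # hypothetical beta name
COMMON_BETAS = ("example-beta-placeholder", CONTEXT_1M_BETA)  # hypothetical contents
FAST_MODE_MODELS = frozenset({"opus-4.6"})  # hypothetical model ids


def fast_mode_overrides(model: str, is_oauth: bool) -> dict:
    """Per-request overrides for a fast-mode call (illustrative only).

    Per-request extra_headers replace the client defaults, so the common
    betas are always re-emitted here. The 1M beta is dropped only on the
    OAuth path; speed=fast and the fast-mode beta are added only for
    models that support Fast Mode.
    """
    betas = [b for b in COMMON_BETAS if not (is_oauth and b == CONTEXT_1M_BETA)]
    overrides = {}
    if model in FAST_MODE_MODELS:
        betas.append(FAST_MODE_BETA)
        overrides["speed"] = "fast"
    overrides["extra_headers"] = {"anthropic-beta": ",".join(betas)}
    return overrides


if __name__ == "__main__":
    print(fast_mode_overrides("opus-4.7", is_oauth=False))
    print(fast_mode_overrides("opus-4.6", is_oauth=True))
```

With these assumptions, an Opus 4.7 request keeps the 1M beta and carries no speed parameter, which is the combination the PR says must not 400.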

agent/auxiliary_client.py

  • In both the sync and async max_tokens retry branches, re-add
    max_completion_tokens = max_tokens after popping max_tokens, except for the
    ZAI 1210 case which wants the param removed entirely.
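The retry rewrite can be sketched as a pure function over the request kwargs. This is an assumption-laden illustration rather than the code in agent/auxiliary_client.py: `rewrite_kwargs_for_retry` and its `strip_entirely` flag (standing in for the ZAI 1210 case) are hypothetical names.

```python
def rewrite_kwargs_for_retry(kwargs: dict, strip_entirely: bool = False) -> dict:
    """Return a copy of kwargs adjusted for a retry after max_tokens was rejected.

    Both spellings of the cap are removed, as the old branch did. Unless the
    provider wants the parameter gone entirely, the original cap is then
    re-added under max_completion_tokens so the retry stays bounded.
    """
    retry = dict(kwargs)  # leave the caller's dict untouched
    cap = retry.pop("max_tokens", None)
    retry.pop("max_completion_tokens", None)
    if not strip_entirely and cap is not None:
        retry["max_completion_tokens"] = cap
    return retry


if __name__ == "__main__":
    original = {"model": "example-model", "max_tokens": 512}
    print(rewrite_kwargs_for_retry(original))
    print(rewrite_kwargs_for_retry(original, strip_entirely=True))
```

Before the fix, the equivalent of this function stopped after the two `pop` calls, which is why an assertion on `max_completion_tokens == 512` raised KeyError.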

Validation

Before (clean main): pytest tests/agent/ reported 18 failed, 2520 passed, 7 skipped
(the 5 above + 13 environment-dependent OAuth/keychain/botocore-region tests).

After this PR: pytest tests/agent/ reported 13 failed, 2525 passed, 7 skipped.

  • Target files: pytest tests/agent/test_bedrock_1m_context.py tests/agent/test_unsupported_parameter_retry.py reported 23 passed (was 5 failed / 18 passed).
  • No new breakage: verified the 13 remaining failures are byte-identical to the
    clean-main set (TestReadClaudeCodeCredentials ×5, TestResolveBedrocRegion ×3,
    TestResolveAnthropicToken ×2, TestRunOauthSetupToken ×3) — all environment-
    dependent (keychain / credential files / botocore profile), untouched here.
  • Confirmed previously-passing neighbors still pass: TestBuildAnthropicClient,
    TestBuildAnthropicKwargs, MiniMax/Azure/Bedrock header tests.
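One way to reproduce the "no new breakage" check is to diff the failing test ids of two runs. The sketch below is not the procedure used for this PR; it assumes both runs were captured with pytest's short test summary enabled (lines beginning with `FAILED <node id>`), and `failed_ids` and `compare_runs` are hypothetical helpers. The sample ids are made up.

```python
def failed_ids(pytest_output: str) -> set:
    """Collect node ids from 'FAILED <id> - <message>' summary lines."""
    ids = set()
    for line in pytest_output.splitlines():
        if line.startswith("FAILED "):
            node_id = line[len("FAILED "):].split(" - ", 1)[0].strip()
            ids.add(node_id)
    return ids


def compare_runs(before: str, after: str) -> tuple:
    """Return (newly_failing, newly_fixed) between two captured runs."""
    b, a = failed_ids(before), failed_ids(after)
    return a - b, b - a


if __name__ == "__main__":
    # Made-up sample output; real runs would be read from captured logs.
    before = (
        "FAILED tests/a.py::TestEnv::test_keychain - OSError: no keychain\n"
        "FAILED tests/b.py::TestRetry::test_cap - KeyError: 'max_completion_tokens'\n"
    )
    after = "FAILED tests/a.py::TestEnv::test_keychain - OSError: no keychain\n"
    newly_failing, newly_fixed = compare_runs(before, after)
    print("newly failing:", sorted(newly_failing))
    print("newly fixed:", sorted(newly_fixed))
```

An empty "newly failing" set together with exactly the targeted tests in "newly fixed" corresponds to the outcome claimed above.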

Remaining on main

After merge: ~33 pre-existing failures remain repo-wide (38 at series start
minus PR #6's batch minus these 5). The 13 still-red agent/ tests in this run
are environment-dependent and out of scope for this PR.


Note

Medium Risk
Changes to default Anthropic beta headers and fast-mode/OAuth branching affect all Claude API paths; incorrect stripping could make auxiliary calls return 400 or cap context at 200K on Bedrock.

Overview
Moves the 1M context beta (context-1m-2025-08-07) into _COMMON_BETAS so Bedrock and non-OAuth fast-mode requests keep full beta headers when per-request extra_headers override client defaults, while native/custom Anthropic clients still omit that beta on the default header (subscriptions without long-context can 400). Azure keeps 1M on the client; MiniMax bearer endpoints strip tool-streaming and 1M betas. Fast mode always re-emits applicable betas; OAuth fast-mode drops the 1M beta; speed: fast and the fast-mode beta apply only on models that support fast mode (e.g. Opus 4.6), so Opus 4.7 fast-mode can retain 1M without invalid speed.

In auxiliary_client, sync/async retries after max_tokens is rejected as unknown/renamed now set max_completion_tokens to the prior cap instead of dropping the limit entirely (ZAI 1210 still strips both).

Reviewed by Cursor Bugbot for commit e7252d4.

Bedrock 1M cluster (3 tests): add context-1m-2025-08-07 to _COMMON_BETAS
so it reaches Bedrock + non-OAuth fast-mode requests, while keeping the
client-level native/custom default header (and OAuth fast-mode) stripping
it for subscriptions that 400 on the long-context beta. Fast-mode now
re-includes the common betas even on models without speed=fast support.

max_tokens cluster (2 tests): the unsupported-parameter retry branch popped
max_tokens but never re-added max_completion_tokens, so providers that
rename the param ("Unknown parameter: max_tokens") lost their cap. Re-add
under the renamed key on retry (sync + async), preserving the ZAI 1210
strip-entirely behavior.

Follows PR #6 (1/N: timezone-independent flow-peak scheduler tests).
Bartok9 merged commit 5d2bb67 into main on May 30, 2026
7 of 8 checks passed
Bartok9 deleted the fix/test-suite-green-2n-agent branch on May 30, 2026 06:56

@github-actions

🔎 Lint report: fix/test-suite-green-2n-agent vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7746 on HEAD, 7746 on base (➖ 0)

🆕 New issues (3):

Rule: invalid-argument-type (count: 3)

First entries:
run_agent.py:6649: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
run_agent.py:12539: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
run_agent.py:12542: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`

✅ Fixed issues (3):

Rule: invalid-argument-type (count: 3)

First entries:
run_agent.py:12542: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:12539: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:6649: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`

Unchanged: 4073 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.
