feat(bedrock): full Converse API support for cross-region inference profiles #12979
Open
bbo268 wants to merge 5 commits into
- Add inject_cache_points() to insert cachePoint blocks for Bedrock Converse API prompt caching (system_and_3 strategy, up to 4 breakpoints)
- Add _model_supports_prompt_caching() allowlist (Claude family only)
- Fix image base64 double-encoding: Bedrock expects raw bytes in source.bytes, not the base64 string (boto3 handles wire encoding)
- Increase bedrock-runtime client timeout from 60s to 300s with adaptive retries (large models with 500K+ context can have TTFT > 60s)
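For reviewers, the cache-point placement described in this commit could look roughly like the sketch below. It is not the code from bedrock_adapter.py: the function name inject_cache_points_sketch is hypothetical, and the reading of system_and_3 as "one breakpoint after the system prompt, then one after each of the last three messages" is an assumption. The cachePoint block shape is the one used by the Converse API.

```python
from copy import deepcopy
from typing import Any

# Converse API cache marker block.
CACHE_POINT = {"cachePoint": {"type": "default"}}
# Assumed split of the 4 breakpoints: 1 for system, 3 for recent messages.
MESSAGE_BREAKPOINTS = 3


def inject_cache_points_sketch(
    system: list[dict[str, Any]],
    messages: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return copies of system/messages with cachePoint blocks appended.

    Hypothetical helper: the inputs are left unmodified.
    """
    system = deepcopy(system)
    messages = deepcopy(messages)

    # One breakpoint after the system prompt, if there is one.
    if system:
        system.append(deepcopy(CACHE_POINT))

    # Walk backwards so the most recent messages get the remaining breakpoints.
    remaining = MESSAGE_BREAKPOINTS
    for message in reversed(messages):
        if remaining == 0:
            break
        content = message.get("content")
        if isinstance(content, list) and content:
            content.append(deepcopy(CACHE_POINT))
            remaining -= 1

    return system, messages
```

Working on copies keeps the caller's conversation history free of cache markers, so the same history can still be sent through a path that does not support caching.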
…e path
- Replace hardcoded max_tokens=4096 with model-aware output cap from _get_anthropic_max_output() (e.g. 128K for claude-opus-4-7)
- Pass enable_caching flag to build_converse_kwargs() when prompt caching is active
- Extend _anthropic_prompt_cache_policy() to return (True, True) for Bedrock Converse Claude models (previously only native Anthropic)
… API
The AnthropicBedrock SDK does not support cross-region inference profiles (global./us./eu./ap./jp. prefixed model IDs). Route these models through the Converse API path, which handles them natively, while keeping non-prefixed Claude models on the AnthropicBedrock SDK path.
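The routing rule in this commit amounts to a prefix check. A minimal sketch follows, assuming hypothetical function names and return labels (the real logic lives in runtime_provider.py and may differ). Sending non-Claude models to the Converse path is also an assumption; the commit only states what happens to prefixed and non-prefixed Claude models.

```python
# Prefixes listed in the commit message for cross-region inference profiles.
CROSS_REGION_PREFIXES = ("global.", "us.", "eu.", "ap.", "jp.")


def is_cross_region_profile(model_id: str) -> bool:
    """True when the model ID starts with a cross-region profile prefix."""
    return model_id.startswith(CROSS_REGION_PREFIXES)


def choose_bedrock_path(model_id: str) -> str:
    """Pick a request path for a Bedrock model ID.

    The returned labels are illustrative, not identifiers from the codebase.
    """
    if is_cross_region_profile(model_id):
        return "converse"
    if model_id.startswith("anthropic.claude"):
        # Non-prefixed Claude models stay on the AnthropicBedrock SDK path.
        return "anthropic_bedrock_sdk"
    # Assumption: everything else also goes through Converse.
    return "converse"
```

For example, choose_bedrock_path("global.anthropic.claude-opus-4-6-v1") selects the Converse path, while the same ID without the global. prefix stays on the SDK path.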
normalize_usage() was only matching api_mode=='anthropic_messages' for cache metrics extraction. Bedrock Converse returns the same cache_read_input_tokens / cache_creation_input_tokens fields but was falling through to the generic OpenAI else branch, zeroing out all cache hit/miss statistics.
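The fix can be pictured as widening the set of API modes that take the Anthropic-style branch. The sketch below is a simplified stand-in for normalize_usage(): the function name, the output keys, and the input_tokens / output_tokens / prompt_tokens / completion_tokens field names are assumptions; only the two cache field names and the two api_mode values come from this PR.

```python
from typing import Any

# Modes whose usage payload carries Anthropic-style cache counters.
ANTHROPIC_STYLE_MODES = {"anthropic_messages", "bedrock_converse"}


def normalize_usage_sketch(usage: dict[str, Any], api_mode: str) -> dict[str, int]:
    """Map a provider usage payload onto one common shape (hypothetical)."""

    def count(key: str) -> int:
        # Treat missing or null counters as zero.
        return int(usage.get(key) or 0)

    if api_mode in ANTHROPIC_STYLE_MODES:
        return {
            "input_tokens": count("input_tokens"),
            "output_tokens": count("output_tokens"),
            "cache_read_tokens": count("cache_read_input_tokens"),
            "cache_write_tokens": count("cache_creation_input_tokens"),
        }

    # Generic OpenAI-style branch: no cache counters are read here.
    return {
        "input_tokens": count("prompt_tokens"),
        "output_tokens": count("completion_tokens"),
        "cache_read_tokens": 0,
        "cache_write_tokens": 0,
    }
```

Before the fix, bedrock_converse would have fallen into the second branch, which is why cache statistics came out as zero.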
Add _BedrockCompletionsAdapter and BedrockAuxiliaryClient so that auxiliary tasks (context compression, session search, web extract, vision) can use Bedrock-hosted models without a separate API key. Uses the same AWS credential chain as the main model. Also add provider aliases: aws, aws-bedrock, amazon-bedrock, amazon → bedrock.
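The provider aliases added here behave like a small lookup table. A sketch, assuming a hypothetical helper name (the actual resolution code in auxiliary_client.py is not shown in this PR page) and assuming lookups are case-insensitive:

```python
# Alias → canonical provider name, as listed in the commit message.
PROVIDER_ALIASES = {
    "aws": "bedrock",
    "aws-bedrock": "bedrock",
    "amazon-bedrock": "bedrock",
    "amazon": "bedrock",
}


def canonical_provider(name: str) -> str:
    """Resolve a configured provider name to its canonical form.

    Unknown names pass through unchanged (after trimming and lowercasing).
    """
    key = name.strip().lower()
    return PROVIDER_ALIASES.get(key, key)
```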
This was referenced Apr 24, 2026
Summary
This PR adds complete Bedrock Converse API support for cross-region inference profiles (global./us./eu./ap./jp. prefixed model IDs), which are not supported by the AnthropicBedrock SDK.

Problem
The AnthropicBedrock SDK does not recognize cross-region inference profile model IDs (e.g. global.anthropic.claude-opus-4-6-v1, us.anthropic.claude-sonnet-4-6-v1). These profiles are AWS's recommended way to get automatic cross-region routing for higher availability and throughput. Users who configure these model IDs get errors because the SDK rejects the model ID format.

The Converse API path existed but was incomplete: it lacked prompt caching, hardcoded max_tokens to 4096, double-encoded images, and had no auxiliary client.

Changes
Route cross-region profiles through Converse API (runtime_provider.py)
- Model IDs with a cross-region prefix (global./us./eu./ap./jp.) now route through the Converse API

Prompt caching for Converse API (bedrock_adapter.py, run_agent.py)
- Add inject_cache_points() with native Bedrock cachePoint blocks
- system_and_3 strategy (up to 4 breakpoints) matching the Anthropic native path
- Extend _anthropic_prompt_cache_policy() to cover Bedrock Converse Claude models

Fix image encoding (bedrock_adapter.py)
- Pass raw bytes in source.bytes (boto3 handles base64 on the wire)

Dynamic max_tokens (run_agent.py)
- Replace hardcoded max_tokens=4096 with model-aware output cap from _get_anthropic_max_output()

Client timeout and retries (bedrock_adapter.py)
- Increase bedrock-runtime client timeout from 60s to 300s with adaptive retries

Fix usage metrics (usage_pricing.py)
- Handle bedrock_converse in Anthropic-style usage parsing

Bedrock auxiliary client (auxiliary_client.py)
- Add _BedrockCompletionsAdapter and BedrockAuxiliaryClient for auxiliary tasks
- Provider aliases: aws/aws-bedrock/amazon-bedrock/amazon → bedrock

Testing
Tested in production with:
- global.anthropic.claude-opus-4-6-v1 (main model)
- global.anthropic.claude-sonnet-4-6-v1 (compression)
- global.anthropic.claude-haiku-4-5-20251001-v1:0 (auxiliary)

Verified:
Breaking Changes
None. This is purely additive: existing configurations using non-prefixed Bedrock model IDs are unaffected.