feat(bedrock): full Converse API support for cross-region inference profiles #12979

Open
bbo268 wants to merge 5 commits into NousResearch:main from bbo268:feat/bedrock-converse-full-support
bbo268 commented Apr 20, 2026

Summary

This PR adds complete Bedrock Converse API support for cross-region inference profiles (global./us./eu./ap./jp. prefixed model IDs), which are not supported by the AnthropicBedrock SDK.

Problem

The AnthropicBedrock SDK does not recognize cross-region inference profile model IDs (e.g. global.anthropic.claude-opus-4-6-v1, us.anthropic.claude-sonnet-4-6-v1). These profiles are AWS's recommended way to get automatic cross-region routing for higher availability and throughput. Users who configure these model IDs get errors because the SDK rejects the model ID format.
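
To make the ID format concrete, here is a minimal sketch of how a prefixed profile ID could be told apart from a plain Bedrock model ID and routed according to the rule this PR describes. The function and constant names are hypothetical and are not taken from runtime_provider.py; only the prefix list and the routing rule come from the PR text.

```python
# Hypothetical helper names; only the prefixes and the routing rule
# are taken from the PR description.
CROSS_REGION_PREFIXES = ("global.", "us.", "eu.", "ap.", "jp.")


def is_cross_region_profile(model_id: str) -> bool:
    """True if the ID starts with one of the cross-region profile prefixes."""
    return model_id.startswith(CROSS_REGION_PREFIXES)


def is_claude_model(model_id: str) -> bool:
    """True if the ID names an Anthropic Claude model on Bedrock."""
    return "anthropic.claude" in model_id


def choose_bedrock_path(model_id: str) -> str:
    """Return which client path a model ID should take.

    Non-prefixed Claude models stay on the AnthropicBedrock SDK;
    everything else (prefixed Claude, non-Claude) goes through Converse.
    """
    if is_claude_model(model_id) and not is_cross_region_profile(model_id):
        return "anthropic_bedrock_sdk"
    return "converse"


if __name__ == "__main__":
    for mid in (
        "global.anthropic.claude-opus-4-6-v1",
        "anthropic.claude-haiku-4-5-20251001-v1:0",
    ):
        print(mid, "->", choose_bedrock_path(mid))
```

A plain prefix check is enough here because the prefixes always appear at the start of the ID; the real implementation may use a different test.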

The Converse API path already existed but was incomplete: it lacked prompt caching, hardcoded max_tokens to 4096, double-encoded images, and had no auxiliary client.

Changes

  1. Route cross-region profiles through Converse API (runtime_provider.py)

    • Non-prefixed Claude models continue to use AnthropicBedrock SDK
    • Cross-region prefixed Claude models (global./us./eu./ap./jp.) now route through Converse API
    • Non-Claude models continue to use Converse API
  2. Prompt caching for Converse API (bedrock_adapter.py, run_agent.py)

    • Implement inject_cache_points() with native Bedrock cachePoint blocks
    • Uses system_and_3 strategy (up to 4 breakpoints) matching the Anthropic native path
    • Enable caching policy for Bedrock Claude models in _anthropic_prompt_cache_policy()
  3. Fix image encoding (bedrock_adapter.py)

    • Bedrock Converse expects raw bytes in source.bytes (boto3 handles base64 on the wire)
    • Previous code passed the base64 string directly, causing double-encoding and "Could not process image" errors
  4. Dynamic max_tokens (run_agent.py)

    • Replace hardcoded max_tokens=4096 with model-aware output cap from _get_anthropic_max_output()
    • Prevents silent truncation of long responses (e.g. Opus 4.7 supports 128K output)
  5. Client timeout and retries (bedrock_adapter.py)

    • Increase read_timeout from 60s to 300s (large models with 500K+ context can take more than 60s to return the first token)
    • Add adaptive retry config (3 attempts)
  6. Fix usage metrics (usage_pricing.py)

    • Include bedrock_converse in Anthropic-style usage parsing
    • Previously cache_read/cache_creation metrics were zeroed out for Converse mode
  7. Bedrock auxiliary client (auxiliary_client.py)

    • Add _BedrockCompletionsAdapter and BedrockAuxiliaryClient for auxiliary tasks
    • Allows auxiliary tasks (context compression, session search, web extract, vision) to use Bedrock-hosted models
    • Same AWS credential chain, no separate API key needed
    • Add provider aliases: aws/aws-bedrock/amazon-bedrock/amazonbedrock
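
As an illustration of change 2, the following is a rough sketch of cache point injection for Converse-style requests. It is not the PR's inject_cache_points(): the function name and signature are hypothetical, and it assumes that "system_and_3" means one breakpoint after the system prompt plus one on each of the last three messages. The `{"cachePoint": {"type": "default"}}` block is the Converse API's cache point content block.

```python
import copy

# Converse API content block that marks a prompt-cache breakpoint.
CACHE_POINT = {"cachePoint": {"type": "default"}}


def inject_cache_points_sketch(system, messages, max_breakpoints=4):
    """Return copies of (system, messages) with cache points appended.

    Assumed strategy: one breakpoint after the system blocks, then one on
    each of the most recent messages until the budget is used up.
    """
    system = copy.deepcopy(system)
    messages = copy.deepcopy(messages)
    remaining = max_breakpoints

    # Breakpoint 1: end of the system prompt, if there is one.
    if system and remaining > 0:
        system.append(copy.deepcopy(CACHE_POINT))
        remaining -= 1

    # Remaining breakpoints: walk backwards from the newest message.
    for message in reversed(messages):
        if remaining == 0:
            break
        content = message.get("content")
        if isinstance(content, list) and content:
            content.append(copy.deepcopy(CACHE_POINT))
            remaining -= 1

    return system, messages
```

Working on deep copies keeps the caller's conversation history free of cache markers, which matters if the same history is later sent through a path that does not understand cachePoint blocks.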

Testing

Tested in production with:

  • global.anthropic.claude-opus-4-6-v1 (main model)
  • global.anthropic.claude-sonnet-4-6-v1 (compression)
  • global.anthropic.claude-haiku-4-5-20251001-v1:0 (auxiliary)

Verified:

  • ✅ Prompt caching active (~75% input cost reduction on multi-turn conversations)
  • ✅ Images processed correctly (vision tool works)
  • ✅ Long outputs not truncated (tested 8K+ token responses)
  • ✅ Cache metrics properly reported in usage stats
  • ✅ Auxiliary tasks (compression, session search) work via Bedrock
  • ✅ Non-cross-region Claude models still route through AnthropicBedrock SDK (backwards compatible)

Breaking Changes

None. This change is purely additive: existing configurations that use non-prefixed Bedrock model IDs are unaffected.

bbo268 added 5 commits April 20, 2026 11:07
- Add inject_cache_points() to insert cachePoint blocks for Bedrock
  Converse API prompt caching (system_and_3 strategy, up to 4 breakpoints)
- Add _model_supports_prompt_caching() allowlist (Claude family only)
- Fix image base64 double-encoding: Bedrock expects raw bytes in
  source.bytes, not the base64 string (boto3 handles wire encoding)
- Increase bedrock-runtime client timeout from 60s to 300s with adaptive
  retries (large models with 500K+ context can have TTFT > 60s)
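
The image fix in this commit can be sketched as follows. The helper name is hypothetical and the data-URL handling is an assumption about what callers might pass in; the block shape (`image.format` plus `image.source.bytes`) is the Converse API's image content block, and boto3 serializes the bytes for the wire.

```python
import base64


def to_converse_image_block(b64_data: str, image_format: str = "png") -> dict:
    """Build a Converse image block from base64 text (hypothetical helper).

    The key step is decoding to raw bytes before handing the block to
    boto3. Passing the base64 string through unchanged would cause it to
    be encoded a second time on the wire.
    """
    # Assumption: input may be a data URL such as "data:image/png;base64,...".
    if b64_data.startswith("data:"):
        b64_data = b64_data.split(",", 1)[1]

    raw_bytes = base64.b64decode(b64_data)
    return {"image": {"format": image_format, "source": {"bytes": raw_bytes}}}
```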
…e path

- Replace hardcoded max_tokens=4096 with model-aware output cap from
  _get_anthropic_max_output() (e.g. 128K for claude-opus-4-7)
- Pass enable_caching flag to build_converse_kwargs() when prompt
  caching is active
- Extend _anthropic_prompt_cache_policy() to return (True, True) for
  Bedrock Converse Claude models (was previously only native Anthropic)
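
A model-aware output cap could look roughly like the sketch below. This is not _get_anthropic_max_output(): the table, the function names, and the fallback are hypothetical, the only figure taken from this PR is the 128K cap for claude-opus-4-7, and treating "128K" as 128,000 tokens is an assumption. `inferenceConfig.maxTokens` is the Converse API field that carries the limit.

```python
# Hypothetical lookup table. Only the Opus 4.7 entry reflects a figure
# stated in the PR; "128K" is assumed here to mean 128_000 tokens.
_OUTPUT_CAPS = {"claude-opus-4-7": 128_000}

# Assumed conservative default: the value that used to be hardcoded.
_FALLBACK_MAX_TOKENS = 4096


def max_output_tokens(model_id: str) -> int:
    """Return the output cap for a model ID, matching on a substring."""
    for name, cap in _OUTPUT_CAPS.items():
        if name in model_id:
            return cap
    return _FALLBACK_MAX_TOKENS


def build_inference_config(model_id: str) -> dict:
    """Build the inferenceConfig portion of a Converse request."""
    return {"inferenceConfig": {"maxTokens": max_output_tokens(model_id)}}
```

Substring matching lets the same table entry cover both plain and prefixed IDs for a model, at the cost of needing care if one model name is a prefix of another.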
… API

AnthropicBedrock SDK does not support cross-region inference profiles
(global./us./eu./ap./jp. prefixed model IDs). Route these models
through the Converse API path which handles them natively, while
keeping non-prefixed Claude models on the AnthropicBedrock SDK path.

normalize_usage() was only matching api_mode=='anthropic_messages'
for cache metrics extraction. Bedrock Converse returns the same
cache_read_input_tokens / cache_creation_input_tokens fields but was
falling through to the generic OpenAI else branch, zeroing out all
cache hit/miss statistics.
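
The branch fix described in this commit can be sketched as below. This is an illustration, not the code in usage_pricing.py: the function name, the output keys, and the OpenAI-style field names in the fallback branch are assumptions. The two api_mode values and the cache_read_input_tokens / cache_creation_input_tokens fields come from the commit message.

```python
# Modes whose usage payloads carry Anthropic-style cache fields.
ANTHROPIC_STYLE_MODES = {"anthropic_messages", "bedrock_converse"}


def normalize_usage_sketch(usage: dict, api_mode: str) -> dict:
    """Map a provider usage payload onto one common shape (hypothetical)."""
    if api_mode in ANTHROPIC_STYLE_MODES:
        return {
            "input_tokens": usage.get("input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
            "cache_read_tokens": usage.get("cache_read_input_tokens", 0),
            "cache_write_tokens": usage.get("cache_creation_input_tokens", 0),
        }

    # Generic branch: no cache fields are read, so they stay at zero.
    # Before the fix, bedrock_converse landed here.
    return {
        "input_tokens": usage.get("prompt_tokens", 0),
        "output_tokens": usage.get("completion_tokens", 0),
        "cache_read_tokens": 0,
        "cache_write_tokens": 0,
    }
```
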
Add _BedrockCompletionsAdapter and BedrockAuxiliaryClient so that
auxiliary tasks (context compression, session search, web extract,
vision) can use Bedrock-hosted models without a separate API key.
Uses the same AWS credential chain as the main model.

Also add provider aliases: aws, aws-bedrock, amazon-bedrock, amazon
→ bedrock.
Labels

  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • P2: Medium, degraded but workaround exists
  • provider/bedrock: AWS Bedrock (boto3, IAM)
  • type/feature: New feature or request
