LLM gateways provide a centralized proxy layer between Claude Code and model providers, typically offering:
- Centralized authentication - Single point for API key management
- Usage tracking - Monitor usage across teams and projects
- Cost controls - Implement budgets and rate limits
- Audit logging - Track all model interactions for compliance
- Model routing - Switch between providers without code changes
## Gateway requirements
For an LLM gateway to work with Claude Code, it must meet the following requirements:

### API format

The gateway must expose at least one of the following API formats to clients:

- **Anthropic Messages**: `/v1/messages`, `/v1/messages/count_tokens`
  - Must forward request headers: `anthropic-beta`, `anthropic-version`
- **Bedrock InvokeModel**: `/invoke`, `/invoke-with-response-stream`
  - Must preserve request body fields: `anthropic_beta`, `anthropic_version`
- **Vertex rawPredict**: `:rawPredict`, `:streamRawPredict`, `/count-tokens:rawPredict`
  - Must forward request headers: `anthropic-beta`, `anthropic-version`
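As a quick check of a gateway that exposes the Anthropic Messages format, a request like the one below can be sent through it. The gateway URL, key, and model name are placeholders for your deployment; the two `anthropic-*` headers are the ones the gateway must forward upstream.

```shell
# Hypothetical smoke test against a gateway exposing the Anthropic
# Messages format. GATEWAY_URL and the API key are placeholders.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4000}"
curl -sS --max-time 5 "$GATEWAY_URL/v1/messages" \
  -H "content-type: application/json" \
  -H "x-api-key: ${ANTHROPIC_API_KEY:-sk-placeholder}" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 16,
       "messages": [{"role": "user", "content": "ping"}]}' \
  || echo "gateway not reachable at $GATEWAY_URL"
```

A successful response confirms the path is routed; to verify header forwarding you would inspect what the gateway actually sends upstream, for example in its debug logs.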
Claude Code determines which features to enable based on the API format. When using the Anthropic Messages format with Bedrock or Vertex, you may need to set the environment variable `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`.

Claude Code adds the following header to its requests:

| Header | Description |
|---|---|
| `X-Claude-Code-Session-Id` | A unique identifier for the current Claude Code session. Proxies can use this to aggregate all API requests from a single session without parsing the request body. |

Set `CLAUDE_CODE_ATTRIBUTION_HEADER=0` to omit it.
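For example, a proxy that logs the session header next to each request path (a made-up log format, for illustration only) can aggregate per-session request counts with standard tools:

```shell
# Hypothetical access-log lines of the form "<session-id> <path>".
# Counting requests per X-Claude-Code-Session-Id value groups all
# traffic from one Claude Code session without touching request bodies.
printf '%s\n' \
  'f3a1b2c4 /v1/messages' \
  'f3a1b2c4 /v1/messages/count_tokens' \
  '9d8e7f6a /v1/messages' \
  | awk '{print $1}' | sort | uniq -c | sort -rn
```

This prints each session ID with its request count, busiest session first.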
## Configuration

### Model selection
By default, Claude Code uses standard model names for the selected API format. When `ANTHROPIC_BASE_URL` points at a gateway that exposes the Anthropic Messages format, Claude Code can query the gateway's `/v1/models` endpoint at startup and add the returned models to the `/model` picker. Set `CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1` to enable this. Discovery is off by default so that gateways backed by a shared API key do not surface every model the key can access to every user. Each discovered entry is labeled "From gateway" and uses the `display_name` field from the response when one is provided. This requires Claude Code v2.1.129 or later.
Discovery applies only to the Anthropic Messages format. It does not run for Bedrock or Vertex pass-through endpoints, and it does not run when `ANTHROPIC_BASE_URL` is unset or points at `api.anthropic.com`.

The discovery request authenticates the same way as inference requests: it sends `ANTHROPIC_AUTH_TOKEN` as a bearer token, or `ANTHROPIC_API_KEY` as the `x-api-key` header when no auth token is set, along with any headers from `ANTHROPIC_CUSTOM_HEADERS`. Only models whose ID begins with `claude` or `anthropic` are added to the picker. Results are cached to `~/.claude/cache/gateway-models.json` and refreshed on each startup. If the request fails or the gateway does not implement `/v1/models`, the picker falls back to the cached list from the previous startup or to the built-in model list.
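The ID filter can be illustrated in isolation; the model names below are arbitrary examples of what a shared gateway key might expose:

```shell
# Only IDs beginning with "claude" or "anthropic" pass the discovery
# filter; other models reachable through the gateway key are ignored.
printf '%s\n' \
  'claude-sonnet-4-5' \
  'anthropic.claude-3-5-haiku' \
  'gpt-4o' \
  'llama-3-70b' \
  | grep -E '^(claude|anthropic)'
```

Here only the first two IDs would appear in the `/model` picker.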
If your gateway uses model names that do not match the discovery filter, use the environment variables documented in Model configuration to add them manually.
## LiteLLM configuration

### Prerequisites
- Claude Code updated to the latest version
- LiteLLM Proxy Server deployed and accessible
- Access to Claude models through your chosen provider
### Basic LiteLLM setup
Configure Claude Code to use the proxy by pointing `ANTHROPIC_BASE_URL` at it.

### Authentication methods
#### Static API key
The simplest method uses a fixed API key, which is sent as a bearer token in the `Authorization` header.
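A minimal sketch, assuming a LiteLLM proxy at `http://localhost:4000` and a static key issued by the proxy (both placeholder values):

```shell
# Point Claude Code at the proxy and supply the fixed key; the token is
# sent as the Authorization header on each request.
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key
```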
#### Dynamic API key with helper
For rotating keys or per-user authentication:

1. Create an API key helper script.
2. Configure Claude Code settings to use the helper.
3. Set the token refresh interval.
The helper's output is sent in the `Authorization` and `X-Api-Key` headers. The `apiKeyHelper` setting has lower precedence than `ANTHROPIC_AUTH_TOKEN` or `ANTHROPIC_API_KEY`.
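A sketch of a helper script, assuming the current key lives in a file that a rotation job keeps fresh (the file path and fallback value are made up); point the `apiKeyHelper` setting in `~/.claude/settings.json` at the script and tune how often it is re-run with `CLAUDE_CODE_API_KEY_HELPER_TTL_MS`:

```shell
#!/bin/bash
# Hypothetical key helper: print the current API key on stdout.
# Claude Code runs this script whenever it needs a (refreshed) key;
# swap the file read for a call to your secrets manager as needed.
KEY_FILE="${LITELLM_KEY_FILE:-$HOME/.litellm/current-key}"
if [ -r "$KEY_FILE" ]; then
  cat "$KEY_FILE"
else
  echo "sk-litellm-fallback-key"  # placeholder so the helper never prints nothing
fi
```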
#### Unified endpoint (recommended)
Using LiteLLM's Anthropic-format endpoint gives you:

- Load balancing
- Fallbacks
- Consistent support for cost tracking and end-user tracking
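A sketch of the corresponding configuration, assuming LiteLLM's Anthropic-format route is mounted at `/anthropic` (host, port, path, and key are placeholders; check your LiteLLM deployment for the exact route):

```shell
# Route requests through LiteLLM's Anthropic-format endpoint instead of
# a provider-specific pass-through endpoint.
export ANTHROPIC_BASE_URL=http://localhost:4000/anthropic
export ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key
```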