[internal copy of #28008] Support MCP OAuth passthrough and issuer-scoped JWT auth #28356
Operators upgrading past 35bbca6 (which made /metrics auth default-on) see "Malformed API Key passed in. Ensure Key has 'Bearer ' prefix." with no hint that litellm_settings.require_auth_for_metrics_endpoint: false restores the previous unauthenticated behavior. Append that discovery hint to the existing 401 body so a Prometheus scraper that breaks after upgrade has a clear migration path. No behavior change.
…t-rc2 fix(proxy): point /metrics 401 at the opt-out flag
…o remaining headroom reserve_budget_for_request fell back to reserving the entire remaining team/key/user headroom whenever a request omitted max_tokens, which pinned the spend counter at max_budget for the duration of the in-flight request and false-positive-blocked every concurrent or back-to-back request until the success callback reconciled. Surfaced as an integration-test team being budget-blocked at its $2000 cap while DB spend was $0.144. Switch the missing-max_tokens path to a fixed default of 16384 output tokens (mirrors parallel_request_limiter_v3's DEFAULT_MAX_TOKENS_ESTIMATE precedent), and clamp explicit max_tokens at the model's max_output_tokens for reservation accounting only. The outbound request body is unchanged, so providers see whatever the caller actually sent; only the local integer used to compute reservation cost is bounded. This also prevents a hostile max_tokens=999999999 from inflating one request's reservation up to the entire team headroom. For Opus 4.7 (output $25/M, max_output 128K) on a $2000 budget the worst-case per-request reservation drops from "everything left" to $3.20, raising admittable concurrency from 1 to ~625.
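The reservation arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the proxy's implementation: the helper names `reservation_output_tokens` and `worst_case_reservation_usd` and the constant name are hypothetical. Only the 16384-token default, the clamp at `max_output_tokens`, and the Opus 4.7 figures come from the commit message.

```python
from typing import Optional

# Default from the commit message; the constant name is illustrative.
DEFAULT_OUTPUT_TOKENS_ESTIMATE = 16384


def reservation_output_tokens(
    requested_max_tokens: Optional[int], model_max_output_tokens: int
) -> int:
    """Token count used for reservation accounting only (hypothetical helper).

    The outbound request body is never modified; only this local integer is bounded.
    """
    if requested_max_tokens is None:
        # Missing max_tokens: fall back to the fixed default.
        return DEFAULT_OUTPUT_TOKENS_ESTIMATE
    # Explicit max_tokens: clamp at the model's max_output_tokens.
    return min(requested_max_tokens, model_max_output_tokens)


def worst_case_reservation_usd(tokens: int, output_cost_per_million_usd: float) -> float:
    """Cost reserved upfront for the given number of output tokens."""
    return tokens * output_cost_per_million_usd / 1_000_000


# Figures quoted in the commit message: output $25/M, max_output 128K, $2000 budget.
clamped = reservation_output_tokens(999_999_999, 128_000)
per_request = worst_case_reservation_usd(clamped, 25.0)
admittable = round(2000 / per_request)
```

With these inputs `per_request` works out to $3.20 and `admittable` to 625, matching the numbers quoted in the message.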
Image-generation routes (dall-e-3, flux, etc.) have no per-token output cost so they fell through to the no-reservation read-time-only path. Concurrent image requests against a depleted budget could all pass common_checks (counter exactly at max_budget passes the strict-`>` gate) and reach the provider before reconciliation caught up. Add per-image reservation in _estimate_request_max_cost_for_model: when the model has a per-image cost field, reserve `n × cost_per_image` upfront. The atomic counter increment serializes concurrent admissions, so the second request sees the post-first-reservation counter and raises BudgetExceededError instead of silently leaking through. Both `output_cost_per_image` and `input_cost_per_image` are honored — naming is inconsistent across providers (OpenAI dall-e-3 uses input_cost_per_image, aiml/dall-e-3 uses output_cost_per_image for the same per-generated-image price). Per-pixel pricing (DALL-E 2 size variants) and TTS/STT routes still fall through to read-time enforcement; those are follow-ups.
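A minimal sketch of the admission race described above, assuming an in-memory counter guarded by a lock in place of whatever atomic increment the proxy actually uses. `SpendCounterSketch` and `reserve_images` are hypothetical names, and `BudgetExceededError` is redefined locally so the block is self-contained.

```python
import threading


class BudgetExceededError(Exception):
    """Local stand-in for the proxy's budget error."""


class SpendCounterSketch:
    """In-memory stand-in for the shared spend counter (an assumption of this sketch)."""

    def __init__(self, max_budget: float, current_spend: float = 0.0) -> None:
        self.max_budget = max_budget
        self.spend = current_spend
        self._lock = threading.Lock()

    def reserve(self, amount: float) -> float:
        # Check and increment inside one critical section so that
        # concurrent admissions are serialized.
        with self._lock:
            if self.spend > self.max_budget:  # strict ">" gate
                raise BudgetExceededError(
                    f"spend {self.spend} exceeds max_budget {self.max_budget}"
                )
            self.spend += amount
            return self.spend


def reserve_images(counter: SpendCounterSketch, n: int, cost_per_image: float) -> float:
    """Reserve n x cost_per_image upfront, before the provider call."""
    return counter.reserve(n * cost_per_image)
```

With the counter sitting exactly at `max_budget`, the first call passes the strict gate and pushes the counter above the cap; the next call sees the post-reservation value and is rejected instead of reaching the provider.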
The previous detection treated any model with input_cost_per_image
or output_cost_per_image as image generation. Several chat and
embedding models carry those fields to price multimodal vision input,
not generated images:
- gemini-3.1-pro-preview (mode=chat) has output_cost_per_image=0.00012
alongside input/output token pricing.
- azure/gpt-realtime-* (mode=chat) has input_cost_per_image=5e-6.
- amazon.titan-embed-image-v1 (mode=embedding) has
input_cost_per_image=6e-5.
For these models the image-gen branch fired first and reserved a
fraction of a cent per request, short-circuiting the token-priced
path entirely. Long Gemini chats reserved 1 × $0.00012 instead of
the true token cost.
Gate strictly on mode in {"image_generation", "image_edit"}. All 197
real image_generation entries and all 31 image_edit entries
(Flux Kontext, Stability inpaint/outpaint, etc.) carry the right mode,
so the field-presence fallback was unnecessary.
Adds regression tests for the chat-model-with-image-cost-field case
and for image_edit reservation.
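The mode gate can be illustrated with a small sketch. The function name `per_image_reservation_usd` and the dict-shaped `model_info` are assumptions made for illustration; the mode values and cost field names are the ones named in the message above, and the 0.04 price in the usage lines is a made-up figure.

```python
from typing import Any, Dict, Optional

# Modes that qualify for per-image reservation, per the commit message.
IMAGE_RESERVATION_MODES = {"image_generation", "image_edit"}


def per_image_reservation_usd(model_info: Dict[str, Any], n: int = 1) -> Optional[float]:
    """Return the upfront per-image reservation, or None to fall through
    to the token-priced path (hypothetical helper)."""
    # Gate strictly on mode; the presence of a cost field alone is not enough.
    if model_info.get("mode") not in IMAGE_RESERVATION_MODES:
        return None
    # Providers name the same per-generated-image price differently.
    cost = model_info.get("output_cost_per_image") or model_info.get("input_cost_per_image")
    if not cost:
        return None
    return n * cost


# A chat model that prices vision input must not take the image branch.
chat_model = {"mode": "chat", "output_cost_per_image": 0.00012}
# Illustrative image-generation entry (the price is invented for the example).
image_model = {"mode": "image_generation", "input_cost_per_image": 0.04}

chat_reservation = per_image_reservation_usd(chat_model)
image_reservation = per_image_reservation_usd(image_model, n=2)
```

Returning `None` for non-image modes is the point of the fix: the caller then continues to the token-priced estimate instead of short-circuiting on a fraction of a cent.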
…2-backport fix(proxy): bound budget reservation per request (backport of #27509 to 1.84.0rc2)
Backport of #27241 onto litellm_1.84.0rc2.
The 12 entries in `[project.dependencies]` were exact `==` pins, a side effect of the Poetry -> uv migration. This forces every downstream package that lists litellm as a dependency to downgrade common runtime libraries (openai, pydantic, aiohttp, click, jsonschema, ...) to the exact versions we ship. Switch to lower-bounded ranges with upper bounds where the upstream package is pre-1.0 or has a known breaking-major-version policy.
Reproducibility for our Docker proxy and CI continues to come from `uv.lock`, which is regenerated here as a metadata-only diff.
Conflict resolution vs upstream merge:
- The upstream merge commit also surfaced unrelated context entries (nvidia-riva-client, soundfile/stt-nvidia-riva extra) that exist in staging but not in rc2. Those are not part of #27241's intent and were dropped from the resolution; the rc2 uv.lock keeps its existing entry set, only the 12 specifier strings changed.
- `uv lock --check` passes (392 packages resolved, no drift).
build(packaging): relax core runtime pins to ranges (rc2 backport of #27241)
Our `uv.lock` already resolves jinja2 to 3.1.6, so Docker / CI installs get that version. The `pyproject.toml` floor was lagging at 3.1.0, which means downstream consumers using `--resolution=lowest-direct` or older constraint files can land on 3.1.0-3.1.5 instead of the version we actually test against. Aligns the declared floor with the resolved version so external installers see the same baseline our test matrix exercises. `uv lock` diff is metadata-only (no resolved-version drift).
build(packaging): raise jinja2 floor to 3.1.6 (rc2 backport of #27552)
OpenAPI-generated tools only applied static closure headers and BYOK Authorization via ContextVar. Copy MCPServer.extra_headers from the incoming MCP request into _request_extra_headers (set in server.py before local tool dispatch), merge in openapi_to_mcp_generator via a small helper. OAuth2 M2M: do not forward caller Authorization from raw_headers (same rule as _prepare_mcp_server_headers for managed MCP). Adds TestRequestExtraHeaders and clarifies mcp_server_manager registration comment.
Fixes #26794
Co-authored-by: Cursor <cursoragent@cursor.com>
Greptile: getattr default was redundant; property exists on MCPServer and mcp_server is non-None inside the extra_headers forwarding block.
Co-authored-by: Cursor <cursoragent@cursor.com>
Match the existing MCP invariant in merge_mcp_headers and the managed MCP path: operator-configured static headers always override caller-forwarded headers on name conflict, with case-insensitive comparison so different casing cannot bypass the precedence. _request_auth_header (BYOK) still overrides Authorization last. Addresses Veria review on PR #27383.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
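The precedence rule above (caller-forwarded headers first, operator static headers override on a case-insensitive name match, BYOK Authorization last) can be sketched as below. `merge_headers_sketch` is a hypothetical helper written for illustration; it is not the `merge_mcp_headers` function or the helper added in this change.

```python
from typing import Dict, Mapping, Optional


def merge_headers_sketch(
    caller_headers: Mapping[str, str],
    static_headers: Mapping[str, str],
    byok_authorization: Optional[str] = None,
) -> Dict[str, str]:
    """Merge headers with operator-over-caller precedence (illustrative only)."""
    merged: Dict[str, str] = {}

    def _set(name: str, value: str) -> None:
        # Remove any existing entry whose name differs only by casing,
        # so "x-api-version" cannot coexist with "X-API-Version".
        for existing in [k for k in merged if k.lower() == name.lower()]:
            del merged[existing]
        merged[name] = value

    for name, value in caller_headers.items():
        _set(name, value)
    # Operator-configured static headers win on name conflict.
    for name, value in static_headers.items():
        _set(name, value)
    # BYOK Authorization overrides everything else, applied last.
    if byok_authorization is not None:
        _set("Authorization", byok_authorization)
    return merged
```

Applying the layers in order and deleting case-variant duplicates on every write keeps the result to one entry per header name, which is what prevents a differently-cased caller header from surviving next to the operator's value.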
…onto-1.84.0rc2 cherry-pick: OpenAPI MCP extra_headers (#27383) onto litellm_1.84.0rc2
Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`) were silently dropped unless the key/team had `metadata.allow_client_tags: true` set. Restore the documented behavior: tags from the request always flow into `metadata.tags` and union with any admin-configured static tags from key/team/project metadata. Removes the `allow_client_tags` opt-in flag from the pre-call pipeline. The flag was only ever read here; it has no schema or endpoint footprint, so leftover values in existing key metadata are inert. Test cleanup mirrors the simplification: drop the three tests that verified the strip-when-not-opted-in path, drop the `allow_client_tags` fixture lines from the merge/union tests.
The tag-strip block was removed in the parent commit but two surrounding comments still referenced "tags without opt-in" and "runs AFTER the strip". Update them to describe the remaining user_api_key_* and _pipeline_managed_guardrails strip that the snapshot/merge ordering actually protects against.
fix(proxy): always merge caller-supplied tags into request metadata
…#27762)
Cherry-pick of #27762 onto litellm_1.84.0rc2.
* chore: reject bare str at file-input sinks to prevent local-file read (#27667)
* fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge
- ocr/main.py: widen Path check to os.PathLike for consistency with other sinks
- main.py: bridge condition checks truthiness of reasoning_summary, not just None
* fix: remove unused pathlib.Path import in ocr/main.py
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
…86be cherry-pick: reject bare str at file-input sinks (#27762) onto litellm_1.84.0rc2
LazyFeatureMiddleware compared the raw scope path against registered prefixes (e.g. /policies), so requests under a server root path like /api/v1/policies/... never matched, the feature never loaded, and the endpoint returned 404. Strip the configured root path before matching, normalizing trailing slashes and enforcing a component boundary so /api does not falsely match /apiv2.
SERVER_ROOT_PATH is a process-startup env var. Read it once in __init__ instead of calling get_server_root_path() + rstrip on every request that arrives before all lazy features have loaded.
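A minimal sketch of the root-path stripping with a component boundary, assuming the root path is read once from the `SERVER_ROOT_PATH` environment variable at construction time. The class name `RootPathStripper` is hypothetical, and returning the path unchanged when it is not under the root is an assumption of this sketch rather than documented behavior of LazyFeatureMiddleware.

```python
import os
from typing import Optional


class RootPathStripper:
    """Illustrative helper; not the middleware from this change."""

    def __init__(self, root_path: Optional[str] = None) -> None:
        # Read once at startup instead of on every request.
        raw = root_path if root_path is not None else os.environ.get("SERVER_ROOT_PATH", "")
        # Normalize trailing slashes: "/api/v1/" and "/api/v1" behave the same.
        self.root = raw.rstrip("/")

    def strip(self, path: str) -> str:
        if not self.root:
            return path
        if path == self.root:
            return "/"
        # Component boundary: the root must be followed by "/" so that
        # "/api" does not match "/apiv2/...".
        if path.startswith(self.root + "/"):
            return path[len(self.root):]
        # Assumption: paths outside the root are left untouched.
        return path


stripper = RootPathStripper("/api/v1/")
stripped = stripper.strip("/api/v1/policies/list")
```

After stripping, `stripped` is `/policies/list`, so a registered prefix such as `/policies` can match as it would without a root path.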
…e guards (#27793)
Backport of #27793 onto litellm_1.84.0rc2.
A non-admin caller could rebind their own key's user_id via /key/regenerate. _execute_virtual_key_regeneration had org/team guards but no user_id guard, and prepare_key_update_data did not strip the field, so it survived model_dump(exclude_unset=True) into the Prisma update. On the next request, _return_user_api_key_auth_obj resolved the rebound user_id against litellm_usertable and returned PROXY_ADMIN whenever the target row's user_role was admin.
/key/update had the equivalent guard inline at _validate_update_key_data; extract it to a shared helper _validate_caller_can_change_key_ownership and call from both /key/update and _execute_virtual_key_regeneration.
Also tighten the premium gate that allowed the master-key rotation branch to skip the enterprise check. The previous predicate was a field-presence test, not an identity check. Verify the caller actually holds the master key via _is_master_key before allowing the non-premium path.
Block explicit-null user_id and empty-string user_id as removal attempts; both 403-reject for non-admin callers.
[Fix] Lazy feature loading under SERVER_ROOT_PATH returns 404 (backport of #27812)
…ard_84rc2 [Fix] Backport /key/regenerate ownership-rebind + premium-gate guards (#27793)
Backport of #27866 onto litellm_1.84.0rc2. External readiness probes consumed the legacy detailed payload's `db` field to drive alerting and pod-rotation decisions. Stripping the body to {"status": "healthy"} broke those probes silently — the HTTP code still flipped to 503, but probes checking body.db == "connected" treated the response as healthy. Add `db` back to the unauthenticated payload. The rest of the diagnostic fields (litellm_version, callbacks, cache, log_level) stay behind /health/readiness/details so the recon-leak gate from #26912 holds. Values match the legacy contract: "connected", "disconnected", "Not connected". The 503-on-DB-disconnect behavior from LIT-2607 is preserved.
fix(proxy): expose db status on public /health/readiness (backport #27866)
The proxy moved `litellm_version`, `is_detailed_debug`, and other diagnostic fields off the public `/health/readiness` payload behind an auth-gated `/health/readiness/details` endpoint. The navbar version tag and the detailed-debug-mode banner stopped working because they were still reading those fields from the unauthed response, which no longer contains them. Replace `useHealthReadiness` with a `useHealthReadinessDetails` hook that takes an `accessToken` argument and sends a Bearer header to the auth-gated endpoint. The hook stays disabled while `accessToken` is falsy, so the navbar can keep rendering on the public model hub (where the token is null) without triggering an auth redirect or a 401-loop.
Two small follow-ups on the readiness/details migration:
- Set `retry: false` on the query. The payload feeds a passive navbar tag and a debug banner; a 401 from an expired token shouldn't fan out into three retries against the proxy.
- Add navbar specs that assert the `accessToken` prop is forwarded into the hook (matches the DebugWarningBanner spec). Without this, the navbar could silently regress to passing `undefined` and the existing tests wouldn't catch it.
fix(ui): fetch version + debug flag from /health/readiness/details (backport #27896)
…nt wolfi-base SHA The older sha256:3258be... pin has been intermittently returning 500/not-found from cgr.dev, breaking the test-server-root-path GitHub Action and the build_docker_database_image CircleCI job. Move both Dockerfiles onto the same sha256:31da65... digest already in use by Dockerfile, gateway/Dockerfile, backend/Dockerfile, and migrations/Dockerfile so the base image is consistent across the repo.
Replying to @greptile-apps[bot]'s summary comment (#issuecomment-4499663072) — the lone outstanding concern is not actionable on this branch:
Three things make this a no-op-or-worse change in the PR's configuration:
1.
2. The probe is correctness-critical, not optional. The MCP StreamableHTTP SDK commits
# Start the SSE response (this will send headers immediately)
async with anyio.create_task_group() as tg:
    tg.start_soon(response, scope, receive, send)
    # Then send the message to be processed by the server
    ...
    await writer.send(session_message)
The
3. The probe is already narrowly scoped and bounded. It is a no-op for the typical deployment:
The extra upstream RTT on passthrough servers is the intentional trade-off: the alternative is that a client with an expired upstream token receives a 200 SSE stream with an empty/garbled body and no way to discover that it needs to refresh, which defeats the entire RFC 9728 challenge flow this PR adds. Leaving the probe as-is.
Acknowledged: all three points hold up. The stateless-mode argument alone is sufficient to invalidate the
Concern withdrawn. No action needed.
The previously aligned sha256:31da6565f35a... and the older sha256:3258be... both return HTTP 500 from cgr.dev's manifest endpoint, breaking the build_docker_database_image CircleCI job and test-server-root-path GitHub Action. The current 'latest' tag resolves to sha256:5743937d521c... which serves manifests normally, so move docker/Dockerfile.database and docker/Dockerfile.non_root onto that digest.
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Preemptive 401 fires before toolset scope narrows servers
- Moved _raise_preemptive_401_for_unauthenticated_servers to run after toolset scoping in both handle_streamable_http_mcp and handle_sse_mcp, and added an optional allowed_server_ids parameter so passthrough/oauth2 servers excluded by the active toolset are skipped instead of triggering a spurious OAuth challenge.
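The effect of the optional `allowed_server_ids` parameter can be sketched as a filter applied before deciding which servers warrant an OAuth challenge. Everything below is hypothetical: `ServerSketch`, its `needs_upstream_oauth` flag (standing in for the passthrough/oauth2 detection), and `servers_requiring_challenge` are illustrative names, not the signature of `_raise_preemptive_401_for_unauthenticated_servers`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class ServerSketch:
    server_id: str
    # Stand-in for "passthrough or oauth2 server with no client-supplied auth".
    needs_upstream_oauth: bool


def servers_requiring_challenge(
    servers: Iterable[ServerSketch],
    allowed_server_ids: Optional[Set[str]] = None,
) -> List[str]:
    """Return IDs of servers that should trigger a preemptive 401.

    When allowed_server_ids is given (the active toolset's scope), servers
    outside it are skipped so they cannot cause a spurious challenge.
    """
    return [
        s.server_id
        for s in servers
        if s.needs_upstream_oauth
        and (allowed_server_ids is None or s.server_id in allowed_server_ids)
    ]


servers = [
    ServerSketch("github", needs_upstream_oauth=True),
    ServerSketch("internal-docs", needs_upstream_oauth=False),
    ServerSketch("jira", needs_upstream_oauth=True),
]
unscoped = servers_requiring_challenge(servers)
scoped = servers_requiring_challenge(servers, allowed_server_ids={"internal-docs", "jira"})
```

Running the check after toolset scoping corresponds to the `scoped` call: `github` is excluded by the toolset, so it no longer contributes to the challenge decision.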
Preview (625e3bc526)
diff --git a/docker/Dockerfile.database b/docker/Dockerfile.database
--- a/docker/Dockerfile.database
+++ b/docker/Dockerfile.database
@@ -1,8 +1,8 @@
# Base image for building
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
# Runtime image
-ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a
FROM $UV_IMAGE AS uvbin
@@ -15,16 +15,20 @@
COPY --from=uvbin /uv /usr/local/bin/uv
COPY --from=uvbin /uvx /usr/local/bin/uvx
-RUN apk add --no-cache \
- bash \
- gcc \
- python3 \
- python3-dev \
- openssl \
- openssl-dev \
- nodejs \
- npm \
- libsndfile
+RUN for i in 1 2 3; do \
+ apk add --no-cache \
+ bash \
+ gcc \
+ python3 \
+ python3-dev \
+ openssl \
+ openssl-dev \
+ nodejs \
+ npm \
+ libsndfile && break; \
+ [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+ sleep 5; \
+ done
ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
UV_LINK_MODE=copy \
@@ -66,7 +70,11 @@
USER root
-RUN apk add --no-cache bash openssl tzdata nodejs npm python3 libsndfile && \
+RUN for i in 1 2 3; do \
+ apk add --no-cache bash openssl tzdata nodejs npm python3 libsndfile && break; \
+ [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+ sleep 5; \
+ done && \
npm install -g npm@11.12.1 tar@7.5.11 glob@11.1.0 @isaacs/brace-expansion@5.0.1 minimatch@10.2.4 diff@8.0.3 && \
GLOBAL="$(npm root -g)" && \
find "$GLOBAL/npm" -type d -name "tar" -path "*/node_modules/tar" | while read d; do \
diff --git a/docker/Dockerfile.non_root b/docker/Dockerfile.non_root
--- a/docker/Dockerfile.non_root
+++ b/docker/Dockerfile.non_root
@@ -1,6 +1,6 @@
# Base images
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
-ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
+ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
ARG PROXY_EXTRAS_SOURCE=published
ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a
diff --git a/litellm-proxy-extras/litellm_proxy_extras/migrations/20260526120000_add_oauth_passthrough_to_mcp_servers/migration.sql b/litellm-proxy-extras/litellm_proxy_extras/migrations/20260526120000_add_oauth_passthrough_to_mcp_servers/migration.sql
new file mode 100644
--- /dev/null
+++ b/litellm-proxy-extras/litellm_proxy_extras/migrations/20260526120000_add_oauth_passthrough_to_mcp_servers/migration.sql
@@ -1,0 +1,2 @@
+-- AlterTable
+ALTER TABLE "LiteLLM_MCPServerTable" ADD COLUMN IF NOT EXISTS "oauth_passthrough" BOOLEAN NOT NULL DEFAULT false;
diff --git a/litellm-proxy-extras/litellm_proxy_extras/schema.prisma b/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
--- a/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
+++ b/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
@@ -325,6 +325,7 @@
allow_all_keys Boolean @default(false)
available_on_public_internet Boolean @default(true)
delegate_auth_to_upstream Boolean @default(false)
+ oauth_passthrough Boolean @default(false)
is_byok Boolean @default(false)
byok_description String[] @default([])
byok_api_key_help_url String?
diff --git a/litellm/experimental_mcp_client/client.py b/litellm/experimental_mcp_client/client.py
--- a/litellm/experimental_mcp_client/client.py
+++ b/litellm/experimental_mcp_client/client.py
@@ -421,8 +421,16 @@
return factory
- async def list_tools(self) -> List[MCPTool]:
- """List available tools from the server."""
+ async def list_tools(self, raise_on_error: bool = False) -> List[MCPTool]:
+ """List available tools from the server.
+
+ Args:
+ raise_on_error: When True, re-raise exceptions instead of returning
+ an empty list. Used by the proxy's pass-through MCP flow so it
+ can surface upstream HTTP 401 responses as a proper 401 to the
+ MCP client (triggering the upstream OAuth flow) rather than
+ masking them as "connected, no tools".
+ """
verbose_logger.debug(
f"MCP client listing tools from {self.server_url or 'stdio'}"
)
@@ -458,6 +466,8 @@
"the MCP server may have crashed, disconnected, or timed out"
)
+ if raise_on_error:
+ raise
# Return empty list instead of raising to allow graceful degradation
return []
diff --git a/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py b/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py
--- a/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py
+++ b/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py
@@ -7,6 +7,7 @@
from starlette.types import Scope
from litellm._logging import verbose_logger
+from litellm.constants import DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL
from litellm.proxy._types import (
LiteLLM_TeamTable,
ProxyException,
@@ -14,8 +15,90 @@
UserAPIKeyAuth,
)
from litellm.proxy.auth.user_api_key_auth import user_api_key_auth
+from litellm.proxy.auth.ip_address_utils import IPAddressUtils
+def _parse_mcp_server_names_from_path(
+ path: str, mcp_servers_header: Optional[List[str]] = None
+) -> Optional[List[str]]:
+ """Resolve the single MCP server name a cold-start passthrough bypass may
+ target. Delegates parsing to
+ :meth:`MCPRequestHandler._extract_target_server_names_from_path` so the
+ names used here always match the names downstream routing uses; returns
+ ``None`` whenever the bypass must not activate (aggregate ``/mcp``,
+ multi-server CSV paths, or any other unrecognized path).
+
+ Also fails closed when the ``x-mcp-servers`` header introduces any server
+ not present in the path-derived target set. Downstream routing for
+ ``/mcp/...`` paths overrides the header with path-derived names, but a
+ header/path mismatch here is a sign of a confused or hostile caller —
+ refuse the cold-start bypass rather than admit anonymously based on the
+ path while the header advertises a stricter, non-passthrough target."""
+ servers = MCPRequestHandler._extract_target_server_names_from_path(path)
+ if len(servers) != 1:
+ verbose_logger.debug(
+ "MCP cold-start: path %r resolved to %r; passthrough 401 bypass "
+ "requires exactly one target and will not activate",
+ path,
+ servers,
+ )
+ return None
+ if mcp_servers_header is not None and (set(mcp_servers_header) - set(servers)):
+ verbose_logger.debug(
+ "MCP cold-start: x-mcp-servers header %r introduces target(s) not "
+ "in path-derived set %r; passthrough 401 bypass will not activate",
+ mcp_servers_header,
+ servers,
+ )
+ return None
+ return servers
+
+
+def _is_mcp_passthrough_cold_start(
+ mcp_servers: Optional[List[str]], client_ip: Optional[str]
+) -> bool:
+ """True only when EVERY targeted server is a pass-through server with no
+ auth headers — the cold-start OAuth discovery case per RFC 9728 / MCP
+ Authorization spec. Lets the route handler's 401 emitter produce the
+ spec-compliant WWW-Authenticate challenge instead of surfacing a generic
+ admission error.
+
+ Uses "all" semantics (mirrors :meth:`MCPRequestHandler._target_servers_use_oauth2`):
+ one non-passthrough target in a co-targeted set must not flip the bypass
+ open for the others. Fails closed when any target cannot be resolved."""
+ if not mcp_servers:
+ return False
+ from litellm.proxy._experimental.mcp_server.mcp_server_manager import (
+ global_mcp_server_manager,
+ )
+
+ for name in mcp_servers:
+ server = global_mcp_server_manager.get_mcp_server_by_name(
+ name, client_ip=client_ip
+ )
+ if server is None or not getattr(server, "is_oauth_passthrough", False):
+ return False
+ return True
+
+
+def _is_litellm_auth_admission_error(exc: Exception) -> bool:
+ if isinstance(exc, HTTPException):
+ return exc.status_code == 401
+ if isinstance(exc, ProxyException):
+ try:
+ return int(exc.code) == 401
+ except (TypeError, ValueError):
+ return False
+ return False
+
+
+def _has_client_supplied_mcp_auth(
+ mcp_auth_header: Optional[str],
+ mcp_server_auth_headers: Optional[Dict[str, Dict[str, str]]],
+) -> bool:
+ return bool(mcp_auth_header) or bool(mcp_server_auth_headers)
+
+
class MCPRequestHandler:
"""
Class to handle MCP request processing, including:
@@ -37,7 +120,7 @@
LITELLM_MCP_ACCESS_GROUPS_HEADER_NAME = SpecialHeaders.mcp_access_groups.value
@staticmethod
- async def process_mcp_request(
+ async def process_mcp_request( # noqa: PLR0915
scope: Scope,
) -> Tuple[
UserAPIKeyAuth,
@@ -130,7 +213,9 @@
elif (
not litellm_api_key
and MCPRequestHandler._target_servers_delegate_auth_to_upstream( # noqa: E501
- path=request_route, mcp_servers=mcp_servers
+ path=request_route,
+ mcp_servers=mcp_servers,
+ client_ip=IPAddressUtils.get_mcp_client_ip(request),
)
):
# Operator opted this oauth2 server into upstream-delegated auth
@@ -172,25 +257,87 @@
# than coercing (``int("None")`` would raise ValueError and
# rewrite the auth error as a 500).
status = e.status_code if isinstance(e, HTTPException) else e.code
- if status in (
- 401,
- 403,
- "401",
- "403",
- ) and MCPRequestHandler._target_servers_use_oauth2(
- path=request_route, mcp_servers=mcp_servers
+ is_auth_error = status in (401, 403, "401", "403")
+ is_unauthenticated = status in (401, "401")
+ client_ip = IPAddressUtils.get_mcp_client_ip(request)
+ if is_auth_error and MCPRequestHandler._target_servers_use_oauth2(
+ path=request_route,
+ mcp_servers=mcp_servers,
+ client_ip=client_ip,
):
verbose_logger.debug(
"MCP OAuth2: target server is OAuth2-mode, treating "
"Authorization as upstream OAuth2 token passthrough"
)
validated_user_api_key_auth = UserAPIKeyAuth()
+ elif is_unauthenticated:
+ # Pass-through cold-start return: per RFC 9728 / MCP
+ # Authorization spec the client completes upstream OAuth
+ # discovery and returns with ``Authorization: Bearer
+ # <upstream-token>``. For ``auth_type=none`` passthrough
+ # servers that bearer is not a LiteLLM key (auth above
+ # failed) but is meant to be forwarded upstream
+ # unchanged. Fall back to anonymous admission so the
+ # caller is not rejected for following the discovery
+ # flow without also setting ``x-litellm-api-key``.
+ # Only trigger on 401 (token unrecognized); a 403 means
+ # the key WAS recognized but is forbidden (e.g. over
+ # budget / rate limited) and must propagate so those
+ # controls are not bypassed via anonymous admission.
+ mcp_servers_from_path = _parse_mcp_server_names_from_path(
+ request_route, mcp_servers
+ )
+ if (
+ mcp_servers_from_path is not None
+ and not _has_client_supplied_mcp_auth(
+ mcp_auth_header,
+ mcp_server_auth_headers,
+ )
+ and _is_mcp_passthrough_cold_start(
+ mcp_servers_from_path, client_ip=client_ip
+ )
+ ):
+ verbose_logger.debug(
+ "MCP pass-through return: target server is "
+ "passthrough, treating Authorization as "
+ "upstream OAuth token for delegated auth"
+ )
+ validated_user_api_key_auth = UserAPIKeyAuth()
+ else:
+ raise
else:
raise
else:
- validated_user_api_key_auth = await user_api_key_auth(
- api_key=litellm_api_key, request=request
- )
+ try:
+ validated_user_api_key_auth = await user_api_key_auth(
+ api_key=litellm_api_key, request=request
+ )
+ except (HTTPException, ProxyException) as exc:
+ # Cold-start MCP OAuth discovery: RFC 9728 / MCP Authorization spec
+ # require unauthenticated requests to protected resources to receive
+ # 401 + WWW-Authenticate. Defer to _raise_preemptive_401_for_unauthenticated_servers
+ # for pass-through servers instead of surfacing a generic admission error.
+ mcp_servers_from_path = _parse_mcp_server_names_from_path(
+ request_route, mcp_servers
+ )
+ client_ip = IPAddressUtils.get_mcp_client_ip(request)
+ if (
+ mcp_servers_from_path is not None
+ and not _has_client_supplied_mcp_auth(
+ mcp_auth_header,
+ mcp_server_auth_headers,
+ )
+ and _is_litellm_auth_admission_error(exc)
+ and _is_mcp_passthrough_cold_start(
+ mcp_servers_from_path, client_ip=client_ip
+ )
+ ):
+ verbose_logger.debug(
+ "MCP pass-through cold start: deferring admission to route 401 emitter"
+ )
+ validated_user_api_key_auth = UserAPIKeyAuth()
+ else:
+ raise
return (
validated_user_api_key_auth,
@@ -262,7 +409,9 @@
return [servers_and_path]
@staticmethod
- def _target_servers_use_oauth2(path: str, mcp_servers: Optional[List[str]]) -> bool:
+ def _target_servers_use_oauth2(
+ path: str, mcp_servers: Optional[List[str]], client_ip: Optional[str]
+ ) -> bool:
"""
True only when EVERY MCP server the request targets is configured for
``auth_type == oauth2``. If any target is non-OAuth2 — or if the target
@@ -291,14 +440,16 @@
return False
for name in target_names:
- server = global_mcp_server_manager.get_mcp_server_by_name(name)
+ server = global_mcp_server_manager.get_mcp_server_by_name(
+ name, client_ip=client_ip
+ )
if server is None or server.auth_type != MCPAuth.oauth2:
return False
return True
@staticmethod
def _target_servers_delegate_auth_to_upstream(
- path: str, mcp_servers: Optional[List[str]]
+ path: str, mcp_servers: Optional[List[str]], client_ip: Optional[str]
) -> bool:
"""
True only when EVERY MCP server the request targets is configured for
@@ -328,7 +479,9 @@
return False
for name in target_names:
- server = global_mcp_server_manager.get_mcp_server_by_name(name)
+ server = global_mcp_server_manager.get_mcp_server_by_name(
+ name, client_ip=client_ip
+ )
if server is None or server.auth_type != MCPAuth.oauth2:
return False
# `is True` is intentional: opt-in must be an explicit boolean
@@ -1035,22 +1188,21 @@
)
return []
- # Sentinel stored in cache when an org has no object_permission, so we
- # don't re-query the DB on every MCP request for that org.
- _ORG_NO_PERMISSION_SENTINEL = "__org_no_mcp_permission__"
-
@staticmethod
async def _get_org_object_permission(
user_api_key_auth: Optional[UserAPIKeyAuth] = None,
):
"""
- Get org object_permission, using user_api_key_cache to avoid DB hits on every request.
-
- Caches both positive results and the absence of an object_permission so that orgs
- with no MCP permissions configured (the common default) do not trigger a DB query
- on every request.
+ Get org object_permission via the established ``get_org_object`` /
+ ``get_object_permission`` helpers so MCP requests share the same
+ ``user_api_key_cache`` entries as the rest of the proxy.
"""
- from litellm.proxy.proxy_server import prisma_client, user_api_key_cache
+ from litellm.proxy.auth.auth_checks import get_object_permission, get_org_object
+ from litellm.proxy.proxy_server import (
+ prisma_client,
+ proxy_logging_obj,
+ user_api_key_cache,
+ )
if not user_api_key_auth or not user_api_key_auth.org_id:
return None
@@ -1059,45 +1211,25 @@
verbose_logger.debug("prisma_client is None")
return None
- org_id = user_api_key_auth.org_id
- cache_key = f"org_object_permission:{org_id}"
-
- from litellm.proxy._types import LiteLLM_ObjectPermissionTable
-
try:
- cached = await user_api_key_cache.async_get_cache(key=cache_key)
- if cached is not None:
- # Sentinel means the DB confirmed no object_permission for this org
- if cached == MCPRequestHandler._ORG_NO_PERMISSION_SENTINEL:
- return None
- # Redis deserialises to a plain dict; reconstruct the Pydantic model
- # so callers can access .mcp_servers / .mcp_tool_permissions as attrs.
- if isinstance(cached, dict):
- return LiteLLM_ObjectPermissionTable(**cached)
- return cached
-
- org_row = await prisma_client.db.litellm_organizationtable.find_unique(
- where={"organization_id": org_id},
- include={"object_permission": True},
+ org_obj = await get_org_object(
+ org_id=user_api_key_auth.org_id,
+ prisma_client=prisma_client,
+ user_api_key_cache=user_api_key_cache,
+ parent_otel_span=user_api_key_auth.parent_otel_span,
+ proxy_logging_obj=proxy_logging_obj,
)
- if org_row is None or org_row.object_permission is None:
- # Cache the negative result so subsequent calls skip the DB
- await user_api_key_cache.async_set_cache(
- key=cache_key,
- value=MCPRequestHandler._ORG_NO_PERMISSION_SENTINEL,
- )
+ if org_obj is None or not org_obj.object_permission_id:
return None
- # Convert raw Prisma model → Pydantic before caching. Caching the
- # Pydantic .dict() ensures the value survives a Redis JSON round-trip
- # as a plain dict that we can reconstruct above (same pattern used by
- # get_end_user_object / get_team_object in auth_checks.py).
- obj_perm = LiteLLM_ObjectPermissionTable(**org_row.object_permission.dict())
- await user_api_key_cache.async_set_cache(
- key=cache_key, value=obj_perm.dict()
+ return await get_object_permission(
+ object_permission_id=org_obj.object_permission_id,
+ prisma_client=prisma_client,
+ user_api_key_cache=user_api_key_cache,
+ parent_otel_span=user_api_key_auth.parent_otel_span,
+ proxy_logging_obj=proxy_logging_obj,
)
- return obj_perm
except Exception as e:
verbose_logger.warning(f"Failed to get org object permission: {str(e)}")
return None
@@ -1218,16 +1350,26 @@
)
return []
+ # Sentinel stored in cache when an agent has no object_permission, so we
+ # don't re-query the DB on every MCP request for that agent.
+ _AGENT_NO_PERMISSION_SENTINEL = "__agent_no_mcp_permission__"
+
@staticmethod
async def _get_agent_object_permission(
user_api_key_auth: Optional[UserAPIKeyAuth] = None,
):
"""
- Fetch the agent's object_permission from the DB (single query).
-
- Returns the object_permission object or None.
+ Get agent object_permission via the established ``get_object_permission``
+ helper. Caches the ``agent_id -> object_permission_id`` mapping so we
+ avoid re-reading the agent row on every request, and reuses the shared
+ ``object_permission_id`` cache populated by the org / team / key paths.
"""
- from litellm.proxy.proxy_server import prisma_client
+ from litellm.proxy.auth.auth_checks import get_object_permission
+ from litellm.proxy.proxy_server import (
+ prisma_client,
+ proxy_logging_obj,
+ user_api_key_cache,
+ )
if not user_api_key_auth or not user_api_key_auth.agent_id:
return None
@@ -1236,15 +1378,42 @@
verbose_logger.debug("prisma_client is None")
return None
+ agent_id = user_api_key_auth.agent_id
+ cache_key = f"agent_object_permission_id:{agent_id}"
+
try:
- agent_row = await prisma_client.db.litellm_agentstable.find_unique(
- where={"agent_id": user_api_key_auth.agent_id},
- include={"object_permission": True},
+ object_permission_id: Optional[str] = (
+ await user_api_key_cache.async_get_cache(key=cache_key)
)
- if agent_row is None or agent_row.object_permission is None:
+
+ if object_permission_id == MCPRequestHandler._AGENT_NO_PERMISSION_SENTINEL:
return None
- return agent_row.object_permission
+ if object_permission_id is None:
+ agent_row = await prisma_client.db.litellm_agentstable.find_unique(
+ where={"agent_id": agent_id},
+ )
+ object_permission_id = (
+ getattr(agent_row, "object_permission_id", None)
+ if agent_row is not None
+ else None
+ )
+ await user_api_key_cache.async_set_cache(
+ key=cache_key,
+ value=object_permission_id
+ or MCPRequestHandler._AGENT_NO_PERMISSION_SENTINEL,
+ ttl=DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL,
+ )
+ if not object_permission_id:
+ return None
+
+ return await get_object_permission(
+ object_permission_id=object_permission_id,
+ prisma_client=prisma_client,
+ user_api_key_cache=user_api_key_cache,
+ parent_otel_span=user_api_key_auth.parent_otel_span,
+ proxy_logging_obj=proxy_logging_obj,
+ )
except Exception as e:
verbose_logger.warning(f"Failed to get agent object permission: {str(e)}")
return None
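For reviewers, the agent path above boils down to caching an `agent_id -> object_permission_id` mapping and storing a sentinel for "no permission". The following is a minimal sketch of that pattern only, assuming a plain dict in place of `user_api_key_cache` and a hypothetical `load_permission_id` callable in place of the Prisma lookup. TTL handling and the follow-up `get_object_permission` call are omitted.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# Same sentinel value as the diff; everything else here is illustrative.
NO_PERMISSION_SENTINEL = "__agent_no_mcp_permission__"


async def resolve_permission_id(
    agent_id: str,
    cache: Dict[str, str],
    load_permission_id: Callable[[str], Awaitable[Optional[str]]],
) -> Optional[str]:
    """Return the agent's object_permission_id, reading the row at most once."""
    cache_key = f"agent_object_permission_id:{agent_id}"
    cached = cache.get(cache_key)
    if cached == NO_PERMISSION_SENTINEL:
        # A previous lookup confirmed there is nothing to find.
        return None
    if cached is not None:
        return cached
    permission_id = await load_permission_id(agent_id)
    # Store the sentinel instead of None so a miss is also a cache hit next time.
    cache[cache_key] = permission_id or NO_PERMISSION_SENTINEL
    return permission_id or None


async def _demo() -> Tuple[List[Optional[str]], List[str]]:
    lookups: List[str] = []

    async def fake_loader(agent_id: str) -> Optional[str]:
        # Hypothetical stand-in for the agent-row query.
        lookups.append(agent_id)
        return {"agent-a": "perm-1"}.get(agent_id)

    cache: Dict[str, str] = {}
    results = [
        await resolve_permission_id("agent-a", cache, fake_loader),
        await resolve_permission_id("agent-a", cache, fake_loader),
        await resolve_permission_id("agent-b", cache, fake_loader),
        await resolve_permission_id("agent-b", cache, fake_loader),
    ]
    return results, lookups


demo_results, demo_lookups = asyncio.run(_demo())
```

Each agent triggers one loader call, including the agent with no permission row, which is the point of the sentinel.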
diff --git a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
--- a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
+++ b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
@@ -1,8 +1,11 @@
+import asyncio
import html as _html
import json
-from typing import Any, Dict, Optional
+import time
+from typing import Any, Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
+import httpx
from fastapi import APIRouter, Form, HTTPException, Request
from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
@@ -26,11 +29,54 @@
from litellm.types.mcp import MCPAuth
from litellm.types.mcp_server.mcp_server_manager import MCPServer
+# TTL cache for upstream OAuth metadata fetched from pass-through MCP servers.
+# Keeps us from hammering the upstream IdP on each discovery request.
+# Keyed by (server_id, resource_url) → (expires_at_epoch, payload).
+# A payload of ``None`` is a negative-result entry that prevents repeated
+# upstream fetches when the IdP consistently has no metadata to serve.
+_OAUTH_METADATA_CACHE: Dict[Tuple[str, str], Tuple[float, Optional[dict]]] = {}
+_OAUTH_METADATA_CACHE_TTL_SECONDS = 300
+_OAUTH_METADATA_NEGATIVE_CACHE_TTL_SECONDS = 60
+_OAUTH_METADATA_CACHE_MAX_SIZE = 128
+# Per-(server_id, resource_url) async locks so concurrent discovery requests
+# coalesce onto a single upstream fetch instead of issuing N parallel calls.
+_OAUTH_METADATA_FETCH_LOCKS: Dict[Tuple[str, str], asyncio.Lock] = {}
+
router = APIRouter(
tags=["mcp"],
)
+def _prune_oauth_metadata_cache(now: Optional[float] = None) -> None:
+ now = now if now is not None else time.time()
+ expired_cache_keys = [
+ cache_key
+ for cache_key, (expires_at, _payload) in _OAUTH_METADATA_CACHE.items()
+ if expires_at <= now
+ ]
+ for cache_key in expired_cache_keys:
+ _OAUTH_METADATA_CACHE.pop(cache_key, None)
+
+ if len(_OAUTH_METADATA_CACHE) > _OAUTH_METADATA_CACHE_MAX_SIZE:
+ overflow = len(_OAUTH_METADATA_CACHE) - _OAUTH_METADATA_CACHE_MAX_SIZE
+ cache_keys_by_expiry = sorted(
+ _OAUTH_METADATA_CACHE,
+ key=lambda cache_key: _OAUTH_METADATA_CACHE[cache_key][0],
+ )
+ for cache_key in cache_keys_by_expiry[:overflow]:
+ _OAUTH_METADATA_CACHE.pop(cache_key, None)
+
+ # Drop locks whose cache entry has been evicted and that aren't currently
+ # held; held locks stay so in-flight callers continue to coalesce.
+ for cache_key in list(_OAUTH_METADATA_FETCH_LOCKS):
+ if cache_key in _OAUTH_METADATA_CACHE:
+ continue
+ lock = _OAUTH_METADATA_FETCH_LOCKS.get(cache_key)
+ if lock is None or lock.locked():
+ continue
+ _OAUTH_METADATA_FETCH_LOCKS.pop(cache_key, None)
+
+
def encode_state_with_base_url(
base_url: str,
original_state: str,
@@ -125,6 +171,17 @@
return None
+def _normalize_for_token_comparison(value: Any) -> str:
+ """Stringify ``value`` for token-rule comparison.
+
+ Booleans are lower-cased so Python's ``True`` / ``False`` line up with
+ JSON-style ``"true"`` / ``"false"`` rules from admin config.
+ """
+ if isinstance(value, bool):
+ return "true" if value else "false"
+ return str(value)
+
+
def _validate_token_response(
token_response: Dict[str, Any],
validation_rules: Dict[str, Any],
@@ -136,7 +193,9 @@
``token_response["team"]["enterprise_id"]``). Top-level keys are tried first,
then dot-split traversal. All comparisons are string-coerced so that numeric
values in the response (e.g. ``"org_id": 12345``) match string rules
- (``"org_id": "12345"``).
+ (``"org_id": "12345"``). Booleans are normalised to JSON-style ``"true"`` /
+ ``"false"`` so admin rules written as ``{"verified": "true"}`` match upstream
+ responses of ``{"verified": true}``.
"""
for key, expected in validation_rules.items():
actual: Any = token_response.get(key)
@@ -163,7 +222,9 @@
),
},
)
- if str(actual) != str(expected):
+ if _normalize_for_token_comparison(actual) != _normalize_for_token_comparison(
+ expected
+ ):
raise HTTPException(
status_code=403,
detail={
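To make the comparison rule in the `_validate_token_response` docstring concrete, here is a simplified sketch. `normalize` mirrors `_normalize_for_token_comparison` from the hunk above. `lookup` and `failed_rules` are hypothetical helpers written for illustration, and their handling of missing keys is an assumption, not the PR's exact behavior.

```python
from typing import Any, Dict, List

_MISSING = object()


def normalize(value: Any) -> str:
    """Stringify a value; booleans become JSON-style 'true' / 'false'."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def lookup(token_response: Dict[str, Any], key: str) -> Any:
    """Try the key at the top level first, then walk it as a dotted path."""
    if key in token_response:
        return token_response[key]
    current: Any = token_response
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def failed_rules(token_response: Dict[str, Any], rules: Dict[str, Any]) -> List[str]:
    """Return the rule keys that are absent or do not match after normalization."""
    failures = []
    for key, expected in rules.items():
        actual = lookup(token_response, key)
        if actual is _MISSING or normalize(actual) != normalize(expected):
            failures.append(key)
    return failures


sample_response = {
    "org_id": 12345,
    "verified": True,
    "team": {"enterprise_id": "E1"},
}
sample_rules = {"org_id": "12345", "verified": "true", "team.enterprise_id": "E1"}
```

Note that only real booleans are normalized: under this rule an admin value written as the string `"True"` would not match an upstream `true`.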
@@ -400,6 +461,11 @@
headers={"Accept": "application/json"},
data=token_data,
)
+ if response is None:
+ raise HTTPException(
+ status_code=502,
+ detail="MCP upstream token endpoint returned no response",
+ )
response.raise_for_status()
token_response = response.json()
@@ -505,6 +571,11 @@
headers=headers,
json=register_data,
)
+ if response is None:
+ raise HTTPException(
+ status_code=502,
+ detail="MCP upstream registration endpoint returned no response",
+ )
response.raise_for_status()
token_response = response.json()
@@ -766,7 +837,119 @@
"""
-def _build_oauth_protected_resource_response(
+async def fetch_upstream_oauth_protected_resource(
+ mcp_server: MCPServer,
+) -> Optional[dict]:
+ """Fetch the upstream MCP server's ``.well-known/oauth-protected-resource``
+ metadata for a pass-through server.
+
+ Tries host-only first, then falls back to the RFC 9728 §3.1 path-suffix
+ form (e.g. ``https://host/.well-known/oauth-protected-resource/mcp``) to
+ cover upstreams that scope metadata per resource path.
+
+ Responses are cached in-process for ~5 minutes keyed on
+ ``(server_id, resource_url)`` so we do not hammer the IdP.
+
+ Returns the parsed JSON dict on success, or ``None`` if neither form
+ responds with a 2xx JSON payload. Raises on network/connection errors so
+ the caller can emit HTTP 502 rather than fabricate a gateway response.
+ """
+ if not mcp_server.url:
+ return None
+
+ upstream = urlparse(mcp_server.url)
+ if not upstream.scheme or not upstream.netloc:
+ return None
+
+ cache_key = (mcp_server.server_id, mcp_server.url)
+ now = time.time()
+ _prune_oauth_metadata_cache(now)
+ cached = _OAUTH_METADATA_CACHE.get(cache_key)
+ if cached is not None and cached[0] > now:
+ return cached[1]
+
+ lock = _OAUTH_METADATA_FETCH_LOCKS.setdefault(cache_key, asyncio.Lock())
+ async with lock:
+ now = time.time()
+ cached = _OAUTH_METADATA_CACHE.get(cache_key)
+ if cached is not None and cached[0] > now:
+ return cached[1]
+
+ host_base = f"{upstream.scheme}://{upstream.netloc}"
+ candidates = [f"{host_base}/.well-known/oauth-protected-resource"]
+ # RFC 9728 §3.1 path fallback
+ if upstream.path and upstream.path not in ("", "/"):
+ candidates.append(
+ f"{host_base}/.well-known/oauth-protected-resource"
+ f"{upstream.path.rstrip('/')}"
+ )
+
+ async_client = get_async_httpx_client(
+ llm_provider=httpxSpecialProvider.Oauth2Check
+ )
+
+ network_errors: list[Exception] = []
+ for candidate in candidates:
+ try:
+ response = await async_client.get(
+ candidate,
+ headers={"Accept": "application/json"},
+ )
+ except Exception as exc:
+ if is_network_error(exc):
+ network_errors.append(exc)
+ else:
+ verbose_logger.warning(
+ "MCP OAuth metadata fetch for %s raised non-transport "
+ "%s: %s — treating as no metadata for this candidate",
+ candidate,
+ type(exc).__name__,
+ exc,
+ )
+ continue
+ if response.status_code == 200:
+ try:
+ payload = response.json()
+ except Exception as exc:
+ verbose_logger.warning(
+ "MCP OAuth metadata at %s returned 200 but JSON "
+ "decode failed (%s: %s) — treating as no metadata",
+ candidate,
+ type(exc).__name__,
+ exc,
+ )
+ continue
+ if isinstance(payload, dict):
+ now = time.time()
+ _OAUTH_METADATA_CACHE[cache_key] = (
+ now + _OAUTH_METADATA_CACHE_TTL_SECONDS,
+ payload,
+ )
+ _prune_oauth_metadata_cache(now)
+ return payload
+
+ if len(network_errors) == len(candidates):
+ raise network_errors[-1]
+
+ # Negative-result caching: when no candidate yielded a usable payload,
+ # remember that for a shorter TTL so we don't re-fetch on every
+ # subsequent discovery request (and so the per-key lock can be pruned).
+ now = time.time()
+ _OAUTH_METADATA_CACHE[cache_key] = (
+ now + _OAUTH_METADATA_NEGATIVE_CACHE_TTL_SECONDS,
+ None,
+ )
+ _prune_oauth_metadata_cache(now)
+ return None
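The candidate-URL construction in `fetch_upstream_oauth_protected_resource` can be checked in isolation. This sketch lifts that logic into a hypothetical standalone helper, `metadata_candidates`, which does not exist in the PR; the string building follows the hunk above.

```python
from typing import List
from urllib.parse import urlparse

WELL_KNOWN = "/.well-known/oauth-protected-resource"


def metadata_candidates(server_url: str) -> List[str]:
    """Host-only metadata URL first, then the path-suffixed form if a path exists."""
    upstream = urlparse(server_url)
    if not upstream.scheme or not upstream.netloc:
        return []
    host_base = f"{upstream.scheme}://{upstream.netloc}"
    candidates = [f"{host_base}{WELL_KNOWN}"]
    if upstream.path and upstream.path not in ("", "/"):
        # Path-suffix form: append the resource path, minus any trailing slash.
        candidates.append(f"{host_base}{WELL_KNOWN}{upstream.path.rstrip('/')}")
    return candidates
```

For a server registered at `https://host/mcp/` this yields the host-only URL followed by `https://host/.well-known/oauth-protected-resource/mcp`.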
... diff truncated: showing 800 of 6799 lines
Reviewed by Cursor Bugbot for commit 98472c9.
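The metadata cache in the diff combines a TTL cache, a shorter negative TTL, and a per-key `asyncio.Lock` with a second cache check inside the lock. A minimal sketch of that coalescing pattern, assuming a hypothetical `fetch` callable in place of the upstream HTTP calls and leaving out pruning and error handling:

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

CacheKey = Tuple[str, str]
POSITIVE_TTL_SECONDS = 300.0
NEGATIVE_TTL_SECONDS = 60.0

_cache: Dict[CacheKey, Tuple[float, Optional[dict]]] = {}
_locks: Dict[CacheKey, asyncio.Lock] = {}


async def get_metadata(
    key: CacheKey, fetch: Callable[[], Awaitable[Optional[dict]]]
) -> Optional[dict]:
    cached = _cache.get(key)
    if cached is not None and cached[0] > time.time():
        return cached[1]
    lock = _locks.setdefault(key, asyncio.Lock())
    async with lock:
        # Re-check: another waiter may have filled the cache while we queued.
        cached = _cache.get(key)
        if cached is not None and cached[0] > time.time():
            return cached[1]
        payload = await fetch()
        # A None payload is cached too, but for a shorter time.
        ttl = POSITIVE_TTL_SECONDS if payload is not None else NEGATIVE_TTL_SECONDS
        _cache[key] = (time.time() + ttl, payload)
        return payload


async def _demo() -> Tuple[int, List[Optional[dict]]]:
    fetch_count = 0

    async def slow_fetch() -> Optional[dict]:
        nonlocal fetch_count
        fetch_count += 1
        await asyncio.sleep(0.01)  # simulate upstream latency
        return {"authorization_servers": ["https://idp.example"]}

    key = ("server-1", "https://host/mcp")
    results = await asyncio.gather(*(get_metadata(key, slow_fetch) for _ in range(5)))
    return fetch_count, list(results)


demo_fetch_count, demo_results = asyncio.run(_demo())
```

Five concurrent callers share one upstream fetch; the other four wait on the lock and are served from the cache by the second check.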
Mirror the retry-loop pattern from #28888 (which fixed backend/Dockerfile, gateway/Dockerfile, and migrations/Dockerfile) into docker/Dockerfile.database. The build_docker_database_image CI job has been intermittently failing with "remote server returned error (try 'apk update')" when apk.cgr.dev flakes mid-fetch; bumping the wolfi-base SHA doesn't address the mirror, only a retry does. Same explicit-failure form as #28888: exit non-zero on the 3rd miss instead of silently succeeding because `sleep 5` was the last command in the `&& break || sleep 5` chain.
Move _raise_preemptive_401_for_unauthenticated_servers after toolset scoping in both the StreamableHTTP and SSE handlers, and add an optional allowed_server_ids parameter so passthrough/oauth2 servers that the active toolset excludes no longer trigger a spurious 401 challenge. Without this, a client targeting a toolset whose scope excludes a passthrough server could be pushed into an OAuth flow for a server it would be 403'd on immediately after authentication. Co-authored-by: Yassin Kortam <yassin@berri.ai>
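A rough sketch of the scoping idea in this commit, assuming a hypothetical `ServerInfo` record and a `servers_to_challenge` helper; the real `_raise_preemptive_401_for_unauthenticated_servers` signature and server model are not shown in this excerpt, so only the optional `allowed_server_ids` filter is taken from the message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class ServerInfo:
    # Hypothetical stand-in for the gateway's server record.
    server_id: str
    needs_upstream_oauth: bool


def servers_to_challenge(
    servers: Iterable[ServerInfo],
    request_has_token: bool,
    allowed_server_ids: Optional[Set[str]] = None,
) -> List[str]:
    """Return server ids that should trigger a 401 challenge for this request."""
    if request_has_token:
        return []
    return [
        server.server_id
        for server in servers
        if server.needs_upstream_oauth
        # None means "no toolset scoping"; otherwise skip excluded servers.
        and (allowed_server_ids is None or server.server_id in allowed_server_ids)
    ]


sample_servers = [ServerInfo("github", True), ServerInfo("docs", False)]
```

With a toolset limited to `{"docs"}`, the passthrough `github` server no longer produces a challenge, which matches the behavior the commit describes.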
@@ -1,8 +1,8 @@
# Base image for building
ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
Why do we need these changes for this PR?
nodejs \
npm \
libsndfile && break; \
[ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
Again? Why do we need this change?
Good catch, reverted
…/JWT PR These Docker changes are out of scope for the MCP OAuth passthrough + JWT auth work and duplicate the build-reliability fix already merged to litellm_internal_staging in #28888, which adds the same apk retry loop on the componentized backend/gateway/migrations Dockerfiles and also fixes the underlying nodeenv/libatomic root cause. Restoring docker/Dockerfile.database and docker/Dockerfile.non_root to the base so this PR is purely the MCP/JWT change.
…itellm_feat/v1.84.0-mcp-gateway-jwt-auth # Conflicts: # litellm/proxy/_experimental/mcp_server/server.py
The single-server pass-through path converted an upstream MCPUpstreamAuthError into an HTTPException, but list_tool_rest_api only re-raised 401s; an upstream 403 (valid token, insufficient scope) collapsed into a 200 response with error=unexpected_error, so clients never saw the status or WWW-Authenticate challenge needed to refresh scopes. Let MCPUpstreamAuthError propagate and convert it once in list_tool_rest_api so both 401 and 403 reach the client, while internal access/IP 403s keep the legacy error-dict shape.
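One way to picture "convert once at the REST boundary" is sketched below. All names here are hypothetical stand-ins: `UpstreamAuthError` stands in for `MCPUpstreamAuthError`, `InternalAccessDenied` for the gateway's own access/IP denials, and the `(status, headers, body)` tuple for a FastAPI response. The error strings are illustrative, not the PR's payloads.

```python
from typing import Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Dict[str, str], dict]


class UpstreamAuthError(Exception):
    """Hypothetical stand-in: the upstream rejected the caller's token."""

    def __init__(self, status_code: int, www_authenticate: Optional[str] = None):
        super().__init__(f"upstream returned {status_code}")
        self.status_code = status_code
        self.www_authenticate = www_authenticate


class InternalAccessDenied(Exception):
    """Hypothetical stand-in: the gateway itself denied access."""


def list_tools_endpoint(fetch_tools: Callable[[], List[str]]) -> Response:
    try:
        return 200, {}, {"tools": fetch_tools(), "error": None}
    except UpstreamAuthError as exc:
        # Single conversion point: 401 and 403 both keep their status and challenge.
        headers: Dict[str, str] = {}
        if exc.www_authenticate:
            headers["WWW-Authenticate"] = exc.www_authenticate
        return exc.status_code, headers, {"error": "upstream_auth_error"}
    except InternalAccessDenied as exc:
        # Gateway-side denials keep the legacy 200 + error-dict shape.
        return 200, {}, {"tools": [], "error": "access_denied", "message": str(exc)}


def upstream_forbidden() -> List[str]:
    raise UpstreamAuthError(403, 'Bearer error="insufficient_scope"')


def gateway_denied() -> List[str]:
    raise InternalAccessDenied("server not available from this IP")
```

Letting the exception travel up to the endpoint avoids the earlier failure mode, where an intermediate conversion made a 403 look like a generic error.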
…nset When use_x_forwarded_for is enabled but mcp_trusted_proxy_ranges is not configured, get_mcp_client_ip previously fell back to the direct peer IP. Behind an internal reverse proxy that peer is the proxy's private address, so every external caller was classified as internal and could reach MCP servers with available_on_public_internet=false. Return an empty string in that case so is_internal_ip treats the caller as external.
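The fail-closed behavior described here can be sketched with the standard `ipaddress` module. `resolve_client_ip` and this `is_internal_ip` are simplified, hypothetical versions written for illustration; only the "return an empty string when trusted ranges are unset" branch is taken from the commit message, and the remaining branches are assumptions about how such a resolver might behave.

```python
import ipaddress
from typing import List, Optional


def resolve_client_ip(
    peer_ip: str,
    forwarded_for: Optional[str],
    use_x_forwarded_for: bool,
    trusted_proxy_ranges: Optional[List[str]],
) -> str:
    if not use_x_forwarded_for:
        return peer_ip
    if not trusted_proxy_ranges:
        # Fail closed: without trusted ranges we cannot tell whether the peer
        # is a proxy, so report "unknown" rather than the proxy's private IP.
        return ""
    peer = ipaddress.ip_address(peer_ip)
    peer_is_trusted = any(
        peer in ipaddress.ip_network(cidr) for cidr in trusted_proxy_ranges
    )
    if peer_is_trusted and forwarded_for:
        # Assumption: the left-most X-Forwarded-For entry is the original client.
        return forwarded_for.split(",")[0].strip()
    return peer_ip


def is_internal_ip(ip: str) -> bool:
    """Treat empty or unparsable addresses as external."""
    if not ip:
        return False
    try:
        return ipaddress.ip_address(ip).is_private
    except ValueError:
        return False
```

With the old fallback, a request relayed by a proxy at `10.0.0.5` would have been classified as internal; returning `""` makes `is_internal_ip` answer `False` instead.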
Replying to @greptile-apps[bot]'s summary comment (#issuecomment-4499663072): the two awareness items in the latest 4/5 summary are non-actionable on this branch, and the summary itself already labels each "not a current defect."
1.
2. The lock dict is already bounded on two independent axes. The cache key is
No code changes; both items are working as intended.

Both explanations are sound; acknowledging and closing out both items. Item 1 ( Item 2 ( No changes needed. Both are working as intended.
The #29809 and #29960 cherry-picks landed onto v1.88.0, which predates the per-user env-var rotation feature (#28917) and the MCP upstream-auth exception module (#28356). Git's conflict resolution bundled those intermediate-commit tests into the picked test files, where they fail against this base because the code under test does not exist here (encrypt_value_helper, the env-var rotation helper, and litellm.proxy._experimental.mcp_server.exceptions.MCPUpstreamAuthError). Remove the leaked tests; the genuine regression tests added by #29809 (OBO refresh-on-expiry) and #29960 (admin-only include_disabled_tools) stay and pass.
…9.0) (#93)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/berriai/litellm](https://images.chainguard.dev/directory/image/wolfi-base/overview) ([source](https://github.com/BerriAI/litellm)) | minor | `v1.88.1` → `v1.89.0` |

---

### Release Notes

<details>
<summary>BerriAI/litellm (ghcr.io/berriai/litellm)</summary>

### [`v1.89.0`](https://github.com/BerriAI/litellm/releases/tag/v1.89.0)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.89.0...v1.89.0)

##### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.89.0
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.89.0/cosign.pub \
  ghcr.io/berriai/litellm:v1.89.0
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

##### What's Changed

- test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview by [@mateo-berri](https://github.com/mateo-berri) in [#29433](https://github.com/BerriAI/litellm/pull/29433)
- fix: map mistral/ministral-8b-latest in model price map by [@mateo-berri](https://github.com/mateo-berri) in [#29453](https://github.com/BerriAI/litellm/pull/29453)
- fix(datadog): split oversized batches on 413 instead of re-queueing forever by [@yassin-berriai](https://github.com/yassin-berriai) in [#29444](https://github.com/BerriAI/litellm/pull/29444)
- feat(otel): allowlist team\_metadata sub-keys promoted to baggage by [@yassin-berriai](https://github.com/yassin-berriai) in [#29442](https://github.com/BerriAI/litellm/pull/29442)
- fix: stop use\_chat\_completions\_api flag from leaking into provider request body by [@mateo-berri](https://github.com/mateo-berri) in [#29447](https://github.com/BerriAI/litellm/pull/29447)
- fix(anthropic, fireworks): inline legacy $ref defs in tool schemas by [@milan-berri](https://github.com/milan-berri) in [#28646](https://github.com/BerriAI/litellm/pull/28646)
- fix(proxy): omit OpenAI \[DONE] on google-genai streamGenerateContent by [@Sameerlite](https://github.com/Sameerlite) in [#29426](https://github.com/BerriAI/litellm/pull/29426)
- ci(release): create stable/X.Y.x line branch on X.Y.0 tags by [@yuneng-berri](https://github.com/yuneng-berri) in [#29457](https://github.com/BerriAI/litellm/pull/29457)
- fix(vector-stores): support engines URL for Vertex AI Search by
[@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​27885](https://github.com/BerriAI/litellm/pull/27885) - fix(ui): render caller-supplied filter options in caller order by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29462](https://github.com/BerriAI/litellm/pull/29462) - fix(batches): skip unnecessary batch input file reads by [@​Sameerlite](https://github.com/Sameerlite) in [#​29114](https://github.com/BerriAI/litellm/pull/29114) - docs(agents): clarify when to create new test files by [@​Sameerlite](https://github.com/Sameerlite) in [#​29472](https://github.com/BerriAI/litellm/pull/29472) - Litellm OSS Staging by [@​Sameerlite](https://github.com/Sameerlite) in [#​29161](https://github.com/BerriAI/litellm/pull/29161) - fix(mcp): clear allowed\_tools and tool overrides on MCP server edit by [@​Sameerlite](https://github.com/Sameerlite) in [#​29411](https://github.com/BerriAI/litellm/pull/29411) - Litellm OSS Staging 010626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29422](https://github.com/BerriAI/litellm/pull/29422) - fix(ci): make CircleCI rerun-failed-tests collect tests when 2+ test files fail by [@​mateo-berri](https://github.com/mateo-berri) in [#​29475](https://github.com/BerriAI/litellm/pull/29475) - feat(a2a): watsonx Orchestrate agent provider by [@​Sameerlite](https://github.com/Sameerlite) in [#​29410](https://github.com/BerriAI/litellm/pull/29410) - fix(azure\_ai): strip tool-level extra fields on 400 and retry by [@​Sameerlite](https://github.com/Sameerlite) in [#​29479](https://github.com/BerriAI/litellm/pull/29479) - fix(docs): remove fixed dimensions from README hero image by [@​mateo-berri](https://github.com/mateo-berri) in [#​29496](https://github.com/BerriAI/litellm/pull/29496) - Litellm oss staging by [@​Sameerlite](https://github.com/Sameerlite) in [#​29492](https://github.com/BerriAI/litellm/pull/29492) - fix: small CLAUDE.md nits by [@​mateo-berri](https://github.com/mateo-berri) in 
[#​29504](https://github.com/BerriAI/litellm/pull/29504) - Add MCP semantic conventions to otelv2 by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29468](https://github.com/BerriAI/litellm/pull/29468) - fix(passthrough): emit otel guardrail span when a guardrail blocks by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29470](https://github.com/BerriAI/litellm/pull/29470) - fix(proxy): strip NUL bytes from spend log payloads to prevent PostgreSQL 22P05 by [@​milan-berri](https://github.com/milan-berri) in [#​29515](https://github.com/BerriAI/litellm/pull/29515) - \[internal copy of [#​28008](https://github.com/BerriAI/litellm/issues/28008)] Support MCP OAuth passthrough and issuer-scoped JWT auth by [@​mateo-berri](https://github.com/mateo-berri) in [#​28356](https://github.com/BerriAI/litellm/pull/28356) - feat(vector-stores): forward per-request params to Vertex AI Search by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29459](https://github.com/BerriAI/litellm/pull/29459) - feat(proxy): add per-MCP-server RPM rate limiting for keys and teams by [@​Sameerlite](https://github.com/Sameerlite) in [#​29482](https://github.com/BerriAI/litellm/pull/29482) - fix(tests): drop module-level test calls that break local\_testing collection by [@​mateo-berri](https://github.com/mateo-berri) in [#​29520](https://github.com/BerriAI/litellm/pull/29520) - feat(agents): add LangFlow agent provider with A2A session bridging by [@​Sameerlite](https://github.com/Sameerlite) in [#​28963](https://github.com/BerriAI/litellm/pull/28963) - fix(ui/agents): make A2A skill tags enterable and validated by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29512](https://github.com/BerriAI/litellm/pull/29512) - \[internal copy of [#​29232](https://github.com/BerriAI/litellm/issues/29232)] feat: route future Claude models to Anthropic provider via pattern matching by [@​mateo-berri](https://github.com/mateo-berri) in 
[#​29239](https://github.com/BerriAI/litellm/pull/29239) - fix(tests): drop import-time completion call in test\_register\_model by [@​mateo-berri](https://github.com/mateo-berri) in [#​29521](https://github.com/BerriAI/litellm/pull/29521) - test: stabilize batch VCR coverage and stop live upload/network leaks by [@​mateo-berri](https://github.com/mateo-berri) in [#​29477](https://github.com/BerriAI/litellm/pull/29477) - \[internal copy of [#​29003](https://github.com/BerriAI/litellm/issues/29003)] fix(vertex\_ai): use user-supplied api\_base as is for Model Garden OpenAI-compat path by [@​mateo-berri](https://github.com/mateo-berri) in [#​29530](https://github.com/BerriAI/litellm/pull/29530) - feat(proxy): native /health/drain preStop hook for graceful shutdown by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29439](https://github.com/BerriAI/litellm/pull/29439) - fix(auth): preserve 401 status for expired JWTs in OTel traces by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29510](https://github.com/BerriAI/litellm/pull/29510) - fix(otel): capture 401 error details in management endpoint spans by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29535](https://github.com/BerriAI/litellm/pull/29535) - test(proxy/utils): pin bottom-of-file helper behavior by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29509](https://github.com/BerriAI/litellm/pull/29509) - test(proxy/utils): pin PrismaClient and spend-update behavior by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29488](https://github.com/BerriAI/litellm/pull/29488) - test(proxy/utils): pin ProxyLogging behavior by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29485](https://github.com/BerriAI/litellm/pull/29485) - fix: missing span for guardrail passthrough by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29552](https://github.com/BerriAI/litellm/pull/29552) - fix(auth): let internal users view search tools by 
[@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29542](https://github.com/BerriAI/litellm/pull/29542) - fix: missing mcp otel attributes by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29554](https://github.com/BerriAI/litellm/pull/29554) - fix(proxy): resolve managed video model ids for auth by [@​shivamrawat1](https://github.com/shivamrawat1) in [#​29545](https://github.com/BerriAI/litellm/pull/29545) - fix(key\_generate): allow team members to create keys on org-scoped teams by [@​milan-berri](https://github.com/milan-berri) in [#​29310](https://github.com/BerriAI/litellm/pull/29310) - test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite by [@​mateo-berri](https://github.com/mateo-berri) in [#​29595](https://github.com/BerriAI/litellm/pull/29595) - Litellm oss staging 030626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29578](https://github.com/BerriAI/litellm/pull/29578) - Fix : a2a bugs 030626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29566](https://github.com/BerriAI/litellm/pull/29566) - \[internal copy of [#​29533](https://github.com/BerriAI/litellm/issues/29533)] fix(anthropic/adapter): emit thinking block for reasoning\_content-only streaming chunks by [@​mateo-berri](https://github.com/mateo-berri) in [#​29600](https://github.com/BerriAI/litellm/pull/29600) - ci: reproduce default-Windows wheel install to guard MAX\_PATH by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29597](https://github.com/BerriAI/litellm/pull/29597) - fix(vertex): strip output\_config.effort for Vertex Claude models that reject it (Haiku 4.5) by [@​mateo-berri](https://github.com/mateo-berri) in [#​29585](https://github.com/BerriAI/litellm/pull/29585) - Litellm websocket improvements by [@​Sameerlite](https://github.com/Sameerlite) in [#​29563](https://github.com/BerriAI/litellm/pull/29563) - feat(arize/phoenix): OpenInference rendering parity — tool\_calls, cost, passthrough I/O, session/user, 
multimodal, cache tokens by [@​milan-berri](https://github.com/milan-berri) in [#​28800](https://github.com/BerriAI/litellm/pull/28800) - \[internal copy of [#​29550](https://github.com/BerriAI/litellm/issues/29550)] fix: passthrough endpoints duplicate logs by [@​mateo-berri](https://github.com/mateo-berri) in [#​29598](https://github.com/BerriAI/litellm/pull/29598) - fix(ci): keep coverage rename green when a parallel node runs no tests by [@​mateo-berri](https://github.com/mateo-berri) in [#​29608](https://github.com/BerriAI/litellm/pull/29608) - test(vcr): close out the remaining VCR live-call leaks by [@​mateo-berri](https://github.com/mateo-berri) in [#​29603](https://github.com/BerriAI/litellm/pull/29603) - fix(key\_generate): exempt UI/CLI session tokens from the budget ceiling for team keys by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29612](https://github.com/BerriAI/litellm/pull/29612) - fix(realtime): allow null transcripts in stream logging payloads by [@​milan-berri](https://github.com/milan-berri) in [#​29625](https://github.com/BerriAI/litellm/pull/29625) - build(ui): migrate eslint to flat config + bump eslint-config-next to 16 by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29626](https://github.com/BerriAI/litellm/pull/29626) - fix(key\_generate): scope session-token team-key budget exemption to caller-supplied team\_id by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29641](https://github.com/BerriAI/litellm/pull/29641) - fix(proxy): disable proxy buffering on streaming SSE responses by [@​mateo-berri](https://github.com/mateo-berri) in [#​29557](https://github.com/BerriAI/litellm/pull/29557) - fix(mcp): gate /public/mcp\_hub strictly on litellm.public\_mcp\_servers by [@​michelligabriele](https://github.com/michelligabriele) in [#​27764](https://github.com/BerriAI/litellm/pull/27764) - ci(ui): frontend-lint job enforcing prettier + eslint on changed files by 
[@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29633](https://github.com/BerriAI/litellm/pull/29633) - fix(gemini): googleSearch + server-side tools and googleMaps JSON schema by [@​Sameerlite](https://github.com/Sameerlite) in [#​29582](https://github.com/BerriAI/litellm/pull/29582) - fix(proxy): passthrough 404 when SERVER\_ROOT\_PATH is set by [@​Sameerlite](https://github.com/Sameerlite) in [#​29658](https://github.com/BerriAI/litellm/pull/29658) - fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility by [@​Sameerlite](https://github.com/Sameerlite) in [#​29662](https://github.com/BerriAI/litellm/pull/29662) - Litellm oss staging 040626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29671](https://github.com/BerriAI/litellm/pull/29671) - style(ui): prettier formatting pass over the dashboard by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29622](https://github.com/BerriAI/litellm/pull/29622) - chore: ignore prettier dashboard reformat in git blame by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29695](https://github.com/BerriAI/litellm/pull/29695) - fix(helm): Enable Backend Deployment to mount Gateway config.yaml by [@​tin-berri](https://github.com/tin-berri) in [#​29605](https://github.com/BerriAI/litellm/pull/29605) - \[internal copy of [#​29277](https://github.com/BerriAI/litellm/issues/29277)] fix(proxy): add default=None to LiteLLM\_TeamMembership.litellm\_budget\_table by [@​mateo-berri](https://github.com/mateo-berri) in [#​29684](https://github.com/BerriAI/litellm/pull/29684) - test: make custom\_tokenizer proxy tests hermetic by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29643](https://github.com/BerriAI/litellm/pull/29643) - test(proxy): stop running real-DB tests in GitHub Actions unit jobs by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29700](https://github.com/BerriAI/litellm/pull/29700) - chore(ui): remove the bare-fetch lint 
rule by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29712](https://github.com/BerriAI/litellm/pull/29712) - Litellm jwt mapping virtualkeys by [@​shivamrawat1](https://github.com/shivamrawat1) in [#​28510](https://github.com/BerriAI/litellm/pull/28510) - refactor(ui): shared HTTP client + location-pinned fetch() lint rule by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29723](https://github.com/BerriAI/litellm/pull/29723) - fix(proxy): stop team BYOK model name corruption on model edit by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29731](https://github.com/BerriAI/litellm/pull/29731) - \[internal copy of [#​29511](https://github.com/BerriAI/litellm/issues/29511)] feat(guardrails): add sensitive data routing to on-premise models by [@​mateo-berri](https://github.com/mateo-berri) in [#​29531](https://github.com/BerriAI/litellm/pull/29531) - fix(proxy/hooks): populate llm\_provider on internal rate-limit errors by [@​mateo-berri](https://github.com/mateo-berri) in [#​27707](https://github.com/BerriAI/litellm/pull/27707) - fix(vertex/anthropic): handle namespace tools and strip client\_metadata for codex compatibility by [@​Sameerlite](https://github.com/Sameerlite) in [#​29489](https://github.com/BerriAI/litellm/pull/29489) - Support OAuth M2M for Databricks Apps A2A agents by [@​mateo-berri](https://github.com/mateo-berri) in [#​29586](https://github.com/BerriAI/litellm/pull/29586) - fix: small CLAUDE.md nit by [@​mateo-berri](https://github.com/mateo-berri) in [#​29749](https://github.com/BerriAI/litellm/pull/29749) - fix(anthropic): route Claude Opus 4.8 through adaptive thinking by [@​mateo-berri](https://github.com/mateo-berri) in [#​29702](https://github.com/BerriAI/litellm/pull/29702) - fix(proxy): persist oauth2\_flow on MCP server registration by [@​michelligabriele](https://github.com/michelligabriele) in [#​29690](https://github.com/BerriAI/litellm/pull/29690) - \[internal copy of 
[#​27491](https://github.com/BerriAI/litellm/issues/27491)] fix(realtime): Fix Realtime Audio Token Cost Tracking by [@​mateo-berri](https://github.com/mateo-berri) in [#​29722](https://github.com/BerriAI/litellm/pull/29722) - fix(galileo): use ingest traces API and standard logging payload by [@​Sameerlite](https://github.com/Sameerlite) in [#​29651](https://github.com/BerriAI/litellm/pull/29651) - fix(auth): expand all-team-models sentinel in can\_key\_call\_model for batch validation by [@​Sameerlite](https://github.com/Sameerlite) in [#​29746](https://github.com/BerriAI/litellm/pull/29746) - test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h by [@​mateo-berri](https://github.com/mateo-berri) in [#​29784](https://github.com/BerriAI/litellm/pull/29784) - test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound by [@​mateo-berri](https://github.com/mateo-berri) in [#​29787](https://github.com/BerriAI/litellm/pull/29787) - fix(ui): route MCP playground auth by oauth2 mode instead of token\_url by [@​tin-berri](https://github.com/tin-berri) in [#​29714](https://github.com/BerriAI/litellm/pull/29714) - refactor(ui): centralize proxy base URL resolution into tested resolver by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29793](https://github.com/BerriAI/litellm/pull/29793) - Litellm oss staging 050626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29774](https://github.com/BerriAI/litellm/pull/29774) - test(google): add google-genai SDK proxy integration tests by [@​Sameerlite](https://github.com/Sameerlite) in [#​29781](https://github.com/BerriAI/litellm/pull/29781) - fix(jwt): use resolved DB user\_id for spend on legacy email match by [@​milan-berri](https://github.com/milan-berri) in [#​29217](https://github.com/BerriAI/litellm/pull/29217) - feat(ui): generate dashboard API types from the proxy OpenAPI spec by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in 
[#​29816](https://github.com/BerriAI/litellm/pull/29816) - fix(proxy): drop deleted team BYOK model name from team.models by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29820](https://github.com/BerriAI/litellm/pull/29820) - feat(mcp): per-server env vars with global + per-user scopes by [@​mateo-berri](https://github.com/mateo-berri) in [#​28917](https://github.com/BerriAI/litellm/pull/28917) - refactor(ui): route behavior-preserving networking calls through apiClient by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29806](https://github.com/BerriAI/litellm/pull/29806) - fix(mcp): persist Tools-tab MCP OAuth token to DB by [@​tin-berri](https://github.com/tin-berri) in [#​29809](https://github.com/BerriAI/litellm/pull/29809) - fix(ui): require new expiration when regenerating an expired key by [@​milan-berri](https://github.com/milan-berri) in [#​29838](https://github.com/BerriAI/litellm/pull/29838) - refactor(ui): route query-building networking calls through apiClient by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29815](https://github.com/BerriAI/litellm/pull/29815) - Make the image-gen record/replay proxy report cache mode and per-request HIT/MISS by [@​mateo-berri](https://github.com/mateo-berri) in [#​29802](https://github.com/BerriAI/litellm/pull/29802) - feat(proxy): hot-reload .env in dev when running with --reload by [@​mateo-berri](https://github.com/mateo-berri) in [#​29783](https://github.com/BerriAI/litellm/pull/29783) - fix(ui): stop MCP playground tool calls from sending twice by [@​tin-berri](https://github.com/tin-berri) in [#​29821](https://github.com/BerriAI/litellm/pull/29821) - feat(fal\_ai): add Nano Banana / Gemini 2.5 Flash Image generation support by [@​mateo-berri](https://github.com/mateo-berri) in [#​29798](https://github.com/BerriAI/litellm/pull/29798) - Title: Fix managed batch cancel credential resolution by [@​shivamrawat1](https://github.com/shivamrawat1) in 
[#​29734](https://github.com/BerriAI/litellm/pull/29734) - Title: fix(proxy): resolve vector store file list credentials from team deployments by [@​shivamrawat1](https://github.com/shivamrawat1) in [#​29739](https://github.com/BerriAI/litellm/pull/29739) - refactor: convert AWS and GCP Terraform stacks into reusable modules … by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​28103](https://github.com/BerriAI/litellm/pull/28103) - chore(ui): build ui for release by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29853](https://github.com/BerriAI/litellm/pull/29853) - fix(terraform/gcp): prompt for image\_registry in DeployStack one-click by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29852](https://github.com/BerriAI/litellm/pull/29852) - fix(terraform/gcp): abandon SQL user on destroy by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29855](https://github.com/BerriAI/litellm/pull/29855) - Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic by [@​mateo-berri](https://github.com/mateo-berri) in [#​29847](https://github.com/BerriAI/litellm/pull/29847) - chore(deps): bump deps by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29860](https://github.com/BerriAI/litellm/pull/29860) - chore(ci): promote internal staging to main by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29861](https://github.com/BerriAI/litellm/pull/29861) - fix: 400 on Anthropic context overflow; seed identity on failed auth by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29848](https://github.com/BerriAI/litellm/pull/29848) - chore(ci): promote internal staging to main by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29862](https://github.com/BerriAI/litellm/pull/29862) - chore(release): patch v1.89.0-rc.1 with [#​30064](https://github.com/BerriAI/litellm/issues/30064) (Claude Fable 5) for v1.89.0-rc.2 by [@​mateo-berri](https://github.com/mateo-berri) in 
[#​30143](https://github.com/BerriAI/litellm/pull/30143) **Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0> ### [`v1.89.0`](https://github.com/BerriAI/litellm/releases/tag/v1.89.0) [Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.2...v1.89.0) ##### Verify Docker Image Signature All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0). **Verify using the pinned commit hash (recommended):** A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \ ghcr.io/berriai/litellm:v1.89.0 ``` **Verify using the release tag (convenience):** Tags are protected in this repository and resolve to the same key. 
This option is easier to read but relies on tag protection rules: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/v1.89.0/cosign.pub \ ghcr.io/berriai/litellm:v1.89.0 ``` Expected output: ``` The following checks were performed on each of these signatures: - The cosign claims were validated - The signatures were verified against the specified public key ``` *** ##### What's Changed - test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview by [@​mateo-berri](https://github.com/mateo-berri) in [#​29433](https://github.com/BerriAI/litellm/pull/29433) - fix: map mistral/ministral-8b-latest in model price map by [@​mateo-berri](https://github.com/mateo-berri) in [#​29453](https://github.com/BerriAI/litellm/pull/29453) - fix(datadog): split oversized batches on 413 instead of re-queueing forever by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29444](https://github.com/BerriAI/litellm/pull/29444) - feat(otel): allowlist team\_metadata sub-keys promoted to baggage by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29442](https://github.com/BerriAI/litellm/pull/29442) - fix: stop use\_chat\_completions\_api flag from leaking into provider request body by [@​mateo-berri](https://github.com/mateo-berri) in [#​29447](https://github.com/BerriAI/litellm/pull/29447) - fix(anthropic, fireworks): inline legacy $ref defs in tool schemas by [@​milan-berri](https://github.com/milan-berri) in [#​28646](https://github.com/BerriAI/litellm/pull/28646) - fix(proxy): omit OpenAI \[DONE] on google-genai streamGenerateContent by [@​Sameerlite](https://github.com/Sameerlite) in [#​29426](https://github.com/BerriAI/litellm/pull/29426) - ci(release): create stable/X.Y.x line branch on X.Y.0 tags by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29457](https://github.com/BerriAI/litellm/pull/29457) - fix(vector-stores): support engines URL for Vertex AI Search by 
[@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​27885](https://github.com/BerriAI/litellm/pull/27885) - fix(ui): render caller-supplied filter options in caller order by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29462](https://github.com/BerriAI/litellm/pull/29462) - fix(batches): skip unnecessary batch input file reads by [@​Sameerlite](https://github.com/Sameerlite) in [#​29114](https://github.com/BerriAI/litellm/pull/29114) - docs(agents): clarify when to create new test files by [@​Sameerlite](https://github.com/Sameerlite) in [#​29472](https://github.com/BerriAI/litellm/pull/29472) - Litellm OSS Staging by [@​Sameerlite](https://github.com/Sameerlite) in [#​29161](https://github.com/BerriAI/litellm/pull/29161) - fix(mcp): clear allowed\_tools and tool overrides on MCP server edit by [@​Sameerlite](https://github.com/Sameerlite) in [#​29411](https://github.com/BerriAI/litellm/pull/29411) - Litellm OSS Staging 010626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29422](https://github.com/BerriAI/litellm/pull/29422) - fix(ci): make CircleCI rerun-failed-tests collect tests when 2+ test files fail by [@​mateo-berri](https://github.com/mateo-berri) in [#​29475](https://github.com/BerriAI/litellm/pull/29475) - feat(a2a): watsonx Orchestrate agent provider by [@​Sameerlite](https://github.com/Sameerlite) in [#​29410](https://github.com/BerriAI/litellm/pull/29410) - fix(azure\_ai): strip tool-level extra fields on 400 and retry by [@​Sameerlite](https://github.com/Sameerlite) in [#​29479](https://github.com/BerriAI/litellm/pull/29479) - fix(docs): remove fixed dimensions from README hero image by [@​mateo-berri](https://github.com/mateo-berri) in [#​29496](https://github.com/BerriAI/litellm/pull/29496) - Litellm oss staging by [@​Sameerlite](https://github.com/Sameerlite) in [#​29492](https://github.com/BerriAI/litellm/pull/29492) - fix: small CLAUDE.md nits by [@​mateo-berri](https://github.com/mateo-berri) in 
[#​29504](https://github.com/BerriAI/litellm/pull/29504) - Add MCP semantic conventions to otelv2 by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29468](https://github.com/BerriAI/litellm/pull/29468) - fix(passthrough): emit otel guardrail span when a guardrail blocks by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29470](https://github.com/BerriAI/litellm/pull/29470) - fix(proxy): strip NUL bytes from spend log payloads to prevent PostgreSQL 22P05 by [@​milan-berri](https://github.com/milan-berri) in [#​29515](https://github.com/BerriAI/litellm/pull/29515) - \[internal copy of [#​28008](https://github.com/BerriAI/litellm/issues/28008)] Support MCP OAuth passthrough and issuer-scoped JWT auth by [@​mateo-berri](https://github.com/mateo-berri) in [#​28356](https://github.com/BerriAI/litellm/pull/28356) - feat(vector-stores): forward per-request params to Vertex AI Search by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29459](https://github.com/BerriAI/litellm/pull/29459) - feat(proxy): add per-MCP-server RPM rate limiting for keys and teams by [@​Sameerlite](https://github.com/Sameerlite) in [#​29482](https://github.com/BerriAI/litellm/pull/29482) - fix(tests): drop module-level test calls that break local\_testing collection by [@​mateo-berri](https://github.com/mateo-berri) in [#​29520](https://github.com/BerriAI/litellm/pull/29520) - feat(agents): add LangFlow agent provider with A2A session bridging by [@​Sameerlite](https://github.com/Sameerlite) in [#​28963](https://github.com/BerriAI/litellm/pull/28963) - fix(ui/agents): make A2A skill tags enterable and validated by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29512](https://github.com/BerriAI/litellm/pull/29512) - \[internal copy of [#​29232](https://github.com/BerriAI/litellm/issues/29232)] feat: route future Claude models to Anthropic provider via pattern matching by [@​mateo-berri](https://github.com/mateo-berri) in 
[#​29239](https://github.com/BerriAI/litellm/pull/29239) - fix(tests): drop import-time completion call in test\_register\_model by [@​mateo-berri](https://github.com/mateo-berri) in [#​29521](https://github.com/BerriAI/litellm/pull/29521) - test: stabilize batch VCR coverage and stop live upload/network leaks by [@​mateo-berri](https://github.com/mateo-berri) in [#​29477](https://github.com/BerriAI/litellm/pull/29477) - \[internal copy of [#​29003](https://github.com/BerriAI/litellm/issues/29003)] fix(vertex\_ai): use user-supplied api\_base as is for Model Garden OpenAI-compat path by [@​mateo-berri](https://github.com/mateo-berri) in [#​29530](https://github.com/BerriAI/litellm/pull/29530) - feat(proxy): native /health/drain preStop hook for graceful shutdown by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29439](https://github.com/BerriAI/litellm/pull/29439) - fix(auth): preserve 401 status for expired JWTs in OTel traces by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29510](https://github.com/BerriAI/litellm/pull/29510) - fix(otel): capture 401 error details in management endpoint spans by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29535](https://github.com/BerriAI/litellm/pull/29535) - test(proxy/utils): pin bottom-of-file helper behavior by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29509](https://github.com/BerriAI/litellm/pull/29509) - test(proxy/utils): pin PrismaClient and spend-update behavior by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29488](https://github.com/BerriAI/litellm/pull/29488) - test(proxy/utils): pin ProxyLogging behavior by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29485](https://github.com/BerriAI/litellm/pull/29485) - fix: missing span for guardrail passthrough by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29552](https://github.com/BerriAI/litellm/pull/29552) - fix(auth): let internal users view search tools by 
[@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29542](https://github.com/BerriAI/litellm/pull/29542) - fix: missing mcp otel attributes by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29554](https://github.com/BerriAI/litellm/pull/29554) - fix(proxy): resolve managed video model ids for auth by [@​shivamrawat1](https://github.com/shivamrawat1) in [#​29545](https://github.com/BerriAI/litellm/pull/29545) - fix(key\_generate): allow team members to create keys on org-scoped teams by [@​milan-berri](https://github.com/milan-berri) in [#​29310](https://github.com/BerriAI/litellm/pull/29310) - test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite by [@​mateo-berri](https://github.com/mateo-berri) in [#​29595](https://github.com/BerriAI/litellm/pull/29595) - Litellm oss staging 030626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29578](https://github.com/BerriAI/litellm/pull/29578) - Fix : a2a bugs 030626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29566](https://github.com/BerriAI/litellm/pull/29566) - \[internal copy of [#​29533](https://github.com/BerriAI/litellm/issues/29533)] fix(anthropic/adapter): emit thinking block for reasoning\_content-only streaming chunks by [@​mateo-berri](https://github.com/mateo-berri) in [#​29600](https://github.com/BerriAI/litellm/pull/29600) - ci: reproduce default-Windows wheel install to guard MAX\_PATH by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29597](https://github.com/BerriAI/litellm/pull/29597) - fix(vertex): strip output\_config.effort for Vertex Claude models that reject it (Haiku 4.5) by [@​mateo-berri](https://github.com/mateo-berri) in [#​29585](https://github.com/BerriAI/litellm/pull/29585) - Litellm websocket improvements by [@​Sameerlite](https://github.com/Sameerlite) in [#​29563](https://github.com/BerriAI/litellm/pull/29563) - feat(arize/phoenix): OpenInference rendering parity — tool\_calls, cost, passthrough I/O, session/user, 
multimodal, cache tokens by [@​milan-berri](https://github.com/milan-berri) in [#​28800](https://github.com/BerriAI/litellm/pull/28800) - \[internal copy of [#​29550](https://github.com/BerriAI/litellm/issues/29550)] fix: passthrough endpoints duplicate logs by [@​mateo-berri](https://github.com/mateo-berri) in [#​29598](https://github.com/BerriAI/litellm/pull/29598) - fix(ci): keep coverage rename green when a parallel node runs no tests by [@​mateo-berri](https://github.com/mateo-berri) in [#​29608](https://github.com/BerriAI/litellm/pull/29608) - test(vcr): close out the remaining VCR live-call leaks by [@​mateo-berri](https://github.com/mateo-berri) in [#​29603](https://github.com/BerriAI/litellm/pull/29603) - fix(key\_generate): exempt UI/CLI session tokens from the budget ceiling for team keys by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29612](https://github.com/BerriAI/litellm/pull/29612) - fix(realtime): allow null transcripts in stream logging payloads by [@​milan-berri](https://github.com/milan-berri) in [#​29625](https://github.com/BerriAI/litellm/pull/29625) - build(ui): migrate eslint to flat config + bump eslint-config-next to 16 by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29626](https://github.com/BerriAI/litellm/pull/29626) - fix(key\_generate): scope session-token team-key budget exemption to caller-supplied team\_id by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29641](https://github.com/BerriAI/litellm/pull/29641) - fix(proxy): disable proxy buffering on streaming SSE responses by [@​mateo-berri](https://github.com/mateo-berri) in [#​29557](https://github.com/BerriAI/litellm/pull/29557) - fix(mcp): gate /public/mcp\_hub strictly on litellm.public\_mcp\_servers by [@​michelligabriele](https://github.com/michelligabriele) in [#​27764](https://github.com/BerriAI/litellm/pull/27764) - ci(ui): frontend-lint job enforcing prettier + eslint on changed files by 
[@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29633](https://github.com/BerriAI/litellm/pull/29633) - fix(gemini): googleSearch + server-side tools and googleMaps JSON schema by [@​Sameerlite](https://github.com/Sameerlite) in [#​29582](https://github.com/BerriAI/litellm/pull/29582) - fix(proxy): passthrough 404 when SERVER\_ROOT\_PATH is set by [@​Sameerlite](https://github.com/Sameerlite) in [#​29658](https://github.com/BerriAI/litellm/pull/29658) - fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility by [@​Sameerlite](https://github.com/Sameerlite) in [#​29662](https://github.com/BerriAI/litellm/pull/29662) - Litellm oss staging 040626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29671](https://github.com/BerriAI/litellm/pull/29671) - style(ui): prettier formatting pass over the dashboard by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29622](https://github.com/BerriAI/litellm/pull/29622) - chore: ignore prettier dashboard reformat in git blame by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29695](https://github.com/BerriAI/litellm/pull/29695) - fix(helm): Enable Backend Deployment to mount Gateway config.yaml by [@​tin-berri](https://github.com/tin-berri) in [#​29605](https://github.com/BerriAI/litellm/pull/29605) - \[internal copy of [#​29277](https://github.com/BerriAI/litellm/issues/29277)] fix(proxy): add default=None to LiteLLM\_TeamMembership.litellm\_budget\_table by [@​mateo-berri](https://github.com/mateo-berri) in [#​29684](https://github.com/BerriAI/litellm/pull/29684) - test: make custom\_tokenizer proxy tests hermetic by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29643](https://github.com/BerriAI/litellm/pull/29643) - test(proxy): stop running real-DB tests in GitHub Actions unit jobs by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29700](https://github.com/BerriAI/litellm/pull/29700) - chore(ui): remove the bare-fetch lint 
rule by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29712](https://github.com/BerriAI/litellm/pull/29712) - Litellm jwt mapping virtualkeys by [@​shivamrawat1](https://github.com/shivamrawat1) in [#​28510](https://github.com/BerriAI/litellm/pull/28510) - refactor(ui): shared HTTP client + location-pinned fetch() lint rule by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29723](https://github.com/BerriAI/litellm/pull/29723) - fix(proxy): stop team BYOK model name corruption on model edit by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29731](https://github.com/BerriAI/litellm/pull/29731) - \[internal copy of [#​29511](https://github.com/BerriAI/litellm/issues/29511)] feat(guardrails): add sensitive data routing to on-premise models by [@​mateo-berri](https://github.com/mateo-berri) in [#​29531](https://github.com/BerriAI/litellm/pull/29531) - fix(proxy/hooks): populate llm\_provider on internal rate-limit errors by [@​mateo-berri](https://github.com/mateo-berri) in [#​27707](https://github.com/BerriAI/litellm/pull/27707) - fix(vertex/anthropic): handle namespace tools and strip client\_metadata for codex compatibility by [@​Sameerlite](https://github.com/Sameerlite) in [#​29489](https://github.com/BerriAI/litellm/pull/29489) - Support OAuth M2M for Databricks Apps A2A agents by [@​mateo-berri](https://github.com/mateo-berri) in [#​29586](https://github.com/BerriAI/litellm/pull/29586) - fix: small CLAUDE.md nit by [@​mateo-berri](https://github.com/mateo-berri) in [#​29749](https://github.com/BerriAI/litellm/pull/29749) - fix(anthropic): route Claude Opus 4.8 through adaptive thinking by [@​mateo-berri](https://github.com/mateo-berri) in [#​29702](https://github.com/BerriAI/litellm/pull/29702) - fix(proxy): persist oauth2\_flow on MCP server registration by [@​michelligabriele](https://github.com/michelligabriele) in [#​29690](https://github.com/BerriAI/litellm/pull/29690) - \[internal copy of 
[#​27491](https://github.com/BerriAI/litellm/issues/27491)] fix(realtime): Fix Realtime Audio Token Cost Tracking by [@​mateo-berri](https://github.com/mateo-berri) in [#​29722](https://github.com/BerriAI/litellm/pull/29722) - fix(galileo): use ingest traces API and standard logging payload by [@​Sameerlite](https://github.com/Sameerlite) in [#​29651](https://github.com/BerriAI/litellm/pull/29651) - fix(auth): expand all-team-models sentinel in can\_key\_call\_model for batch validation by [@​Sameerlite](https://github.com/Sameerlite) in [#​29746](https://github.com/BerriAI/litellm/pull/29746) - test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h by [@​mateo-berri](https://github.com/mateo-berri) in [#​29784](https://github.com/BerriAI/litellm/pull/29784) - test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound by [@​mateo-berri](https://github.com/mateo-berri) in [#​29787](https://github.com/BerriAI/litellm/pull/29787) - fix(ui): route MCP playground auth by oauth2 mode instead of token\_url by [@​tin-berri](https://github.com/tin-berri) in [#​29714](https://github.com/BerriAI/litellm/pull/29714) - refactor(ui): centralize proxy base URL resolution into tested resolver by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29793](https://github.com/BerriAI/litellm/pull/29793) - Litellm oss staging 050626 by [@​Sameerlite](https://github.com/Sameerlite) in [#​29774](https://github.com/BerriAI/litellm/pull/29774) - test(google): add google-genai SDK proxy integration tests by [@​Sameerlite](https://github.com/Sameerlite) in [#​29781](https://github.com/BerriAI/litellm/pull/29781) - fix(jwt): use resolved DB user\_id for spend on legacy email match by [@​milan-berri](https://github.com/milan-berri) in [#​29217](https://github.com/BerriAI/litellm/pull/29217) - feat(ui): generate dashboard API types from the proxy OpenAPI spec by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in 
[#​29816](https://github.com/BerriAI/litellm/pull/29816) - fix(proxy): drop deleted team BYOK model name from team.models by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29820](https://github.com/BerriAI/litellm/pull/29820) - feat(mcp): per-server env vars with global + per-user scopes by [@​mateo-berri](https://github.com/mateo-berri) in [#​28917](https://github.com/BerriAI/litellm/pull/28917) - refactor(ui): route behavior-preserving networking calls through apiClient by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29806](https://github.com/BerriAI/litellm/pull/29806) - fix(mcp): persist Tools-tab MCP OAuth token to DB by [@​tin-berri](https://github.com/tin-berri) in [#​29809](https://github.com/BerriAI/litellm/pull/29809) - fix(ui): require new expiration when regenerating an expired key by [@​milan-berri](https://github.com/milan-berri) in [#​29838](https://github.com/BerriAI/litellm/pull/29838) - refactor(ui): route query-building networking calls through apiClient by [@​ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#​29815](https://github.com/BerriAI/litellm/pull/29815) - Make the image-gen record/replay proxy report cache mode and per-request HIT/MISS by [@​mateo-berri](https://github.com/mateo-berri) in [#​29802](https://github.com/BerriAI/litellm/pull/29802) - feat(proxy): hot-reload .env in dev when running with --reload by [@​mateo-berri](https://github.com/mateo-berri) in [#​29783](https://github.com/BerriAI/litellm/pull/29783) - fix(ui): stop MCP playground tool calls from sending twice by [@​tin-berri](https://github.com/tin-berri) in [#​29821](https://github.com/BerriAI/litellm/pull/29821) - feat(fal\_ai): add Nano Banana / Gemini 2.5 Flash Image generation support by [@​mateo-berri](https://github.com/mateo-berri) in [#​29798](https://github.com/BerriAI/litellm/pull/29798) - Title: Fix managed batch cancel credential resolution by [@​shivamrawat1](https://github.com/shivamrawat1) in 
[#​29734](https://github.com/BerriAI/litellm/pull/29734) - Title: fix(proxy): resolve vector store file list credentials from team deployments by [@​shivamrawat1](https://github.com/shivamrawat1) in [#​29739](https://github.com/BerriAI/litellm/pull/29739) - refactor: convert AWS and GCP Terraform stacks into reusable modules … by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​28103](https://github.com/BerriAI/litellm/pull/28103) - chore(ui): build ui for release by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29853](https://github.com/BerriAI/litellm/pull/29853) - fix(terraform/gcp): prompt for image\_registry in DeployStack one-click by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29852](https://github.com/BerriAI/litellm/pull/29852) - fix(terraform/gcp): abandon SQL user on destroy by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29855](https://github.com/BerriAI/litellm/pull/29855) - Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic by [@​mateo-berri](https://github.com/mateo-berri) in [#​29847](https://github.com/BerriAI/litellm/pull/29847) - chore(deps): bump deps by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29860](https://github.com/BerriAI/litellm/pull/29860) - chore(ci): promote internal staging to main by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29861](https://github.com/BerriAI/litellm/pull/29861) - fix: 400 on Anthropic context overflow; seed identity on failed auth by [@​yassin-berriai](https://github.com/yassin-berriai) in [#​29848](https://github.com/BerriAI/litellm/pull/29848) - chore(ci): promote internal staging to main by [@​yuneng-berri](https://github.com/yuneng-berri) in [#​29862](https://github.com/BerriAI/litellm/pull/29862) - chore(release): patch v1.89.0-rc.1 with [#​30064](https://github.com/BerriAI/litellm/issues/30064) (Claude Fable 5) for v1.89.0-rc.2 by [@​mateo-berri](https://github.com/mateo-berri) in 
[#​30143](https://github.com/BerriAI/litellm/pull/30143) **Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0> ### [`v1.88.2`](https://github.com/BerriAI/litellm/releases/tag/v1.88.2) [Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.2...v1.88.2) ##### Verify Docker Image Signature All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0). **Verify using the pinned commit hash (recommended):** A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \ ghcr.io/berriai/litellm:v1.88.2 ``` **Verify using the release tag (convenience):** Tags are protected in this repository and resolve to the same key. 
This option is easier to read but relies on tag protection rules: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.2/cosign.pub \ ghcr.io/berriai/litellm:v1.88.2 ``` Expected output: ``` The following checks were performed on each of these signatures: - The cosign claims were validated - The signatures were verified against the specified public key ``` *** ##### What's Changed - chore(release): backport Fable 5, batch-file auth, CrowdStrike AIDR, Mantle Responses SigV4, and NetApp streaming-cost fix to stable/1.88.x and cut 1.88.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30144](https://github.com/BerriAI/litellm/pull/30144) - chore(release): backport DB-resilience, passthrough, model-info, budget, and deps fixes to stable/1.88.x by [@yuneng-berri](https://github.com/yuneng-berri) in [#30408](https://github.com/BerriAI/litellm/pull/30408) **Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2>
</details> --- ### Configuration 📅 **Schedule**: (in timezone Europe/London) - Branch creation - At any time (no schedule defined) - Automerge - At any time (no schedule defined) 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. --- This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate). Reviewed-on: https://forgejo.hayden.moe/hayden/phoebe/pulls/93

Automated copy of #28008 into `litellm_internal_staging` for pr-babysitter.

Original head: `gym-cmd/litellm:feat/v1.84.0-mcp-gateway-jwt-auth@216e05591029`
Original PR: #28008

Note
High Risk
Changes authentication and authorization for MCP (anonymous cold-start bypasses, header stripping, 401 propagation) and JWT validation paths. These are security-critical areas that require a careful rollout and migration of `oauth_passthrough`.

Overview
Adds MCP OAuth passthrough via a new `oauth_passthrough` flag on MCP servers (DB migration + schema). When enabled on `auth_type=none` servers that forward `Authorization`, the gateway proxies upstream RFC 9728 `oauth-protected-resource` metadata (cached), allows anonymous cold-start for OAuth discovery, emits spec-correct `WWW-Authenticate` challenges, and surfaces upstream 401/403 through `MCPUpstreamAuthError` instead of empty tool lists. Authorization forwarding is centralized in `_should_strip_caller_authorization` so LiteLLM API keys are not leaked upstream while upstream bearer tokens are preserved on the passthrough return path.

JWT auth gains optional `LiteLLM_JWTAuth.issuers` entries (`JWTIssuerConfig`) for per-issuer JWKS, audience, and claim normalization; an unmatched `iss` still uses the global `JWT_PUBLIC_KEY_URL` path.

Hardening / ops: X-Forwarded-* headers are honored for MCP OAuth and client IP only when the direct peer is in `mcp_trusted_proxy_ranges`; safer `root_path` stripping for sub-path deployments; org/agent MCP permission lookups aligned with shared `auth_checks` caches; Docker Wolfi image bump and retried `apk add`.

Reviewed by Cursor Bugbot for commit 625e3bc.
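The trusted-proxy rule in the hardening notes can be illustrated with a small sketch. This is not the PR's implementation: `resolve_client_ip` is a hypothetical helper, the header mapping is assumed to use lowercase keys, and only `X-Forwarded-For` is shown. It illustrates the general idea that a forwarded header is believed only when the direct peer falls inside a configured range such as `mcp_trusted_proxy_ranges`.

```python
import ipaddress
from typing import Iterable, Mapping


def resolve_client_ip(
    peer_ip: str,
    headers: Mapping[str, str],
    trusted_proxy_ranges: Iterable[str],
) -> str:
    """Hypothetical helper: trust X-Forwarded-For only from known proxies."""
    peer = ipaddress.ip_address(peer_ip)
    networks = [
        ipaddress.ip_network(cidr, strict=False) for cidr in trusted_proxy_ranges
    ]
    peer_is_trusted = any(peer in net for net in networks)

    # Assumption: header names were lowercased by the caller.
    forwarded = headers.get("x-forwarded-for", "")
    if not peer_is_trusted or not forwarded.strip():
        # Untrusted peers could spoof the header, so fall back to the socket peer.
        return peer_ip

    # The leftmost entry is the client as reported by the proxy chain.
    candidate = forwarded.split(",")[0].strip()
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Malformed header value: ignore it rather than propagate garbage.
        return peer_ip
```

With `trusted_proxy_ranges=["10.0.0.0/8"]`, a request arriving from `10.0.0.5` has its forwarded address honored, while the same header sent directly from a public address is ignored.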
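The issuer-scoped JWT behavior described in the overview (a per-issuer entry when `iss` matches, the global path otherwise) might look roughly like the sketch below. Everything here is an assumption for illustration: `IssuerConfig` is a stand-in for `JWTIssuerConfig`, its field names (`issuer`, `jwks_url`, `audience`) are guesses rather than the PR's schema, and `select_jwks_source` is a hypothetical function. The `iss` claim is read before signature verification only to choose which keys to verify against; the token must still be verified afterwards.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class IssuerConfig:
    """Hypothetical stand-in for JWTIssuerConfig; field names are assumed."""

    issuer: str
    jwks_url: str
    audience: Optional[str] = None


def select_jwks_source(
    unverified_claims: Mapping[str, object],
    issuers: Sequence[IssuerConfig],
    global_jwks_url: str,
) -> Tuple[str, Optional[str]]:
    """Return (jwks_url, expected_audience) for the token's issuer."""
    iss = unverified_claims.get("iss")
    for cfg in issuers:
        # Exact string match; no prefix or wildcard matching in this sketch.
        if cfg.issuer == iss:
            return cfg.jwks_url, cfg.audience
    # Unmatched or missing issuer: fall back to the single global JWKS URL.
    return global_jwks_url, None
```

An exact match on `iss` keeps one issuer from borrowing another issuer's keys or audience; whether the real implementation normalizes trailing slashes or casing is not stated in the description.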