[internal copy of #28008] Support MCP OAuth passthrough and issuer-scoped JWT auth by mateo-berri · Pull Request #28356 · BerriAI/litellm

mateo-berri · 2026-05-20T14:49:54Z

Automated copy of #28008 into litellm_internal_staging for pr-babysitter.

Original head: gym-cmd/litellm:feat/v1.84.0-mcp-gateway-jwt-auth @ 216e05591029
Original PR: #28008

Note

High Risk
Changes authentication and authorization for MCP (anonymous cold-start bypasses, header stripping, 401 propagation) and JWT validation paths—security-critical areas requiring careful rollout and migration of oauth_passthrough.

Overview
Adds MCP OAuth passthrough via a new oauth_passthrough flag on MCP servers (DB migration + schema). When enabled on auth_type=none servers that forward Authorization, the gateway proxies upstream RFC 9728 oauth-protected-resource metadata (cached), allows anonymous cold-start for OAuth discovery, emits spec-correct WWW-Authenticate challenges, and surfaces upstream 401/403 through MCPUpstreamAuthError instead of empty tool lists. Authorization forwarding is centralized in _should_strip_caller_authorization so LiteLLM API keys are not leaked upstream while upstream bearer tokens are preserved on the passthrough return path.

JWT auth gains optional LiteLLM_JWTAuth.issuers entries (JWTIssuerConfig) for per-issuer JWKS, audience, and claim normalization; unmatched iss still uses the global JWT_PUBLIC_KEY_URL path.

Hardening / ops: X-Forwarded-* for MCP OAuth and client IP only when the direct peer is in mcp_trusted_proxy_ranges; safer root_path stripping for sub-path deployments; org/agent MCP permission lookups aligned with shared auth_checks caches; Docker Wolfi image bump and retried apk add.
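A minimal sketch of the trusted-proxy rule described above, using only the standard library. The function name `resolve_client_ip`, the lowercase header key, and the choice of the left-most `X-Forwarded-For` entry are assumptions for illustration; this is not the PR's `IPAddressUtils` implementation.

```python
import ipaddress
from typing import Iterable, Mapping


def resolve_client_ip(
    peer_ip: str,
    headers: Mapping[str, str],
    trusted_proxy_ranges: Iterable[str],
) -> str:
    """Honor X-Forwarded-For only when the direct peer is a trusted proxy."""
    peer = ipaddress.ip_address(peer_ip)
    networks = [
        ipaddress.ip_network(cidr, strict=False) for cidr in trusted_proxy_ranges
    ]
    peer_is_trusted = any(peer in net for net in networks)

    forwarded = headers.get("x-forwarded-for", "")
    if peer_is_trusted and forwarded:
        # Assumption: the left-most entry is the original client.
        return forwarded.split(",")[0].strip()

    # Untrusted peer (or no header): ignore anything the caller claims.
    return peer_ip
```

With `trusted_proxy_ranges=["10.0.0.0/8"]`, a request from `10.0.0.5` carrying `X-Forwarded-For: 203.0.113.7` resolves to `203.0.113.7`, while the same header sent directly from `198.51.100.9` is ignored.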

Reviewed by Cursor Bugbot for commit 625e3bc.

Operators upgrading past 35bbca6 (which made /metrics auth default-on) see "Malformed API Key passed in. Ensure Key has 'Bearer ' prefix." with no hint that litellm_settings.require_auth_for_metrics_endpoint: false restores the previous unauthenticated behavior. Append that discovery hint to the existing 401 body so a Prometheus scraper that breaks after upgrade has a clear migration path. No behavior change.

…t-rc2 fix(proxy): point /metrics 401 at the opt-out flag

…o remaining headroom

reserve_budget_for_request fell back to reserving the entire remaining team/key/user headroom whenever a request omitted max_tokens, which pinned the spend counter at max_budget for the duration of the in-flight request and false-positive-blocked every concurrent or back-to-back request until the success callback reconciled. Surfaced as an integration-test team being budget-blocked at its $2000 cap while DB spend was $0.144.

Switch the missing-max_tokens path to a fixed default of 16384 output tokens (mirrors parallel_request_limiter_v3's DEFAULT_MAX_TOKENS_ESTIMATE precedent), and clamp explicit max_tokens at the model's max_output_tokens for reservation accounting only. The outbound request body is unchanged, so providers see whatever the caller actually sent; only the local integer used to compute reservation cost is bounded. This also prevents a hostile max_tokens=999999999 from inflating one request's reservation up to the entire team headroom.

For Opus 4.7 (output $25/M, max_output 128K) on a $2000 budget the worst-case per-request reservation drops from "everything left" to $3.20, raising admittable concurrency from 1 to ~625.
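The arithmetic in this commit message can be checked with a short sketch. It works in integer micro-dollars to avoid floating-point rounding, and it assumes "128K" means 128,000 tokens, which is the value that reproduces the quoted $3.20. The function name `reservation_micro_usd` is hypothetical and not the name used in the patch.

```python
from typing import Optional

# Fixed default quoted in the commit message for requests without max_tokens.
DEFAULT_MAX_TOKENS_ESTIMATE = 16_384


def reservation_micro_usd(
    max_tokens: Optional[int],
    model_max_output_tokens: int,
    usd_per_million_output_tokens: int,
) -> int:
    """Upfront reservation for one request, in micro-dollars.

    tokens * (USD per 1e6 tokens) equals the cost in millionths of a dollar.
    """
    if max_tokens is None:
        bounded = DEFAULT_MAX_TOKENS_ESTIMATE
    else:
        # Clamp for accounting only; the outbound request is not modified.
        bounded = min(max_tokens, model_max_output_tokens)
    return bounded * usd_per_million_output_tokens


worst_case = reservation_micro_usd(999_999_999, 128_000, 25)  # 3_200_000 = $3.20
default_case = reservation_micro_usd(None, 128_000, 25)       # 409_600 = $0.4096
budget_micro_usd = 2_000 * 1_000_000
max_concurrent = budget_micro_usd // worst_case               # 625
```

Under these assumptions a request that omits max_tokens reserves about $0.41, and even a hostile max_tokens value reserves at most $3.20 of the $2000 budget.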

Image-generation routes (dall-e-3, flux, etc.) have no per-token output cost so they fell through to the no-reservation read-time-only path. Concurrent image requests against a depleted budget could all pass common_checks (counter exactly at max_budget passes the strict-`>` gate) and reach the provider before reconciliation caught up. Add per-image reservation in _estimate_request_max_cost_for_model: when the model has a per-image cost field, reserve `n × cost_per_image` upfront. The atomic counter increment serializes concurrent admissions, so the second request sees the post-first-reservation counter and raises BudgetExceededError instead of silently leaking through. Both `output_cost_per_image` and `input_cost_per_image` are honored — naming is inconsistent across providers (OpenAI dall-e-3 uses input_cost_per_image, aiml/dall-e-3 uses output_cost_per_image for the same per-generated-image price). Per-pixel pricing (DALL-E 2 size variants) and TTS/STT routes still fall through to read-time enforcement; those are follow-ups.

The previous detection treated any model with input_cost_per_image or output_cost_per_image as image generation. Several chat and embedding models carry those fields to price multimodal vision input, not generated images:

- gemini-3.1-pro-preview (mode=chat) has output_cost_per_image=0.00012 alongside input/output token pricing.
- azure/gpt-realtime-* (mode=chat) has input_cost_per_image=5e-6.
- amazon.titan-embed-image-v1 (mode=embedding) has input_cost_per_image=6e-5.

For these models the image-gen branch fired first and reserved a fraction of a cent per request, short-circuiting the token-priced path entirely. Long Gemini chats reserved 1 × $0.00012 instead of the true token cost.

Gate strictly on mode in {"image_generation", "image_edit"}. All 197 real image_generation entries and all 31 image_edit entries (Flux Kontext, Stability inpaint/outpaint, etc.) carry the right mode, so the field-presence fallback was unnecessary. Adds regression tests for the chat-model-with-image-cost-field case and for image_edit reservation.
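The mode gate and the `n × cost_per_image` reservation can be sketched as follows. The function name, the dict-shaped `model_info`, the preference for `output_cost_per_image` when both fields exist, and the $0.50 price in the usage note are assumptions for illustration, not details taken from the patch.

```python
from typing import Any, Mapping, Optional

IMAGE_MODES = {"image_generation", "image_edit"}


def estimate_image_reservation(
    model_info: Mapping[str, Any], n: int = 1
) -> Optional[float]:
    """Return the upfront per-image reservation, or None to fall through."""
    # Gate on mode first: chat/embedding models may carry per-image fields
    # that price vision input rather than generated images.
    if model_info.get("mode") not in IMAGE_MODES:
        return None

    # Providers name the per-generated-image price inconsistently.
    # Assumption: prefer output_cost_per_image when both are present.
    per_image = model_info.get("output_cost_per_image")
    if per_image is None:
        per_image = model_info.get("input_cost_per_image")
    if per_image is None:
        return None  # e.g. per-pixel pricing: left to read-time enforcement

    return n * per_image
```

A chat entry such as `{"mode": "chat", "output_cost_per_image": 0.00012}` returns `None` and stays on the token-priced path, while `{"mode": "image_edit", "input_cost_per_image": 0.5}` with `n=2` reserves `1.0`.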

…2-backport fix(proxy): bound budget reservation per request (backport of #27509 to 1.84.0rc2)

Backport of #27241 onto litellm_1.84.0rc2.

The 12 entries in `[project.dependencies]` were exact `==` pins, a side effect of the Poetry -> uv migration. This forces every downstream package that lists litellm as a dependency to downgrade common runtime libraries (openai, pydantic, aiohttp, click, jsonschema, ...) to the exact versions we ship. Switch to lower-bounded ranges with upper bounds where the upstream package is pre-1.0 or has a known breaking-major-version policy. Reproducibility for our Docker proxy and CI continues to come from `uv.lock`, which is regenerated here as a metadata-only diff.

Conflict resolution vs upstream merge:

- The upstream merge commit also surfaced unrelated context entries (nvidia-riva-client, soundfile/stt-nvidia-riva extra) that exist in staging but not in rc2. Those are not part of #27241's intent and were dropped from the resolution; the rc2 uv.lock keeps its existing entry set, only the 12 specifier strings changed.
- `uv lock --check` passes (392 packages resolved, no drift).

build(packaging): relax core runtime pins to ranges (rc2 backport of #27241)

Our `uv.lock` already resolves jinja2 to 3.1.6, so Docker / CI installs get that version. The `pyproject.toml` floor was lagging at 3.1.0, which means downstream consumers using `--resolution=lowest-direct` or older constraint files can land on 3.1.0-3.1.5 instead of the version we actually test against. Aligns the declared floor with the resolved version so external installers see the same baseline our test matrix exercises. `uv lock` diff is metadata-only (no resolved-version drift).

build(packaging): raise jinja2 floor to 3.1.6 (rc2 backport of #27552)

OpenAPI-generated tools only applied static closure headers and BYOK Authorization via ContextVar. Copy MCPServer.extra_headers from the incoming MCP request into _request_extra_headers (set in server.py before local tool dispatch), merge in openapi_to_mcp_generator via a small helper. OAuth2 M2M: do not forward caller Authorization from raw_headers (same rule as _prepare_mcp_server_headers for managed MCP). Adds TestRequestExtraHeaders and clarifies mcp_server_manager registration comment.

Fixes #26794

Co-authored-by: Cursor <cursoragent@cursor.com>

Greptile: getattr default was redundant; property exists on MCPServer and mcp_server is non-None inside the extra_headers forwarding block.

Co-authored-by: Cursor <cursoragent@cursor.com>

Match the existing MCP invariant in merge_mcp_headers and the managed MCP path: operator-configured static headers always override caller-forwarded headers on name conflict, with case-insensitive comparison so different casing cannot bypass the precedence. _request_auth_header (BYOK) still overrides Authorization last. Addresses Veria review on PR #27383.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
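The precedence rule in this commit (caller-forwarded headers lowest, operator static headers above them, BYOK Authorization last) can be illustrated with a small sketch. The function `merge_headers` and its parameters are hypothetical and are not the repository's `merge_mcp_headers`.

```python
from typing import Dict, Mapping, Optional, Tuple


def merge_headers(
    caller_headers: Mapping[str, str],
    static_headers: Mapping[str, str],
    byok_authorization: Optional[str] = None,
) -> Dict[str, str]:
    """Merge headers with case-insensitive precedence."""
    # Key by lowercase name so "x-tenant" and "X-Tenant" collide.
    merged: Dict[str, Tuple[str, str]] = {}

    for name, value in caller_headers.items():   # lowest precedence
        merged[name.lower()] = (name, value)
    for name, value in static_headers.items():   # operator config wins
        merged[name.lower()] = (name, value)
    if byok_authorization is not None:           # BYOK overrides last
        merged["authorization"] = ("Authorization", byok_authorization)

    return {name: value for name, value in merged.values()}
```

Because the comparison key is lowercased, a caller sending `x-tenant: caller` cannot shadow an operator-configured `X-Tenant: operator`; the merged result contains only the operator's value.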

…onto-1.84.0rc2 cherry-pick: OpenAPI MCP extra_headers (#27383) onto litellm_1.84.0rc2

Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`) were silently dropped unless the key/team had `metadata.allow_client_tags: true` set. Restore the documented behavior: tags from the request always flow into `metadata.tags` and union with any admin-configured static tags from key/team/project metadata. Removes the `allow_client_tags` opt-in flag from the pre-call pipeline. The flag was only ever read here; it has no schema or endpoint footprint, so leftover values in existing key metadata are inert. Test cleanup mirrors the simplification: drop the three tests that verified the strip-when-not-opted-in path, drop the `allow_client_tags` fixture lines from the merge/union tests.
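A rough sketch of the restored union behavior follows. The function name, the comma-separated parsing of `x-litellm-tags`, and the ordering (static tags first, then header, body, and metadata tags) are assumptions made for illustration; the commit message only states that request tags are unioned with admin-configured static tags.

```python
from typing import Iterable, List, Optional


def merge_request_tags(
    header_value: Optional[str],
    body_tags: Optional[Iterable[str]],
    metadata_tags: Optional[Iterable[str]],
    static_tags: Iterable[str],
) -> List[str]:
    """Union caller-supplied tags with static tags, preserving first-seen order."""
    # Assumption: the header carries a comma-separated list.
    header_tags = [t.strip() for t in (header_value or "").split(",") if t.strip()]

    merged: List[str] = []
    for tag in [*static_tags, *header_tags, *(body_tags or []), *(metadata_tags or [])]:
        if tag not in merged:  # de-duplicate across sources
            merged.append(tag)
    return merged
```

For example, a header value of `"a, b"`, body tags `["b", "c"]`, and a static tag `"prod"` merge to `["prod", "a", "b", "c"]` with no opt-in flag involved.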

The tag-strip block was removed in the parent commit but two surrounding comments still referenced "tags without opt-in" and "runs AFTER the strip". Update them to describe the remaining user_api_key_* and _pipeline_managed_guardrails strip that the snapshot/merge ordering actually protects against.

fix(proxy): always merge caller-supplied tags into request metadata

…#27762)

Cherry-pick of #27762 onto litellm_1.84.0rc2.

* chore: reject bare str at file-input sinks to prevent local-file read (#27667)
* fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge
  - ocr/main.py: widen Path check to os.PathLike for consistency with other sinks
  - main.py: bridge condition checks truthiness of reasoning_summary, not just None
* fix: remove unused pathlib.Path import in ocr/main.py

Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>

…86be cherry-pick: reject bare str at file-input sinks (#27762) onto litellm_1.84.0rc2

LazyFeatureMiddleware compared the raw scope path against registered prefixes (e.g. /policies), so requests under a server root path like /api/v1/policies/... never matched, the feature never loaded, and the endpoint returned 404. Strip the configured root path before matching, normalizing trailing slashes and enforcing a component boundary so /api does not falsely match /apiv2.

SERVER_ROOT_PATH is a process-startup env var. Read it once in __init__ instead of calling get_server_root_path() + rstrip on every request that arrives before all lazy features have loaded.
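The root-path stripping described in these two commits can be sketched as a pure function. `strip_root_path` is a hypothetical name; in the middleware the normalized root would be computed once at construction time rather than per request.

```python
def strip_root_path(path: str, root_path: str) -> str:
    """Remove a configured root path from a request path.

    Enforces a path-component boundary so a root of "/api" does not
    match "/apiv2/...".
    """
    root = root_path.rstrip("/")  # normalize trailing slashes
    if not root:
        return path               # no root path configured
    if path == root:
        return "/"
    if path.startswith(root + "/"):
        return path[len(root):]
    return path                   # not under the root: leave untouched
```

With a root of `/api/v1/`, the path `/api/v1/policies/list` becomes `/policies/list`, which can then be compared against registered prefixes such as `/policies`; with a root of `/api`, the path `/apiv2/policies` is returned unchanged.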

…e guards (#27793) Backport of #27793 onto litellm_1.84.0rc2. A non-admin caller could rebind their own key's user_id via /key/regenerate. _execute_virtual_key_regeneration had org/team guards but no user_id guard, and prepare_key_update_data did not strip the field — it survived model_dump(exclude_unset=True) into the Prisma update. On the next request, _return_user_api_key_auth_obj resolved the rebound user_id against litellm_usertable and returned PROXY_ADMIN whenever the target row's user_role was admin. /key/update had the equivalent guard inline at _validate_update_key_data; extract it to a shared helper _validate_caller_can_change_key_ownership and call from both /key/update and _execute_virtual_key_regeneration. Also tighten the premium gate that allowed the master-key rotation branch to skip the enterprise check. The previous predicate was a field-presence test, not an identity check. Verify the caller actually holds the master key via _is_master_key before allowing the non-premium path. Block explicit-null user_id and empty-string user_id as removal attempts; both 403-reject for non-admin callers.
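The kind of guard this backport describes might look like the sketch below. It is not the PR's `_validate_caller_can_change_key_ownership`: the function name, the exception type, the dict of explicitly set fields, and the carve-out that lets a caller resend the key's existing `user_id` are all assumptions for illustration.

```python
from typing import Any, Dict, Optional


class OwnershipChangeForbidden(Exception):
    """Stand-in for the 403 response (hypothetical name)."""


def check_key_owner_change(
    caller_is_admin: bool,
    current_user_id: Optional[str],
    update_fields: Dict[str, Any],
) -> None:
    """Reject user_id rebinding by non-admin callers.

    update_fields holds only fields the caller explicitly set, so an
    explicit null is distinguishable from an omitted field.
    """
    if caller_is_admin or "user_id" not in update_fields:
        return

    new_user_id = update_fields["user_id"]
    # Explicit null and empty string count as removal attempts.
    if new_user_id is None or new_user_id == "":
        raise OwnershipChangeForbidden("non-admin cannot clear user_id")
    if new_user_id != current_user_id:
        raise OwnershipChangeForbidden("non-admin cannot rebind user_id")
```

Calling the same check from both the update and the regenerate paths is what closes the gap described above, since the field otherwise survives into the database update.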

[Fix] Lazy feature loading under SERVER_ROOT_PATH returns 404 (backport of #27812)

…ard_84rc2 [Fix] Backport /key/regenerate ownership-rebind + premium-gate guards (#27793)

Backport of #27866 onto litellm_1.84.0rc2. External readiness probes consumed the legacy detailed payload's `db` field to drive alerting and pod-rotation decisions. Stripping the body to {"status": "healthy"} broke those probes silently — the HTTP code still flipped to 503, but probes checking body.db == "connected" treated the response as healthy. Add `db` back to the unauthenticated payload. The rest of the diagnostic fields (litellm_version, callbacks, cache, log_level) stay behind /health/readiness/details so the recon-leak gate from #26912 holds. Values match the legacy contract: "connected", "disconnected", "Not connected". The 503-on-DB-disconnect behavior from LIT-2607 is preserved.

fix(proxy): expose db status on public /health/readiness (backport #27866)

The proxy moved `litellm_version`, `is_detailed_debug`, and other diagnostic fields off the public `/health/readiness` payload behind an auth-gated `/health/readiness/details` endpoint. The navbar version tag and the detailed-debug-mode banner stopped working because they were still reading those fields from the unauthed response, which no longer contains them. Replace `useHealthReadiness` with a `useHealthReadinessDetails` hook that takes an `accessToken` argument and sends a Bearer header to the auth-gated endpoint. The hook stays disabled while `accessToken` is falsy, so the navbar can keep rendering on the public model hub (where the token is null) without triggering an auth redirect or a 401-loop.

Two small follow-ups on the readiness/details migration:

- Set `retry: false` on the query. The payload feeds a passive navbar tag and a debug banner; a 401 from an expired token shouldn't fan out into three retries against the proxy.
- Add navbar specs that assert the `accessToken` prop is forwarded into the hook (matches the DebugWarningBanner spec). Without this, the navbar could silently regress to passing `undefined` and the existing tests wouldn't catch it.

fix(ui): fetch version + debug flag from /health/readiness/details (backport #27896)

….20.2)

mateo-berri · 2026-05-26T21:11:14Z

@greptileai

…nt wolfi-base SHA The older sha256:3258be... pin has been intermittently returning 500/not-found from cgr.dev, breaking the test-server-root-path GitHub Action and the build_docker_database_image CircleCI job. Move both Dockerfiles onto the same sha256:31da65... digest already in use by Dockerfile, gateway/Dockerfile, backend/Dockerfile, and migrations/Dockerfile so the base image is consistent across the repo.

mateo-berri · 2026-05-26T22:02:41Z

@greptileai

mateo-berri · 2026-05-26T22:28:05Z

Replying to @greptile-apps[bot]'s summary comment (#issuecomment-4499663072) — the lone outstanding concern is not actionable on this branch:

The pre-flight upstream probe _check_passthrough_upstream_auth fires on every StreamableHTTP MCP operation rather than just on session initialization, doubling upstream traffic and risking upstream rate-limit interference. The _check_passthrough_upstream_auth call should be skipped for requests carrying a valid Mcp-Session-Id, probing only on first contact.

Three things make this a no-op-or-worse change in the PR's configuration:

1. Mcp-Session-Id carries no information in this configuration. Both session_manager and sse_session_manager are constructed with stateless=True (server.py:251, 259). The MCP SDK's stateless path (mcp/server/streamable_http_manager.py::_handle_stateless_request) explicitly sets mcp_session_id=None on every request and never tracks sessions — _server_instances stays empty for the lifetime of the worker. Compliant clients in stateless mode receive no session id from initialize and won't send one on subsequent calls. The gating heuristic Greptile suggests ("skip when Mcp-Session-Id is valid / known to this worker") is therefore equivalent to "never skip" in the only configuration this code exercises, so the proposed change wouldn't reduce upstream traffic at all.

2. The probe is correctness-critical, not optional. The MCP StreamableHTTP SDK commits 200 OK + SSE response headers before dispatching the JSON-RPC message to the server handler — see mcp/server/streamable_http.py:633-637:

# Start the SSE response (this will send headers immediately)
async with anyio.create_task_group() as tg:
    tg.start_soon(response, scope, receive, send)
    # Then send the message to be processed by the server
    ...
    await writer.send(session_message)

The start_soon(response, ...) call emits the http.response.start ASGI event with status 200 before the handler runs. Once that happens, any upstream 401 surfaced via MCPUpstreamAuthError inside _fetch_tools_with_timeout is unrecoverable as an HTTP status — the response is already in flight and there is no way to retroactively send a 401 + WWW-Authenticate challenge. The pre-flight probe is the only window in which a 401 can actually reach the client. This is exactly what the docstring at _check_passthrough_upstream_auth (server.py:3094-3097) is documenting, and removing it re-introduces the silently-empty-tool-list bug this PR is fixing — see the now-resolved review thread on handle_sse_mcp (review #3275745610) where adding the probe to the SSE handler was specifically requested for the same reason.

3. The probe is already narrowly scoped and bounded. It is a no-op for the typical deployment:

- _get_forwarded_auth_from_scope returns None unless both Authorization and x-litellm-api-key are present on the request (server.py:3001-3022), so any request that isn't a deliberate pass-through forwarding attempt exits at line 3101 without touching upstream.
- The probe set is filtered to srv.is_oauth_passthrough only (server.py:3121), which requires auth_type ∈ (None, MCPAuth.none) AND Authorization in extra_headers AND explicit oauth_passthrough=True on the server row (types/mcp_server/mcp_server_manager.py:159-181). For any non-passthrough server set, passthrough_servers is empty and the function returns at line 3124.
- For passthrough servers it does cost one extra upstream POST per request, but it's bounded by a 5-second timeout (server.py:3027) and fails open on network errors (_probe_upstream_auth, lines 3076-3080), so a slow/unreachable upstream doesn't block valid requests.

The extra upstream RTT on passthrough servers is the intentional trade-off: the alternative is that a client with an expired upstream token receives a 200 SSE stream with an empty/garbled body and no way to discover that it needs to refresh, which defeats the entire RFC 9728 challenge flow this PR adds. Leaving the probe as-is.
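The gating conditions from point 3 can be summarized in a small sketch. The `Server` dataclass and `servers_to_probe` are hypothetical stand-ins written for this illustration, with `auth_type` modeled as a plain string; they are not the code in server.py.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class Server:
    name: str
    auth_type: Optional[str] = None
    extra_headers: List[str] = field(default_factory=list)
    oauth_passthrough: bool = False

    @property
    def is_oauth_passthrough(self) -> bool:
        # All three conditions from the comment must hold.
        return (
            self.auth_type in (None, "none")
            and "Authorization" in self.extra_headers
            and self.oauth_passthrough
        )


def servers_to_probe(
    headers: Mapping[str, str], servers: List[Server]
) -> List[Server]:
    """Return the servers that need a pre-flight upstream auth probe."""
    present = {name.lower() for name in headers}
    # No probe unless the caller sent both headers (deliberate pass-through).
    if not {"authorization", "x-litellm-api-key"} <= present:
        return []
    return [srv for srv in servers if srv.is_oauth_passthrough]
```

An empty result means the request proceeds without touching upstream, which is the case for any deployment that has not opted a server into `oauth_passthrough`.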

greptile-apps · 2026-05-26T22:28:43Z

Acknowledged — all three points hold up.

The stateless-mode argument alone is sufficient to invalidate the Mcp-Session-Id heuristic, but the ASGI response-headers-committed-before-handler point is the deeper reason the probe has to be pre-flight regardless of session tracking. Once http.response.start with status 200 is in flight there is no recovery path for a 401 challenge, so the probe isn't an optimization opportunity — it's the only window that exists. The narrow gating (oauth_passthrough=True + both auth headers present) means the cost is limited to exactly the deployments that opted into the pass-through contract.

Concern withdrawn. No action needed.

The previously aligned sha256:31da6565f35a... and the older sha256:3258be... both return HTTP 500 from cgr.dev's manifest endpoint, breaking the build_docker_database_image CircleCI job and test-server-root-path GitHub Action. The current 'latest' tag resolves to sha256:5743937d521c... which serves manifests normally, so move docker/Dockerfile.database and docker/Dockerfile.non_root onto that digest.

mateo-berri · 2026-05-26T22:41:47Z

@greptileai

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Preemptive 401 fires before toolset scope narrows servers
- Moved _raise_preemptive_401_for_unauthenticated_servers to run after toolset scoping in both handle_streamable_http_mcp and handle_sse_mcp, and added an optional allowed_server_ids parameter so passthrough/oauth2 servers excluded by the active toolset are skipped instead of triggering a spurious OAuth challenge.

Preview (625e3bc526)

diff --git a/docker/Dockerfile.database b/docker/Dockerfile.database
--- a/docker/Dockerfile.database
+++ b/docker/Dockerfile.database
@@ -1,8 +1,8 @@
 # Base image for building
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
 
 # Runtime image
-ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
 ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a
 
 FROM $UV_IMAGE AS uvbin
@@ -15,16 +15,20 @@
 COPY --from=uvbin /uv /usr/local/bin/uv
 COPY --from=uvbin /uvx /usr/local/bin/uvx
 
-RUN apk add --no-cache \
-    bash \
-    gcc \
-    python3 \
-    python3-dev \
-    openssl \
-    openssl-dev \
-    nodejs \
-    npm \
-    libsndfile
+RUN for i in 1 2 3; do \
+      apk add --no-cache \
+        bash \
+        gcc \
+        python3 \
+        python3-dev \
+        openssl \
+        openssl-dev \
+        nodejs \
+        npm \
+        libsndfile && break; \
+      [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+      sleep 5; \
+    done
 
 ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
     UV_LINK_MODE=copy \
@@ -66,7 +70,11 @@
 
 USER root
 
-RUN apk add --no-cache bash openssl tzdata nodejs npm python3 libsndfile && \
+RUN for i in 1 2 3; do \
+      apk add --no-cache bash openssl tzdata nodejs npm python3 libsndfile && break; \
+      [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+      sleep 5; \
+    done && \
     npm install -g npm@11.12.1 tar@7.5.11 glob@11.1.0 @isaacs/brace-expansion@5.0.1 minimatch@10.2.4 diff@8.0.3 && \
     GLOBAL="$(npm root -g)" && \
     find "$GLOBAL/npm" -type d -name "tar" -path "*/node_modules/tar" | while read d; do \

diff --git a/docker/Dockerfile.non_root b/docker/Dockerfile.non_root
--- a/docker/Dockerfile.non_root
+++ b/docker/Dockerfile.non_root
@@ -1,6 +1,6 @@
 # Base images
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
-ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
+ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587
 ARG PROXY_EXTRAS_SOURCE=published
 ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a
 

diff --git a/litellm-proxy-extras/litellm_proxy_extras/migrations/20260526120000_add_oauth_passthrough_to_mcp_servers/migration.sql b/litellm-proxy-extras/litellm_proxy_extras/migrations/20260526120000_add_oauth_passthrough_to_mcp_servers/migration.sql
new file mode 100644
--- /dev/null
+++ b/litellm-proxy-extras/litellm_proxy_extras/migrations/20260526120000_add_oauth_passthrough_to_mcp_servers/migration.sql
@@ -1,0 +1,2 @@
+-- AlterTable
+ALTER TABLE "LiteLLM_MCPServerTable" ADD COLUMN IF NOT EXISTS "oauth_passthrough" BOOLEAN NOT NULL DEFAULT false;

diff --git a/litellm-proxy-extras/litellm_proxy_extras/schema.prisma b/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
--- a/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
+++ b/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
@@ -325,6 +325,7 @@
   allow_all_keys Boolean @default(false)
   available_on_public_internet Boolean @default(true)
   delegate_auth_to_upstream Boolean @default(false)
+  oauth_passthrough Boolean @default(false)
   is_byok               Boolean  @default(false)
   byok_description      String[] @default([])
   byok_api_key_help_url String?

diff --git a/litellm/experimental_mcp_client/client.py b/litellm/experimental_mcp_client/client.py
--- a/litellm/experimental_mcp_client/client.py
+++ b/litellm/experimental_mcp_client/client.py
@@ -421,8 +421,16 @@
 
         return factory
 
-    async def list_tools(self) -> List[MCPTool]:
-        """List available tools from the server."""
+    async def list_tools(self, raise_on_error: bool = False) -> List[MCPTool]:
+        """List available tools from the server.
+
+        Args:
+            raise_on_error: When True, re-raise exceptions instead of returning
+                an empty list. Used by the proxy's pass-through MCP flow so it
+                can surface upstream HTTP 401 responses as a proper 401 to the
+                MCP client (triggering the upstream OAuth flow) rather than
+                masking them as "connected, no tools".
+        """
         verbose_logger.debug(
             f"MCP client listing tools from {self.server_url or 'stdio'}"
         )
@@ -458,6 +466,8 @@
                     "the MCP server may have crashed, disconnected, or timed out"
                 )
 
+            if raise_on_error:
+                raise
             # Return empty list instead of raising to allow graceful degradation
             return []
 

diff --git a/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py b/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py
--- a/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py
+++ b/litellm/proxy/_experimental/mcp_server/auth/user_api_key_auth_mcp.py
@@ -7,6 +7,7 @@
 from starlette.types import Scope
 
 from litellm._logging import verbose_logger
+from litellm.constants import DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL
 from litellm.proxy._types import (
     LiteLLM_TeamTable,
     ProxyException,
@@ -14,8 +15,90 @@
     UserAPIKeyAuth,
 )
 from litellm.proxy.auth.user_api_key_auth import user_api_key_auth
+from litellm.proxy.auth.ip_address_utils import IPAddressUtils
 
 
+def _parse_mcp_server_names_from_path(
+    path: str, mcp_servers_header: Optional[List[str]] = None
+) -> Optional[List[str]]:
+    """Resolve the single MCP server name a cold-start passthrough bypass may
+    target. Delegates parsing to
+    :meth:`MCPRequestHandler._extract_target_server_names_from_path` so the
+    names used here always match the names downstream routing uses; returns
+    ``None`` whenever the bypass must not activate (aggregate ``/mcp``,
+    multi-server CSV paths, or any other unrecognized path).
+
+    Also fails closed when the ``x-mcp-servers`` header introduces any server
+    not present in the path-derived target set. Downstream routing for
+    ``/mcp/...`` paths overrides the header with path-derived names, but a
+    header/path mismatch here is a sign of a confused or hostile caller —
+    refuse the cold-start bypass rather than admit anonymously based on the
+    path while the header advertises a stricter, non-passthrough target."""
+    servers = MCPRequestHandler._extract_target_server_names_from_path(path)
+    if len(servers) != 1:
+        verbose_logger.debug(
+            "MCP cold-start: path %r resolved to %r; passthrough 401 bypass "
+            "requires exactly one target and will not activate",
+            path,
+            servers,
+        )
+        return None
+    if mcp_servers_header is not None and (set(mcp_servers_header) - set(servers)):
+        verbose_logger.debug(
+            "MCP cold-start: x-mcp-servers header %r introduces target(s) not "
+            "in path-derived set %r; passthrough 401 bypass will not activate",
+            mcp_servers_header,
+            servers,
+        )
+        return None
+    return servers
+
+
+def _is_mcp_passthrough_cold_start(
+    mcp_servers: Optional[List[str]], client_ip: Optional[str]
+) -> bool:
+    """True only when EVERY targeted server is a pass-through server with no
+    auth headers — the cold-start OAuth discovery case per RFC 9728 / MCP
+    Authorization spec. Lets the route handler's 401 emitter produce the
+    spec-compliant WWW-Authenticate challenge instead of surfacing a generic
+    admission error.
+
+    Uses "all" semantics (mirrors :meth:`MCPRequestHandler._target_servers_use_oauth2`):
+    one non-passthrough target in a co-targeted set must not flip the bypass
+    open for the others. Fails closed when any target cannot be resolved."""
+    if not mcp_servers:
+        return False
+    from litellm.proxy._experimental.mcp_server.mcp_server_manager import (
+        global_mcp_server_manager,
+    )
+
+    for name in mcp_servers:
+        server = global_mcp_server_manager.get_mcp_server_by_name(
+            name, client_ip=client_ip
+        )
+        if server is None or not getattr(server, "is_oauth_passthrough", False):
+            return False
+    return True
+
+
+def _is_litellm_auth_admission_error(exc: Exception) -> bool:
+    if isinstance(exc, HTTPException):
+        return exc.status_code == 401
+    if isinstance(exc, ProxyException):
+        try:
+            return int(exc.code) == 401
+        except (TypeError, ValueError):
+            return False
+    return False
+
+
+def _has_client_supplied_mcp_auth(
+    mcp_auth_header: Optional[str],
+    mcp_server_auth_headers: Optional[Dict[str, Dict[str, str]]],
+) -> bool:
+    return bool(mcp_auth_header) or bool(mcp_server_auth_headers)
+
+
 class MCPRequestHandler:
     """
     Class to handle MCP request processing, including:
@@ -37,7 +120,7 @@
     LITELLM_MCP_ACCESS_GROUPS_HEADER_NAME = SpecialHeaders.mcp_access_groups.value
 
     @staticmethod
-    async def process_mcp_request(
+    async def process_mcp_request(  # noqa: PLR0915
         scope: Scope,
     ) -> Tuple[
         UserAPIKeyAuth,
@@ -130,7 +213,9 @@
         elif (
             not litellm_api_key
             and MCPRequestHandler._target_servers_delegate_auth_to_upstream(  # noqa: E501
-                path=request_route, mcp_servers=mcp_servers
+                path=request_route,
+                mcp_servers=mcp_servers,
+                client_ip=IPAddressUtils.get_mcp_client_ip(request),
             )
         ):
             # Operator opted this oauth2 server into upstream-delegated auth
@@ -172,25 +257,87 @@
                 # than coercing (``int("None")`` would raise ValueError and
                 # rewrite the auth error as a 500).
                 status = e.status_code if isinstance(e, HTTPException) else e.code
-                if status in (
-                    401,
-                    403,
-                    "401",
-                    "403",
-                ) and MCPRequestHandler._target_servers_use_oauth2(
-                    path=request_route, mcp_servers=mcp_servers
+                is_auth_error = status in (401, 403, "401", "403")
+                is_unauthenticated = status in (401, "401")
+                client_ip = IPAddressUtils.get_mcp_client_ip(request)
+                if is_auth_error and MCPRequestHandler._target_servers_use_oauth2(
+                    path=request_route,
+                    mcp_servers=mcp_servers,
+                    client_ip=client_ip,
                 ):
                     verbose_logger.debug(
                         "MCP OAuth2: target server is OAuth2-mode, treating "
                         "Authorization as upstream OAuth2 token passthrough"
                     )
                     validated_user_api_key_auth = UserAPIKeyAuth()
+                elif is_unauthenticated:
+                    # Pass-through cold-start return: per RFC 9728 / MCP
+                    # Authorization spec the client completes upstream OAuth
+                    # discovery and returns with ``Authorization: Bearer
+                    # <upstream-token>``. For ``auth_type=none`` passthrough
+                    # servers that bearer is not a LiteLLM key (auth above
+                    # failed) but is meant to be forwarded upstream
+                    # unchanged. Fall back to anonymous admission so the
+                    # caller is not rejected for following the discovery
+                    # flow without also setting ``x-litellm-api-key``.
+                    # Only trigger on 401 (token unrecognized); a 403 means
+                    # the key WAS recognized but is forbidden (e.g. over
+                    # budget / rate limited) and must propagate so those
+                    # controls are not bypassed via anonymous admission.
+                    mcp_servers_from_path = _parse_mcp_server_names_from_path(
+                        request_route, mcp_servers
+                    )
+                    if (
+                        mcp_servers_from_path is not None
+                        and not _has_client_supplied_mcp_auth(
+                            mcp_auth_header,
+                            mcp_server_auth_headers,
+                        )
+                        and _is_mcp_passthrough_cold_start(
+                            mcp_servers_from_path, client_ip=client_ip
+                        )
+                    ):
+                        verbose_logger.debug(
+                            "MCP pass-through return: target server is "
+                            "passthrough, treating Authorization as "
+                            "upstream OAuth token for delegated auth"
+                        )
+                        validated_user_api_key_auth = UserAPIKeyAuth()
+                    else:
+                        raise
                 else:
                     raise
         else:
-            validated_user_api_key_auth = await user_api_key_auth(
-                api_key=litellm_api_key, request=request
-            )
+            try:
+                validated_user_api_key_auth = await user_api_key_auth(
+                    api_key=litellm_api_key, request=request
+                )
+            except (HTTPException, ProxyException) as exc:
+                # Cold-start MCP OAuth discovery: RFC 9728 / MCP Authorization spec
+                # require unauthenticated requests to protected resources to receive
+                # 401 + WWW-Authenticate. Defer to _raise_preemptive_401_for_unauthenticated_servers
+                # for pass-through servers instead of surfacing a generic admission error.
+                mcp_servers_from_path = _parse_mcp_server_names_from_path(
+                    request_route, mcp_servers
+                )
+                client_ip = IPAddressUtils.get_mcp_client_ip(request)
+                if (
+                    mcp_servers_from_path is not None
+                    and not _has_client_supplied_mcp_auth(
+                        mcp_auth_header,
+                        mcp_server_auth_headers,
+                    )
+                    and _is_litellm_auth_admission_error(exc)
+                    and _is_mcp_passthrough_cold_start(
+                        mcp_servers_from_path, client_ip=client_ip
+                    )
+                ):
+                    verbose_logger.debug(
+                        "MCP pass-through cold start: deferring admission to route 401 emitter"
+                    )
+                    validated_user_api_key_auth = UserAPIKeyAuth()
+                else:
+                    raise
 
         return (
             validated_user_api_key_auth,
@@ -262,7 +409,9 @@
         return [servers_and_path]
 
     @staticmethod
-    def _target_servers_use_oauth2(path: str, mcp_servers: Optional[List[str]]) -> bool:
+    def _target_servers_use_oauth2(
+        path: str, mcp_servers: Optional[List[str]], client_ip: Optional[str]
+    ) -> bool:
         """
         True only when EVERY MCP server the request targets is configured for
         ``auth_type == oauth2``. If any target is non-OAuth2 — or if the target
@@ -291,14 +440,16 @@
             return False
 
         for name in target_names:
-            server = global_mcp_server_manager.get_mcp_server_by_name(name)
+            server = global_mcp_server_manager.get_mcp_server_by_name(
+                name, client_ip=client_ip
+            )
             if server is None or server.auth_type != MCPAuth.oauth2:
                 return False
         return True
 
     @staticmethod
     def _target_servers_delegate_auth_to_upstream(
-        path: str, mcp_servers: Optional[List[str]]
+        path: str, mcp_servers: Optional[List[str]], client_ip: Optional[str]
     ) -> bool:
         """
         True only when EVERY MCP server the request targets is configured for
@@ -328,7 +479,9 @@
             return False
 
         for name in target_names:
-            server = global_mcp_server_manager.get_mcp_server_by_name(name)
+            server = global_mcp_server_manager.get_mcp_server_by_name(
+                name, client_ip=client_ip
+            )
             if server is None or server.auth_type != MCPAuth.oauth2:
                 return False
             # `is True` is intentional: opt-in must be an explicit boolean
@@ -1035,22 +1188,21 @@
             )
             return []
 
-    # Sentinel stored in cache when an org has no object_permission, so we
-    # don't re-query the DB on every MCP request for that org.
-    _ORG_NO_PERMISSION_SENTINEL = "__org_no_mcp_permission__"
-
     @staticmethod
     async def _get_org_object_permission(
         user_api_key_auth: Optional[UserAPIKeyAuth] = None,
     ):
         """
-        Get org object_permission, using user_api_key_cache to avoid DB hits on every request.
-
-        Caches both positive results and the absence of an object_permission so that orgs
-        with no MCP permissions configured (the common default) do not trigger a DB query
-        on every request.
+        Get org object_permission via the established ``get_org_object`` /
+        ``get_object_permission`` helpers so MCP requests share the same
+        ``user_api_key_cache`` entries as the rest of the proxy.
         """
-        from litellm.proxy.proxy_server import prisma_client, user_api_key_cache
+        from litellm.proxy.auth.auth_checks import get_object_permission, get_org_object
+        from litellm.proxy.proxy_server import (
+            prisma_client,
+            proxy_logging_obj,
+            user_api_key_cache,
+        )
 
         if not user_api_key_auth or not user_api_key_auth.org_id:
             return None
@@ -1059,45 +1211,25 @@
             verbose_logger.debug("prisma_client is None")
             return None
 
-        org_id = user_api_key_auth.org_id
-        cache_key = f"org_object_permission:{org_id}"
-
-        from litellm.proxy._types import LiteLLM_ObjectPermissionTable
-
         try:
-            cached = await user_api_key_cache.async_get_cache(key=cache_key)
-            if cached is not None:
-                # Sentinel means the DB confirmed no object_permission for this org
-                if cached == MCPRequestHandler._ORG_NO_PERMISSION_SENTINEL:
-                    return None
-                # Redis deserialises to a plain dict; reconstruct the Pydantic model
-                # so callers can access .mcp_servers / .mcp_tool_permissions as attrs.
-                if isinstance(cached, dict):
-                    return LiteLLM_ObjectPermissionTable(**cached)
-                return cached
-
-            org_row = await prisma_client.db.litellm_organizationtable.find_unique(
-                where={"organization_id": org_id},
-                include={"object_permission": True},
+            org_obj = await get_org_object(
+                org_id=user_api_key_auth.org_id,
+                prisma_client=prisma_client,
+                user_api_key_cache=user_api_key_cache,
+                parent_otel_span=user_api_key_auth.parent_otel_span,
+                proxy_logging_obj=proxy_logging_obj,
             )
 
-            if org_row is None or org_row.object_permission is None:
-                # Cache the negative result so subsequent calls skip the DB
-                await user_api_key_cache.async_set_cache(
-                    key=cache_key,
-                    value=MCPRequestHandler._ORG_NO_PERMISSION_SENTINEL,
-                )
+            if org_obj is None or not org_obj.object_permission_id:
                 return None
 
-            # Convert raw Prisma model → Pydantic before caching.  Caching the
-            # Pydantic .dict() ensures the value survives a Redis JSON round-trip
-            # as a plain dict that we can reconstruct above (same pattern used by
-            # get_end_user_object / get_team_object in auth_checks.py).
-            obj_perm = LiteLLM_ObjectPermissionTable(**org_row.object_permission.dict())
-            await user_api_key_cache.async_set_cache(
-                key=cache_key, value=obj_perm.dict()
+            return await get_object_permission(
+                object_permission_id=org_obj.object_permission_id,
+                prisma_client=prisma_client,
+                user_api_key_cache=user_api_key_cache,
+                parent_otel_span=user_api_key_auth.parent_otel_span,
+                proxy_logging_obj=proxy_logging_obj,
             )
-            return obj_perm
         except Exception as e:
             verbose_logger.warning(f"Failed to get org object permission: {str(e)}")
             return None
@@ -1218,16 +1350,26 @@
             )
             return []
 
+    # Sentinel stored in cache when an agent has no object_permission, so we
+    # don't re-query the DB on every MCP request for that agent.
+    _AGENT_NO_PERMISSION_SENTINEL = "__agent_no_mcp_permission__"
+
     @staticmethod
     async def _get_agent_object_permission(
         user_api_key_auth: Optional[UserAPIKeyAuth] = None,
     ):
         """
-        Fetch the agent's object_permission from the DB (single query).
-
-        Returns the object_permission object or None.
+        Get agent object_permission via the established ``get_object_permission``
+        helper. Caches the ``agent_id -> object_permission_id`` mapping so we
+        avoid re-reading the agent row on every request, and reuses the shared
+        ``object_permission_id`` cache populated by the org / team / key paths.
         """
-        from litellm.proxy.proxy_server import prisma_client
+        from litellm.proxy.auth.auth_checks import get_object_permission
+        from litellm.proxy.proxy_server import (
+            prisma_client,
+            proxy_logging_obj,
+            user_api_key_cache,
+        )
 
         if not user_api_key_auth or not user_api_key_auth.agent_id:
             return None
@@ -1236,15 +1378,42 @@
             verbose_logger.debug("prisma_client is None")
             return None
 
+        agent_id = user_api_key_auth.agent_id
+        cache_key = f"agent_object_permission_id:{agent_id}"
+
         try:
-            agent_row = await prisma_client.db.litellm_agentstable.find_unique(
-                where={"agent_id": user_api_key_auth.agent_id},
-                include={"object_permission": True},
+            object_permission_id: Optional[str] = (
+                await user_api_key_cache.async_get_cache(key=cache_key)
             )
-            if agent_row is None or agent_row.object_permission is None:
+
+            if object_permission_id == MCPRequestHandler._AGENT_NO_PERMISSION_SENTINEL:
                 return None
 
-            return agent_row.object_permission
+            if object_permission_id is None:
+                agent_row = await prisma_client.db.litellm_agentstable.find_unique(
+                    where={"agent_id": agent_id},
+                )
+                object_permission_id = (
+                    getattr(agent_row, "object_permission_id", None)
+                    if agent_row is not None
+                    else None
+                )
+                await user_api_key_cache.async_set_cache(
+                    key=cache_key,
+                    value=object_permission_id
+                    or MCPRequestHandler._AGENT_NO_PERMISSION_SENTINEL,
+                    ttl=DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL,
+                )
+                if not object_permission_id:
+                    return None
+
+            return await get_object_permission(
+                object_permission_id=object_permission_id,
+                prisma_client=prisma_client,
+                user_api_key_cache=user_api_key_cache,
+                parent_otel_span=user_api_key_auth.parent_otel_span,
+                proxy_logging_obj=proxy_logging_obj,
+            )
         except Exception as e:
             verbose_logger.warning(f"Failed to get agent object permission: {str(e)}")
             return None

diff --git a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
--- a/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
+++ b/litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py
@@ -1,8 +1,11 @@
+import asyncio
 import html as _html
 import json
-from typing import Any, Dict, Optional
+import time
+from typing import Any, Dict, Optional, Tuple
 from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
 
+import httpx
 from fastapi import APIRouter, Form, HTTPException, Request
 from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
 
@@ -26,11 +29,54 @@
 from litellm.types.mcp import MCPAuth
 from litellm.types.mcp_server.mcp_server_manager import MCPServer
 
+# TTL cache for upstream OAuth metadata fetched from pass-through MCP servers.
+# Keeps us from hammering the upstream IdP on each discovery request.
+# Keyed by (server_id, resource_url) → (expires_at_epoch, payload).
+# A payload of ``None`` is a negative-result entry that prevents repeated
+# upstream fetches when the IdP consistently has no metadata to serve.
+_OAUTH_METADATA_CACHE: Dict[Tuple[str, str], Tuple[float, Optional[dict]]] = {}
+_OAUTH_METADATA_CACHE_TTL_SECONDS = 300
+_OAUTH_METADATA_NEGATIVE_CACHE_TTL_SECONDS = 60
+_OAUTH_METADATA_CACHE_MAX_SIZE = 128
+# Per-(server_id, resource_url) async locks so concurrent discovery requests
+# coalesce onto a single upstream fetch instead of issuing N parallel calls.
+_OAUTH_METADATA_FETCH_LOCKS: Dict[Tuple[str, str], asyncio.Lock] = {}
+
 router = APIRouter(
     tags=["mcp"],
 )
 
 
+def _prune_oauth_metadata_cache(now: Optional[float] = None) -> None:
+    now = now if now is not None else time.time()
+    expired_cache_keys = [
+        cache_key
+        for cache_key, (expires_at, _payload) in _OAUTH_METADATA_CACHE.items()
+        if expires_at <= now
+    ]
+    for cache_key in expired_cache_keys:
+        _OAUTH_METADATA_CACHE.pop(cache_key, None)
+
+    if len(_OAUTH_METADATA_CACHE) > _OAUTH_METADATA_CACHE_MAX_SIZE:
+        overflow = len(_OAUTH_METADATA_CACHE) - _OAUTH_METADATA_CACHE_MAX_SIZE
+        cache_keys_by_expiry = sorted(
+            _OAUTH_METADATA_CACHE,
+            key=lambda cache_key: _OAUTH_METADATA_CACHE[cache_key][0],
+        )
+        for cache_key in cache_keys_by_expiry[:overflow]:
+            _OAUTH_METADATA_CACHE.pop(cache_key, None)
+
+    # Drop locks whose cache entry has been evicted and that aren't currently
+    # held; held locks stay so in-flight callers continue to coalesce.
+    for cache_key in list(_OAUTH_METADATA_FETCH_LOCKS):
+        if cache_key in _OAUTH_METADATA_CACHE:
+            continue
+        lock = _OAUTH_METADATA_FETCH_LOCKS.get(cache_key)
+        if lock is None or lock.locked():
+            continue
+        _OAUTH_METADATA_FETCH_LOCKS.pop(cache_key, None)
+
+
 def encode_state_with_base_url(
     base_url: str,
     original_state: str,
@@ -125,6 +171,17 @@
     return None
 
 
+def _normalize_for_token_comparison(value: Any) -> str:
+    """Stringify ``value`` for token-rule comparison.
+
+    Booleans are lower-cased so Python's ``True`` / ``False`` line up with
+    JSON-style ``"true"`` / ``"false"`` rules from admin config.
+    """
+    if isinstance(value, bool):
+        return "true" if value else "false"
+    return str(value)
+
+
 def _validate_token_response(
     token_response: Dict[str, Any],
     validation_rules: Dict[str, Any],
@@ -136,7 +193,9 @@
     ``token_response["team"]["enterprise_id"]``).  Top-level keys are tried first,
     then dot-split traversal.  All comparisons are string-coerced so that numeric
     values in the response (e.g. ``"org_id": 12345``) match string rules
-    (``"org_id": "12345"``).
+    (``"org_id": "12345"``).  Booleans are normalised to JSON-style ``"true"`` /
+    ``"false"`` so admin rules written as ``{"verified": "true"}`` match upstream
+    responses of ``{"verified": true}``.
     """
     for key, expected in validation_rules.items():
         actual: Any = token_response.get(key)
@@ -163,7 +222,9 @@
                     ),
                 },
             )
-        if str(actual) != str(expected):
+        if _normalize_for_token_comparison(actual) != _normalize_for_token_comparison(
+            expected
+        ):
             raise HTTPException(
                 status_code=403,
                 detail={
@@ -400,6 +461,11 @@
         headers={"Accept": "application/json"},
         data=token_data,
     )
+    if response is None:
+        raise HTTPException(
+            status_code=502,
+            detail="MCP upstream token endpoint returned no response",
+        )
 
     response.raise_for_status()
     token_response = response.json()
@@ -505,6 +571,11 @@
         headers=headers,
         json=register_data,
     )
+    if response is None:
+        raise HTTPException(
+            status_code=502,
+            detail="MCP upstream registration endpoint returned no response",
+        )
     response.raise_for_status()
 
     token_response = response.json()
@@ -766,7 +837,119 @@
 """
 
 
-def _build_oauth_protected_resource_response(
+async def fetch_upstream_oauth_protected_resource(
+    mcp_server: MCPServer,
+) -> Optional[dict]:
+    """Fetch the upstream MCP server's ``.well-known/oauth-protected-resource``
+    metadata for a pass-through server.
+
+    Tries host-only first, then falls back to the RFC 9728 §3.1 path-suffix
+    form (e.g. ``https://host/.well-known/oauth-protected-resource/mcp``) to
+    cover upstreams that scope metadata per resource path.
+
+    Responses are cached in-process for ~5 minutes keyed on
+    ``(server_id, resource_url)`` so we do not hammer the IdP.
+
+    Returns the parsed JSON dict on success, or ``None`` if neither form
+    responds with a 2xx JSON payload. Raises on network/connection errors so
+    the caller can emit HTTP 502 rather than fabricate a gateway response.
+    """
+    if not mcp_server.url:
+        return None
+
+    upstream = urlparse(mcp_server.url)
+    if not upstream.scheme or not upstream.netloc:
+        return None
+
+    cache_key = (mcp_server.server_id, mcp_server.url)
+    now = time.time()
+    _prune_oauth_metadata_cache(now)
+    cached = _OAUTH_METADATA_CACHE.get(cache_key)
+    if cached is not None and cached[0] > now:
+        return cached[1]
+
+    lock = _OAUTH_METADATA_FETCH_LOCKS.setdefault(cache_key, asyncio.Lock())
+    async with lock:
+        now = time.time()
+        cached = _OAUTH_METADATA_CACHE.get(cache_key)
+        if cached is not None and cached[0] > now:
+            return cached[1]
+
+        host_base = f"{upstream.scheme}://{upstream.netloc}"
+        candidates = [f"{host_base}/.well-known/oauth-protected-resource"]
+        # RFC 9728 §3.1 path fallback
+        if upstream.path and upstream.path not in ("", "/"):
+            candidates.append(
+                f"{host_base}/.well-known/oauth-protected-resource"
+                f"{upstream.path.rstrip('/')}"
+            )
+
+        async_client = get_async_httpx_client(
+            llm_provider=httpxSpecialProvider.Oauth2Check
+        )
+
+        network_errors: list[Exception] = []
+        for candidate in candidates:
+            try:
+                response = await async_client.get(
+                    candidate,
+                    headers={"Accept": "application/json"},
+                )
+            except Exception as exc:
+                if is_network_error(exc):
+                    network_errors.append(exc)
+                else:
+                    verbose_logger.warning(
+                        "MCP OAuth metadata fetch for %s raised non-transport "
+                        "%s: %s — treating as no metadata for this candidate",
+                        candidate,
+                        type(exc).__name__,
+                        exc,
+                    )
+                continue
+            if response.status_code == 200:
+                try:
+                    payload = response.json()
+                except Exception as exc:
+                    verbose_logger.warning(
+                        "MCP OAuth metadata at %s returned 200 but JSON "
+                        "decode failed (%s: %s) — treating as no metadata",
+                        candidate,
+                        type(exc).__name__,
+                        exc,
+                    )
+                    continue
+                if isinstance(payload, dict):
+                    now = time.time()
+                    _OAUTH_METADATA_CACHE[cache_key] = (
+                        now + _OAUTH_METADATA_CACHE_TTL_SECONDS,
+                        payload,
+                    )
+                    _prune_oauth_metadata_cache(now)
+                    return payload
+
+        if len(network_errors) == len(candidates):
+            raise network_errors[-1]
+
+        # Negative-result caching: when no candidate yielded a usable payload,
+        # remember that for a shorter TTL so we don't re-fetch on every
+        # subsequent discovery request (and so the per-key lock can be pruned).
+        now = time.time()
+        _OAUTH_METADATA_CACHE[cache_key] = (
+            now + _OAUTH_METADATA_NEGATIVE_CACHE_TTL_SECONDS,
+            None,
+        )
+        _prune_oauth_metadata_cache(now)
+        return None
... diff truncated: showing 800 of 6799 lines

Reviewed by Cursor Bugbot for commit 98472c9.

Mirror the retry-loop pattern from #28888 (which fixed backend/Dockerfile, gateway/Dockerfile, and migrations/Dockerfile) into docker/Dockerfile.database. The build_docker_database_image CI job has been intermittently failing with "remote server returned error (try 'apk update')" when apk.cgr.dev flakes mid-fetch; bumping the wolfi-base SHA doesn't address the mirror, only a retry does. Same explicit-failure form as #28888: exit non-zero on the 3rd miss instead of silently succeeding because `sleep 5` was the last command in the `&& break || sleep 5` chain.

mateo-berri · 2026-05-26T22:55:29Z

@greptileai

Move _raise_preemptive_401_for_unauthenticated_servers after toolset scoping in both the StreamableHTTP and SSE handlers, and add an optional allowed_server_ids parameter so passthrough/oauth2 servers that the active toolset excludes no longer trigger a spurious 401 challenge. Without this, a client targeting a toolset whose scope excludes a passthrough server could be pushed into an OAuth flow for a server it would be 403'd on immediately after authentication.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
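The ordering described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ServerInfo`, `servers_needing_challenge`, and the representation of `allowed_server_ids` as an optional set are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class ServerInfo:
    # Hypothetical stand-in for an MCP server record.
    server_id: str
    requires_upstream_oauth: bool


def servers_needing_challenge(
    servers: Iterable[ServerInfo],
    has_authorization: bool,
    allowed_server_ids: Optional[Set[str]] = None,
) -> List[str]:
    """Return ids of servers that should trigger a 401 challenge.

    ``allowed_server_ids=None`` is assumed to mean "no toolset scoping";
    otherwise only servers inside the toolset scope are considered.
    """
    if has_authorization:
        # The caller already sent a token, so no preemptive challenge.
        return []
    return [
        s.server_id
        for s in servers
        if s.requires_upstream_oauth
        and (allowed_server_ids is None or s.server_id in allowed_server_ids)
    ]
```

Running the check after scoping means a server the toolset excludes never appears in the result, so the client is not sent into an OAuth flow for it.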

mateo-berri · 2026-05-26T23:03:09Z

@greptileai

Sameerlite

nit

Sameerlite · 2026-05-28T11:14:05Z

@@ -1,8 +1,8 @@
 # Base image for building
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:5743937d521cbeb9e8c73bf1bd7ba2589c178940eb03d7b148efecc962be8587


Why do we need these changes for this PR?

Sameerlite · 2026-05-28T11:14:17Z

+        nodejs \
+        npm \
+        libsndfile && break; \
+      [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \


Again? Why do we need this change?

Good catch, reverted

…/JWT PR

These Docker changes are out of scope for the MCP OAuth passthrough + JWT auth work and duplicate the build-reliability fix already merged to litellm_internal_staging in #28888, which adds the same apk retry loop on the componentized backend/gateway/migrations Dockerfiles and also fixes the underlying nodeenv/libatomic root cause. Restoring docker/Dockerfile.database and docker/Dockerfile.non_root to the base so this PR is purely the MCP/JWT change.

Sameerlite

LGTM

…itellm_feat/v1.84.0-mcp-gateway-jwt-auth

# Conflicts:
#	litellm/proxy/_experimental/mcp_server/server.py

mateo-berri · 2026-06-02T13:09:02Z

@greptileai

The single-server pass-through path converted an upstream MCPUpstreamAuthError into an HTTPException, but list_tool_rest_api only re-raised 401s; an upstream 403 (valid token, insufficient scope) collapsed into a 200 response with error=unexpected_error, so clients never saw the status or WWW-Authenticate challenge needed to refresh scopes. Let MCPUpstreamAuthError propagate and convert it once in list_tool_rest_api so both 401 and 403 reach the client, while internal access/IP 403s keep the legacy error-dict shape.
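A rough sketch of the "propagate, then convert once" shape described above. The class names `UpstreamAuthError` and `HTTPError`, their fields, and the exact keys of the legacy error dict are hypothetical stand-ins; the real `MCPUpstreamAuthError` and `list_tool_rest_api` may differ.

```python
from typing import Any, Callable, Dict, List, Optional


class UpstreamAuthError(Exception):
    # Hypothetical stand-in for MCPUpstreamAuthError; fields are assumed.
    def __init__(self, status_code: int, www_authenticate: Optional[str] = None):
        super().__init__(f"upstream auth error {status_code}")
        self.status_code = status_code
        self.www_authenticate = www_authenticate


class HTTPError(Exception):
    # Hypothetical stand-in for a framework HTTP exception.
    def __init__(self, status_code: int, headers: Dict[str, str]):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.headers = headers


def list_tools_endpoint(fetch_tools: Callable[[], List[str]]) -> Dict[str, Any]:
    """Convert upstream auth failures exactly once, at the endpoint boundary."""
    try:
        return {"tools": fetch_tools(), "error": None}
    except UpstreamAuthError as exc:
        # Both 401 and 403 reach the client with their challenge header.
        headers: Dict[str, str] = {}
        if exc.www_authenticate:
            headers["WWW-Authenticate"] = exc.www_authenticate
        raise HTTPError(exc.status_code, headers) from exc
    except PermissionError as exc:
        # Internal access/IP denials keep an error-dict shape (keys assumed).
        return {"tools": [], "error": "unexpected_error", "message": str(exc)}
```

Because inner layers let the exception propagate unchanged, the status code and `WWW-Authenticate` value are decided in one place instead of being re-wrapped at each level.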

mateo-berri · 2026-06-02T13:45:55Z

@greptileai

…nset

When use_x_forwarded_for is enabled but mcp_trusted_proxy_ranges is not configured, get_mcp_client_ip previously fell back to the direct peer IP. Behind an internal reverse proxy that peer is the proxy's private address, so every external caller was classified as internal and could reach MCP servers with available_on_public_internet=false. Return an empty string in that case so is_internal_ip treats the caller as external.
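The fail-closed behaviour described here can be illustrated with a small standalone sketch. `resolve_client_ip` and this `is_internal_ip` are simplified, hypothetical versions written for the example; in particular, taking the leftmost `X-Forwarded-For` entry and using `is_private` as the "internal" test are assumptions, not the PR's implementation.

```python
import ipaddress
from typing import List, Optional


def resolve_client_ip(
    peer_ip: str,
    forwarded_for: Optional[str],
    use_x_forwarded_for: bool,
    trusted_proxy_ranges: Optional[List[str]],
) -> str:
    if not use_x_forwarded_for:
        return peer_ip
    if not trusted_proxy_ranges:
        # Forwarding is on but no proxy is trusted: the peer may be an
        # internal proxy, so return "" and let the caller count as external.
        return ""
    peer = ipaddress.ip_address(peer_ip)
    peer_is_trusted = any(
        peer in ipaddress.ip_network(cidr, strict=False)
        for cidr in trusted_proxy_ranges
    )
    if peer_is_trusted and forwarded_for:
        # Assumption: the leftmost entry is the original client.
        return forwarded_for.split(",")[0].strip()
    return peer_ip


def is_internal_ip(ip: str) -> bool:
    if not ip:
        return False  # empty string is treated as external
    try:
        return ipaddress.ip_address(ip).is_private
    except ValueError:
        return False
```

With forwarding enabled and no trusted ranges, a request arriving through a proxy at `10.0.0.5` resolves to `""`, which `is_internal_ip` rejects, rather than to the proxy's private address.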

mateo-berri · 2026-06-02T13:48:40Z

@greptileai

mateo-berri · 2026-06-02T14:10:11Z

Replying to @greptile-apps[bot]'s summary comment (#issuecomment-4499663072) — the two awareness items in the latest 4/5 summary are non-actionable on this branch, and the summary itself already labels each "not a current defect."

1. _call_regular_mcp_tool default user_api_key_auth=None (the _should_strip_caller_authorization footgun)

None is a legitimate value along this whole path, not an accident waiting to happen. _should_strip_caller_authorization is written to handle it and fails safe: with user_api_key_auth is None and no x-litellm-api-key admission header it returns True, so the caller's Authorization is dropped rather than forwarded; the worst case is an upstream 401, never a LiteLLM API key leaking upstream. The sole caller of _call_regular_mcp_tool is call_tool, which always passes user_api_key_auth=user_api_key_auth (mcp_server_manager.py:3267), so the default is never exercised today. Removing the default from _call_regular_mcp_tool would not even close the "future call site" gap, because call_tool itself is typed user_api_key_auth: Optional[UserAPIKeyAuth] = None and _prepare_mcp_server_headers (server.py) does the same for the genuine anonymous cold-start path; scrubbing None everywhere would be a cascading signature change across call_tool, _prepare_mcp_server_headers, and its call sites for a concern this summary itself rates non-exploitable. That is exactly the speculative future-proofing CLAUDE.md asks us not to add, so it stays as-is.

2. _OAUTH_METADATA_FETCH_LOCKS has no independent size cap (discoverable_endpoints.py)

The lock dict is already bounded on two independent axes. The cache key is (mcp_server.server_id, mcp_server.url) (line 864); both come from admin-configured MCP servers, so the key space is finite and not attacker-controllable. On top of that, _prune_oauth_metadata_cache caps the cache at _OAUTH_METADATA_CACHE_MAX_SIZE = 128 and then drops every lock whose cache entry has been evicted and that is not currently held (lines 69-77), so the dict converges to the locks for in-cache keys (at most 128) plus the locks for in-flight fetches. The held-lock carve-out is deliberate: a blunt numeric cap that evicted a lock for a key whose fetch is still running would let a concurrent caller setdefault a fresh lock for the same key and break the single-flight coalescing the lock exists to provide. An independent size cap would be redundant at best and coalescing-breaking at worst.

No code changes; both items are working as intended.
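The held-lock carve-out from item 2 can be shown with a reduced, self-contained sketch. It mirrors the idea of the diff's pruning loop but is not the PR's code: `CACHE`, `LOCKS`, `FETCH_COUNT`, `fetch`, and `demo` are illustrative names, and the upstream call is simulated with a short sleep.

```python
import asyncio
from typing import Dict, Tuple

Key = Tuple[str, str]
CACHE: Dict[Key, str] = {}
LOCKS: Dict[Key, asyncio.Lock] = {}
FETCH_COUNT = 0


def prune_locks() -> None:
    # Drop locks with no cache entry, but never one that is currently held.
    for key in list(LOCKS):
        if key in CACHE:
            continue
        if LOCKS[key].locked():
            continue
        LOCKS.pop(key, None)


async def fetch(key: Key) -> str:
    global FETCH_COUNT
    if key in CACHE:
        return CACHE[key]
    lock = LOCKS.setdefault(key, asyncio.Lock())
    async with lock:
        if key in CACHE:  # another waiter already filled the cache
            return CACHE[key]
        FETCH_COUNT += 1
        await asyncio.sleep(0.01)  # simulated upstream fetch
        prune_locks()  # the held lock survives, so waiters keep coalescing
        CACHE[key] = "metadata"
        return CACHE[key]


async def demo() -> Tuple[int, int]:
    key = ("server-1", "https://upstream.example/mcp")
    await asyncio.gather(*(fetch(key) for _ in range(5)))
    CACHE.clear()  # simulate eviction of the cache entry
    prune_locks()  # the idle lock is now dropped
    return FETCH_COUNT, len(LOCKS)
```

Calling `asyncio.run(demo())` issues five concurrent requests for one key; only one simulated fetch runs, and the lock is removed once its cache entry is gone and nobody holds it.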

greptile-apps · 2026-06-02T14:10:54Z

Both explanations are sound — acknowledging and closing out both items.

Item 1 (user_api_key_auth=None default): The fail-safe direction is correct — None strips rather than forwards, so the worst reachable outcome is an upstream 401, not a credential leak. The only live call site always supplies the argument explicitly, and cascading the signature change to remove the default would touch call_tool, _prepare_mcp_server_headers, and their call sites for a purely speculative future-caller concern. That's exactly the kind of preemptive hardening CLAUDE.md asks to avoid. Non-issue.

Item 2 (_OAUTH_METADATA_FETCH_LOCKS size): The two-axis bound you describe is correct: the key space is admin-controlled (not attacker-reachable), and _prune_oauth_metadata_cache converges the dict to at most 128 entries plus in-flight locks. The held-lock carve-out is the right design — a blunt numeric eviction of an in-flight lock would break single-flight coalescing for the same key, creating a race rather than preventing one. An independent cap adds no safety and introduces a new failure mode. Non-issue.

No changes needed. Both are working as intended.

The #29809 and #29960 cherry-picks landed onto v1.88.0, which predates the per-user env-var rotation feature (#28917) and the MCP upstream-auth exception module (#28356). Git's conflict resolution bundled those intermediate-commit tests into the picked test files, where they fail against this base because the code under test does not exist here (encrypt_value_helper, the env-var rotation helper, and litellm.proxy._experimental.mcp_server.exceptions.MCPUpstreamAuthError). Remove the leaked tests; the genuine regression tests added by #29809 (OBO refresh-on-expiry) and #29960 (admin-only include_disabled_tools) stay and pass.

…9.0) (#93) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/berriai/litellm](https://images.chainguard.dev/directory/image/wolfi-base/overview) ([source](https://github.com/BerriAI/litellm)) | minor | `v1.88.1` → `v1.89.0` | --- ### Release Notes <details> <summary>BerriAI/litellm (ghcr.io/berriai/litellm)</summary> ### [`v1.89.0`](https://github.com/BerriAI/litellm/releases/tag/v1.89.0) [Compare Source](https://github.com/BerriAI/litellm/compare/v1.89.0...v1.89.0) ##### Verify Docker Image Signature All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0). **Verify using the pinned commit hash (recommended):** A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \ ghcr.io/berriai/litellm:v1.89.0 ``` **Verify using the release tag (convenience):** Tags are protected in this repository and resolve to the same key. 
This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.89.0/cosign.pub \
  ghcr.io/berriai/litellm:v1.89.0
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

##### What's Changed

- test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview by [@mateo-berri](https://github.com/mateo-berri) in [#29433](https://github.com/BerriAI/litellm/pull/29433)
- fix: map mistral/ministral-8b-latest in model price map by [@mateo-berri](https://github.com/mateo-berri) in [#29453](https://github.com/BerriAI/litellm/pull/29453)
- fix(datadog): split oversized batches on 413 instead of re-queueing forever by [@yassin-berriai](https://github.com/yassin-berriai) in [#29444](https://github.com/BerriAI/litellm/pull/29444)
- feat(otel): allowlist team\_metadata sub-keys promoted to baggage by [@yassin-berriai](https://github.com/yassin-berriai) in [#29442](https://github.com/BerriAI/litellm/pull/29442)
- fix: stop use\_chat\_completions\_api flag from leaking into provider request body by [@mateo-berri](https://github.com/mateo-berri) in [#29447](https://github.com/BerriAI/litellm/pull/29447)
- fix(anthropic, fireworks): inline legacy $ref defs in tool schemas by [@milan-berri](https://github.com/milan-berri) in [#28646](https://github.com/BerriAI/litellm/pull/28646)
- fix(proxy): omit OpenAI \[DONE] on google-genai streamGenerateContent by [@Sameerlite](https://github.com/Sameerlite) in [#29426](https://github.com/BerriAI/litellm/pull/29426)
- ci(release): create stable/X.Y.x line branch on X.Y.0 tags by [@yuneng-berri](https://github.com/yuneng-berri) in [#29457](https://github.com/BerriAI/litellm/pull/29457)
- fix(vector-stores): support engines URL for Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri)
in [#27885](https://github.com/BerriAI/litellm/pull/27885) - fix(ui): render caller-supplied filter options in caller order by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29462](https://github.com/BerriAI/litellm/pull/29462) - fix(batches): skip unnecessary batch input file reads by [@Sameerlite](https://github.com/Sameerlite) in [#29114](https://github.com/BerriAI/litellm/pull/29114) - docs(agents): clarify when to create new test files by [@Sameerlite](https://github.com/Sameerlite) in [#29472](https://github.com/BerriAI/litellm/pull/29472) - Litellm OSS Staging by [@Sameerlite](https://github.com/Sameerlite) in [#29161](https://github.com/BerriAI/litellm/pull/29161) - fix(mcp): clear allowed\_tools and tool overrides on MCP server edit by [@Sameerlite](https://github.com/Sameerlite) in [#29411](https://github.com/BerriAI/litellm/pull/29411) - Litellm OSS Staging 010626 by [@Sameerlite](https://github.com/Sameerlite) in [#29422](https://github.com/BerriAI/litellm/pull/29422) - fix(ci): make CircleCI rerun-failed-tests collect tests when 2+ test files fail by [@mateo-berri](https://github.com/mateo-berri) in [#29475](https://github.com/BerriAI/litellm/pull/29475) - feat(a2a): watsonx Orchestrate agent provider by [@Sameerlite](https://github.com/Sameerlite) in [#29410](https://github.com/BerriAI/litellm/pull/29410) - fix(azure\_ai): strip tool-level extra fields on 400 and retry by [@Sameerlite](https://github.com/Sameerlite) in [#29479](https://github.com/BerriAI/litellm/pull/29479) - fix(docs): remove fixed dimensions from README hero image by [@mateo-berri](https://github.com/mateo-berri) in [#29496](https://github.com/BerriAI/litellm/pull/29496) - Litellm oss staging by [@Sameerlite](https://github.com/Sameerlite) in [#29492](https://github.com/BerriAI/litellm/pull/29492) - fix: small CLAUDE.md nits by [@mateo-berri](https://github.com/mateo-berri) in [#29504](https://github.com/BerriAI/litellm/pull/29504) - Add MCP semantic conventions to 
otelv2 by [@yassin-berriai](https://github.com/yassin-berriai) in [#29468](https://github.com/BerriAI/litellm/pull/29468) - fix(passthrough): emit otel guardrail span when a guardrail blocks by [@yassin-berriai](https://github.com/yassin-berriai) in [#29470](https://github.com/BerriAI/litellm/pull/29470) - fix(proxy): strip NUL bytes from spend log payloads to prevent PostgreSQL 22P05 by [@milan-berri](https://github.com/milan-berri) in [#29515](https://github.com/BerriAI/litellm/pull/29515) - \[internal copy of [#28008](https://github.com/BerriAI/litellm/issues/28008)] Support MCP OAuth passthrough and issuer-scoped JWT auth by [@mateo-berri](https://github.com/mateo-berri) in [#28356](https://github.com/BerriAI/litellm/pull/28356) - feat(vector-stores): forward per-request params to Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29459](https://github.com/BerriAI/litellm/pull/29459) - feat(proxy): add per-MCP-server RPM rate limiting for keys and teams by [@Sameerlite](https://github.com/Sameerlite) in [#29482](https://github.com/BerriAI/litellm/pull/29482) - fix(tests): drop module-level test calls that break local\_testing collection by [@mateo-berri](https://github.com/mateo-berri) in [#29520](https://github.com/BerriAI/litellm/pull/29520) - feat(agents): add LangFlow agent provider with A2A session bridging by [@Sameerlite](https://github.com/Sameerlite) in [#28963](https://github.com/BerriAI/litellm/pull/28963) - fix(ui/agents): make A2A skill tags enterable and validated by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29512](https://github.com/BerriAI/litellm/pull/29512) - \[internal copy of [#29232](https://github.com/BerriAI/litellm/issues/29232)] feat: route future Claude models to Anthropic provider via pattern matching by [@mateo-berri](https://github.com/mateo-berri) in [#29239](https://github.com/BerriAI/litellm/pull/29239) - fix(tests): drop import-time completion call in test\_register\_model 
by [@mateo-berri](https://github.com/mateo-berri) in [#29521](https://github.com/BerriAI/litellm/pull/29521) - test: stabilize batch VCR coverage and stop live upload/network leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29477](https://github.com/BerriAI/litellm/pull/29477) - \[internal copy of [#29003](https://github.com/BerriAI/litellm/issues/29003)] fix(vertex\_ai): use user-supplied api\_base as is for Model Garden OpenAI-compat path by [@mateo-berri](https://github.com/mateo-berri) in [#29530](https://github.com/BerriAI/litellm/pull/29530) - feat(proxy): native /health/drain preStop hook for graceful shutdown by [@yassin-berriai](https://github.com/yassin-berriai) in [#29439](https://github.com/BerriAI/litellm/pull/29439) - fix(auth): preserve 401 status for expired JWTs in OTel traces by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29510](https://github.com/BerriAI/litellm/pull/29510) - fix(otel): capture 401 error details in management endpoint spans by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29535](https://github.com/BerriAI/litellm/pull/29535) - test(proxy/utils): pin bottom-of-file helper behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29509](https://github.com/BerriAI/litellm/pull/29509) - test(proxy/utils): pin PrismaClient and spend-update behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29488](https://github.com/BerriAI/litellm/pull/29488) - test(proxy/utils): pin ProxyLogging behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29485](https://github.com/BerriAI/litellm/pull/29485) - fix: missing span for guardrail passthrough by [@yassin-berriai](https://github.com/yassin-berriai) in [#29552](https://github.com/BerriAI/litellm/pull/29552) - fix(auth): let internal users view search tools by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29542](https://github.com/BerriAI/litellm/pull/29542) - fix: missing mcp otel attributes by 
[@yassin-berriai](https://github.com/yassin-berriai) in [#29554](https://github.com/BerriAI/litellm/pull/29554) - fix(proxy): resolve managed video model ids for auth by [@shivamrawat1](https://github.com/shivamrawat1) in [#29545](https://github.com/BerriAI/litellm/pull/29545) - fix(key\_generate): allow team members to create keys on org-scoped teams by [@milan-berri](https://github.com/milan-berri) in [#29310](https://github.com/BerriAI/litellm/pull/29310) - test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite by [@mateo-berri](https://github.com/mateo-berri) in [#29595](https://github.com/BerriAI/litellm/pull/29595) - Litellm oss staging 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29578](https://github.com/BerriAI/litellm/pull/29578) - Fix : a2a bugs 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29566](https://github.com/BerriAI/litellm/pull/29566) - \[internal copy of [#29533](https://github.com/BerriAI/litellm/issues/29533)] fix(anthropic/adapter): emit thinking block for reasoning\_content-only streaming chunks by [@mateo-berri](https://github.com/mateo-berri) in [#29600](https://github.com/BerriAI/litellm/pull/29600) - ci: reproduce default-Windows wheel install to guard MAX\_PATH by [@yuneng-berri](https://github.com/yuneng-berri) in [#29597](https://github.com/BerriAI/litellm/pull/29597) - fix(vertex): strip output\_config.effort for Vertex Claude models that reject it (Haiku 4.5) by [@mateo-berri](https://github.com/mateo-berri) in [#29585](https://github.com/BerriAI/litellm/pull/29585) - Litellm websocket improvements by [@Sameerlite](https://github.com/Sameerlite) in [#29563](https://github.com/BerriAI/litellm/pull/29563) - feat(arize/phoenix): OpenInference rendering parity — tool\_calls, cost, passthrough I/O, session/user, multimodal, cache tokens by [@milan-berri](https://github.com/milan-berri) in [#28800](https://github.com/BerriAI/litellm/pull/28800) - \[internal copy of 
[#29550](https://github.com/BerriAI/litellm/issues/29550)] fix: passthrough endpoints duplicate logs by [@mateo-berri](https://github.com/mateo-berri) in [#29598](https://github.com/BerriAI/litellm/pull/29598) - fix(ci): keep coverage rename green when a parallel node runs no tests by [@mateo-berri](https://github.com/mateo-berri) in [#29608](https://github.com/BerriAI/litellm/pull/29608) - test(vcr): close out the remaining VCR live-call leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29603](https://github.com/BerriAI/litellm/pull/29603) - fix(key\_generate): exempt UI/CLI session tokens from the budget ceiling for team keys by [@yuneng-berri](https://github.com/yuneng-berri) in [#29612](https://github.com/BerriAI/litellm/pull/29612) - fix(realtime): allow null transcripts in stream logging payloads by [@milan-berri](https://github.com/milan-berri) in [#29625](https://github.com/BerriAI/litellm/pull/29625) - build(ui): migrate eslint to flat config + bump eslint-config-next to 16 by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29626](https://github.com/BerriAI/litellm/pull/29626) - fix(key\_generate): scope session-token team-key budget exemption to caller-supplied team\_id by [@yuneng-berri](https://github.com/yuneng-berri) in [#29641](https://github.com/BerriAI/litellm/pull/29641) - fix(proxy): disable proxy buffering on streaming SSE responses by [@mateo-berri](https://github.com/mateo-berri) in [#29557](https://github.com/BerriAI/litellm/pull/29557) - fix(mcp): gate /public/mcp\_hub strictly on litellm.public\_mcp\_servers by [@michelligabriele](https://github.com/michelligabriele) in [#27764](https://github.com/BerriAI/litellm/pull/27764) - ci(ui): frontend-lint job enforcing prettier + eslint on changed files by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29633](https://github.com/BerriAI/litellm/pull/29633) - fix(gemini): googleSearch + server-side tools and googleMaps JSON schema by 
[@Sameerlite](https://github.com/Sameerlite) in [#29582](https://github.com/BerriAI/litellm/pull/29582) - fix(proxy): passthrough 404 when SERVER\_ROOT\_PATH is set by [@Sameerlite](https://github.com/Sameerlite) in [#29658](https://github.com/BerriAI/litellm/pull/29658) - fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29662](https://github.com/BerriAI/litellm/pull/29662) - Litellm oss staging 040626 by [@Sameerlite](https://github.com/Sameerlite) in [#29671](https://github.com/BerriAI/litellm/pull/29671) - style(ui): prettier formatting pass over the dashboard by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29622](https://github.com/BerriAI/litellm/pull/29622) - chore: ignore prettier dashboard reformat in git blame by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29695](https://github.com/BerriAI/litellm/pull/29695) - fix(helm): Enable Backend Deployment to mount Gateway config.yaml by [@tin-berri](https://github.com/tin-berri) in [#29605](https://github.com/BerriAI/litellm/pull/29605) - \[internal copy of [#29277](https://github.com/BerriAI/litellm/issues/29277)] fix(proxy): add default=None to LiteLLM\_TeamMembership.litellm\_budget\_table by [@mateo-berri](https://github.com/mateo-berri) in [#29684](https://github.com/BerriAI/litellm/pull/29684) - test: make custom\_tokenizer proxy tests hermetic by [@yuneng-berri](https://github.com/yuneng-berri) in [#29643](https://github.com/BerriAI/litellm/pull/29643) - test(proxy): stop running real-DB tests in GitHub Actions unit jobs by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29700](https://github.com/BerriAI/litellm/pull/29700) - chore(ui): remove the bare-fetch lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29712](https://github.com/BerriAI/litellm/pull/29712) - Litellm jwt mapping virtualkeys by [@shivamrawat1](https://github.com/shivamrawat1) in 
[#28510](https://github.com/BerriAI/litellm/pull/28510) - refactor(ui): shared HTTP client + location-pinned fetch() lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29723](https://github.com/BerriAI/litellm/pull/29723) - fix(proxy): stop team BYOK model name corruption on model edit by [@yuneng-berri](https://github.com/yuneng-berri) in [#29731](https://github.com/BerriAI/litellm/pull/29731) - \[internal copy of [#29511](https://github.com/BerriAI/litellm/issues/29511)] feat(guardrails): add sensitive data routing to on-premise models by [@mateo-berri](https://github.com/mateo-berri) in [#29531](https://github.com/BerriAI/litellm/pull/29531) - fix(proxy/hooks): populate llm\_provider on internal rate-limit errors by [@mateo-berri](https://github.com/mateo-berri) in [#27707](https://github.com/BerriAI/litellm/pull/27707) - fix(vertex/anthropic): handle namespace tools and strip client\_metadata for codex compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29489](https://github.com/BerriAI/litellm/pull/29489) - Support OAuth M2M for Databricks Apps A2A agents by [@mateo-berri](https://github.com/mateo-berri) in [#29586](https://github.com/BerriAI/litellm/pull/29586) - fix: small CLAUDE.md nit by [@mateo-berri](https://github.com/mateo-berri) in [#29749](https://github.com/BerriAI/litellm/pull/29749) - fix(anthropic): route Claude Opus 4.8 through adaptive thinking by [@mateo-berri](https://github.com/mateo-berri) in [#29702](https://github.com/BerriAI/litellm/pull/29702) - fix(proxy): persist oauth2\_flow on MCP server registration by [@michelligabriele](https://github.com/michelligabriele) in [#29690](https://github.com/BerriAI/litellm/pull/29690) - \[internal copy of [#27491](https://github.com/BerriAI/litellm/issues/27491)] fix(realtime): Fix Realtime Audio Token Cost Tracking by [@mateo-berri](https://github.com/mateo-berri) in [#29722](https://github.com/BerriAI/litellm/pull/29722) - fix(galileo): use ingest traces API 
and standard logging payload by [@Sameerlite](https://github.com/Sameerlite) in [#29651](https://github.com/BerriAI/litellm/pull/29651) - fix(auth): expand all-team-models sentinel in can\_key\_call\_model for batch validation by [@Sameerlite](https://github.com/Sameerlite) in [#29746](https://github.com/BerriAI/litellm/pull/29746) - test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h by [@mateo-berri](https://github.com/mateo-berri) in [#29784](https://github.com/BerriAI/litellm/pull/29784) - test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound by [@mateo-berri](https://github.com/mateo-berri) in [#29787](https://github.com/BerriAI/litellm/pull/29787) - fix(ui): route MCP playground auth by oauth2 mode instead of token\_url by [@tin-berri](https://github.com/tin-berri) in [#29714](https://github.com/BerriAI/litellm/pull/29714) - refactor(ui): centralize proxy base URL resolution into tested resolver by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29793](https://github.com/BerriAI/litellm/pull/29793) - Litellm oss staging 050626 by [@Sameerlite](https://github.com/Sameerlite) in [#29774](https://github.com/BerriAI/litellm/pull/29774) - test(google): add google-genai SDK proxy integration tests by [@Sameerlite](https://github.com/Sameerlite) in [#29781](https://github.com/BerriAI/litellm/pull/29781) - fix(jwt): use resolved DB user\_id for spend on legacy email match by [@milan-berri](https://github.com/milan-berri) in [#29217](https://github.com/BerriAI/litellm/pull/29217) - feat(ui): generate dashboard API types from the proxy OpenAPI spec by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29816](https://github.com/BerriAI/litellm/pull/29816) - fix(proxy): drop deleted team BYOK model name from team.models by [@yuneng-berri](https://github.com/yuneng-berri) in [#29820](https://github.com/BerriAI/litellm/pull/29820) - feat(mcp): per-server env vars with global + per-user scopes by 
[@mateo-berri](https://github.com/mateo-berri) in [#28917](https://github.com/BerriAI/litellm/pull/28917) - refactor(ui): route behavior-preserving networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29806](https://github.com/BerriAI/litellm/pull/29806) - fix(mcp): persist Tools-tab MCP OAuth token to DB by [@tin-berri](https://github.com/tin-berri) in [#29809](https://github.com/BerriAI/litellm/pull/29809) - fix(ui): require new expiration when regenerating an expired key by [@milan-berri](https://github.com/milan-berri) in [#29838](https://github.com/BerriAI/litellm/pull/29838) - refactor(ui): route query-building networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29815](https://github.com/BerriAI/litellm/pull/29815) - Make the image-gen record/replay proxy report cache mode and per-request HIT/MISS by [@mateo-berri](https://github.com/mateo-berri) in [#29802](https://github.com/BerriAI/litellm/pull/29802) - feat(proxy): hot-reload .env in dev when running with --reload by [@mateo-berri](https://github.com/mateo-berri) in [#29783](https://github.com/BerriAI/litellm/pull/29783) - fix(ui): stop MCP playground tool calls from sending twice by [@tin-berri](https://github.com/tin-berri) in [#29821](https://github.com/BerriAI/litellm/pull/29821) - feat(fal\_ai): add Nano Banana / Gemini 2.5 Flash Image generation support by [@mateo-berri](https://github.com/mateo-berri) in [#29798](https://github.com/BerriAI/litellm/pull/29798) - Title: Fix managed batch cancel credential resolution by [@shivamrawat1](https://github.com/shivamrawat1) in [#29734](https://github.com/BerriAI/litellm/pull/29734) - Title: fix(proxy): resolve vector store file list credentials from team deployments by [@shivamrawat1](https://github.com/shivamrawat1) in [#29739](https://github.com/BerriAI/litellm/pull/29739) - refactor: convert AWS and GCP Terraform stacks into reusable modules … by 
[@yassin-berriai](https://github.com/yassin-berriai) in [#28103](https://github.com/BerriAI/litellm/pull/28103) - chore(ui): build ui for release by [@yuneng-berri](https://github.com/yuneng-berri) in [#29853](https://github.com/BerriAI/litellm/pull/29853) - fix(terraform/gcp): prompt for image\_registry in DeployStack one-click by [@yassin-berriai](https://github.com/yassin-berriai) in [#29852](https://github.com/BerriAI/litellm/pull/29852) - fix(terraform/gcp): abandon SQL user on destroy by [@yassin-berriai](https://github.com/yassin-berriai) in [#29855](https://github.com/BerriAI/litellm/pull/29855) - Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic by [@mateo-berri](https://github.com/mateo-berri) in [#29847](https://github.com/BerriAI/litellm/pull/29847) - chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#29860](https://github.com/BerriAI/litellm/pull/29860) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29861](https://github.com/BerriAI/litellm/pull/29861) - fix: 400 on Anthropic context overflow; seed identity on failed auth by [@yassin-berriai](https://github.com/yassin-berriai) in [#29848](https://github.com/BerriAI/litellm/pull/29848) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29862](https://github.com/BerriAI/litellm/pull/29862) - chore(release): patch v1.89.0-rc.1 with [#30064](https://github.com/BerriAI/litellm/issues/30064) (Claude Fable 5) for v1.89.0-rc.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30143](https://github.com/BerriAI/litellm/pull/30143) **Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0> ### [`v1.89.0`](https://github.com/BerriAI/litellm/releases/tag/v1.89.0) [Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.2...v1.89.0) ##### Verify Docker Image Signature All LiteLLM Docker images are signed 
with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.89.0
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.89.0/cosign.pub \
  ghcr.io/berriai/litellm:v1.89.0
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

##### What's Changed

- test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview by [@mateo-berri](https://github.com/mateo-berri) in [#29433](https://github.com/BerriAI/litellm/pull/29433)
- fix: map mistral/ministral-8b-latest in model price map by [@mateo-berri](https://github.com/mateo-berri) in [#29453](https://github.com/BerriAI/litellm/pull/29453)
- fix(datadog): split oversized batches on 413 instead of re-queueing forever by [@yassin-berriai](https://github.com/yassin-berriai) in [#29444](https://github.com/BerriAI/litellm/pull/29444)
- feat(otel): allowlist team\_metadata sub-keys promoted to baggage by [@yassin-berriai](https://github.com/yassin-berriai) in [#29442](https://github.com/BerriAI/litellm/pull/29442)
- fix: stop use\_chat\_completions\_api flag from leaking into provider request body by [@mateo-berri](https://github.com/mateo-berri) in
[#29447](https://github.com/BerriAI/litellm/pull/29447) - fix(anthropic, fireworks): inline legacy $ref defs in tool schemas by [@milan-berri](https://github.com/milan-berri) in [#28646](https://github.com/BerriAI/litellm/pull/28646) - fix(proxy): omit OpenAI \[DONE] on google-genai streamGenerateContent by [@Sameerlite](https://github.com/Sameerlite) in [#29426](https://github.com/BerriAI/litellm/pull/29426) - ci(release): create stable/X.Y.x line branch on X.Y.0 tags by [@yuneng-berri](https://github.com/yuneng-berri) in [#29457](https://github.com/BerriAI/litellm/pull/29457) - fix(vector-stores): support engines URL for Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#27885](https://github.com/BerriAI/litellm/pull/27885) - fix(ui): render caller-supplied filter options in caller order by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29462](https://github.com/BerriAI/litellm/pull/29462) - fix(batches): skip unnecessary batch input file reads by [@Sameerlite](https://github.com/Sameerlite) in [#29114](https://github.com/BerriAI/litellm/pull/29114) - docs(agents): clarify when to create new test files by [@Sameerlite](https://github.com/Sameerlite) in [#29472](https://github.com/BerriAI/litellm/pull/29472) - Litellm OSS Staging by [@Sameerlite](https://github.com/Sameerlite) in [#29161](https://github.com/BerriAI/litellm/pull/29161) - fix(mcp): clear allowed\_tools and tool overrides on MCP server edit by [@Sameerlite](https://github.com/Sameerlite) in [#29411](https://github.com/BerriAI/litellm/pull/29411) - Litellm OSS Staging 010626 by [@Sameerlite](https://github.com/Sameerlite) in [#29422](https://github.com/BerriAI/litellm/pull/29422) - fix(ci): make CircleCI rerun-failed-tests collect tests when 2+ test files fail by [@mateo-berri](https://github.com/mateo-berri) in [#29475](https://github.com/BerriAI/litellm/pull/29475) - feat(a2a): watsonx Orchestrate agent provider by 
[@Sameerlite](https://github.com/Sameerlite) in [#29410](https://github.com/BerriAI/litellm/pull/29410) - fix(azure\_ai): strip tool-level extra fields on 400 and retry by [@Sameerlite](https://github.com/Sameerlite) in [#29479](https://github.com/BerriAI/litellm/pull/29479) - fix(docs): remove fixed dimensions from README hero image by [@mateo-berri](https://github.com/mateo-berri) in [#29496](https://github.com/BerriAI/litellm/pull/29496) - Litellm oss staging by [@Sameerlite](https://github.com/Sameerlite) in [#29492](https://github.com/BerriAI/litellm/pull/29492) - fix: small CLAUDE.md nits by [@mateo-berri](https://github.com/mateo-berri) in [#29504](https://github.com/BerriAI/litellm/pull/29504) - Add MCP semantic conventions to otelv2 by [@yassin-berriai](https://github.com/yassin-berriai) in [#29468](https://github.com/BerriAI/litellm/pull/29468) - fix(passthrough): emit otel guardrail span when a guardrail blocks by [@yassin-berriai](https://github.com/yassin-berriai) in [#29470](https://github.com/BerriAI/litellm/pull/29470) - fix(proxy): strip NUL bytes from spend log payloads to prevent PostgreSQL 22P05 by [@milan-berri](https://github.com/milan-berri) in [#29515](https://github.com/BerriAI/litellm/pull/29515) - \[internal copy of [#28008](https://github.com/BerriAI/litellm/issues/28008)] Support MCP OAuth passthrough and issuer-scoped JWT auth by [@mateo-berri](https://github.com/mateo-berri) in [#28356](https://github.com/BerriAI/litellm/pull/28356) - feat(vector-stores): forward per-request params to Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29459](https://github.com/BerriAI/litellm/pull/29459) - feat(proxy): add per-MCP-server RPM rate limiting for keys and teams by [@Sameerlite](https://github.com/Sameerlite) in [#29482](https://github.com/BerriAI/litellm/pull/29482) - fix(tests): drop module-level test calls that break local\_testing collection by [@mateo-berri](https://github.com/mateo-berri) in 
[#29520](https://github.com/BerriAI/litellm/pull/29520) - feat(agents): add LangFlow agent provider with A2A session bridging by [@Sameerlite](https://github.com/Sameerlite) in [#28963](https://github.com/BerriAI/litellm/pull/28963) - fix(ui/agents): make A2A skill tags enterable and validated by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29512](https://github.com/BerriAI/litellm/pull/29512) - \[internal copy of [#29232](https://github.com/BerriAI/litellm/issues/29232)] feat: route future Claude models to Anthropic provider via pattern matching by [@mateo-berri](https://github.com/mateo-berri) in [#29239](https://github.com/BerriAI/litellm/pull/29239) - fix(tests): drop import-time completion call in test\_register\_model by [@mateo-berri](https://github.com/mateo-berri) in [#29521](https://github.com/BerriAI/litellm/pull/29521) - test: stabilize batch VCR coverage and stop live upload/network leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29477](https://github.com/BerriAI/litellm/pull/29477) - \[internal copy of [#29003](https://github.com/BerriAI/litellm/issues/29003)] fix(vertex\_ai): use user-supplied api\_base as is for Model Garden OpenAI-compat path by [@mateo-berri](https://github.com/mateo-berri) in [#29530](https://github.com/BerriAI/litellm/pull/29530) - feat(proxy): native /health/drain preStop hook for graceful shutdown by [@yassin-berriai](https://github.com/yassin-berriai) in [#29439](https://github.com/BerriAI/litellm/pull/29439) - fix(auth): preserve 401 status for expired JWTs in OTel traces by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29510](https://github.com/BerriAI/litellm/pull/29510) - fix(otel): capture 401 error details in management endpoint spans by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29535](https://github.com/BerriAI/litellm/pull/29535) - test(proxy/utils): pin bottom-of-file helper behavior by [@yuneng-berri](https://github.com/yuneng-berri) in 
[#29509](https://github.com/BerriAI/litellm/pull/29509)
- test(proxy/utils): pin PrismaClient and spend-update behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29488](https://github.com/BerriAI/litellm/pull/29488)
- test(proxy/utils): pin ProxyLogging behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29485](https://github.com/BerriAI/litellm/pull/29485)
- fix: missing span for guardrail passthrough by [@yassin-berriai](https://github.com/yassin-berriai) in [#29552](https://github.com/BerriAI/litellm/pull/29552)
- fix(auth): let internal users view search tools by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29542](https://github.com/BerriAI/litellm/pull/29542)
- fix: missing mcp otel attributes by [@yassin-berriai](https://github.com/yassin-berriai) in [#29554](https://github.com/BerriAI/litellm/pull/29554)
- fix(proxy): resolve managed video model ids for auth by [@shivamrawat1](https://github.com/shivamrawat1) in [#29545](https://github.com/BerriAI/litellm/pull/29545)
- fix(key\_generate): allow team members to create keys on org-scoped teams by [@milan-berri](https://github.com/milan-berri) in [#29310](https://github.com/BerriAI/litellm/pull/29310)
- test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite by [@mateo-berri](https://github.com/mateo-berri) in [#29595](https://github.com/BerriAI/litellm/pull/29595)
- Litellm oss staging 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29578](https://github.com/BerriAI/litellm/pull/29578)
- Fix : a2a bugs 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29566](https://github.com/BerriAI/litellm/pull/29566)
- \[internal copy of [#29533](https://github.com/BerriAI/litellm/issues/29533)] fix(anthropic/adapter): emit thinking block for reasoning\_content-only streaming chunks by [@mateo-berri](https://github.com/mateo-berri) in [#29600](https://github.com/BerriAI/litellm/pull/29600)
- ci: reproduce default-Windows wheel install to guard MAX\_PATH by [@yuneng-berri](https://github.com/yuneng-berri) in [#29597](https://github.com/BerriAI/litellm/pull/29597)
- fix(vertex): strip output\_config.effort for Vertex Claude models that reject it (Haiku 4.5) by [@mateo-berri](https://github.com/mateo-berri) in [#29585](https://github.com/BerriAI/litellm/pull/29585)
- Litellm websocket improvements by [@Sameerlite](https://github.com/Sameerlite) in [#29563](https://github.com/BerriAI/litellm/pull/29563)
- feat(arize/phoenix): OpenInference rendering parity — tool\_calls, cost, passthrough I/O, session/user, multimodal, cache tokens by [@milan-berri](https://github.com/milan-berri) in [#28800](https://github.com/BerriAI/litellm/pull/28800)
- \[internal copy of [#29550](https://github.com/BerriAI/litellm/issues/29550)] fix: passthrough endpoints duplicate logs by [@mateo-berri](https://github.com/mateo-berri) in [#29598](https://github.com/BerriAI/litellm/pull/29598)
- fix(ci): keep coverage rename green when a parallel node runs no tests by [@mateo-berri](https://github.com/mateo-berri) in [#29608](https://github.com/BerriAI/litellm/pull/29608)
- test(vcr): close out the remaining VCR live-call leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29603](https://github.com/BerriAI/litellm/pull/29603)
- fix(key\_generate): exempt UI/CLI session tokens from the budget ceiling for team keys by [@yuneng-berri](https://github.com/yuneng-berri) in [#29612](https://github.com/BerriAI/litellm/pull/29612)
- fix(realtime): allow null transcripts in stream logging payloads by [@milan-berri](https://github.com/milan-berri) in [#29625](https://github.com/BerriAI/litellm/pull/29625)
- build(ui): migrate eslint to flat config + bump eslint-config-next to 16 by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29626](https://github.com/BerriAI/litellm/pull/29626)
- fix(key\_generate): scope session-token team-key budget exemption to caller-supplied team\_id by [@yuneng-berri](https://github.com/yuneng-berri) in [#29641](https://github.com/BerriAI/litellm/pull/29641)
- fix(proxy): disable proxy buffering on streaming SSE responses by [@mateo-berri](https://github.com/mateo-berri) in [#29557](https://github.com/BerriAI/litellm/pull/29557)
- fix(mcp): gate /public/mcp\_hub strictly on litellm.public\_mcp\_servers by [@michelligabriele](https://github.com/michelligabriele) in [#27764](https://github.com/BerriAI/litellm/pull/27764)
- ci(ui): frontend-lint job enforcing prettier + eslint on changed files by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29633](https://github.com/BerriAI/litellm/pull/29633)
- fix(gemini): googleSearch + server-side tools and googleMaps JSON schema by [@Sameerlite](https://github.com/Sameerlite) in [#29582](https://github.com/BerriAI/litellm/pull/29582)
- fix(proxy): passthrough 404 when SERVER\_ROOT\_PATH is set by [@Sameerlite](https://github.com/Sameerlite) in [#29658](https://github.com/BerriAI/litellm/pull/29658)
- fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29662](https://github.com/BerriAI/litellm/pull/29662)
- Litellm oss staging 040626 by [@Sameerlite](https://github.com/Sameerlite) in [#29671](https://github.com/BerriAI/litellm/pull/29671)
- style(ui): prettier formatting pass over the dashboard by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29622](https://github.com/BerriAI/litellm/pull/29622)
- chore: ignore prettier dashboard reformat in git blame by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29695](https://github.com/BerriAI/litellm/pull/29695)
- fix(helm): Enable Backend Deployment to mount Gateway config.yaml by [@tin-berri](https://github.com/tin-berri) in [#29605](https://github.com/BerriAI/litellm/pull/29605)
- \[internal copy of [#29277](https://github.com/BerriAI/litellm/issues/29277)] fix(proxy): add default=None to LiteLLM\_TeamMembership.litellm\_budget\_table by [@mateo-berri](https://github.com/mateo-berri) in [#29684](https://github.com/BerriAI/litellm/pull/29684)
- test: make custom\_tokenizer proxy tests hermetic by [@yuneng-berri](https://github.com/yuneng-berri) in [#29643](https://github.com/BerriAI/litellm/pull/29643)
- test(proxy): stop running real-DB tests in GitHub Actions unit jobs by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29700](https://github.com/BerriAI/litellm/pull/29700)
- chore(ui): remove the bare-fetch lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29712](https://github.com/BerriAI/litellm/pull/29712)
- Litellm jwt mapping virtualkeys by [@shivamrawat1](https://github.com/shivamrawat1) in [#28510](https://github.com/BerriAI/litellm/pull/28510)
- refactor(ui): shared HTTP client + location-pinned fetch() lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29723](https://github.com/BerriAI/litellm/pull/29723)
- fix(proxy): stop team BYOK model name corruption on model edit by [@yuneng-berri](https://github.com/yuneng-berri) in [#29731](https://github.com/BerriAI/litellm/pull/29731)
- \[internal copy of [#29511](https://github.com/BerriAI/litellm/issues/29511)] feat(guardrails): add sensitive data routing to on-premise models by [@mateo-berri](https://github.com/mateo-berri) in [#29531](https://github.com/BerriAI/litellm/pull/29531)
- fix(proxy/hooks): populate llm\_provider on internal rate-limit errors by [@mateo-berri](https://github.com/mateo-berri) in [#27707](https://github.com/BerriAI/litellm/pull/27707)
- fix(vertex/anthropic): handle namespace tools and strip client\_metadata for codex compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29489](https://github.com/BerriAI/litellm/pull/29489)
- Support OAuth M2M for Databricks Apps A2A agents by [@mateo-berri](https://github.com/mateo-berri) in [#29586](https://github.com/BerriAI/litellm/pull/29586)
- fix: small CLAUDE.md nit by [@mateo-berri](https://github.com/mateo-berri) in [#29749](https://github.com/BerriAI/litellm/pull/29749)
- fix(anthropic): route Claude Opus 4.8 through adaptive thinking by [@mateo-berri](https://github.com/mateo-berri) in [#29702](https://github.com/BerriAI/litellm/pull/29702)
- fix(proxy): persist oauth2\_flow on MCP server registration by [@michelligabriele](https://github.com/michelligabriele) in [#29690](https://github.com/BerriAI/litellm/pull/29690)
- \[internal copy of [#27491](https://github.com/BerriAI/litellm/issues/27491)] fix(realtime): Fix Realtime Audio Token Cost Tracking by [@mateo-berri](https://github.com/mateo-berri) in [#29722](https://github.com/BerriAI/litellm/pull/29722)
- fix(galileo): use ingest traces API and standard logging payload by [@Sameerlite](https://github.com/Sameerlite) in [#29651](https://github.com/BerriAI/litellm/pull/29651)
- fix(auth): expand all-team-models sentinel in can\_key\_call\_model for batch validation by [@Sameerlite](https://github.com/Sameerlite) in [#29746](https://github.com/BerriAI/litellm/pull/29746)
- test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h by [@mateo-berri](https://github.com/mateo-berri) in [#29784](https://github.com/BerriAI/litellm/pull/29784)
- test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound by [@mateo-berri](https://github.com/mateo-berri) in [#29787](https://github.com/BerriAI/litellm/pull/29787)
- fix(ui): route MCP playground auth by oauth2 mode instead of token\_url by [@tin-berri](https://github.com/tin-berri) in [#29714](https://github.com/BerriAI/litellm/pull/29714)
- refactor(ui): centralize proxy base URL resolution into tested resolver by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29793](https://github.com/BerriAI/litellm/pull/29793)
- Litellm oss staging 050626 by [@Sameerlite](https://github.com/Sameerlite) in [#29774](https://github.com/BerriAI/litellm/pull/29774)
- test(google): add google-genai SDK proxy integration tests by [@Sameerlite](https://github.com/Sameerlite) in [#29781](https://github.com/BerriAI/litellm/pull/29781)
- fix(jwt): use resolved DB user\_id for spend on legacy email match by [@milan-berri](https://github.com/milan-berri) in [#29217](https://github.com/BerriAI/litellm/pull/29217)
- feat(ui): generate dashboard API types from the proxy OpenAPI spec by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29816](https://github.com/BerriAI/litellm/pull/29816)
- fix(proxy): drop deleted team BYOK model name from team.models by [@yuneng-berri](https://github.com/yuneng-berri) in [#29820](https://github.com/BerriAI/litellm/pull/29820)
- feat(mcp): per-server env vars with global + per-user scopes by [@mateo-berri](https://github.com/mateo-berri) in [#28917](https://github.com/BerriAI/litellm/pull/28917)
- refactor(ui): route behavior-preserving networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29806](https://github.com/BerriAI/litellm/pull/29806)
- fix(mcp): persist Tools-tab MCP OAuth token to DB by [@tin-berri](https://github.com/tin-berri) in [#29809](https://github.com/BerriAI/litellm/pull/29809)
- fix(ui): require new expiration when regenerating an expired key by [@milan-berri](https://github.com/milan-berri) in [#29838](https://github.com/BerriAI/litellm/pull/29838)
- refactor(ui): route query-building networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29815](https://github.com/BerriAI/litellm/pull/29815)
- Make the image-gen record/replay proxy report cache mode and per-request HIT/MISS by [@mateo-berri](https://github.com/mateo-berri) in [#29802](https://github.com/BerriAI/litellm/pull/29802)
- feat(proxy): hot-reload .env in dev when running with --reload by [@mateo-berri](https://github.com/mateo-berri) in [#29783](https://github.com/BerriAI/litellm/pull/29783)
- fix(ui): stop MCP playground tool calls from sending twice by [@tin-berri](https://github.com/tin-berri) in [#29821](https://github.com/BerriAI/litellm/pull/29821)
- feat(fal\_ai): add Nano Banana / Gemini 2.5 Flash Image generation support by [@mateo-berri](https://github.com/mateo-berri) in [#29798](https://github.com/BerriAI/litellm/pull/29798)
- Title: Fix managed batch cancel credential resolution by [@shivamrawat1](https://github.com/shivamrawat1) in [#29734](https://github.com/BerriAI/litellm/pull/29734)
- Title: fix(proxy): resolve vector store file list credentials from team deployments by [@shivamrawat1](https://github.com/shivamrawat1) in [#29739](https://github.com/BerriAI/litellm/pull/29739)
- refactor: convert AWS and GCP Terraform stacks into reusable modules … by [@yassin-berriai](https://github.com/yassin-berriai) in [#28103](https://github.com/BerriAI/litellm/pull/28103)
- chore(ui): build ui for release by [@yuneng-berri](https://github.com/yuneng-berri) in [#29853](https://github.com/BerriAI/litellm/pull/29853)
- fix(terraform/gcp): prompt for image\_registry in DeployStack one-click by [@yassin-berriai](https://github.com/yassin-berriai) in [#29852](https://github.com/BerriAI/litellm/pull/29852)
- fix(terraform/gcp): abandon SQL user on destroy by [@yassin-berriai](https://github.com/yassin-berriai) in [#29855](https://github.com/BerriAI/litellm/pull/29855)
- Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic by [@mateo-berri](https://github.com/mateo-berri) in [#29847](https://github.com/BerriAI/litellm/pull/29847)
- chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#29860](https://github.com/BerriAI/litellm/pull/29860)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29861](https://github.com/BerriAI/litellm/pull/29861)
- fix: 400 on Anthropic context overflow; seed identity on failed auth by
[@yassin-berriai](https://github.com/yassin-berriai) in [#29848](https://github.com/BerriAI/litellm/pull/29848)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29862](https://github.com/BerriAI/litellm/pull/29862)
- chore(release): patch v1.89.0-rc.1 with [#30064](https://github.com/BerriAI/litellm/issues/30064) (Claude Fable 5) for v1.89.0-rc.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30143](https://github.com/BerriAI/litellm/pull/30143)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0>

### [`v1.88.2`](https://github.com/BerriAI/litellm/releases/tag/v1.88.2)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2)

##### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.2/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

##### What's Changed

- chore(release): backport Fable 5, batch-file auth, CrowdStrike AIDR, Mantle Responses SigV4, and NetApp streaming-cost fix to stable/1.88.x and cut 1.88.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30144](https://github.com/BerriAI/litellm/pull/30144)
- chore(release): backport DB-resilience, passthrough, model-info, budget, and deps fixes to stable/1.88.x by [@yuneng-berri](https://github.com/yuneng-berri) in [#30408](https://github.com/BerriAI/litellm/pull/30408)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2>

</details>

---

### Configuration

📅 **Schedule**: (in timezone Europe/London)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate).

Reviewed-on: https://forgejo.hayden.moe/hayden/phoebe/pulls/93

yuneng-berri and others added 30 commits May 8, 2026 18:42

Merge pull request #27505 from BerriAI/cherry-pick-metrics-optout-hint-rc2

7e0347c

fix(proxy): point /metrics 401 at the opt-out flag

Merge pull request #27539 from BerriAI/litellm_/budget-reservation-rc2-backport

18c14d9

fix(proxy): bound budget reservation per request (backport of #27509 to 1.84.0rc2)

Merge pull request #27545 from BerriAI/litellm_/charming-sammet-8d77c3

5bb2803

build(packaging): relax core runtime pins to ranges (rc2 backport of #27241)

Merge pull request #27554 from BerriAI/litellm_/charming-sammet-8d77c3

188875b

build(packaging): raise jinja2 floor to 3.1.6 (rc2 backport of #27552)

refactor(mcp): access has_client_credentials on MCPServer directly

c31ead8

Greptile: getattr default was redundant; property exists on MCPServer and mcp_server is non-None inside the extra_headers forwarding block. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request #27768 from milan-berri/litellm_cherry-pick-27383-onto-1.84.0rc2

4046cb1

cherry-pick: OpenAPI MCP extra_headers (#27383) onto litellm_1.84.0rc2

Merge pull request #27789 from BerriAI/litellm_/wizardly-sammet-94baac

74bb525

fix(proxy): always merge caller-supplied tags into request metadata

Merge pull request #27794 from BerriAI/litellm_/fervent-archimedes-7e86be

037d4c5

cherry-pick: reject bare str at file-input sinks (#27762) onto litellm_1.84.0rc2

Cache normalized SERVER_ROOT_PATH at middleware init

c14cab9

SERVER_ROOT_PATH is a process-startup env var. Read it once in __init__ instead of calling get_server_root_path() + rstrip on every request that arrives before all lazy features have loaded.
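
The change described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual LiteLLM middleware: the class name `RootPathMiddleware`, the helper `normalize_root_path`, and the prefix-stripping behavior in `strip_prefix` are all hypothetical stand-ins. Only the idea of reading `SERVER_ROOT_PATH` once in `__init__` and reusing the normalized value comes from the commit message.

```python
import os


def normalize_root_path(raw: str) -> str:
    """Drop trailing slashes; an empty value or a bare '/' means no prefix."""
    return raw.rstrip("/")


class RootPathMiddleware:
    """Hypothetical ASGI middleware that caches SERVER_ROOT_PATH at init."""

    def __init__(self, app):
        self.app = app
        # Read and normalize once: the env var is fixed for the life of the
        # process, so there is no need to re-read it on every request.
        self.root_path = normalize_root_path(os.environ.get("SERVER_ROOT_PATH", ""))

    def strip_prefix(self, path: str) -> str:
        """Remove the cached root prefix from a request path, if present."""
        root = self.root_path
        if root and (path == root or path.startswith(root + "/")):
            return path[len(root):] or "/"
        return path

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # Copy the scope so the caller's dict is not mutated.
            scope = dict(scope, path=self.strip_prefix(scope["path"]))
        await self.app(scope, receive, send)
```

Matching on `root + "/"` rather than on the bare prefix keeps a path such as `/litellmx/health` from being treated as if it lived under `/litellm`.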

Merge pull request #27818 from BerriAI/litellm_backport_27812_to_1840rc2

8b90a54

[Fix] Lazy feature loading under SERVER_ROOT_PATH returns 404 (backport of #27812)

Merge pull request #27819 from BerriAI/litellm_backport_regenerate_guard_84rc2

2b189be

[Fix] Backport /key/regenerate ownership-rebind + premium-gate guards (#27793)

Merge pull request #27868 from BerriAI/backport-pr-27866-into-1.84.0rc2

be0526b

fix(proxy): expose db status on public /health/readiness (backport #27866)

Merge pull request #27899 from BerriAI/backport-pr-27896-into-1.84.0rc2

305875f

fix(ui): fetch version + debug flag from /health/readiness/details (backport #27896)

chore: update Next.js build artifacts (2026-05-14 03:52 UTC, node v20.20.2)

02c5e1c

cursor Bot reviewed May 26, 2026

Comment thread litellm/proxy/_experimental/mcp_server/server.py Outdated

Sameerlite reviewed May 28, 2026

mateo-berri requested a review from Sameerlite June 1, 2026 17:51

Sameerlite approved these changes Jun 2, 2026

Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_feat/v1.84.0-mcp-gateway-jwt-auth

06eb433

Conflicts: litellm/proxy/_experimental/mcp_server/server.py

veria-ai Bot reviewed Jun 2, 2026

Comment thread litellm/proxy/auth/ip_address_utils.py Outdated

mateo-berri merged commit 6d6eda8 into litellm_internal_staging Jun 2, 2026
121 checks passed

mateo-berri deleted the litellm_feat/v1.84.0-mcp-gateway-jwt-auth branch June 2, 2026 19:22

tin-berri mentioned this pull request Jun 10, 2026

Adhoc v1.88.0 backport: MCP OAuth/Tools fixes + allowlist update-500 fix (#29411) + card grid #30136

Closed

6 tasks

mateo-berri commented May 20, 2026 • edited by cursor Bot

mateo-berri commented May 26, 2026

mateo-berri commented May 26, 2026

mateo-berri commented May 26, 2026

greptile-apps Bot commented May 26, 2026

mateo-berri commented May 26, 2026

cursor Bot left a comment • edited

mateo-berri commented May 26, 2026

mateo-berri commented May 26, 2026

Sameerlite left a comment

Sameerlite May 28, 2026

mateo-berri May 30, 2026 • edited

Sameerlite May 28, 2026

mateo-berri May 30, 2026 • edited

Sameerlite left a comment

mateo-berri commented Jun 2, 2026

mateo-berri commented Jun 2, 2026

mateo-berri commented Jun 2, 2026

mateo-berri commented Jun 2, 2026

greptile-apps Bot commented Jun 2, 2026

10 participants
