
tests(vcr): redis-backed vcrpy cache for offline LLM e2e replay#26838

Merged
yuneng-berri merged 32 commits into litellm_internal_staging from litellm_vcr-cassette-llm-tests-af37
May 1, 2026

Conversation


@mateo-berri mateo-berri commented Apr 30, 2026

Relevant issues

Long-term solution to the recurring "out of Anthropic credits" CI failures discussed in #sdlc Slack on 2026-04-29.

Linear ticket

Resolves LIT-2683

Summary

Live LLM e2e tests have been draining provider billing accounts and going flaky on outages. This PR introduces a Redis-backed vcrpy cache so CI exercises the same end-to-end LiteLLM transformation paths (request shaping, response parsing, streaming, headers) without hitting the live provider on every run — ~zero per-PR cost while still smoke-testing against reality once a day.

The cache lives on a dedicated Redis (CASSETTE_REDIS_URL) so it's isolated from the application Redis (REDIS_URL/REDIS_HOST) used by other test suites — those flush their Redis as part of teardown, which would otherwise wipe cassettes.

Observed impact on llm_translation_testing: ~47% wall-clock reduction (8:11 → 4:21) once the cache is warm, with 0 provider calls.

How it works

  • Every test under tests/llm_translation/ and tests/llm_responses_api_testing/ is auto-marked with @pytest.mark.vcr via conftest.py. No per-test annotation needed.
  • First run hits the live provider and stores the HTTP exchange in Redis under litellm:vcr:cassette:<test_id> with a 24h TTL.
  • All subsequent runs within 24h replay from Redis, no network, no API keys.
  • The 24h TTL means each new day's first run records again, so upstream API drift surfaces within a day.
  • Only 2xx responses are persisted (filter_non_2xx_response) — transient 5xx/4xx never poison the cache.
  • record_mode="new_episodes" so partial recordings can be completed without nuking what already replays.
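As a hedged sketch, the key scheme and TTL behavior above look roughly like this — names are illustrative; the actual persister lives in tests/_vcr_redis_persister.py and may differ:

```python
# Illustrative sketch of the cassette key scheme and 24h TTL described above.
CASSETTE_KEY_PREFIX = "litellm:vcr:cassette:"
CASSETTE_TTL_SECONDS = 24 * 60 * 60  # expiry forces a fresh recording each day


def cassette_key(test_id: str) -> str:
    # One cassette per test, namespaced so a scoped flush can't touch other keys.
    return CASSETTE_KEY_PREFIX + test_id


def save_cassette(redis_client, test_id: str, serialized_cassette: str) -> None:
    # SETEX writes the value and its TTL atomically, so a cassette can never
    # linger past 24h even if a separate EXPIRE call were lost.
    redis_client.setex(cassette_key(test_id), CASSETTE_TTL_SECONDS, serialized_cassette)
```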

Cache-poisoning safeguards

A naive cassette cache picks up a lot of "bad luck" recordings. Three layers prevent that:

  1. Outcome gate — only persist cassettes from tests that pass. A pytest_runtest_makereport hook stamps each test's call-phase outcome onto the cassette key; save_cassette consults it and refuses to write when the test failed. This means the failed retries pytest-rerunfailures produces before a green retry never overwrite a known-good cassette.
  2. Episode cap — MAX_EPISODES_PER_CASSETTE = 50 catches the pathology where a test produces non-deterministic request bodies (e.g. UUIDs), record_mode=new_episodes keeps appending unmatched episodes, and the cassette balloons forever. Refusing to persist past 50 surfaces the issue loudly instead of silently inflating Redis.
  3. VCR-incompatible opt-out — _VCR_INCOMPATIBLE_NODEID_SUFFIXES lists the handful of tests that observe live cross-call provider state (prompt-cache warm-up, streaming response_cost calc, Bedrock Nova tool-call nondeterminism). They fall through to live calls with their existing @pytest.mark.flaky retry logic.
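Under stated assumptions about the hook plumbing, safeguards 1 and 2 reduce to a predicate like the following (function and variable names are hypothetical):

```python
MAX_EPISODES_PER_CASSETTE = 50

# nodeid -> call-phase outcome, stamped by a pytest_runtest_makereport hook.
_test_outcomes: dict = {}


def record_outcome(nodeid: str, outcome: str) -> None:
    _test_outcomes[nodeid] = outcome


def should_persist(nodeid: str, episode_count: int) -> bool:
    # Outcome gate: refuse to write when the test failed; an unknown outcome
    # defaults to persisting (the "default-on-unknown" unit-test case).
    if _test_outcomes.get(nodeid, "passed") != "passed":
        return False
    # Episode cap: allow exactly at the threshold, refuse above it.
    if episode_count > MAX_EPISODES_PER_CASSETTE:
        return False
    return True
```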

Resilience

  • Redis client uses retry=Retry(ExponentialBackoff(cap=2, base=0.1), retries=2) on ConnectionError / TimeoutError so a single dropped TLS socket on Upstash doesn't fail teardown.
  • load_cassette outages convert to CassetteNotFoundError (cache miss → live call, not a test setup error).
  • save_cassette outages log a warning and return (persistence is a cache, not test correctness).
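A minimal sketch of that client setup with redis-py, assuming CASSETTE_REDIS_URL is the only configuration knob:

```python
import os


def make_cassette_redis():
    """Return a retry-wrapped Redis client, or None when VCR should be skipped."""
    url = os.environ.get("CASSETTE_REDIS_URL")
    if not url:
        return None  # no dedicated cassette Redis -> tests fall back to live calls
    # Imported lazily so environments without redis-py can still collect tests.
    import redis
    from redis.backoff import ExponentialBackoff
    from redis.retry import Retry

    return redis.Redis.from_url(
        url,
        # Two retries with exponential backoff (capped at 2s) on transient
        # connection drops, matching the PR description.
        retry=Retry(ExponentialBackoff(cap=2, base=0.1), retries=2),
        retry_on_error=[redis.exceptions.ConnectionError, redis.exceptions.TimeoutError],
    )
```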

Verbose mode

Set LITELLM_VCR_VERBOSE=1 to surface a per-test verdict in the live CI log alongside PASSED/FAILED markers:

PASSED tests/llm_translation/test_anthropic_completion.py::test_anthropic_basic_completion_replay
[VCR HIT] 2 replayed, 0 new (2 cassette entries) :: tests/llm_translation/test_anthropic_completion.py::test_anthropic_basic_completion_replay

Verdicts: HIT (pure replay), MISS (cold cache, recorded), PARTIAL (mix), NOOP (no HTTP traffic). Implemented as a worker-side user_properties stash that the controller's pytest_runtest_logreport hook picks up and prints — needed because xdist worker stderr is captured and only released on test failure.
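As an illustrative sketch, the verdict could fall out of the replayed/recorded counters like so (the real logic may differ):

```python
def vcr_verdict(replayed: int, recorded: int) -> str:
    # Hypothetical reconstruction of the verdict classification from the
    # counts printed in the log line above.
    if replayed == 0 and recorded == 0:
        return "NOOP"     # test made no HTTP calls at all
    if recorded == 0:
        return "HIT"      # pure replay, zero provider traffic
    if replayed == 0:
        return "MISS"     # cold cache, everything was recorded
    return "PARTIAL"      # some episodes replayed, some newly recorded
```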

Required environment

  • CASSETTE_REDIS_URL — dedicated Redis for cassettes. Already configured in CircleCI; locally, set it in .env.rc (or equivalent) and source it. If unset, VCR registration is skipped and tests fall back to live calls.
  • Provider credentials (ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_*, etc.) — only needed on cache-miss (recording). Replay needs nothing.

Flushing the cache

Force a re-record on the next run instead of waiting for the 24h TTL:

make test-llm-translation-flush-vcr-cache

The flush script only deletes keys under the litellm:vcr:cassette: prefix.
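A hedged sketch of what that scoped flush amounts to (the real script is tests/_flush_vcr_cache.py):

```python
CASSETTE_KEY_PREFIX = "litellm:vcr:cassette:"


def flush_cassettes(redis_client) -> int:
    """Delete only cassette keys; SCAN-based iteration avoids blocking Redis."""
    deleted = 0
    for key in redis_client.scan_iter(match=CASSETTE_KEY_PREFIX + "*"):
        redis_client.delete(key)
        deleted += 1
    return deleted
```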

Disabling VCR

Skip the cache entirely (every call goes live, no recording):

LITELLM_VCR_DISABLE=1 uv run pytest tests/llm_translation/test_<file>.py

What's in the diff

Core infrastructure

  • tests/_vcr_redis_persister.py — Redis-backed vcrpy persister (24h TTL, litellm:vcr:cassette: key prefix), 2xx-only response filter, outcome-gated persistence, episode cap, transient-error resilience, and an aiohttp record-path patch so vcrpy doesn't drain the response stream out from under LiteLLMAiohttpTransport. Reads only CASSETTE_REDIS_URL — no fallback to the application Redis.
  • tests/_flush_vcr_cache.py — scoped flush utility (only touches keys under the litellm:vcr:cassette: prefix).
  • tests/llm_translation/conftest.py / tests/llm_responses_api_testing/conftest.py — register the Redis persister, define the vcr_config fixture (auth/header scrubbing, request-shape matching), auto-apply @pytest.mark.vcr to every test in the directory, wire the outcome-gate hook + fixture, and ship the controller-side verbose-mode pytest_runtest_logreport hook. Files using respx (which patches the same httpx transport vcrpy does) are excluded via _RESPX_CONFLICTING_FILES to avoid one library silently winning.

Tests & demo

  • tests/llm_translation/test_anthropic_completion.py — two replay tests (test_anthropic_basic_completion_replay, test_anthropic_streaming_completion_replay) demonstrating the flow on a real e2e path.
  • tests/llm_translation/test_vcr_redis_persister.py — 23 unit tests covering: roundtrip, 24h TTL, missing-key behavior, key normalization, 2xx filter coverage, transient-error handling on read & write, outcome gate (skip-on-fail, proceed-on-pass, default-on-unknown), and episode cap (refuse-above, allow-at-threshold).

Glue

  • Makefile — new test-llm-translation-flush-vcr-cache target.
  • pyproject.toml / uv.lock — adds vcrpy==8.1.1 and pytest-recording==0.13.4 to dev.
  • tests/code_coverage_tests/liccheck.ini — license allowlist entry for pytest-recording.
  • litellm/litellm_core_utils/llm_request_utils.py — small null-safety fix in get_proxy_server_request_headers (when proxy_server_request is None rather than missing, the previous .get(...).get(...) chain raised AttributeError).
  • tests/llm_translation/base_llm_unit_tests.py — switched the test_async_pdf_handling_with_file_id PDF URL from Wikimedia (intermittent 400s from Anthropic's server-side fetcher) to a SHA-pinned jsDelivr mirror of the in-repo fixture (raw GitHub serves PDFs as application/octet-stream which OpenAI/Gemini reject).
  • tests/llm_translation/Readme.md — record / replay / flush / disable workflow.
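The null-safety fix in get_proxy_server_request_headers amounts to guarding the explicit-None case; an illustrative reconstruction (the real signature may differ):

```python
def get_proxy_server_request_headers(litellm_params: dict) -> dict:
    # `or {}` covers both a missing key and an explicit None value — the
    # latter is the case the old .get(...).get(...) chain raised
    # AttributeError on.
    proxy_server_request = litellm_params.get("proxy_server_request") or {}
    return proxy_server_request.get("headers") or {}
```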

Pre-Submission checklist

  • Tests added (23 unit tests in test_vcr_redis_persister.py, replay demos in test_anthropic_completion.py)
  • No raw secrets in cassettes — request/response header filter scrubs Authorization, x-api-key, anthropic-api-key, AWS sigv4, GCP keys, cookies, organization IDs, request IDs
  • Cassettes Redis is isolated from application Redis — CASSETTE_REDIS_URL only, no fallback
  • Cassettes from failed tests never poison the cache (outcome gate)
  • PR's scope is isolated to test infrastructure plus one minor null-safety fix in llm_request_utils.py and one PDF-fixture URL fix

Type

🚄 Infrastructure
✅ Test
🐛 Bug Fix (null-safety in get_proxy_server_request_headers)

Follow-ups

  • Mark the highest-cost live tests in other directories (Anthropic, then OpenAI/Bedrock/Vertex) with @pytest.mark.vcr. The auto-marker already covers llm_translation/ and llm_responses_api_testing/.
  • Add a nightly job that runs against the live API and re-records — this preserves the "smoke test against reality" guarantee while keeping per-PR runs offline.
  • Promote the conftest VCR plumbing into a shared tests/_vcr_pytest_plugin.py so other directories can opt in by importing instead of copy-pasting.


…eplay

Live LLM e2e tests have been draining provider billing accounts and going
flaky on outages (LIT-2683). This change introduces vcrpy-backed cassette
replay so CI can exercise the same end-to-end LiteLLM transformation paths
without hitting the live provider:

- Add 'vcrpy==8.1.1' to the dev dependency group.
- New 'tests/llm_translation/vcr_config.py' centralises the VCR config:
  filters auth/secret headers and per-request response headers, matches on
  method+URI+body, and exposes 'LITELLM_VCR_RECORD_MODE' for re-recording.
- New 'tests/llm_translation/test_anthropic_completion_vcr.py' demonstrates
  the pattern with one non-streaming and one streaming Anthropic test that
  replay from cassettes shipped under 'cassettes/'.
- New 'tests/llm_translation/cassettes/_record_anthropic_fixtures.py' lets
  contributors regenerate the canned Anthropic cassettes against a local
  in-process mock (no API key required), and 'cassettes/README.md' documents
  the full record/replay/refresh workflow.
- New 'make test-llm-translation-record FILE=...' Makefile target to refresh
  cassettes against the live API.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
@CLAassistant

CLAassistant commented Apr 30, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ mateo-berri
❌ cursoragent
You have signed the CLA already but the status is still pending? Let us recheck it.

@mateo-berri

bugbot run


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Committed cassette diverges from fixture generator output
    • Updated _record_anthropic_fixtures.py to delete cassettes before recording (so vcrpy's record_mode="all" doesn't append to stale content) and strip non-deterministic Date/Server headers, then regenerated the committed cassettes so they match the script's output byte-for-byte.
Preview (72c92920b2)
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -185,3 +185,18 @@
 	$(UV_RUN) pytest tests/llm_translation/$(FILE) \
 		--junitxml=test-results/junit.xml \
 		-v --tb=short --maxfail=100 --timeout=300
+
+# VCR cassette helpers --------------------------------------------------------
+# Re-record a single VCR-backed translation test file against the live API.
+# Provider credentials must be exported (e.g. ANTHROPIC_API_KEY).
+#
+# Example:
+#   ANTHROPIC_API_KEY=sk-ant-... make test-llm-translation-record \
+#       FILE=test_anthropic_completion_vcr.py
+test-llm-translation-record: install-test-deps
+	@if [ -z "$(FILE)" ]; then \
+		echo "Usage: make test-llm-translation-record FILE=test_filename.py"; \
+		exit 1; \
+	fi
+	LITELLM_VCR_RECORD_MODE=once \
+		$(UV_RUN) pytest tests/llm_translation/$(FILE) -v --tb=short

diff --git a/pyproject.toml b/pyproject.toml
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -149,6 +149,7 @@
     "parameterized==0.9.0",
     "openapi-core==0.22.0; python_version < '3.14'",
     "pytest-timeout==2.4.0",
+    "vcrpy==8.1.1",
 ]
 proxy-dev = [
     "prisma==0.11.0",

diff --git a/tests/llm_translation/Readme.md b/tests/llm_translation/Readme.md
--- a/tests/llm_translation/Readme.md
+++ b/tests/llm_translation/Readme.md
@@ -1,3 +1,20 @@
-Unit tests for individual LLM providers. 
+Unit tests for individual LLM providers.
 
-Name of the test file is the name of the LLM provider - e.g. `test_openai.py` is for OpenAI. 
\ No newline at end of file
+Name of the test file is the name of the LLM provider - e.g. `test_openai.py` is for OpenAI.
+
+## VCR-backed tests
+
+Files matching `*_vcr.py` (e.g. `test_anthropic_completion_vcr.py`) replay
+recorded HTTP traffic from `cassettes/` instead of calling the real provider.
+They run offline by default — no API keys required, no per-PR cost.
+
+To re-record against the live API:
+
+```bash
+ANTHROPIC_API_KEY=sk-ant-... \
+  make test-llm-translation-record FILE=test_anthropic_completion_vcr.py
+```
+
+See [`cassettes/README.md`](./cassettes/README.md) for the full workflow,
+including how to add a new cassette-backed test and what to scrub from
+recordings before committing.

diff --git a/tests/llm_translation/cassettes/README.md b/tests/llm_translation/cassettes/README.md
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/README.md
@@ -1,0 +1,80 @@
+# VCR cassettes for LLM translation tests
+
+This directory holds [vcrpy](https://vcrpy.readthedocs.io/) cassettes used by
+`tests/llm_translation/` to replay real provider HTTP traffic without hitting
+the live API.
+
+Why this exists is tracked in
+[LIT-2683](https://linear.app/litellm-ai/issue/LIT-2683) and discussed in
+`#sdlc` on Slack: e2e tests were repeatedly draining provider billing accounts
+and producing flaky CI on outages. Recording the HTTP exchange once and
+replaying it on subsequent runs gives us realistic provider responses
+(streaming, headers, edge-case payloads) at zero per-PR cost.
+
+## How to add a new cassette-backed test
+
+1. Pick a small, deterministic call. Avoid prompts whose output depends on
+   wall-clock time, randomness, or live web data.
+2. Add a test in a `*_vcr.py` file under `tests/llm_translation/`. Wrap it
+   with `@litellm_vcr.use_cassette("<some_name>.yaml")` from
+   `tests/llm_translation/vcr_config.py`.
+3. Record the cassette once:
+
+   ```bash
+   LITELLM_VCR_RECORD_MODE=once \
+     ANTHROPIC_API_KEY=sk-ant-... \
+     uv run pytest tests/llm_translation/test_my_provider_vcr.py::test_my_case -v
+   ```
+
+   or, equivalently:
+
+   ```bash
+   ANTHROPIC_API_KEY=sk-ant-... \
+     make test-llm-translation-record FILE=test_my_provider_vcr.py
+   ```
+
+4. Inspect the resulting YAML file:
+   - **Strip any secrets** that survived `vcr_config.py`'s header filter.
+     `vcr_config.py` already removes the common ones (`Authorization`,
+     `x-api-key`, `cookie`, AWS sigv4 headers, etc.) — but a request *body*
+     might contain a token if your test passed one inline.
+   - Trim very large response bodies if they aren't load-bearing for the
+     assertion.
+5. Commit the cassette alongside the test.
+
+## Re-recording
+
+Run the same `make test-llm-translation-record` command. vcrpy's `once` mode
+will *not* overwrite an existing cassette — delete the file first if you're
+intentionally refreshing it:
+
+```bash
+rm tests/llm_translation/cassettes/anthropic_basic_completion.yaml
+ANTHROPIC_API_KEY=sk-ant-... make test-llm-translation-record \
+    FILE=test_anthropic_completion_vcr.py
+```
+
+## Refreshing the canned Anthropic fixtures
+
+The two Anthropic cassettes in this directory
+(`anthropic_basic_completion.yaml` and `anthropic_streaming_completion.yaml`)
+are recorded against an in-process mock so contributors can regenerate them
+without an `ANTHROPIC_API_KEY`:
+
+```bash
+uv run python tests/llm_translation/cassettes/_record_anthropic_fixtures.py
+```
+
+For a full refresh against the real API, delete the cassettes first and use
+the `LITELLM_VCR_RECORD_MODE=once` path with a real key.
+
+## Don't
+
+- Don't commit cassettes containing real API keys, OAuth tokens, or PII.
+  When in doubt, `grep -i 'sk-\|bearer\|api-key' cassettes/*.yaml` after
+  recording.
+- Don't rely on cassettes for tests of *non-deterministic* behavior
+  (rate-limit retries, timeouts, the model itself making a creative choice).
+  Mock those at the LiteLLM layer instead.
+- Don't record both real and mock host names into the same cassette without
+  rewriting the URL — vcrpy matches on host/port by default.

diff --git a/tests/llm_translation/cassettes/_record_anthropic_fixtures.py b/tests/llm_translation/cassettes/_record_anthropic_fixtures.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/_record_anthropic_fixtures.py
@@ -1,0 +1,258 @@
+"""Helper script that records Anthropic-shaped cassettes against a local mock.
+
+This is a *one-shot* utility, not a test. It exists so we can deterministically
+regenerate the canned Anthropic cassettes shipped under
+``tests/llm_translation/cassettes/`` without spending real provider credits and
+without needing an ``ANTHROPIC_API_KEY``.
+
+Run it with::
+
+    uv run python tests/llm_translation/cassettes/_record_anthropic_fixtures.py
+
+The script:
+
+1. Spins up a tiny in-process HTTP server that returns canned Anthropic
+   ``/v1/messages`` payloads (one non-streaming, one SSE streaming).
+2. Records LiteLLM's real outbound HTTP through vcrpy.
+3. Rewrites the cassette URL/Host so replay matches genuine
+   ``https://api.anthropic.com/v1/messages`` traffic.
+
+If you want to refresh against the *real* Anthropic API instead, use the
+``LITELLM_VCR_RECORD_MODE=once`` workflow described in
+``tests/llm_translation/vcr_config.py`` — that path needs a real API key.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import re
+import sys
+import threading
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+from pathlib import Path
+from typing import Any, Iterable
+
+import vcr  # type: ignore[import-not-found]
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+sys.path.insert(0, str(REPO_ROOT))
+
+import litellm  # noqa: E402
+
+CASSETTE_DIR = Path(__file__).parent
+MOCK_HOST = "127.0.0.1"
+NON_STREAM_PORT = 18765
+STREAM_PORT = 18766
+REAL_ANTHROPIC_HOST = "api.anthropic.com"
+
+NON_STREAM_RESPONSE: dict[str, Any] = {
+    "id": "msg_01ABCDEFGHIJKLMNOPQRSTUV",
+    "type": "message",
+    "role": "assistant",
+    "model": "claude-sonnet-4-5-20250929",
+    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
+    "stop_reason": "end_turn",
+    "stop_sequence": None,
+    "usage": {
+        "input_tokens": 12,
+        "cache_creation_input_tokens": 0,
+        "cache_read_input_tokens": 0,
+        "output_tokens": 11,
+    },
+}
+
+STREAM_EVENTS: list[tuple[str, dict[str, Any]]] = [
+    (
+        "message_start",
+        {
+            "type": "message_start",
+            "message": {
+                "id": "msg_01STREAMABCDEFGH",
+                "type": "message",
+                "role": "assistant",
+                "model": "claude-sonnet-4-5-20250929",
+                "content": [],
+                "stop_reason": None,
+                "stop_sequence": None,
+                "usage": {"input_tokens": 14, "output_tokens": 1},
+            },
+        },
+    ),
+    (
+        "content_block_start",
+        {
+            "type": "content_block_start",
+            "index": 0,
+            "content_block": {"type": "text", "text": ""},
+        },
+    ),
+    (
+        "content_block_delta",
+        {
+            "type": "content_block_delta",
+            "index": 0,
+            "delta": {"type": "text_delta", "text": "Hello"},
+        },
+    ),
+    (
+        "content_block_delta",
+        {
+            "type": "content_block_delta",
+            "index": 0,
+            "delta": {"type": "text_delta", "text": " from"},
+        },
+    ),
+    (
+        "content_block_delta",
+        {
+            "type": "content_block_delta",
+            "index": 0,
+            "delta": {"type": "text_delta", "text": " LiteLLM!"},
+        },
+    ),
+    ("content_block_stop", {"type": "content_block_stop", "index": 0}),
+    (
+        "message_delta",
+        {
+            "type": "message_delta",
+            "delta": {"stop_reason": "end_turn", "stop_sequence": None},
+            "usage": {"output_tokens": 5},
+        },
+    ),
+    ("message_stop", {"type": "message_stop"}),
+]
+
+
+def _make_handler(mode: str) -> type[BaseHTTPRequestHandler]:
+    class Handler(BaseHTTPRequestHandler):
+        def log_message(self, *args: Any, **kwargs: Any) -> None:  # silence
+            return
+
+        def do_POST(self) -> None:  # noqa: N802
+            length = int(self.headers.get("Content-Length", "0"))
+            self.rfile.read(length)
+            if mode == "json":
+                body = json.dumps(NON_STREAM_RESPONSE).encode("utf-8")
+                self.send_response(200)
+                self.send_header("Content-Type", "application/json")
+                self.send_header("Content-Length", str(len(body)))
+                self.send_header("anthropic-ratelimit-requests-limit", "4000")
+                self.send_header("anthropic-ratelimit-requests-remaining", "3999")
+                self.end_headers()
+                self.wfile.write(body)
+            else:
+                self.send_response(200)
+                self.send_header("Content-Type", "text/event-stream")
+                self.send_header("Cache-Control", "no-cache")
+                self.end_headers()
+                for event_name, data in STREAM_EVENTS:
+                    chunk = (
+                        f"event: {event_name}\n" f"data: {json.dumps(data)}\n\n"
+                    ).encode("utf-8")
+                    self.wfile.write(chunk)
+                    self.wfile.flush()
+
+    return Handler
+
+
+def _serve(port: int, mode: str) -> ThreadingHTTPServer:
+    srv = ThreadingHTTPServer((MOCK_HOST, port), _make_handler(mode))
+    threading.Thread(target=srv.serve_forever, daemon=True).start()
+    return srv
+
+
+# Headers that vary every run (timestamps, server build) and must be stripped
+# so the cassette is byte-stable across regenerations. Replay does not depend
+# on them.
+_NON_DETERMINISTIC_HEADERS = ("Date", "Server")
+
+
+def _strip_nondeterministic_headers(path: Path) -> None:
+    """Remove headers whose values change every run from the cassette."""
+    text = path.read_text()
+    for header in _NON_DETERMINISTIC_HEADERS:
+        # Matches a YAML block like::
+        #
+        #       Date:
+        #       - Thu, 30 Apr 2026 00:43:16 GMT
+        #
+        # under the response ``headers:`` mapping. Indentation is fixed by vcrpy.
+        pattern = re.compile(
+            rf"^      {re.escape(header)}:\n      - .*\n",
+            re.MULTILINE,
+        )
+        text = pattern.sub("", text)
+    path.write_text(text)
+
+
+def _rewrite_cassette_to_real_host(path: Path, mock_host_port: str) -> None:
+    """Replace mock host/port in the cassette with the real Anthropic host."""
+    text = path.read_text()
+    text = text.replace(f"http://{mock_host_port}", f"https://{REAL_ANTHROPIC_HOST}")
+    text = text.replace(mock_host_port, REAL_ANTHROPIC_HOST)
+    path.write_text(text)
+    _strip_nondeterministic_headers(path)
+
+
+def _consume(iterable: Iterable[Any]) -> None:
+    for _ in iterable:
+        pass
+
+
+def record_non_streaming() -> None:
+    cassette = CASSETTE_DIR / "anthropic_basic_completion.yaml"
+    if cassette.exists():
+        cassette.unlink()
+    server = _serve(NON_STREAM_PORT, "json")
+    try:
+        my_vcr = vcr.VCR(
+            record_mode="all",
+            filter_headers=["authorization", "x-api-key", "anthropic-version"],
+        )
+        with my_vcr.use_cassette(str(cassette)):
+            response = litellm.completion(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                messages=[{"role": "user", "content": "Hello!"}],
+                api_base=f"http://{MOCK_HOST}:{NON_STREAM_PORT}",
+                api_key="sk-ant-recording",
+            )
+            assert response.choices[0].message.content
+    finally:
+        server.shutdown()
+    _rewrite_cassette_to_real_host(cassette, f"{MOCK_HOST}:{NON_STREAM_PORT}")
+
+
+def record_streaming() -> None:
+    cassette = CASSETTE_DIR / "anthropic_streaming_completion.yaml"
+    if cassette.exists():
+        cassette.unlink()
+    server = _serve(STREAM_PORT, "stream")
+    try:
+        my_vcr = vcr.VCR(
+            record_mode="all",
+            filter_headers=["authorization", "x-api-key", "anthropic-version"],
+        )
+        with my_vcr.use_cassette(str(cassette)):
+            stream = litellm.completion(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                messages=[{"role": "user", "content": "Hello!"}],
+                api_base=f"http://{MOCK_HOST}:{STREAM_PORT}",
+                api_key="sk-ant-recording",
+                stream=True,
+            )
+            _consume(stream)
+    finally:
+        server.shutdown()
+    _rewrite_cassette_to_real_host(cassette, f"{MOCK_HOST}:{STREAM_PORT}")
+
+
+def main() -> None:
+    os.environ.setdefault("LITELLM_LOG", "WARNING")
+    record_non_streaming()
+    record_streaming()
+    print(f"Wrote cassettes to {CASSETTE_DIR}")
+
+
+if __name__ == "__main__":
+    main()

diff --git a/tests/llm_translation/cassettes/anthropic_basic_completion.yaml b/tests/llm_translation/cassettes/anthropic_basic_completion.yaml
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/anthropic_basic_completion.yaml
@@ -1,0 +1,41 @@
+interactions:
+- request:
+    body: '{"model": "claude-sonnet-4-5-20250929", "messages": [{"role": "user", "content":
+      [{"type": "text", "text": "Hello!"}]}], "max_tokens": 64000}'
+    headers:
+      Accept-Encoding:
+      - gzip, deflate
+      Connection:
+      - keep-alive
+      Content-Length:
+      - '141'
+      Host:
+      - api.anthropic.com
+      User-Agent:
+      - litellm/1.84.0
+      accept:
+      - application/json
+      content-type:
+      - application/json
+    method: POST
+    uri: https://api.anthropic.com/v1/messages
+  response:
+    body:
+      string: '{"id": "msg_01ABCDEFGHIJKLMNOPQRSTUV", "type": "message", "role": "assistant",
+        "model": "claude-sonnet-4-5-20250929", "content": [{"type": "text", "text":
+        "Hello! How can I help you today?"}], "stop_reason": "end_turn", "stop_sequence":
+        null, "usage": {"input_tokens": 12, "cache_creation_input_tokens": 0, "cache_read_input_tokens":
+        0, "output_tokens": 11}}'
+    headers:
+      Content-Length:
+      - '358'
+      Content-Type:
+      - application/json
+      anthropic-ratelimit-requests-limit:
+      - '4000'
+      anthropic-ratelimit-requests-remaining:
+      - '3999'
+    status:
+      code: 200
+      message: OK
+version: 1

diff --git a/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml b/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml
@@ -1,0 +1,81 @@
+interactions:
+- request:
+    body: '{"model": "claude-sonnet-4-5-20250929", "messages": [{"role": "user", "content":
+      [{"type": "text", "text": "Hello!"}]}], "max_tokens": 64000, "stream": true}'
+    headers:
+      Accept-Encoding:
+      - gzip, deflate
+      Connection:
+      - keep-alive
+      Content-Length:
+      - '157'
+      Host:
+      - api.anthropic.com
+      User-Agent:
+      - litellm/1.84.0
+      accept:
+      - application/json
+      content-type:
+      - application/json
+    method: POST
+    uri: https://api.anthropic.com/v1/messages
+  response:
+    body:
+      string: 'event: message_start
+
+        data: {"type": "message_start", "message": {"id": "msg_01STREAMABCDEFGH",
+        "type": "message", "role": "assistant", "model": "claude-sonnet-4-5-20250929",
+        "content": [], "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens":
+        14, "output_tokens": 1}}}
+
+
+        event: content_block_start
+
+        data: {"type": "content_block_start", "index": 0, "content_block": {"type":
+        "text", "text": ""}}
+
+
+        event: content_block_delta
+
+        data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+        "text": "Hello"}}
+
+
+        event: content_block_delta
+
+        data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+        "text": " from"}}
+
+
+        event: content_block_delta
+
+        data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+        "text": " LiteLLM!"}}
+
+
+        event: content_block_stop
+
+        data: {"type": "content_block_stop", "index": 0}
+
+
+        event: message_delta
+
+        data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence":
+        null}, "usage": {"output_tokens": 5}}
+
+
+        event: message_stop
+
+        data: {"type": "message_stop"}
+
+
+        '
+    headers:
+      Cache-Control:
+      - no-cache
+      Content-Type:
+      - text/event-stream
+    status:
+      code: 200
+      message: OK
+version: 1

diff --git a/tests/llm_translation/test_anthropic_completion_vcr.py b/tests/llm_translation/test_anthropic_completion_vcr.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/test_anthropic_completion_vcr.py
@@ -1,0 +1,100 @@
+"""
+VCR-backed Anthropic completion tests.
+
+These tests exercise the same end-to-end ``litellm.completion`` code paths
+as ``test_anthropic_completion.py`` but replay HTTP traffic from cassettes
+under ``cassettes/`` instead of calling ``api.anthropic.com``. CI can run
+them with no API key and zero cost.
+
+To re-record after a deliberate change to request shape (or to refresh
+against the live API), set ``LITELLM_VCR_RECORD_MODE=once`` and provide a
+real ``ANTHROPIC_API_KEY``::
+
+    LITELLM_VCR_RECORD_MODE=once \\
+        ANTHROPIC_API_KEY=sk-ant-... \\
+        uv run pytest tests/llm_translation/test_anthropic_completion_vcr.py -v
+
+See ``tests/llm_translation/vcr_config.py`` and ``tests/llm_translation/cassettes/README.md``
+for the full workflow.
+"""
+
+import os
+import sys
+
+import pytest
+
+sys.path.insert(0, os.path.abspath("../.."))
+sys.path.insert(0, os.path.dirname(__file__))
+
+import litellm  # noqa: E402
+
+from vcr_config import litellm_vcr  # noqa: E402
+
+
+# A non-secret placeholder API key. We never want a real key written to a
+# cassette, and ``vcr_config`` filters Authorization / x-api-key headers
+# anyway. Using a deterministic placeholder also stops the SDK from raising
+# when ``ANTHROPIC_API_KEY`` is unset (the common CI case).
+PLACEHOLDER_ANTHROPIC_API_KEY = "sk-ant-vcr-placeholder"
+
+
+@pytest.fixture(autouse=True)
+def _placeholder_anthropic_key(monkeypatch):
+    """Provide a placeholder key when none is set so replay works offline.
+
+    If a real key is present in the environment (e.g. when re-recording),
+    we leave it untouched.
+    """
+    if not os.environ.get("ANTHROPIC_API_KEY"):
+        monkeypatch.setenv("ANTHROPIC_API_KEY", PLACEHOLDER_ANTHROPIC_API_KEY)
+
+
+@litellm_vcr.use_cassette("anthropic_basic_completion.yaml")
+def test_anthropic_basic_completion_replay():
+    """Smoke-test that a vanilla Anthropic completion replays from a cassette.
+
+    This is the canonical example for the cassette-based testing pattern:
+    no API key required at runtime, deterministic output, and the full
+    LiteLLM transformation pipeline (request shaping + response parsing)
+    runs against a real-shape Anthropic payload.
+    """
+    response = litellm.completion(
+        model="anthropic/claude-sonnet-4-5-20250929",
+        messages=[{"role": "user", "content": "Hello!"}],
+    )
+
+    assert response is not None
+    assert response.choices[0].message.content == ("Hello! How can I help you today?")
+    assert response.usage.prompt_tokens == 12
+    assert response.usage.completion_tokens == 11
+    # Anthropic sets stop_reason="end_turn" → litellm normalises to "stop"
+    assert response.choices[0].finish_reason == "stop"
+
+
+@litellm_vcr.use_cassette("anthropic_streaming_completion.yaml")
+def test_anthropic_streaming_completion_replay():
+    """Replay a streaming Anthropic completion from a cassette.
+
+    Exercises the SSE chunk parser and the public streaming surface. The
+    underlying cassette captures every ``content_block_delta`` event Anthropic
+    emits, so any regression in the streaming transformation will surface here.
+    """
+    stream = litellm.completion(
+        model="anthropic/claude-sonnet-4-5-20250929",
+        messages=[{"role": "user", "content": "Hello!"}],
+        stream=True,
+    )
+
+    collected_text = ""
+    finish_reason = None
+    for chunk in stream:
+        if not chunk.choices:
+            continue
+        delta = chunk.choices[0].delta
+        if delta and delta.content:
+            collected_text += delta.content
+        if chunk.choices[0].finish_reason:
+            finish_reason = chunk.choices[0].finish_reason
+
+    assert collected_text == "Hello from LiteLLM!"
+    assert finish_reason == "stop"

diff --git a/tests/llm_translation/vcr_config.py b/tests/llm_translation/vcr_config.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/vcr_config.py
@@ -1,0 +1,123 @@
+"""
+Shared VCR configuration for ``tests/llm_translation``.
+
+This module centralises the cassette setup used by tests that would otherwise
+hit a real LLM provider over the network. The goal is to let CI replay
+recorded HTTP traffic by default — no API keys required — and to provide a
+single switch for re-recording cassettes against the live provider.
+
+Usage in a test::
+
+    from .vcr_config import litellm_vcr  # noqa: E402
+
+    @litellm_vcr.use_cassette("anthropic_basic_completion.yaml")
+    def test_basic_completion():
+        resp = litellm.completion(
+            model="anthropic/claude-sonnet-4-5-20250929",
+            messages=[{"role": "user", "content": "Hello!"}],
+        )
+        assert resp.choices[0].message.content
+
+Recording mode
+--------------
+By default the cassette is replayed (``record_mode='none'``). To re-record::
+
+    LITELLM_VCR_RECORD_MODE=once \\
+        ANTHROPIC_API_KEY=sk-ant-... \\
+        uv run pytest tests/llm_translation/test_anthropic_completion_vcr.py
+
+Valid values for ``LITELLM_VCR_RECORD_MODE`` mirror vcrpy's record modes:
+``none`` (replay only — fail on missing cassette), ``once`` (record if the
+cassette doesn't exist), ``new_episodes`` (append new interactions), and
+``all`` (always re-record). See the vcrpy docs for details.
+
+Why this exists
+---------------
+Per the discussion that produced LIT-2683, our e2e tests repeatedly drained
+provider billing accounts and produced flaky CI on outages. Recording the
+HTTP exchange once and replaying it on subsequent runs gives us realistic
+provider responses (including streaming, headers, and edge-case payloads)
+without per-PR cost or rate-limit risk. Re-record periodically to catch
+real provider drift.
+"""
+
+from __future__ import annotations
+
+import os
+from pathlib import Path
+from typing import Any
+
+import vcr
+
+CASSETTE_DIR: Path = Path(__file__).parent / "cassettes"
+
+# Headers that must never be persisted to a cassette. These are matched
+# case-insensitively by vcrpy.
+_FILTERED_REQUEST_HEADERS = (
+    "authorization",
+    "x-api-key",
+    "anthropic-api-key",
+    "openai-api-key",
+    "azure-api-key",
+    "api-key",
+    "cookie",
+    "x-amz-security-token",
+    "x-amz-date",
+    "x-amz-content-sha256",
+    "amz-sdk-invocation-id",
+    "amz-sdk-request",
+)
+
+_FILTERED_RESPONSE_HEADERS = (
+    "set-cookie",
+    "x-request-id",
+    "cf-ray",
+    "anthropic-organization-id",
+    "openai-organization",
+    "request-id",
+)
+
+
+def _record_mode() -> str:
+    """Resolve the active vcrpy record mode from the environment.
+
+    Defaults to ``"none"`` so CI never accidentally hits the live provider.
+    """
+    mode = os.environ.get("LITELLM_VCR_RECORD_MODE", "none").strip().lower()
+    if mode not in {"none", "once", "new_episodes", "all"}:
+        raise ValueError(
+            f"LITELLM_VCR_RECORD_MODE={mode!r} is not a valid vcrpy record mode."
+        )
+    return mode
+
+
+def _build_vcr() -> vcr.VCR:
+    """Construct the shared ``VCR`` instance used by translation tests."""
+    return vcr.VCR(
+        cassette_library_dir=str(CASSETTE_DIR),
+        record_mode=_record_mode(),
+        # Match on method + URI + body so streaming vs non-streaming and
+        # different prompts get distinct cassettes.
+        match_on=("method", "scheme", "host", "port", "path", "query", "body"),
+        filter_headers=list(_FILTERED_REQUEST_HEADERS),
+        decode_compressed_response=True,
+    )
+
+
+def _scrub_response(response: Any) -> Any:
+    """Strip per-request response headers we don't want in the cassette."""
+    if not isinstance(response, dict):
+        return response
+    headers = response.get("headers") or {}
+    if isinstance(headers, dict):
+        for header in list(headers):
+            if header.lower() in _FILTERED_RESPONSE_HEADERS:
+                headers.pop(header, None)
+    return response
+
+
+litellm_vcr: vcr.VCR = _build_vcr()
+litellm_vcr.before_record_response = _scrub_response
+
+
+__all__ = ["litellm_vcr", "CASSETTE_DIR"]

diff --git a/uv.lock b/uv.lock
--- a/uv.lock
+++ b/uv.lock
@@ -9,7 +9,7 @@
 ]
 
 [options]
-exclude-newer = "0001-01-01T00:00:00Z" # This has no effect and is included for backwards compatibility when using relative exclude-newer values.
+exclude-newer = "2026-04-27T00:38:13.673780212Z"
 exclude-newer-span = "P3D"
 
 [manifest]
@@ -3242,6 +3242,7 @@
     { name = "types-redis" },
     { name = "types-requests" },
     { name = "types-setuptools" },
+    { name = "vcrpy" },
 ]
... diff truncated: showing 800 of 830 lines


Comment thread tests/llm_translation/cassettes/anthropic_basic_completion.yaml Outdated
Delete existing cassettes before recording (with vcrpy, record_mode='all'
appends rather than overwrites), and strip non-deterministic
response headers (Date, Server) so re-running the helper produces a
byte-stable diff.

Regenerate the committed cassettes with the fixed script so they match
what contributors get when following the README.
@mateo-berri
Collaborator Author

bugbot run


@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 72c9292. Configure here.

@mateo-berri
Collaborator Author

@greptileai

@greptile-apps
Contributor

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR introduces a Redis-backed vcrpy cache (CASSETTE_REDIS_URL) that replays recorded HTTP exchanges during CI, eliminating per-run provider costs and flakiness from upstream outages. The infrastructure is well-designed — outcome gate, episode cap, 2xx-only filter, and transient-error resilience are all present and thoroughly unit-tested with fakeredis.

One concrete issue worth addressing before enabling this in production CI: redis_key_for derives its key via os.path.relpath without an explicit start, so keys are anchored to the process's CWD at call time. Running pytest from any directory other than the repo root (e.g. cd tests/llm_translation && pytest) produces a different key prefix than CI, which silently breaks cache sharing.

Confidence Score: 4/5

Safe to merge with the CWD-sensitivity fix applied; all other findings are P2 or below

One P1 finding (redis_key_for CWD dependency can cause cache key mismatches between CI and local) caps the score at 4/5. The two P2 findings do not lower the score further.

tests/_vcr_redis_persister.py — redis_key_for key derivation and save_cassette guard ordering

Important Files Changed

Filename Overview
tests/_vcr_redis_persister.py New Redis-backed vcrpy persister with TTL, outcome gate, episode cap, and aiohttp body-rewind patch; one CWD-sensitive key-derivation path worth noting
tests/llm_translation/conftest.py Adds VCR auto-marker, persister registration, outcome-gate fixture, and verbose-mode logreport hook; respx/incompatible-test exclusion lists look correct
tests/llm_responses_api_testing/conftest.py Near-identical VCR plumbing to llm_translation/conftest.py; intentional duplication acknowledged in PR follow-ups; no per-file exclusion list (appropriate since no respx conflicts here)
tests/llm_translation/test_vcr_redis_persister.py 23 focused unit tests using fakeredis; covers roundtrip, TTL, cache miss, error resilience, outcome gate, and episode cap — thorough coverage
litellm/litellm_core_utils/llm_request_utils.py Correct null-safety fix: guards against proxy_server_request being None (not just absent) by using (... or {}).get("headers") chaining
tests/_flush_vcr_cache.py SCAN+pipeline batch-delete targeting only the litellm:vcr:cassette: prefix; correctly isolated from application Redis keys
tests/llm_translation/test_anthropic_completion.py Two new VCR replay demo tests (basic + streaming); will fall back to live API on cold cache, which is expected and intended

Reviews (3): Last reviewed commit: "Merge branch 'litellm_internal_staging' ..." | Re-trigger Greptile

Comment thread tests/llm_translation/vcr_config.py Outdated
Comment thread tests/llm_translation/cassettes/_record_anthropic_fixtures.py Outdated
cursoragent and others added 11 commits April 30, 2026 18:08
…ulk capture

Per Yuneng's feedback, use a single @pytest.mark.vcr marker so one record
sweep populates cassettes for every marked test across all providers,
instead of forcing each test to bind to a hard-coded cassette path.

Changes vs. the initial scaffolding:

- Add 'pytest-recording==0.13.4' on top of vcrpy. Adopt its layout:
  cassettes live at 'cassettes/<test_module>/<test_name>.yaml', resolved
  automatically. New tests just decorate with '@pytest.mark.vcr' — no
  imports or path bookkeeping.
- Move the shared filter/match config into a 'vcr_config' fixture in
  'tests/llm_translation/conftest.py' (consumed by pytest-recording for
  every marked test in the dir). Drop the standalone 'vcr_config.py'.
- Bulk record / replay via the standard '--record-mode' CLI flag:
  'make test-llm-translation-record' now sweeps every '@pytest.mark.vcr'
  test under tests/llm_translation in one shot. Optional 'TARGET=' var
  scopes to a single file.
- Move existing cassettes to the per-test paths and update the local
  in-process Anthropic regenerator to write to the same paths.
- Refresh README + Makefile target docs to match the sweep workflow.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
…reptile fixes

CI's license check fails on the new dev dep because liccheck cannot read
the PEP 639 'License-Expression' field that pytest-recording uses. Add
the package to the manually-verified allowlist (MIT, confirmed via PyPI
classifier).

Also addresses greptile P2 review comments:
- Add 'anthropic-version' to the request-header filter list so live and
  mock recordings produce structurally identical cassettes.
- Replace the indentation-sensitive regex in
  '_strip_nondeterministic_headers' with a YAML parse-and-rewrite so the
  helper keeps working if vcrpy ever changes its serialization style.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Stores VCR cassettes in Redis under litellm:vcr:cassette:<rel_path> with
a 24h expiry instead of YAML on disk. The TTL means each daily CI run
starts with an aged-out cache, naturally re-records against live providers,
and surfaces upstream API drift within a day without a manual `make`
re-record sweep. Opt-in via LITELLM_VCR_REDIS=1; default behaviour is
unchanged so local dev keeps the on-disk cassettes.

before_record_response now drops non-2xx responses so a transient 5xx or
429 from a provider can't poison the cache for the rest of the TTL window.
Vcr-marked tests bump litellm.num_retries to 3 during recording so
provider-SDK exponential backoff kicks in on the cache-miss path.

Tests cover the three surfaces we depend on in CI: serialize/deserialize
roundtrip via the real vcrpy serializer, TTL is actually applied to saved
keys, cache miss raises CassetteNotFoundError so vcrpy falls through to
record mode, and 2xx-only filtering across the status-code matrix
(2xx kept, 3xx/4xx/5xx dropped, with 429 and 503 explicitly pinned).
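A minimal sketch of that 2xx-only gate, assuming vcrpy's serialized response shape (a `status` dict carrying a `code` field) and the `filter_non_2xx_response` name from this commit; vcrpy's `before_record_response` contract is that returning `None` skips persisting the interaction:

```python
from typing import Optional


def filter_non_2xx_response(response: dict) -> Optional[dict]:
    # vcrpy contract: return None to skip persisting this interaction.
    status = (response.get("status") or {}).get("code")
    if status is None or not (200 <= status < 300):
        return None  # a transient 429/503 never reaches the cache
    return response


# Status-code matrix mirroring the tests described above: 2xx kept, rest dropped.
kept = filter_non_2xx_response({"status": {"code": 200}, "body": {"string": "{}"}})
dropped = [filter_non_2xx_response({"status": {"code": c}}) for c in (301, 404, 429, 503)]
```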
Removes the YAML cassette feature entirely and replaces it with a
Redis-only flow. Every test in tests/llm_translation/ and
tests/llm_responses_api_testing/ is auto-marked @pytest.mark.vcr via
conftest.pytest_collection_modifyitems, so any provider call lands in
the Redis cache (litellm:vcr:cassette:<rel_path>, 24h TTL). First run
records, runs within the day replay, day rollover re-records and
surfaces upstream API drift within 24h.

VCR is on by default. Set LITELLM_VCR_DISABLE=1, or simply leave
REDIS_HOST unset, to opt out — both bypass the auto-marker entirely so
nothing about cassettes runs. record_mode is "once" so cache-miss
records and cache-hit replays.

The 8 existing respx-using files in tests/llm_translation are excluded
from the auto-marker (vcrpy and respx both patch the httpx transport;
applying both makes one silently win). The persister's own unit-test
file is also excluded so it doesn't recursively run inside a cassette.

The persister moved from tests/llm_translation/_vcr_redis_persister.py
to tests/_vcr_redis_persister.py so both conftests share it. The two
demo tests in test_anthropic_completion_vcr.py were ported into
test_anthropic_completion.py and the demo file was deleted.

Adds tests/_flush_vcr_cache.py + a Make target
(test-llm-translation-flush-vcr-cache) that scans
litellm:vcr:cassette:* and pipelines DELETEs, for the
"I want the next CI run to re-record now" workflow. Drops the now-dead
test-llm-translation-record target.

Provider keys are still required on cache-miss (which happens on first
run and once a day after that). Replay-mode runs need only Redis.
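The flush script itself isn't shown in this excerpt; a sketch of the SCAN-plus-pipeline pattern it describes might look like the following (the `flush_vcr_cache` name and batch size are assumptions, and `client` is any redis-py-compatible client):

```python
PREFIX = "litellm:vcr:cassette:"


def flush_vcr_cache(client, batch_size: int = 500) -> int:
    # SCAN (not KEYS) walks a large keyspace incrementally, and only keys
    # under the cassette prefix are ever deleted, so the application Redis
    # keys stay untouched.
    deleted = 0
    pipe = client.pipeline()
    for key in client.scan_iter(match=PREFIX + "*", count=batch_size):
        pipe.delete(key)
        deleted += 1
        if deleted % batch_size == 0:
            pipe.execute()  # flush a full batch of DELETEs
    pipe.execute()  # flush the final partial batch
    return deleted
```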
Removes commentary that restated the code, including:

- module-level banners explaining what the conftest does (covered by
  Readme.md and the function bodies)
- docstrings on _scrub_response, _before_record_response, vcr_config,
  _vcr_disabled, pytest_recording_configure (function names + bodies
  are self-evident)
- inline notes about header filtering, match_on, etc.
- per-test docstrings restating the test name

Keeps the three non-obvious notes that aren't recoverable from the code:
the vcrpy/respx httpx-transport collision rationale on
_RESPX_CONFLICTING_FILES, the vcrpy "return None to skip persisting"
contract on filter_non_2xx_response, and the fixture-ordering
dependency on _vcr_record_retries.
Provider SDKs already retry transient 5xx/429 with exponential backoff
(default max_retries=2), and pytest.mark.flaky covers test-level
retries on top of that. Setting litellm.num_retries=3 here just
multiplied the existing layers — worst case 6 (flaky) x 3 (this) x
2 (CI rerunfailures) = 36 attempts on a single test.

Removing it keeps SDK-level network-blip protection intact and
shortens worst-case latency on cache-miss runs.
@mateo-berri mateo-berri force-pushed the litellm_vcr-cassette-llm-tests-af37 branch from e98ef5c to f6a37a6 Compare May 1, 2026 00:01
…tte poisoning

record_mode='once' refused to add new requests once any cassette
existed in Redis. Combined with filter_non_2xx_response (which drops
non-2xx responses from the saved cassette) and a 24h shared-Redis TTL,
a single transient API failure mid-test left the cassette stuck with
only the leading non-API requests (e.g. the model_prices fetch from
raw.githubusercontent.com), and every subsequent run for the next 24h
errored with 'Can't overwrite existing cassette'.

new_episodes records anything not already present, so partially
populated cassettes recover on the next run instead of poisoning the
suite for a full TTL window.
litellm's default LiteLLMAiohttpTransport routes requests through aiohttp,
which sits below httpx and is invisible to vcrpy's httpx-stub interception.
Under vcrpy + aiohttp, requests reach the real network but responses come
back through the stubbed httpx transport as empty 200s, surfacing as
'Unable to get json response - Expecting value: line 1 column 1 (char 0)'
in providers like Anthropic, Gemini, and any other path that exercises the
aiohttp transport.

Disabling the aiohttp transport when the VCR persister is registered
forces all calls through pure httpx, which vcrpy can record and replay
correctly.
Azure OpenAI's responses-API DELETE endpoint rejects requests that carry
a JSON body with: "Unexpected body with size 2. This API method does
not accept a request body.". The default LiteLLMAiohttpTransport silently
elides empty-dict bodies on DELETE so this was masked, but the pure-httpx
transport (used when DISABLE_AIOHTTP_TRANSPORT=True or under vcrpy/respx
patching) sends literal '{}' (2 bytes), which Azure rejects.

Only attach json= when the provider's transform actually returned a
non-empty dict; otherwise issue a bodyless DELETE.
…itellm_vcr-cassette-llm-tests-af37

# Conflicts:
#	litellm/llms/custom_httpx/llm_http_handler.py
The Anthropic replay tests hardcoded specific token counts and content
strings ('Hello! How can I help you today?', prompt_tokens == 12). On a
fresh CI Redis those values must match a pre-recorded cassette that
doesn't exist, so the first run hits the live API and gets different
real bytes back.

Assert on shape instead: non-empty content, positive token counts,
finish_reason in the known set, and (for streaming) more than one chunk.
The tests still exercise the full transformation pipeline end-to-end and
catch shape regressions; drift in the exact text/token counts is
expected and now tolerated.
…transport

vcrpy's aiohttp stub captures response bodies via 'await response.read()',
which drains aiohttp's StreamReader. Downstream consumers of the same
ClientResponse (litellm's AiohttpResponseStream, which iterates
response.content.iter_chunked) then see an empty body and surface as
JSON 'Expecting value: line 1 column 1 (char 0)' errors on every
record-path call.

The previous workaround set litellm.disable_aiohttp_transport=True for
the whole VCR-active session, which made the tests exercise pure httpx
instead of the production aiohttp transport. That hid the production
transport from coverage and surfaced its own bugs (e.g. the Azure
DELETE-with-empty-body case fixed in upstream staging).

Replace the workaround with a targeted monkey-patch that re-feeds the
captured body into the StreamReader via unread_data after vcrpy records
it. Tests now run through the same transport customers do, both on
first record and on replay, for both unary and streaming endpoints.

Verified locally against api.anthropic.com with the production
LiteLLMAiohttpTransport: record path passes (real network, 4.2s),
replay path passes (Redis cache, 1.8s).
Stop falling back to REDIS_URL/REDIS_SSL_URL/REDIS_HOST for the VCR
persister. Sharing a Redis with the application cache risks cassettes
being wiped by tests that flush the app Redis.
@mateo-berri mateo-berri changed the title tests(llm_translation): add VCR cassette infrastructure for offline replay tests(vcr): redis-backed vcrpy cache for offline LLM e2e replay May 1, 2026
Managed Redis (e.g. Upstash) drops idle TLS connections, which surfaced
in CI as a teardown ERROR on test_gemini_image_size_limit_exceeded:

  redis.exceptions.ConnectionError: EOF occurred in violation of
  protocol (_ssl.c:2427)

Cassette persistence is a cache, not test correctness, so:

- Configure the redis client with Retry(ExponentialBackoff, retries=2)
  on ConnectionError/TimeoutError to absorb single-socket drops.
- Wrap save_cassette so a final failure logs a warning instead of
  failing teardown — the next run re-records.
- Wrap load_cassette so an outage on read becomes a cache miss
  (CassetteNotFoundError) instead of erroring in setup.
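In sketch form (exception types and helper names here are assumptions; redis-py raises its own `ConnectionError`/`TimeoutError` subclasses, and the real persister raises vcrpy's cassette-not-found error on a read outage):

```python
import warnings


class CassetteNotFoundError(KeyError):
    """Stand-in for vcrpy's cache-miss signal; vcrpy treats it as 'fall through to record'."""


def safe_save(save_fn, key, cassette) -> bool:
    # The cache is best-effort: a failed save warns instead of failing teardown.
    try:
        save_fn(key, cassette)
        return True
    except (ConnectionError, TimeoutError) as exc:
        warnings.warn(f"VCR cassette save failed for {key!r}: {exc}; next run re-records")
        return False


def safe_load(load_fn, key):
    # An outage on read is reported as a cache miss, not a setup error.
    try:
        return load_fn(key)
    except (ConnectionError, TimeoutError) as exc:
        raise CassetteNotFoundError(key) from exc
```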
Set LITELLM_VCR_VERBOSE=1 to print a one-line cassette verdict per
test (HIT / MISS / PARTIAL / NOOP) showing replay vs new-recording
counts. Useful for local QA to confirm which tests actually exercised
the cache and which fell through to the live provider.
A failing test (including every failing retry before an eventual pass)
can otherwise overwrite a known-good cassette with a 'bad luck'
recording. Tests like test_prompt_caching, which assert on provider
state across two calls, can produce a 200 response that semantically
fails the assertion — the 2xx filter doesn't catch this because the
HTTP layer is fine.

- pytest_runtest_makereport hook attaches each phase report to the
  pytest item.
- _vcr_outcome_gate fixture (combined with the verbose-mode reporter)
  reads the call-phase outcome at teardown and informs the persister
  via mark_test_outcome_for_cassette before vcrpy's Cassette.__exit__
  triggers save_cassette.
- save_cassette consults the per-key 'did the test pass?' flag and
  short-circuits when False, leaving any prior good recording intact.
- Defaults to passed=True when no marker is present so non-test
  usage of the persister still works.
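The gate's core logic reduces to a per-key flag consulted at save time; a sketch under the names this commit uses, with an in-memory `store` standing in for Redis:

```python
_test_outcomes = {}


def mark_test_outcome_for_cassette(key, passed):
    _test_outcomes[key] = passed


def save_cassette_gated(key, cassette, store):
    # Default to passed=True when no outcome was recorded, so non-test
    # usage of the persister still works.
    if not _test_outcomes.get(key, True):
        return False  # leave any prior good recording intact
    store[key] = cassette
    return True
```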
Some tests can't benefit from cassette replay because they assert on
state that only exists in the live provider between two calls (e.g.
prompt-cache propagation, intermittent provider quirks). Marking them
with @pytest.mark.vcr just wastes cycles trying to record cassettes
they will never replay against successfully.

Opt-out by nodeid suffix so subclassed/parametrized variants are
covered:

- ::test_prompt_caching — Anthropic/Bedrock prompt-cache propagation
  isn't deterministic in the 0–1s window the test gives it.
- ::test_async_pdf_handling_with_file_id — flaky upstream Wikipedia
  fetch through the Anthropic Files API.
- TestBedrockInvokeNovaJson::test_json_response_pydantic_obj —
  Bedrock Nova returns tool_call vs JSON nondeterministically (other
  providers' subclasses are healthy).
- ::test_bedrock_converse__streaming_passthrough — Bedrock streaming
  response_cost calc returns None intermittently.

These tests keep their existing @pytest.mark.flaky retry behavior.
A test that produces non-deterministic request bodies (e.g. uuid in
the prompt) under record_mode=new_episodes never replays — every CI
run appends fresh unmatched episodes. The cassette grows unbounded
over time and silently inflates Redis (we observed one cassette at
22 episodes / ~860KB after ~5 CI runs).

Refuse the save when episode count exceeds MAX_EPISODES_PER_CASSETTE
so the pathology surfaces with a loud warning that points to the
opt-out fix instead of festering invisibly.
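As a sketch (the cap value and the `interactions` key are assumptions; vcrpy's YAML cassettes store recorded exchanges under an `interactions` list):

```python
import warnings

MAX_EPISODES_PER_CASSETTE = 20  # assumed threshold, not taken from the diff


def within_episode_cap(cassette_dict: dict) -> bool:
    # Refuse the save once the episode count exceeds the cap, loudly,
    # instead of letting the cassette grow unbounded in Redis.
    episodes = len(cassette_dict.get("interactions", []))
    if episodes > MAX_EPISODES_PER_CASSETTE:
        warnings.warn(
            f"cassette holds {episodes} episodes (cap {MAX_EPISODES_PER_CASSETTE}); "
            "a non-deterministic request body never replays, so opt the test out of VCR"
        )
        return False
    return True
```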
Anthropic's URL fetcher intermittently returns 400 'Unable to download
the file' for the Wikipedia URL the test was using. Point it at the
repo's existing tests/llm_translation/fixtures/dummy.pdf via raw
GitHub instead — small, deterministic, reliably fetchable.

With a stable URL the test no longer needs to be opted out of VCR;
remove it from the incompatible list so it can replay from cassette.
Previously, the per-test [VCR HIT/MISS/...] line was written via
TerminalReporter.write_line from inside fixture teardown. Pytest
captures that stream by default and only surfaces it on FAILED tests
(under 'Captured stdout teardown'), so passing tests' verdicts were
invisible in CI logs and the user couldn't tell whether the cache
was working.

Write directly to sys.__stderr__ so the line bypasses pytest's
capture entirely. Under xdist each worker has its own __stderr__
which CircleCI aggregates into the live job log alongside the
PASSED/FAILED markers.
… MIME

Raw GitHub serves application/octet-stream, which OpenAI/Gemini reject
when LiteLLM fetches the URL client-side. jsDelivr serves the same
file with content-type: application/pdf. Pin to a commit SHA so the
asset is immutable and jsDelivr can cache it for a year.
@codecov

codecov Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…orter

Previous attempt wrote to sys.__stderr__ from the test fixture. Under
xdist, fixtures run inside worker subprocesses whose stderr is captured
by the controller and only released to the live log on test failure —
so passing tests' verdicts were silently swallowed.

Round-trip via report.user_properties: the worker-side fixture stashes
the verdict on user_properties, xdist serializes it onto the report,
and a controller-side pytest_runtest_logreport hook writes it via the
TerminalReporter (the same plugin that emits PASSED/FAILED markers).
TerminalReporter is resolved lazily on first hook call because it's
not yet registered when conftest's pytest_configure runs.

Verified locally in both serial and xdist modes.
Remove explanatory comments that restated what the code already says.
Kept only those that document non-obvious external contracts (the aiohttp
record-path patch's reason for re-feeding the body, and the warning
messages inside save_cassette that reach the user).
@mateo-berri
Collaborator Author

bugbot run

@mateo-berri
Collaborator Author

@greptileai


@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 0543953. Configure here.

@mateo-berri mateo-berri marked this pull request as ready for review May 1, 2026 22:10
@mateo-berri mateo-berri requested a review from yuneng-berri May 1, 2026 22:10
Comment on lines +24 to +29
def redis_key_for(cassette_path: str) -> str:
rel = os.path.relpath(str(cassette_path))
if rel.endswith(".yaml"):
rel = rel[: -len(".yaml")]
rel = rel.replace("/cassettes/", "/").lstrip("./")
return f"{REDIS_KEY_PREFIX}{rel}"
Contributor

P1 redis_key_for calls os.path.relpath without an explicit start argument, so the result depends on the process's current working directory at call time. If pytest is invoked from tests/llm_translation/ instead of the repo root, the relative path produced omits the tests/llm_translation/ prefix, yielding a key like litellm:vcr:cassette:test_anthropic/test_foo rather than litellm:vcr:cassette:tests/llm_translation/test_anthropic/test_foo. CI-recorded cassettes and locally-run replays would then map to different Redis keys and never share the cache. Anchoring to a fixed repo-root directory eliminates this.

Suggested change
def redis_key_for(cassette_path: str) -> str:
rel = os.path.relpath(str(cassette_path))
if rel.endswith(".yaml"):
rel = rel[: -len(".yaml")]
rel = rel.replace("/cassettes/", "/").lstrip("./")
return f"{REDIS_KEY_PREFIX}{rel}"
def redis_key_for(cassette_path: str) -> str:
_repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
rel = os.path.relpath(str(cassette_path), start=_repo_root)
if rel.endswith(".yaml"):
rel = rel[: -len(".yaml")]
rel = rel.replace("/cassettes/", "/").lstrip("./")
return f"{REDIS_KEY_PREFIX}{rel}"

Collaborator Author

I mean, technically true. But this is always run from CI/CD so it doesn't really matter

Contributor

Fair point — if CI always runs from the repo root and local dev is expected to do the same, the CWD sensitivity is a non-issue in practice.

Tip: You can give me more context with .greptile/rules.md so I get this right next time.

@yuneng-berri yuneng-berri enabled auto-merge May 1, 2026 22:16
@yuneng-berri yuneng-berri merged commit d07cdd4 into litellm_internal_staging May 1, 2026
118 checks passed
@yuneng-berri yuneng-berri deleted the litellm_vcr-cassette-llm-tests-af37 branch May 1, 2026 22:16