
tests(vcr): redis-backed vcrpy cache for offline LLM e2e replay#26838

Merged
yuneng-berri merged 32 commits into litellm_internal_staging from litellm_vcr-cassette-llm-tests-af37
May 1, 2026

Conversation


@mateo-berri mateo-berri commented Apr 30, 2026

Relevant issues

Long-term solution to the recurring "out of Anthropic credits" CI failures discussed in #sdlc Slack on 2026-04-29.

Linear ticket

Resolves LIT-2683

Summary

Live LLM e2e tests have been draining provider billing accounts and going flaky on outages. This PR introduces a Redis-backed vcrpy cache so CI exercises the same end-to-end LiteLLM transformation paths (request shaping, response parsing, streaming, headers) without hitting the live provider on every run — ~zero per-PR cost while still smoke-testing against reality once a day.

The cache lives on a dedicated Redis (CASSETTE_REDIS_URL) so it's isolated from the application Redis (REDIS_URL/REDIS_HOST) used by other test suites — those flush their Redis as part of teardown, which would otherwise wipe cassettes.

Observed impact on llm_translation_testing: ~47% wall-clock reduction (8:11 → 4:21) once the cache is warm, with 0 provider calls.

How it works

  • Every test under tests/llm_translation/ and tests/llm_responses_api_testing/ is auto-marked with @pytest.mark.vcr via conftest.py. No per-test annotation needed.
  • First run hits the live provider and stores the HTTP exchange in Redis under litellm:vcr:cassette:<test_id> with a 24h TTL.
  • All subsequent runs within 24h replay from Redis, no network, no API keys.
  • The 24h TTL means each new day's first run records again, so upstream API drift surfaces within a day.
  • Only 2xx responses are persisted (filter_non_2xx_response) — transient 5xx/4xx never poison the cache.
  • record_mode="new_episodes" so partial recordings can be completed without nuking what already replays.
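As a hedged sketch, the key scheme and TTL behavior above look roughly like this — names are illustrative; the actual persister lives in tests/_vcr_redis_persister.py and may differ:

```python
# Illustrative sketch of the cassette key scheme and 24h TTL described above.
CASSETTE_KEY_PREFIX = "litellm:vcr:cassette:"
CASSETTE_TTL_SECONDS = 24 * 60 * 60  # expiry forces a fresh recording each day


def cassette_key(test_id: str) -> str:
    # One cassette per test, namespaced so a scoped flush can't touch other keys.
    return CASSETTE_KEY_PREFIX + test_id


def save_cassette(redis_client, test_id: str, serialized_cassette: str) -> None:
    # SETEX writes the value and its TTL atomically, so a cassette can never
    # linger past 24h even if a separate EXPIRE call were lost.
    redis_client.setex(cassette_key(test_id), CASSETTE_TTL_SECONDS, serialized_cassette)
```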

Cache-poisoning safeguards

A naive cassette cache picks up a lot of "bad luck" recordings. Three layers prevent that:

  1. Outcome gate — only persist cassettes from tests that pass. A pytest_runtest_makereport hook stamps each test's call-phase outcome onto the cassette key; save_cassette consults it and refuses to write when the test failed. This means the failed retries pytest-rerunfailures produces before a green retry never overwrite a known-good cassette.
  2. Episode cap — MAX_EPISODES_PER_CASSETTE = 50 catches the pathology where a test produces non-deterministic request bodies (e.g. UUIDs), record_mode=new_episodes keeps appending unmatched episodes, and the cassette balloons forever. Refusing to persist past 50 surfaces the issue loudly instead of silently inflating Redis.
  3. VCR-incompatible opt-out — _VCR_INCOMPATIBLE_NODEID_SUFFIXES lists the handful of tests that observe live cross-call provider state (prompt-cache warm-up, streaming response_cost calc, Bedrock Nova tool-call nondeterminism). They fall through to live calls with their existing @pytest.mark.flaky retry logic.
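Under stated assumptions about the hook plumbing, safeguards 1 and 2 reduce to a predicate like the following (function and variable names are hypothetical):

```python
MAX_EPISODES_PER_CASSETTE = 50

# nodeid -> call-phase outcome, stamped by a pytest_runtest_makereport hook.
_test_outcomes: dict = {}


def record_outcome(nodeid: str, outcome: str) -> None:
    _test_outcomes[nodeid] = outcome


def should_persist(nodeid: str, episode_count: int) -> bool:
    # Outcome gate: refuse to write when the test failed; an unknown outcome
    # defaults to persisting (the "default-on-unknown" unit-test case).
    if _test_outcomes.get(nodeid, "passed") != "passed":
        return False
    # Episode cap: allow exactly at the threshold, refuse above it.
    if episode_count > MAX_EPISODES_PER_CASSETTE:
        return False
    return True
```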

Resilience

  • Redis client uses retry=Retry(ExponentialBackoff(cap=2, base=0.1), retries=2) on ConnectionError / TimeoutError so a single dropped TLS socket on Upstash doesn't fail teardown.
  • load_cassette outages convert to CassetteNotFoundError (cache miss → live call, not a test setup error).
  • save_cassette outages log a warning and return (persistence is a cache, not test correctness).
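A minimal sketch of that client setup with redis-py, assuming CASSETTE_REDIS_URL is the only configuration knob:

```python
import os


def make_cassette_redis():
    """Return a retry-wrapped Redis client, or None when VCR should be skipped."""
    url = os.environ.get("CASSETTE_REDIS_URL")
    if not url:
        return None  # no dedicated cassette Redis -> tests fall back to live calls
    # Imported lazily so environments without redis-py can still collect tests.
    import redis
    from redis.backoff import ExponentialBackoff
    from redis.retry import Retry

    return redis.Redis.from_url(
        url,
        # Two retries with exponential backoff (capped at 2s) on transient
        # connection drops, matching the PR description.
        retry=Retry(ExponentialBackoff(cap=2, base=0.1), retries=2),
        retry_on_error=[redis.exceptions.ConnectionError, redis.exceptions.TimeoutError],
    )
```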

Verbose mode

Set LITELLM_VCR_VERBOSE=1 to surface a per-test verdict in the live CI log alongside PASSED/FAILED markers:

PASSED tests/llm_translation/test_anthropic_completion.py::test_anthropic_basic_completion_replay
[VCR HIT] 2 replayed, 0 new (2 cassette entries) :: tests/llm_translation/test_anthropic_completion.py::test_anthropic_basic_completion_replay

Verdicts: HIT (pure replay), MISS (cold cache, recorded), PARTIAL (mix), NOOP (no HTTP traffic). Implemented as a worker-side user_properties stash that the controller's pytest_runtest_logreport hook picks up and prints — needed because xdist worker stderr is captured and only released on test failure.
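As an illustrative sketch, the verdict could fall out of the replayed/recorded counters like so (the real logic may differ):

```python
def vcr_verdict(replayed: int, recorded: int) -> str:
    # Hypothetical reconstruction of the verdict classification from the
    # counts printed in the log line above.
    if replayed == 0 and recorded == 0:
        return "NOOP"     # test made no HTTP calls at all
    if recorded == 0:
        return "HIT"      # pure replay, zero provider traffic
    if replayed == 0:
        return "MISS"     # cold cache, everything was recorded
    return "PARTIAL"      # some episodes replayed, some newly recorded
```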

Required environment

  • CASSETTE_REDIS_URL — dedicated Redis for cassettes. Already configured in CircleCI; locally, set it in .env.rc (or equivalent) and source it. If unset, VCR registration is skipped and tests fall back to live calls.
  • Provider credentials (ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_*, etc.) — only needed on cache-miss (recording). Replay needs nothing.

Flushing the cache

Force a re-record on the next run instead of waiting for the 24h TTL:

make test-llm-translation-flush-vcr-cache

The flush script only deletes keys under the litellm:vcr:cassette: prefix.
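A hedged sketch of what that scoped flush amounts to (the real script is tests/_flush_vcr_cache.py):

```python
CASSETTE_KEY_PREFIX = "litellm:vcr:cassette:"


def flush_cassettes(redis_client) -> int:
    """Delete only cassette keys; SCAN-based iteration avoids blocking Redis."""
    deleted = 0
    for key in redis_client.scan_iter(match=CASSETTE_KEY_PREFIX + "*"):
        redis_client.delete(key)
        deleted += 1
    return deleted
```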

Disabling VCR

Skip the cache entirely (every call goes live, no recording):

LITELLM_VCR_DISABLE=1 uv run pytest tests/llm_translation/test_<file>.py

What's in the diff

Core infrastructure

  • tests/_vcr_redis_persister.py — Redis-backed vcrpy persister (24h TTL, litellm:vcr:cassette: key prefix), 2xx-only response filter, outcome-gated persistence, episode cap, transient-error resilience, and an aiohttp record-path patch so vcrpy doesn't drain the response stream out from under LiteLLMAiohttpTransport. Reads only CASSETTE_REDIS_URL — no fallback to the application Redis.
  • tests/_flush_vcr_cache.py — scoped flush utility (only touches keys under the litellm:vcr:cassette: prefix).
  • tests/llm_translation/conftest.py / tests/llm_responses_api_testing/conftest.py — register the Redis persister, define the vcr_config fixture (auth/header scrubbing, request-shape matching), auto-apply @pytest.mark.vcr to every test in the directory, wire the outcome-gate hook + fixture, and ship the controller-side verbose-mode pytest_runtest_logreport hook. Files using respx (which patches the same httpx transport vcrpy does) are excluded via _RESPX_CONFLICTING_FILES to avoid one library silently winning.

Tests & demo

  • tests/llm_translation/test_anthropic_completion.py — two replay tests (test_anthropic_basic_completion_replay, test_anthropic_streaming_completion_replay) demonstrating the flow on a real e2e path.
  • tests/llm_translation/test_vcr_redis_persister.py — 23 unit tests covering: roundtrip, 24h TTL, missing-key behavior, key normalization, 2xx filter coverage, transient-error handling on read & write, outcome gate (skip-on-fail, proceed-on-pass, default-on-unknown), and episode cap (refuse-above, allow-at-threshold).

Glue

  • Makefile — new test-llm-translation-flush-vcr-cache target.
  • pyproject.toml / uv.lock — adds vcrpy==8.1.1 and pytest-recording==0.13.4 to dev.
  • tests/code_coverage_tests/liccheck.ini — license allowlist entry for pytest-recording.
  • litellm/litellm_core_utils/llm_request_utils.py — small null-safety fix in get_proxy_server_request_headers (when proxy_server_request is None rather than missing, the previous .get(...).get(...) chain raised AttributeError).
  • tests/llm_translation/base_llm_unit_tests.py — switched the test_async_pdf_handling_with_file_id PDF URL from Wikimedia (intermittent 400s from Anthropic's server-side fetcher) to a SHA-pinned jsDelivr mirror of the in-repo fixture (raw GitHub serves PDFs as application/octet-stream which OpenAI/Gemini reject).
  • tests/llm_translation/Readme.md — record / replay / flush / disable workflow.
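The null-safety fix in get_proxy_server_request_headers amounts to guarding the explicit-None case; an illustrative reconstruction (the real signature may differ):

```python
def get_proxy_server_request_headers(litellm_params: dict) -> dict:
    # `or {}` covers both a missing key and an explicit None value — the
    # latter is the case the old .get(...).get(...) chain raised
    # AttributeError on.
    proxy_server_request = litellm_params.get("proxy_server_request") or {}
    return proxy_server_request.get("headers") or {}
```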

Pre-Submission checklist

  • Tests added (23 unit tests in test_vcr_redis_persister.py, replay demos in test_anthropic_completion.py)
  • No raw secrets in cassettes — request/response header filter scrubs Authorization, x-api-key, anthropic-api-key, AWS sigv4, GCP keys, cookies, organization IDs, request IDs
  • Cassettes Redis is isolated from application Redis — CASSETTE_REDIS_URL only, no fallback
  • Cassettes from failed tests never poison the cache (outcome gate)
  • PR's scope is isolated to test infrastructure plus one minor null-safety fix in llm_request_utils.py and one PDF-fixture URL fix

Type

🚄 Infrastructure
✅ Test
🐛 Bug Fix (null-safety in get_proxy_server_request_headers)

Follow-ups

  • Mark the highest-cost live tests in other directories (Anthropic, then OpenAI/Bedrock/Vertex) with @pytest.mark.vcr. The auto-marker already covers llm_translation/ and llm_responses_api_testing/.
  • Add a nightly job that runs against the live API and re-records — this preserves the "smoke test against reality" guarantee while keeping per-PR runs offline.
  • Promote the conftest VCR plumbing into a shared tests/_vcr_pytest_plugin.py so other directories can opt in by importing instead of copy-pasting.


…eplay

Live LLM e2e tests have been draining provider billing accounts and going
flaky on outages (LIT-2683). This change introduces vcrpy-backed cassette
replay so CI can exercise the same end-to-end LiteLLM transformation paths
without hitting the live provider:

- Add 'vcrpy==8.1.1' to the dev dependency group.
- New 'tests/llm_translation/vcr_config.py' centralises the VCR config:
  filters auth/secret headers and per-request response headers, matches on
  method+URI+body, and exposes 'LITELLM_VCR_RECORD_MODE' for re-recording.
- New 'tests/llm_translation/test_anthropic_completion_vcr.py' demonstrates
  the pattern with one non-streaming and one streaming Anthropic test that
  replay from cassettes shipped under 'cassettes/'.
- New 'tests/llm_translation/cassettes/_record_anthropic_fixtures.py' lets
  contributors regenerate the canned Anthropic cassettes against a local
  in-process mock (no API key required), and 'cassettes/README.md' documents
  the full record/replay/refresh workflow.
- New 'make test-llm-translation-record FILE=...' Makefile target to refresh
  cassettes against the live API.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
@CLAassistant

CLAassistant commented Apr 30, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ mateo-berri
❌ cursoragent
You have signed the CLA already but the status is still pending? Let us recheck it.

@mateo-berri

bugbot run


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Committed cassette diverges from fixture generator output
    • Updated _record_anthropic_fixtures.py to delete cassettes before recording (so vcrpy's record_mode="all" doesn't append to stale content) and strip non-deterministic Date/Server headers, then regenerated the committed cassettes so they match the script's output byte-for-byte.
Preview (72c92920b2)
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -185,3 +185,18 @@
 	$(UV_RUN) pytest tests/llm_translation/$(FILE) \
 		--junitxml=test-results/junit.xml \
 		-v --tb=short --maxfail=100 --timeout=300
+
+# VCR cassette helpers --------------------------------------------------------
+# Re-record a single VCR-backed translation test file against the live API.
+# Provider credentials must be exported (e.g. ANTHROPIC_API_KEY).
+#
+# Example:
+#   ANTHROPIC_API_KEY=sk-ant-... make test-llm-translation-record \
+#       FILE=test_anthropic_completion_vcr.py
+test-llm-translation-record: install-test-deps
+	@if [ -z "$(FILE)" ]; then \
+		echo "Usage: make test-llm-translation-record FILE=test_filename.py"; \
+		exit 1; \
+	fi
+	LITELLM_VCR_RECORD_MODE=once \
+		$(UV_RUN) pytest tests/llm_translation/$(FILE) -v --tb=short

diff --git a/pyproject.toml b/pyproject.toml
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -149,6 +149,7 @@
     "parameterized==0.9.0",
     "openapi-core==0.22.0; python_version < '3.14'",
     "pytest-timeout==2.4.0",
+    "vcrpy==8.1.1",
 ]
 proxy-dev = [
     "prisma==0.11.0",

diff --git a/tests/llm_translation/Readme.md b/tests/llm_translation/Readme.md
--- a/tests/llm_translation/Readme.md
+++ b/tests/llm_translation/Readme.md
@@ -1,3 +1,20 @@
-Unit tests for individual LLM providers. 
+Unit tests for individual LLM providers.
 
-Name of the test file is the name of the LLM provider - e.g. `test_openai.py` is for OpenAI. 
\ No newline at end of file
+Name of the test file is the name of the LLM provider - e.g. `test_openai.py` is for OpenAI.
+
+## VCR-backed tests
+
+Files matching `*_vcr.py` (e.g. `test_anthropic_completion_vcr.py`) replay
+recorded HTTP traffic from `cassettes/` instead of calling the real provider.
+They run offline by default — no API keys required, no per-PR cost.
+
+To re-record against the live API:
+
+```bash
+ANTHROPIC_API_KEY=sk-ant-... \
+  make test-llm-translation-record FILE=test_anthropic_completion_vcr.py
+```
+
+See [`cassettes/README.md`](./cassettes/README.md) for the full workflow,
+including how to add a new cassette-backed test and what to scrub from
+recordings before committing.

diff --git a/tests/llm_translation/cassettes/README.md b/tests/llm_translation/cassettes/README.md
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/README.md
@@ -1,0 +1,80 @@
+# VCR cassettes for LLM translation tests
+
+This directory holds [vcrpy](https://vcrpy.readthedocs.io/) cassettes used by
+`tests/llm_translation/` to replay real provider HTTP traffic without hitting
+the live API.
+
+Why this exists is tracked in
+[LIT-2683](https://linear.app/litellm-ai/issue/LIT-2683) and discussed in
+`#sdlc` on Slack: e2e tests were repeatedly draining provider billing accounts
+and producing flaky CI on outages. Recording the HTTP exchange once and
+replaying it on subsequent runs gives us realistic provider responses
+(streaming, headers, edge-case payloads) at zero per-PR cost.
+
+## How to add a new cassette-backed test
+
+1. Pick a small, deterministic call. Avoid prompts whose output depends on
+   wall-clock time, randomness, or live web data.
+2. Add a test in a `*_vcr.py` file under `tests/llm_translation/`. Wrap it
+   with `@litellm_vcr.use_cassette("<some_name>.yaml")` from
+   `tests/llm_translation/vcr_config.py`.
+3. Record the cassette once:
+
+   ```bash
+   LITELLM_VCR_RECORD_MODE=once \
+     ANTHROPIC_API_KEY=sk-ant-... \
+     uv run pytest tests/llm_translation/test_my_provider_vcr.py::test_my_case -v
+   ```
+
+   or, equivalently:
+
+   ```bash
+   ANTHROPIC_API_KEY=sk-ant-... \
+     make test-llm-translation-record FILE=test_my_provider_vcr.py
+   ```
+
+4. Inspect the resulting YAML file:
+   - **Strip any secrets** that survived `vcr_config.py`'s header filter.
+     `vcr_config.py` already removes the common ones (`Authorization`,
+     `x-api-key`, `cookie`, AWS sigv4 headers, etc.) — but a request *body*
+     might contain a token if your test passed one inline.
+   - Trim very large response bodies if they aren't load-bearing for the
+     assertion.
+5. Commit the cassette alongside the test.
+
+## Re-recording
+
+Run the same `make test-llm-translation-record` command. vcrpy's `once` mode
+will *not* overwrite an existing cassette — delete the file first if you're
+intentionally refreshing it:
+
+```bash
+rm tests/llm_translation/cassettes/anthropic_basic_completion.yaml
+ANTHROPIC_API_KEY=sk-ant-... make test-llm-translation-record \
+    FILE=test_anthropic_completion_vcr.py
+```
+
+## Refreshing the canned Anthropic fixtures
+
+The two Anthropic cassettes in this directory
+(`anthropic_basic_completion.yaml` and `anthropic_streaming_completion.yaml`)
+are recorded against an in-process mock so contributors can regenerate them
+without an `ANTHROPIC_API_KEY`:
+
+```bash
+uv run python tests/llm_translation/cassettes/_record_anthropic_fixtures.py
+```
+
+For a full refresh against the real API, delete the cassettes first and use
+the `LITELLM_VCR_RECORD_MODE=once` path with a real key.
+
+## Don't
+
+- Don't commit cassettes containing real API keys, OAuth tokens, or PII.
+  When in doubt, `grep -i 'sk-\|bearer\|api-key' cassettes/*.yaml` after
+  recording.
+- Don't rely on cassettes for tests of *non-deterministic* behavior
+  (rate-limit retries, timeouts, the model itself making a creative choice).
+  Mock those at the LiteLLM layer instead.
+- Don't record both real and mock host names into the same cassette without
+  rewriting the URL — vcrpy matches on host/port by default.

diff --git a/tests/llm_translation/cassettes/_record_anthropic_fixtures.py b/tests/llm_translation/cassettes/_record_anthropic_fixtures.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/_record_anthropic_fixtures.py
@@ -1,0 +1,258 @@
+"""Helper script that records Anthropic-shaped cassettes against a local mock.
+
+This is a *one-shot* utility, not a test. It exists so we can deterministically
+regenerate the canned Anthropic cassettes shipped under
+``tests/llm_translation/cassettes/`` without spending real provider credits and
+without needing an ``ANTHROPIC_API_KEY``.
+
+Run it with::
+
+    uv run python tests/llm_translation/cassettes/_record_anthropic_fixtures.py
+
+The script:
+
+1. Spins up a tiny in-process HTTP server that returns canned Anthropic
+   ``/v1/messages`` payloads (one non-streaming, one SSE streaming).
+2. Records LiteLLM's real outbound HTTP through vcrpy.
+3. Rewrites the cassette URL/Host so replay matches genuine
+   ``https://api.anthropic.com/v1/messages`` traffic.
+
+If you want to refresh against the *real* Anthropic API instead, use the
+``LITELLM_VCR_RECORD_MODE=once`` workflow described in
+``tests/llm_translation/vcr_config.py`` — that path needs a real API key.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import re
+import sys
+import threading
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+from pathlib import Path
+from typing import Any, Iterable
+
+import vcr  # type: ignore[import-not-found]
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+sys.path.insert(0, str(REPO_ROOT))
+
+import litellm  # noqa: E402
+
+CASSETTE_DIR = Path(__file__).parent
+MOCK_HOST = "127.0.0.1"
+NON_STREAM_PORT = 18765
+STREAM_PORT = 18766
+REAL_ANTHROPIC_HOST = "api.anthropic.com"
+
+NON_STREAM_RESPONSE: dict[str, Any] = {
+    "id": "msg_01ABCDEFGHIJKLMNOPQRSTUV",
+    "type": "message",
+    "role": "assistant",
+    "model": "claude-sonnet-4-5-20250929",
+    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
+    "stop_reason": "end_turn",
+    "stop_sequence": None,
+    "usage": {
+        "input_tokens": 12,
+        "cache_creation_input_tokens": 0,
+        "cache_read_input_tokens": 0,
+        "output_tokens": 11,
+    },
+}
+
+STREAM_EVENTS: list[tuple[str, dict[str, Any]]] = [
+    (
+        "message_start",
+        {
+            "type": "message_start",
+            "message": {
+                "id": "msg_01STREAMABCDEFGH",
+                "type": "message",
+                "role": "assistant",
+                "model": "claude-sonnet-4-5-20250929",
+                "content": [],
+                "stop_reason": None,
+                "stop_sequence": None,
+                "usage": {"input_tokens": 14, "output_tokens": 1},
+            },
+        },
+    ),
+    (
+        "content_block_start",
+        {
+            "type": "content_block_start",
+            "index": 0,
+            "content_block": {"type": "text", "text": ""},
+        },
+    ),
+    (
+        "content_block_delta",
+        {
+            "type": "content_block_delta",
+            "index": 0,
+            "delta": {"type": "text_delta", "text": "Hello"},
+        },
+    ),
+    (
+        "content_block_delta",
+        {
+            "type": "content_block_delta",
+            "index": 0,
+            "delta": {"type": "text_delta", "text": " from"},
+        },
+    ),
+    (
+        "content_block_delta",
+        {
+            "type": "content_block_delta",
+            "index": 0,
+            "delta": {"type": "text_delta", "text": " LiteLLM!"},
+        },
+    ),
+    ("content_block_stop", {"type": "content_block_stop", "index": 0}),
+    (
+        "message_delta",
+        {
+            "type": "message_delta",
+            "delta": {"stop_reason": "end_turn", "stop_sequence": None},
+            "usage": {"output_tokens": 5},
+        },
+    ),
+    ("message_stop", {"type": "message_stop"}),
+]
+
+
+def _make_handler(mode: str) -> type[BaseHTTPRequestHandler]:
+    class Handler(BaseHTTPRequestHandler):
+        def log_message(self, *args: Any, **kwargs: Any) -> None:  # silence
+            return
+
+        def do_POST(self) -> None:  # noqa: N802
+            length = int(self.headers.get("Content-Length", "0"))
+            self.rfile.read(length)
+            if mode == "json":
+                body = json.dumps(NON_STREAM_RESPONSE).encode("utf-8")
+                self.send_response(200)
+                self.send_header("Content-Type", "application/json")
+                self.send_header("Content-Length", str(len(body)))
+                self.send_header("anthropic-ratelimit-requests-limit", "4000")
+                self.send_header("anthropic-ratelimit-requests-remaining", "3999")
+                self.end_headers()
+                self.wfile.write(body)
+            else:
+                self.send_response(200)
+                self.send_header("Content-Type", "text/event-stream")
+                self.send_header("Cache-Control", "no-cache")
+                self.end_headers()
+                for event_name, data in STREAM_EVENTS:
+                    chunk = (
+                        f"event: {event_name}\n" f"data: {json.dumps(data)}\n\n"
+                    ).encode("utf-8")
+                    self.wfile.write(chunk)
+                    self.wfile.flush()
+
+    return Handler
+
+
+def _serve(port: int, mode: str) -> ThreadingHTTPServer:
+    srv = ThreadingHTTPServer((MOCK_HOST, port), _make_handler(mode))
+    threading.Thread(target=srv.serve_forever, daemon=True).start()
+    return srv
+
+
+# Headers that vary every run (timestamps, server build) and must be stripped
+# so the cassette is byte-stable across regenerations. Replay does not depend
+# on them.
+_NON_DETERMINISTIC_HEADERS = ("Date", "Server")
+
+
+def _strip_nondeterministic_headers(path: Path) -> None:
+    """Remove headers whose values change every run from the cassette."""
+    text = path.read_text()
+    for header in _NON_DETERMINISTIC_HEADERS:
+        # Matches a YAML block like::
+        #
+        #       Date:
+        #       - Thu, 30 Apr 2026 00:43:16 GMT
+        #
+        # under the response ``headers:`` mapping. Indentation is fixed by vcrpy.
+        pattern = re.compile(
+            rf"^      {re.escape(header)}:\n      - .*\n",
+            re.MULTILINE,
+        )
+        text = pattern.sub("", text)
+    path.write_text(text)
+
+
+def _rewrite_cassette_to_real_host(path: Path, mock_host_port: str) -> None:
+    """Replace mock host/port in the cassette with the real Anthropic host."""
+    text = path.read_text()
+    text = text.replace(f"http://{mock_host_port}", f"https://{REAL_ANTHROPIC_HOST}")
+    text = text.replace(mock_host_port, REAL_ANTHROPIC_HOST)
+    path.write_text(text)
+    _strip_nondeterministic_headers(path)
+
+
+def _consume(iterable: Iterable[Any]) -> None:
+    for _ in iterable:
+        pass
+
+
+def record_non_streaming() -> None:
+    cassette = CASSETTE_DIR / "anthropic_basic_completion.yaml"
+    if cassette.exists():
+        cassette.unlink()
+    server = _serve(NON_STREAM_PORT, "json")
+    try:
+        my_vcr = vcr.VCR(
+            record_mode="all",
+            filter_headers=["authorization", "x-api-key", "anthropic-version"],
+        )
+        with my_vcr.use_cassette(str(cassette)):
+            response = litellm.completion(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                messages=[{"role": "user", "content": "Hello!"}],
+                api_base=f"http://{MOCK_HOST}:{NON_STREAM_PORT}",
+                api_key="sk-ant-recording",
+            )
+            assert response.choices[0].message.content
+    finally:
+        server.shutdown()
+    _rewrite_cassette_to_real_host(cassette, f"{MOCK_HOST}:{NON_STREAM_PORT}")
+
+
+def record_streaming() -> None:
+    cassette = CASSETTE_DIR / "anthropic_streaming_completion.yaml"
+    if cassette.exists():
+        cassette.unlink()
+    server = _serve(STREAM_PORT, "stream")
+    try:
+        my_vcr = vcr.VCR(
+            record_mode="all",
+            filter_headers=["authorization", "x-api-key", "anthropic-version"],
+        )
+        with my_vcr.use_cassette(str(cassette)):
+            stream = litellm.completion(
+                model="anthropic/claude-sonnet-4-5-20250929",
+                messages=[{"role": "user", "content": "Hello!"}],
+                api_base=f"http://{MOCK_HOST}:{STREAM_PORT}",
+                api_key="sk-ant-recording",
+                stream=True,
+            )
+            _consume(stream)
+    finally:
+        server.shutdown()
+    _rewrite_cassette_to_real_host(cassette, f"{MOCK_HOST}:{STREAM_PORT}")
+
+
+def main() -> None:
+    os.environ.setdefault("LITELLM_LOG", "WARNING")
+    record_non_streaming()
+    record_streaming()
+    print(f"Wrote cassettes to {CASSETTE_DIR}")
+
+
+if __name__ == "__main__":
+    main()

diff --git a/tests/llm_translation/cassettes/anthropic_basic_completion.yaml b/tests/llm_translation/cassettes/anthropic_basic_completion.yaml
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/anthropic_basic_completion.yaml
@@ -1,0 +1,41 @@
+interactions:
+- request:
+    body: '{"model": "claude-sonnet-4-5-20250929", "messages": [{"role": "user", "content":
+      [{"type": "text", "text": "Hello!"}]}], "max_tokens": 64000}'
+    headers:
+      Accept-Encoding:
+      - gzip, deflate
+      Connection:
+      - keep-alive
+      Content-Length:
+      - '141'
+      Host:
+      - api.anthropic.com
+      User-Agent:
+      - litellm/1.84.0
+      accept:
+      - application/json
+      content-type:
+      - application/json
+    method: POST
+    uri: https://api.anthropic.com/v1/messages
+  response:
+    body:
+      string: '{"id": "msg_01ABCDEFGHIJKLMNOPQRSTUV", "type": "message", "role": "assistant",
+        "model": "claude-sonnet-4-5-20250929", "content": [{"type": "text", "text":
+        "Hello! How can I help you today?"}], "stop_reason": "end_turn", "stop_sequence":
+        null, "usage": {"input_tokens": 12, "cache_creation_input_tokens": 0, "cache_read_input_tokens":
+        0, "output_tokens": 11}}'
+    headers:
+      Content-Length:
+      - '358'
+      Content-Type:
+      - application/json
+      anthropic-ratelimit-requests-limit:
+      - '4000'
+      anthropic-ratelimit-requests-remaining:
+      - '3999'
+    status:
+      code: 200
+      message: OK
+version: 1

diff --git a/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml b/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml
@@ -1,0 +1,81 @@
+interactions:
+- request:
+    body: '{"model": "claude-sonnet-4-5-20250929", "messages": [{"role": "user", "content":
+      [{"type": "text", "text": "Hello!"}]}], "max_tokens": 64000, "stream": true}'
+    headers:
+      Accept-Encoding:
+      - gzip, deflate
+      Connection:
+      - keep-alive
+      Content-Length:
+      - '157'
+      Host:
+      - api.anthropic.com
+      User-Agent:
+      - litellm/1.84.0
+      accept:
+      - application/json
+      content-type:
+      - application/json
+    method: POST
+    uri: https://api.anthropic.com/v1/messages
+  response:
+    body:
+      string: 'event: message_start
+
+        data: {"type": "message_start", "message": {"id": "msg_01STREAMABCDEFGH",
+        "type": "message", "role": "assistant", "model": "claude-sonnet-4-5-20250929",
+        "content": [], "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens":
+        14, "output_tokens": 1}}}
+
+
+        event: content_block_start
+
+        data: {"type": "content_block_start", "index": 0, "content_block": {"type":
+        "text", "text": ""}}
+
+
+        event: content_block_delta
+
+        data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+        "text": "Hello"}}
+
+
+        event: content_block_delta
+
+        data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+        "text": " from"}}
+
+
+        event: content_block_delta
+
+        data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+        "text": " LiteLLM!"}}
+
+
+        event: content_block_stop
+
+        data: {"type": "content_block_stop", "index": 0}
+
+
+        event: message_delta
+
+        data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence":
+        null}, "usage": {"output_tokens": 5}}
+
+
+        event: message_stop
+
+        data: {"type": "message_stop"}
+
+
+        '
+    headers:
+      Cache-Control:
+      - no-cache
+      Content-Type:
+      - text/event-stream
+    status:
+      code: 200
+      message: OK
+version: 1

diff --git a/tests/llm_translation/test_anthropic_completion_vcr.py b/tests/llm_translation/test_anthropic_completion_vcr.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/test_anthropic_completion_vcr.py
@@ -1,0 +1,100 @@
+"""
+VCR-backed Anthropic completion tests.
+
+These tests exercise the same end-to-end ``litellm.completion`` code paths
+as ``test_anthropic_completion.py`` but replay HTTP traffic from cassettes
+under ``cassettes/`` instead of calling ``api.anthropic.com``. CI can run
+them with no API key and zero cost.
+
+To re-record after a deliberate change to request shape (or to refresh
+against the live API), set ``LITELLM_VCR_RECORD_MODE=once`` and provide a
+real ``ANTHROPIC_API_KEY``::
+
+    LITELLM_VCR_RECORD_MODE=once \\
+        ANTHROPIC_API_KEY=sk-ant-... \\
+        uv run pytest tests/llm_translation/test_anthropic_completion_vcr.py -v
+
+See ``tests/llm_translation/vcr_config.py`` and ``tests/llm_translation/cassettes/README.md``
+for the full workflow.
+"""
+
+import os
+import sys
+
+import pytest
+
+sys.path.insert(0, os.path.abspath("../.."))
+sys.path.insert(0, os.path.dirname(__file__))
+
+import litellm  # noqa: E402
+
+from vcr_config import litellm_vcr  # noqa: E402
+
+
+# A non-secret placeholder API key. We never want a real key written to a
+# cassette, and ``vcr_config`` filters Authorization / x-api-key headers
+# anyway. Using a deterministic placeholder also stops the SDK from raising
+# when ``ANTHROPIC_API_KEY`` is unset (the common CI case).
+PLACEHOLDER_ANTHROPIC_API_KEY = "sk-ant-vcr-placeholder"
+
+
+@pytest.fixture(autouse=True)
+def _placeholder_anthropic_key(monkeypatch):
+    """Provide a placeholder key when none is set so replay works offline.
+
+    If a real key is present in the environment (e.g. when re-recording),
+    we leave it untouched.
+    """
+    if not os.environ.get("ANTHROPIC_API_KEY"):
+        monkeypatch.setenv("ANTHROPIC_API_KEY", PLACEHOLDER_ANTHROPIC_API_KEY)
+
+
+@litellm_vcr.use_cassette("anthropic_basic_completion.yaml")
+def test_anthropic_basic_completion_replay():
+    """Smoke-test that a vanilla Anthropic completion replays from a cassette.
+
+    This is the canonical example for the cassette-based testing pattern:
+    no API key required at runtime, deterministic output, and the full
+    LiteLLM transformation pipeline (request shaping + response parsing)
+    runs against a real-shape Anthropic payload.
+    """
+    response = litellm.completion(
+        model="anthropic/claude-sonnet-4-5-20250929",
+        messages=[{"role": "user", "content": "Hello!"}],
+    )
+
+    assert response is not None
+    assert response.choices[0].message.content == ("Hello! How can I help you today?")
+    assert response.usage.prompt_tokens == 12
+    assert response.usage.completion_tokens == 11
+    # Anthropic sets stop_reason="end_turn" → litellm normalises to "stop"
+    assert response.choices[0].finish_reason == "stop"
+
+
+@litellm_vcr.use_cassette("anthropic_streaming_completion.yaml")
+def test_anthropic_streaming_completion_replay():
+    """Replay a streaming Anthropic completion from a cassette.
+
+    Exercises the SSE chunk parser and the public streaming surface. The
+    underlying cassette captures every ``content_block_delta`` event Anthropic
+    emits, so any regression in the streaming transformation will surface here.
+    """
+    stream = litellm.completion(
+        model="anthropic/claude-sonnet-4-5-20250929",
+        messages=[{"role": "user", "content": "Hello!"}],
+        stream=True,
+    )
+
+    collected_text = ""
+    finish_reason = None
+    for chunk in stream:
+        if not chunk.choices:
+            continue
+        delta = chunk.choices[0].delta
+        if delta and delta.content:
+            collected_text += delta.content
+        if chunk.choices[0].finish_reason:
+            finish_reason = chunk.choices[0].finish_reason
+
+    assert collected_text == "Hello from LiteLLM!"
+    assert finish_reason == "stop"

diff --git a/tests/llm_translation/vcr_config.py b/tests/llm_translation/vcr_config.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/vcr_config.py
@@ -1,0 +1,123 @@
+"""
+Shared VCR configuration for ``tests/llm_translation``.
+
+This module centralises the cassette setup used by tests that would otherwise
+hit a real LLM provider over the network. The goal is to let CI replay
+recorded HTTP traffic by default — no API keys required — and to provide a
+single switch for re-recording cassettes against the live provider.
+
+Usage in a test::
+
+    from .vcr_config import litellm_vcr  # noqa: E402
+
+    @litellm_vcr.use_cassette("anthropic_basic_completion.yaml")
+    def test_basic_completion():
+        resp = litellm.completion(
+            model="anthropic/claude-sonnet-4-5-20250929",
+            messages=[{"role": "user", "content": "Hello!"}],
+        )
+        assert resp.choices[0].message.content
+
+Recording mode
+--------------
+By default the cassette is replayed (``record_mode='none'``). To re-record::
+
+    LITELLM_VCR_RECORD_MODE=once \\
+        ANTHROPIC_API_KEY=sk-ant-... \\
+        uv run pytest tests/llm_translation/test_anthropic_completion_vcr.py
+
+Valid values for ``LITELLM_VCR_RECORD_MODE`` mirror vcrpy's record modes:
+``none`` (replay only — fail on missing cassette), ``once`` (record if the
+cassette doesn't exist), ``new_episodes`` (append new interactions), and
+``all`` (always re-record). See the vcrpy docs for details.
+
+Why this exists
+---------------
+Per the discussion that produced LIT-2683, our e2e tests repeatedly drained
+provider billing accounts and produced flaky CI on outages. Recording the
+HTTP exchange once and replaying it on subsequent runs gives us realistic
+provider responses (including streaming, headers, and edge-case payloads)
+without per-PR cost or rate-limit risk. Re-record periodically to catch
+real provider drift.
+"""
+
+from __future__ import annotations
+
+import os
+from pathlib import Path
+from typing import Any
+
+import vcr
+
+CASSETTE_DIR: Path = Path(__file__).parent / "cassettes"
+
+# Headers that must never be persisted to a cassette. These are matched
+# case-insensitively by vcrpy.
+_FILTERED_REQUEST_HEADERS = (
+    "authorization",
+    "x-api-key",
+    "anthropic-api-key",
+    "openai-api-key",
+    "azure-api-key",
+    "api-key",
+    "cookie",
+    "x-amz-security-token",
+    "x-amz-date",
+    "x-amz-content-sha256",
+    "amz-sdk-invocation-id",
+    "amz-sdk-request",
+)
+
+_FILTERED_RESPONSE_HEADERS = (
+    "set-cookie",
+    "x-request-id",
+    "cf-ray",
+    "anthropic-organization-id",
+    "openai-organization",
+    "request-id",
+)
+
+
+def _record_mode() -> str:
+    """Resolve the active vcrpy record mode from the environment.
+
+    Defaults to ``"none"`` so CI never accidentally hits the live provider.
+    """
+    mode = os.environ.get("LITELLM_VCR_RECORD_MODE", "none").strip().lower()
+    if mode not in {"none", "once", "new_episodes", "all"}:
+        raise ValueError(
+            f"LITELLM_VCR_RECORD_MODE={mode!r} is not a valid vcrpy record mode."
+        )
+    return mode
+
+
+def _build_vcr() -> vcr.VCR:
+    """Construct the shared ``VCR`` instance used by translation tests."""
+    return vcr.VCR(
+        cassette_library_dir=str(CASSETTE_DIR),
+        record_mode=_record_mode(),
+        # Match on method + URI + body so streaming vs non-streaming and
+        # different prompts get distinct cassettes.
+        match_on=("method", "scheme", "host", "port", "path", "query", "body"),
+        filter_headers=list(_FILTERED_REQUEST_HEADERS),
+        decode_compressed_response=True,
+    )
+
+
+def _scrub_response(response: Any) -> Any:
+    """Strip per-request response headers we don't want in the cassette."""
+    if not isinstance(response, dict):
+        return response
+    headers = response.get("headers") or {}
+    if isinstance(headers, dict):
+        for header in list(headers):
+            if header.lower() in _FILTERED_RESPONSE_HEADERS:
+                headers.pop(header, None)
+    return response
+
+
+litellm_vcr: vcr.VCR = _build_vcr()
+litellm_vcr.before_record_response = _scrub_response
+
+
+__all__ = ["litellm_vcr", "CASSETTE_DIR"]

diff --git a/uv.lock b/uv.lock
--- a/uv.lock
+++ b/uv.lock
@@ -9,7 +9,7 @@
 ]
 
 [options]
-exclude-newer = "0001-01-01T00:00:00Z" # This has no effect and is included for backwards compatibility when using relative exclude-newer values.
+exclude-newer = "2026-04-27T00:38:13.673780212Z"
 exclude-newer-span = "P3D"
 
 [manifest]
@@ -3242,6 +3242,7 @@
     { name = "types-redis" },
     { name = "types-requests" },
     { name = "types-setuptools" },
+    { name = "vcrpy" },
 ]
... diff truncated: showing 800 of 830 lines


Comment thread tests/llm_translation/cassettes/anthropic_basic_completion.yaml Outdated
Delete existing cassettes before recording (with vcrpy, record_mode='all'
appends rather than overwrites), and strip non-deterministic
response headers (Date, Server) so re-running the helper produces a
byte-stable diff.

Regenerate the committed cassettes with the fixed script so they match
what contributors get when following the README.
@mateo-berri
Collaborator Author

bugbot run


@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 72c9292. Configure here.

@mateo-berri
Collaborator Author

@greptileai

@greptile-apps
Contributor

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR introduces a Redis-backed vcrpy cache (CASSETTE_REDIS_URL) that replays recorded HTTP exchanges during CI, eliminating per-run provider costs and flakiness from upstream outages. The infrastructure is well-designed — outcome gate, episode cap, 2xx-only filter, and transient-error resilience are all present and thoroughly unit-tested with fakeredis.

One concrete issue worth addressing before enabling this in production CI: redis_key_for derives its key via os.path.relpath without an explicit start, so keys are anchored to the process's CWD at call time. Running pytest from any directory other than the repo root (e.g. cd tests/llm_translation && pytest) produces a different key prefix than CI, which silently breaks cache sharing.

Confidence Score: 4/5

Safe to merge with the CWD-sensitivity fix applied; all other findings are P2 or below

One P1 finding (redis_key_for CWD dependency can cause cache key mismatches between CI and local) caps the score at 4/5. The two P2 findings do not lower the score further.

tests/_vcr_redis_persister.py — redis_key_for key derivation and save_cassette guard ordering

Important Files Changed

Filename Overview
tests/_vcr_redis_persister.py New Redis-backed vcrpy persister with TTL, outcome gate, episode cap, and aiohttp body-rewind patch; one CWD-sensitive key-derivation path worth noting
tests/llm_translation/conftest.py Adds VCR auto-marker, persister registration, outcome-gate fixture, and verbose-mode logreport hook; respx/incompatible-test exclusion lists look correct
tests/llm_responses_api_testing/conftest.py Near-identical VCR plumbing to llm_translation/conftest.py; intentional duplication acknowledged in PR follow-ups; no per-file exclusion list (appropriate since no respx conflicts here)
tests/llm_translation/test_vcr_redis_persister.py 23 focused unit tests using fakeredis; covers roundtrip, TTL, cache miss, error resilience, outcome gate, and episode cap — thorough coverage
litellm/litellm_core_utils/llm_request_utils.py Correct null-safety fix: guards against proxy_server_request being None (not just absent) by using (... or {}).get("headers") chaining
tests/_flush_vcr_cache.py SCAN+pipeline batch-delete targeting only the litellm:vcr:cassette: prefix; correctly isolated from application Redis keys
tests/llm_translation/test_anthropic_completion.py Two new VCR replay demo tests (basic + streaming); will fall back to live API on cold cache, which is expected and intended

Reviews (3): Last reviewed commit: "Merge branch 'litellm_internal_staging' ..." | Re-trigger Greptile

Comment thread tests/llm_translation/vcr_config.py Outdated
Comment thread tests/llm_translation/cassettes/_record_anthropic_fixtures.py Outdated
cursoragent and others added 11 commits April 30, 2026 18:08
…ulk capture

Per Yuneng's feedback, use a single @pytest.mark.vcr marker so one record
sweep populates cassettes for every marked test across all providers,
instead of forcing each test to bind to a hard-coded cassette path.

Changes vs. the initial scaffolding:

- Add 'pytest-recording==0.13.4' on top of vcrpy. Adopt its layout:
  cassettes live at 'cassettes/<test_module>/<test_name>.yaml', resolved
  automatically. New tests just decorate with '@pytest.mark.vcr' — no
  imports or path bookkeeping.
- Move the shared filter/match config into a 'vcr_config' fixture in
  'tests/llm_translation/conftest.py' (consumed by pytest-recording for
  every marked test in the dir). Drop the standalone 'vcr_config.py'.
- Bulk record / replay via the standard '--record-mode' CLI flag:
  'make test-llm-translation-record' now sweeps every '@pytest.mark.vcr'
  test under tests/llm_translation in one shot. Optional 'TARGET=' var
  scopes to a single file.
- Move existing cassettes to the per-test paths and update the local
  in-process Anthropic regenerator to write to the same paths.
- Refresh README + Makefile target docs to match the sweep workflow.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
…reptile fixes

CI's license check fails on the new dev dep because liccheck cannot read
the PEP 639 'License-Expression' field that pytest-recording uses. Add
the package to the manually-verified allowlist (MIT, confirmed via PyPI
classifier).

Also addresses greptile P2 review comments:
- Add 'anthropic-version' to the request-header filter list so live and
  mock recordings produce structurally identical cassettes.
- Replace the indentation-sensitive regex in
  '_strip_nondeterministic_headers' with a YAML parse-and-rewrite so the
  helper keeps working if vcrpy ever changes its serialization style.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Stores VCR cassettes in Redis under litellm:vcr:cassette:<rel_path> with
a 24h expiry instead of YAML on disk. The TTL means each daily CI run
starts with an aged-out cache, naturally re-records against live providers,
and surfaces upstream API drift within a day without a manual `make`
re-record sweep. Opt-in via LITELLM_VCR_REDIS=1; default behaviour is
unchanged so local dev keeps the on-disk cassettes.

before_record_response now drops non-2xx responses so a transient 5xx or
429 from a provider can't poison the cache for the rest of the TTL window.
Vcr-marked tests bump litellm.num_retries to 3 during recording so
provider-SDK exponential backoff kicks in on the cache-miss path.

Tests cover the three surfaces we depend on in CI: serialize/deserialize
roundtrip via the real vcrpy serializer, TTL is actually applied to saved
keys, cache miss raises CassetteNotFoundError so vcrpy falls through to
record mode, and 2xx-only filtering across the status-code matrix
(2xx kept, 3xx/4xx/5xx dropped, with 429 and 503 explicitly pinned).
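A minimal sketch of that 2xx-only gate, assuming vcrpy's serialized response shape (a `status` dict carrying a `code` field) and the `filter_non_2xx_response` name from this commit; vcrpy's `before_record_response` contract is that returning `None` skips persisting the interaction:

```python
from typing import Optional


def filter_non_2xx_response(response: dict) -> Optional[dict]:
    # vcrpy contract: return None to skip persisting this interaction.
    status = (response.get("status") or {}).get("code")
    if status is None or not (200 <= status < 300):
        return None  # a transient 429/503 never reaches the cache
    return response


# Status-code matrix mirroring the tests described above: 2xx kept, rest dropped.
kept = filter_non_2xx_response({"status": {"code": 200}, "body": {"string": "{}"}})
dropped = [filter_non_2xx_response({"status": {"code": c}}) for c in (301, 404, 429, 503)]
```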
Removes the YAML cassette feature entirely and replaces it with a
Redis-only flow. Every test in tests/llm_translation/ and
tests/llm_responses_api_testing/ is auto-marked @pytest.mark.vcr via
conftest.pytest_collection_modifyitems, so any provider call lands in
the Redis cache (litellm:vcr:cassette:<rel_path>, 24h TTL). First run
records, runs within the day replay, day rollover re-records and
surfaces upstream API drift within 24h.

VCR is on by default. Set LITELLM_VCR_DISABLE=1, or simply leave
REDIS_HOST unset, to opt out — both bypass the auto-marker entirely so
nothing about cassettes runs. record_mode is "once" so cache-miss
records and cache-hit replays.

The 8 existing respx-using files in tests/llm_translation are excluded
from the auto-marker (vcrpy and respx both patch the httpx transport;
applying both makes one silently win). The persister's own unit-test
file is also excluded so it doesn't recursively run inside a cassette.

The persister moved from tests/llm_translation/_vcr_redis_persister.py
to tests/_vcr_redis_persister.py so both conftests share it. The two
demo tests in test_anthropic_completion_vcr.py were ported into
test_anthropic_completion.py and the demo file was deleted.

Adds tests/_flush_vcr_cache.py + a Make target
(test-llm-translation-flush-vcr-cache) that scans
litellm:vcr:cassette:* and pipelines DELETEs, for the
"I want the next CI run to re-record now" workflow. Drops the now-dead
test-llm-translation-record target.

Provider keys are still required on cache-miss (which happens on first
run and once a day after that). Replay-mode runs need only Redis.
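The flush script itself isn't shown in this excerpt; a sketch of the SCAN-plus-pipeline pattern it describes might look like the following (the `flush_vcr_cache` name and batch size are assumptions, and `client` is any redis-py-compatible client):

```python
PREFIX = "litellm:vcr:cassette:"


def flush_vcr_cache(client, batch_size: int = 500) -> int:
    # SCAN (not KEYS) walks a large keyspace incrementally, and only keys
    # under the cassette prefix are ever deleted, so the application Redis
    # keys stay untouched.
    deleted = 0
    pipe = client.pipeline()
    for key in client.scan_iter(match=PREFIX + "*", count=batch_size):
        pipe.delete(key)
        deleted += 1
        if deleted % batch_size == 0:
            pipe.execute()  # flush a full batch of DELETEs
    pipe.execute()  # flush the final partial batch
    return deleted
```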
Removes commentary that restated the code, including:

- module-level banners explaining what the conftest does (covered by
  Readme.md and the function bodies)
- docstrings on _scrub_response, _before_record_response, vcr_config,
  _vcr_disabled, pytest_recording_configure (function names + bodies
  are self-evident)
- inline notes about header filtering, match_on, etc.
- per-test docstrings restating the test name

Keeps the three non-obvious notes that aren't recoverable from the code:
the vcrpy/respx httpx-transport collision rationale on
_RESPX_CONFLICTING_FILES, the vcrpy "return None to skip persisting"
contract on filter_non_2xx_response, and the fixture-ordering
dependency on _vcr_record_retries.
Provider SDKs already retry transient 5xx/429 with exponential backoff
(default max_retries=2), and pytest.mark.flaky covers test-level
retries on top of that. Setting litellm.num_retries=3 here just
multiplied the existing layers — worst case 6 (flaky) x 3 (this) x
2 (CI rerunfailures) = 36 attempts on a single test.

Removing it keeps SDK-level network-blip protection intact and
shortens worst-case latency on cache-miss runs.
@mateo-berri mateo-berri force-pushed the litellm_vcr-cassette-llm-tests-af37 branch from e98ef5c to f6a37a6 Compare May 1, 2026 00:01
…tte poisoning

record_mode='once' refused to add new requests once any cassette
existed in Redis. Combined with filter_non_2xx_response (which drops
non-2xx responses from the saved cassette) and a 24h shared-Redis TTL,
a single transient API failure mid-test left the cassette stuck with
only the leading non-API requests (e.g. the model_prices fetch from
raw.githubusercontent.com), and every subsequent run for the next 24h
errored with 'Can't overwrite existing cassette'.

new_episodes records anything not already present, so partially
populated cassettes recover on the next run instead of poisoning the
suite for a full TTL window.
litellm's default LiteLLMAiohttpTransport routes requests through aiohttp,
which sits below httpx and is invisible to vcrpy's httpx-stub interception.
Under vcrpy + aiohttp, requests reach the real network but responses come
back through the stubbed httpx transport as empty 200s, surfacing as
'Unable to get json response - Expecting value: line 1 column 1 (char 0)'
in providers like Anthropic, Gemini, and any other path that exercises the
aiohttp transport.

Disabling the aiohttp transport when the VCR persister is registered
forces all calls through pure httpx, which vcrpy can record and replay
correctly.
Azure OpenAI's responses-API DELETE endpoint rejects requests that carry
a JSON body with: "Unexpected body with size 2. This API method does
not accept a request body.". The default LiteLLMAiohttpTransport silently
elides empty-dict bodies on DELETE so this was masked, but the pure-httpx
transport (used when DISABLE_AIOHTTP_TRANSPORT=True or under vcrpy/respx
patching) sends literal '{}' (2 bytes), which Azure rejects.

Only attach json= when the provider's transform actually returned a
non-empty dict; otherwise issue a bodyless DELETE.
…itellm_vcr-cassette-llm-tests-af37

# Conflicts:
#	litellm/llms/custom_httpx/llm_http_handler.py
The Anthropic replay tests hardcoded specific token counts and content
strings ('Hello! How can I help you today?', prompt_tokens == 12). On a
fresh CI Redis those values must match a pre-recorded cassette that
doesn't exist, so the first run hits the live API and gets different
real bytes back.

Assert on shape instead: non-empty content, positive token counts,
finish_reason in the known set, and (for streaming) more than one chunk.
The tests still exercise the full transformation pipeline end-to-end and
catch shape regressions; drift in the exact text/token counts is
expected and now tolerated.
…transport

vcrpy's aiohttp stub captures response bodies via 'await response.read()',
which drains aiohttp's StreamReader. Downstream consumers of the same
ClientResponse (litellm's AiohttpResponseStream, which iterates
response.content.iter_chunked) then see an empty body and surface as
JSON 'Expecting value: line 1 column 1 (char 0)' errors on every
record-path call.

The previous workaround set litellm.disable_aiohttp_transport=True for
the whole VCR-active session, which made the tests exercise pure httpx
instead of the production aiohttp transport. That hid the production
transport from coverage and surfaced its own bugs (e.g. the Azure
DELETE-with-empty-body case fixed in upstream staging).

Replace the workaround with a targeted monkey-patch that re-feeds the
captured body into the StreamReader via unread_data after vcrpy records
it. Tests now run through the same transport customers do, both on
first record and on replay, for both unary and streaming endpoints.

Verified locally against api.anthropic.com with the production
LiteLLMAiohttpTransport: record path passes (real network, 4.2s),
replay path passes (Redis cache, 1.8s).
Stop falling back to REDIS_URL/REDIS_SSL_URL/REDIS_HOST for the VCR
persister. Sharing a Redis with the application cache risks cassettes
being wiped by tests that flush the app Redis.
@mateo-berri mateo-berri changed the title tests(llm_translation): add VCR cassette infrastructure for offline replay tests(vcr): redis-backed vcrpy cache for offline LLM e2e replay May 1, 2026
Managed Redis (e.g. Upstash) drops idle TLS connections, which surfaced
in CI as a teardown ERROR on test_gemini_image_size_limit_exceeded:

  redis.exceptions.ConnectionError: EOF occurred in violation of
  protocol (_ssl.c:2427)

Cassette persistence is a cache, not test correctness, so:

- Configure the redis client with Retry(ExponentialBackoff, retries=2)
  on ConnectionError/TimeoutError to absorb single-socket drops.
- Wrap save_cassette so a final failure logs a warning instead of
  failing teardown — the next run re-records.
- Wrap load_cassette so an outage on read becomes a cache miss
  (CassetteNotFoundError) instead of erroring in setup.
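In sketch form (exception types and helper names here are assumptions; redis-py raises its own `ConnectionError`/`TimeoutError` subclasses, and the real persister raises vcrpy's cassette-not-found error on a read outage):

```python
import warnings


class CassetteNotFoundError(KeyError):
    """Stand-in for vcrpy's cache-miss signal; vcrpy treats it as 'fall through to record'."""


def safe_save(save_fn, key, cassette) -> bool:
    # The cache is best-effort: a failed save warns instead of failing teardown.
    try:
        save_fn(key, cassette)
        return True
    except (ConnectionError, TimeoutError) as exc:
        warnings.warn(f"VCR cassette save failed for {key!r}: {exc}; next run re-records")
        return False


def safe_load(load_fn, key):
    # An outage on read is reported as a cache miss, not a setup error.
    try:
        return load_fn(key)
    except (ConnectionError, TimeoutError) as exc:
        raise CassetteNotFoundError(key) from exc
```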
Set LITELLM_VCR_VERBOSE=1 to print a one-line cassette verdict per
test (HIT / MISS / PARTIAL / NOOP) showing replay vs new-recording
counts. Useful for local QA to confirm which tests actually exercised
the cache and which fell through to the live provider.
A failing test (including every failing retry before an eventual pass)
can otherwise overwrite a known-good cassette with a 'bad luck'
recording. Tests like test_prompt_caching, which assert on provider
state across two calls, can produce a 200 response that semantically
fails the assertion — the 2xx filter doesn't catch this because the
HTTP layer is fine.

- pytest_runtest_makereport hook attaches each phase report to the
  pytest item.
- _vcr_outcome_gate fixture (combined with the verbose-mode reporter)
  reads the call-phase outcome at teardown and informs the persister
  via mark_test_outcome_for_cassette before vcrpy's Cassette.__exit__
  triggers save_cassette.
- save_cassette consults the per-key 'did the test pass?' flag and
  short-circuits when False, leaving any prior good recording intact.
- Defaults to passed=True when no marker is present so non-test
  usage of the persister still works.
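The gate's core logic reduces to a per-key flag consulted at save time; a sketch under the names this commit uses, with an in-memory `store` standing in for Redis:

```python
_test_outcomes = {}


def mark_test_outcome_for_cassette(key, passed):
    _test_outcomes[key] = passed


def save_cassette_gated(key, cassette, store):
    # Default to passed=True when no outcome was recorded, so non-test
    # usage of the persister still works.
    if not _test_outcomes.get(key, True):
        return False  # leave any prior good recording intact
    store[key] = cassette
    return True
```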
Some tests can't benefit from cassette replay because they assert on
state that only exists in the live provider between two calls (e.g.
prompt-cache propagation, intermittent provider quirks). Marking them
with @pytest.mark.vcr just wastes cycles trying to record cassettes
they will never replay against successfully.

Opt-out by nodeid suffix so subclassed/parametrized variants are
covered:

- ::test_prompt_caching — Anthropic/Bedrock prompt-cache propagation
  isn't deterministic in the 0–1s window the test gives it.
- ::test_async_pdf_handling_with_file_id — flaky upstream Wikipedia
  fetch through the Anthropic Files API.
- TestBedrockInvokeNovaJson::test_json_response_pydantic_obj —
  Bedrock Nova returns tool_call vs JSON nondeterministically (other
  providers' subclasses are healthy).
- ::test_bedrock_converse__streaming_passthrough — Bedrock streaming
  response_cost calc returns None intermittently.

These tests keep their existing @pytest.mark.flaky retry behavior.
A test that produces non-deterministic request bodies (e.g. uuid in
the prompt) under record_mode=new_episodes never replays — every CI
run appends fresh unmatched episodes. The cassette grows unbounded
over time and silently inflates Redis (we observed one cassette at
22 episodes / ~860KB after ~5 CI runs).

Refuse the save when episode count exceeds MAX_EPISODES_PER_CASSETTE
so the pathology surfaces with a loud warning that points to the
opt-out fix instead of festering invisibly.
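As a sketch (the cap value and the `interactions` key are assumptions; vcrpy's YAML cassettes store recorded exchanges under an `interactions` list):

```python
import warnings

MAX_EPISODES_PER_CASSETTE = 20  # assumed threshold, not taken from the diff


def within_episode_cap(cassette_dict: dict) -> bool:
    # Refuse the save once the episode count exceeds the cap, loudly,
    # instead of letting the cassette grow unbounded in Redis.
    episodes = len(cassette_dict.get("interactions", []))
    if episodes > MAX_EPISODES_PER_CASSETTE:
        warnings.warn(
            f"cassette holds {episodes} episodes (cap {MAX_EPISODES_PER_CASSETTE}); "
            "a non-deterministic request body never replays, so opt the test out of VCR"
        )
        return False
    return True
```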
Anthropic's URL fetcher intermittently returns 400 'Unable to download
the file' for the Wikipedia URL the test was using. Point it at the
repo's existing tests/llm_translation/fixtures/dummy.pdf via raw
GitHub instead — small, deterministic, reliably fetchable.

With a stable URL the test no longer needs to be opted out of VCR;
remove it from the incompatible list so it can replay from cassette.
Previously, the per-test [VCR HIT/MISS/...] line was written via
TerminalReporter.write_line from inside fixture teardown. Pytest
captures that stream by default and only surfaces it on FAILED tests
(under 'Captured stdout teardown'), so passing tests' verdicts were
invisible in CI logs and the user couldn't tell whether the cache
was working.

Write directly to sys.__stderr__ so the line bypasses pytest's
capture entirely. Under xdist each worker has its own __stderr__
which CircleCI aggregates into the live job log alongside the
PASSED/FAILED markers.
… MIME

Raw GitHub serves application/octet-stream, which OpenAI/Gemini reject
when LiteLLM fetches the URL client-side. jsDelivr serves the same
file with content-type: application/pdf. Pin to a commit SHA so the
asset is immutable and jsDelivr can cache it for a year.
@codecov

codecov Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…orter

Previous attempt wrote to sys.__stderr__ from the test fixture. Under
xdist, fixtures run inside worker subprocesses whose stderr is captured
by the controller and only released to the live log on test failure —
so passing tests' verdicts were silently swallowed.

Round-trip via report.user_properties: the worker-side fixture stashes
the verdict on user_properties, xdist serializes it onto the report,
and a controller-side pytest_runtest_logreport hook writes it via the
TerminalReporter (the same plugin that emits PASSED/FAILED markers).
TerminalReporter is resolved lazily on first hook call because it's
not yet registered when conftest's pytest_configure runs.

Verified locally in both serial and xdist modes.
Remove explanatory comments that restated what the code already says.
Kept only those that document non-obvious external contracts (the aiohttp
record-path patch's reason for re-feeding the body, and the warning
messages inside save_cassette that reach the user).
@mateo-berri
Collaborator Author

bugbot run

@mateo-berri
Collaborator Author

@greptileai


@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 0543953. Configure here.

@mateo-berri mateo-berri marked this pull request as ready for review May 1, 2026 22:10
@mateo-berri mateo-berri requested a review from yuneng-berri May 1, 2026 22:10
Comment on lines +24 to +29
def redis_key_for(cassette_path: str) -> str:
rel = os.path.relpath(str(cassette_path))
if rel.endswith(".yaml"):
rel = rel[: -len(".yaml")]
rel = rel.replace("/cassettes/", "/").lstrip("./")
return f"{REDIS_KEY_PREFIX}{rel}"
Contributor

P1 redis_key_for calls os.path.relpath without an explicit start argument, so the result depends on the process's current working directory at call time. If pytest is invoked from tests/llm_translation/ instead of the repo root, the relative path produced omits the tests/llm_translation/ prefix, yielding a key like litellm:vcr:cassette:test_anthropic/test_foo rather than litellm:vcr:cassette:tests/llm_translation/test_anthropic/test_foo. CI-recorded cassettes and locally-run replays would then map to different Redis keys and never share the cache. Anchoring to a fixed repo-root directory eliminates this.

Suggested change
def redis_key_for(cassette_path: str) -> str:
rel = os.path.relpath(str(cassette_path))
if rel.endswith(".yaml"):
rel = rel[: -len(".yaml")]
rel = rel.replace("/cassettes/", "/").lstrip("./")
return f"{REDIS_KEY_PREFIX}{rel}"
def redis_key_for(cassette_path: str) -> str:
_repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
rel = os.path.relpath(str(cassette_path), start=_repo_root)
if rel.endswith(".yaml"):
rel = rel[: -len(".yaml")]
rel = rel.replace("/cassettes/", "/").lstrip("./")
return f"{REDIS_KEY_PREFIX}{rel}"

Collaborator Author

I mean, technically true. But this is always run from CI/CD so it doesn't really matter

Contributor

Fair point — if CI always runs from the repo root and local dev is expected to do the same, the CWD sensitivity is a non-issue in practice.

Tip: You can give me more context with .greptile/rules.md so I get this right next time.

@yuneng-berri yuneng-berri enabled auto-merge May 1, 2026 22:16
@yuneng-berri yuneng-berri merged commit d07cdd4 into litellm_internal_staging May 1, 2026
118 checks passed
@yuneng-berri yuneng-berri deleted the litellm_vcr-cassette-llm-tests-af37 branch May 1, 2026 22:16