tests(vcr): redis-backed vcrpy cache for offline LLM e2e replay #26838
yuneng-berri merged 32 commits into litellm_internal_staging from
Conversation
…eplay

Live LLM e2e tests have been draining provider billing accounts and going flaky on outages (LIT-2683). This change introduces vcrpy-backed cassette replay so CI can exercise the same end-to-end LiteLLM transformation paths without hitting the live provider:

- Add 'vcrpy==8.1.1' to the dev dependency group.
- New 'tests/llm_translation/vcr_config.py' centralises the VCR config: it filters auth/secret headers and per-request response headers, matches on method+URI+body, and exposes 'LITELLM_VCR_RECORD_MODE' for re-recording.
- New 'tests/llm_translation/test_anthropic_completion_vcr.py' demonstrates the pattern with one non-streaming and one streaming Anthropic test that replay from cassettes shipped under 'cassettes/'.
- New 'tests/llm_translation/cassettes/_record_anthropic_fixtures.py' lets contributors regenerate the canned Anthropic cassettes against a local in-process mock (no API key required), and 'cassettes/README.md' documents the full record/replay/refresh workflow.
- New 'make test-llm-translation-record FILE=...' Makefile target to refresh cassettes against the live API.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
bugbot run
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Committed cassette diverges from fixture generator output
- Updated _record_anthropic_fixtures.py to delete cassettes before recording (so vcrpy's record_mode="all" doesn't append to stale content) and strip non-deterministic Date/Server headers, then regenerated the committed cassettes so they match the script's output byte-for-byte.
Preview (72c92920b2)
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -185,3 +185,18 @@
$(UV_RUN) pytest tests/llm_translation/$(FILE) \
--junitxml=test-results/junit.xml \
-v --tb=short --maxfail=100 --timeout=300
+
+# VCR cassette helpers --------------------------------------------------------
+# Re-record a single VCR-backed translation test file against the live API.
+# Provider credentials must be exported (e.g. ANTHROPIC_API_KEY).
+#
+# Example:
+# ANTHROPIC_API_KEY=sk-ant-... make test-llm-translation-record \
+# FILE=test_anthropic_completion_vcr.py
+test-llm-translation-record: install-test-deps
+ @if [ -z "$(FILE)" ]; then \
+ echo "Usage: make test-llm-translation-record FILE=test_filename.py"; \
+ exit 1; \
+ fi
+ LITELLM_VCR_RECORD_MODE=once \
+ $(UV_RUN) pytest tests/llm_translation/$(FILE) -v --tb=short
diff --git a/pyproject.toml b/pyproject.toml
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -149,6 +149,7 @@
"parameterized==0.9.0",
"openapi-core==0.22.0; python_version < '3.14'",
"pytest-timeout==2.4.0",
+ "vcrpy==8.1.1",
]
proxy-dev = [
"prisma==0.11.0",
diff --git a/tests/llm_translation/Readme.md b/tests/llm_translation/Readme.md
--- a/tests/llm_translation/Readme.md
+++ b/tests/llm_translation/Readme.md
@@ -1,3 +1,20 @@
-Unit tests for individual LLM providers.
+Unit tests for individual LLM providers.
\ No newline at end of file
-Name of the test file is the name of the LLM provider - e.g. `test_openai.py` is for OpenAI.
+Name of the test file is the name of the LLM provider - e.g. `test_openai.py` is for OpenAI.
+
+## VCR-backed tests
+
+Files matching `*_vcr.py` (e.g. `test_anthropic_completion_vcr.py`) replay
+recorded HTTP traffic from `cassettes/` instead of calling the real provider.
+They run offline by default — no API keys required, no per-PR cost.
+
+To re-record against the live API:
+
+```bash
+ANTHROPIC_API_KEY=sk-ant-... \
+ make test-llm-translation-record FILE=test_anthropic_completion_vcr.py
+```
+
+See [`cassettes/README.md`](./cassettes/README.md) for the full workflow,
+including how to add a new cassette-backed test and what to scrub from
+recordings before committing.
diff --git a/tests/llm_translation/cassettes/README.md b/tests/llm_translation/cassettes/README.md
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/README.md
@@ -1,0 +1,80 @@
+# VCR cassettes for LLM translation tests
+
+This directory holds [vcrpy](https://vcrpy.readthedocs.io/) cassettes used by
+`tests/llm_translation/` to replay real provider HTTP traffic without hitting
+the live API.
+
+Why this exists is tracked in
+[LIT-2683](https://linear.app/litellm-ai/issue/LIT-2683) and discussed in
+`#sdlc` on Slack: e2e tests were repeatedly draining provider billing accounts
+and producing flaky CI on outages. Recording the HTTP exchange once and
+replaying it on subsequent runs gives us realistic provider responses
+(streaming, headers, edge-case payloads) at zero per-PR cost.
+
+## How to add a new cassette-backed test
+
+1. Pick a small, deterministic call. Avoid prompts whose output depends on
+ wall-clock time, randomness, or live web data.
+2. Add a test in a `*_vcr.py` file under `tests/llm_translation/`. Wrap it
+ with `@litellm_vcr.use_cassette("<some_name>.yaml")` from
+ `tests/llm_translation/vcr_config.py`.
+3. Record the cassette once:
+
+ ```bash
+ LITELLM_VCR_RECORD_MODE=once \
+ ANTHROPIC_API_KEY=sk-ant-... \
+ uv run pytest tests/llm_translation/test_my_provider_vcr.py::test_my_case -v
+ ```
+
+ or, equivalently:
+
+ ```bash
+ ANTHROPIC_API_KEY=sk-ant-... \
+ make test-llm-translation-record FILE=test_my_provider_vcr.py
+ ```
+
+4. Inspect the resulting YAML file:
+ - **Strip any secrets** that survived `vcr_config.py`'s header filter.
+ `vcr_config.py` already removes the common ones (`Authorization`,
+ `x-api-key`, `cookie`, AWS sigv4 headers, etc.) — but a request *body*
+ might contain a token if your test passed one inline.
+ - Trim very large response bodies if they aren't load-bearing for the
+ assertion.
+5. Commit the cassette alongside the test.
+
+## Re-recording
+
+Run the same `make test-llm-translation-record` command. vcrpy's `once` mode
+will *not* overwrite an existing cassette — delete the file first if you're
+intentionally refreshing it:
+
+```bash
+rm tests/llm_translation/cassettes/anthropic_basic_completion.yaml
+ANTHROPIC_API_KEY=sk-ant-... make test-llm-translation-record \
+ FILE=test_anthropic_completion_vcr.py
+```
+
+## Refreshing the canned Anthropic fixtures
+
+The two Anthropic cassettes in this directory
+(`anthropic_basic_completion.yaml` and `anthropic_streaming_completion.yaml`)
+are recorded against an in-process mock so contributors can regenerate them
+without an `ANTHROPIC_API_KEY`:
+
+```bash
+uv run python tests/llm_translation/cassettes/_record_anthropic_fixtures.py
+```
+
+For a full refresh against the real API, delete the cassettes first and use
+the `LITELLM_VCR_RECORD_MODE=once` path with a real key.
+
+## Don't
+
+- Don't commit cassettes containing real API keys, OAuth tokens, or PII.
+ When in doubt, `grep -i 'sk-\|bearer\|api-key' cassettes/*.yaml` after
+ recording.
+- Don't rely on cassettes for tests of *non-deterministic* behavior
+ (rate-limit retries, timeouts, the model itself making a creative choice).
+ Mock those at the LiteLLM layer instead.
+- Don't record both real and mock host names into the same cassette without
+ rewriting the URL — vcrpy matches on host/port by default.
diff --git a/tests/llm_translation/cassettes/_record_anthropic_fixtures.py b/tests/llm_translation/cassettes/_record_anthropic_fixtures.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/_record_anthropic_fixtures.py
@@ -1,0 +1,258 @@
+"""Helper script that records Anthropic-shaped cassettes against a local mock.
+
+This is a *one-shot* utility, not a test. It exists so we can deterministically
+regenerate the canned Anthropic cassettes shipped under
+``tests/llm_translation/cassettes/`` without spending real provider credits and
+without needing an ``ANTHROPIC_API_KEY``.
+
+Run it with::
+
+ uv run python tests/llm_translation/cassettes/_record_anthropic_fixtures.py
+
+The script:
+
+1. Spins up a tiny in-process HTTP server that returns canned Anthropic
+ ``/v1/messages`` payloads (one non-streaming, one SSE streaming).
+2. Records LiteLLM's real outbound HTTP through vcrpy.
+3. Rewrites the cassette URL/Host so replay matches genuine
+ ``https://api.anthropic.com/v1/messages`` traffic.
+
+If you want to refresh against the *real* Anthropic API instead, use the
+``LITELLM_VCR_RECORD_MODE=once`` workflow described in
+``tests/llm_translation/vcr_config.py`` — that path needs a real API key.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import re
+import sys
+import threading
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+from pathlib import Path
+from typing import Any, Iterable
+
+import vcr # type: ignore[import-not-found]
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+sys.path.insert(0, str(REPO_ROOT))
+
+import litellm # noqa: E402
+
+CASSETTE_DIR = Path(__file__).parent
+MOCK_HOST = "127.0.0.1"
+NON_STREAM_PORT = 18765
+STREAM_PORT = 18766
+REAL_ANTHROPIC_HOST = "api.anthropic.com"
+
+NON_STREAM_RESPONSE: dict[str, Any] = {
+ "id": "msg_01ABCDEFGHIJKLMNOPQRSTUV",
+ "type": "message",
+ "role": "assistant",
+ "model": "claude-sonnet-4-5-20250929",
+ "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
+ "stop_reason": "end_turn",
+ "stop_sequence": None,
+ "usage": {
+ "input_tokens": 12,
+ "cache_creation_input_tokens": 0,
+ "cache_read_input_tokens": 0,
+ "output_tokens": 11,
+ },
+}
+
+STREAM_EVENTS: list[tuple[str, dict[str, Any]]] = [
+ (
+ "message_start",
+ {
+ "type": "message_start",
+ "message": {
+ "id": "msg_01STREAMABCDEFGH",
+ "type": "message",
+ "role": "assistant",
+ "model": "claude-sonnet-4-5-20250929",
+ "content": [],
+ "stop_reason": None,
+ "stop_sequence": None,
+ "usage": {"input_tokens": 14, "output_tokens": 1},
+ },
+ },
+ ),
+ (
+ "content_block_start",
+ {
+ "type": "content_block_start",
+ "index": 0,
+ "content_block": {"type": "text", "text": ""},
+ },
+ ),
+ (
+ "content_block_delta",
+ {
+ "type": "content_block_delta",
+ "index": 0,
+ "delta": {"type": "text_delta", "text": "Hello"},
+ },
+ ),
+ (
+ "content_block_delta",
+ {
+ "type": "content_block_delta",
+ "index": 0,
+ "delta": {"type": "text_delta", "text": " from"},
+ },
+ ),
+ (
+ "content_block_delta",
+ {
+ "type": "content_block_delta",
+ "index": 0,
+ "delta": {"type": "text_delta", "text": " LiteLLM!"},
+ },
+ ),
+ ("content_block_stop", {"type": "content_block_stop", "index": 0}),
+ (
+ "message_delta",
+ {
+ "type": "message_delta",
+ "delta": {"stop_reason": "end_turn", "stop_sequence": None},
+ "usage": {"output_tokens": 5},
+ },
+ ),
+ ("message_stop", {"type": "message_stop"}),
+]
+
+
+def _make_handler(mode: str) -> type[BaseHTTPRequestHandler]:
+ class Handler(BaseHTTPRequestHandler):
+ def log_message(self, *args: Any, **kwargs: Any) -> None: # silence
+ return
+
+ def do_POST(self) -> None: # noqa: N802
+ length = int(self.headers.get("Content-Length", "0"))
+ self.rfile.read(length)
+ if mode == "json":
+ body = json.dumps(NON_STREAM_RESPONSE).encode("utf-8")
+ self.send_response(200)
+ self.send_header("Content-Type", "application/json")
+ self.send_header("Content-Length", str(len(body)))
+ self.send_header("anthropic-ratelimit-requests-limit", "4000")
+ self.send_header("anthropic-ratelimit-requests-remaining", "3999")
+ self.end_headers()
+ self.wfile.write(body)
+ else:
+ self.send_response(200)
+ self.send_header("Content-Type", "text/event-stream")
+ self.send_header("Cache-Control", "no-cache")
+ self.end_headers()
+ for event_name, data in STREAM_EVENTS:
+ chunk = (
+ f"event: {event_name}\n" f"data: {json.dumps(data)}\n\n"
+ ).encode("utf-8")
+ self.wfile.write(chunk)
+ self.wfile.flush()
+
+ return Handler
+
+
+def _serve(port: int, mode: str) -> ThreadingHTTPServer:
+ srv = ThreadingHTTPServer((MOCK_HOST, port), _make_handler(mode))
+ threading.Thread(target=srv.serve_forever, daemon=True).start()
+ return srv
+
+
+# Headers that vary every run (timestamps, server build) and must be stripped
+# so the cassette is byte-stable across regenerations. Replay does not depend
+# on them.
+_NON_DETERMINISTIC_HEADERS = ("Date", "Server")
+
+
+def _strip_nondeterministic_headers(path: Path) -> None:
+ """Remove headers whose values change every run from the cassette."""
+ text = path.read_text()
+ for header in _NON_DETERMINISTIC_HEADERS:
+ # Matches a YAML block like::
+ #
+ # Date:
+ # - Thu, 30 Apr 2026 00:43:16 GMT
+ #
+ # under the response ``headers:`` mapping. Indentation is fixed by vcrpy.
+ pattern = re.compile(
+ rf"^ {re.escape(header)}:\n - .*\n",
+ re.MULTILINE,
+ )
+ text = pattern.sub("", text)
+ path.write_text(text)
+
+
+def _rewrite_cassette_to_real_host(path: Path, mock_host_port: str) -> None:
+ """Replace mock host/port in the cassette with the real Anthropic host."""
+ text = path.read_text()
+ text = text.replace(f"http://{mock_host_port}", f"https://{REAL_ANTHROPIC_HOST}")
+ text = text.replace(mock_host_port, REAL_ANTHROPIC_HOST)
+ path.write_text(text)
+ _strip_nondeterministic_headers(path)
+
+
+def _consume(iterable: Iterable[Any]) -> None:
+ for _ in iterable:
+ pass
+
+
+def record_non_streaming() -> None:
+ cassette = CASSETTE_DIR / "anthropic_basic_completion.yaml"
+ if cassette.exists():
+ cassette.unlink()
+ server = _serve(NON_STREAM_PORT, "json")
+ try:
+ my_vcr = vcr.VCR(
+ record_mode="all",
+ filter_headers=["authorization", "x-api-key", "anthropic-version"],
+ )
+ with my_vcr.use_cassette(str(cassette)):
+ response = litellm.completion(
+ model="anthropic/claude-sonnet-4-5-20250929",
+ messages=[{"role": "user", "content": "Hello!"}],
+ api_base=f"http://{MOCK_HOST}:{NON_STREAM_PORT}",
+ api_key="sk-ant-recording",
+ )
+ assert response.choices[0].message.content
+ finally:
+ server.shutdown()
+ _rewrite_cassette_to_real_host(cassette, f"{MOCK_HOST}:{NON_STREAM_PORT}")
+
+
+def record_streaming() -> None:
+ cassette = CASSETTE_DIR / "anthropic_streaming_completion.yaml"
+ if cassette.exists():
+ cassette.unlink()
+ server = _serve(STREAM_PORT, "stream")
+ try:
+ my_vcr = vcr.VCR(
+ record_mode="all",
+ filter_headers=["authorization", "x-api-key", "anthropic-version"],
+ )
+ with my_vcr.use_cassette(str(cassette)):
+ stream = litellm.completion(
+ model="anthropic/claude-sonnet-4-5-20250929",
+ messages=[{"role": "user", "content": "Hello!"}],
+ api_base=f"http://{MOCK_HOST}:{STREAM_PORT}",
+ api_key="sk-ant-recording",
+ stream=True,
+ )
+ _consume(stream)
+ finally:
+ server.shutdown()
+ _rewrite_cassette_to_real_host(cassette, f"{MOCK_HOST}:{STREAM_PORT}")
+
+
+def main() -> None:
+ os.environ.setdefault("LITELLM_LOG", "WARNING")
+ record_non_streaming()
+ record_streaming()
+ print(f"Wrote cassettes to {CASSETTE_DIR}")
+
+
+if __name__ == "__main__":
+ main()
diff --git a/tests/llm_translation/cassettes/anthropic_basic_completion.yaml b/tests/llm_translation/cassettes/anthropic_basic_completion.yaml
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/anthropic_basic_completion.yaml
@@ -1,0 +1,41 @@
+interactions:
+- request:
+ body: '{"model": "claude-sonnet-4-5-20250929", "messages": [{"role": "user", "content":
+ [{"type": "text", "text": "Hello!"}]}], "max_tokens": 64000}'
+ headers:
+ Accept-Encoding:
+ - gzip, deflate
+ Connection:
+ - keep-alive
+ Content-Length:
+ - '141'
+ Host:
+ - api.anthropic.com
+ User-Agent:
+ - litellm/1.84.0
+ accept:
+ - application/json
+ content-type:
+ - application/json
+ method: POST
+ uri: https://api.anthropic.com/v1/messages
+ response:
+ body:
+ string: '{"id": "msg_01ABCDEFGHIJKLMNOPQRSTUV", "type": "message", "role": "assistant",
+ "model": "claude-sonnet-4-5-20250929", "content": [{"type": "text", "text":
+ "Hello! How can I help you today?"}], "stop_reason": "end_turn", "stop_sequence":
+ null, "usage": {"input_tokens": 12, "cache_creation_input_tokens": 0, "cache_read_input_tokens":
+ 0, "output_tokens": 11}}'
+ headers:
+ Content-Length:
+ - '358'
+ Content-Type:
+ - application/json
+ anthropic-ratelimit-requests-limit:
+ - '4000'
+ anthropic-ratelimit-requests-remaining:
+ - '3999'
+ status:
+ code: 200
+ message: OK
+version: 1
diff --git a/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml b/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/cassettes/anthropic_streaming_completion.yaml
@@ -1,0 +1,81 @@
+interactions:
+- request:
+ body: '{"model": "claude-sonnet-4-5-20250929", "messages": [{"role": "user", "content":
+ [{"type": "text", "text": "Hello!"}]}], "max_tokens": 64000, "stream": true}'
+ headers:
+ Accept-Encoding:
+ - gzip, deflate
+ Connection:
+ - keep-alive
+ Content-Length:
+ - '157'
+ Host:
+ - api.anthropic.com
+ User-Agent:
+ - litellm/1.84.0
+ accept:
+ - application/json
+ content-type:
+ - application/json
+ method: POST
+ uri: https://api.anthropic.com/v1/messages
+ response:
+ body:
+ string: 'event: message_start
+
+ data: {"type": "message_start", "message": {"id": "msg_01STREAMABCDEFGH",
+ "type": "message", "role": "assistant", "model": "claude-sonnet-4-5-20250929",
+ "content": [], "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens":
+ 14, "output_tokens": 1}}}
+
+
+ event: content_block_start
+
+ data: {"type": "content_block_start", "index": 0, "content_block": {"type":
+ "text", "text": ""}}
+
+
+ event: content_block_delta
+
+ data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+ "text": "Hello"}}
+
+
+ event: content_block_delta
+
+ data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+ "text": " from"}}
+
+
+ event: content_block_delta
+
+ data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta",
+ "text": " LiteLLM!"}}
+
+
+ event: content_block_stop
+
+ data: {"type": "content_block_stop", "index": 0}
+
+
+ event: message_delta
+
+ data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence":
+ null}, "usage": {"output_tokens": 5}}
+
+
+ event: message_stop
+
+ data: {"type": "message_stop"}
+
+
+ '
+ headers:
+ Cache-Control:
+ - no-cache
+ Content-Type:
+ - text/event-stream
+ status:
+ code: 200
+ message: OK
+version: 1
diff --git a/tests/llm_translation/test_anthropic_completion_vcr.py b/tests/llm_translation/test_anthropic_completion_vcr.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/test_anthropic_completion_vcr.py
@@ -1,0 +1,100 @@
+"""
+VCR-backed Anthropic completion tests.
+
+These tests exercise the same end-to-end ``litellm.completion`` code paths
+as ``test_anthropic_completion.py`` but replay HTTP traffic from cassettes
+under ``cassettes/`` instead of calling ``api.anthropic.com``. CI can run
+them with no API key and zero cost.
+
+To re-record after a deliberate change to request shape (or to refresh
+against the live API), set ``LITELLM_VCR_RECORD_MODE=once`` and provide a
+real ``ANTHROPIC_API_KEY``::
+
+ LITELLM_VCR_RECORD_MODE=once \\
+ ANTHROPIC_API_KEY=sk-ant-... \\
+ uv run pytest tests/llm_translation/test_anthropic_completion_vcr.py -v
+
+See ``tests/llm_translation/vcr_config.py`` and ``tests/llm_translation/cassettes/README.md``
+for the full workflow.
+"""
+
+import os
+import sys
+
+import pytest
+
+sys.path.insert(0, os.path.abspath("../.."))
+sys.path.insert(0, os.path.dirname(__file__))
+
+import litellm # noqa: E402
+
+from vcr_config import litellm_vcr # noqa: E402
+
+
+# A non-secret placeholder API key. We never want a real key written to a
+# cassette, and ``vcr_config`` filters Authorization / x-api-key headers
+# anyway. Using a deterministic placeholder also stops the SDK from raising
+# when ``ANTHROPIC_API_KEY`` is unset (the common CI case).
+PLACEHOLDER_ANTHROPIC_API_KEY = "sk-ant-vcr-placeholder"
+
+
+@pytest.fixture(autouse=True)
+def _placeholder_anthropic_key(monkeypatch):
+ """Provide a placeholder key when none is set so replay works offline.
+
+ If a real key is present in the environment (e.g. when re-recording),
+ we leave it untouched.
+ """
+ if not os.environ.get("ANTHROPIC_API_KEY"):
+ monkeypatch.setenv("ANTHROPIC_API_KEY", PLACEHOLDER_ANTHROPIC_API_KEY)
+
+
+@litellm_vcr.use_cassette("anthropic_basic_completion.yaml")
+def test_anthropic_basic_completion_replay():
+ """Smoke-test that a vanilla Anthropic completion replays from a cassette.
+
+ This is the canonical example for the cassette-based testing pattern:
+ no API key required at runtime, deterministic output, and the full
+ LiteLLM transformation pipeline (request shaping + response parsing)
+ runs against a real-shape Anthropic payload.
+ """
+ response = litellm.completion(
+ model="anthropic/claude-sonnet-4-5-20250929",
+ messages=[{"role": "user", "content": "Hello!"}],
+ )
+
+ assert response is not None
+ assert response.choices[0].message.content == ("Hello! How can I help you today?")
+ assert response.usage.prompt_tokens == 12
+ assert response.usage.completion_tokens == 11
+ # Anthropic sets stop_reason="end_turn" → litellm normalises to "stop"
+ assert response.choices[0].finish_reason == "stop"
+
+
+@litellm_vcr.use_cassette("anthropic_streaming_completion.yaml")
+def test_anthropic_streaming_completion_replay():
+ """Replay a streaming Anthropic completion from a cassette.
+
+ Exercises the SSE chunk parser and the public streaming surface. The
+ underlying cassette captures every ``content_block_delta`` event Anthropic
+ emits, so any regression in the streaming transformation will surface here.
+ """
+ stream = litellm.completion(
+ model="anthropic/claude-sonnet-4-5-20250929",
+ messages=[{"role": "user", "content": "Hello!"}],
+ stream=True,
+ )
+
+ collected_text = ""
+ finish_reason = None
+ for chunk in stream:
+ if not chunk.choices:
+ continue
+ delta = chunk.choices[0].delta
+ if delta and delta.content:
+ collected_text += delta.content
+ if chunk.choices[0].finish_reason:
+ finish_reason = chunk.choices[0].finish_reason
+
+ assert collected_text == "Hello from LiteLLM!"
+ assert finish_reason == "stop"
diff --git a/tests/llm_translation/vcr_config.py b/tests/llm_translation/vcr_config.py
new file mode 100644
--- /dev/null
+++ b/tests/llm_translation/vcr_config.py
@@ -1,0 +1,123 @@
+"""
+Shared VCR configuration for ``tests/llm_translation``.
+
+This module centralises the cassette setup used by tests that would otherwise
+hit a real LLM provider over the network. The goal is to let CI replay
+recorded HTTP traffic by default — no API keys required — and to provide a
+single switch for re-recording cassettes against the live provider.
+
+Usage in a test::
+
+ from .vcr_config import litellm_vcr # noqa: E402
+
+ @litellm_vcr.use_cassette("anthropic_basic_completion.yaml")
+ def test_basic_completion():
+ resp = litellm.completion(
+ model="anthropic/claude-sonnet-4-5-20250929",
+ messages=[{"role": "user", "content": "Hello!"}],
+ )
+ assert resp.choices[0].message.content
+
+Recording mode
+--------------
+By default the cassette is replayed (``record_mode='none'``). To re-record:
+
+ LITELLM_VCR_RECORD_MODE=once \\
+ ANTHROPIC_API_KEY=sk-ant-... \\
+ uv run pytest tests/llm_translation/test_anthropic_completion_vcr.py
+
+Valid values for ``LITELLM_VCR_RECORD_MODE`` mirror vcrpy's record modes:
+``none`` (replay only — fail on missing cassette), ``once`` (record if the
+cassette doesn't exist), ``new_episodes`` (append new interactions), and
+``all`` (always re-record). See the vcrpy docs for details.
+
+Why this exists
+---------------
+Per the discussion that produced LIT-2683, our e2e tests repeatedly drained
+provider billing accounts and produced flaky CI on outages. Recording the
+HTTP exchange once and replaying it on subsequent runs gives us realistic
+provider responses (including streaming, headers, and edge-case payloads)
+without per-PR cost or rate-limit risk. Re-record periodically to catch
+real provider drift.
+"""
+
+from __future__ import annotations
+
+import os
+from pathlib import Path
+from typing import Any
+
+import vcr
+
+CASSETTE_DIR: Path = Path(__file__).parent / "cassettes"
+
+# Headers that must never be persisted to a cassette. These are matched
+# case-insensitively by vcrpy.
+_FILTERED_REQUEST_HEADERS = (
+ "authorization",
+ "x-api-key",
+ "anthropic-api-key",
+ "openai-api-key",
+ "azure-api-key",
+ "api-key",
+ "cookie",
+ "x-amz-security-token",
+ "x-amz-date",
+ "x-amz-content-sha256",
+ "amz-sdk-invocation-id",
+ "amz-sdk-request",
+)
+
+_FILTERED_RESPONSE_HEADERS = (
+ "set-cookie",
+ "x-request-id",
+ "cf-ray",
+ "anthropic-organization-id",
+ "openai-organization",
+ "request-id",
+)
+
+
+def _record_mode() -> str:
+ """Resolve the active vcrpy record mode from the environment.
+
+ Defaults to ``"none"`` so CI never accidentally hits the live provider.
+ """
+ mode = os.environ.get("LITELLM_VCR_RECORD_MODE", "none").strip().lower()
+ if mode not in {"none", "once", "new_episodes", "all"}:
+ raise ValueError(
+ f"LITELLM_VCR_RECORD_MODE={mode!r} is not a valid vcrpy record mode."
+ )
+ return mode
+
+
+def _build_vcr() -> vcr.VCR:
+ """Construct the shared ``VCR`` instance used by translation tests."""
+ return vcr.VCR(
+ cassette_library_dir=str(CASSETTE_DIR),
+ record_mode=_record_mode(),
+ # Match on method + URI + body so streaming vs non-streaming and
+ # different prompts get distinct cassettes.
+ match_on=("method", "scheme", "host", "port", "path", "query", "body"),
+ filter_headers=list(_FILTERED_REQUEST_HEADERS),
+ decode_compressed_response=True,
+ )
+
+
+def _scrub_response(response: Any) -> Any:
+ """Strip per-request response headers we don't want in the cassette."""
+ if not isinstance(response, dict):
+ return response
+ headers = response.get("headers") or {}
+ if isinstance(headers, dict):
+ for header in list(headers):
+ if header.lower() in _FILTERED_RESPONSE_HEADERS:
+ headers.pop(header, None)
+ return response
+
+
+litellm_vcr: vcr.VCR = _build_vcr()
+litellm_vcr.before_record_response = _scrub_response
+
+
+__all__ = ["litellm_vcr", "CASSETTE_DIR"]
diff --git a/uv.lock b/uv.lock
--- a/uv.lock
+++ b/uv.lock
@@ -9,7 +9,7 @@
]
[options]
-exclude-newer = "0001-01-01T00:00:00Z" # This has no effect and is included for backwards compatibility when using relative exclude-newer values.
+exclude-newer = "2026-04-27T00:38:13.673780212Z"
exclude-newer-span = "P3D"
[manifest]
@@ -3242,6 +3242,7 @@
{ name = "types-redis" },
{ name = "types-requests" },
{ name = "types-setuptools" },
+ { name = "vcrpy" },
]
... diff truncated: showing 800 of 830 lines
Delete existing cassettes before recording (record_mode='all' with vcrpy appends rather than overwriting), and strip non-deterministic response headers (Date, Server) so re-running the helper produces a byte-stable diff. Regenerate the committed cassettes with the fixed script so they match what contributors get when following the README.
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 72c9292.
Greptile Summary

This PR introduces a Redis-backed vcrpy cache (…).

One concrete issue worth addressing before enabling this in production CI: (…)

Confidence Score: 4/5

Safe to merge with the CWD-sensitivity fix applied; all other findings are P2 or below. One P1 finding (redis_key_for CWD dependency can cause cache key mismatches between CI and local) caps the score at 4/5. The two P2 findings do not lower the score further.

tests/_vcr_redis_persister.py — redis_key_for key derivation and save_cassette guard ordering
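One way to address the flagged CWD sensitivity is to derive the cache key from a path anchored at a fixed root rather than at os.getcwd(). This is a hedged sketch, not the PR's actual fix; in real code repo_root would be computed once from __file__ at import time, and the function name merely echoes the one Greptile mentions:

```python
from pathlib import Path

def redis_key_for(cassette_path: str, repo_root: Path) -> str:
    # Derive the key from a repo-root-relative POSIX path instead of a path
    # relative to os.getcwd(), so CI and local runs agree on the same key
    # for the same cassette regardless of where pytest was launched.
    rel = Path(cassette_path).resolve().relative_to(repo_root.resolve()).as_posix()
    return f"litellm:vcr:cassette:{rel}"
```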
| Filename | Overview |
|---|---|
| tests/_vcr_redis_persister.py | New Redis-backed vcrpy persister with TTL, outcome gate, episode cap, and aiohttp body-rewind patch; one CWD-sensitive key-derivation path worth noting |
| tests/llm_translation/conftest.py | Adds VCR auto-marker, persister registration, outcome-gate fixture, and verbose-mode logreport hook; respx/incompatible-test exclusion lists look correct |
| tests/llm_responses_api_testing/conftest.py | Near-identical VCR plumbing to llm_translation/conftest.py; intentional duplication acknowledged in PR follow-ups; no per-file exclusion list (appropriate since no respx conflicts here) |
| tests/llm_translation/test_vcr_redis_persister.py | 23 focused unit tests using fakeredis; covers roundtrip, TTL, cache miss, error resilience, outcome gate, and episode cap — thorough coverage |
| litellm/litellm_core_utils/llm_request_utils.py | Correct null-safety fix: guards against proxy_server_request being None (not just absent) by using (... or {}).get("headers") chaining |
| tests/_flush_vcr_cache.py | SCAN+pipeline batch-delete targeting only the litellm:vcr:cassette: prefix; correctly isolated from application Redis keys |
| tests/llm_translation/test_anthropic_completion.py | Two new VCR replay demo tests (basic + streaming); will fall back to live API on cold cache, which is expected and intended |
Reviews (3): Last reviewed commit: "Merge branch 'litellm_internal_staging' ..."
…ulk capture

Per Yuneng's feedback, use a single @pytest.mark.vcr marker so one record sweep populates cassettes for every marked test across all providers, instead of forcing each test to bind to a hard-coded cassette path.

Changes vs. the initial scaffolding:

- Add 'pytest-recording==0.13.4' on top of vcrpy. Adopt its layout: cassettes live at 'cassettes/<test_module>/<test_name>.yaml', resolved automatically. New tests just decorate with '@pytest.mark.vcr' — no imports or path bookkeeping.
- Move the shared filter/match config into a 'vcr_config' fixture in 'tests/llm_translation/conftest.py' (consumed by pytest-recording for every marked test in the dir). Drop the standalone 'vcr_config.py'.
- Bulk record / replay via the standard '--record-mode' CLI flag: 'make test-llm-translation-record' now sweeps every '@pytest.mark.vcr' test under tests/llm_translation in one shot. Optional 'TARGET=' var scopes to a single file.
- Move existing cassettes to the per-test paths and update the local in-process Anthropic regenerator to write to the same paths.
- Refresh README + Makefile target docs to match the sweep workflow.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
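Under the layout this commit describes, the cassette path resolves mechanically from the test location, which is why marked tests need no path bookkeeping. A small illustrative helper (hypothetical, not part of the PR) that mirrors that resolution:

```python
from pathlib import Path

def default_cassette_path(test_file: str, test_name: str) -> Path:
    # pytest-recording's default layout as described in the commit message:
    # cassettes/<test_module>/<test_name>.yaml, next to the test file.
    test_file_path = Path(test_file)
    return test_file_path.parent / "cassettes" / test_file_path.stem / f"{test_name}.yaml"
```

So a test like tests/llm_translation/test_anthropic_completion_vcr.py::test_basic would replay from tests/llm_translation/cassettes/test_anthropic_completion_vcr/test_basic.yaml.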
…reptile fixes

CI's license check fails on the new dev dep because liccheck cannot read the PEP 639 'License-Expression' field that pytest-recording uses. Add the package to the manually-verified allowlist (MIT, confirmed via PyPI classifier).

Also addresses greptile P2 review comments:

- Add 'anthropic-version' to the request-header filter list so live and mock recordings produce structurally identical cassettes.
- Replace the indentation-sensitive regex in '_strip_nondeterministic_headers' with a YAML parse-and-rewrite so the helper keeps working if vcrpy ever changes its serialization style.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Stores VCR cassettes in Redis under litellm:vcr:cassette:<rel_path> with a 24h expiry instead of YAML on disk. The TTL means each daily CI run starts with an aged-out cache, naturally re-records against live providers, and surfaces upstream API drift within a day without a manual `make` re-record sweep.

Opt-in via LITELLM_VCR_REDIS=1; default behaviour is unchanged, so local dev keeps the on-disk cassettes.

before_record_response now drops non-2xx responses so a transient 5xx or 429 from a provider can't poison the cache for the rest of the TTL window. VCR-marked tests bump litellm.num_retries to 3 during recording so provider-SDK exponential backoff kicks in on the cache-miss path.

Tests cover the surfaces we depend on in CI: serialize/deserialize roundtrip via the real vcrpy serializer, TTL is actually applied to saved keys, cache miss raises CassetteNotFoundError so vcrpy falls through to record mode, and 2xx-only filtering across the status-code matrix (2xx kept, 3xx/4xx/5xx dropped, with 429 and 503 explicitly pinned).
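The 2xx gate relies on vcrpy's before_record_response contract: the hook returns the (possibly modified) response to persist it, or None to skip persisting that interaction. A minimal sketch of the gate and TTL constant as this commit describes them; the exact names are assumptions and may not match the PR's code:

```python
from typing import Optional

TTL_SECONDS = 24 * 60 * 60  # cassette keys age out daily, forcing a fresh record

def filter_non_2xx_response(response: dict) -> Optional[dict]:
    # vcrpy before_record_response hook: keep only successful interactions
    # so a transient 429/5xx never poisons the cache for the TTL window.
    code = (response.get("status") or {}).get("code")
    if code is not None and 200 <= code < 300:
        return response
    return None  # tells vcrpy to drop this interaction from the cassette
```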
Removes the YAML cassette feature entirely and replaces it with a Redis-only flow. Every test in tests/llm_translation/ and tests/llm_responses_api_testing/ is auto-marked @pytest.mark.vcr via conftest.pytest_collection_modifyitems, so any provider call lands in the Redis cache (litellm:vcr:cassette:<rel_path>, 24h TTL). The first run records, subsequent runs within the day replay, and the day rollover re-records, surfacing upstream API drift within 24h.

VCR is on by default. Set LITELLM_VCR_DISABLE=1, or simply leave REDIS_HOST unset, to opt out — both bypass the auto-marker entirely so nothing about cassettes runs. record_mode is "once" so cache-miss records and cache-hit replays.

The 8 existing respx-using files in tests/llm_translation are excluded from the auto-marker (vcrpy and respx both patch the httpx transport; applying both makes one silently win). The persister's own unit-test file is also excluded so it doesn't recursively run inside a cassette. The persister moved from tests/llm_translation/_vcr_redis_persister.py to tests/_vcr_redis_persister.py so both conftests share it. The two demo tests in test_anthropic_completion_vcr.py were ported into test_anthropic_completion.py and the demo file was deleted.

Adds tests/_flush_vcr_cache.py + a Make target (test-llm-translation-flush-vcr-cache) that scans litellm:vcr:cassette:* and pipelines DELETEs, for the "I want the next CI run to re-record now" workflow. Drops the now-dead test-llm-translation-record target.

Provider keys are still required on cache-miss (which happens on first run and once a day after that). Replay-mode runs need only Redis.
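The auto-marking decision can be sketched like this. Everything here is illustrative, assuming names not in the diff: the real conftest calls `item.add_marker(pytest.mark.vcr)` from inside `pytest_collection_modifyitems`, while this sketch isolates the decision logic so it runs without pytest.

```python
import os
from typing import Optional

# Files where respx already patches the httpx transport; marking them with
# vcrpy too would make one of the two libraries silently win.
# (Placeholder file names, not the actual exclusion list.)
_RESPX_CONFLICTING_FILES = {"test_openai_mock.py", "test_vcr_redis_persister.py"}


def should_auto_mark(item_path: str, vcr_disabled: bool, redis_host: Optional[str]) -> bool:
    """Decide whether a collected test gets the vcr marker."""
    if vcr_disabled or not redis_host:
        return False  # opt-out: nothing about cassettes runs
    return os.path.basename(item_path) not in _RESPX_CONFLICTING_FILES


# In a real conftest this would sit inside the collection hook, roughly:
#
# def pytest_collection_modifyitems(config, items):
#     for item in items:
#         if should_auto_mark(str(item.fspath), _vcr_disabled(), os.getenv("REDIS_HOST")):
#             item.add_marker(pytest.mark.vcr)
```

Keeping the predicate separate from the hook also makes the opt-out paths directly unit-testable.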
Removes commentary that restated the code, including:

- module-level banners explaining what the conftest does (covered by Readme.md and the function bodies)
- docstrings on _scrub_response, _before_record_response, vcr_config, _vcr_disabled, pytest_recording_configure (function names + bodies are self-evident)
- inline notes about header filtering, match_on, etc.
- per-test docstrings restating the test name

Keeps the three non-obvious notes that aren't recoverable from the code: the vcrpy/respx httpx-transport collision rationale on _RESPX_CONFLICTING_FILES, the vcrpy "return None to skip persisting" contract on filter_non_2xx_response, and the fixture-ordering dependency on _vcr_record_retries.
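The "return None to skip persisting" contract makes the 2xx filter nearly a one-liner. A hedged sketch, assuming the response-dict shape vcrpy passes to `before_record_response` hooks (status code nested under `status.code`):

```python
def filter_non_2xx_response(response: dict):
    """before_record_response hook: return the response to persist it, None to skip.

    Dropping non-2xx responses means a transient 429/5xx can never
    poison the cassette cache for the rest of its TTL window.
    """
    code = response.get("status", {}).get("code")
    if code is not None and 200 <= code < 300:
        return response
    return None
```

On the next run the skipped interaction is simply missing from the cassette, so the request falls through to the live provider and gets another chance at a clean recording.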
Provider SDKs already retry transient 5xx/429 with exponential backoff (default max_retries=2), and pytest.mark.flaky covers test-level retries on top of that. Setting litellm.num_retries=3 here just multiplied the existing layers — worst case 6 (flaky) x 3 (this) x 2 (CI rerunfailures) = 36 attempts on a single test. Removing it keeps SDK-level network-blip protection intact and shortens worst-case latency on cache-miss runs.
…tte poisoning

record_mode='once' refused to add new requests once any cassette existed in Redis. Combined with filter_non_2xx_response (which drops non-2xx responses from the saved cassette) and a 24h shared-Redis TTL, a single transient API failure mid-test left the cassette stuck with only the leading non-API requests (e.g. the model_prices fetch from raw.githubusercontent.com), and every subsequent run for the next 24h errored with 'Can't overwrite existing cassette'. new_episodes records anything not already present, so partially populated cassettes recover on the next run instead of poisoning the suite for a full TTL window.
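A hedged sketch of the kind of `vcr_config` dict a pytest-recording fixture would return under this scheme. The key names (`record_mode`, `match_on`, `filter_headers`) are standard vcrpy options; the specific values and the `LITELLM_VCR_RECORD_MODE` override are assumptions for illustration:

```python
import os


def vcr_config_dict() -> dict:
    """Build the dict a pytest-recording `vcr_config` fixture would return.

    record_mode="new_episodes" replays matched requests and appends any
    unmatched ones, so a partially populated cassette heals on the next
    run instead of failing with "Can't overwrite existing cassette".
    """
    return {
        "record_mode": os.getenv("LITELLM_VCR_RECORD_MODE", "new_episodes"),
        # match on request shape, not headers, so auth scrubbing can't break replay
        "match_on": ["method", "uri", "body"],
        "filter_headers": ["authorization", "x-api-key", "anthropic-version"],
    }
```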
litellm's default LiteLLMAiohttpTransport routes requests through aiohttp, which sits below httpx and is invisible to vcrpy's httpx-stub interception. Under vcrpy + aiohttp, requests reach the real network but responses come back through the stubbed httpx transport as empty 200s, surfacing as 'Unable to get json response - Expecting value: line 1 column 1 (char 0)' in providers like Anthropic, Gemini, and any other path that exercises the aiohttp transport. Disabling the aiohttp transport when the VCR persister is registered forces all calls through pure httpx, which vcrpy can record and replay correctly.
Azure OpenAI's responses-API DELETE endpoint rejects requests that carry
a JSON body with: "Unexpected body with size 2. This API method does
not accept a request body.". The default LiteLLMAiohttpTransport silently
elides empty-dict bodies on DELETE so this was masked, but the pure-httpx
transport (used when DISABLE_AIOHTTP_TRANSPORT=True or under vcrpy/respx
patching) sends literal '{}' (2 bytes), which Azure rejects.
Only attach json= when the provider's transform actually returned a
non-empty dict; otherwise issue a bodyless DELETE.
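The logic of that fix can be sketched as a small kwargs builder. Function and parameter names here are illustrative, not the actual signature in llm_http_handler.py:

```python
from typing import Any, Dict, Optional


def build_delete_kwargs(transformed_body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the kwargs for the DELETE call.

    An empty dict would be serialized as the literal two-byte body '{}',
    which Azure's responses-API DELETE endpoint rejects; omit `json`
    entirely in that case so the request goes out bodyless.
    """
    kwargs: Dict[str, Any] = {"method": "DELETE"}
    if transformed_body:  # truthy check: non-empty dict only
        kwargs["json"] = transformed_body
    return kwargs
```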
…itellm_vcr-cassette-llm-tests-af37

# Conflicts:
# litellm/llms/custom_httpx/llm_http_handler.py
The Anthropic replay tests hardcoded specific token counts and content
strings ('Hello! How can I help you today?', prompt_tokens == 12). On a
fresh CI Redis those values must match a pre-recorded cassette that
doesn't exist, so the first run hits the live API and gets different
real bytes back.
Assert on shape instead: non-empty content, positive token counts,
finish_reason in the known set, and (for streaming) more than one chunk.
The tests still exercise the full transformation pipeline end-to-end and
catch shape regressions; drift in the exact text/token counts is
expected and now tolerated.
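The shape-over-content assertion style can be sketched like this. The response attributes follow the OpenAI-style object LiteLLM returns; the helper name and the stand-in response built with SimpleNamespace are illustrative:

```python
from types import SimpleNamespace

KNOWN_FINISH_REASONS = {"stop", "length", "tool_calls", "content_filter"}


def assert_completion_shape(response) -> None:
    """Assert on shape, not exact bytes, so re-recording can't break the test."""
    choice = response.choices[0]
    assert isinstance(choice.message.content, str) and choice.message.content
    assert choice.finish_reason in KNOWN_FINISH_REASONS
    assert response.usage.prompt_tokens > 0
    assert response.usage.completion_tokens > 0


# usage demo against a stand-in response object
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hi!"), finish_reason="stop")],
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=9),
)
assert_completion_shape(fake)
```

Any fresh recording with non-empty text and sane token accounting passes; only genuine shape regressions fail.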
…transport

vcrpy's aiohttp stub captures response bodies via 'await response.read()', which drains aiohttp's StreamReader. Downstream consumers of the same ClientResponse (litellm's AiohttpResponseStream, which iterates response.content.iter_chunked) then see an empty body and surface as JSON 'Expecting value: line 1 column 1 (char 0)' errors on every record-path call.

The previous workaround set litellm.disable_aiohttp_transport=True for the whole VCR-active session, which made the tests exercise pure httpx instead of the production aiohttp transport. That hid the production transport from coverage and surfaced its own bugs (e.g. the Azure DELETE-with-empty-body case fixed in upstream staging).

Replace the workaround with a targeted monkey-patch that re-feeds the captured body into the StreamReader via unread_data after vcrpy records it. Tests now run through the same transport customers do, both on first record and on replay, for both unary and streaming endpoints.

Verified locally against api.anthropic.com with the production LiteLLMAiohttpTransport: record path passes (real network, 4.2s), replay path passes (Redis cache, 1.8s).
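The drain-and-refeed idea, sketched with a minimal stand-in for aiohttp's StreamReader so the example is self-contained. The real patch wraps vcrpy's aiohttp stub and calls `StreamReader.unread_data` on the live object; both class and function names below are illustrative:

```python
import asyncio


class MiniStreamReader:
    """Tiny stand-in for aiohttp.StreamReader: read() drains, unread_data() refeeds."""

    def __init__(self, data: bytes):
        self._buffer = data

    async def read(self) -> bytes:
        data, self._buffer = self._buffer, b""
        return data

    def unread_data(self, data: bytes) -> None:
        self._buffer = data + self._buffer


async def record_then_refeed(stream: MiniStreamReader) -> bytes:
    # What the recorder does: drain the stream to capture the body...
    body = await stream.read()
    # ...and the fix: push the captured bytes back so downstream consumers
    # (litellm's response iterator) still see the full body.
    stream.unread_data(body)
    return body


stream = MiniStreamReader(b'{"ok": true}')
captured = asyncio.run(record_then_refeed(stream))
remaining = asyncio.run(stream.read())
```

Without the `unread_data` call, `remaining` would be empty, which is exactly the empty-body symptom described above.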
Stop falling back to REDIS_URL/REDIS_SSL_URL/REDIS_HOST for the VCR persister. Sharing a Redis with the application cache risks cassettes being wiped by tests that flush the app Redis.
Managed Redis (e.g. Upstash) drops idle TLS connections, which surfaced in CI as a teardown ERROR on test_gemini_image_size_limit_exceeded: redis.exceptions.ConnectionError: EOF occurred in violation of protocol (_ssl.c:2427)

Cassette persistence is a cache, not test correctness, so:

- Configure the redis client with Retry(ExponentialBackoff, retries=2) on ConnectionError/TimeoutError to absorb single-socket drops.
- Wrap save_cassette so a final failure logs a warning instead of failing teardown — the next run re-records.
- Wrap load_cassette so an outage on read becomes a cache miss (CassetteNotFoundError) instead of erroring in setup.
Set LITELLM_VCR_VERBOSE=1 to print a one-line cassette verdict per test (HIT / MISS / PARTIAL / NOOP) showing replay vs new-recording counts. Useful for local QA to confirm which tests actually exercised the cache and which fell through to the live provider.
A test that fails (incl. all the failing retries before a passing one) can otherwise overwrite a known-good cassette with a 'bad luck' recording. Tests like test_prompt_caching, which assert on provider state across two calls, can produce a 200 response that semantically fails the assertion — the 2xx filter doesn't catch this because the HTTP layer is fine.

- pytest_runtest_makereport hook attaches each phase report to the pytest item.
- _vcr_outcome_gate fixture (combining the verbose-mode reporter) reads the call-phase outcome at teardown and informs the persister via mark_test_outcome_for_cassette before vcrpy's Cassette.__exit__ triggers save_cassette.
- save_cassette consults the per-key 'did the test pass?' flag and short-circuits when False, leaving any prior good recording intact.
- Defaults to passed=True when no marker is present so non-test usage of the persister still works.
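The gate itself reduces to a per-key flag consulted at save time. A hedged sketch with the wiring simplified (a plain dict stands in for the store, and only `mark_test_outcome_for_cassette` is a name from the commit; the rest is illustrative):

```python
_outcomes = {}  # cassette key -> did the test's call phase pass?


def mark_test_outcome_for_cassette(key: str, passed: bool) -> None:
    _outcomes[key] = passed


def save_cassette(key: str, cassette_dict: dict, store: dict) -> bool:
    """Persist only when the owning test passed; returns True if saved."""
    # Default to True so non-test usage of the persister still works.
    if not _outcomes.get(key, True):
        return False  # leave any prior good recording intact
    store[key] = cassette_dict
    return True


store = {"k": {"interactions": ["known-good"]}}
mark_test_outcome_for_cassette("k", passed=False)
saved = save_cassette("k", {"interactions": ["bad luck"]}, store)
```

Because the flag is set at teardown, every failed rerun-failures attempt is gated individually; only a green attempt's recording lands in Redis.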
Some tests can't benefit from cassette replay because they assert on state that only exists in the live provider between two calls (e.g. prompt-cache propagation, intermittent provider quirks). Marking them with @pytest.mark.vcr just wastes cycles trying to record cassettes they will never replay against successfully. Opt-out by nodeid suffix so subclassed/parametrized variants are covered:

- ::test_prompt_caching — Anthropic/Bedrock prompt-cache propagation isn't deterministic in the 0–1s window the test gives it.
- ::test_async_pdf_handling_with_file_id — flaky upstream Wikipedia fetch through the Anthropic Files API.
- TestBedrockInvokeNovaJson::test_json_response_pydantic_obj — Bedrock Nova returns tool_call vs JSON nondeterministically (other providers' subclasses are healthy).
- ::test_bedrock_converse__streaming_passthrough — Bedrock streaming response_cost calc returns None intermittently.

These tests keep their existing @pytest.mark.flaky retry behavior.
A test that produces non-deterministic request bodies (e.g. uuid in the prompt) under record_mode=new_episodes never replays — every CI run appends fresh unmatched episodes. The cassette grows unbounded over time and silently inflates Redis (we observed one cassette at 22 episodes / ~860KB after ~5 CI runs). Refuse the save when episode count exceeds MAX_EPISODES_PER_CASSETTE so the pathology surfaces with a loud warning that points to the opt-out fix instead of festering invisibly.
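The episode cap is a simple pre-save guard. A sketch, assuming the constant value from the description and an illustrative function shape (the real code lives inside the persister's save path):

```python
import warnings

MAX_EPISODES_PER_CASSETTE = 50


def should_persist(cassette_dict: dict, key: str) -> bool:
    """Refuse runaway cassettes loudly instead of silently inflating Redis."""
    episodes = len(cassette_dict.get("interactions", []))
    if episodes > MAX_EPISODES_PER_CASSETTE:
        warnings.warn(
            f"{key}: {episodes} episodes exceeds {MAX_EPISODES_PER_CASSETTE}; "
            "the test likely sends non-deterministic request bodies - "
            "add it to the VCR opt-out list instead."
        )
        return False
    return True
```

Note the boundary: exactly 50 episodes still persists (the "allow-at-threshold" case the unit tests pin down); only 51+ is refused.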
Anthropic's URL fetcher intermittently returns 400 'Unable to download the file' for the Wikipedia URL the test was using. Point it at the repo's existing tests/llm_translation/fixtures/dummy.pdf via raw GitHub instead — small, deterministic, reliably fetchable. With a stable URL the test no longer needs to be opted out of VCR; remove it from the incompatible list so it can replay from cassette.
Previously, the per-test [VCR HIT/MISS/...] line was written via TerminalReporter.write_line from inside fixture teardown. Pytest captures that stream by default and only surfaces it on FAILED tests (under 'Captured stdout teardown'), so passing tests' verdicts were invisible in CI logs and the user couldn't tell whether the cache was working. Write directly to sys.__stderr__ so the line bypasses pytest's capture entirely. Under xdist each worker has its own __stderr__ which CircleCI aggregates into the live job log alongside the PASSED/FAILED markers.
… MIME

Raw GitHub serves application/octet-stream, which OpenAI/Gemini reject when LiteLLM fetches the URL client-side. jsDelivr serves the same file with content-type: application/pdf. Pin to a commit SHA so the asset is immutable and jsDelivr can cache it for a year.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…orter

Previous attempt wrote to sys.__stderr__ from the test fixture. Under xdist, fixtures run inside worker subprocesses whose stderr is captured by the controller and only released to the live log on test failure — so passing tests' verdicts were silently swallowed.

Round-trip via report.user_properties: the worker-side fixture stashes the verdict on user_properties, xdist serializes it onto the report, and a controller-side pytest_runtest_logreport hook writes it via the TerminalReporter (the same plugin that emits PASSED/FAILED markers). TerminalReporter is resolved lazily on first hook call because it's not yet registered when conftest's pytest_configure runs.

Verified locally in both serial and xdist modes.
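The round-trip can be sketched with stand-in report objects. The real code uses pytest's `TestReport.user_properties` and the `pytest_runtest_logreport` hook; the helper names and the verdict-line format below are assumptions for illustration:

```python
from types import SimpleNamespace
from typing import Optional


# --- worker side: fixture teardown stashes the verdict ---------------------
def stash_verdict(report, verdict: str) -> None:
    # user_properties is a list of (name, value) pairs that pytest-xdist
    # serializes from the worker to the controller along with the report
    report.user_properties.append(("vcr_verdict", verdict))


# --- controller side: the logreport hook renders and prints it -------------
def render_verdict_line(report) -> Optional[str]:
    for name, value in report.user_properties:
        if name == "vcr_verdict":
            return f"[VCR {value}] {report.nodeid}"
    return None


report = SimpleNamespace(nodeid="tests/llm_translation/test_x.py::test_y",
                         user_properties=[])
stash_verdict(report, "HIT")
line = render_verdict_line(report)
```

Because only the controller process writes, the line lands in the live job log regardless of how each worker's streams are captured.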
Remove explanatory comments that restated what the code already says. Kept only those that document non-obvious external contracts (the aiohttp record-path patch's reason for re-feeding the body, and the warning messages inside save_cassette that reach the user).
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 0543953.
```python
def redis_key_for(cassette_path: str) -> str:
    rel = os.path.relpath(str(cassette_path))
    if rel.endswith(".yaml"):
        rel = rel[: -len(".yaml")]
    rel = rel.replace("/cassettes/", "/").lstrip("./")
    return f"{REDIS_KEY_PREFIX}{rel}"
```
redis_key_for calls os.path.relpath without an explicit start argument, so the result depends on the process's current working directory at call time. If pytest is invoked from tests/llm_translation/ instead of the repo root, the relative path produced omits the tests/llm_translation/ prefix, yielding a key like litellm:vcr:cassette:test_anthropic/test_foo rather than litellm:vcr:cassette:tests/llm_translation/test_anthropic/test_foo. CI-recorded cassettes and locally-run replays would then map to different Redis keys and never share the cache. Anchoring to a fixed repo-root directory eliminates this.
Suggested change:

```python
def redis_key_for(cassette_path: str) -> str:
    _repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    rel = os.path.relpath(str(cassette_path), start=_repo_root)
    if rel.endswith(".yaml"):
        rel = rel[: -len(".yaml")]
    rel = rel.replace("/cassettes/", "/").lstrip("./")
    return f"{REDIS_KEY_PREFIX}{rel}"
```
I mean, technically true. But this is always run from CI/CD so it doesn't really matter
Fair point — if CI always runs from the repo root and local dev is expected to do the same, the CWD sensitivity is a non-issue in practice.
Relevant issues
Long-term solution to the recurring "out of Anthropic credits" CI failures discussed in #sdlc Slack on 2026-04-29.

Linear ticket
Resolves LIT-2683
Summary
Live LLM e2e tests have been draining provider billing accounts and going flaky on outages. This PR introduces a Redis-backed vcrpy cache so CI exercises the same end-to-end LiteLLM transformation paths (request shaping, response parsing, streaming, headers) without hitting the live provider on every run — ~zero per-PR cost while still smoke-testing against reality once a day.
The cache lives on a dedicated Redis (CASSETTE_REDIS_URL) so it's isolated from the application Redis (REDIS_URL/REDIS_HOST) used by other test suites — those flush their Redis as part of teardown, which would otherwise wipe cassettes.

Observed impact on llm_translation_testing: ~47% wall-clock reduction (8:11 → 4:21) once the cache is warm, with 0 provider calls.

How it works
- Every test in tests/llm_translation/ and tests/llm_responses_api_testing/ is auto-marked with @pytest.mark.vcr via conftest.py. No per-test annotation needed.
- Cassettes are stored in Redis at litellm:vcr:cassette:<test_id> with a 24h TTL.
- Only 2xx responses are persisted (filter_non_2xx_response) — transient 5xx/4xx never poison the cache.
- record_mode="new_episodes" so partial recordings can be completed without nuking what already replays.

Cache-poisoning safeguards
A naive cassette cache picks up a lot of "bad luck" recordings. Three layers prevent that:
1. Outcome gate: the pytest_runtest_makereport hook stamps each test's call-phase outcome onto the cassette key; save_cassette consults it and refuses to write when the test failed. This means the failed retries pytest-rerunfailures produces before a green retry never overwrite a known-good cassette.
2. Episode cap: MAX_EPISODES_PER_CASSETTE = 50 catches the pathology where a test produces non-deterministic request bodies (e.g. uuids), record_mode=new_episodes keeps appending unmatched episodes, and the cassette balloons forever. Refusing to persist past 50 surfaces the issue loudly instead of silently inflating Redis.
3. Opt-out list: _VCR_INCOMPATIBLE_NODEID_SUFFIXES lists the handful of tests that observe live cross-call provider state (prompt-cache warm-up, streaming response_cost calc, Bedrock Nova tool-call nondeterminism). They fall through to live calls with their existing @pytest.mark.flaky retry logic.

Resilience
- The redis client is configured with retry=Retry(ExponentialBackoff(cap=2, base=0.1), retries=2) on ConnectionError/TimeoutError so a single dropped TLS socket on Upstash doesn't fail teardown.
- load_cassette outages convert to CassetteNotFoundError (cache miss → live call, not a test setup error).
- save_cassette outages log a warning and return (persistence is a cache, not test correctness).

Verbose mode
Set LITELLM_VCR_VERBOSE=1 to surface a per-test verdict in the live CI log alongside PASSED/FAILED markers.

Verdicts: HIT (pure replay), MISS (cold cache, recorded), PARTIAL (mix), NOOP (no HTTP traffic). Implemented as a worker-side user_properties stash that the controller's pytest_runtest_logreport hook picks up and prints — needed because xdist worker stderr is captured and only released on test failure.

Required environment
- CASSETTE_REDIS_URL — dedicated Redis for cassettes. Already configured in CircleCI; locally, set it in .env.rc (or equivalent) and source it. If unset, VCR registration is skipped and tests fall back to live calls.
- Provider keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_*, etc.) — only needed on cache-miss (recording). Replay needs nothing.

Flushing the cache
Force a re-record on the next run instead of waiting for the 24h TTL with make test-llm-translation-flush-vcr-cache.
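The flush logic amounts to a prefix-scoped SCAN plus pipelined DELETEs. A self-contained sketch: the client is duck-typed (anything exposing redis-py's `scan_iter`/`pipeline` works), so an in-memory fake stands in for Redis here; only `flush_vcr_cache` illustrates the actual approach.

```python
PREFIX = "litellm:vcr:cassette:"


class FakePipeline:
    """Collects DELETEs and applies them on execute(), like redis-py's pipeline."""

    def __init__(self, client):
        self.client, self.ops = client, []

    def delete(self, key):
        self.ops.append(key)

    def execute(self):
        for key in self.ops:
            self.client.data.pop(key, None)


class FakeRedis:
    """In-memory stand-in exposing just the redis-py calls the flush needs."""

    def __init__(self, keys):
        self.data = dict.fromkeys(keys, b"...")

    def scan_iter(self, match):
        prefix = match.rstrip("*")
        return [k for k in list(self.data) if k.startswith(prefix)]

    def pipeline(self):
        return FakePipeline(self)


def flush_vcr_cache(client) -> int:
    """Delete only keys under the cassette prefix, pipelining the DELETEs."""
    pipe = client.pipeline()
    deleted = 0
    for key in client.scan_iter(match=PREFIX + "*"):
        pipe.delete(key)
        deleted += 1
    pipe.execute()
    return deleted


client = FakeRedis([PREFIX + "a", PREFIX + "b", "app:session:1"])
n = flush_vcr_cache(client)
```

SCAN instead of KEYS keeps the operation incremental on a shared instance, and the prefix match is what guarantees application keys are never touched.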
The flush script only deletes keys under the litellm:vcr:cassette: prefix.

Disabling VCR
Skip the cache entirely (every call goes live, no recording) by setting LITELLM_VCR_DISABLE=1.
What's in the diff
Core infrastructure
- tests/_vcr_redis_persister.py — Redis-backed vcrpy persister (24h TTL, litellm:vcr:cassette: key prefix), 2xx-only response filter, outcome-gated persistence, episode cap, transient-error resilience, and an aiohttp record-path patch so vcrpy doesn't drain the response stream out from under LiteLLMAiohttpTransport. Reads only CASSETTE_REDIS_URL — no fallback to the application Redis.
- tests/_flush_vcr_cache.py — scoped flush utility (only touches keys under the litellm:vcr:cassette: prefix).
- tests/llm_translation/conftest.py / tests/llm_responses_api_testing/conftest.py — register the Redis persister, define the vcr_config fixture (auth/header scrubbing, request-shape matching), auto-apply @pytest.mark.vcr to every test in the directory, wire the outcome-gate hook + fixture, and ship the controller-side verbose-mode pytest_runtest_logreport hook. Files using respx (which patches the same httpx transport vcrpy does) are excluded via _RESPX_CONFLICTING_FILES to avoid one library silently winning.

Tests & demo
- tests/llm_translation/test_anthropic_completion.py — two replay tests (test_anthropic_basic_completion_replay, test_anthropic_streaming_completion_replay) demonstrating the flow on a real e2e path.
- tests/llm_translation/test_vcr_redis_persister.py — 23 unit tests covering: roundtrip, 24h TTL, missing-key behavior, key normalization, 2xx filter coverage, transient-error handling on read & write, outcome gate (skip-on-fail, proceed-on-pass, default-on-unknown), and episode cap (refuse-above, allow-at-threshold).

Glue
- Makefile — test-llm-translation-flush-vcr-cache target.
- pyproject.toml / uv.lock — adds vcrpy==8.1.1 and pytest-recording==0.13.4 to dev.
- tests/code_coverage_tests/liccheck.ini — license allowlist entry for pytest-recording.
- litellm/litellm_core_utils/llm_request_utils.py — small null-safety fix in get_proxy_server_request_headers (when proxy_server_request is None rather than missing, the previous .get(...).get(...) chain raised AttributeError).
- tests/llm_translation/base_llm_unit_tests.py — switched the test_async_pdf_handling_with_file_id PDF URL from Wikimedia (intermittent 400s from Anthropic's server-side fetcher) to a SHA-pinned jsDelivr mirror of the in-repo fixture (raw GitHub serves PDFs as application/octet-stream which OpenAI/Gemini reject).
- tests/llm_translation/Readme.md — record / replay / flush / disable workflow.

Pre-Submission checklist
- Tests added (unit tests in test_vcr_redis_persister.py, replay demos in test_anthropic_completion.py)
- Secrets scrubbed from cassettes: Authorization, x-api-key, anthropic-api-key, AWS sigv4, GCP keys, cookies, organization IDs, request IDs
- Dedicated cassette Redis: CASSETTE_REDIS_URL only, no fallback
- Production code changes limited to llm_request_utils.py and one PDF-fixture URL fix

Type
🚄 Infrastructure
✅ Test
🐛 Bug Fix (null-safety in get_proxy_server_request_headers)

Follow-ups
- Extend the auto-marker to other test directories instead of hand-adding @pytest.mark.vcr. The auto-marker already covers llm_translation/ and llm_responses_api_testing/.
- Extract the shared conftest wiring into tests/_vcr_pytest_plugin.py so other directories can opt in by importing instead of copy-pasting.

Slack Thread