Client-Side Framework for Efficient & Reproducible LLM Inference
CacheSaver is a lightweight client-side library that wraps existing LLM inference clients to make them:
- Efficient — repeated prompts and sub-problems are automatically cached and reused.
- Reproducible — identical inputs yield identical outputs across runs.
- Compatible — works with any LLM client (OpenAI, HuggingFace, vLLM, etc.) without modifying your code.
This repository accompanies our paper “CacheSaver: Client-Side Caching for Efficient and Reproducible LLM Inference”, accepted at EMNLP 2025, and the related project blog.
- 🧮 LLM inference dominates cost and energy consumption, sometimes accounting for up to 90% of a model’s total lifecycle.
- 🤔 Many reasoning workflows (e.g., multi-agent, Tree-of-Thoughts, self-refinement) reuse sub-problems that are recomputed each time.
- 🎲 Reproducibility is tricky because most LLM APIs don’t support deterministic seeding.
CacheSaver tackles these challenges with a client-side cache and namespace system:
- 🔌 Wrap your existing client: no model or server changes required.
- 🧩 Introduce namespaces that act like “seeds” (sketched in the example below):
  - Within a namespace → random sampling stays IID.
  - Across namespaces → identical prompts yield identical results.
- ♻️ Cache intermediate reasoning steps to reuse them across runs.
✨ Result: Faster, cheaper, and reproducible inference — all with minimal effort.
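The snippet below is a minimal sketch of how namespaces are meant to behave, written against the quickstart client shown later in this README. The `namespace` keyword argument is hypothetical and only illustrates the idea; the released API may expose namespaces differently (see the paper and notebook for the exact interface).

```python
# Hypothetical sketch: the `namespace` argument below is illustrative only and
# may not match the released API; it stands in for CacheSaver's "seed"-like handle.
import asyncio
from cachesaver.models.openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(batch_size=2)
    request = dict(
        model="gpt-4.1-nano",
        messages=[{"role": "user", "content": "Name a prime number."}],
    )

    # Same namespace: the two calls count as independent draws (IID),
    # so they may return different completions.
    a = await client.chat.completions.create(**request, namespace="run-0")
    b = await client.chat.completions.create(**request, namespace="run-0")

    # Different namespace, identical prompt: cached samples are reused,
    # so this reproduces the earlier result without a new API call.
    c = await client.chat.completions.create(**request, namespace="run-1")
    print(a, b, c)

asyncio.run(main())
```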
- 🚀 Plug-and-Play: One-line integration.
- 🔁 Cache & Reuse: Avoid recomputation of repeated sub-problems.
- 🧩 Namespace Control: Fine-grained reproducibility without losing randomness.
- ⚙️ Universal Compatibility: Works with any LLM API or local model.
- 🧠 Lightweight: No server-side code, minimal memory overhead.
Install the latest release with its minimal dependencies:

```bash
pip install cachesaver
```

You can also install the latest version from source:
```bash
# Install with test dependencies (will also be published on PyPI soon)
git clone https://github.com/au-clan/cachesaver-core.git
cd cachesaver-core
pip install -e ".[test]"

# Run tests to verify everything works
pytest test/ -v
```
```python
import asyncio
from cachesaver.models.openai import AsyncOpenAI

async def main():
    # Wrap the OpenAI-style async client; requests are batched and cached client-side.
    client = AsyncOpenAI(batch_size=2)
    resp = await client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {"role": "user", "content": "What's the capital of France?"}
        ],
    )
    print(resp)

asyncio.run(main())
```

For more use cases and examples of CacheSaver in action, please see the accompanying Jupyter notebook.
We tested CacheSaver on common LLM research tasks such as hyperparameter tuning, ablation studies, and benchmarking, where many prompts or reasoning steps repeat across runs. By caching and reusing identical sub-queries on the client side, CacheSaver avoids redundant computation and achieves substantial savings: in our experiments, hyperparameter tuning became up to 6× cheaper and 7× faster, ablation studies 2.5× cheaper, and benchmarking about 2× cheaper, all without changing model behavior or code.
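For intuition, here is a toy sketch of the kind of workload behind these numbers, using only the quickstart API shown above. The sweep loop, prompt, and `width=...` labels are made up for illustration, and whether a repeated request is actually reused depends on the namespace configuration described earlier.

```python
# Toy illustration of an experiment sweep; the configurations and prompt are
# made up, and cache reuse depends on how namespaces are configured.
import asyncio
from cachesaver.models.openai import AsyncOpenAI

# A shared sub-problem that every sweep configuration re-asks verbatim.
SHARED_PROMPT = [{"role": "user", "content": "List three classic search algorithms."}]

async def main():
    client = AsyncOpenAI(batch_size=2)

    # Each configuration re-issues the identical sub-query. Under CacheSaver's
    # design, identical requests repeated across configurations or runs can be
    # served from the client-side cache instead of triggering new API calls --
    # this is where the cost and speed savings reported above come from.
    for config in ["width=2", "width=4", "width=8"]:
        resp = await client.chat.completions.create(
            model="gpt-4.1-nano",
            messages=SHARED_PROMPT,
        )
        print(config, "->", resp)

asyncio.run(main())
```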
We welcome all forms of feedback! Please open an issue for bugs, questions, or suggestions. Your input helps us improve CacheSaver, address common challenges efficiently, and build a stronger, more collaborative community.
Official BibTeX to be announced with EMNLP 2025.

