
feat(tools): add LRU cache simulator for lookup-hash JSONL logs #3021

Merged
ApostaC merged 16 commits into LMCache:dev from yoo-kumaneko:feature/cache-simulator
Apr 15, 2026

Conversation

@yoo-kumaneko (Contributor) commented Apr 13, 2026

Adds lmcache/tools/cache_simulator/ with four modules:

  • lru_cache.py — LRUCacheFast (O(1)) and LRUCache (O(log n) with
    position tracking) backed by OrderedDict / SortedList
  • simulator.py — load_lookup_events(), simulate(), print_statistics(),
    plot_statistics(), and a CLI
  • plot_hit_rate.py — capacity sweep over log-spaced GiB range + matplotlib
    plot
  • README.md — user and developer documentation
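The O(1) LRU variant's core mechanism can be sketched with an `OrderedDict` (a minimal illustration of the idea, not the actual `LRUCacheFast` API; class and method names here are assumptions):

```python
from collections import OrderedDict


class TinyLRU:
    """Illustrative O(1) LRU: OrderedDict insertion order tracks recency."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used


cache = TinyLRU(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touching "a" makes it most recent
cache.put("c", 3)     # capacity exceeded: evicts "b", not "a"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

The O(log n) variant trades this constant-time bookkeeping for a `SortedList` that can additionally answer "what position does this chunk occupy in the recency order", which feeds the cache-position panel in the statistics plot.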

Quick Start

# 1. Collect logs from a live server
lmcache server --lookup-hash-log-dir /data/lmcache/lookup_hashes ...

# 2. Simulate at a fixed capacity — prints text report and saves a PNG chart
python3 -m lmcache.tools.cache_simulator.simulator \
    -i /data/lmcache/lookup_hashes \
    --cache-capacity-gib 64 \
    -o stats.png

# 3. Sweep across capacities to find the right cache size
python3 -m lmcache.tools.cache_simulator.plot_hit_rate \
    -i /data/lmcache/lookup_hashes \
    --min-capacity-gib 1 \
    --max-capacity-gib 512 \
    --points 30 \
    -o sweep.png

The primary metric is token cache hit rate:

hit_tokens / total_tokens

where hit_tokens = hit_prefix_chunks × chunk_size. Tail tokens (seq_len mod chunk_size) are always counted as misses. Cache capacity is expressed in bytes; the CLI accepts GiB and auto-computes bytes-per-chunk from shapes/dtypes in the first event.
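The metric above can be sketched in a few lines (a simplified stand-in: the real JSONL records carry more fields than the `(hit_prefix_chunks, seq_len)` pairs assumed here):

```python
def token_hit_rate(events, chunk_size):
    """Token-level hit rate; tail tokens (seq_len % chunk_size) never count as hits."""
    hit_tokens = sum(hits * chunk_size for hits, _ in events)
    total_tokens = sum(seq_len for _, seq_len in events)
    return hit_tokens / total_tokens if total_tokens else 0.0


# chunk_size=256: request 1 hits 3 full chunks out of 1000 tokens,
# request 2 hits 1 full chunk out of 300 tokens -> (768 + 256) / 1300
rate = token_hit_rate([(3, 1000), (1, 300)], 256)
print(round(rate, 4))  # 0.7877
```

Note that even a request whose prefix is fully cached cannot reach a 100% hit rate unless its length is an exact multiple of `chunk_size`, since the tail remainder is always a miss.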

Running simulator.py prints a full text report and saves a 7-panel statistics PNG (per-request hit rate, hit prefix length, chunk reuse count, rolling hit rate, input length, global span, cache position).

Example diagrams (images omitted): hit_rate_vs_capacity sweep and per-request statistics panels.

You can use the following chunk hashes data to plot the diagrams above.
https://drive.google.com/file/d/18jxUlI_J9sT_Mis3nft0nYEudwvJJ9UI/view?usp=drive_link


Note

Medium Risk
Introduces new CLI entrypoints and new optional runtime dependencies (e.g., matplotlib, sortedcontainers, transformers) that may affect packaging/CLI startup if dependency sets are misconfigured. Core server/runtime logic is otherwise untouched, so functional risk is mostly limited to the new tool surface area.

Overview
Adds a new lmcache tool command group, including lmcache tool cache-simulator {simulate,sweep,gen-dataset} for offline analysis of lookup-hash JSONL logs.

Introduces a cache-simulator implementation under lmcache/tools/cache_simulator/ with LRU cache models, log loading, token hit-rate simulation (including prefix-hit semantics and tail-token misses), reporting/plotting, and a dataset generator for vllm bench serve.

Updates CLI dependency set (requirements/cli.txt) to include matplotlib, and adds a focused test suite covering the LRU caches, event loading, and simulation correctness.

Reviewed by Cursor Bugbot for commit d7adca9.


Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a cache simulator tool for LMCache, featuring LRU cache implementations, a simulation engine, and utilities for plotting token hit rates against cache capacity. The review feedback highlights several compliance and quality issues: the lack of unit or integration tests for the new feature, missing docstrings for public CLI entry points in violation of the project style guide, and a recommendation to use isinstance for better type safety in the simulation logic.

Comment thread lmcache/tools/cache_simulator/simulator.py
Comment thread lmcache/tools/cache_simulator/simulator.py Outdated
Comment thread lmcache/tools/cache_simulator/simulator.py Outdated
Comment thread lmcache/tools/cache_simulator/plot_hit_rate.py Outdated
Comment thread lmcache/tools/cache_simulator/simulator.py
Comment thread lmcache/tools/cache_simulator/simulator.py
Contributor

@ApostaC ApostaC left a comment


LGTM! It would be more helpful to put a few screenshots of the results into the README.md to give users a better understanding of the expected outcome.

Contributor


Thanks for the contribution! This is very useful!
Usage-wise, can you put it under lmcache cli, so that we can run
lmcache tool cache_simulator
instead of using
python -m lmcache.tools.cache_simulator.simulator?

Contributor Author

@yoo-kumaneko yoo-kumaneko Apr 14, 2026


Done. Now we can use it like this:

# 1. Collect logs from a live server (see Step 1 below)
lmcache server --lookup-hash-log-dir /data/lmcache/lookup_hashes ...

# 2. Simulate at a fixed capacity — prints text report and saves a PNG chart
lmcache tool cache-simulator simulate \
    -i /data/lmcache/lookup_hashes \
    --cache-capacity-gib 64 \
    -o stats.png

# 3. Sweep across capacities to find the right cache size
lmcache tool cache-simulator sweep \
    -i /data/lmcache/lookup_hashes \
    --min-capacity-gib 1 \
    --max-capacity-gib 512 \
    --points 30 \
    -o sweep.png

Integrates the cache simulator into the lmcache CLI so users can run:

  lmcache tool cache-simulator simulate -i <logs> --cache-capacity-gib 64
  lmcache tool cache-simulator sweep    -i <logs> --min-capacity-gib 1 --max-capacity-gib 512

`simulate` replays lookup-hash JSONL logs at a fixed cache capacity,
prints a text report, and saves a 7-panel statistics PNG.
`sweep` scans a log-spaced range of capacities and saves a hit-rate
vs capacity PNG.

The python -m lmcache.tools.cache_simulator.{simulator,plot_hit_rate}
entry points continue to work unchanged.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Comment thread lmcache/cli/commands/tool/__init__.py Outdated
…cation

Adds four public functions that serve as the single source of truth for
CLI flag definitions and execution logic:

  simulator.py:       add_simulate_arguments(parser), run_simulate(args)
  plot_hit_rate.py:   add_sweep_arguments(parser),    run_sweep(args)

ToolCommand now calls these instead of duplicating the flags itself.
Adding or removing a flag in the simulator modules automatically takes
effect in both `python -m ...` and `lmcache tool cache-simulator ...`.
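The single-source-of-truth pattern looks roughly like this (a sketch: the `add_simulate_arguments`/`run_simulate` names follow the commit text, but the flag defaults and function bodies are assumptions):

```python
import argparse


def add_simulate_arguments(parser: argparse.ArgumentParser) -> None:
    """Flag definitions live in one place, shared by every entry point."""
    parser.add_argument("-i", "--input-dir", required=True)
    parser.add_argument("--cache-capacity-gib", type=float, default=64.0)


def run_simulate(args: argparse.Namespace) -> str:
    """Execution logic, also shared; returns a summary for illustration."""
    return f"simulate {args.input_dir} @ {args.cache_capacity_gib} GiB"


# Any front end (python -m ... or lmcache tool ...) builds on the same pair:
parser = argparse.ArgumentParser(prog="simulate")
add_simulate_arguments(parser)
args = parser.parse_args(["-i", "/tmp/logs", "--cache-capacity-gib", "32"])
print(run_simulate(args))  # simulate /tmp/logs @ 32.0 GiB
```

Because both entry points call the same two functions, a flag added to `add_simulate_arguments` appears everywhere with no duplication to drift out of sync.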

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Comment thread lmcache/tools/cache_simulator/simulator.py Outdated
Comment thread lmcache/tools/cache_simulator/simulator.py
ToolCommand in __init__.py is now a thin dispatcher (~60 lines).
Cache-simulator wiring moves to tool/cache_simulator.py.

Adding a future tool requires only:
  1. Create tool/<new_tool>.py with a register() function
  2. Import it in __init__.py and call new_tool.register(inner)
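The two steps above can be sketched with argparse subparsers (the `register()` name comes from the commit text; the handler wiring is an assumption about how dispatch might look):

```python
import argparse


# Step 1: each tool module exposes a register() that wires its own subparser.
def register(subparsers) -> None:
    p = subparsers.add_parser("cache-simulator")
    p.set_defaults(handler=lambda args: "cache-simulator selected")


# Step 2: the thin dispatcher imports the module and calls register(inner).
parser = argparse.ArgumentParser(prog="lmcache tool")
inner = parser.add_subparsers(dest="tool")
register(inner)  # one register(inner) call per tool module

args = parser.parse_args(["cache-simulator"])
print(args.handler(args))  # cache-simulator selected
```

The dispatcher never needs to know any tool's flags; it only aggregates `register()` calls.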

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Comment thread lmcache/cli/commands/tool/cache_simulator.py
yoo-kumaneko and others added 3 commits April 14, 2026 14:58
Adds a "CLI integration" subsection under "For Developers" that:
- shows the tool/ package layout alongside the simulator package
- explains that add_*_arguments/run_* are the single source of truth
- tells developers where to edit when adding a flag vs a new action

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Replaces all python3 -m ... invocations in Quick Start, Step 2, Step 3,
and CLI Reference with lmcache tool cache-simulator simulate/sweep.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
…ulator

* simulator.py: replace hasattr(cache, "position") + type: ignore with
  isinstance(cache, LRUCache) for proper type narrowing

* tests/tools/test_cache_simulator.py: 26 unit tests covering
  LRUCacheFast, LRUCache, compute_kv_bytes_per_chunk,
  load_lookup_events, and simulate (including prefix semantics,
  tail-token misses, eviction, and fast vs normal mode parity)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
@yoo-kumaneko yoo-kumaneko requested a review from hickeyma as a code owner April 14, 2026 07:19
@yoo-kumaneko yoo-kumaneko requested a review from KuntaiDu April 14, 2026 07:22
Comment thread lmcache/tools/cache_simulator/__init__.py
yoo-kumaneko and others added 3 commits April 14, 2026 15:33
The cache simulator tool (lmcache tool cache-simulator) uses
matplotlib for generating PNG charts. Declaring it in cli.txt
ensures it is always available when the lmcache CLI is installed,
avoiding ImportError on any lmcache invocation.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Without this file setuptools.find_packages() does not discover
lmcache.tools or its sub-packages, causing ImportError on a
standard (non-editable) pip install.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Adds simulate_example.png and sweep_example.png under docs/ and
references them in Step 2 and Step 3 of the README, as requested
by ApostaC in the PR review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
@yoo-kumaneko
Contributor Author

Screenshots added to README.md

Comment thread lmcache/tools/cache_simulator/lru_cache.py
yoo-kumaneko and others added 2 commits April 14, 2026 17:03
…serve dataset

Add gen_bench_dataset.py which converts LMCache lookup-hash JSONL logs
into a vllm bench serve custom dataset (JSONL with "prompt" and
"output_tokens" fields).  The conversion preserves prefix-sharing
structure: requests that shared a chunk hash in the original logs will
share the same token prefix in the synthetic prompts, so LMCache prefix
caching sees the same hit/miss pattern during replay.

Algorithm: build a stable safe vocabulary from the tokenizer (tokens
that round-trip through encode/decode cleanly), then deterministically
map each chunk hash to chunk_size token IDs via SHA-256 seeded RNG.
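The hash-to-tokens step of the algorithm can be sketched as follows (function name and the 8-byte seed truncation are assumptions; the real generator builds `vocab` from the tokenizer's round-trip-safe tokens):

```python
import hashlib
import random


def chunk_hash_to_tokens(chunk_hash: str, chunk_size: int, vocab: list) -> list:
    """Deterministically map a chunk hash to chunk_size token IDs.

    Seeding an RNG with SHA-256(chunk_hash) makes the mapping stable across
    runs, so requests sharing a chunk hash always get identical token
    prefixes, preserving the original hit/miss pattern on replay.
    """
    seed = int.from_bytes(hashlib.sha256(chunk_hash.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.choice(vocab) for _ in range(chunk_size)]


vocab = list(range(100, 200))  # stand-in for the tokenizer-derived safe vocab
a = chunk_hash_to_tokens("deadbeef", 4, vocab)
b = chunk_hash_to_tokens("deadbeef", 4, vocab)
print(a == b)  # True: same hash, same tokens, on every run
```

Determinism is the key property: two logs replayed days apart still produce byte-identical prompts for shared prefixes.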

Also wire "gen-dataset" as a new sub-action of
`lmcache tool cache-simulator` and update the README with Step 4.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
The dataset generation step is not required for the core cache simulator
workflow. Keep the gen-dataset command available but remove it from the
main README flow (Table of Contents, Quick Start, and step sections).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
@yoo-kumaneko yoo-kumaneko force-pushed the feature/cache-simulator branch from c644b38 to 3e1a886 Compare April 14, 2026 10:53
yoo-kumaneko and others added 2 commits April 14, 2026 19:03
matplotlib is only needed when actually plotting (plot_statistics /
run_sweep).  Moving the import inside those functions lets the module
be imported — and all unit tests collected — without matplotlib
installed, fixing the CI "1 error" collection failure.
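This is the standard deferred-import pattern; a sketch (the function body is an assumption based on the description above):

```python
def plot_statistics(stats: dict, output_path: str) -> None:
    """Plotting dependency is loaded only when a plot is actually requested."""
    import matplotlib.pyplot as plt  # moved from module top to call site

    fig, ax = plt.subplots()
    ax.plot(stats["x"], stats["y"])
    fig.savefig(output_path)


# Importing the module (and collecting its tests) never touches matplotlib;
# only invoking plot_statistics() does.
print(callable(plot_statistics))  # True
```

The cost is that a missing dependency surfaces at plot time rather than import time, which is the right trade-off for an optional feature.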

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 0d82a9f.

Comment thread lmcache/tools/cache_simulator/gen_bench_dataset.py
Comment thread lmcache/tools/cache_simulator/gen_bench_dataset.py
@ApostaC ApostaC enabled auto-merge (squash) April 14, 2026 20:17
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Apr 14, 2026
Contributor

@KuntaiDu KuntaiDu left a comment


Design doc is missing, but functionality-wise LGTM

@ApostaC ApostaC merged commit e64b6e3 into LMCache:dev Apr 15, 2026
36 of 38 checks passed
ftian1 pushed a commit to ftian1/LMCache that referenced this pull request Apr 20, 2026
…che#3021)

* Add LRU cache simulator for lookup-hash JSONL logs

* Adds lmcache/tools/cache_simulator/ with four modules:

Signed-off-by: crclq2018 <crclq2018@gmail.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Co-authored-by: rigginschen <rigginschen@tencent.com>
