feat: add chunk hashes logger to MP server for offline data analysis by yoo-kumaneko · Pull Request #2928 · LMCache/LMCache

yoo-kumaneko · 2026-04-01T12:06:39Z

Summary

Record chunk hashes computed during lookup() to rotating JSONL files for offline analysis. Uses an async background thread to avoid adding I/O latency to the hot path. Files rotate on a configurable time interval (default 6h) and include human-readable timestamps and model names. Disabled by default (pass --chunk-hash-log-dir to enable).

Motivation

Collecting lookup chunk hashes allows us to analyze data distribution patterns.
These insights help guide infrastructure decisions, such as storage selection and capacity planning.

Offline Data Analysis Diagrams

As shown in the figure below, we can get a precise estimate of the hit rate for a given cache capacity.

We can also get other useful analysis results from the collected chunk data, like the rolling hit rate.
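One way such a hit-rate estimate could be derived from the logged data is to replay the recorded lookups against a simulated cache. The sketch below is only an illustration, not the PR's analysis script: it assumes each JSONL record carries a `chunk_hashes` list (as in the MP_LOOKUP metadata), assumes LRU eviction with capacity counted in chunks, and counts every chunk independently, whereas a real prefix lookup may stop at the first miss. The function names are hypothetical.

```python
import json
from collections import OrderedDict
from typing import Iterable, List


def read_lookups(jsonl_lines: Iterable[str]) -> List[List[str]]:
    """Parse JSONL records and keep only the per-lookup chunk hash lists."""
    return [json.loads(line)["chunk_hashes"] for line in jsonl_lines if line.strip()]


def simulate_lru_hit_rate(lookups: Iterable[List[str]], capacity_chunks: int) -> float:
    """Replay lookups against an LRU cache that holds `capacity_chunks` chunks."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    hits = 0
    total = 0
    for chunk_hashes in lookups:
        for chunk_hash in chunk_hashes:
            total += 1
            if chunk_hash in cache:
                hits += 1
                cache.move_to_end(chunk_hash)  # mark as most recently used
            else:
                cache[chunk_hash] = None
                if len(cache) > capacity_chunks:
                    cache.popitem(last=False)  # evict the least recently used chunk
    return hits / total if total else 0.0


# Hypothetical sample records standing in for lines of a log file.
sample_lines = [
    '{"chunk_hashes": ["a", "b", "c"]}',
    '{"chunk_hashes": ["a", "b", "d"]}',
    '{"chunk_hashes": ["a", "b", "c"]}',
]
lookups = read_lookups(sample_lines)
for capacity in (2, 4):
    print(capacity, simulate_lru_hit_rate(lookups, capacity))
```

Sweeping `capacity_chunks` over a range of values gives a hit-rate-versus-capacity curve; applying the same replay to a sliding window of records would give a rolling hit rate.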

Note

Medium Risk
Adds new request-level telemetry and file I/O in the MP server path (though gated by has_subscribers() and disabled by default), so regressions could impact lookup performance or disk usage when enabled.

Overview
Adds a new MP_LOOKUP observability event emitted during MPCacheEngine.lookup() containing per-request chunk hashes plus model/layout metadata, guarded by EventBus.has_subscribers() to avoid hot-path overhead when unused.
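The gating described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LMCache's code: `ToyEventBus`, its `subscribe` method, and the `lookup` helper are hypothetical stand-ins. Only the names `has_subscribers()`, `publish()`, `Event`, and `EventType.MP_LOOKUP` come from the PR, and their real signatures may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Callable, DefaultDict, Dict, List


class EventType(Enum):
    MP_LOOKUP = auto()


@dataclass
class Event:
    event_type: EventType
    session_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


Callback = Callable[[Event], None]


class ToyEventBus:
    """Hypothetical stand-in for an event bus; not LMCache's EventBus."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[EventType, List[Callback]] = defaultdict(list)

    def subscribe(self, event_type: EventType, callback: Callback) -> None:
        self._callbacks[event_type].append(callback)

    def has_subscribers(self, event_type: EventType) -> bool:
        # Cheap check: no event or metadata object is built here.
        return bool(self._callbacks.get(event_type))

    def publish(self, event: Event) -> None:
        for callback in self._callbacks.get(event.event_type, []):
            callback(event)


def lookup(bus: ToyEventBus, request_id: str, chunk_hashes: List[str]) -> int:
    """Pretend lookup: only builds the metadata dict when someone is listening."""
    if bus.has_subscribers(EventType.MP_LOOKUP):
        bus.publish(
            Event(
                event_type=EventType.MP_LOOKUP,
                session_id=request_id,
                metadata={"request_id": request_id, "chunk_hashes": list(chunk_hashes)},
            )
        )
    return len(chunk_hashes)


received: List[Event] = []
bus = ToyEventBus()
lookup(bus, "r1", ["h1"])  # no subscriber yet: nothing is built or published
bus.subscribe(EventType.MP_LOOKUP, received.append)
lookup(bus, "r2", ["h2", "h3"])  # now the event is constructed and delivered
print(len(received))
```

The design choice is that the check costs one dictionary lookup, while the skipped work (list comprehensions over dtypes and shapes, the metadata dict, the event object) scales with the request.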

Introduces LookupHashLoggingSubscriber with LookupHashLogConfig to write these events to rotating JSONL files (time/size based rotation with max-file retention), wires it into mp_observability/config.py via new CLI flags, and documents the new options/metadata contract. Includes a new test suite covering enable/disable behavior, rotation, retention, and JSON formatting.
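To make the rotation and retention behaviour concrete, here is a minimal sketch of a JSONL writer that rotates on elapsed time or file size and keeps only the newest files. It is an assumption-laden illustration, not the PR's LookupHashLoggingSubscriber: the class name, constructor parameters, and the sequence-numbered file names are hypothetical (the PR's files use a `lookup_hashes_*.jsonl` pattern whose exact contents are not shown here).

```python
import json
import os
import tempfile
import time
from typing import Any, Callable, Dict, List, Optional, TextIO


class RotatingJsonlWriter:
    """Hypothetical sketch of time/size-based rotation with max-file retention."""

    def __init__(
        self,
        log_dir: str,
        max_bytes: int,
        rotate_seconds: float,
        max_files: int,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._dir = log_dir
        self._max_bytes = max_bytes
        self._rotate_seconds = rotate_seconds
        self._max_files = max_files
        self._clock = clock
        self._files: List[str] = []
        self._fh: Optional[TextIO] = None
        self._opened_at = 0.0
        self._seq = 0
        os.makedirs(log_dir, exist_ok=True)

    def _rotate(self) -> None:
        if self._fh is not None:
            self._fh.close()
        self._seq += 1
        path = os.path.join(self._dir, f"lookup_hashes_{self._seq:06d}.jsonl")
        self._fh = open(path, "a", encoding="utf-8")
        self._opened_at = self._clock()
        self._files.append(path)
        # Retention: delete the oldest files beyond the configured limit.
        while len(self._files) > self._max_files:
            os.remove(self._files.pop(0))

    def write(self, record: Dict[str, Any]) -> None:
        now = self._clock()
        if (
            self._fh is None
            or now - self._opened_at >= self._rotate_seconds
            or self._fh.tell() >= self._max_bytes
        ):
            self._rotate()
        assert self._fh is not None
        self._fh.write(json.dumps(record) + "\n")
        self._fh.flush()

    def close(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None


with tempfile.TemporaryDirectory() as tmp_dir:
    writer = RotatingJsonlWriter(tmp_dir, max_bytes=60, rotate_seconds=3600, max_files=2)
    for i in range(6):
        writer.write({"request_id": f"r{i}", "chunk_hashes": ["h1", "h2"]})
    writer.close()
    remaining = sorted(os.listdir(tmp_dir))
print(remaining)
```

In this demo each record is roughly 50 bytes, so every file takes two records before the 60-byte limit triggers rotation, and the retention limit of two files removes the first file once the third is opened.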

Reviewed by Cursor Bugbot for commit 1ced3e0. Bugbot is set up for automated code reviews on this repo.

gemini-code-assist

Code Review

This pull request introduces an asynchronous ChunkHashLogger to record chunk hashes to rotating JSONL files during lookups in the multiprocess server, including new configuration options and CLI arguments. The review feedback highlights several areas for improvement: the file rotation logic should be decoupled from model name changes to prevent excessive file creation, the logger should initialize its file list from existing files to maintain retention limits across restarts, and file operations should be hardened with explicit encoding and better handle management.

yoo-kumaneko · 2026-04-03T05:10:47Z

@sammshen Would you like to take a look at this PR?
I added a lookup logger that records to a file the chunk hashes being looked up.
If the logger is None, it becomes a no-op, so there is no performance overhead.

chunxiaozheng · 2026-04-07T03:27:17Z

@yoo-kumaneko Thanks for your contribution! A minor question, will this have any performance impact?

maobaolong

@yoo-kumaneko Awesome feature, left some comments. @sammshen Would you like to take another look?

BTW, if you paste some analysis diagrams into the description, it would help reviewers quickly understand the motivation for this PR.

yoo-kumaneko · 2026-04-07T03:56:59Z

> @yoo-kumaneko Thanks for your contribution! A minor question, will this have any performance impact?

It should have a negligible performance effect. I ran a comparison test: as shown below, the hash logger has no visible effect on TTFT or the other metrics.

Hash logger turned on

============ Serving Benchmark Result ============
Successful requests:                     100       
Failed requests:                         0         
Maximum request concurrency:             4         
Request rate configured (RPS):           4.00      
Benchmark duration (s):                  105.11    
Total input tokens:                      1100000   
Total generated tokens:                  100       
Request throughput (req/s):              0.95      
Output token throughput (tok/s):         0.95      
Peak output token throughput (tok/s):    4.00      
Peak concurrent requests:                8.00      
Total token throughput (tok/s):          10466.40  
---------------Time to First Token----------------
Mean TTFT (ms):                          4170.34   
Median TTFT (ms):                        4562.69   
P99 TTFT (ms):                           7586.79   
==================================================

Hash logger turned off

============ Serving Benchmark Result ============
Successful requests:                     100       
Failed requests:                         0         
Maximum request concurrency:             4         
Request rate configured (RPS):           4.00      
Benchmark duration (s):                  105.32    
Total input tokens:                      1100000   
Total generated tokens:                  100       
Request throughput (req/s):              0.95      
Output token throughput (tok/s):         0.95      
Peak output token throughput (tok/s):    4.00      
Peak concurrent requests:                8.00      
Total token throughput (tok/s):          10444.90  
---------------Time to First Token----------------
Mean TTFT (ms):                          4182.33   
Median TTFT (ms):                        4560.56   
P99 TTFT (ms):                           7711.41   
==================================================
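For a side-by-side reading of the two runs, the relative TTFT differences can be computed directly from the figures above. This is just arithmetic on the reported numbers; the variable names are illustrative.

```python
# TTFT figures copied from the two benchmark runs above (milliseconds).
logger_on = {"mean": 4170.34, "median": 4562.69, "p99": 7586.79}
logger_off = {"mean": 4182.33, "median": 4560.56, "p99": 7711.41}

# Relative change of the logger-on run versus the logger-off baseline, in percent.
relative_change_pct = {
    name: (logger_on[name] - logger_off[name]) / logger_off[name] * 100.0
    for name in logger_on
}

for name, pct in relative_change_pct.items():
    print(f"{name} TTFT: {pct:+.2f}%")
```

All three metrics differ by less than 2% between the runs. With a single run per configuration, differences of this size are plausibly within run-to-run noise.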

yoo-kumaneko · 2026-04-07T06:58:13Z

> @yoo-kumaneko Awesome feature, left some comments. @sammshen Would you like to take another look?
>
> BTW, if you paste some analysis diagrams into the description, it would help reviewers quickly understand the motivation for this PR.

I've added the diagrams to the PR description.

maobaolong

LGTM. Thanks!

chunxiaozheng

LGTM!

sammshen · 2026-04-08T10:53:35Z

quick question, why are we not using the existing observability / prometheus modules?

sammshen

blocking until consulting @ApostaC and @royyhuang

sammshen · 2026-04-08T10:58:55Z

IIUC, it's:

- persistence for later analysis
- remember evicted chunks

sammshen

LGTM actually, the code changes to the other files, config.py and server.py, seem pretty minimal

Cherry-pick squashed changes from LMCache#2928 which adds a chunk hash file logger to the MP server for offline analysis.

Signed-off-by: root <crclq2018@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>

ApostaC

Just one quick comment, please see below.

In the meantime, if we want to add new events or change existing event metadata schema, we should update https://github.com/yoo-kumaneko/LMCache/blob/dev/lmcache/v1/mp_observability/EVENTS.md to reflect the changes. This makes the code more maintainable for other developers (and AI tools)

ApostaC · 2026-04-13T00:12:25Z

+        self._event_bus.publish(
+            Event(
+                event_type=EventType.MP_LOOKUP,
+                session_id=key.request_id,
+                metadata={
+                    "request_id": key.request_id,
+                    "chunk_hashes": chunk_hashes,
+                    "model_name": model_name,
+                    "chunk_size": self.chunk_size,
+                    "seq_len": len(key.token_ids),
+                    "dtypes": [str(d) for d in layout_desc.dtypes],
+                    "shapes": [list(s) for s in layout_desc.shapes],
+                },
+            )
+        )


Potential alternative: we reuse the EventType.MP_LOOKUP_PREFETCH_START and just add more metadata to it?
@royyhuang Good to have your thoughts here as well.

@ApostaC Yes, we can reuse EventType.MP_LOOKUP_PREFETCH_START. However, we’ll need to move the publication of this event to after the chunk hashes are computed (since they’re required) and after the layout == None check. Does that sound OK?

ApostaC · 2026-04-13T00:16:25Z

Btw, I really like the diagrams in the description! Will it be possible to put the analysis script in the lmcache/tools/ folder?

yoo-kumaneko · 2026-04-13T03:13:22Z

> Btw, I really like the diagrams in the description! Will it be possible to put the analysis script in the lmcache/tools/ folder?

Sure!

…tening

Add EventBus.has_subscribers() to cheaply check if any callback is registered for a given EventType. Gate the MP_LOOKUP publish in MPCacheEngine.lookup() behind this check so that the metadata dict (including dtype/shape list comprehensions) is never allocated when the lookup hash logger is disabled.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>

…LMCache#12)" This reverts commit ee037db. Signed-off-by: rigginschen <rigginschen@tencent.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: rigginschen <rigginschen@tencent.com>

yoo-kumaneko · 2026-04-13T16:54:17Z

EVENTS.md updated

ApostaC

LGTM!

ApostaC · 2026-04-13T17:43:05Z

@yoo-kumaneko Please fix the UT, thanks!

* Revert "feat: cherry-pick chunk hash file logger from PR LMCache#2928 (#12)"

  This reverts commit ee037db.

  Signed-off-by: rigginschen <rigginschen@tencent.com>

* feat: add chunk hash logger as EventBus subscriber

  Add JSONL-based chunk hash logging to the multiprocess server for offline analysis of KV cache behavior. Implemented as a ChunkHashLoggingSubscriber on the EventBus; no extra queue or worker thread needed. Includes configurable log rotation, chunk metadata (chunk_size, seq_len, dtypes, shapes), and CLI args.

  Signed-off-by: Ryan <crclq2018@gmail.com>
  Signed-off-by: rigginschen <rigginschen@tencent.com>

* refactor: rename ChunkHashLogger to LookupHashLogger

  Rename the chunk hash logging subscriber to lookup hash logger to better reflect that it logs hashes observed during lookup operations.

  - chunk_hash.py → lookup_hash.py
  - ChunkHashLogConfig → LookupHashLogConfig
  - ChunkHashLoggingSubscriber → LookupHashLoggingSubscriber
  - --chunk-hash-log-* CLI args → --lookup-hash-log-*
  - lookup_hashes_*.jsonl file name pattern
  - Update docs and tests accordingly

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: rigginschen <rigginschen@tencent.com>

* Use tell to get the accurate file size

  Signed-off-by: rigginschen <rigginschen@tencent.com>

---------

Signed-off-by: rigginschen <rigginschen@tencent.com>
Signed-off-by: Ryan <crclq2018@gmail.com>
Co-authored-by: rigginschen <rigginschen@tencent.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: kumaneko <71458228+yoo-kumaneko@users.noreply.github.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit d816b9c.

Move tests/v1/multiprocess/test_lookup_hash_logger.py to tests/v1/mp_observability/subscribers/logging/ to match the source file structure and ensure tests run under the standard CI suite.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>

…MCache#2928)

* feat: add chunk hash logger as EventBus subscriber

  Add JSONL-based chunk hash logging to the multiprocess server for offline analysis of KV cache behavior. Implemented as a ChunkHashLoggingSubscriber on the EventBus; no extra queue or worker thread needed. Includes configurable log rotation, chunk metadata (chunk_size, seq_len, dtypes, shapes), and CLI args.

  Signed-off-by: Ryan <crclq2018@gmail.com>
  Signed-off-by: rigginschen <rigginschen@tencent.com>

* refactor: rename ChunkHashLogger to LookupHashLogger

  Rename the chunk hash logging subscriber to lookup hash logger to better reflect that it logs hashes observed during lookup operations.

  - chunk_hash.py → lookup_hash.py
  - ChunkHashLogConfig → LookupHashLogConfig
  - ChunkHashLoggingSubscriber → LookupHashLoggingSubscriber
  - --chunk-hash-log-* CLI args → --lookup-hash-log-*
  - lookup_hashes_*.jsonl file name pattern
  - Update docs and tests accordingly

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: rigginschen <rigginschen@tencent.com>

* Use tell to get the accurate file size

  Signed-off-by: rigginschen <rigginschen@tencent.com>

* perf(mp): skip MP_LOOKUP event construction when no subscriber is listening

  Add EventBus.has_subscribers() to cheaply check if any callback is registered for a given EventType. Gate the MP_LOOKUP publish in MPCacheEngine.lookup() behind this check so that the metadata dict (including dtype/shape list comprehensions) is never allocated when the lookup hash logger is disabled.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: rigginschen <rigginschen@tencent.com>

* docs(mp): document MP_LOOKUP event metadata contract in EVENTS.md

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: rigginschen <rigginschen@tencent.com>

* test(mp): move lookup hash logger tests to correct directory

  Move tests/v1/multiprocess/test_lookup_hash_logger.py to tests/v1/mp_observability/subscribers/logging/ to match the source file structure and ensure tests run under the standard CI suite.

  Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: kumaneko <crclq2018@gmail.com>

---------

Signed-off-by: Ryan <crclq2018@gmail.com>
Signed-off-by: rigginschen <rigginschen@tencent.com>
Signed-off-by: kumaneko <71458228+yoo-kumaneko@users.noreply.github.com>
Signed-off-by: kumaneko <crclq2018@gmail.com>
Co-authored-by: rigginschen <rigginschen@tencent.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed Apr 1, 2026

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

cursor Bot reviewed Apr 1, 2026

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

maobaolong reviewed Apr 1, 2026

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

yoo-kumaneko force-pushed the feature/chunk-hash-logger branch from c842e81 to 5676ad6 Compare April 1, 2026 13:09

cursor Bot reviewed Apr 1, 2026

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

maobaolong added the mp_mode label Apr 2, 2026

yoo-kumaneko requested a review from maobaolong April 3, 2026 05:15

maobaolong reviewed Apr 7, 2026

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

maobaolong reviewed Apr 7, 2026

Comment thread lmcache/v1/multiprocess/chunk_hash_logger.py Outdated

maobaolong reviewed Apr 7, 2026

Comment thread lmcache/v1/multiprocess/server.py Outdated

maobaolong reviewed Apr 7, 2026

yoo-kumaneko requested review from ApostaC, deng451e, hickeyma and sammshen as code owners April 7, 2026 03:56

cursor Bot reviewed Apr 7, 2026

Comment thread tests/v1/mp_observability/subscribers/logging/test_lookup_hash_logger.py

maobaolong approved these changes Apr 7, 2026

cursor Bot reviewed Apr 7, 2026

Comment thread lmcache/v1/multiprocess/config.py Outdated

chunxiaozheng approved these changes Apr 7, 2026

maobaolong added the full label (Run comprehensive tests on this PR) Apr 7, 2026

sammshen requested changes Apr 8, 2026

sammshen approved these changes Apr 8, 2026

ApostaC reviewed Apr 13, 2026

yoo-kumaneko changed the title ~~feat: add chunk hash file logger to MP server for offline analysis~~ feat: add chunk hashes logger to MP server for offline data analysis Apr 13, 2026

auto-merge was automatically disabled April 13, 2026 15:33
Head branch was pushed to by a user without write access

cursor Bot reviewed Apr 13, 2026

Comment thread lmcache/v1/mp_observability/event.py

github-actions Bot removed the full label (Run comprehensive tests on this PR) Apr 13, 2026

yoo-kumaneko pushed a commit to yoo-kumaneko/LMCache that referenced this pull request Apr 13, 2026

Revert "feat: cherry-pick chunk hash file logger from PR LMCache#2928 (…

4333c53

…LMCache#12)" This reverts commit ee037db. Signed-off-by: rigginschen <rigginschen@tencent.com>

rigginschen and others added 2 commits April 14, 2026 00:53

docs(mp): document MP_LOOKUP event metadata contract in EVENTS.md

9a05a13

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: rigginschen <rigginschen@tencent.com>

Merge branch 'dev' into feature/chunk-hash-logger

de03baf

ApostaC approved these changes Apr 13, 2026

ApostaC enabled auto-merge (squash) April 13, 2026 17:42

ApostaC disabled auto-merge April 13, 2026 17:43

github-actions Bot added and removed the full label (Run comprehensive tests on this PR) Apr 13, 2026

Merge branch 'dev' into feature/chunk-hash-logger

5b096db

Signed-off-by: kumaneko <71458228+yoo-kumaneko@users.noreply.github.com>

cursor Bot reviewed Apr 14, 2026

Comment thread lmcache/v1/mp_observability/subscribers/logging/lookup_hash.py

Merge branch 'dev' into feature/chunk-hash-logger

d816b9c

Signed-off-by: kumaneko <71458228+yoo-kumaneko@users.noreply.github.com>

cursor Bot reviewed Apr 14, 2026

Comment thread tests/v1/mp_observability/subscribers/logging/test_lookup_hash_logger.py

yoo-kumaneko and others added 2 commits April 14, 2026 15:57

Merge branch 'dev' into feature/chunk-hash-logger

1ced3e0

chunxiaozheng enabled auto-merge (squash) April 14, 2026 11:01

github-actions Bot added the full label (Run comprehensive tests on this PR) Apr 14, 2026

chunxiaozheng merged commit cfb5c52 into LMCache:dev Apr 14, 2026
39 checks passed
