
[MP] Add tracing functionality for storage manager #3063

Merged
ApostaC merged 6 commits into LMCache:dev from ApostaC:claude/silly-bhaskara
Apr 18, 2026

Conversation


@ApostaC ApostaC commented Apr 16, 2026

What this PR does / why we need it:

Adds a trace-recording subsystem under lmcache/v1/mp_observability/trace/ that captures StorageManager public-API calls to a binary trace file. This enables offline inspection and replay of cache workloads for debugging and benchmarking.

User-facing additions:

  • --trace-level storage: enable trace recording at the storage-manager level (only storage is supported today).
  • --trace-output <path>: target trace file. Defaults to a timestamped file under $TMPDIR when --trace-level is set without an explicit path.

Public API under lmcache.v1.mp_observability.trace:

  • enable_tracing decorator — publishes a TRACE_CALL event on each call (entry-only, inputs only). When tracing is disabled (the default), this adds a single boolean attribute load per call.
  • StorageTraceRecorderEventSubscriber that writes TRACE_CALL events to the binary trace file.
  • TraceReader — streaming reader for inspection.
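
For orientation, here is a minimal sketch of how the decorator gating might look. The module-level gate flag and the `publish_trace_call` helper below are illustrative assumptions; only the names `enable_tracing` and `TRACE_CALL` come from this PR.

```python
# Sketch only: the real decorator lives in lmcache.v1.mp_observability.trace.
# The gate flag, payload shape, and publish helper below are assumptions.
import functools
import inspect

_TRACING_ENABLED = False  # flipped on when --trace-level storage is set


def publish_trace_call(method: str, arguments: dict) -> None:
    """Hypothetical stand-in for the EventBus TRACE_CALL publish."""
    print(f"TRACE_CALL {method}: {arguments}")


def enable_tracing(func):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Disabled path: one boolean check, then straight to the call.
        if _TRACING_ENABLED:
            bound = sig.bind(*args, **kwargs)  # entry-only, inputs only
            publish_trace_call(func.__qualname__, dict(bound.arguments))
        return func(*args, **kwargs)

    return wrapper
```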

Special notes for your reviewers:

  • Trace payloads are serialized via custom codecs (trace/codecs.py) to keep the binary format compact and stable across runs.
  • The recorder is wired into MP servers (server.py, blend_server_v2.py) via maybe_register_trace_recorder called from the observability lifecycle.
  • Replay lives in a separate package and is not part of this PR.

If applicable:

  • this PR contains user-facing changes - docs added
  • this PR contains unit tests

Note

Medium Risk
Adds new file I/O and event publication in core StorageManager call paths when enabled, plus new CLI/config surface; disabled-by-default gating limits impact but regressions could affect performance or shutdown/flush behavior.

Overview
Adds a new binary trace-recording path for MP mode that captures StorageManager public-API calls as unified EventType.TRACE_CALL events and persists them to a length-prefixed msgpack trace file (new trace/ package with decorator, codec registry, file format, recorder, and streaming reader).

Wires the feature into lmcache server via new observability flags --trace-level storage and --trace-output, initializes the recorder in MP server entry points, and instruments StorageManager methods (including manual enter/exit events for the read_prefetched_results context manager). Also moves PrefetchHandle into distributed/api.py, updates docs/contracts, and adds unit tests covering gating/overhead, codec round-trips, and end-to-end record/write/read behavior.
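
To make the on-disk format concrete, here is a sketch of a length-prefixed msgpack stream as described above. The 4-byte big-endian prefix and the helper names are assumptions; the PR's actual `_LEN_STRUCT`, magic, and header layout may differ.

```python
# Sketch of a length-prefixed msgpack stream; prefix width/endianness and
# helper names are assumptions, not the PR's exact format.
import struct

import msgpack

LEN_STRUCT = struct.Struct(">I")  # assumed 4-byte big-endian length prefix


def write_frame(fd, payload: dict) -> None:
    frame = msgpack.packb(payload, use_bin_type=True)
    # One write keeps the length prefix and body together in the stream.
    fd.write(LEN_STRUCT.pack(len(frame)) + frame)


def read_frames(fd):
    while True:
        prefix = fd.read(LEN_STRUCT.size)
        if len(prefix) < LEN_STRUCT.size:
            return  # truncated tail: stop cleanly, as the reader tolerates
        (length,) = LEN_STRUCT.unpack(prefix)
        body = fd.read(length)
        if len(body) < length:
            return
        yield msgpack.unpackb(body, raw=False)
```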

Reviewed by Cursor Bugbot for commit 61cf5cb. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Comment thread lmcache/v1/mp_observability/trace/recorder.py Outdated
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a trace recording subsystem for LMCache, enabling the capture of StorageManager API calls into a binary file for later replay. The implementation includes a tracing decorator, type codecs for msgpack serialization, a binary file format, and integration into the cache server. Feedback highlights a critical issue where overwriting the file header in-place with a different size would corrupt the length-prefixed stream, and suggests combining the length prefix and frame body into a single write operation to ensure atomicity and improve performance.

Comment on lines +116 to +136
```python
def attach_storage_config(self, config: StorageManagerConfig) -> None:
    """Rewrite the header now that the StorageManagerConfig is known.

    The header is initially written with empty config fields so the
    recorder can be constructed before the StorageManager. The
    server lifecycle calls this method once the config is in hand.

    For long traces this means the on-disk header at offset 0 is
    re-written; the file position advances forward as records
    accumulate. Implementation: seek to 0, overwrite the header
    frame in-place, seek back to end.
    """
    with self._lock:
        if self._closed:
            return
        sm_json = json.dumps(self._safe_config_dict(config), sort_keys=True)
        digest = hashlib.sha256(sm_json.encode("utf-8")).hexdigest()
        cur = self._fd.tell()
        self._fd.seek(0)
        self._write_header(sm_json, digest)
        self._fd.seek(cur)
```

critical

The attach_storage_config method seeks to the beginning of the file and overwrites the header frame. Since the trace file is a sequential length-prefixed msgpack stream, overwriting the header with a frame of a different size (which is guaranteed here as the initial header has empty config strings while the updated one contains the full JSON config) will corrupt the stream. The new, larger header will overwrite the start of the subsequent Record frames, making the file unreadable by the TraceReader or any replay tool.

Instead of overwriting the header in-place, consider using a dedicated 'Configuration' record type that can be appended to the stream whenever the configuration is available or changes.
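
One way to realize that suggestion, sketched only: append a dedicated configuration record to the stream rather than rewriting offset 0. The record-type tag and field names here are assumptions, not the PR's schema.

```python
# Sketch: append a configuration record instead of rewriting the header
# at offset 0. Tag and field names are assumptions.
import hashlib
import json
import struct

import msgpack

_LEN = struct.Struct(">I")  # assumed length-prefix layout


def append_config_record(fd, config_dict: dict) -> None:
    sm_json = json.dumps(config_dict, sort_keys=True)
    record = {
        "kind": "config",
        "storage_manager_config": sm_json,
        "sha256": hashlib.sha256(sm_json.encode("utf-8")).hexdigest(),
    }
    frame = msgpack.packb(record, use_bin_type=True)
    fd.write(_LEN.pack(len(frame)) + frame)  # appended at the current end
```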

Comment on lines +185 to +186
```python
self._fd.write(_LEN_STRUCT.pack(len(frame)))
self._fd.write(frame)
```

medium

The _write_frame method performs two separate write() calls to the unbuffered file descriptor (buffering=0). This breaks the atomicity mentioned in the format.py documentation (line 13), which relies on a single write() call for frames smaller than PIPE_BUF. Additionally, performing two system calls per record is inefficient for high-frequency tracing.

Combine the length prefix and the frame body into a single bytes object before writing.

Suggested change:

```diff
-self._fd.write(_LEN_STRUCT.pack(len(frame)))
-self._fd.write(frame)
+self._fd.write(_LEN_STRUCT.pack(len(frame)) + frame)
```

@@ -37,6 +37,14 @@
from lmcache.v1.memory_management import MemoryObj

Contributor

Design doc plz

Contributor Author

updated

Comment thread lmcache/v1/mp_observability/trace/recorder.py Outdated
Comment thread lmcache/v1/mp_observability/trace/__init__.py
Signed-off-by: ApostaC <yihua98@uchicago.edu>

@ApostaC ApostaC left a comment


PTAL

Comment on lines +545 to +550


```python
# Now that ``PrefetchHandle`` exists in this module's namespace, finish
# wiring its trace codec. Done here (rather than at codec import time)
# to avoid a ``storage_manager → trace.codecs → storage_manager`` cycle.
register_prefetch_handle_codec()
```

Contributor Author

This is not good. The correct way to break the circular import is to move PrefetchHandle into lmcache/v1/distributed/api.py, and this makes more sense because it's a module-level API data class.

```python
event_bus = init_observability(obs_config)

# Wire up the trace recorder (no-op when --trace-level is unset).
maybe_register_trace_recorder(event_bus, obs_config, storage_manager_config)
```

Contributor Author

Can we rename it to maybe_initialize_trace_recorder?

Comment on lines +52 to +53
```python
lmcache_version: str
"""``lmcache.__version__`` of the recording process."""
```

Contributor Author

Can you confirm whether LMCache has lmcache.__version__? I don't think so for now. Maybe maintain a separate version in this file? One potential problem is that it cannot capture interface changes in the storage manager and the other functions that are going to be captured.
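
A sketch of the "separate version" idea; the constant and header field names below are assumptions, not part of the PR:

```python
# Sketch: a trace-format version maintained in the trace package itself,
# bumped whenever the traced StorageManager interface or the frame schema
# changes. Names are assumptions.
TRACE_FORMAT_VERSION = 1


def build_header_fields() -> dict:
    try:
        import lmcache
        lmcache_version = getattr(lmcache, "__version__", "unknown")
    except ImportError:
        lmcache_version = "unknown"
    return {
        "trace_format_version": TRACE_FORMAT_VERSION,
        "lmcache_version": lmcache_version,
    }
```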

Contributor Author

Will the recorder have a close function? (Not sure if it's needed.)
The real question behind this: how do we make sure the output trace file is closed correctly?
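
One possible answer, sketched under the assumption that the recorder exposes close() and that the EventBus calls each subscriber's shutdown() on stop(); the atexit fallback is an illustrative extra, not necessarily what the PR does.

```python
# Sketch: guarantee flush-and-close even if the normal shutdown path is
# skipped. Assumes a close() method and an EventBus that calls shutdown().
import atexit


class RecorderCloseSketch:
    def __init__(self, fd):
        self._fd = fd
        self._closed = False
        atexit.register(self.close)  # fallback if stop() never runs

    def shutdown(self) -> None:
        # Invoked by the EventBus lifecycle on stop(); delegates to close().
        self.close()

    def close(self) -> None:
        if self._closed:
            return
        self._closed = True
        self._fd.flush()
        self._fd.close()
```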

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Comment thread tests/v1/mp_observability/trace/test_codecs.py Outdated
Signed-off-by: ApostaC <yihua98@uchicago.edu>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.



```python
recorder = StorageTraceRecorder(output_path=output_path)
recorder.attach_storage_config(storage_manager_config)
bus.register_subscriber(recorder)
```

No cleanup on error leaves gate on and fd leaked

Medium Severity

If attach_storage_config or bus.register_subscriber raises after the StorageTraceRecorder constructor succeeds, the tracing gate remains permanently ON (set inside __init__) and the file descriptor leaks because close() is never called. Since the recorder is never registered on the bus, EventBus.stop() won't invoke its shutdown() either. Every subsequent traced call then pays argument-binding overhead for events that no subscriber will ever process. A try/except wrapping these calls that calls recorder.close() on failure would prevent the resource leak and stuck gate.
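
A sketch of the suggested cleanup, reusing the names from the snippet above; that close() flips the gate back off is assumed behavior, and this is not the merged code.

```python
# Sketch of the suggested fix: if wiring fails after construction, close the
# recorder so the tracing gate is turned back off and the fd is released.
recorder = StorageTraceRecorder(output_path=output_path)
try:
    recorder.attach_storage_config(storage_manager_config)
    bus.register_subscriber(recorder)
except Exception:
    recorder.close()  # assumed to flip the gate off and close the fd
    raise
```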



@sammshen sammshen left a comment


LGTM!

@@ -0,0 +1,376 @@
# Trace Recording (`lmcache trace`)

Contributor

High-level question: should we trace MP server IPC calls, or just trace L1 calls?

@KuntaiDu KuntaiDu Apr 17, 2026

For me, L1 calls are the easiest to implement, require no GPU, and are useful --- I am OK with keeping it. But IPC calls seem most relevant for evaluating GPU-related race conditions.

Contributor Author

We will do MPCacheEngine- and MQ (IPC)-level tracing as the next step. For this version, let's just do storage-level tracing.


All captured calls publish the same `EventType.TRACE_CALL` event:

Contributor

I'm not familiar with the EventBus impl. Will lmcache trace invoke too many events and cause the EventBus to overflow?

Contributor Author

Not too many. Usually, a single call to one of StorageManager's top-level functions emits multiple events, but that is still not enough to overflow the event bus.

In the worst case, the event bus has overflow handling, so memory usage won't explode.

`attach_storage_config`; gate flip on init / off on close;
publish-via-EventBus end-to-end (codec encode → file → reader → codec
decode); truncated-tail tolerance; bad-magic rejection;
`dropped_count` increments on unencodable args.

@KuntaiDu KuntaiDu Apr 17, 2026

NIT: ideally the doc should be shorter --- this doc is for both humans and LLMs.


Replaying a recorded trace is delivered separately via the
``lmcache trace`` and ``lmcache bench trace-replay`` CLIs.

Contributor

English-wise, lmcache trace sounds like the same thing as lmcache bench trace-replay, which can be a bit confusing.

Maybe use lmcache trace capture and lmcache trace replay, and leave lmcache bench for just OpenAI API call benchmarks.

Contributor Author

Agree, will update in the next PR where lmcache trace is really implemented.



```python
@dataclass(frozen=True)
class PrefetchHandle:
```

Contributor

QQ: is this inside the scope of this PR, or is it just a handy refactor?

Contributor Author

It's a small refactor. Otherwise there would be a circular import: previously, the trace module needed to import PrefetchHandle from StorageManager to do serde, and StorageManager needs to import the trace module to generate the trace.

This refactor actually should have been done earlier -- PrefetchHandle is a public API data type, so it should live in api.py, not storage_manager.py.

```python
register_codec(
    torch.dtype,
    TypeCodec(tag="torch.dtype", encode=_enc_torch_dtype, decode=_dec_torch_dtype),
)
```

Contributor

QQ: should we use Protobuf here so that we can easily ensure the reader / writer have the same version, or is it too heavy?

Contributor

Asking this because the codec feels like re-doing what protobuf already provides.

Contributor Author

Actually, protobuf is more complicated. Here we just want a safe way to serialize/deserialize the function arguments when tracing.
Pickle would be the easiest to use, but it has a big security problem -- an attacker can embed executable code in the trace, which would run when a user does lmcache trace replay.
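
To illustrate the trade-off, a small sketch of the tag-based codec idea: only explicitly registered types round-trip, and decoding never executes arbitrary code the way unpickling can. The registry shape loosely mirrors the TypeCodec/register_codec snippet quoted above, but the names and field layout here are assumptions.

```python
# Sketch of a whitelisted, tag-based codec registry (loosely mirroring the
# TypeCodec/register_codec API quoted above; names here are assumptions).
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TypeCodecSketch:
    tag: str
    encode: Callable[[Any], Any]
    decode: Callable[[Any], Any]


_BY_TYPE: dict = {}
_BY_TAG: dict = {}


def register_codec_sketch(typ: type, codec: TypeCodecSketch) -> None:
    _BY_TYPE[typ] = codec
    _BY_TAG[codec.tag] = codec


def encode_value(value: Any) -> Any:
    codec = _BY_TYPE.get(type(value))
    if codec is None:
        # Unencodable arg: the real recorder would bump dropped_count here.
        raise TypeError(f"no codec registered for {type(value)!r}")
    return {"__tag__": codec.tag, "data": codec.encode(value)}


def decode_value(obj: dict) -> Any:
    # Only registered decoders run; nothing in the trace can inject code.
    return _BY_TAG[obj["__tag__"]].decode(obj["data"])


# Example: complex numbers round-trip through a plain list of floats.
register_codec_sketch(
    complex,
    TypeCodecSketch(
        tag="py.complex",
        encode=lambda c: [c.real, c.imag],
        decode=lambda d: complex(d[0], d[1]),
    ),
)
assert decode_value(encode_value(3 + 4j)) == 3 + 4j
```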

@KuntaiDu KuntaiDu left a comment

Left several quick design-related questions. PTAL.

@KuntaiDu KuntaiDu left a comment

LGTM

@ApostaC ApostaC enabled auto-merge (squash) April 17, 2026 22:56
@github-actions Bot added the "full" (Run comprehensive tests on this PR) label Apr 17, 2026
@ApostaC ApostaC merged commit 0c4f03d into LMCache:dev Apr 18, 2026
51 of 54 checks passed