Skip to content

[Observability] Per-request root OTel span and SpanRegistry for MP server tracing#3033

Merged
ApostaC merged 5 commits intoLMCache:devfrom
deng451e:per_request_root_span
Apr 16, 2026
Merged

[Observability] Per-request root OTel span and SpanRegistry for MP server tracing#3033
ApostaC merged 5 commits intoLMCache:devfrom
deng451e:per_request_root_span

Conversation

@deng451e
Copy link
Copy Markdown
Collaborator

@deng451e deng451e commented Apr 15, 2026

Description:

Summary

  • Adds a root "request" OTel span per session wrapping all child spans (mp.lookup_prefetch, mp.retrieve, mp.store)
  • Introduces 4 CPU-synchronous sentinels: MP_REQUEST_START, MP_STORE_SUBMITTED, MP_RETRIEVE_SUBMITTED, MP_SESSION_END; root span close deferred until
    in-flight GPU callbacks complete
  • Introduces SpanRegistry — shared span context dict enabling multi-subscriber, multi-level tracing (e.g. request → mp.retrieve → l2.disk_load) without
    coupling to MPServerTracingSubscriber; new subscribers plug in with two lines in config.py
  • Adds a Docker Compose example for viewing waterfall traces (OTel Collector → Tempo → Grafana) with a pre-provisioned dashboard
  • Design doc covering all request trace scenarios and a worked example for registering sub-spans
Screenshot 2026-04-14 at 4 55 14 PM

Note

Medium Risk
Changes MP server event emission and tracing lifecycle handling around async GPU operations; incorrect ordering/counter logic could lead to leaked spans or misleading trace boundaries, though the change is well-covered by new tests.

Overview
Adds a per-request root OTel span ("request") for MP server tracing and ensures mp.lookup_prefetch, mp.retrieve, and mp.store spans are properly parented under it, avoiding orphaned spans in trace backends.

Introduces four new CPU-synchronous lifecycle events (MP_REQUEST_START, MP_STORE_SUBMITTED, MP_RETRIEVE_SUBMITTED, MP_SESSION_END) and updates the MP server to emit them so the tracing subscriber can defer closing the root span until any in-flight GPU store/retrieve callbacks complete (while stamping the root end time at MP_SESSION_END).

Adds a shared SpanRegistry for cross-subscriber context propagation, wires it into observability config, expands tracing tests to cover deferral/race scenarios, and includes a new examples/observability Docker Compose stack (Collector/Tempo/Prometheus/Grafana) plus a design doc to demonstrate and validate the new trace hierarchy.

Reviewed by Cursor Bugbot for commit 35a8305. Bugbot is set up for automated code reviews on this repo. Configure here.

Signed-off-by: deng451e <838677410@qq.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a per-request root span for the MPServerTracingSubscriber, ensuring child operations are correctly nested and providing a unified view of request lifecycles. It introduces new events and a deferral protocol to handle asynchronous GPU operations, preventing orphaned spans. The PR also adds a comprehensive observability example stack including Grafana dashboards and OTel configuration. Feedback was provided to refactor duplicated logic in the span-end handlers to improve maintainability.

Comment thread lmcache/v1/mp_observability/subscribers/tracing/mp_server.py Outdated
Comment thread tests/v1/mp_observability/subscribers/tracing/test_mp_server.py Outdated
Comment thread lmcache/v1/multiprocess/server.py
Copy link
Copy Markdown
Contributor

@royyhuang royyhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the implementation itself LGTM. But I am wondering if later we have more span events need to traced, do they all need to implemented in this single file? I am worried whether this file will become unmanageable. Is it possible we can share the root span across subscribers? Also should we also consider the case of multi-level span? For example, request root span can have retrieve subspan and retrieve subspan can also have some subsubspan?

…bility

Add SpanRegistry to share OTel span contexts across tracing subscribers,
addressing reviewer concern that all future span events would accumulate
in a single file with no mechanism for multi-level nesting.

Key changes:
- New span_registry.py: thin (session_id, span_name) → (span, ctx) dict,
  no locking needed (single drain thread)
- MPServerTracingSubscriber accepts optional SpanRegistry; creates a
  private one if not provided (backward compatible)
- Root "request" span and all child spans ("store", "retrieve",
  "lookup_prefetch") are registered while open so other subscribers can
  look up parent contexts without coupling to this class
- config.py creates one shared registry and passes it to the subscriber;
  future subscribers (L1, L2, SM) plug in with two lines
- Tests updated to assert span state via registry.get() instead of the
  removed _root_spans dict; two new tests cover registry registration
  and cleanup of child spans
- docs/design/observability/request-event-span.md: new "Extending the
  Span Hierarchy" section with two worked examples (same-level span and
  three-level request → mp.retrieve → l2.disk_load)

Signed-off-by: deng451e <838677410@qq.com>
Comment thread lmcache/v1/mp_observability/subscribers/tracing/mp_server.py
Signed-off-by: deng451e <838677410@qq.com>
@deng451e
Copy link
Copy Markdown
Collaborator Author

Overall the implementation itself LGTM. But I am wondering if later we have more span events need to traced, do they all need to implemented in this single file? I am worried whether this file will become unmanageable. Is it possible we can share the root span across subscribers? Also should we also consider the case of multi-level span? For example, request root span can have retrieve subspan and retrieve subspan can also have some subsubspan?

Thanks for the suggestion! In the last commit, I added a new file for SpanRegistry: a shared (session_id, span_name) → (span, ctx) map for tracing subscribers. Subscribers can fetch parent contexts (e.g., "request" or "retrieve") to build spans like request → mp.retrieve → sub_span. New subscribers require only two config lines—no code changes. Examples are in docs/design/observability/request-event-span.md.

Copy link
Copy Markdown
Contributor

@royyhuang royyhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the previous comment. I do like the registry design. I made some minor comments regarding the new design. Please address if possible. Otherwise LGTM. Maybe also consider add the changes to cacheblend_server?

Comment thread lmcache/v1/mp_observability/config.py Outdated
# registry.get_context(session_id, "retrieve") without coupling to
# MPServerTracingSubscriber.
registry = SpanRegistry()
bus.register_subscriber(MPServerTracingSubscriber(registry))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the registry is global and always shared with all subscribers, I would recommend make this a singleton with get_span_registry under the subscribers.tracing module.

Comment thread lmcache/v1/mp_observability/event.py
Comment thread lmcache/v1/multiprocess/server.py
@royyhuang royyhuang changed the title [Observability] Per-request root OTel span for MP server tracing [Observability] Per-request root OTel span and SpanRegistry for MP server tracing Apr 15, 2026
Comment thread lmcache/v1/mp_observability/event.py
@deng451e
Copy link
Copy Markdown
Collaborator Author

cacheblend_server

yes, I plan to add the request event span for the blend server in a follow-up PR. or should I include it in this PR instead?

- Add get_span_registry() singleton so config.py does not construct SpanRegistry directly
- Add SpanRegistry.all_session_ids() and use it in shutdown() to avoid leaking root spans for counter-free sessions
- Refactor duplicated counter decrement + deferred-close logic in _on_end into a single shared check
- Wrap store path in try/except/finally so MP_STORE_END is always published even when reserve_write or the copy loop raises
- Remove dead _drain helper from test file
- Document MP_REQUEST_START / MP_STORE_SUBMITTED / MP_RETRIEVE_SUBMITTED / MP_SESSION_END in EVENTS.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>
@royyhuang
Copy link
Copy Markdown
Contributor

cacheblend_server

yes, I plan to add the request event span for the blend server in a follow-up PR. or should I include it in this PR instead?

I feel its fine. Just one extra file changed. Unless if you already have what other metrics/events/loggings you want to add cacheblend in mind, you can add this in another PR.

@deng451e deng451e force-pushed the per_request_root_span branch from 21003c0 to 1a66427 Compare April 15, 2026 19:34
Comment thread lmcache/v1/multiprocess/blend_server_v2.py Outdated
Comment thread lmcache/v1/mp_observability/subscribers/tracing/cb_server.py Outdated
Comment thread lmcache/v1/mp_observability/subscribers/tracing/cb_server.py Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 480f061. Configure here.

Comment thread lmcache/v1/multiprocess/blend_server_v2.py Outdated
Comment thread lmcache/v1/mp_observability/subscribers/tracing/cb_server.py Outdated
@deng451e deng451e force-pushed the per_request_root_span branch from 480f061 to 1a66427 Compare April 15, 2026 21:31
Copy link
Copy Markdown
Contributor

@ApostaC ApostaC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ApostaC ApostaC enabled auto-merge (squash) April 15, 2026 23:04
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Apr 15, 2026
@ApostaC ApostaC merged commit 5035e1d into LMCache:dev Apr 16, 2026
51 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full Run comprehensive tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants