[Observability] Per-request root OTel span and SpanRegistry for MP server tracing #3033
Signed-off-by: deng451e <838677410@qq.com>
Code Review
This pull request implements a per-request root span for the MPServerTracingSubscriber, ensuring child operations are correctly nested and providing a unified view of request lifecycles. It introduces new events and a deferral protocol to handle asynchronous GPU operations, preventing orphaned spans. The PR also adds a comprehensive observability example stack including Grafana dashboards and OTel configuration. Feedback was provided to refactor duplicated logic in the span-end handlers to improve maintainability.
royyhuang left a comment:
Overall the implementation itself LGTM. But I am wondering: if later we have more span events that need to be traced, do they all need to be implemented in this single file? I am worried this file will become unmanageable. Is it possible to share the root span across subscribers? Also, should we consider the case of multi-level spans? For example, the request root span can have a retrieve subspan, and the retrieve subspan can also have sub-subspans.
Add SpanRegistry to share OTel span contexts across tracing subscribers,
addressing reviewer concern that all future span events would accumulate
in a single file with no mechanism for multi-level nesting.
Key changes:
- New span_registry.py: thin (session_id, span_name) → (span, ctx) dict,
no locking needed (single drain thread)
- MPServerTracingSubscriber accepts optional SpanRegistry; creates a
private one if not provided (backward compatible)
- Root "request" span and all child spans ("store", "retrieve",
"lookup_prefetch") are registered while open so other subscribers can
look up parent contexts without coupling to this class
- config.py creates one shared registry and passes it to the subscriber;
future subscribers (L1, L2, SM) plug in with two lines
- Tests updated to assert span state via registry.get() instead of the
removed _root_spans dict; two new tests cover registry registration
and cleanup of child spans
- docs/design/observability/request-event-span.md: new "Extending the
Span Hierarchy" section with two worked examples (same-level span and
three-level request → mp.retrieve → l2.disk_load)
Signed-off-by: deng451e <838677410@qq.com>
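Based on the description above, a minimal sketch of such a registry might look like the following. All names and signatures here are assumptions inferred from the commit message, not the actual `span_registry.py` source:

```python
from typing import Any, Dict, Optional, Set, Tuple

class SpanRegistry:
    """Hypothetical sketch: a thin (session_id, span_name) -> (span, ctx)
    map. No locking, on the assumption that a single drain thread is the
    only writer (as stated in the commit message)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Tuple[Any, Any]] = {}

    def register(self, session_id: str, span_name: str, span: Any, ctx: Any) -> None:
        # Called while a span is open so other subscribers can find it.
        self._entries[(session_id, span_name)] = (span, ctx)

    def get(self, session_id: str, span_name: str) -> Optional[Tuple[Any, Any]]:
        return self._entries.get((session_id, span_name))

    def get_context(self, session_id: str, span_name: str) -> Optional[Any]:
        # Parent-context lookup used by other subscribers.
        entry = self._entries.get((session_id, span_name))
        return entry[1] if entry is not None else None

    def unregister(self, session_id: str, span_name: str) -> None:
        # Called when a span is closed; missing keys are ignored.
        self._entries.pop((session_id, span_name), None)

    def all_session_ids(self) -> Set[str]:
        # Used at shutdown to find sessions whose spans are still open.
        return {sid for sid, _ in self._entries}
```

The key design point is that the registry stores plain tuples and knows nothing about which subscriber created a span, which is what decouples future subscribers from `MPServerTracingSubscriber`.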
Thanks for the suggestion! In the last commit, I added a new file for SpanRegistry: a shared (session_id, span_name) → (span, ctx) map for tracing subscribers. Subscribers can fetch parent contexts (e.g., "request" or "retrieve") to build spans like request → mp.retrieve → sub_span. New subscribers require only two config lines—no code changes. Examples are in docs/design/observability/request-event-span.md.
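The lookup-and-parent flow described here might look roughly like this. `FakeSpan` and `FakeRegistry` are toy stand-ins (not the OTel API or the real SpanRegistry), used only to illustrate how a second subscriber builds a request → mp.retrieve → sub_span chain:

```python
class FakeSpan:
    """Toy stand-in for an OTel span, carrying only a name and a parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

class FakeRegistry:
    """Toy stand-in for the shared registry: (session_id, span_name) -> span."""
    def __init__(self):
        self._spans = {}
    def register(self, session_id, span_name, span):
        self._spans[(session_id, span_name)] = span
    def get_context(self, session_id, span_name):
        return self._spans.get((session_id, span_name))

# The MP tracing subscriber registers the root and retrieve spans while open...
registry = FakeRegistry()
root = FakeSpan("request")
retrieve = FakeSpan("mp.retrieve", parent=root)
registry.register("sess-1", "request", root)
registry.register("sess-1", "retrieve", retrieve)

# ...so another subscriber (say, a hypothetical L2 backend) can parent its
# own span under "retrieve" without importing MPServerTracingSubscriber.
parent = registry.get_context("sess-1", "retrieve")
disk_load = FakeSpan("l2.disk_load", parent=parent)
```

In the real code the registry would hand back an OTel context rather than the span object itself, but the decoupling works the same way.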
    # registry.get_context(session_id, "retrieve") without coupling to
    # MPServerTracingSubscriber.
    registry = SpanRegistry()
    bus.register_subscriber(MPServerTracingSubscriber(registry))
Since the registry is global and always shared with all subscribers, I would recommend making this a singleton with get_span_registry() under the subscribers.tracing module.
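The suggested accessor could be as simple as a module-level singleton. A sketch with assumed names follows (`SpanRegistry` is stubbed here only to keep the snippet self-contained; the real class lives in `span_registry.py`):

```python
from typing import Optional

class SpanRegistry:
    """Stub standing in for the real span_registry.SpanRegistry."""
    def __init__(self):
        self._entries = {}

_registry: Optional[SpanRegistry] = None

def get_span_registry() -> SpanRegistry:
    """Return the process-wide registry, creating it on first use.
    No lock, on the assumption that only the single drain thread
    touches the registry."""
    global _registry
    if _registry is None:
        _registry = SpanRegistry()
    return _registry
```

With this, `config.py` no longer constructs `SpanRegistry` directly; every subscriber calls `get_span_registry()` and gets the same instance.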
Yes, I plan to add the request event span for the blend server in a follow-up PR. Or should I include it in this PR instead?
- Add get_span_registry() singleton so config.py does not construct SpanRegistry directly
- Add SpanRegistry.all_session_ids() and use it in shutdown() to avoid leaking root spans for counter-free sessions
- Refactor duplicated counter decrement + deferred-close logic in _on_end into a single shared check
- Wrap store path in try/except/finally so MP_STORE_END is always published even when reserve_write or the copy loop raises
- Remove dead _drain helper from test file
- Document MP_REQUEST_START / MP_STORE_SUBMITTED / MP_RETRIEVE_SUBMITTED / MP_SESSION_END in EVENTS.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>
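The try/except/finally guard around the store path might look roughly like this. The event names come from the notes above; `handle_store`, `publish`, and `reserve_write` signatures are illustrative assumptions:

```python
def handle_store(publish, reserve_write, chunks):
    """Sketch: MP_STORE_END must reach subscribers even when reserve_write
    or the copy loop raises; otherwise the tracing subscriber would leak
    the mp.store span. Names and signatures are assumed, not real API."""
    publish("MP_STORE_SUBMITTED")
    try:
        dst = reserve_write()
        for chunk in chunks:
            dst.append(chunk)  # stand-in for the real copy loop
    finally:
        # Always published, on success or failure, so span-end handling
        # and the deferred root-span close are never skipped.
        publish("MP_STORE_END")
```

The exception still propagates to the caller; the `finally` clause only guarantees the end event is emitted first.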
I feel it's fine, just one extra file changed. Unless you already have in mind what other metrics/events/logging you want to add for cacheblend, you can add this in another PR.
Force-pushed from 21003c0 to 1a66427.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 5 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit 480f061.
Force-pushed from 480f061 to 1a66427.

Description:
Summary
- Root span close is deferred until in-flight GPU callbacks complete
- Shared SpanRegistry enables parent-context lookup without coupling to MPServerTracingSubscriber; new subscribers plug in with two lines in config.py
Note
Medium Risk
Changes MP server event emission and tracing lifecycle handling around async GPU operations; incorrect ordering/counter logic could lead to leaked spans or misleading trace boundaries, though the change is well-covered by new tests.
Overview
Adds a per-request root OTel span (`"request"`) for MP server tracing and ensures `mp.lookup_prefetch`, `mp.retrieve`, and `mp.store` spans are properly parented under it, avoiding orphaned spans in trace backends.

Introduces four new CPU-synchronous lifecycle events (`MP_REQUEST_START`, `MP_STORE_SUBMITTED`, `MP_RETRIEVE_SUBMITTED`, `MP_SESSION_END`) and updates the MP server to emit them so the tracing subscriber can defer closing the root span until any in-flight GPU store/retrieve callbacks complete (while stamping the root end time at `MP_SESSION_END`).

Adds a shared `SpanRegistry` for cross-subscriber context propagation, wires it into observability config, expands tracing tests to cover deferral/race scenarios, and includes a new `examples/observability` Docker Compose stack (Collector/Tempo/Prometheus/Grafana) plus a design doc to demonstrate and validate the new trace hierarchy.

Reviewed by Cursor Bugbot for commit 35a8305.
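The deferral protocol described in this overview (root span closed only after `MP_SESSION_END` and after all in-flight GPU callbacks have finished) can be sketched with a per-session counter. Every name below is an illustrative assumption, not the actual subscriber code:

```python
class RootSpanState:
    """Per-session bookkeeping for the deferred root-span close (sketch)."""
    def __init__(self):
        self.pending = 0            # in-flight GPU store/retrieve callbacks
        self.session_ended = False
        self.end_time_ns = None     # stamped at MP_SESSION_END
        self.closed = False

def on_submitted(state):
    # MP_STORE_SUBMITTED / MP_RETRIEVE_SUBMITTED: a GPU callback is now
    # in flight, so the root span must not close yet.
    state.pending += 1

def on_session_end(state, now_ns):
    # MP_SESSION_END: stamp the root end time, but defer the actual close
    # if callbacks are still pending.
    state.session_ended = True
    state.end_time_ns = now_ns
    _maybe_close(state)

def on_callback_done(state):
    # Store/retrieve callback completed on the drain thread.
    state.pending -= 1
    _maybe_close(state)

def _maybe_close(state):
    # Single shared check (the refactor the reviewers asked for): close
    # once the session has ended AND nothing is still in flight.
    if state.session_ended and state.pending == 0 and not state.closed:
        state.closed = True  # real code would end the span here
        return True
    return False
```

Stamping `end_time_ns` at `MP_SESSION_END` rather than at close time is what keeps the reported root-span duration honest even when the close itself is deferred.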