Checklist
Motivation
This issue implements "fine-grained profiling for PD with EP/DP/PP" as a sub-task of #8210 "[Roadmap] Distributed Serving Enhancement on 2025 H2." cc @stmatengss
SGLang currently lacks fine-grained request tracing capabilities. After analyzing the performance of numerous SGLang-based LLM inference scenarios over the past few months, we found that request tracing is crucial. Although the PyTorch Profiler can also trace execution, it cannot collect performance data over extended periods and cannot observe parallelism.
With SGLang tracing, we can obtain the following information:
- Latency of each execution segment within a request
- Parallel execution status of requests (TP/DP/PP/EP)
- Interactions between requests across multiple nodes and parallel threads (PD-Disaggregation)
- Thread behavior in parallel execution (e.g., whether requests are backing up, whether resources are underutilized)
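As a rough illustration of the first two items, the sketch below derives per-segment wall-clock latency and the set of threads involved in each request from a list of timed segments. The `Segment` record and its field names are hypothetical and chosen for this example only; they are not the data model of the PoC.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One timed execution segment of a request (hypothetical record shape)."""
    request_id: str
    name: str        # e.g. "prefill", "decode"
    thread: str      # e.g. "tp0", "tp1"
    start_ms: float
    end_ms: float


def latency_by_segment(segments):
    """Wall-clock latency of each named segment per request.

    When several threads run the same segment in parallel (e.g. TP ranks),
    the latency spans from the earliest start to the latest end.
    """
    bounds = {}
    for seg in segments:
        key = (seg.request_id, seg.name)
        lo, hi = bounds.get(key, (seg.start_ms, seg.end_ms))
        bounds[key] = (min(lo, seg.start_ms), max(hi, seg.end_ms))
    result = defaultdict(dict)
    for (request_id, name), (lo, hi) in bounds.items():
        result[request_id][name] = hi - lo
    return dict(result)


def threads_by_request(segments):
    """Threads that executed some part of each request (parallelism view)."""
    threads = defaultdict(set)
    for seg in segments:
        threads[seg.request_id].add(seg.thread)
    return {rid: sorted(names) for rid, names in threads.items()}


# Made-up sample data: req-1 runs prefill on two TP ranks, then decodes.
SEGMENTS = [
    Segment("req-1", "prefill", "tp0", 0.0, 40.0),
    Segment("req-1", "prefill", "tp1", 0.0, 42.0),
    Segment("req-1", "decode", "tp0", 50.0, 80.0),
    Segment("req-2", "prefill", "tp0", 40.0, 70.0),
]

LATENCIES = latency_by_segment(SEGMENTS)
THREADS = threads_by_request(SEGMENTS)
```

Taking the earliest start and latest end across threads is one possible definition of segment latency; summing per-thread durations would instead measure total compute time.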
This FR is a preview intended to demonstrate our output results and gather feedback for improvements.
We have implemented a PoC, but because of the following pending tasks, the official PR will be submitted in 2 to 3 weeks:
- Comprehensive stability testing
- Design document completion
- Additional instrumentation for DP/PP and error handling
Proposed Solution Highlights
- Modular Tracing Framework
  - Implemented a complete tracing package that provides a set of simple APIs. Developers can use these APIs to customize their tracing workflows easily and flexibly. We have pre-instrumented key points in the request execution paths.
- OpenTelemetry Integration
  - Generates spans via the OpenTelemetry APIs to natively support OpenTelemetry Collector integration.
  - Resolved the single-context tracking limitation in OpenTelemetry, enabling simultaneous tracing of multiple request contexts even when continuous batching causes misaligned request execution.
- Distributed System Support
  - Implemented multi-node tracking in PD-Disaggregation scenarios (mini-LB, prefill/decode nodes).
  - Implemented intra-request concurrency tracing for TP. DP, PP, and EP tracking are under development.
- Multi-Format Visualization
  - Jaeger/Zipkin: request-centric view.
  - Perfetto: thread-centric view. Trace data can also be merged with PyTorch Profiler data.
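To make the multi-context point concrete, here is a minimal, dependency-free sketch of the idea: instead of one global "current span", the tracer keeps a separate stack of open spans per request, so interleaved requests (as under continuous batching) still get correct parent/child links. `MultiRequestTracer`, `Span`, and their methods are hypothetical names for illustration; this is neither the OpenTelemetry API nor the API of the PoC.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Span:
    span_id: int
    request_id: str
    name: str
    parent_id: Optional[int]
    start: float
    end: Optional[float] = None


class MultiRequestTracer:
    """Keeps one open-span stack per request rather than one global context."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._ids = itertools.count(1)
        self._open: Dict[str, List[Span]] = {}
        self.finished: List[Span] = []

    def start(self, request_id: str, name: str) -> Span:
        """Open a span; its parent is the innermost open span of the SAME request."""
        stack = self._open.setdefault(request_id, [])
        parent_id = stack[-1].span_id if stack else None
        span = Span(next(self._ids), request_id, name, parent_id, self._clock())
        stack.append(span)
        return span

    def end(self, request_id: str) -> Span:
        """Close the innermost open span of the given request."""
        stack = self._open[request_id]
        span = stack.pop()
        span.end = self._clock()
        if not stack:
            del self._open[request_id]
        self.finished.append(span)
        return span


# Deterministic clock so the example does not depend on real time.
_ticks = itertools.count()
tracer = MultiRequestTracer(clock=lambda: float(next(_ticks)))

# Two requests interleave, as they would inside one batched scheduler loop.
tracer.start("req-a", "request")
tracer.start("req-b", "request")
tracer.start("req-a", "prefill")
tracer.start("req-b", "prefill")
tracer.end("req-a")  # closes req-a prefill
tracer.end("req-b")  # closes req-b prefill
tracer.end("req-a")  # closes req-a root span
tracer.end("req-b")  # closes req-b root span

PARENTS = {(s.request_id, s.name): s.parent_id for s in tracer.finished}
```

With a single shared context, the `prefill` span of `req-a` would have been parented to the most recently opened span, which belongs to `req-b`. Keying the stack by request id avoids that mix-up.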
Visualization Preview
Request-centric view: requests form the first level, threads the second level (to expose parallelism), and execution segments the third level.
[Figure: PD-Disaggregation with TP=1]
[Figure: non-PD-Disaggregation with TP=2]
Thread-centric view: threads form the first level, concurrently executing request segments are placed on second-level lanes, and links connect all execution segments of a single request.
[Figure: PD-Disaggregation with TP=1]
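One way to produce such a thread-centric layout is the Chrome Trace Event JSON format, which Perfetto can load. The sketch below is an assumption about how this could be done, not the exporter used in the PoC: each segment becomes a complete event (`"ph": "X"`) on its thread's track, and flow events (`"ph": "s"`, `"t"`, `"f"`) link the segments of one request. The input tuple shape and the function name `to_chrome_trace` are hypothetical.

```python
import json


def to_chrome_trace(segments):
    """Convert (request_id, name, thread, start_us, end_us) tuples to trace events."""
    events = []
    thread_ids = {}   # thread name -> numeric tid (one track per thread)
    by_request = {}   # request id -> [(start_us, tid), ...] for flow links

    for request_id, name, thread, start_us, end_us in segments:
        tid = thread_ids.setdefault(thread, len(thread_ids) + 1)
        events.append({
            "name": f"{request_id}:{name}", "ph": "X", "pid": 1, "tid": tid,
            "ts": start_us, "dur": end_us - start_us,
            "args": {"request_id": request_id},
        })
        by_request.setdefault(request_id, []).append((start_us, tid))

    # Metadata events give each track a readable thread name.
    for thread, tid in thread_ids.items():
        events.append({"name": "thread_name", "ph": "M", "pid": 1, "tid": tid,
                       "args": {"name": thread}})

    # Flow events: start ("s"), step ("t"), finish ("f") share one id per request.
    for flow_id, (request_id, points) in enumerate(sorted(by_request.items()), start=1):
        points.sort()
        if len(points) < 2:
            continue  # a single segment has nothing to link to
        last = len(points) - 1
        for i, (ts, tid) in enumerate(points):
            phase = "s" if i == 0 else ("f" if i == last else "t")
            event = {"name": request_id, "cat": "request", "ph": phase,
                     "id": flow_id, "pid": 1, "tid": tid, "ts": ts}
            if phase == "f":
                event["bp"] = "e"  # bind the arrow end to the enclosing slice
            events.append(event)

    return {"traceEvents": events}


# Made-up sample: req-1 moves from a prefill thread to a decode thread.
SEGMENTS = [
    ("req-1", "prefill", "prefill-tp0", 0, 40_000),
    ("req-1", "transfer", "prefill-tp0", 40_000, 45_000),
    ("req-1", "decode", "decode-tp0", 45_000, 90_000),
    ("req-2", "prefill", "prefill-tp0", 45_000, 80_000),
]

TRACE = to_chrome_trace(SEGMENTS)
TRACE_JSON = json.dumps(TRACE)  # write this string to a .json file to open it in a viewer
```

Timestamps are in microseconds, as the format expects. How the flow arrows are drawn may vary between viewers and versions, so treat the linking part as a starting point.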

Real-World Impact
Over the past few months, we have leveraged this functionality to address numerous challenges, such as resource capacity planning and long-tail latency analysis.