feat(observability): add OpenTelemetry tracing for pipeline parallelism #23169
ShangmingCai merged 1 commit into sgl-project:main from
Conversation
Code Review
This pull request introduces observability for pipeline parallelism by adding tracing for PP forward passes. It includes a new PP_FORWARD request stage and updates the scheduler to record timing statistics and metadata during batch execution. A potential issue was identified in set_time_batch where passing attributes as positional arguments could cause a TypeError for methods that do not support them.
```python
if attrs is None:
    method(ts)
else:
    method(ts, attrs)
```
Passing attrs as a positional argument to method is risky because many existing set_*_time methods in SchedulerReqTimeStats (like set_forward_entry_time) do not accept a second positional argument. If set_time_batch is called with attrs for one of those methods, it will raise a TypeError. It would be safer to pass attrs as a keyword argument, provided the target methods are updated to accept it.
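The keyword-argument approach can be illustrated with a minimal, self-contained sketch. Note that `SchedulerReqTimeStats`, `Req`, and this `set_time_batch` signature are simplified stand-ins for the real classes, not the actual sglang implementation:

```python
from typing import Optional


class SchedulerReqTimeStats:
    """Simplified stand-in for the real per-request time-stats class."""

    def __init__(self):
        self.forward_entry_time = 0.0
        self.forward_entry_attrs = None

    def set_forward_entry_time(self, ts: float, attrs: Optional[dict] = None):
        # attrs is an optional keyword, so existing call sites that pass
        # only a timestamp keep working unchanged.
        self.forward_entry_time = ts
        self.forward_entry_attrs = attrs


class Req:
    def __init__(self):
        self.time_stats = SchedulerReqTimeStats()


def set_time_batch(reqs, method_name: str, ts: float, attrs: Optional[dict] = None):
    for req in reqs:
        method = getattr(req.time_stats, method_name)
        if attrs is None:
            method(ts)
        else:
            # Keyword form: target methods opt in by declaring attrs=None,
            # rather than silently receiving a second positional argument.
            method(ts, attrs=attrs)


reqs = [Req(), Req()]
set_time_batch(reqs, "set_forward_entry_time", 1.5, attrs={"pp_mb_id": 0})
```

With this shape, a method that has not been updated to accept `attrs` still raises a `TypeError`, but the keyword in the traceback makes the cause obvious, and updated methods only need a defaulted parameter.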
Force-pushed from 3c8f2b8 to a16345c
ShangmingCai left a comment:
Looks good. cc: @sufeng-buaa please double-check.
```python
)

# pipeline parallelism
PP_FORWARD = RequestStageConfig(
```
The stage name is somewhat ambiguous. The segment you are tracing is actually the CPU-side run_batch, so run_batch_cpu would be a more appropriate name.
```python
last_decode_scheduled_time: float = 0.0
last_forward_entry_time: float = 0.0
last_prefill_finished_time: float = 0.0
pp_forward_start_time: float = 0.0
```
Similarly, please rename it accordingly.
```python
    "is_last_pp_rank": self.pp_group.is_last_rank,
}
if mb_id is not None:
    attrs["pp_mb_id"] = mb_id
```
I think we can keep mb_id, and move the other scheduler-related attributes to the thread span.
```python
with torch.profiler.record_function("run_batch"):
    with self.forward_stream_ctx:
        self.forward_stream.wait_stream(self.schedule_stream)
        if trace_enabled:
```
If the function parameters are simple, there’s no need to check trace_enabled here.
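The reviewer's point can be sketched as follows (the helper name `trace_slice_start` and the flag are illustrative, not the exact sglang API): when the trace helper is itself a cheap no-op while tracing is disabled, the call site does not need its own `trace_enabled` guard unless building the arguments is expensive.

```python
_trace_enabled = False  # toggled once at startup in this sketch


def trace_slice_start(name: str, ts: float):
    # The helper checks the flag internally, so callers may invoke it
    # unconditionally; with tracing off it returns immediately.
    if not _trace_enabled:
        return None
    return (name, ts)


# Call site: no `if trace_enabled:` wrapper needed when the
# parameters are simple to construct.
span = trace_slice_start("run_batch", 0.0)
```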
```python
# pipeline parallelism
PP_FORWARD = RequestStageConfig(
    "pp_forward",
    level=2,
```
Under prefill_forward and decode_forward, there may be chunked_prefill and decode_loop, so this span will be attached one level deeper. Let’s set the level to 4 for now. I’ll refactor the levels consistently later.
And please update scripts/convert_otel_2_perfetto.py:

```diff
diff --git a/scripts/convert_otel_2_perfetto.py b/scripts/convert_otel_2_perfetto.py
index 3a82969a4..89534a38f 100644
--- a/scripts/convert_otel_2_perfetto.py
+++ b/scripts/convert_otel_2_perfetto.py
@@ -237,6 +237,10 @@ def generate_perfetto_span(engine_root_spans, smg_otel_spans, thread_meta_data):
     pid = int(thread_span["attributes"]["pid"])
     host_id = thread_span["attributes"]["host_id"]
     thread_name = f'{thread_span["attributes"]["host_id"][:8]}:{thread_span["attributes"]["thread_label"]}'
+    if "pp_rank" in thread_span["attributes"]:
+        thread_name += f"-PP{thread_span['attributes']['pp_rank']}"
+    if "dp_rank" in thread_span["attributes"]:
+        thread_name += f"-DP{thread_span['attributes']['dp_rank']}"
     if "tp_rank" in thread_span["attributes"]:
         thread_name += f"-TP{thread_span['attributes']['tp_rank']}"
```
Force-pushed from a16345c to 358e3cd
Force-pushed from d1e2348 to a71c9e8
@sufeng-buaa I have resolved the reviews above. Could you please review again?
```python
    attrs=attrs,
)
result = self.run_batch(self.cur_batch, pp_proxy_tensors)
set_time_batch(
    self.cur_batch.reqs,
    "set_run_batch_cpu_start_time",
    trace_only=True,
    attrs=attrs,
```

Suggested change:

```python
set_time_batch(
    self.cur_batch.reqs,
    "set_run_batch_cpu_start_time",
    trace_only=True,
    attrs={"pp_mb_id": mb_id},
)
```
No attrs need to be passed in
```python
    mb_metadata: List[Optional[PPBatchMetadata]],
    last_rank_comm_queue: deque,
):
    attrs = (
```
Simple attributes can be passed directly to set_time_batch()
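As a sketch of the suggested simplification (with `set_time_batch` stubbed out to echo its arguments; names are illustrative): rather than building an `attrs` variable first, pass the small literal dict at the call site.

```python
def set_time_batch(reqs, method_name, ts, trace_only=False, attrs=None):
    # Stub of the real helper; it just echoes what it was called with
    # so the call shape is easy to verify.
    return (method_name, ts, attrs)


mb_id = 3
# Simple attributes passed inline, no intermediate variable:
result = set_time_batch([], "set_pp_forward_start_time", 1.0,
                        trace_only=True, attrs={"pp_mb_id": mb_id})
```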
Force-pushed from 7a5b731 to c974f66
@sufeng-buaa OK, the code has been simplified.
/tag-and-rerun-ci
ShangmingCai left a comment:
Please fix this test.
Begin (11/17):

```text
python3 /home/runner/work/sglang/sglang/test/registered/unit/observability/test_trace.py
.
.
Skipping import of cpp extensions due to incompatible torch version. Please upgrade to torch >= 2.11.0 (found 2.9.1+cu130).
..E
======================================================================
ERROR: test_trace_thread_context (__main__.TestDataclasses)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/sglang/sglang/test/registered/unit/observability/test_trace.py", line 82, in test_trace_thread_context
    info = TraceThreadInfo("h", 1, "l", 0, 0)
TypeError: TraceThreadInfo.__init__() missing 1 required positional argument: 'pp_rank'
----------------------------------------------------------------------
Ran 3 tests in 0.000s

FAILED (errors=1)
```
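The failure indicates `TraceThreadInfo` gained a required `pp_rank` parameter. One way to avoid breaking existing five-argument call sites is a defaulted field, sketched below as a dataclass; field names other than `pp_rank` are guesses from the test call, not the actual class definition:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceThreadInfo:
    # Field names besides pp_rank are hypothetical.
    host_id: str
    pid: int
    thread_label: str
    tp_rank: int
    dp_rank: int
    pp_rank: Optional[int] = None  # default keeps old constructor calls valid


# The old five-argument construction from the failing test now succeeds:
info = TraceThreadInfo("h", 1, "l", 0, 0)
```

Alternatively, the test itself can be updated to pass `pp_rank` explicitly, which is what the follow-up commit appears to do.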
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
Force-pushed from c974f66 to 59d2aa2
@ShangmingCai This unit test has been fixed, but CI still has some errors; are they related to this PR?
/rerun-failed-ci
…sm (sgl-project#23169)
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
Motivation
Implement PP OpenTelemetry tracing as mentioned in roadmap #13511
Modifications
Add pp_forward metrics.
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci