Add seq_num propagation to GPU kernel events in Kineto trace output #1296
Closed
mdlogic wants to merge 1 commit into pytorch:main from
…ytorch#1296)

Summary:
Propagate the NCCL collective sequence number (Seq) from CPU-side record_param_comms events to their linked GPU kernel events in the chrome trace JSON output.

CPU events already carry the Seq field via generic metadata serialization. This change copies it to CONCURRENT_KERNEL events so that GPU-level collective operations can also be correlated across ranks.

Changes:
- output_json.cpp: Add kSeqNum constant and read Seq from the linked CPU collective record's metadata, appending it to GPU kernel event args

Reviewed By: scotts

Differential Revision: D96145504
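The propagation described in the summary can be illustrated with a minimal sketch. This is not the code from output_json.cpp: `TraceEvent`, `kernelArgs`, `kCpuSeqKey`, and `kGpuSeqKey` are hypothetical names invented for illustration, and the output key used for the GPU event is an assumption, since the PR text does not state the value of `kSeqNum`.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a trace event. Kineto's real
// activity types are richer; this only models what the summary mentions.
struct TraceEvent {
  std::string name;                             // e.g. "record_param_comms"
  std::string type;                             // e.g. "CONCURRENT_KERNEL"
  std::map<std::string, std::string> metadata;  // generic key/value metadata
  const TraceEvent* linked = nullptr;           // linked CPU event, if any
};

// Key under which the CPU-side event carries the sequence number
// (named "Seq" in the PR summary).
constexpr const char* kCpuSeqKey = "Seq";
// ASSUMPTION: key written into the GPU kernel event args. The actual
// value of the kSeqNum constant is not given in the PR text.
constexpr const char* kGpuSeqKey = "Seq";

// Build the args for a GPU kernel event. If the event is a
// CONCURRENT_KERNEL linked to a record_param_comms CPU event that has a
// Seq entry, copy that entry into the kernel's args.
std::map<std::string, std::string> kernelArgs(const TraceEvent& gpu) {
  std::map<std::string, std::string> args = gpu.metadata;
  if (gpu.type != "CONCURRENT_KERNEL" || gpu.linked == nullptr) {
    return args;  // not a kernel, or nothing to propagate from
  }
  const TraceEvent& cpu = *gpu.linked;
  if (cpu.name != "record_param_comms") {
    return args;  // linked CPU event is not a collective record
  }
  auto it = cpu.metadata.find(kCpuSeqKey);
  if (it != cpu.metadata.end()) {
    args[kGpuSeqKey] = it->second;  // append Seq to the kernel args
  }
  return args;
}
```

Under these assumptions, a kernel linked to a collective record whose metadata holds `Seq` gains the same value in its own args, while kernels without such a link are left unchanged. Matching that value across per-rank traces is what would let GPU-side collectives be lined up between ranks.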
d3053bd to c8385b5
This pull request has been merged in 2b15a60.
mdlogic added a commit to mdlogic/pytorch that referenced this pull request Mar 12, 2026
Bump Kineto submodule from 0035505 to 2b15a60 to include pytorch/kineto#1296 (seq_num propagation to GPU kernel events in trace output). This is needed for pytorch#177148 (NCCL sequence number tracing).
This was referenced Mar 12, 2026
mdlogic added a commit to mdlogic/pytorch that referenced this pull request Mar 13, 2026
Bump Kineto submodule from 0035505 to 2b15a60 to include pytorch/kineto#1296 (seq_num propagation to GPU kernel events in trace output). This is needed for pytorch#177148 (NCCL sequence number tracing).
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Mar 13, 2026
Bump Kineto submodule from 0035505 to 2b15a60 to include pytorch/kineto#1296 (seq_num propagation to GPU kernel events in trace output).

This is needed so that #177148 (D96145503) can use the new Kineto APIs for NCCL sequence number tracing.

## Included kineto commits
- 2b15a60 Add seq_num propagation to GPU kernel events in Kineto trace output (#1296)
- 350b58f Refactor CuptiActivityProfiler.cpp to use CuptiCbidRegistry (#1297)
- 1f9ceb1 Use HAS_CUPTI_RANGE_PROFILER to avoid range profiler init (#1298)
- ebaac17 Add USDT log type to logger framework (#1285)
- e2e7e97 Revert D94566477: Add NCCL collective sequence number (seq_num) to Kineto profiler traces
- a7c5f4d Add NCCL collective sequence number (seq_num) to Kineto profiler traces (#1294)

Pull Request resolved: #177298
Approved by: https://github.com/sanrise, https://github.com/malfet