
privateuse1 backend integration with kineto #172154

Closed
divyanshk wants to merge 11 commits into pytorch:main from divyanshk:divyanshk/profiler_privateuse1_reg

Conversation

@divyanshk
Contributor

@divyanshk divyanshk commented Jan 10, 2026

  1. Created privateuse1_profiler.h/.cpp — A registry pattern that allows PrivateUse1 backends to register IActivityProfiler factories via the REGISTER_PRIVATEUSE1_PROFILER(MyProfiler) macro, with a compile-time static_assert ensuring the class inherits from libkineto::IActivityProfiler.
    • This assumes that backends will take a dependency on Kineto in order to use the IActivityProfiler interface. Right now, backends have to check their implementation in to Kineto - so this is both a step up and a safe assumption.
    • As an alternative, PyTorch could define its own abstract interface that mirrors IActivityProfiler, then internally forward to Kineto.
  2. Kineto init paths — Added onKinetoInit() calls in kineto_shim.cpp (user-triggered profiling via prepareTrace()), but not in kineto_client_interface.cpp (daemon mode via global_kineto_init()), with guards to ensure Kineto is initialized before forwarding.

TODO

  1. [Done] Gate this behind a new ProfilerState::KINETO_PRIVATEUSE1 check
  2. [Done] Check how (if at all) kineto build args need to change. Mostly they shouldn't, as for privateuse1 we won't need CUDA/ROCm/XPU, etc.
  3. [Done] How does this break kineto's fbcode setup? Not applicable

@pytorch-bot

pytorch-bot bot commented Jan 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172154

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b0b06d3 with merge base 5f68a4a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@divyanshk divyanshk added release notes: profiler release notes category topic: not user facing topic category labels Jan 10, 2026
@pytorch pytorch deleted a comment from github-actions bot Jan 10, 2026
@scotts
Contributor

scotts commented Jan 12, 2026

@divyanshk, what does this enable that wasn't previously possible? Is it currently the case that privateuse1 backends just don't work at all with torch.profile()?

@divyanshk
Contributor Author

divyanshk commented Jan 12, 2026

@scotts For privateuse1 backends, we provide stubs so that users can use the legacy autograd profiler https://github.com/pytorch/pytorch/blob/main/docs/source/accelerator/profiler.md. If they want to use Kineto, they then have to have a kineto-compatible backend, which requires checking in the profiler implementation in kineto (the AIU backend follows the latter route). What I am hoping to enable is for backend users to use the latter route, without checking in a lot of code in Kineto. That is the entire philosophy behind Privateuse1 - we just provide extension points in pytorch core, users have their implementations in their own repo without us having to maintain it.

@sraikund16
Contributor

> What I am hoping to enable is for backend users to use the latter route, without checking in a lot of code in Kineto.

How will users export a chrome trace without kineto? Is this impl for using the FunctionEvent frontend for now?

@ppnaik1890

ppnaik1890 commented Jan 22, 2026

Hi @divyanshk
So to confirm our understanding, we register our backend using REGISTER_PRIVATEUSE1_PROFILER and enable integration with IActivityProfiler. The registration and integration code for the backend needs to reside in an outside repo?
For example, this plugin will be moved out of the repo?

@divyanshk
Contributor Author

@ppnaik1890 Yes, that is correct. How does that sound?

To be clear, here "outside repo" would be the AIU / IBM Pytorch backend implementation. I don't know if that exists right now (code pointer?)

@raghukiran1224
Collaborator

> To be clear, here "outside repo" would be the AIU / IBM Pytorch backend implementation. I don't know if that exists right now (code pointer?)

@divyanshk We can move that to our current dev org, torch-spyre, if the pattern is being followed for out-of-tree accelerators.

@divyanshk divyanshk marked this pull request as ready for review March 5, 2026 17:55
// libkineto::IActivityProfiler.
struct RegisterPrivateUse1Profiler {
template <typename ProfilerClass>
explicit RegisterPrivateUse1Profiler(ProfilerClass*) {
Contributor

Do we need to take a parameter? We don't actually use it. And when we instantiate this in the macro, we're passing nullptr. Can't we simplify this by just making this an empty constructor?

Contributor Author

I am using ProfilerClass for the compile-time type assertion below.

Contributor

@scotts scotts Mar 9, 2026

Yes, but that's available as a symbol because of template <typename ProfilerClass>. I don't think you need a dummy parameter to reference it. It also may be a bit more idiomatic to make it a template parameter to the class itself rather than the constructor, but I don't think it makes any actual functionality difference in our case.

Contributor Author

Ahh yes, that's nice! Thank you. Templating the struct is much cleaner than templating its constructor.

On the templated constructor (what I had earlier), out of curiosity, were you thinking of something like this

struct Foo {
  template <typename T>
  Foo() { /* use T somehow */ }
};

but this isn't valid?

Contributor

I think that's valid? Did you try it and get a compilation error? It's not a requirement that a template parameter is the type of an actual parameter.

// Forward registered PrivateUse1 profiler factory to Kineto.
// Only for KINETO_PRIVATEUSE1 state where backend provides its own
// IActivityProfiler.
#ifdef USE_KINETO
Contributor

We called torch::profiler::impl::kineto::prepareTrace() above without a USE_KINETO check. Do we need to do that here? I think we should figure out the absolute minimum we need to do such checks.

Contributor Author

yeah - because torch::profiler::impl::kineto::prepareTrace() is in kineto_shim.cpp, which can operate with USE_KINETO not set because it has an ifdef guard around almost every function - sad, I know :-( check out this comment https://github.com/pytorch/pytorch/blob/main/torch/csrc/profiler/kineto_shim.cpp#L16

We might be able to get rid of kineto_shim altogether but that is a separate conversation.

Contributor Author

I can get rid of this because the condition config.state == ProfilerState::KINETO_PRIVATEUSE1 is true only when Kineto is present, but that check happens at runtime. For compile-time correctness we would have to keep this USE_KINETO guard around PrivateUse1ProfilerRegistry.

Contributor

// Here lies pain and #ifdef USE_KINETO

Got an actual laugh-out-loud from me. :) Yeah, let's deal with cleaning this up later as its own thing.

@scotts
Contributor

scotts commented Mar 5, 2026

This is great! I think this will greatly improve our profiler integrations. I have a bunch of small comments about the code itself, but we also need some tests. At the least, I think we need some C++ tests which mock up a trivial "external" profiler. If we can also somehow get that working with the Python side in tests as well, that would be great.

@divyanshk divyanshk force-pushed the divyanshk/profiler_privateuse1_reg branch from 4955312 to 8e4b002 Compare March 7, 2026 03:02
@divyanshk divyanshk force-pushed the divyanshk/profiler_privateuse1_reg branch 2 times, most recently from f97a00a to 9aef685 Compare March 9, 2026 17:52
@meta-codesync

meta-codesync bot commented Mar 9, 2026

@divyanshk has imported this pull request. If you are a Meta employee, you can view this in D95825766.

test_custom_script_ops
test_custom_backend
test_torch_function_benchmark
test_libtorch_profiler
Contributor Author

For dev infra folks, I am including the profiler test as part of default shard 2 - the test is tiny. Here is the log:

+ /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/test_privateuse1_profiler --gtest_filter=PrivateUse1ProfilerTest.EndToEndProfiling
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = PrivateUse1ProfilerTest.EndToEndProfiling-*_CUDA:*_MultiCUDA
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from PrivateUse1ProfilerTest
[ RUN      ] PrivateUse1ProfilerTest.EndToEndProfiling
[       OK ] PrivateUse1ProfilerTest.EndToEndProfiling (0 ms)
[----------] 1 test from PrivateUse1ProfilerTest (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.
+ /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin/test_privateuse1_profiler --gtest_filter=-PrivateUse1ProfilerTest.EndToEndProfiling
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = -PrivateUse1ProfilerTest.EndToEndProfiling:*_CUDA:*_MultiCUDA
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from PrivateUse1ProfilerTest
[ RUN      ] PrivateUse1ProfilerTest.RegistrySingleton
[       OK ] PrivateUse1ProfilerTest.RegistrySingleton (0 ms)
[ RUN      ] PrivateUse1ProfilerTest.RegisterFactory
[       OK ] PrivateUse1ProfilerTest.RegisterFactory (0 ms)
[ RUN      ] PrivateUse1ProfilerTest.OnKinetoInitForwarding
[       OK ] PrivateUse1ProfilerTest.OnKinetoInitForwarding (0 ms)
[----------] 3 tests from PrivateUse1ProfilerTest (0 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (0 ms total)
[  PASSED  ] 3 tests.

I'm running two separate processes because one of the tests (EndToEndProfiling) needs to run on its own - it does an e2e test, and running the other smaller unit tests with it can pollute it.

@divyanshk
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 10, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx950.1)

Details for Dev Infra team Raised by workflow job

@divyanshk divyanshk added the ciflow/rocm Trigger "default" config CI on ROCm label Mar 11, 2026
@pytorch pytorch deleted a comment from pytorch-bot bot Mar 11, 2026
@divyanshk divyanshk added ciflow/rocm-mi355 Trigger "default" config CI on ROCm MI355 runners and removed ciflow/rocm Trigger "default" config CI on ROCm labels Mar 11, 2026
@divyanshk
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

kaoutar55 added a commit to torch-spyre/torch-spyre that referenced this pull request Mar 16, 2026
Propose a profiling toolkit for Spyre covering the full stack:
PyTorch Profiler integration via REGISTER_PRIVATEUSE1_PROFILER
(pytorch/pytorch#172154), Spyre SMI, IR instrumentation-based
fine-grained profiler, memory profiling (DDR + scratchpad),
Inductor provenance tracking, HTA integration, FFDC, and
multi-card/energy profiling. Covers both the current
OpSpec/SuperDSC pipeline and the future KTIR transition (RFC 0682).

Tracking issue: #1048

Co-Authored-By: @ppnaik1890 and @flop1971
Signed-off-by: kaoutar55 <kaoutar.elmaghraoui@gmail.com>
kaoutar55 added a commit to kaoutar55/RFCs that referenced this pull request Mar 18, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Pull Request resolved: pytorch#172154
Approved by: https://github.com/scotts
Labels

ciflow/rocm-mi355 Trigger "default" config CI on ROCm MI355 runners ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: profiler release notes category topic: not user facing topic category
