[c10] Move P2P access logic from ATen to c10#173571

Closed
minsii wants to merge 1 commit into pytorch:main from minsii:export-D91506414

Conversation

@minsii
Contributor

@minsii minsii commented Jan 27, 2026

Summary:
Refactor PeerToPeerAccess by moving the core implementation from
aten/src/ATen/cuda to c10/cuda. This makes P2P and fabric access
queries available at the c10 layer without requiring ATen dependencies.

The ATen layer now provides thin wrappers that ensure CUDA lazy
initialization before forwarding to c10. This separation allows
lower-level CUDA code to query P2P capabilities without pulling in
ATen context machinery.

Reviewed By: ngimel

Differential Revision: D91506414
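
The layering described in the summary can be sketched roughly as follows. This is a minimal, CUDA-free illustration and not the code from this PR: the namespaces `lowlevel` and `highlevel`, the `PeerAccessCache` class, the toy topology in `query_driver`, and the fixed device count of 4 are all hypothetical stand-ins. The low-level layer owns the query and its cache; the high-level layer only ensures initialization and then forwards.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the lower ("c10-like") layer. It owns the cache
// and the driver query and has no dependency on the upper layer.
namespace lowlevel {

// Counts driver queries so that caching can be observed in tests.
inline int& driver_query_count() {
  static int n = 0;
  return n;
}

// Stub for the real driver call (CUDA code would use cudaDeviceCanAccessPeer).
// Toy topology, assumed for illustration: devices {0,1} and {2,3} are peers.
inline bool query_driver(int dev, int peer) {
  ++driver_query_count();
  return dev / 2 == peer / 2;
}

class PeerAccessCache {
 public:
  explicit PeerAccessCache(int num_devices)
      : n_(num_devices), state_(num_devices * num_devices, State::Unknown) {}

  bool get(int dev, int peer) {
    if (dev < 0 || dev >= n_ || peer < 0 || peer >= n_) {
      throw std::out_of_range("device index out of range");
    }
    if (dev == peer) {
      return true;  // a device can always access itself
    }
    std::lock_guard<std::mutex> lock(mu_);
    State& slot = state_[dev * n_ + peer];
    if (slot == State::Unknown) {
      // Query the driver once per (dev, peer) pair and remember the answer.
      slot = query_driver(dev, peer) ? State::Yes : State::No;
    }
    return slot == State::Yes;
  }

 private:
  enum class State : std::int8_t { Unknown, No, Yes };
  int n_;
  std::vector<State> state_;
  std::mutex mu_;
};

// Entry point callable by low-level code without the upper layer.
inline bool get_p2p_access(int dev, int peer) {
  static PeerAccessCache cache(4);  // assumed device count
  return cache.get(dev, peer);
}

}  // namespace lowlevel

// Hypothetical stand-in for the upper ("ATen-like") layer: a thin wrapper
// that ensures lazy initialization has run, then forwards to the lower layer.
namespace highlevel {

inline int& init_count() {
  static int n = 0;
  return n;
}

inline void lazy_init() {
  // A function-local static runs its initializer exactly once.
  static const bool done = [] {
    ++init_count();
    return true;
  }();
  (void)done;
}

inline bool get_p2p_access(int dev, int peer) {
  lazy_init();
  return lowlevel::get_p2p_access(dev, peer);
}

}  // namespace highlevel
```

In this sketch, existing callers would keep using `highlevel::get_p2p_access`, while lower-level code could call `lowlevel::get_p2p_access` directly and skip the upper layer's context entirely, which is the separation the summary describes.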

@pytorch-bot

pytorch-bot bot commented Jan 27, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@pytorch-bot

pytorch-bot bot commented Jan 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173571

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f342c43 with merge base 02a87d7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync

meta-codesync bot commented Jan 27, 2026

@minsii has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91506414.

pytorch-bot bot pushed a commit that referenced this pull request Jan 27, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 27, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
pytorch-bot bot added the ciflow/trunk label Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 28, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 29, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 29, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 29, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 29, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 29, 2026
minsii added a commit to minsii/pytorch that referenced this pull request Jan 29, 2026
@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job

@pytorch-bot

pytorch-bot bot commented Jan 29, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@wdvr
Contributor

wdvr commented Jan 30, 2026

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the topic: not user facing label Jan 30, 2026
@wdvr
Contributor

wdvr commented Jan 30, 2026

closing for rebase

@wdvr wdvr closed this Jan 30, 2026
pytorchmergebot pushed a commit that referenced this pull request Feb 10, 2026
Refactor PeerToPeerAccess by moving the core implementation from
aten/src/ATen/cuda to c10/cuda. This makes P2P and fabric access
queries available at the c10 layer without requiring ATen dependencies.
The ATen layer now provides thin wrappers that ensure CUDA lazy
initialization before forwarding to c10. This separation allows
lower-level CUDA code to query P2P capabilities without pulling in
ATen context machinery.

Original diff by @minsii #173571

Differential Revision: [D92675476](https://our.internmc.facebook.com/intern/diff/D92675476/)
Pull Request resolved: #174582
Approved by: https://github.com/Skylion007
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026

Labels

ciflow/trunk, fb-exported, meta-exported, topic: not user facing


5 participants