Make torch.cuda.gds APIs public #147120
mikaylagawarecki wants to merge 8 commits into gh/mikaylagawarecki/313/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147120
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 3 Pending. As of commit 5768d7f with merge base f95bdf5. UNSTABLE: the following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Follow up to #145748 [ghstack-poisoned]
albanD
left a comment
Sounds good, only small nits
- def _gds_register_buffer(s: Storage) -> None:
+ def gds_register_buffer(s: Storage) -> None:
Are there any users of the old private APIs? Do you want to keep them, to avoid breaking any previous user, by doing something like `_gds_register_buffer = gds_register_buffer`?
0 hits on GitHub from repos that are not just copy-pastes of PyTorch, so I feel OK about breaking BC here.
gds_register_buffer
gds_deregister_buffer
GdsFile
Can you add here, or link from here to, an example of how to use these APIs?
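A minimal end-to-end sketch of the newly public APIs, assuming a CUDA build with USE_CUFILE enabled and GDS-capable storage (this is illustrative usage based on the module as of this stack, not the eventual tutorial; it requires real GPU hardware to run):

```python
import os
import torch

# Allocate a CUDA tensor and register its storage's memory with cuFile,
# so subsequent GDS reads/writes on it can avoid a host bounce buffer.
src = torch.randn(1024, device="cuda")
s = src.untyped_storage()
torch.cuda.gds.gds_register_buffer(s)

# Open a file for GPUDirect Storage I/O and write the storage to it.
f = torch.cuda.gds.GdsFile("tensor.bin", os.O_CREAT | os.O_RDWR)
f.save_storage(s, offset=0)

# Read it back into a fresh CUDA storage and verify the round trip.
dest = torch.empty(1024, device="cuda")
f.load_storage(dest.untyped_storage(), offset=0)
assert torch.equal(src, dest)

torch.cuda.gds.gds_deregister_buffer(s)
```

Registering the buffer is optional but recommended for repeated transfers on the same storage.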
Follow up to #145748 that turned USE_CUFILE on for CUDA 12.6 and 12.8 binaries [ghstack-poisoned]
albanD
left a comment
Moving API to public sounds good.
Let's plan on having a tutorial (including serialization config needs) before 2.7 as an E2E example.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Follow up after #147120. cuFile was enabled only on Linux: https://pypi.org/project/nvidia-cufile-cu12/#files

Fixes validation workflow failures: https://github.com/pytorch/test-infra/actions/runs/13558218752/job/37896578837

```
File "C:\Jenkins\Miniconda3\envs\conda-env-13558218752\lib\site-packages\torch\cuda\gds.py", line 105, in __init__
    raise RuntimeError("GdsFile is not supported on this platform.")
RuntimeError: GdsFile is not supported on this platform.
Exception ignored in: <function GdsFile.__del__ at 0x000001772B5003A0>
Traceback (most recent call last):
  File "C:\Jenkins\Miniconda3\envs\conda-env-13558218752\lib\site-packages\torch\cuda\gds.py", line 113, in __del__
    if self.handle is not None:
AttributeError: 'GdsFile' object has no attribute 'handle'
```

Pull Request resolved: #148060
Approved by: https://github.com/mikaylagawarecki
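The secondary `AttributeError` in that log is a classic `__init__`/`__del__` interaction: when `__init__` raises before assigning `self.handle`, the finalizer still runs on the partially constructed object and trips over the missing attribute. A hedged sketch of the defensive pattern (illustrative stand-in, not the actual patch in #148060):

```python
class GdsFileSketch:
    """Stand-in class reproducing GdsFile's init/del interaction."""

    def __init__(self, supported: bool):
        if not supported:
            # Raising here means self.handle is never assigned,
            # yet __del__ will still be invoked on this object...
            raise RuntimeError("GdsFile is not supported on this platform.")
        self.handle = 42  # stand-in for the cuFile handle

    def __del__(self):
        # ...so guard with getattr instead of reading self.handle directly,
        # which would raise AttributeError during cleanup.
        if getattr(self, "handle", None) is not None:
            self.handle = None  # stand-in for deregistering the handle
```

Alternatively, assigning `self.handle = None` as the very first statement of `__init__` achieves the same safety.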
Follow up to #145748 that turned USE_CUFILE on for CUDA 12.6 and 12.8 binaries
Stack from ghstack (oldest at bottom):