Add wrappers for synchronous GPUDirect Storage APIs#130633
mikaylagawarecki wants to merge 28 commits into gh/mikaylagawarecki/232/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130633
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 0c18902 with merge base c047bdd.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Review comment on this hunk:

    _CUDAToolkit_find_and_add_import_lib(cublas_static DEPS culibos)
    endif()

    if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 11.4)
Is this file vendored from somewhere? Did you take that update from there?
Yep, from here: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/FindCUDAToolkit.cmake#L1245-1251. The cuFile changes only landed in CMake 3.25, but it seems we are on version 3.17.
Ok!
Did you update the full thing to 3.25, or just pick the subset of the changes you needed?
@malfet, what do you think is the best way to do this? A full update to a given version, to make sure we're not in a weird in-between state? Or just what we need, to reduce churn?
Just copy-pasted the subset I needed. FWIW, I am not sure the initial file was copy-pasted directly, as I see this at the top, but around the date when the PR that added this file was merged I don't see a similar comment: https://gitlab.kitware.com/cmake/cmake/-/blob/21b102c77d85897c2500488180e58de077447b4c/Modules/FindCUDAToolkit.cmake
So I'm not sure whether any changes were made, or which CMake commit it was taken from.
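For reference, the cuFile block that FindCUDAToolkit.cmake gained in CMake 3.25 (the subset being copied here) looks roughly like the following. This is a sketch based on the linked module; the exact helper arguments may differ from what was vendored into this repo:

```cmake
# cuFile (GPUDirect Storage) imported targets; the CUDA toolkit has
# shipped libcufile since CUDA 11.4.
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 11.4)
  _CUDAToolkit_find_and_add_import_lib(cuFile DEPS culibos)
  _CUDAToolkit_find_and_add_import_lib(cuFile_static DEPS culibos)
  _CUDAToolkit_find_and_add_import_lib(cuFile_rdma DEPS cuFile culibos)
  _CUDAToolkit_find_and_add_import_lib(cuFile_rdma_static DEPS cuFile_static culibos)
endif()
```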
@mikaylagawarecki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Based in part on NVIDIA/apex#1774
Pull Request resolved: pytorch#130633
Approved by: https://github.com/albanD
…130633)"
This reverts commit 5b5e069.
Reverted pytorch#130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally (D60085885); possibly needs an update to some Bazel build? ([comment](pytorch#130633 (comment)))
The build failures present on D60085885 do not exist on the imported D60155434 (also verified by running some of the builds locally on the diff; they succeeded). The service_lab signal that is failing previously succeeded, so it is flaky. Going to rebase and merge.

@pytorchbot merge

Merge failed
Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR!
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge

Merge started
Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "still failing internally D60265673" -c ghfirst

@pytorchbot successfully started a revert job. Check the current status here.

@mikaylagawarecki your PR has been successfully reverted.
This reverts commit 709ddf7. Reverted #130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](#130633 (comment)))
Reland #130633
USE_CUFILE turned off by default in this version
Pull Request resolved: #133489
Approved by: https://github.com/albanD
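Since the reland disables the feature by default, a from-source build would need to opt in explicitly. Assuming `USE_CUFILE` follows PyTorch's usual convention of `USE_*` environment variables being forwarded to CMake options (an assumption based on that convention, not verified against this PR), enabling it could look like:

```
# Hypothetical opt-in build; USE_CUFILE is the flag named in the reland note.
USE_CUFILE=1 python setup.py develop
```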
Based in part on NVIDIA/apex#1774
Stack from ghstack (oldest at bottom):
Differential Revision: D60155434
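For context, the synchronous cuFile (GPUDirect Storage) C API that wrappers like these sit on top of follows a driver-open / handle-register / read / deregister pattern. The sketch below is illustrative only: the cuFile calls are from NVIDIA's cufile.h, but the file path is made up, it requires libcufile and a GDS-capable filesystem to actually run, and error handling is abbreviated.

```c
#include <cufile.h>        /* NVIDIA GPUDirect Storage API */
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    /* Open the cuFile driver once per process. */
    cuFileDriverOpen();

    /* O_DIRECT is required for GDS I/O; path is a placeholder. */
    int fd = open("/mnt/gds/data.bin", O_RDONLY | O_DIRECT);

    /* Register the POSIX fd with cuFile. */
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    /* Synchronously read 1 MiB from file offset 0 into device memory. */
    void *dev_ptr;
    cudaMalloc(&dev_ptr, 1 << 20);
    ssize_t n = cuFileRead(handle, dev_ptr, 1 << 20,
                           /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes\n", n);

    cuFileHandleDeregister(handle);
    cudaFree(dev_ptr);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```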