Add wrappers for synchronous GPUDirect Storage APIs#130633
mikaylagawarecki wants to merge 28 commits into gh/mikaylagawarecki/232/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130633
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 0c18902 with merge base c047bdd.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Review comment on this hunk:

    _CUDAToolkit_find_and_add_import_lib(cublas_static DEPS culibos)
    endif()

    if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 11.4)
Is this file vendored from somewhere? Did you take that update from there?
Yep, from here: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/FindCUDAToolkit.cmake#L1245-1251. The cuFile changes only landed in CMake 3.25, but it seems we are on version 3.17.
Ok!
Did you update the full thing to 3.25, or just pick the subset of the changes you needed?
@malfet, what do you think is the best way to do this? A full update to a given version, to make sure we're not in a weird in-between state? Or just what we need, to reduce churn?
Just copy-pasted the subset I needed. FWIW, I am not sure the initial file was copy-pasted directly, as I see this at the top, but around the date when the PR that added this file was merged I don't see a similar comment: https://gitlab.kitware.com/cmake/cmake/-/blob/21b102c77d85897c2500488180e58de077447b4c/Modules/FindCUDAToolkit.cmake
So I'm not sure whether any changes were made, or which CMake commit it was taken from.
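For reference, the cuFile block that FindCUDAToolkit.cmake gained in CMake 3.25 (the subset being copied here) looks roughly like the following. This is a sketch based on the linked module; the exact helper arguments may differ from what was vendored into this repo:

```cmake
# cuFile (GPUDirect Storage) imported targets; the CUDA toolkit has
# shipped libcufile since CUDA 11.4.
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 11.4)
  _CUDAToolkit_find_and_add_import_lib(cuFile DEPS culibos)
  _CUDAToolkit_find_and_add_import_lib(cuFile_static DEPS culibos)
  _CUDAToolkit_find_and_add_import_lib(cuFile_rdma DEPS cuFile culibos)
  _CUDAToolkit_find_and_add_import_lib(cuFile_rdma_static DEPS cuFile_static culibos)
endif()
```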
@mikaylagawarecki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Based in part on NVIDIA/apex#1774
Pull Request resolved: pytorch#130633
Approved by: https://github.com/albanD
…130633)"
This reverts commit 5b5e069.
Reverted pytorch#130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally (D60085885); possibly needs an update to some Bazel build? ([comment](pytorch#130633 (comment)))
The build failures present on D60085885 do not exist on the imported D60155434 (also verified by running some of the builds locally on the diff; they succeeded). The service_lab signal that is failing previously succeeded, so it is flaky. Going to rebase and merge.

@pytorchbot merge

Merge failed
Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR!
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge

Merge started
Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "still failing internally D60265673" -c ghfirst

@pytorchbot successfully started a revert job. Check the current status here.

@mikaylagawarecki your PR has been successfully reverted.
This reverts commit 709ddf7. Reverted #130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](#130633 (comment)))
Reland #130633
USE_CUFILE turned off by default in this version
Pull Request resolved: #133489
Approved by: https://github.com/albanD
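Since the reland disables the feature by default, a from-source build would need to opt in explicitly. Assuming `USE_CUFILE` follows PyTorch's usual convention of `USE_*` environment variables being forwarded to CMake options (an assumption based on that convention, not verified against this PR), enabling it could look like:

```
# Hypothetical opt-in build; USE_CUFILE is the flag named in the reland note.
USE_CUFILE=1 python setup.py develop
```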
Based in part on NVIDIA/apex#1774
Stack from ghstack (oldest at bottom):
Differential Revision: D60155434
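For context, the synchronous cuFile (GPUDirect Storage) C API that wrappers like these sit on top of follows a driver-open / handle-register / read / deregister pattern. The sketch below is illustrative only: the cuFile calls are from NVIDIA's cufile.h, but the file path is made up, it requires libcufile and a GDS-capable filesystem to actually run, and error handling is abbreviated.

```c
#include <cufile.h>        /* NVIDIA GPUDirect Storage API */
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    /* Open the cuFile driver once per process. */
    cuFileDriverOpen();

    /* O_DIRECT is required for GDS I/O; path is a placeholder. */
    int fd = open("/mnt/gds/data.bin", O_RDONLY | O_DIRECT);

    /* Register the POSIX fd with cuFile. */
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    /* Synchronously read 1 MiB from file offset 0 into device memory. */
    void *dev_ptr;
    cudaMalloc(&dev_ptr, 1 << 20);
    ssize_t n = cuFileRead(handle, dev_ptr, 1 << 20,
                           /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes\n", n);

    cuFileHandleDeregister(handle);
    cudaFree(dev_ptr);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```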