Improve device info with new flops and bandwidth formula based on hardware libraries#162245
exclamaforte wants to merge 7 commits into main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162245
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 16 Unrelated Failures
As of commit ae6712f with merge base be8095b:
NEW FAILURE - The following job has failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
63f7d5c to 01b8dd3
6e5f1b3 to 0178760
358d218 to d90e00d
v0i0 left a comment:
looks great, left some nits / questions
818777d to 62b819a
eellison left a comment:
deferring to @v0i0 and @shunting314 but great to see this change!
1f4b603 to 7797807
@pytorchbot merge
Merge failed
Reason: 1 jobs have failed, first few of them are: inductor / unit-test / inductor-test / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu)
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
…dware libraries (pytorch#162245)

Previously, DeviceInfo provided theoretical hardware information based on a hardcoded list manually created from various datasheets.

This update:
- Attempts to gather the information from a hardware library like `pynvml`, improving accuracy and expanding support to devices that don't have entries in the datasheet list.
- Adjusts the flops and bandwidth calculation based on these hardware values. For example, if the memory or SMs are underclocked, it adjusts the theoretical max flops/bw accordingly.

Pull Request resolved: pytorch#162245
Approved by: https://github.com/v0i0, https://github.com/shunting314
@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"
This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
@pytorchbot successfully started a revert job. Check the current status here.
@exclamaforte your PR has been successfully reverted.
…d on hardware libraries (#162245)"
This reverts commit 35d7b32.
Reverted #162245 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](#162245 (comment)))
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
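Previously, DeviceInfo provided theoretical hardware information based on a hardcoded list manually created from various datasheets.
This update:
- Attempts to gather the information from a hardware library like `pynvml`, improving accuracy and expanding support to devices that don't have entries in the datasheet list.
- Adjusts the flops and bandwidth calculation based on these hardware values. For example, if the memory or SMs are underclocked, it adjusts the theoretical max flops/bw accordingly.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @chenyang78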
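For readers who want a concrete picture of the underclocking adjustment in the description, here is a minimal sketch. It is not the code from this PR: it assumes the adjustment is a simple linear scaling of a datasheet peak by the ratio of the reported clock to the maximum clock, and the helper names `scale_by_clock` and `read_sm_clocks` are hypothetical. The only `pynvml` calls used are `nvmlInit`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetClockInfo`, `nvmlDeviceGetMaxClockInfo`, and `nvmlShutdown`.

```python
from typing import Optional, Tuple


def scale_by_clock(datasheet_peak: float, current_mhz: float, max_mhz: float) -> float:
    """Scale a datasheet peak (FLOPS or bytes/s) by current/max clock.

    Assumption for illustration: throughput scales linearly with clock.
    The ratio is capped at 1.0 so the result never exceeds the datasheet peak.
    """
    if max_mhz <= 0:
        # No usable clock information: fall back to the datasheet value.
        return datasheet_peak
    ratio = min(current_mhz / max_mhz, 1.0)
    return datasheet_peak * ratio


def read_sm_clocks(index: int = 0) -> Optional[Tuple[int, int]]:
    """Return (current, max) SM clock in MHz, or None if NVML is unavailable."""
    try:
        import pynvml  # optional dependency
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        current = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
        maximum = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)
        return current, maximum
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    DATASHEET_PEAK_TFLOPS = 100.0  # made-up placeholder, not a real device figure
    clocks = read_sm_clocks()
    if clocks is None:
        print("NVML unavailable; using datasheet value:", DATASHEET_PEAK_TFLOPS)
    else:
        print("Adjusted peak TFLOPS:", scale_by_clock(DATASHEET_PEAK_TFLOPS, *clocks))
```

The same scaling could be applied to bandwidth by reading `pynvml.NVML_CLOCK_MEM` instead of `pynvml.NVML_CLOCK_SM`. Falling back to the datasheet value when the library is missing keeps the hardcoded list as a baseline for devices where no hardware library is available.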