
Improve device info with new flops and bandwidth formula based on hardware libraries#162245

Closed
exclamaforte wants to merge 7 commits intomainfrom
exclamaforte/profile-diff-algo

@exclamaforte
Contributor

@exclamaforte exclamaforte commented Sep 5, 2025

Previously, DeviceInfo provided theoretical hardware information based on a hardcoded list manually created from various datasheets.

This update:

  • Attempts to gather the information from a hardware library such as pynvml, improving accuracy and extending support to devices that have no entry in the datasheet list.
  • Adjusts the flops and bandwidth calculations based on these hardware values. For example, if the memory or SMs are underclocked, it adjusts the theoretical max flops/bandwidth accordingly.
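A minimal sketch of the two ideas above, assuming a datasheet table whose peak numbers are quoted at a known SM and memory clock. `DeviceSpec`, `Clocks`, `query_nvml_clocks` and `adjusted_peaks` are hypothetical names, not the PR's code, and the linear clock-ratio scaling is an assumed model. Which clock the PR reads is not stated here; this sketch uses the driver-reported max clocks from `pynvml.nvmlDeviceGetMaxClockInfo` (current clocks drop when the GPU idles), so a part whose max clocks sit below the datasheet's gets proportionally lower peaks. It falls back to the datasheet values when NVML is unavailable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DeviceSpec:
    """Hypothetical datasheet entry; peaks are quoted at the clocks below."""

    peak_tflops: float  # datasheet peak compute, TFLOP/s
    peak_bw_gbps: float  # datasheet peak DRAM bandwidth, GB/s
    max_sm_clock_mhz: int  # SM clock the datasheet peak assumes
    max_mem_clock_mhz: int  # memory clock the datasheet peak assumes


@dataclass(frozen=True)
class Clocks:
    sm_mhz: int
    mem_mhz: int


def query_nvml_clocks(index: int = 0) -> Optional[Clocks]:
    """Read the driver-reported max SM/memory clocks via pynvml, or None."""
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            sm = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)
            mem = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_MEM)
        finally:
            pynvml.nvmlShutdown()
        return Clocks(sm_mhz=sm, mem_mhz=mem)
    except Exception:
        # pynvml missing, no NVIDIA driver, or no GPU: caller uses the datasheet.
        return None


def adjusted_peaks(spec: DeviceSpec, clocks: Optional[Clocks]) -> Tuple[float, float]:
    """Scale datasheet peaks by the hardware/datasheet clock ratio (capped at 1)."""
    if clocks is None:
        return spec.peak_tflops, spec.peak_bw_gbps
    sm_ratio = min(clocks.sm_mhz / spec.max_sm_clock_mhz, 1.0)
    mem_ratio = min(clocks.mem_mhz / spec.max_mem_clock_mhz, 1.0)
    return spec.peak_tflops * sm_ratio, spec.peak_bw_gbps * mem_ratio


if __name__ == "__main__":
    # Made-up datasheet numbers, for illustration only.
    spec = DeviceSpec(peak_tflops=100.0, peak_bw_gbps=2000.0,
                      max_sm_clock_mhz=1500, max_mem_clock_mhz=1000)
    print(adjusted_peaks(spec, query_nvml_clocks()))
```

With the made-up values above, an SM clock of 1200 MHz against a 1500 MHz datasheet clock yields 80 TFLOP/s. Capping the ratio at 1.0 keeps a factory-overclocked part from exceeding the datasheet peak; that is a conservative choice in this sketch, not something the PR description specifies.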

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @chenyang78

@pytorch-bot

pytorch-bot bot commented Sep 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162245

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 16 Unrelated Failures

As of commit ae6712f with merge base be8095b:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@exclamaforte exclamaforte force-pushed the exclamaforte/profile-diff-algo branch from 63f7d5c to 01b8dd3 on September 5, 2025 09:00
@exclamaforte exclamaforte changed the title Update device info with new flops formula Update device info with new flops formula based on hardware libraries Sep 5, 2025
@exclamaforte exclamaforte added the topic: not user facing topic category label Sep 5, 2025
@exclamaforte exclamaforte force-pushed the exclamaforte/profile-diff-algo branch 2 times, most recently from 6e5f1b3 to 0178760 on September 6, 2025 02:39
@exclamaforte exclamaforte added the ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 label Sep 6, 2025
@exclamaforte exclamaforte changed the title Update device info with new flops formula based on hardware libraries Improve device info with new flops and bandwidth formula based on hardware libraries Sep 6, 2025
@exclamaforte exclamaforte force-pushed the exclamaforte/profile-diff-algo branch 3 times, most recently from 358d218 to d90e00d on September 7, 2025 21:04
Contributor

@v0i0 v0i0 left a comment

looks great, left some nits / questions

@exclamaforte exclamaforte force-pushed the exclamaforte/profile-diff-algo branch from 818777d to 62b819a on September 8, 2025 23:31
Contributor

@eellison eellison left a comment

deferring to @v0i0 and @shunting314 but great to see this change!

@exclamaforte exclamaforte force-pushed the exclamaforte/profile-diff-algo branch from 1f4b603 to 7797807 on September 10, 2025 10:01
@exclamaforte
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / unit-test / inductor-test / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu)


@exclamaforte
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…dware libraries (pytorch#162245)

Previously, DeviceInfo provided theoretical hardware information based on a hardcoded list manually created from various datasheets.

This update:
- Attempts to gather the information from a hardware library such as `pynvml`, improving accuracy and extending support to devices that have no entry in the datasheet list.
- Adjusts the flops and bandwidth calculations based on these hardware values. For example, if the memory or SMs are underclocked, it adjusts the theoretical max flops/bandwidth accordingly.

Pull Request resolved: pytorch#162245
Approved by: https://github.com/v0i0, https://github.com/shunting314
@facebook-github-bot
Contributor

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@exclamaforte your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Sep 19, 2025
…d on hardware libraries (#162245)"

This reverts commit 35d7b32.

Reverted #162245 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](#162245 (comment)))
@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Sep 19, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…dware libraries (pytorch#162245)

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…d on hardware libraries (pytorch#162245)"

cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…dware libraries (pytorch#162245)

cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…d on hardware libraries (pytorch#162245)"

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…dware libraries (pytorch#162245)

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…d on hardware libraries (pytorch#162245)"

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Dec 3, 2025
@github-actions github-actions bot closed this Jan 2, 2026
@github-actions github-actions bot deleted the exclamaforte/profile-diff-algo branch February 1, 2026 02:22

Labels

  • ci-no-td (Do not run TD on this PR)
  • ciflow/inductor
  • ciflow/rocm-mi300 (Trigger "default" config CI on ROCm MI300)
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: inductor
  • open source
  • Reverted
  • Stale
  • topic: not user facing (topic category)

8 participants