Add Intel GPU info collection to the collect env script #137846
jingxu10 wants to merge 11 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137846
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job, 9 Unrelated Failures as of commit d20562a with merge base 0db3e0c
CANCELLED JOB - The following job was cancelled. Please retry:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"

Force-pushed from 0f6eaa9 to f1cf6ec
```python
ret = run_and_read_all(
    run_lambda,
    'powershell.exe "gwmi -Class Win32_OperatingSystem | Select-Object -Property Caption,\
OSArchitecture,Version | ConvertTo-Json"',
)
try:
    obj = json.loads(ret)
    ret = f'{obj["Caption"]} ({obj["Version"]} {obj["OSArchitecture"]})'
except ValueError as e:
```
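To illustrate what this snippet does, here is a self-contained sketch of the same parsing step run against a hypothetical sample of `gwmi ... | ConvertTo-Json` output (the sample values are assumptions; the property names match the `Caption,OSArchitecture,Version` selection above):

```python
import json

# Hypothetical sample of what the PowerShell command returns on Windows.
sample = '''{
    "Caption": "Microsoft Windows 11 Pro",
    "OSArchitecture": "64-bit",
    "Version": "10.0.22631"
}'''

try:
    obj = json.loads(sample)
    desc = f'{obj["Caption"]} ({obj["Version"]} {obj["OSArchitecture"]})'
except ValueError:
    desc = sample  # fall back to the raw output if it is not valid JSON

print(desc)  # -> Microsoft Windows 11 Pro (10.0.22631 64-bit)
```

The `except ValueError` branch matters because `json.JSONDecodeError` subclasses `ValueError`, so malformed PowerShell output degrades gracefully to the raw string.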
Can you please land this one as a separate PR?
torch/utils/collect_env.py (Outdated)
```diff
-def get_gpu_info(run_lambda):
+def get_nvidia_gpu_info(run_lambda):
```
Please don't delete any existing functions, as this might be considered a public API change.
Keep the original function with its behavior, as above.
I think the new name is misleading, as it also takes care of fetching the AMD GPU name. What's wrong with keeping the original name but adding the logic for fetching the XPU name at the end of the ifdef chain?
The implementation of get_gpu_info seems to work for CUDA only. We need a different method to retrieve the XPU driver and device info, so it is implemented in a separate function.
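For context, the shape the reviewer is asking for could look like the sketch below. This is a hedged illustration, not the actual collect_env.py code: the tool names are assumptions, and the real script also inspects `torch.version.hip` and similar attributes. It keeps the original `get_gpu_info` name and appends the Intel XPU branch at the end of the vendor-probe chain:

```python
import shutil

def get_gpu_info(run_lambda=None):
    """Probe vendor tools in order; the Intel XPU branch comes last."""
    probes = [
        ("nvidia-smi", "NVIDIA"),
        ("rocm-smi", "AMD"),
        ("xpu-smi", "Intel XPU"),  # XPU branch appended at the end of the chain
    ]
    for tool, vendor in probes:
        if shutil.which(tool):
            # A real implementation would run the tool via run_lambda
            # and parse its output here.
            return vendor
    return None

result = get_gpu_info()
```

Whether the returned value is a vendor string or `None` depends on which tools are installed on the machine running the sketch.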
torch/utils/collect_env.py (Outdated)
```python
if mgr_name == "":
    rc, _, _ = run("which dpkg")
    if rc == 0:
        mgr_name = "dpkg"
if mgr_name == "":
    rc, _, _ = run("which dnf")
    if rc == 0:
        mgr_name = "dnf"
if mgr_name == "":
    rc, _, _ = run("which yum")
    if rc == 0:
        mgr_name = "yum"
if mgr_name == "":
    rc, _, _ = run("which zypper")
    if rc == 0:
        mgr_name = "zypper"
```
Please avoid code duplication; use loops.
```diff
-if mgr_name == "":
-    rc, _, _ = run("which dpkg")
-    if rc == 0:
-        mgr_name = "dpkg"
-if mgr_name == "":
-    rc, _, _ = run("which dnf")
-    if rc == 0:
-        mgr_name = "dnf"
-if mgr_name == "":
-    rc, _, _ = run("which yum")
-    if rc == 0:
-        mgr_name = "yum"
-if mgr_name == "":
-    rc, _, _ = run("which zypper")
-    if rc == 0:
-        mgr_name = "zypper"
+for mgr_name in ["dpkg", "dnf", "yum", "zypper", ""]:
+    if mgr_name == "":
+        continue
+    rc, _, _ = run(f"which {mgr_name}")
+    if rc == 0:
+        break
```
Force-pushed from 77fb09d to dc8231a
torch/utils/collect_env.py (Outdated)
```python
    return smi


def get_pkg_version(run_lambda, pkg):
```
The package version could simply be obtained from sycl-ls after sourcing oneAPI, which would produce output like the below. Maybe we could use this for simplicity?
```text
[opencl:gpu][opencl:3] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1550 OpenCL 3.0 NEO [24.22.xxxxx.yy]
[opencl:gpu][opencl:4] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1550 OpenCL 3.0 NEO [24.22.xxxxx.yy]
[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.xxxxx]
[level_zero:gpu][level_zero:1] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.xxxxx]
```
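If this route were taken, extracting the backend and the bracketed driver version from such lines is a short regex exercise. A sketch against the sample lines above (assuming the line format is stable; the version strings are the elided `xxxxx` placeholders from the sample, not real values):

```python
import re

# Two representative sycl-ls lines from the sample output above.
sample = (
    "[opencl:gpu][opencl:3] Intel(R) OpenCL Graphics, "
    "Intel(R) Data Center GPU Max 1550 OpenCL 3.0 NEO [24.22.xxxxx.yy]\n"
    "[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, "
    "Intel(R) Data Center GPU Max 1550 1.3 [1.3.xxxxx]\n"
)

# Backend comes from the first bracket pair, driver version from the last.
pattern = re.compile(r"^\[(\w+):gpu\].*\[([^\]]+)\]$", re.MULTILINE)
drivers = pattern.findall(sample)
print(drivers)  # -> [('opencl', '24.22.xxxxx.yy'), ('level_zero', '1.3.xxxxx')]
```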
sycl-ls is a separate command that users would need to install separately. We are not planning to ask users to install more packages.
I think it is enough for us to know these items below:
What do you think?
Force-pushed from dc8231a to a3b1a27
Updated to include this info.

Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -c nosignal -m "This is breaking tests on xpu, detail log: https://hud.pytorch.org/pr/pytorch/pytorch/154962#43700962849"

@pytorchbot successfully started a revert job. Check the current status here.

@jingxu10 your PR has been successfully reverted.

This reverts commit c6b4f98. Reverted #137846 on behalf of https://github.com/etaf due to: This is breaking tests on xpu, detail log: https://hud.pytorch.org/pr/pytorch/pytorch/154962#43700962849 ([comment](#137846 (comment)))

@pytorchbot merge -f "Lint is green"
```python
cmd: str = str(grep_version[pkg_mgr]["command"])
cmd = cmd.format(pkg_name)
ret = run_and_read_all(run_lambda, cmd)
if ret is None or ret == "":
```
```diff
-if ret is None or ret == "":
+if not ret:
```
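The suggestion relies on Python falsiness: since `ret` on this code path is either `None` or a string, `not ret` covers exactly the two failure cases the original test spelled out. A quick check (the non-empty sample value is an arbitrary placeholder):

```python
# Verify `not ret` is equivalent to `ret is None or ret == ""` for the
# str-or-None values this code path can see.
for ret in (None, "", "some package version output"):
    assert (not ret) == (ret is None or ret == "")
```

Note the equivalence holds only because `ret` is never another falsy type (e.g. `0` or `[]`) here.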
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchbot revert -m "Just testing if it will fix PR time benchmarks signal" -c weird

@pytorchbot successfully started a revert job. Check the current status here.

@jingxu10 your PR has been successfully reverted.

@pytorchbot merge -f "pr_time_benchmarks failures are unrelated to this PR, see https://hud.pytorch.org/hud/pytorch/pytorch/59eb61b2d1e4b64debbefa036acd0d8c7d55f0a3/1?per_page=50&name_filter=pr_time_benchmarks&mergeEphemeralLF=true"

Can't merge closed PR #137846

@jingxu10 Please help reland this PR.

As title, add Intel GPU info collection to the collect env script.
Output examples:
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @yf225 @ColinPeppler @desertfire