fix/strix halo and windows AMD ROCm support #5301
Merged
danielhanchen
merged 184 commits into
unslothai:main
from
LeoBorcherding:fix/rocm-strix-halo-unified-memory
May 30, 2026
0539621
fix(studio): set HIP_VISIBLE_DEVICES in apply_gpu_ids for ROCm traini…
LeoBorcherding 14fccde
test: tighten apply_gpu_ids ROCm fallback assertions
LeoBorcherding e87c90f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 74c871d
fix: detect ROCm unified memory (Strix Halo / AMD iGPU) via torch fal…
LeoBorcherding 0d58e42
Apply unified-memory reconciliation in get_gpu_utilization too
danielhanchen cb0edfc
Use 'is not None' and log debug on torch.version.hip probe failures
danielhanchen 9a83a74
fix(studio): honour HIP_VISIBLE_DEVICES in _get_parent_visible_gpu_sp…
LeoBorcherding 8332bb7
Merge fix/5180-hip-visible-devices-worker into fix/rocm-strix-halo-un…
LeoBorcherding 22a0d6b
Merge remote-tracking branch 'origin/fix/rocm-strix-halo-unified-memo…
LeoBorcherding 4e7a083
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding 9bcd0ed
fix(install): harden AMD ROCm GPU detection for multi-GPU and env-fil…
LeoBorcherding 3241eb3
Fix KFD sysfs awk fallback to read properties file
danielhanchen d2da8ce
fix(setup.ps1): detect AMD ROCm GPU on Windows, bring to parity with …
LeoBorcherding f84a723
fix(install.ps1): detect AMD ROCm GPU on Windows, bring to parity wit…
LeoBorcherding 5d5ae56
fix(install.ps1): suppress 'No NVIDIA GPU detected' when AMD GPU is p…
LeoBorcherding 270b2dd
feat: add Windows AMD ROCm PyTorch wheel installation
LeoBorcherding 14f6559
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8299842
fix: also install torchvision and torchaudio from AMD Windows repo
LeoBorcherding ec40e9f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 7e29435
feat: add ROCm 7.1.1 Windows wheel mapping
LeoBorcherding e53b543
fix: install rocm_sdk_core and rocm_sdk_libraries_custom alongside torch
LeoBorcherding a74cb8b
fix: expand ROCm wheel array to scalars for Invoke-InstallCommand
LeoBorcherding 79670ab
fix: use --no-deps for AMD Windows torch wheel install
LeoBorcherding b550948
fix: setup.ps1 and install_python_stack.py now install ROCm torch on …
LeoBorcherding 6f79213
fix: suppress manual-install warning when ROCm torch already present;…
LeoBorcherding b9c6882
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 67d8b74
feat: add rocm step display in setup.ps1; fix warning and progress co…
LeoBorcherding f036ee0
fix: detect AMD SDK ROCm torch via __version__ when torch.version.hip…
LeoBorcherding 7d3de8b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 77f7ade
perf: drop --no-cache-dir from AMD ROCm torch wheel installs
LeoBorcherding 9439657
fix: use install-state flag instead of subprocess probe for AMD Windo…
LeoBorcherding 4b2f7fb
fix: hoist global declaration to top of _ensure_rocm_torch
LeoBorcherding 7fbdce1
fix: pass AMD torch install status via env var to suppress false warning
LeoBorcherding f09a424
fix: register ROCm DLL directory before torch import on Windows
LeoBorcherding 05f5cda
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 0facfbc
fix: remove hardcoded non-standard ROCm paths from DLL directory scan
LeoBorcherding cc77737
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] efcaccb
fix: prevent torchao overrides step from overwriting AMD ROCm torch
LeoBorcherding 301d6c0
fix: add rocm_sdk namespace tarball to Windows ROCm wheel installs
LeoBorcherding 6fe91e7
feat: enable ROCm 7.2 torch install + warn on gfx1151 with ROCm < 7.2
LeoBorcherding 1680dac
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 550317d
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding 5deb230
fix: prefer Python 3.12 for AMD ROCm users when 3.13 is also installed
LeoBorcherding bafb3f5
fix: also check uv-managed Python 3.12 for AMD ROCm #5301
LeoBorcherding 2de2c29
fix: hide amd-smi console popups on Windows, guard torch.distributed.…
LeoBorcherding ba6b279
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 1707169
fix: suppress remaining console popups on Windows, patch torch.distri…
LeoBorcherding a704722
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 739e5d5
fix: stub all missing torch.distributed attrs for ROCm Windows wheel …
LeoBorcherding 18690c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] bfac7c1
fix: inject torch.distributed stub when C backend missing in ROCm Win…
LeoBorcherding a288ff2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 4d401e7
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding fe5546d
fix(rocm/windows): pre-stub torch._C._distributed_c10d + raise amd-sm…
LeoBorcherding 85841b5
fix(rocm): guard c10d stub, fix TorchIndexFamily for 7.1, clean dead …
LeoBorcherding f892b65
fix(tests): match windows AMD warning assertion to actual source string
LeoBorcherding 42c5d98
chore: trim verbose comment blocks across all ROCm-related files
LeoBorcherding 265a09a
fix: guard reconcile call against None numeric_ids; add torchvision l…
LeoBorcherding 643a797
fix(install.ps1): recreate venv with Python 3.12 after ROCm switch
LeoBorcherding 7b38f0a
ux: detect AMD GPU before Python selection to avoid double venv creation
LeoBorcherding 1ba91d7
fix(rocm/win): auto-stub all _distributed_c10d symbols via PEP-562 __…
LeoBorcherding fac005b
chore: trim c10d stub comment
LeoBorcherding a3d9bac
fix(rocm/win): auto-stub missing torch.distributed attrs (Store, Proc…
LeoBorcherding 73ae40c
fix(rocm/win): pre-stub fsdp submodules in sys.modules; fix __getattr…
LeoBorcherding ea510b5
feat(rocm/win): arch-aware wheel selector always picks newest ROCm re…
LeoBorcherding 4d09cbb
fix(rocm/win): stub class metaclass for ProcessGroup.BackendType; amd…
LeoBorcherding e64c196
fix: stub __members__ so torchao float8 enum check doesn't crash on R…
LeoBorcherding 26f073d
fix: stub distributed tensor/functional_collectives to prevent missin…
LeoBorcherding b073201
fix: give mod stubs __path__ and pre-stub _tensor to fix 'not a packa…
LeoBorcherding ce9098a
fix: stub torch.ops._c10d_functional namespace with hashable op senti…
LeoBorcherding e778e0e
fix: stub entire torchao package on ROCm Windows instead of individua…
LeoBorcherding 3e57133
fix: set __spec__ on mod stubs so importlib.util.find_spec doesn't raise
LeoBorcherding cf10215
fix: add meta path finder to auto-stub subpackages of stub modules
LeoBorcherding 3264319
fix: use _unsloth_stub sentinel instead of loader=None for stub detec…
LeoBorcherding d731c5f
refactor(rocm/win): switch to repo.amd.com arch-aware index, remove s…
LeoBorcherding 9c9d462
fix(rocm/win): restore _distributed_c10d + torchao stubs; fix BNB ins…
LeoBorcherding 48406ad
worker: remove _distributed_c10d stub; stub only torchao
LeoBorcherding f5278de
fix: BNB AMD wheel skipped + torch.compile segfault on Windows ROCm
LeoBorcherding a4483df
fix: BNB AMD wheel install fails uv wheel filename check
LeoBorcherding a3c94e7
worker: patch _grouped_mm CUDA dispatch on Windows ROCm (gfx1200 null…
LeoBorcherding a87077e
worker: fix torchao stub — return stub classes not modules for isinst…
LeoBorcherding 769790e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 1d30fb5
Merge branch 'main' into fix/rocm-strix-halo-unified-memory
Imagineer99 d91fced
tests: add coverage for Windows ROCm install paths and worker patches
LeoBorcherding 324f56c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 4d41efc
tests: fix encoding, IS_WINDOWS patching, and wrong assertion
LeoBorcherding 5b6adbe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d5c3c7b
fix: pin BNB_ROCM_VERSION=72 for torch==2.11.0+rocm7.13.0 compatibility
LeoBorcherding f95cb20
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] c55aaa5
fix: detect BNB ROCm DLL suffix dynamically instead of hardcoding '72'
LeoBorcherding b313a48
fix: patch torch.distributed stubs in server process for Windows ROCm
LeoBorcherding b33a90e
fix: gate _grouped_mm dispatch patch on HIP < 7.13
LeoBorcherding 75ef599
fix: stub is_torchelastic_launched on torch.distributed for Windows ROCm
LeoBorcherding 1db8e49
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 370debe
fix: explicit warnings on AMD ROCm arch/version fallbacks + Fast-Inst…
LeoBorcherding 7a5e93b
fix: robust gfx arch detection for Strix Halo / HIP-runtime-only inst…
LeoBorcherding ffa16f0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2befb57
fix: resolve hipinfo/hipconfig via HIP_PATH/ROCM_PATH when not on PATH
LeoBorcherding 116fa6e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] ae6d042
feat: print HIP SDK path and full hipconfig version in terminal on AM…
LeoBorcherding bbf004c
fix: Strix rocm7.1 segfault bypass + Ubuntu 24.04 HIP gcc-install-dir
LeoBorcherding f3ac63f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] cc36e4b
fix: BNB_ROCM_VERSION in server process + torch._C._distributed_c10d …
LeoBorcherding f0ec030
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 6831c2a
fix(win32): populate distributed c10d stub with dummy symbols
LeoBorcherding 39ae2e8
fix(win32): distinguish HIP SDK installed vs GPU not ROCm-accessible
LeoBorcherding 4e75d42
fix(win32): scope ROCm workarounds to AMD hosts only
LeoBorcherding 84b8456
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding d89d9b1
fix(linux): route Strix + ROCm 7.1 to AMD arch-specific index
LeoBorcherding 692e876
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding 0c2020d
fix(studio/rocm): gate ROCm-only side-effects on active torch runtime
danielhanchen 0b0b8df
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 76137b2
fix(studio/rocm): worker.py parity + don't roll back ROCm torch on bn…
danielhanchen 0be9749
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2177321
fix(studio/rocm): robustness pass - rocm tag normalisation, Strix rou…
danielhanchen 96b9e46
fix(studio/rocm): multi-GPU selection, Strix sibling handling, defens…
danielhanchen 825cbf5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8c30241
fix(studio/rocm): worker BNB/grouped_mm broad gate, install.sh Strix …
danielhanchen e6cc98e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 06f28e4
fix(studio/rocm): code review hardening pass
LeoBorcherding bb37bf4
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding 47fdc85
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d663d12
fix(studio/training): GPU OOM guard to prevent system freeze on VRAM …
LeoBorcherding 888f91d
Merge branch 'fix/rocm-strix-halo-unified-memory' of github.com:LeoBo…
LeoBorcherding e953b90
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding 536a54d
fix(studio/rocm): OOM guard ROCm-only + unified memory, multi-GPU arc…
LeoBorcherding ec021a0
fix(tests): update ROCm version cap expectations from rocm7.1 to rocm7.2
LeoBorcherding 90f6cd4
fix(tests): correct MLX smoke test losses_per_step assertion
LeoBorcherding 792c3a0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 5d84704
fix(studio/worker): detect unified-memory APU by GPU name not VRAM/RA…
LeoBorcherding 67ab0a6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 3245793
fix(install/setup.ps1): force array on hipinfo gcnArchName parse to f…
LeoBorcherding 89140c5
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding 9c50ee3
fix(studio/rocm): classify unified-memory APU via VRAM/RAM ratio, not…
LeoBorcherding 9393fff
fix(studio/rocm): revert to gcnArchName for unified-memory APU classi…
LeoBorcherding 86d8ff0
fix(studio/llama-prebuilt): resolve hipinfo via HIP_PATH/ROCM_PATH on…
LeoBorcherding 143f6f3
fix(studio/llama-prebuilt): pass --has-rocm from setup.ps1 to skip re…
LeoBorcherding d0864e8
fix(studio/llama-prebuilt): add HIP asset to simple-policy Windows path
LeoBorcherding a0baf8f
fix(studio/setup.ps1): auto-remove mismatched llama.cpp install kind
LeoBorcherding 2bca6ee
fix(studio/setup.ps1): show live PyTorch install output in verbose mo…
LeoBorcherding c6a90de
fix(rocm/windows): set ROCBLAS_TENSILE_LIBPATH for bundled rocblas.dll
LeoBorcherding 2712a6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] eeee665
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding 889b33c
fix(install.sh): restore gfx token dedup in Strix multi-GPU awk indexer
LeoBorcherding 0fda1e2
fix(studio/install): correct _TOTAL progress count on Windows
LeoBorcherding 688c508
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 284145a
fix(install.ps1): enforce torch>=2.11.0 for gfx120X and Strix on Windows
LeoBorcherding 8983afd
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding 85bbb03
fix(rocm/windows): address Codex nits - deterministic DLL suffix, CUD…
LeoBorcherding 69b582c
fix(rocm): misleading amd-smi log, BNB spec consistency, torch ceilin…
LeoBorcherding 5c72e64
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 0763a99
fix(rocm): torch floor in setup.ps1, torchvision pin for Strix, rocms…
LeoBorcherding ad9ea00
fix(rocm): warn on OOB HIP_VISIBLE_DEVICES, bail on empty numeric_ids…
LeoBorcherding 94a7a03
fix(rocm): gate StubSubpackageFinder on win32 ROCm, add gcnArchName f…
LeoBorcherding 38acd5b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 80dd40e
fix(rocm): pin torchvision/torchaudio in setup.ps1, remove -Unique fr…
LeoBorcherding 59825be
fix(rocm): add 8060s/8050s to OOM guard device-name fallback, extract…
LeoBorcherding 4ecf797
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 62e18d8
fix(rocm): pass explicit dtype on bf16-unsupported hardware (RDNA2)
LeoBorcherding 3244537
fix: reduce log noise for expected non-issues on Windows ROCm
LeoBorcherding 30eb2d9
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding f5c2e8a
[AMD] FIx installation of bitsandbytes when it's from .dev and skip r…
Erland366 927a9a6
Merge Erland/studio-amd-installer-fixes-redone: fix bnb ROCm install …
LeoBorcherding 2ec5d00
fix: use force_pip for Windows ROCm bitsandbytes prebuilt wheel install
LeoBorcherding 7be61bb
fix: three small correctness fixes found in PR review
LeoBorcherding c074848
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding 51d82da
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding bd1162a
fix: stub torchao in export subprocess on Windows ROCm
LeoBorcherding b3a8792
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 3a19990
install.sh, setup.sh: add GPU arch step logging to match PS1 scripts
LeoBorcherding c8c60ab
Fix BNB_ROCM_VERSION gate, ROCm GPU mask preference, APU unified memo…
shimmyshimmer 57165b6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] afb343f
fix: guard recompile_limit + fix AMD VRAM monitor fallback
LeoBorcherding 3c65ef6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] a0a8b02
fix: Windows VRAM monitor via Performance Counter API
LeoBorcherding 82dad78
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8519d41
fix: rename to _rocm_windows_perf_counter_vram_gb, scope to IS_ROCM
LeoBorcherding 0c2f582
fix: AMD VRAM monitor — Linux DRM sysfs + Windows perf counter
LeoBorcherding 32fd3c4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] b8b230c
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding f5a4e3c
fix: AMD GPU monitor — utilization, temperature, and power for Window…
LeoBorcherding da6469c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2711607
fix: remove ADL ctypes — does not support AMD iGPU (Strix Halo)
LeoBorcherding 5e99a47
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot]
fix: Strix rocm7.1 segfault bypass + Ubuntu 24.04 HIP gcc-install-dir
Issue 1 (install.sh): gfx1151/gfx1150 + ROCm 7.1 causes a segfault in torch._grouped_mm (moe_utils.py:167). The Radeon repo now ships cp313 wheels for rocm-rel-7.1, so _amd_gpu_radeon=true silently lands on the broken combo. When Strix Halo/Point is detected and TORCH_INDEX_URL is rocm7.1, override to the rocm7.2 PyTorch index, update TORCH_CONSTRAINT, and set _amd_gpu_radeon=false to bypass the Radeon repo entirely. Emits a clear [WARN] explaining the segfault and linking to the ROCm upgrade docs.

Issue 2 (setup.sh): ROCm 7.x ships clang-20, which on Ubuntu 24.04+ picks /usr/lib/gcc/x86_64-linux-gnu/14/ (runtime dir, no C++ headers), causing 'cstdlib file not found' and a failed llama.cpp HIP build. Iterate gcc versions 14→11 to find the first install dir that has both the runtime and the /usr/include/c++/<ver> headers, then pass --gcc-install-dir to clang via CMAKE_HIP_FLAGS.

Fix confirmed by h34v3nzc0dex (llama.cpp 417/417 clean).

11 new tests across TestStrixRocm71Override and TestSetupShGccInstallDir; total 203 passed, 2 skipped.
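The gcc install-dir search described under Issue 2 can be sketched roughly as follows. This is a Python model of the idea, not the actual setup.sh code: the function names, the `root` parameter, and the fake directory tree in `demo()` are assumptions made for illustration. Only the two directory layouts and the `--gcc-install-dir` flag come from the commit message.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence

# Multiarch triple taken from the path quoted in the commit message.
TRIPLE = "x86_64-linux-gnu"


def find_gcc_install_dir(
    root: Path, versions: Sequence[int] = (14, 13, 12, 11)
) -> Optional[Path]:
    """Return the first gcc dir (newest first) that has runtime AND C++ headers.

    `root` is a hypothetical parameter so the search can be tried against a
    fake tree; a real script would look under "/".
    """
    for ver in versions:
        runtime_dir = root / "usr" / "lib" / "gcc" / TRIPLE / str(ver)
        headers_dir = root / "usr" / "include" / "c++" / str(ver)
        if runtime_dir.is_dir() and headers_dir.is_dir():
            return runtime_dir
    return None


def hip_flags(install_dir: Optional[Path]) -> str:
    """Build the flag string that would be handed to CMAKE_HIP_FLAGS."""
    if install_dir is None:
        return ""  # nothing suitable found: leave clang's default untouched
    return f"--gcc-install-dir={install_dir}"


def demo() -> str:
    """Fake tree: gcc 14 has only the runtime dir, gcc 13 also has headers."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "usr" / "lib" / "gcc" / TRIPLE / "14").mkdir(parents=True)
        (root / "usr" / "lib" / "gcc" / TRIPLE / "13").mkdir(parents=True)
        (root / "usr" / "include" / "c++" / "13").mkdir(parents=True)
        chosen = find_gcc_install_dir(root)
        assert chosen is not None
        return chosen.relative_to(root).as_posix()


print(demo())  # usr/lib/gcc/x86_64-linux-gnu/13
```

In the fake tree, version 14 is skipped because it has no matching header directory, which mirrors the Ubuntu 24.04 failure mode described in the commit message.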
commit bbf004c36bd2a1affbcea99cb55000d335a49deb
In fresh Linux installs where `rocminfo` is unavailable and `amd-smi list` only proves a GPU is present but does not include the gfx token, this leaves `_strix_gfx` empty and skips the Strix ROCm 7.1 override. The script then continues with the generic/Radeon `rocm7.1` wheels that the surrounding comment says hit the `_grouped_mm` segfault on `gfx1150`/`gfx1151`; add the same `amd-smi static --asic` gfx probe used elsewhere before deciding there is no Strix GPU.
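A minimal sketch of the behavior this comment asks for, written in Python rather than the shell of install.sh. Everything here is illustrative: the function names, the `TorchSource` tuple, the sample probe outputs, and the index URL are assumptions, not the PR's code. It scans each probe's output in order for a Strix gfx token and only then decides whether to override the rocm7.1 index.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Strix Point / Strix Halo targets named in the commit message.
STRIX_GFX = {"gfx1150", "gfx1151"}
_GFX_RE = re.compile(r"\bgfx[0-9a-f]{3,5}\b")


def detect_strix_gfx(probe_outputs: Iterable[str]) -> Optional[str]:
    """Return the first Strix gfx token found in any probe output, else None.

    Outputs are given in fallback order, e.g. rocminfo, `amd-smi list`,
    `amd-smi static --asic`; an unavailable probe is just an empty string.
    """
    for text in probe_outputs:
        for token in _GFX_RE.findall(text.lower()):
            if token in STRIX_GFX:
                return token
    return None


class TorchSource(NamedTuple):
    index_url: str
    use_radeon_repo: bool
    warning: Optional[str]


def apply_strix_override(
    index_url: str, use_radeon_repo: bool, strix_gfx: Optional[str]
) -> TorchSource:
    """Move Strix + rocm7.1 to rocm7.2 and bypass the Radeon repo."""
    url = index_url.rstrip("/")
    if strix_gfx and url.endswith("rocm7.1"):
        new_url = url[: -len("rocm7.1")] + "rocm7.2"
        warning = (
            f"[WARN] {strix_gfx} + ROCm 7.1 segfaults in torch._grouped_mm; "
            "using rocm7.2 wheels"
        )
        # The real script also updates TORCH_CONSTRAINT; its value is not
        # given in the commit message, so it is left out here.
        return TorchSource(new_url, False, warning)
    return TorchSource(index_url, use_radeon_repo, None)


# Made-up stand-ins for probe output; real tool output will differ.
outputs = [
    "",  # rocminfo not installed
    "GPU: 0\n    BDF: 0000:c5:00.0\n",  # amd-smi list: no gfx token
    "GPU: 0\n    ASIC:\n        TARGET_GRAPHICS_VERSION: gfx1151\n",
]

gfx = detect_strix_gfx(outputs)
src = apply_strix_override("https://download.pytorch.org/whl/rocm7.1", True, gfx)
print(gfx, src.index_url, src.use_radeon_repo)
```

With only the first two probes, `detect_strix_gfx` returns `None` and the rocm7.1 index is left in place, which is the gap the comment describes. The third probe is what lets the override fire.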