[MAGMA][CUDA] eigh: deprecate MAGMA and dispatch to cuSolver unconditionally#174619
gderossi wants to merge 2 commits into pytorch:main
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/174619

✅ No failures as of commit b0c401f with merge base c68a888. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
@pytorchmergebot label ciflow/trunk ciflow/h100 ciflow/b200
Seems like #174674 could be a very nice follow-up :)
nikitaved left a comment:

LGTM! Thank you very much!
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.
Successfully rebased 25edcbf to 2d1a853
@pytorchmergebot label ciflow/trunk ciflow/h100 ciflow/b200 ciflow/rocm-mi300
@pytorchmergebot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki.
eigh: deprecate MAGMA and dispatch to cuSolver unconditionally (pytorch#174619)

Both cuSolver and hipSolver support syevd/syevj, so the MAGMA path is removed entirely and the relevant tests now skip when cuSolver is missing rather than when MAGMA is missing. The benchmark script and results are included below, though the results only show sizes 512 and up because MAGMA just calls LAPACK for sizes up to 128.

Benchmarking script:

```python
import torch
import torch.utils.benchmark as benchmark
from itertools import product

results = []
batches = [(), (16,), (64,)]
sizes = [16, 128, 512, 2048]
dtypes = [torch.float32, torch.float64, torch.complex64, torch.complex128]

for b, n, dtype in product(batches, sizes, dtypes):
    shape = b + (n, n)
    print(f"Testing shape={shape}, dtype={dtype}")
    label = "torch.linalg.eigh"
    sub_label = f"{shape}, {dtype}"
    X = torch.rand(*shape, dtype=dtype, device="cuda")
    X = X + X.mT.conj()
    stmt = "torch.linalg.eigh(X)"
    for backend in ("magma", "cusolver"):
        torch.backends.cuda.preferred_linalg_library(backend)
        # warm-up
        for _ in range(5):
            exec(stmt)
        results.append(benchmark.Timer(
            stmt=stmt,
            globals={'X': X},
            label=label,
            sub_label=sub_label,
            description=backend,
        ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.print()
```

Benchmark results on RTX Pro 6000:

```
[------------------------ torch.linalg.eigh -------------------------]
                                      |    magma    |  cusolver   | speedup
1 threads: -----------------------------------------------------------
  (512, 512), torch.float32           |     12605.6 |     11742.1 |   1.1
  (512, 512), torch.float64           |     17244.3 |     10558.8 |   1.6
  (512, 512), torch.complex64         |     18868.0 |      3612.1 |   5.2
  (512, 512), torch.complex128        |     28479.8 |     16659.5 |   1.7
  (2048, 2048), torch.float32         |    226035.4 |     19598.1 |  11.5
  (2048, 2048), torch.float64         |    451455.1 |     68374.8 |   6.6
  (2048, 2048), torch.complex64       |    535989.6 |     23807.6 |  22.5
  (2048, 2048), torch.complex128      |   1111481.8 |    164294.9 |   6.8
  (16, 512, 512), torch.float32       |    210144.0 |    187468.1 |   1.1
  (16, 512, 512), torch.float64       |    281164.8 |    167509.6 |   1.7
  (16, 512, 512), torch.complex64     |    307684.5 |     57805.7 |   5.3
  (16, 512, 512), torch.complex128    |    468624.1 |    265833.6 |   1.8
  (16, 2048, 2048), torch.float32     |   3650952.0 |    315576.2 |  11.6
  (16, 2048, 2048), torch.float64     |   7147413.6 |   1096273.9 |   6.5
  (16, 2048, 2048), torch.complex64   |   8579275.9 |    384409.0 |  22.3
  (16, 2048, 2048), torch.complex128  |  17937525.7 |   2639580.7 |   6.8
  (64, 512, 512), torch.float32       |    835108.8 |    716855.2 |   1.2
  (64, 512, 512), torch.float64       |   1145713.3 |    672703.7 |   1.7
  (64, 512, 512), torch.complex64     |   1289962.5 |    233632.8 |   5.5
  (64, 512, 512), torch.complex128    |   1863496.5 |   1067678.9 |   1.7
  (64, 2048, 2048), torch.float32     |  14329632.9 |   1257138.1 |  11.4
  (64, 2048, 2048), torch.float64     |  27999996.1 |   4381371.4 |   6.4
  (64, 2048, 2048), torch.complex64   |  32749115.0 |   1528567.4 |  21.4
  (64, 2048, 2048), torch.complex128  |  70825685.0 |  10548410.4 |   6.7

Times are in microseconds (us).
```

Pull Request resolved: pytorch#174619
Approved by: https://github.com/eqy, https://github.com/nikitaved, https://github.com/Skylion007
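As a CPU-only illustration of the contract that syevd-style solvers implement (and that the benchmark relies on), here is a small sketch using NumPy's analogous `numpy.linalg.eigh`; the symmetrization mirrors the benchmark's `X = X + X.mT.conj()`. This is a hedged sketch, not part of the PR itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random Hermitian matrix the same way the benchmark does:
# X + X^H guarantees eigh's precondition (Hermitian input).
n = 8
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = X + X.conj().T

# eigh returns real eigenvalues in ascending order and a unitary
# matrix of eigenvectors, so X == Q @ diag(w) @ Q^H up to rounding.
w, Q = np.linalg.eigh(X)

assert np.all(np.diff(w) >= 0)                 # eigenvalues ascend
recon = (Q * w) @ Q.conj().T                   # Q @ diag(w) @ Q^H
assert np.allclose(recon, X)                   # decomposition holds
assert np.allclose(Q.conj().T @ Q, np.eye(n))  # Q is unitary
print("ok")
```

The same invariants hold for `torch.linalg.eigh` on CUDA regardless of whether MAGMA or cuSolver is dispatched to, which is why the backends are interchangeable in the benchmark loop.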
cc @nikitaved @eqy