
[MAGMA][CUDA] eigh: deprecate MAGMA and dispatch to cuSolver unconditionally #174619

Closed

gderossi wants to merge 2 commits into pytorch:main from gderossi:deprecate-magma-eigh

Conversation

gderossi (Contributor) commented Feb 9, 2026

Both cuSolver and hipSolver support syevd/syevj, so this PR removes the MAGMA path entirely and updates the relevant tests to skip when cuSolver is missing rather than when MAGMA is missing. The benchmarking script and results are included below; results are only shown for sizes 512 and above because MAGMA simply delegates to LAPACK for sizes up to 128.
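
For context, the test-side change amounts to swapping MAGMA-availability skips for cuSolver-availability skips (PyTorch's test suite has decorators such as `skipCUDAIfNoMagma` and `skipCUDAIfNoCusolver` for this). A standalone, hypothetical approximation using plain `unittest`, not the PR's actual test code:

```python
import unittest

import torch

# Hypothetical stand-in for cuSolver availability: cuSolver ships with CUDA
# builds of PyTorch, so a CUDA runtime plus a CUDA (non-ROCm) build is a
# reasonable proxy here. The real tests use internal skip decorators.
HAS_CUSOLVER = torch.cuda.is_available() and torch.version.cuda is not None

class TestEigh(unittest.TestCase):
    @unittest.skipUnless(HAS_CUSOLVER, "cuSolver not available")  # previously keyed on MAGMA
    def test_eigh_hermitian(self):
        A = torch.rand(64, 64, dtype=torch.complex128, device="cuda")
        A = A + A.mT.conj()  # make the input Hermitian
        w, v = torch.linalg.eigh(A)
        # Validate via the reconstruction A = V diag(w) V^H.
        recon = (v * w.to(v.dtype).unsqueeze(-2)) @ v.mH
        torch.testing.assert_close(recon, A)

if __name__ == "__main__":
    unittest.main()
```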

Benchmarking script:

```python
import torch
import torch.utils.benchmark as benchmark

from itertools import product

results = []

batches = [(), (16,), (64,)]
sizes = [16, 128, 512, 2048]
dtypes = [torch.float32, torch.float64, torch.complex64, torch.complex128]

for b, n, dtype in product(batches, sizes, dtypes):
    shape = b + (n, n)
    print(f"Testing shape={shape}, dtype={dtype}")
    label = "torch.linalg.eigh"
    sub_label = f"{shape}, {dtype}"
    X = torch.rand(*shape, dtype=dtype, device="cuda")
    X = X + X.mT.conj()  # make the input Hermitian
    stmt = "torch.linalg.eigh(X)"
    for backend in ("magma", "cusolver"):
        torch.backends.cuda.preferred_linalg_library(backend)
        # warm-up: run the timed statement a few times first
        for _ in range(5):
            torch.linalg.eigh(X)

        results.append(benchmark.Timer(
            stmt=stmt,
            globals={'X': X},
            label=label,
            sub_label=sub_label,
            description=backend,
        ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.print()
```

Benchmark results on RTX Pro 6000:

```
[------------------------ torch.linalg.eigh -------------------------]
                                          |    magma     |   cusolver  | speedup
1 threads: -----------------------------------------------------------
      (512, 512), torch.float32           |     12605.6  |     11742.1 | 1.1
      (512, 512), torch.float64           |     17244.3  |     10558.8 | 1.6
      (512, 512), torch.complex64         |     18868.0  |      3612.1 | 5.2
      (512, 512), torch.complex128        |     28479.8  |     16659.5 | 1.7
      (2048, 2048), torch.float32         |    226035.4  |     19598.1 | 11.5
      (2048, 2048), torch.float64         |    451455.1  |     68374.8 | 6.6
      (2048, 2048), torch.complex64       |    535989.6  |     23807.6 | 22.5
      (2048, 2048), torch.complex128      |   1111481.8  |    164294.9 | 6.8
      (16, 512, 512), torch.float32       |    210144.0  |    187468.1 | 1.1
      (16, 512, 512), torch.float64       |    281164.8  |    167509.6 | 1.7
      (16, 512, 512), torch.complex64     |    307684.5  |     57805.7 | 5.3
      (16, 512, 512), torch.complex128    |    468624.1  |    265833.6 | 1.8
      (16, 2048, 2048), torch.float32     |   3650952.0  |    315576.2 | 11.6
      (16, 2048, 2048), torch.float64     |   7147413.6  |   1096273.9 | 6.5
      (16, 2048, 2048), torch.complex64   |   8579275.9  |    384409.0 | 22.3
      (16, 2048, 2048), torch.complex128  |  17937525.7  |   2639580.7 | 6.8
      (64, 512, 512), torch.float32       |    835108.8  |    716855.2 | 1.2
      (64, 512, 512), torch.float64       |   1145713.3  |    672703.7 | 1.7
      (64, 512, 512), torch.complex64     |   1289962.5  |    233632.8 | 5.5
      (64, 512, 512), torch.complex128    |   1863496.5  |   1067678.9 | 1.7
      (64, 2048, 2048), torch.float32     |  14329632.9  |   1257138.1 | 11.4
      (64, 2048, 2048), torch.float64     |  27999996.1  |   4381371.4 | 6.4
      (64, 2048, 2048), torch.complex64   |  32749115.0  |   1528567.4 | 21.4
      (64, 2048, 2048), torch.complex128  |  70825685.0  |  10548410.4 | 6.7

Times are in microseconds (us).
```
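
Beyond timing, a quick way to sanity-check that the cuSolver path agrees with the CPU LAPACK reference (not part of the PR; a minimal sketch using default `assert_close` tolerances):

```python
import torch

# Random Hermitian input, built the same way as in the benchmark above.
A = torch.rand(512, 512, dtype=torch.complex128, device="cuda")
A = A + A.mT.conj()

torch.backends.cuda.preferred_linalg_library("cusolver")
w_gpu, v_gpu = torch.linalg.eigh(A)
w_cpu, _ = torch.linalg.eigh(A.cpu())

# Eigenvalues are directly comparable; eigenvectors only match up to a
# per-column phase, so validate them via the reconstruction A = V diag(w) V^H.
torch.testing.assert_close(w_gpu.cpu(), w_cpu)
recon = (v_gpu * w_gpu.to(v_gpu.dtype).unsqueeze(-2)) @ v_gpu.mH
torch.testing.assert_close(recon, A)
```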

cc @nikitaved @eqy

pytorch-bot (Bot) commented Feb 9, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174619

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b0c401f with merge base c68a888:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the release notes: linalg_frontend label Feb 9, 2026
eqy (Collaborator) commented Feb 9, 2026

@pytorchmergebot label ciflow/trunk ciflow/h100 ciflow/b200

pytorch-bot added the ciflow/b200, ciflow/h100, and ciflow/trunk labels Feb 9, 2026
nikitaved (Collaborator) commented:

Seems like #174674 could be a very nice follow-up :)

nikitaved (Collaborator) left a review:

LGTM! Thank you very much!

nikitaved added the ciflow/rocm-mi300 label Feb 10, 2026
Skylion007 (Collaborator) commented:

@pytorchbot rebase

pytorchmergebot (Collaborator) commented:

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator) commented:

Successfully rebased deprecate-magma-eigh onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via `git checkout deprecate-magma-eigh && git pull --rebase`)

pytorch-bot removed the ciflow/trunk, ciflow/rocm-mi300, ciflow/h100, and ciflow/b200 labels Feb 14, 2026
Aidyn-A added the ciflow/trunk and ciflow/rocm-mi300 labels Feb 17, 2026
pytorch-bot removed the ciflow/trunk and ciflow/rocm-mi300 labels Feb 18, 2026
eqy (Collaborator) commented Feb 18, 2026

@pytorchmergebot label ciflow/trunk ciflow/h100 ciflow/b200 ciflow/rocm-mi300

pytorch-bot added the ciflow/b200, ciflow/h100, ciflow/rocm-mi300, and ciflow/trunk labels Feb 18, 2026
eqy (Collaborator) commented Feb 18, 2026

@pytorchmergebot merge

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


Ali-Razmjoo pushed a commit to fork-the-planet/pytorch that referenced this pull request Feb 19, 2026
…ionally (pytorch#174619)

Pull Request resolved: pytorch#174619
Approved by: https://github.com/eqy, https://github.com/nikitaved, https://github.com/Skylion007
pytorchmergebot (Collaborator) commented:

This PR (#174619) was merged in b34a03e but it is still open, likely due to a GitHub bug, so mergebot is closing it manually. If you think this is a mistake, please feel free to reopen and contact Dev Infra.

norx1991 pushed a commit that referenced this pull request Feb 24, 2026
…ionally (#174619)
gderossi deleted the deprecate-magma-eigh branch March 9, 2026 14:26
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…ionally (pytorch#174619)

Labels

ciflow/b200, ciflow/h100, ciflow/rocm-mi300, ciflow/trunk, Merged, open source, release notes: linalg_frontend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants