[ROCm][SymmMem] Fix skip condition for PLATFORM_SUPPORTS_SYMM_MEM #163205

Closed
pragupta wants to merge 2 commits into pytorch:main from pragupta:pg-fix-symm-skips

Conversation

@pragupta
Collaborator

@pragupta pragupta commented Sep 17, 2025

It seems `TEST_CUDA` is set to true even for ROCm (MI200) jobs. This PR changes the `if TEST_CUDA` check to an `else` branch so that the symmetric memory unit tests no longer run on MI200. On non-ROCm architectures the check should return true, and those tests can still be skipped with other skip decorators.
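To illustrate the control-flow change described above, here is a minimal sketch in plain Python. It is not the actual PyTorch code: the function names, the `SUPPORTED_ROCM_ARCHS` tuple, and the arch strings (`gfx942` as a supported arch, `gfx90a` for MI200) are assumptions made for illustration, and the flags are passed as arguments so the sketch runs without a GPU.

```python
# Hypothetical stand-ins; the real check lives in PyTorch's test utilities
# and may differ in names, arch list, and structure.
SUPPORTED_ROCM_ARCHS = ("gfx942",)  # assumed list of supported ROCm archs


def supports_symm_mem_before(test_with_rocm: bool, test_cuda: bool, gcn_arch: str) -> bool:
    """Assumed shape of the check before the fix."""
    if test_with_rocm:
        if any(arch in gcn_arch for arch in SUPPORTED_ROCM_ARCHS):
            return True
    # Unsupported ROCm archs fall through to here, and TEST_CUDA is
    # also true on ROCm, so the check wrongly reports support.
    if test_cuda:
        return True
    return False


def supports_symm_mem_after(test_with_rocm: bool, gcn_arch: str) -> bool:
    """Assumed shape of the check after the fix: `if TEST_CUDA` becomes `else`."""
    if test_with_rocm:
        return any(arch in gcn_arch for arch in SUPPORTED_ROCM_ARCHS)
    else:
        # Non-ROCm platforms return True; other skip decorators
        # are expected to handle them.
        return True


if __name__ == "__main__":
    mi200 = "gfx90a:sramecc+:xnack-"  # assumed MI200 arch string
    print(supports_symm_mem_before(True, True, mi200))  # True (tests run on MI200)
    print(supports_symm_mem_after(True, mi200))         # False (tests skipped)
```

With the `else` branch, a ROCm job can only return true through the arch check, so an unsupported arch can no longer fall through to the CUDA path.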

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @ezyang @msaroufim @dcci @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

@pytorch-bot

pytorch-bot bot commented Sep 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163205

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 1 Cancelled Job, 7 Unrelated Failures

As of commit 1810b54 with merge base 8627454 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Sep 17, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: jeffdaily / name: Jeff Daily (1810b54)
  • ✅ login: pragupta / name: Prachi Gupta (5a471a3)

@pytorch-bot pytorch-bot bot added module: rocm AMD GPU support for Pytorch oncall: distributed Add this issue/PR to distributed oncall triage queue labels Sep 17, 2025
@pragupta pragupta force-pushed the pg-fix-symm-skips branch 2 times, most recently from 0b8fc00 to 4f66074 on September 17, 2025 22:43
@jeffdaily jeffdaily added topic: not user facing topic category ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Sep 18, 2025
Contributor

@ezyang ezyang left a comment

sure why not

@pytorch-bot pytorch-bot bot removed ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Sep 18, 2025
@pragupta pragupta changed the title from "[ROCm][SymmMem] Use skip_if_rocm_arch decorator" to "[ROCm][SymmMem] Fix skip condition for PLATFORM_SUPPORTS_SYMM_MEM" on Sep 18, 2025
@jithunnair-amd jithunnair-amd added ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Sep 18, 2025
@jeffdaily jeffdaily added ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/periodic-rocm-mi300 Trigger "distributed" config CI on ROCm MI300/MI325 labels Sep 19, 2025
@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/periodic-rocm-mi300 Trigger "distributed" config CI on ROCm MI300/MI325 labels Sep 19, 2025
@jeffdaily jeffdaily added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 19, 2025
@jeffdaily jeffdaily added ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/periodic-rocm-mi300 Trigger "distributed" config CI on ROCm MI300/MI325 labels Sep 19, 2025
@jeffdaily jeffdaily marked this pull request as ready for review September 19, 2025 04:29
@jeffdaily
Collaborator

@pytorchbot merge -f "all failures are unrelated from main branch"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#163205)

It seems `TEST_CUDA` is set to true even for ROCm (MI200) jobs. Changing if TEST_CUDA to an else condition to avoid running symmetric memory UTs on MI200. For other non-rocm arch, it should return true and can be skipped using other skip decorators.

Pull Request resolved: pytorch#163205
Approved by: https://github.com/ezyang

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

  • `ciflow/periodic`: Trigger jobs ran periodically on master (periodic.yml) on the PR
  • `ciflow/periodic-rocm-mi300`: Trigger "distributed" config CI on ROCm MI300/MI325
  • `ciflow/trunk`: Trigger trunk jobs on your pull request
  • `Merged`
  • `module: rocm`: AMD GPU support for Pytorch
  • `oncall: distributed`: Add this issue/PR to distributed oncall triage queue
  • `open source`
  • `topic: not user facing`: topic category

6 participants