updated test cases to use MultithreadTestCase #158082

Closed
Electron4444 wants to merge 1 commit into pytorch:main from Electron4444:main

@Electron4444
Contributor

@Electron4444 Electron4444 commented Jul 11, 2025

Fixes #108744
Also addresses comments from PR #108749

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

@pytorch-bot

pytorch-bot bot commented Jul 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158082

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 844f5c3 with merge base ae86e8f:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jul 11, 2025

CLA Signed


The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue topic: not user facing topic category labels Jul 11, 2025
@Electron4444 Electron4444 changed the title uodated test cases to use MultithreadTestCase updated test cases to use MultithreadTestCase Jul 11, 2025
@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 14, 2025
@Electron4444
Contributor Author

@tianyu-l Could you review this, please?

@tianyu-l tianyu-l requested review from XilunWu, wconstab and zpcore July 17, 2025 19:45
@wconstab
Contributor

I like the speedup, but for me these tests don't run correctly. Did they run fine for you locally?

`python test/distributed/tensor/test_tensor_ops.py -k slice` for me gives

Mismatched elements: 64 / 64 (100.0%)
Greatest absolute difference: 3.619420289993286 at index (5, 0) (up to 1e-05 allowed)
Greatest relative difference: 72.54373168945312 at index (2, 1) (up to 1.3e-06 allowed)
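
For readers unfamiliar with this output format: the "allowed" figures match the documented default float32 tolerances of `torch.testing.assert_close` (atol 1e-05, rtol 1.3e-06). The sketch below is a plain-Python illustration of the element-wise closeness rule, assuming the failure came from such a check. The helper name `is_close` is hypothetical and is not part of PyTorch.

```python
# Default float32 tolerances documented for torch.testing.assert_close.
ATOL = 1e-5
RTOL = 1.3e-6


def is_close(actual: float, expected: float,
             atol: float = ATOL, rtol: float = RTOL) -> bool:
    """Hypothetical helper mirroring the element-wise rule
    |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# A rounding-level discrepancy is within tolerance.
print(is_close(1.0000001, 1.0))

# A discrepancy as large as the reported greatest absolute difference is not.
print(is_close(3.619420289993286, 0.0))
```

Differences of order 1 on 100% of the elements are far outside these tolerances, which suggests that the compared tensors differ wholesale rather than by floating-point noise.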

@Electron4444
Contributor Author

I had run them locally and everything seemed fine, but it seems I made an error when setting up my environment.
My apologies.
I think the problem is that values are being associated with the wrong tests.
While retesting my PR, I read through the last PR on the issue, which was closed for inactivity.
In the Jenkins logs of its test runs, the expected values were swapped with the corresponding results.

I tried to reproduce this by printing all the tensors of the failed tests to look for a pattern, but sadly I couldn't reproduce it.

My guess is that there is a timing error in the distributed calculation, but in any case this clearly is not working.
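
One hypothetical way a timing-dependent mismatch could appear when ranks run as threads instead of processes is shared process-global state. This has not been confirmed as the cause here. The sketch below uses only Python's standard library, not PyTorch, and all names in it (`run_ranks`, `WORLD_SIZE`, `NUM_DRAWS`) are made up for illustration. It simulates two ranks as threads that either share the global random generator or each own a private one.

```python
import random
import threading

WORLD_SIZE = 2  # hypothetical number of simulated ranks
NUM_DRAWS = 4   # values each simulated rank draws


def run_ranks(use_shared_rng: bool) -> list:
    """Run one thread per simulated rank and return each rank's draws."""
    results = [None] * WORLD_SIZE
    barrier = threading.Barrier(WORLD_SIZE)

    def rank_fn(rank: int) -> None:
        if use_shared_rng:
            random.seed(0)          # reseeds the single process-global generator
            rng = random
        else:
            rng = random.Random(0)  # private generator owned by this rank
        barrier.wait()              # every rank seeds before any rank draws
        results[rank] = [rng.random() for _ in range(NUM_DRAWS)]

    threads = [threading.Thread(target=rank_fn, args=(r,))
               for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


shared = run_ranks(use_shared_rng=True)
private = run_ranks(use_shared_rng=False)

# Private generators: both ranks see the same sequence.
print(private[0] == private[1])
# Shared generator: the ranks split one stream, so their sequences differ.
print(shared[0] == shared[1])
```

With separate processes, each rank would have its own copy of the global generator, so identical seeding would give identical values on every rank. With threads, the ranks consume one stream, and which rank gets which value depends on scheduling.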

@Electron4444
Contributor Author

Thanks anyway for taking the time.

Labels

oncall: distributed Add this issue/PR to distributed oncall triage queue open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Development

Successfully merging this pull request may close these issues.

switch more test cases to use MultithreadTestCase

4 participants