Add isnan exit condition to special ops #157464
malfet wants to merge 8 commits into gh/malfet/427/base from
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157464
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 556cebd with merge base 0f9c1b3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
@pytorchbot merge -f "Don't think binary builds with discover anything new"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
For eager and inductor. As for all other chebyshev ops, logic is simply compiled from https://github.com/pytorch/pytorch/blob/94716db22214912896cf680dc3eb88574f611a42/aten/src/ATen/native/cuda/Math.cuh#L2821
Pull Request resolved: #157488
Approved by: https://github.com/dcci
ghstack dependencies: #157464
@pytorchbot revert -m "caused slow test config to time out GH job link HUD commit link" -c nosignal
Looking at the logs, I see lines like:
I'm not sure what's going on, but I also found that some unrelated tests take much longer after this change, maybe due to resource starvation from running in parallel? I think they used to take <500s:
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 9620994. Reverted #157488 on behalf of https://github.com/clee2000 due to caused slow test config to time out [GH job link](https://github.com/pytorch/pytorch/actions/runs/16037776972/job/45254574100) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/e124a0d88ca2aa04bfaca2dcabf5de6244048e45) ([comment](#157464 (comment)))
This reverts commit e124a0d. Reverted #157464 on behalf of https://github.com/clee2000 due to caused slow test config to time out [GH job link](https://github.com/pytorch/pytorch/actions/runs/16037776972/job/45254574100) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/e124a0d88ca2aa04bfaca2dcabf5de6244048e45) ([comment](#157464 (comment)))
@malfet your PR has been successfully reverted.
Wow
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Stack from ghstack (oldest at bottom):
They might have been slow on CUDA-11.3, but that version of CUDA is long gone. The more fundamental underlying issue was the linear complexity of the recursive polynomial definitions for higher-order polynomials. For example, see this loop from the implementation of the Chebyshev polynomial of the first kind:
pytorch/aten/src/ATen/native/Math.h
Lines 2969 to 2973 in 7081b82
These were tested by test_compare_cpu using the following values (as sample index 16):
pytorch/torch/testing/_internal/opinfo/core.py
Line 2079 in 7081b82
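The loop referenced above is a plain three-term recurrence, T_k(x) = 2x·T_{k-1}(x) - T_{k-2}(x), so its cost grows linearly with n regardless of x. What follows is a minimal Python sketch of what an isnan exit condition buys. It is an illustrative port, not the Math.h code, and the name `chebyshev_t_recurrence` and the `stop_on_nan` flag are hypothetical. Any special-casing the real kernel does before reaching the loop is omitted.

```python
import math
from typing import Tuple


def chebyshev_t_recurrence(x: float, n: int, stop_on_nan: bool = True) -> Tuple[float, int]:
    """Hypothetical sketch: T_n(x) via T_k = 2x * T_{k-1} - T_{k-2}.

    Returns (value, number of loop iterations actually executed).
    Not the ATen implementation; only the loop shape is mirrored.
    """
    if n < 0:
        return 0.0, 0
    if n == 0:
        return 1.0, 0
    if n == 1:
        return x, 0
    p, q = 1.0, x  # T_0(x), T_1(x)
    iterations = 0
    k = 2
    # Without the isnan check this always runs n - 1 times. Once q overflows
    # to +/-inf, a following step computes inf - inf = NaN, and NaN then
    # propagates through every later step, so stopping early cannot change
    # the returned value.
    while k <= n and not (stop_on_nan and math.isnan(q)):
        r = (x + x) * q - p
        p, q = q, r
        k += 1
        iterations += 1
    return q, iterations
```

With the exit condition, `chebyshev_t_recurrence(2.0, 10**7)` stops after a few hundred iterations instead of running ten million. It returns the same NaN the full loop would have produced.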
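Luckily, Chebyshev polynomials evaluated at |x| > 1 reach infinity pretty quickly (see below). After that the recurrence produces inf - inf = NaN, which is where the new isnan exit condition stops the loop.
This is not the case for Laguerre polynomials, but it's probably fine to just limit it to 1e7.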
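The following small Python sketch puts numbers on that contrast. It is not PyTorch code, and `laguerre_l` and `chebyshev_overflow_order` are hypothetical helpers. For |x| > 1, the closed form T_n(x) = cosh(n·acosh|x|) (up to sign) gives a rough order at which float64 overflows. A Laguerre recurrence at a moderate x, by contrast, stays finite, so an isnan exit never fires and the loop always costs n - 1 steps.

```python
import math
import sys
from typing import Tuple


def chebyshev_overflow_order(x: float) -> int:
    """Rough order n at which |T_n(x)| = cosh(n * acosh(|x|)) exceeds the
    largest double. Only defined for |x| > 1 (T_n(-x) = (-1)^n T_n(x))."""
    if abs(x) <= 1.0:
        raise ValueError("T_n(x) stays bounded for |x| <= 1")
    # cosh(y) ~ e^y / 2 for large y, so overflow once y > log(2 * DBL_MAX).
    limit = math.log(2.0) + math.log(sys.float_info.max)
    return math.ceil(limit / math.acosh(abs(x)))


def laguerre_l(x: float, n: int) -> Tuple[float, int]:
    """Hypothetical sketch: L_n(x) for n >= 0 via
    (k + 1) L_{k+1} = (2k + 1 - x) L_k - k L_{k-1}, with the same NaN exit.
    Returns (value, number of loop iterations actually executed)."""
    if n == 0:
        return 1.0, 0
    p, q = 1.0, 1.0 - x  # L_0(x), L_1(x)
    steps = 0
    k = 1
    while k < n and not math.isnan(q):
        p, q = q, ((2 * k + 1 - x) * q - k * p) / (k + 1)
        k += 1
        steps += 1
    return q, steps
```

`chebyshev_overflow_order(2.0)` comes out at about 540, and even `chebyshev_overflow_order(1.0001)` is roughly 5e4, both far below 1e7. By contrast, `laguerre_l(2.0, n)` executes all n - 1 steps. The Chebyshev figure is an estimate: the recurrence computes 2x·q before subtracting, so it can overflow a few steps earlier than the closed form suggests.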
Fixes #79528
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k