Conversation
Review updated until commit 0a126c0
!test
wujingyue
left a comment
LGTM -- thanks for fixing the breakage!
As discussed offline, there's the alternative to
- change `Sdpa*Op::evaluate` without changing the fusion IR
- change the two TensorViews' (seed and offset) dtype from int64 to uint64
- change their devices from CPU to GPU.
That would make this PR much smaller and also minimize the impact on Thunder.
Another way is:
jacobhinkle
left a comment
IIUC this PR changes us fully from the two-parameter version to the one-parameter version. Will this still work with older PyTorch or does this mean we are bumping our required torch version?
The latest stable release, 2.6.0, was in Jan 2025 and would not have this change. Thunder requires nvfuser to support both the nightly and the latest stable release. For supporting both versions, it would also be simpler to keep philox_seed and philox_offset separate but with changed device, dtype, and shape. Is there any other approach that we could try for supporting two API variants?
…ing in their device, dtype, and shape between versions of pytorch.
Force-pushed from 686ddd9 to 480ae2e
What is the best way to determine the appropriate torch versions? The latest stable release is
!test

!test
Except for the Thunder tests (whose failures are expected), the errors seem unrelated.
tests/python/utils.py (Outdated)

    UPDATED_SDPA = torch.__version__ > "2.6.0"
This will be updated to `str(torch.__version__) > "2.7.0"`, or to use `packaging.version`.
The changes to SDPA were made between versions 2.7.0 and 2.8.0 based on version.txt in PyTorch. So I am using 2.7.0.
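As a side note, plain string comparison misorders double-digit version components ("2.10.0" < "2.7.0" lexicographically), which is why `packaging.version` is the safer gate. A minimal sketch of the check discussed above; the `updated_sdpa` helper name is illustrative, not the actual code:

```python
# Hypothetical sketch of a version gate for the SDPA API change.
# packaging.version orders versions numerically, so nightly builds
# such as "2.8.0a0" still compare greater than "2.7.0".
from packaging import version


def updated_sdpa(torch_version: str) -> bool:
    # The SDPA change landed between 2.7.0 and 2.8.0, so anything
    # newer than 2.7.0 uses the new single-rng_state API.
    return version.parse(torch_version) > version.parse("2.7.0")


print(updated_sdpa("2.6.0"))    # False: stable release predating the change
print(updated_sdpa("2.8.0a0"))  # True: nightly after the change
print(updated_sdpa("2.10.0"))   # True, though "2.10.0" < "2.7.0" as strings
```

In real code the input would be `str(torch.__version__)`; pre-release suffixes on nightly wheels are handled correctly by `version.parse`.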
!test
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
!test
Not sure how to set up a CI pipeline for the torch 2.6.0 stable release. I have verified the affected tests locally.
csrc/ops/composite.cpp (Outdated)

    // The torch API returns philox_seed -> rng_state (uint64_t[2])
    // and philox_offset -> _unused (empty tensor)
    philox_seed = TensorViewBuilder()
                      .shape(std::vector<int64_t>{2})
nitpick: do we need to be explicit with `std::vector<int64_t>{2}`? I think we usually just do `shape({2})`.
Yes. Surprisingly, I was getting an ambiguous construct compile error. I did not dig into it more.
!build
pytorch/pytorch#146372 changes the flash attention API.
In the nvfuser API, `SdpaFwdOp` now returns an `rng_state` and `SdpaBwdOp` expects an `rng_state`; `_unused` is ignored. Based on the torch version, `philox_seed` is now `uint64_t[2]` and `philox_offset` is an empty `uint64_t` tensor on device. I chose to keep the same semantics as PyTorch since it avoids packing/unpacking PyTorch outputs when testing, which would be needed if I were to split the `rng_state` output of PyTorch into `philox_seed` and `philox_offset`. The latter approach would keep the dimensions of both tensors the same between torch versions, but since we're changing `device` and `dtype`, they would still need to be created distinctly for the two cases. This also allows us to switch to the single-parameter version in the future more easily if desired.
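For reference, the relationship between the two layouts can be sketched in plain Python. The helper names are hypothetical, and the length-2 uint64 `rng_state` tensor is modeled as a plain list of ints rather than an actual torch tensor:

```python
# Hypothetical sketch: converting between the old two-parameter
# (philox_seed, philox_offset) interface and the new single rng_state
# (a length-2 uint64 tensor, modeled here as a list of ints).

def pack_rng_state(philox_seed: int, philox_offset: int) -> list:
    # Old API -> new API: rng_state holds [seed, offset].
    return [philox_seed, philox_offset]


def unpack_rng_state(rng_state: list) -> tuple:
    # New API -> old API: split rng_state back into seed and offset.
    # This is the packing/unpacking step the PR avoids by matching
    # PyTorch's semantics directly.
    seed, offset = rng_state
    return seed, offset


seed, offset = unpack_rng_state(pack_rng_state(42, 7))
print(seed, offset)  # 42 7
```

Round-tripping like this is what a split-tensor compatibility path would have to do around every PyTorch call, which is the overhead the chosen design sidesteps.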