temporarily removed cudnn attention backend #1717
cc @tianyu-l for review
What commit did you build at? I believe the fix was merged ~8 hours ago: pytorch/pytorch#163104
tianyu-l
left a comment
What is the exact issue?
If eager works, we should disable it on compile path only.
If bf16+compile works, we should disable it on quantization path only.
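
The scoping suggested above could look roughly like the following. This is only a sketch: `select_sdpa_backends`, `sdpa_context`, and the `failing_path` labels are hypothetical names invented for illustration, not code from this PR. The backend names are assumed to match members of `torch.nn.attention.SDPBackend` in a recent PyTorch build.

```python
# Backend names, assumed to mirror torch.nn.attention.SDPBackend members.
ALL_BACKENDS = ["CUDNN_ATTENTION", "FLASH_ATTENTION", "EFFICIENT_ATTENTION", "MATH"]


def select_sdpa_backends(compile_enabled: bool, quantized: bool, failing_path: str) -> list[str]:
    """Drop cuDNN attention only on the path where it is known to fail.

    failing_path is a hypothetical label: "compile" or "quantization".
    """
    affected = (failing_path == "compile" and compile_enabled) or (
        failing_path == "quantization" and quantized
    )
    return [b for b in ALL_BACKENDS if not (affected and b == "CUDNN_ATTENTION")]


def sdpa_context(backend_names: list[str]):
    """Build an sdpa_kernel context manager from backend names.

    torch is imported lazily so the selection logic stays testable without it.
    """
    from torch.nn.attention import SDPBackend, sdpa_kernel

    return sdpa_kernel([getattr(SDPBackend, name) for name in backend_names])
```

With this shape, the eager bf16 path keeps cuDNN available for benchmarking, and only the configuration that actually reproduces the failure falls back to the other backends.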
2.10.0a0+git28c42cc (28c42cc28090e7ee629c9a89b5ef2cc4838fb755)
Ok, can you share the failure message? I would be surprised if it was the same one... As a sanity check, the following unit test (included in the PR) should not error out if you have a build with the fix:
It's the same error message. Maybe I need to uninstall pytorch-triton too. Doing another complete pull, uninstall, make clean, and install.
That doesn't look the same as :/
Oh, I've seen both of these cuDNN-related issues as part of #1713 at various points. The issue described in #1713 is indeed a different message though, sorry for the confusion. The workaround of just not using the cuDNN backend is what has resolved both:
cuDNN not initialized is pretty wild, are we almost out of GPU memory or something for this model? I'll check a source build in 12.9 tomorrow.
I don't think so. After removing the cuDNN backend, it peaks at around 80 GB of GPU memory on a B200 with ~183 GB capacity.
Sounds good, thanks for taking a look.
To help narrow things down, could you also please collect some logging information: Thanks!
fegin
left a comment
We cannot disable the cuDNN backend entirely. You can disable it for specific settings, but some people were still using it for benchmarking as recently as last week.
Update: I retried this morning with today's nightly build, which includes @eqy's fix, and the issue does not repro. It looks like the "cuDNN not initialized" error must be a local environment issue from my source build, so we can close this.
We should remove this until a long-term fix for #1713 lands. I believe @eqy is working on a fix. I just tried PyTorch built from source with the latest changes, but the issue persists, so for now we can remove the cuDNN attention backend and add it back later.