Guard __restrict usage for CUDA #5061
Merged
StephanTLavavej merged 1 commit into microsoft:main on Nov 8, 2024
CaseyCarter (Contributor) approved these changes on Nov 1, 2024
It's
#endif // ^^^ !defined(__cpp_static_call_operator) ^^^

#ifdef __CUDACC__ // TRANSITION, CUDA 12.4 doesn't recognize __restrict
#define _RESTRICT
Member
Author
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Followup to #4958.
Fixes VSO-2295101 "[RWC][prod/fe][Regression] Cutlass failed with error C2146: syntax error: missing ')' before identifier '_Dest'".

It appears that CUDA 12.4 doesn't understand __restrict, so when it attempts to split code between the device (GPU) and host (MSVC) compilers, it mangles what's sent to MSVC.

Apparently this is only an issue when the template is instantiated, which is why it wasn't found by our CUDA test coverage (that includes all headers and does nothing with them). It was only encountered in our Real World Code test suite where we wisely harnessed NVIDIA's Cutlass project.
I am slightly nervous that bad kitties have grabbed _RESTRICT but we're following that form for _LIKELY etc. and I don't want to pre-emptively avoid issues when we have all legitimate rights to this identifier.

Fixes AB#2295101.
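
For illustration, here is a minimal sketch of the guard pattern described above. It is not the STL's actual code. The macro name DEMO_RESTRICT and the helper copy_demo are hypothetical stand-ins, and the #else branch is an assumption, since the quoted diff shows only the CUDA branch of the real _RESTRICT macro.

```cpp
#include <cstddef>

// Hypothetical macro name for this sketch. The PR's real macro is _RESTRICT,
// an identifier reserved for the implementation, so user code should not define it.
#ifdef __CUDACC__ // compiling under nvcc: expand to nothing
#define DEMO_RESTRICT
#else // assumed fallback: use the __restrict compiler extension
#define DEMO_RESTRICT __restrict
#endif

// Hypothetical helper: copies count elements from first to dest.
// The qualifier promises the compiler that the two ranges do not overlap.
template <class T>
void copy_demo(const T* DEMO_RESTRICT first, T* DEMO_RESTRICT dest, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dest[i] = first[i];
    }
}
```

Because the macro expands to nothing under __CUDACC__, the qualifier never reaches the CUDA front end, and callers see the same behavior either way. Only the no-overlap optimization hint is lost.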