Add _syncthreads for Write-After-Read Race#383
Conversation
8e2c13f to
033921e
Compare
|
In the relevant tests, can you also check whether a required syncthreads is actually inserted at its proper location? Result validation may not be able to expose potential race conditions. |
|
I thought about that but didn't come up with a good solution. e.g. check the entire kernel string. |
|
How about just traversing KIR to find a relevant ForLoop node and check whether it ends with a sync node? Finding relevant ForLoops may not be trivial for complex fusions, though. |
|
Here are two other options:
Do we have access to the KIR from the test? Option 2 is the easiest to implement. |
|
I think (read-only) accesses to KIR should be allowed for verification. |
|
If I read the code correctly, in the above case, the last commented-out syncthreads is NOT inserted. However, since the trip count of the inner loop can be 0, the nested syncthreads may not be executed, so we need to have the last syncthreads. Am I missing anything? |
|
Don't all for-loops iterate from 0 to some positive integer? If we enforce this constraint, then every for-loop is entered. |
Are we sure all for loops must have a trip count greater than 0? That may be the case, actually, but not 100% sure. |
|
I ran all the cpp unit tests with this assertion inside the IterDomain constructor and they passed. |
d7c85eb to
e3e3eef
Compare
e3e3eef to
c7db9fc
Compare
|
Thanks for adding the check for inserted syncthreads. Looks very good! |
6ebd1bd to
2ba5000
Compare
* Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop * Enable Tiled GEMM example * Check that IterDomain iterates from zero to some positive integer Co-authored-by: Ryan Spring <rspring@nvidia.com>
* Get a crazy test example working. * Change problem size and tile size, still an issue with N > 32. * Add sync threads in loops that read from smem, to make sure we finish reading before writing. * Predicate off threads bound to a broadcast dim of an output when its in shared memory. * Predicate smem tiling writing based on broadcasted dims in consumer. * Cleanup example a bit. * Revert "Add sync threads in loops that read from smem, to make sure we finish reading before writing." This reverts commit dffaa76. Revert this in favor of #383 * Add _syncthreads for Write-After-Read Race (#383) * Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop * Enable Tiled GEMM example * Check that IterDomain iterates from zero to some positive integer Co-authored-by: Ryan Spring <rspring@nvidia.com> * Refactor thread predication for writes to smem Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com> Co-authored-by: Ryan Spring <rdspring1@gmail.com> Co-authored-by: Ryan Spring <rspring@nvidia.com>
Fixes #380
Goal: Insert sync at end of for-loops to prevent write-after-read (WAR) race condition. WAR race condition occurs when the next iteration of the loop overwrites shared memory value before a previous operation has finished reading it.
WAR Race Check: