
Add _syncthreads for Write-After-Read Race#383

Merged
rdspring1 merged 8 commits into 20_8_18_devel from rds_smem_war on Sep 18, 2020

Conversation

@rdspring1 (Collaborator) commented Sep 16, 2020

Fixes #380

  • Basic write-after-read (WAR) check that adds __syncthreads to the end of for-loops.
  • Enable result validation of the GEMM test.

Goal: insert a sync at the end of for-loops to prevent a write-after-read (WAR) race condition. A WAR race occurs when the next iteration of the loop overwrites a shared-memory value before a previous operation has finished reading it.

WAR Race Check:

  • Track all shared-memory TVs written before the first sync.
  • Track all shared-memory TVs read after the last sync.
  • If the intersection is non-empty, there is a WAR race condition.
  • Recursively check each nested for-loop.

@naoyam (Collaborator) commented Sep 16, 2020

In the relevant tests, can you also check whether a required syncthreads is actually inserted at its proper location? Result validation may not be able to expose potential race conditions.

@rdspring1 (Collaborator, Author) commented Sep 16, 2020

I thought about that but didn't come up with a good solution, e.g., checking the entire kernel string.
Do you have any ideas?

@naoyam (Collaborator) commented Sep 16, 2020

How about just traversing KIR to find a relevant ForLoop node and check whether it ends with a sync node? Finding relevant ForLoops may not be trivial for complex fusions, though.

@rdspring1 (Collaborator, Author) commented Sep 16, 2020

Here are two other options:

  1. Instead of traversing KIR, flatten KIR and check if sync exists at position x
  2. Check if substring at position x is __syncthreads

Do we have access to the KIR from the test? Option 2 is the easiest to implement.

@naoyam (Collaborator) commented Sep 16, 2020

I think (read-only) accesses to KIR should be allowed for verification.

@naoyam (Collaborator) commented Sep 17, 2020

last_op_sync_ seems to be used to suppress inserting syncthreads when the last operation is also syncthreads. It seems that the "last operation" here can mean the last operation in a nested loop body. Something like this:

for i in X
  Write to SMEM
  __syncthreads()
  Read from SMEM
  for j in Y
    do something
    __syncthreads()
  end for
  // Insert __syncthreads() here?
end for

If I read the code correctly, in the above case the syncthreads at the commented position is NOT inserted. However, since the trip count of the inner loop can be 0, the nested syncthreads may never execute, so we need that final syncthreads.

Am I missing anything?

@rdspring1 (Collaborator, Author) commented Sep 17, 2020

Don't all for-loops iterate from 0 to some positive integer? If we enforce this constraint, then every for-loop is entered.
Since we perform this pass at compile time, we have to assume either that all for-loops are entered or that none are.

@naoyam (Collaborator) commented Sep 17, 2020

> Don't all for-loops iterate from 0 to some positive integer? If we enforce this constraint, then every for-loop is entered.
> Since we perform this pass at compile time, we have to assume either that all for-loops are entered or that none are.

Are we sure all for loops must have a trip count greater than 0? That may be the case, actually, but not 100% sure.

@rdspring1 (Collaborator, Author) commented Sep 17, 2020

I ran all the cpp unit tests with these assertions inside the IterDomain constructor, and they passed.

  TORCH_INTERNAL_ASSERT(
      _start->isZeroInt(),
      "Cannot create an iter domain with a start that is not zero but received ",
      _start,
      " .");

  TORCH_INTERNAL_ASSERT(
      !_extent->isZeroInt(),
      "Cannot create an iter domain with an extent that is zero but received ",
      _extent,
      " .");

@naoyam (Collaborator) commented Sep 18, 2020

Thanks for adding the check for inserted syncthreads. Looks very good!

@rdspring1 rdspring1 merged commit 944dad5 into 20_8_18_devel Sep 18, 2020
@rdspring1 rdspring1 deleted the rds_smem_war branch September 18, 2020 20:50
naoyam pushed a commit that referenced this pull request Sep 21, 2020
…e finish reading before writing."

This reverts commit dffaa76.

Revert this in favor of #383
naoyam pushed a commit that referenced this pull request Sep 22, 2020
* Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop

* Enable Tiled GEMM example

* Check that IterDomain iterates from zero to some positive integer

Co-authored-by: Ryan Spring <rspring@nvidia.com>
naoyam pushed a commit that referenced this pull request Sep 22, 2020
* Get a crazy test example working.

* Change problem size and tile size, still an issue with N > 32.

* Add sync threads in loops that read from smem, to make sure we finish reading before writing.

* Predicate off threads bound to a broadcast dim of an output when its in shared memory.

* Predicate smem tiling writing based on broadcasted dims in consumer.

* Cleanup example a bit.

* Revert "Add sync threads in loops that read from smem, to make sure we finish reading before writing."

This reverts commit dffaa76.

Revert this in favor of #383

* Add _syncthreads for Write-After-Read Race (#383)

* Basic Write-After-Read (WAR) check to add __syncthreads to end of for-loop

* Enable Tiled GEMM example

* Check that IterDomain iterates from zero to some positive integer

Co-authored-by: Ryan Spring <rspring@nvidia.com>

* Refactor thread predication for writes to smem

Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>
Co-authored-by: Ryan Spring <rdspring1@gmail.com>
Co-authored-by: Ryan Spring <rspring@nvidia.com>

Linked issue: Missing _syncthreads