Skip to content

Fix vec validation on expanded broadcasting#2044

Merged
zasdfgbnm merged 7 commits intodevelfrom
fix-vec-validation
Oct 14, 2022
Merged

Fix vec validation on expanded broadcasting#2044
zasdfgbnm merged 7 commits intodevelfrom
fix-vec-validation

Conversation

@zasdfgbnm
Copy link
Copy Markdown
Collaborator

Fixes #1843. Thanks, @jjsjann123 for the test and discussion.

@zasdfgbnm zasdfgbnm requested review from jjsjann123 and naoyam October 8, 2022 00:07
testValidate(&fusion, cg_outputs, {input1}, {input1}, __LINE__, __FILE__);
}

TEST_F(NVFuserTest, FusionVectorizeRepro_CUDA) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this:

// Repro of issue #1843

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines +6522 to +6528
auto tv7 = sum(tv0, {1}, true);
auto tv_exp =
expand(tv7, {tv0->axis(0)->extent(), IrBuilder::create<Int>(32128)});
auto tv3 = exp(tv1);
auto tv8 = mul(tv3, tv_exp);
auto tv13 = sub(tv0, tv8);
fusion->addOutput(tv13);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually reproduce the issue? It seems this PR addresses an issue with validation vectorization of input and output tensors with an expanded broadcast IterDomain. This test has an expanded broadcast, but that isn't in any of input or output tensors. Am I missing anything?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, the test is tricky, there's a fusion segmentation happening right at the expand op.

I tried to get a repro without that segmentation, somehow that doesn't trigger the issue cold_sweat

Copy link
Copy Markdown
Collaborator

@naoyam naoyam Oct 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Can you please add that as a comment to the test? It'd be great if we could understand why it wouldn't fail without segmentation.

const auto size = aten_tensor.sizes().at(i);
auto root_id = tv->getMaybeRFactorDomain()[i];
const auto is_expanded_broadcasting =
root_id->isBroadcast() && root_id->hasExpandedExtent();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for expanded broadcast here? I was under the impression that even a normal broadcast with an arbitrary stride should also pass the test, since we never index into broadcast dimension and it shouldn't break vectorization.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-expanded broadcasting is covered by the size == 1 check below. I keep "expanded broadcast" a separate case because I want to check that, if it is an expanded broadcast, then it needs to have stride 0. But for regular broadcast, there is no such limitation.

Copy link
Copy Markdown
Owner

@csarofeen csarofeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks good, but test needs repro in title, and we might want to make one that doesn't rely on how segmentation is applied to the fusion.

For the repro we should probably just mark tv7_exp as an input.

@jjsjann123
Copy link
Copy Markdown
Collaborator

For the repro we should probably just mark tv7_exp as an input.

Maybe I did something stupid when I was trying to repro it, but having tv7_exp as an input, even with the fastest dim implicitly broadcasted with correct shape and 0-stride seems to make it pick up only the pointwise scheduler and doesn't repro the problem.

Hopefully Xiang can easily get a repro with a simplified definition. 🤞

zasdfgbnm added a commit that referenced this pull request Oct 12, 2022
- Make `TensorViewBuilder` able to generate expanded tensor.
- Fixes a problem on the contiguity found while working on #2044. Note that an expanded dimension and the dimension before can never be contiguous. Without fixing this, the added test will generate silent wrong result.
- Validate contiguity of expanded dimensions during lowering
@jjsjann123
Copy link
Copy Markdown
Collaborator

jjsjann123 commented Oct 13, 2022

Noticed that this one hasn't been merged yet. Would be nice to get it. 🙇
cc'ing @kevinstephano We'd probably want to cherry-pick this one as well.

Copy link
Copy Markdown
Owner

@csarofeen csarofeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test references the issue at hand mentioning segmentation requirement for reproduction. Merging as is.

@csarofeen
Copy link
Copy Markdown
Owner

@zasdfgbnm to rebase when you have a chance.

@zasdfgbnm zasdfgbnm merged commit 379d0f8 into devel Oct 14, 2022
@zasdfgbnm zasdfgbnm deleted the fix-vec-validation branch October 14, 2022 17:17
@zasdfgbnm zasdfgbnm restored the fix-vec-validation branch October 14, 2022 17:17
@zasdfgbnm zasdfgbnm deleted the fix-vec-validation branch October 14, 2022 17:17
@zasdfgbnm zasdfgbnm restored the fix-vec-validation branch October 14, 2022 17:17
@zasdfgbnm zasdfgbnm deleted the fix-vec-validation branch October 14, 2022 17:17
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
- Make `TensorViewBuilder` able to generate expanded tensor.
- Fixes a problem on the contiguity found while working on csarofeen/pytorch#2044. Note that an expanded dimension and the dimension before can never be contiguous. Without fixing this, the added test will generate silent wrong result.
- Validate contiguity of expanded dimensions during lowering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TIMM Stride Vectorization Error

4 participants