
Add expand and expandAs support in arith and fusion executor. #1739

Merged: csarofeen merged 4 commits into broadcast_cleanup from expand_support on Jun 3, 2022

Conversation

@csarofeen (Owner)

Add support for expand and expandAs. This just potentially adds an expand on output tensors. Support covers fully dynamic shapes (see the test case for multiple uses).

The interface is a bit clunky, but it works fine; it's hard to simplify since we may want to take in symbolic integers that could even be passed in directly as inputs (shown in the test case).

I did not do the Python interface; I was going to leave that to @kevinstephano. Kevin, I would recommend just looking through the test and arith.cpp/h.
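For readers unfamiliar with expand semantics, the core idea can be sketched in plain C++ (this is an illustration only, not the nvFuser implementation; `View2D` and `expand_rows` are hypothetical names): a size-1 "broadcast" dimension is given a larger logical extent without copying by using a stride of 0, so every expanded row aliases the same underlying data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch only (not nvFuser code): a 2-D strided view.
// A stride of 0 means the dimension has been expanded, so all
// indices along it read the same underlying elements.
struct View2D {
    const std::vector<float>* data;
    size_t rows, cols;
    size_t row_stride, col_stride; // stride 0 => expanded dimension
    float at(size_t r, size_t c) const {
        return (*data)[r * row_stride + c * col_stride];
    }
};

// Expand a [1, cols] view to [new_rows, cols] via the stride-0 trick.
View2D expand_rows(const View2D& v, size_t new_rows) {
    assert(v.rows == 1); // only size-1 dims may be expanded
    return View2D{v.data, new_rows, v.cols, /*row_stride=*/0, v.col_stride};
}
```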

Also:

  • Added a helper function in arith.cpp to "promote" sizes.
  • Added isConstInt and evaluateInt on Val, which allow us to easily infer a compile-time constant value.
  • Added expandedExtent on IterDomain to indicate, for a broadcast dimension, what the expanded size should be (only used if it's allocated as an output of a fusion).
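The size-promotion rule from the first bullet can be sketched as follows (a hypothetical simplification: the real promoteSize in arith.cpp operates on Val* extents; here std::nullopt stands in for a symbolic extent whose value is unknown at compile time).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical model of extent promotion, not the actual nvFuser helper.
using Extent = std::optional<int64_t>;

Extent promote_size(Extent a, Extent b) {
    // A constant 1 is a broadcast extent: the other side wins.
    if (a && *a == 1) return b;
    if (b && *b == 1) return a;
    // Two known constants must agree.
    if (a && b) {
        if (*a != *b) throw std::runtime_error("size mismatch");
        return a;
    }
    // Prefer a known constant over a symbolic extent.
    return a ? a : b;
}
```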

Comment thread torch/csrc/jit/codegen/cuda/ir_base_nodes.h
Comment thread torch/csrc/jit/codegen/cuda/arith.cpp (Outdated)
Comment thread torch/csrc/jit/codegen/cuda/arith.cpp (Outdated), referencing:

    extent_vals[i] = promoteSize(extent_vals[i], dom[i]->extent());
    iter_types[i] = dom[i]->getIterType();
Collaborator:
For extra safety, having something like the following would make me feel better:

    if (iter_types[i] == nullptr) {
      iter_types[i] = dom[i]->getIterType();
    } else {
      TORCH_INTERNAL_ASSERT(iter_types[i] == dom[i]->getIterType());
    }

Owner Author:

I left a note, but we can't do this now because of gather. It seems to me gather can be promoted to IterType::Serial pretty aggressively. One thing we should do: if people gather with a window size of 1, shouldn't we just make that an explicit broadcast? Is this what convertToBroadcast was doing? If so, that should just be done in the gather definition.

Collaborator:

I see. Propagation of IterDomain types seems to need to be done more carefully.

I think we can convert size-one gather to just a broadcast.

Comment thread torch/csrc/jit/codegen/cuda/arith.cpp (Outdated), referencing:

    IrBuilder::create<TensorDomain>(
        out_domain, std::vector<bool>(out_domain.size(), true)),
    inp->getDataType().value());
    IrBuilder::create<UnaryOp>(UnaryOpType::Set, out_tensor, inp);
Collaborator:

Initially, I thought this would be a binary op, but there's no actual data dependency on "other", so unary seems to make more sense.

Well, but what if a tensor were only used as an "other" argument? We would still need that tensor, but such tensors would not be included in, e.g., getAllValsBetween. I wonder if that could cause any problems in lowering.
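A toy sketch of the traversal concern (hypothetical `Node`/`collect`, not nvFuser's IR or getAllValsBetween): if the op records "other" only for its shape and not as a producer input, a backward traversal from the outputs never visits it.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy dependency graph, not nvFuser IR: producers holds real data
// dependencies only, so a tensor used purely for shape inference
// (the "other" argument) never appears in anyone's producer list.
struct Node {
    std::string name;
    std::vector<const Node*> producers;
};

// Backward traversal from a node, collecting everything it depends on.
void collect(const Node* n, std::set<std::string>& seen) {
    if (!seen.insert(n->name).second) return; // already visited
    for (const Node* p : n->producers) collect(p, seen);
}
```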

Owner Author:

Will make sure it works with real segmentation, but yes, it has been working, since all we need is to infer the size of the other tensor at runtime. I have an explicit test that works fine because the tensor is marked as an input. Will try to check in a case where there's actual segmentation and the second kernel is the one with the expand.

Comment thread torch/csrc/jit/codegen/cuda/arith.cpp
Comment thread torch/csrc/jit/codegen/cuda/executor.cpp
Comment thread torch/csrc/jit/codegen/cuda/executor.cpp (Outdated)
@naoyam (Collaborator) left a comment:

LGTM.

As a follow-up, we should use the expanded-as size information to make broadcast concretization mappings more precise and less conservative.

@csarofeen merged commit 7d271f2 into broadcast_cleanup on Jun 3, 2022.
csarofeen added a commit that referenced this pull request on Jun 3, 2022:
Add expand and expandAs support in arith and fusion executor. (#1739)
  * Make expandOp Expr.
@jjsjann123 (Collaborator): Linking issue #1739

@jjsjann123 deleted the expand_support branch on June 4, 2022 at 03:47.