Skip to content

c10::KernelFunction#26337

Closed
smessmer wants to merge 20 commits intogh/smessmer/49/basefrom
gh/smessmer/49/head
Closed

c10::KernelFunction#26337
smessmer wants to merge 20 commits intogh/smessmer/49/basefrom
gh/smessmer/49/head

Conversation

@smessmer
Copy link
Contributor

@smessmer smessmer commented Sep 17, 2019

Stack from ghstack:

  • Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
  • Move that class and everything else it depends on into ATen/core/boxing
  • This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
  • We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
  • To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

Differential Revision: D17416607

Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)

[ghstack-poisoned]
@smessmer smessmer requested a review from apaszke as a code owner September 17, 2019 05:25
@pytorchbot pytorchbot added caffe2 oncall: jit Add this issue/PR to JIT oncall triage queue module: internals Related to internal abstractions in c10 and ATen module: operators labels Sep 17, 2019
smessmer added a commit that referenced this pull request Sep 17, 2019
Pull Request resolved: #26337


ghstack-source-id: 90228146

Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)
@pytorchbot pytorchbot added the module: docs Related to our documentation, both in docs/ and docblocks label Sep 17, 2019
namespace detail {
template<class Type>
constexpr uint64_t hashType() {
return typeid(Type).hash_code()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you verify (diasm in gdb or something) that it indeed becomes a constexpr value?

// We use this indirection because many KernelFunctions are created
// at static initialization time but are created with functors that
// store Tensor and we can't call the Tensor() constructor at static
// initialization time yet (SIOF). So these register with a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in C2 world it's not true - we wouldn't create an instance of operator at registration time and lifetime of operator would be managed by the invocation context, not centralized dispatcher. I honestly don't think there's a usecase for kernelFactory_ today (and in the earlier design it was not fully figured out either). Can't we just nuke the functorFactory and just declare that it's always functor?

@smessmer smessmer requested a review from dzhulgakov September 19, 2019 18:41

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)

[ghstack-poisoned]

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)

[ghstack-poisoned]
@smessmer smessmer requested a review from ezyang September 19, 2019 23:02

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)

[ghstack-poisoned]

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)

[ghstack-poisoned]
TORCH_INTERNAL_ASSERT(false, "Tried to call KernelFunction::callBoxed() on an uninitialized KernelFunction.");
} else {
// TODO We want to introduce the invariant that all kernels must be callable in a boxed way, then this case should be impossible.
TORCH_INTERNAL_ASSERT(false, "Tried to call KernelFunction::callBoxed() on a KernelFunction that can only be called with KernelFunction::callUnboxed().");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in the meantime, this should be a TORCH_CHECK, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. This doesn't fail due to user error but if we mucked up codegen. KernelFunction is not called directly by users and I don't think they can trigger this by using public APIs. Can you give an explanation why it should be TORCH_CHECK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the TODO implied this was triggerable from userspace.

*/
void callBoxed(Stack* stack) const {
if (C10_UNLIKELY(boxed_kernel_func_ == nullptr)) {
if (unboxed_kernel_func_ == nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an organizational point, it would be nice for this error handling code to live in the cpp file rather than the header. There's no reason for it to live inline, and it would make certain types of refactoring/debugging work easier (as it would avoid a recompilation of the world)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling operators is a hot path that I want to be fully inlineable. I could move it to a KernelFunction_inl.h header if you think that's better? Let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about the error reporting code, not the calling code.

*
* KernelFunction::callUnboxed() is generally better, since it will allow
* calling KernelFunctions that are backed by either boxed or unboxed
* kernels, but that one will not work for all types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, is this a compilation-time error? I would prefer if we have a single entry point callUnboxed, that just skips the fallback to boxed types if the types in question are not supported. This is how I implemented it in #26368

Copy link
Contributor Author

@smessmer smessmer Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I chose a bit a different approach here. I want the compiler to show an error if you have an op with use_c10_dispatcher: full but the boxed fallback doesn't work. That means the use_c10_dispatcher flag is the only configuration setting you have to move to fully migrate ops over to the c10 dispatcher. If you have some SFINAE config to remove ops from creating boxed wrappers, that would add an additional configuration setting and make the system more complex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't buy the argument that it's more complex. The reason I don't buy it is because #26368 is able to avoid a bunch of pieces which you have to do in this PR:

The only downside is that you don't get compile time errors in the way you've done here. But it's easy to turn all of the runtime errors into compile time errors, to see if you have 100% coverage: change supports_boxed_fallback to return unconditionally true.

// TODO Remove this function once all kernels support a boxed variant

if (unboxed_kernel_func_ != nullptr) {
using ActualSignature = Return (OperatorKernel*, Args...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature looks weird to me. Why is the unboxed kernel taking an OperatorKernel* as an argument? (I.e., what does this enable you to implement?) For reference, unboxed FuncType in #26368 is just Return(Args...) which provides evidence that you don't need the OperatorKernel*

Copy link
Contributor Author

@smessmer smessmer Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The autogenerated boxed wrappers need to call into the unboxed functor so they need a pointer to it. This replaces what c10::KernelCache did before, which was also passed in to the boxed kernel. In #26368, you don't call unboxed from boxed but only the other way round, which is much easier and which is why you don't need this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right!

if (unboxed_kernel_func_ != nullptr) {
using ActualSignature = Return (OperatorKernel*, Args...);
ActualSignature* func = reinterpret_cast<ActualSignature*>(unboxed_kernel_func_);
return (*func)(getFunctor_(), std::forward<Args>(args)...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using the term functor here in an untraditional way? In C++, I expect functor to be a "function like object". But you don't call the functor, you call a function pointer, passing the functor as an argument. That's weird?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is calling into the boxed kernel, passing in the functor so the boxing wrapper can call it if the boxed kernel is just a wrapper that calls into the unboxed functor.

, unboxed_kernel_func_(unboxed_kernel_func)
{}

OperatorKernel* getFunctor_() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this method have a trailing underscore? Why is it called "get functor" but returns an "operator kernel*"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because c10::OperatorKernel is the public API all kernel functors have to inherit from in the registration API.

(*boxed_kernel_func)(functor, &stack);

TORCH_INTERNAL_ASSERT(stack.size() == 1, "A boxed kernel should only push one return to the stack");
return std::move(stack[0]).to<Return>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't stack[0] already a temporary, so the std::move is not necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, std::vector<T>::operator[] returns a T&, not a T.

}
};
template<class... Args>
struct boxAndCallBoxedFunc<void, Args...> final {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man, I hate that void needs special casing here XD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, me too. if constexpr would fix this but that's C++17 only.

return OpKernel(kernel.kernel_func, kernel.cache_creator_func, kernel.unboxed_kernel_func);
template<class Return, class... Args>
Return callUnboxed(TensorTypeId dispatchKey, Args... args) const {
// TODO Remove dispatchKey argument and instead infer dispatchKey from args...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think you are aware, multi_dispatch_tensor_type_set should work well for this.

Copy link
Contributor Author

@smessmer smessmer Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's a perfect match. I'll probably move the logic into the dispatch table so it's next to the boxed logic calling multi_dispatch_tensor_type_set.

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an improvement. But I still think this code can be simplified more.

@smessmer
Copy link
Contributor Author

Going to land this. As summary, I think there's three main points we need to address in future:

  • It doesn't do error checking for callUnboxed() yet. I'm currently working on moving our code base to GCC 5, which would allow me to add error checking using compile time type ids.
  • The kernelFactory_ is only needed if we want to allow kernel functors to store Tensors (or other things that can't be created at static initialization time).
  • The API is quite wide because our custom op API offers a lot of ways of creating kernels. This could be narrowed if we're willing to break the custom op API.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in ed207b5.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 21, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26337

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

ghstack-source-id: 90459911

Test Plan: unit tests

Differential Revision: D17416607

fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
mingbowan pushed a commit to mingbowan/pytorch that referenced this pull request Sep 23, 2019
Summary:
Pull Request resolved: pytorch#26337

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

ghstack-source-id: 90459911

Test Plan: unit tests

Differential Revision: D17416607

fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
@facebook-github-bot facebook-github-bot deleted the gh/smessmer/49/head branch October 28, 2019 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

caffe2 Merged module: build Build system issues module: docs Related to our documentation, both in docs/ and docblocks module: internals Related to internal abstractions in c10 and ATen oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants