Conversation
Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/) [ghstack-poisoned]
Pull Request resolved: #26337 ghstack-source-id: 90228146 Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)
```cpp
namespace detail {
template<class Type>
constexpr uint64_t hashType() {
  return typeid(Type).hash_code();
```
did you verify (diasm in gdb or something) that it indeed becomes a constexpr value?
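The doubt is well-founded: `std::type_info::hash_code()` is not `constexpr` in standard C++. A common workaround (GCC/Clang-specific, used by compile-time type-id libraries) is to hash the compiler's pretty-printed function signature instead. A minimal sketch, with hypothetical names:

```cpp
#include <cstdint>

// FNV-1a hash, written as a C++11-compatible constexpr function.
constexpr uint64_t fnv1a(const char* s, uint64_t h = 14695981039346656037ull) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1, (h ^ static_cast<uint64_t>(*s)) * 1099511628211ull);
}

template <class Type>
constexpr uint64_t hashType() {
    // __PRETTY_FUNCTION__ embeds the template argument (e.g. "Type = int"),
    // so each instantiation hashes a distinct string. GCC/Clang only.
    return fnv1a(__PRETTY_FUNCTION__);
}

// Fails to compile unless hashType() really is a constant expression.
static_assert(hashType<int>() != hashType<long>(),
              "distinct types should hash differently");
```

Checking the disassembly (or adding a `static_assert` like the one above) confirms the value is computed at compile time rather than at every call.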
```cpp
// We use this indirection because many KernelFunctions are created
// at static initialization time but are created with functors that
// store Tensor and we can't call the Tensor() constructor at static
// initialization time yet (SIOF). So these register with a
```
Even in the C2 world this isn't true: we wouldn't create an instance of the operator at registration time, and the operator's lifetime would be managed by the invocation context, not a centralized dispatcher. I honestly don't think there's a use case for kernelFactory_ today (and it wasn't fully figured out in the earlier design either). Can't we just nuke the functorFactory and declare that it's always a functor?
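The indirection under discussion can be illustrated in isolation. This is a hypothetical sketch (the names `KernelEntry` and `OperatorKernel` here are simplified stand-ins, not the real registration API): registration stores only a factory, and the functor, which may hold members that cannot be constructed during static initialization (SIOF), is built lazily on first use.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Simplified stand-in for the base class all kernel functors inherit from.
struct OperatorKernel { virtual ~OperatorKernel() = default; };

class KernelEntry {
public:
    explicit KernelEntry(std::function<std::unique_ptr<OperatorKernel>()> factory)
        : factory_(std::move(factory)) {}

    // The functor is constructed on first call, safely after static
    // initialization has finished, instead of at registration time.
    OperatorKernel* functor() {
        if (!functor_) {
            functor_ = factory_();
        }
        return functor_.get();
    }

private:
    std::function<std::unique_ptr<OperatorKernel>()> factory_;
    std::unique_ptr<OperatorKernel> functor_;
};
```

Dropping the factory, as suggested, would mean storing the functor directly and requiring that its construction be safe at registration time.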
- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

Differential Revision: [D17416607](https://our.internmc.facebook.com/intern/diff/D17416607/)
```cpp
    TORCH_INTERNAL_ASSERT(false, "Tried to call KernelFunction::callBoxed() on an uninitialized KernelFunction.");
  } else {
    // TODO We want to introduce the invariant that all kernels must be callable in a boxed way, then this case should be impossible.
    TORCH_INTERNAL_ASSERT(false, "Tried to call KernelFunction::callBoxed() on a KernelFunction that can only be called with KernelFunction::callUnboxed().");
```
So, in the meantime, this should be a TORCH_CHECK, right?
I don't think so. This doesn't fail due to user error; it fails if we mucked up codegen. KernelFunction is not called directly by users and I don't think they can trigger this through public APIs. Can you give an explanation of why it should be TORCH_CHECK?
I thought the TODO implied this was triggerable from userspace.
```cpp
 */
void callBoxed(Stack* stack) const {
  if (C10_UNLIKELY(boxed_kernel_func_ == nullptr)) {
    if (unboxed_kernel_func_ == nullptr) {
```
As an organizational point, it would be nice for this error handling code to live in the cpp file rather than the header. There's no reason for it to live inline, and it would make certain types of refactoring/debugging work easier (as it would avoid a recompilation of the world)
Calling operators is a hot path that I want to be fully inlineable. I could move it to a KernelFunction_inl.h header if you think that's better? Let me know.
I'm talking about the error reporting code, not the calling code.
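The layout being suggested can be sketched generically (the names below are illustrative, not the real KernelFunction code): the hot call path stays inline in the header, while the cold error-reporting body sits behind a non-inline `[[noreturn]]` function whose definition would live in the .cpp file, so editing the error message does not recompile every include site.

```cpp
#include <stdexcept>

// Declared in the header, defined out of line (in the suggested layout,
// in a .cpp file). [[noreturn]] lets the compiler keep the inline caller tight.
[[noreturn]] void reportUninitializedKernel();

// Hot path: fully inlineable; the error branch is just one call instruction.
inline int callChecked(int (*kernel)(int), int arg) {
    if (kernel == nullptr) {
        reportUninitializedKernel();  // cold, out of line
    }
    return kernel(arg);
}

// This definition would live in the .cpp file.
[[noreturn]] void reportUninitializedKernel() {
    throw std::runtime_error("Tried to call an uninitialized KernelFunction.");
}
```

The calling code keeps its performance characteristics; only the error-formatting logic moves out of the header.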
```cpp
 *
 * KernelFunction::callUnboxed() is generally better, since it will allow
 * calling KernelFunctions that are backed by either boxed or unboxed
 * kernels, but that one will not work for all types.
```
I don't understand, is this a compile-time error? I would prefer a single entry point, callUnboxed, that just skips the fallback to boxed kernels if the types in question are not supported. This is how I implemented it in #26368
Yes, I chose a bit of a different approach here. I want the compiler to show an error if you have an op with use_c10_dispatcher: full but the boxed fallback doesn't work. That means the use_c10_dispatcher flag is the only configuration setting you have to flip to fully migrate ops over to the c10 dispatcher. If you had some SFINAE configuration that excludes ops from getting boxed wrappers, that would add an additional configuration setting and make the system more complex.
I don't buy the argument that it's more complex. The reason I don't buy it is that #26368 avoids a bunch of pieces which you have to do in this PR:

- There's no per-operator setting saying whether it's `full` or `unboxedOnly`. The templates figure it out automatically.
- There's no change to code generation, as there is here. The templates figure it out automatically.
- The actual difference in the dispatcher is a single if statement (https://github.com/pytorch/pytorch/pull/26368/files#diff-4f2e119446bc2edf856334581e5c70c9R217) testing if the boxed version is supported, which errors if it is not.

The only downside is that you don't get compile-time errors the way you've done here. But it's easy to turn all of the runtime errors into compile-time errors, to see if you have 100% coverage: change supports_boxed_fallback to return unconditionally true.
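The trade-off under discussion can be sketched in a few lines. Everything here is hypothetical (`NotBoxable` and `supports_boxed_fallback` are illustrations of the idea, not the real PyTorch types): a per-signature trait gates the boxed fallback, the dispatcher checks it in a single if statement, and flipping the trait to unconditionally true would turn every remaining gap into a compile-time error.

```cpp
#include <stdexcept>
#include <type_traits>

// Stand-in for an argument type the boxing layer cannot handle.
struct NotBoxable {};

// The trait: which argument types support a boxed fallback.
template <class T>
struct supports_boxed_fallback : std::true_type {};
template <>
struct supports_boxed_fallback<NotBoxable> : std::false_type {};

template <class Arg>
int callWithBoxedFallback(Arg /*arg*/) {
    // The "single if statement": a runtime error when no fallback exists.
    // Specializing supports_boxed_fallback to be unconditionally true (plus
    // a static_assert) would surface the same gaps at compile time instead.
    if (!supports_boxed_fallback<Arg>::value) {
        throw std::runtime_error("no boxed fallback for this argument type");
    }
    // ... box the argument onto a stack and call the boxed kernel ...
    return 0;  // placeholder result for the sketch
}
```

The compile-time scheme in this PR and the runtime scheme in #26368 differ exactly in where this check fires.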
```cpp
// TODO Remove this function once all kernels support a boxed variant

if (unboxed_kernel_func_ != nullptr) {
  using ActualSignature = Return (OperatorKernel*, Args...);
```
This signature looks weird to me. Why is the unboxed kernel taking an OperatorKernel* as an argument? (I.e., what does this enable you to implement?) For reference, the unboxed FuncType in #26368 is just Return(Args...), which provides evidence that you don't need the OperatorKernel*.
The autogenerated boxed wrappers need to call into the unboxed functor so they need a pointer to it. This replaces what c10::KernelCache did before, which was also passed in to the boxed kernel. In #26368, you don't call unboxed from boxed but only the other way round, which is much easier and which is why you don't need this.
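The mechanism can be shown concretely. This is a simplified stand-in (`AddKernel` and `addTrampoline` are hypothetical names, not real code from the PR): the stored function pointer is type-erased, so a per-kernel trampoline takes the `OperatorKernel*`, downcasts it back to the concrete functor, and calls it. This is what replaces the old `c10::KernelCache` argument.

```cpp
#include <utility>

// Simplified stand-in for the base class of all kernel functors.
struct OperatorKernel { virtual ~OperatorKernel() = default; };

// A concrete unboxed kernel functor.
struct AddKernel final : OperatorKernel {
    int operator()(int a, int b) const { return a + b; }
};

// Matches the shape of `using ActualSignature = Return (OperatorKernel*, Args...)`:
// the trampoline recovers the concrete functor from the type-erased pointer.
int addTrampoline(OperatorKernel* functor, int a, int b) {
    return (*static_cast<AddKernel*>(functor))(a, b);
}
```

A KernelFunction-like caller would store `addTrampoline` as an erased `void*` alongside a pointer to the `AddKernel` instance, and reunite them at call time.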
```cpp
if (unboxed_kernel_func_ != nullptr) {
  using ActualSignature = Return (OperatorKernel*, Args...);
  ActualSignature* func = reinterpret_cast<ActualSignature*>(unboxed_kernel_func_);
  return (*func)(getFunctor_(), std::forward<Args>(args)...);
```
Are you using the term functor here in an untraditional way? In C++, I expect functor to be a "function like object". But you don't call the functor, you call a function pointer, passing the functor as an argument. That's weird?
This is calling into the kernel function, passing in the functor so that, when the kernel is just an autogenerated boxing wrapper, the wrapper can call into the unboxed functor.
```cpp
  , unboxed_kernel_func_(unboxed_kernel_func)
  {}

  OperatorKernel* getFunctor_() const {
```
Why does this method have a trailing underscore? Why is it called "get functor" but returns an "operator kernel*"?
Because c10::OperatorKernel is the public API that all kernel functors have to inherit from in the registration API.
```cpp
(*boxed_kernel_func)(functor, &stack);

TORCH_INTERNAL_ASSERT(stack.size() == 1, "A boxed kernel should only push one return to the stack");
return std::move(stack[0]).to<Return>();
```
Isn't stack[0] already a temporary, so the std::move is not necessary?
no, std::vector<T>::operator[] returns a T&, not a T.
```cpp
  }
};

template<class... Args>
struct boxAndCallBoxedFunc<void, Args...> final {
```
Man, I hate that void needs special casing here XD
Yes, me too. `if constexpr` would fix this, but that's C++17-only.
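For illustration, here is how the C++17 feature would eliminate the separate `void` specialization (a generic sketch, not the actual boxAndCallBoxedFunc code): the non-void branch is discarded at compile time when `Return` is `void`, so a single template covers both cases.

```cpp
#include <type_traits>
#include <utility>

// One template instead of a primary template plus a void specialization:
// the untaken branch is discarded at instantiation time, so `return func();`
// is never compiled when Return is void.
template <class Return, class Func>
Return callAndReturn(Func&& func) {
    if constexpr (std::is_void_v<Return>) {
        std::forward<Func>(func)();          // nothing to return
    } else {
        return std::forward<Func>(func)();   // forward the single return value
    }
}
```

Pre-C++17, both branches of a plain `if` must compile for every instantiation, which is exactly why the `void` case needs its own partial specialization.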
```cpp
return OpKernel(kernel.kernel_func, kernel.cache_creator_func, kernel.unboxed_kernel_func);

template<class Return, class... Args>
Return callUnboxed(TensorTypeId dispatchKey, Args... args) const {
  // TODO Remove dispatchKey argument and instead infer dispatchKey from args...
```
As I think you are aware, multi_dispatch_tensor_type_set should work well for this.
yes, that's a perfect match. I'll probably move the logic into the dispatch table so it's next to the boxed logic calling multi_dispatch_tensor_type_set.
ezyang left a comment:

This seems like an improvement. But I still think this code can be simplified more.
Going to land this. As a summary, I think there are three main points we need to address in the future:
This pull request has been merged in ed207b5.
Summary: Pull Request resolved: pytorch/pytorch#26337

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

ghstack-source-id: 90459911
Test Plan: unit tests
Differential Revision: D17416607
fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934