Skip to content

Assert that kernels are called with the right signature#38361

Closed
smessmer wants to merge 24 commits intogh/smessmer/220/basefrom
gh/smessmer/220/head
Closed

Assert that kernels are called with the right signature#38361
smessmer wants to merge 24 commits intogh/smessmer/220/basefrom
gh/smessmer/220/head

Conversation

@smessmer
Copy link
Contributor

@smessmer smessmer commented May 12, 2020

Stack from ghstack:

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

Update: It's not in the call path anymore but it's now in the path when you query an OperatorHandle.

supersedes #26482

Differential Revision: D21534052

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request May 12, 2020
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

ghstack-source-id: 103966297
Pull Request resolved: #38361
@dr-ci
Copy link

dr-ci bot commented May 12, 2020

💊 CI failures summary and remediations

As of commit b04a010 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 135 times.

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request May 14, 2020
Pull Request resolved: #38361

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes D17485438

ghstack-source-id: 104070456

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request May 15, 2020
Pull Request resolved: #38361

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes D17485438

ghstack-source-id: 104189472

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
@smessmer smessmer requested a review from ezyang May 15, 2020 22:50
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request May 15, 2020
Pull Request resolved: #38361

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes D17485438

ghstack-source-id: 104222283

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)
@ezyang
Copy link
Contributor

ezyang commented May 18, 2020

The test failures might be because in some situations, we have multiple binary compatible versions of the library which have different RTTI. I've noticed this previously in ASAN. Here is a relevant note from the CI:

      # NB: We load libtorch.so with RTLD_GLOBAL for UBSAN, unlike our
      # default behavior.
      #
      # The reason for this is that without RTLD_GLOBAL, if we load multiple
      # libraries that depend on libtorch (as is the case with C++ extensions), we
      # will get multiple copies of libtorch in our address space.  When UBSAN is
      # turned on, it will do a bunch of virtual pointer consistency checks which
      # won't work correctly.  When this happens, you get a violation like:
      #
      #    member call on address XXXXXX which does not point to an object of
      #    type 'std::_Sp_counted_base<__gnu_cxx::_Lock_policy::_S_atomic>'
      #    XXXXXX note: object is of type
      #    'std::_Sp_counted_ptr<torch::nn::LinearImpl*, (__gnu_cxx::_Lock_policy)2>'
      #
      # (NB: the textual types of the objects here are misleading, because
      # they actually line up; it just so happens that there's two copies
      # of the type info floating around in the address space, so they
      # don't pointer compare equal.  See also
      #   https://github.com/google/sanitizers/issues/1175
      #
      # UBSAN is kind of right here: if we relied on RTTI across C++ extension
      # modules they would indeed do the wrong thing;  but in our codebase, we
      # don't use RTTI (because it doesn't work in mobile).  To appease
      # UBSAN, however, it's better if we ensure all the copies agree!
      #
      # By the way, an earlier version of this code attempted to load
      # libtorch_python.so with LD_PRELOAD, which has a similar effect of causing
      # it to be loaded globally.  This isn't really a good idea though, because
      # it depends on a ton of dynamic libraries that most programs aren't gonna
      # have, and it applies to child processes.

One way I know how to fix this problem is if we roll our own RTTI mechanism, manually assembling a type index. But that's a bunch of work. I'm not sure if there's any better option. Resolving the RTLD_GLOBAL problem will be quite challenging in general.

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request May 18, 2020
Pull Request resolved: #38361

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes D17485438

ghstack-source-id: 104319639

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request May 19, 2020
Pull Request resolved: #38361

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes D17485438

ghstack-source-id: 104378396

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay, much needed!

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
@smessmer smessmer mentioned this pull request Jun 10, 2020
smessmer added 9 commits June 11, 2020 13:22
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
smessmer added 2 commits June 16, 2020 18:37
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
@smessmer
Copy link
Contributor Author

The test failures might be because in some situations, we have multiple binary compatible versions of the library which have different RTTI. I've noticed this previously in ASAN. Here is a relevant note from the CI:

      # NB: We load libtorch.so with RTLD_GLOBAL for UBSAN, unlike our
      # default behavior.
      #
      # The reason for this is that without RTLD_GLOBAL, if we load multiple
      # libraries that depend on libtorch (as is the case with C++ extensions), we
      # will get multiple copies of libtorch in our address space.  When UBSAN is
      # turned on, it will do a bunch of virtual pointer consistency checks which
      # won't work correctly.  When this happens, you get a violation like:
      #
      #    member call on address XXXXXX which does not point to an object of
      #    type 'std::_Sp_counted_base<__gnu_cxx::_Lock_policy::_S_atomic>'
      #    XXXXXX note: object is of type
      #    'std::_Sp_counted_ptr<torch::nn::LinearImpl*, (__gnu_cxx::_Lock_policy)2>'
      #
      # (NB: the textual types of the objects here are misleading, because
      # they actually line up; it just so happens that there's two copies
      # of the type info floating around in the address space, so they
      # don't pointer compare equal.  See also
      #   https://github.com/google/sanitizers/issues/1175
      #
      # UBSAN is kind of right here: if we relied on RTTI across C++ extension
      # modules they would indeed do the wrong thing;  but in our codebase, we
      # don't use RTTI (because it doesn't work in mobile).  To appease
      # UBSAN, however, it's better if we ensure all the copies agree!
      #
      # By the way, an earlier version of this code attempted to load
      # libtorch_python.so with LD_PRELOAD, which has a similar effect of causing
      # it to be loaded globally.  This isn't really a good idea though, because
      # it depends on a ton of dynamic libraries that most programs aren't gonna
      # have, and it applies to child processes.

One way I know how to fix this problem is if we roll our own RTTI mechanism, manually assembling a type index. But that's a bunch of work. I'm not sure if there's any better option. Resolving the RTLD_GLOBAL problem will be quite challenging in general.

CI on the PR is passing now. I'll try landing this and see what happens. If something breaks in the master-only test cases, we might have to revert this.

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes #26482

Differential Revision: [D21534052](https://our.internmc.facebook.com/intern/diff/D21534052/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 55cdd31.

xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Jun 20, 2020
Summary:
Pull Request resolved: pytorch#38361

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes D17485438

ghstack-source-id: 106178820

Test Plan: waitforsandcastle

Differential Revision: D21534052

fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0
@facebook-github-bot facebook-github-bot deleted the gh/smessmer/220/head branch June 22, 2020 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants