Move grpc_shutdown internals to a detached thread#17308
Conversation
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dusting this off. Tests are reasonably green. There are some python gevent test failures and the python team will take a look. |
ericgribkoff
left a comment
There was a problem hiding this comment.
Python will need to use grpc_shutdown_blocking as well to allow fork support to work properly. I think this may also fix the failing gevent tests as well - I patched this locally with a switch to grpc_shutdown_blocking and was able to get "py27_gevent.test.unit._cython.cygrpc_test.TypeSmokeTest", which is failing on Kokoro, to pass.
Python (Cython) calls grpc_shutdown in several different files. I tested locally just by aliasing the function in src/python/grpcio/grpc/_cython/_cygrpc/grpc.pxi, but it would be better to not alias and just update the calls throughout the cython layer. I'm happy to add a commit to this branch if that's easier for you.
src/core/lib/iomgr/fork_posix.cc
Outdated
|
|
||
| void grpc_prefork() { | ||
| skipped_handler = true; | ||
| grpc_maybe_wait_for_async_shutdown(); |
There was a problem hiding this comment.
I don't think this is necessary here (if it is, it needs to go after the check of grpc_is_initialized() below, as this may be called by the OS after core has shutdown).
Apologies for the lack of test coverage around this; still working through some unrelated issues, but #16513 is adding tests for the fork use case.
Core does not shutdown as part of its pre-fork handlers; Python (and PHP, as far as I know) shut down core in the child's post-fork handler, but that is handled outside of core's fork handlers.
markdroth
left a comment
There was a problem hiding this comment.
This looks good! My comments are mostly minor things.
Reviewed 31 of 31 files at r1.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @ericgribkoff and @yang-g)
include/grpc/grpc.h, line 78 at r1 (raw file):
The last call to grpc_shutdown will initiate cleaning up of grpc library internals, which can happen in another thread. Once the clean-up is done, no memory is used by grpc after this call returns, nor are any instructions
I think we should remove the "after this call returns" part, since it will now happen in another thread.
src/core/lib/gprpp/thd.h, line 53 at r1 (raw file):
public: Options() : joinable_(true), tracked_(true) {} Options& set_joinable(bool joinable) {
Please add a comment documenting what this option means.
src/core/lib/gprpp/thd.h, line 57 at r1 (raw file):
return *this; } Options& set_tracked(bool tracked) {
Same here.
src/core/lib/gprpp/thd_posix.cc, line 52 at r1 (raw file):
class ThreadInternalsPosix : public grpc_core::internal::ThreadInternalsInterface {
Not directly related to this PR, but I think you can remove all occurances of grpc_core:: in this file, since we're already inside of that namespace.
src/core/lib/gprpp/thd_windows.cc, line 51 at r1 (raw file):
void* arg; /* argument to a thread */ HANDLE join_event; /* the join event */ bool joinable; /* whether it is joinable */
Do we not need to support tracked threads on Windows?
src/core/lib/gprpp/thd_windows.cc, line 147 at r1 (raw file):
} // namespace namespace grpc_core {
Not directly related to this PR, but I think this should probably move up to line 44, so that all of the code in this file is in the grpc_core namespace.
src/core/lib/surface/init.cc, line 199 at r1 (raw file):
void grpc_shutdown_internal(void* ignored) { GRPC_API_TRACE("grpc_shutdown_internal", 0, ()); gpr_mu_lock(&g_init_mu);
Suggest using grpc_core::MutexLock here, so that we don't need to explicitly unlock in each branch below.
We can actually probably use it throughout this file.
test/core/client_channel/resolvers/dns_resolver_cooldown_test.cc, line 285 at r1 (raw file):
GRPC_COMBINER_UNREF(g_combiner, "test"); } grpc_shutdown();
Could probably just call grpc_shutdown_blocking() here.
test/core/end2end/fuzzers/api_fuzzer.cc, line 1204 at r1 (raw file):
grpc_resource_quota_unref(g_resource_quota); grpc_shutdown();
Could probably just call grpc_shutdown_blocking() here.
test/core/handshake/readahead_handshaker_server_ssl.cc, line 87 at r1 (raw file):
const char* full_alpn_list[] = {"grpc-exp", "h2"}; GPR_ASSERT(server_ssl_test(full_alpn_list, 2, "grpc-exp")); grpc_shutdown();
Could probably just use grpc_shutdown_blocking() here.
test/core/util/memory_counters.h, line 24 at r1 (raw file):
#include <grpc/support/atm.h> struct grpc_memory_counters {
Are we still using this old API anywhere outside of this module? If not, consider moving it to the .cc file.
If you do that, then maybe also consider renaming this module to leak_detector.{h,cc}.
test/core/util/memory_counters.h, line 43 at r1 (raw file):
class LeakDetector { public: explicit LeakDetector(bool enable);
Is there ever any use-case for passing enable=false?
yang-g
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @ericgribkoff and @markdroth)
include/grpc/grpc.h, line 78 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think we should remove the "after this call returns" part, since it will now happen in another thread.
Done.
src/core/lib/gprpp/thd.h, line 53 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add a comment documenting what this option means.
Done.
src/core/lib/gprpp/thd.h, line 57 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Same here.
Done. Removed.
src/core/lib/gprpp/thd_posix.cc, line 52 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Not directly related to this PR, but I think you can remove all occurances of
grpc_core::in this file, since we're already inside of that namespace.
Done.
src/core/lib/gprpp/thd_windows.cc, line 51 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Do we not need to support tracked threads on Windows?
I have removed tracked threads because python and php (which support forking) have moved to grpc_shutdown_blocking.
src/core/lib/gprpp/thd_windows.cc, line 147 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Not directly related to this PR, but I think this should probably move up to line 44, so that all of the code in this file is in the
grpc_corenamespace.
Done.
src/core/lib/iomgr/fork_posix.cc, line 52 at r1 (raw file):
Previously, ericgribkoff (Eric Gribkoff) wrote…
I don't think this is necessary here (if it is, it needs to go after the check of
grpc_is_initialized()below, as this may be called by the OS after core has shutdown).Apologies for the lack of test coverage around this; still working through some unrelated issues, but #16513 is adding tests for the fork use case.
Core does not shutdown as part of its pre-fork handlers; Python (and PHP, as far as I know) shut down core in the child's post-fork handler, but that is handled outside of core's fork handlers.
Done.
src/core/lib/surface/init.cc, line 199 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest using
grpc_core::MutexLockhere, so that we don't need to explicitly unlock in each branch below.We can actually probably use it throughout this file.
Done.
test/core/client_channel/resolvers/dns_resolver_cooldown_test.cc, line 285 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Could probably just call
grpc_shutdown_blocking()here.
Done.
test/core/end2end/fuzzers/api_fuzzer.cc, line 1204 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Could probably just call
grpc_shutdown_blocking()here.
Done.
test/core/handshake/readahead_handshaker_server_ssl.cc, line 87 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Could probably just use
grpc_shutdown_blocking()here.
Done.
test/core/util/memory_counters.h, line 24 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Are we still using this old API anywhere outside of this module? If not, consider moving it to the .cc file.
If you do that, then maybe also consider renaming this module to leak_detector.{h,cc}.
We have direct uses of the API in tests such as test/core/memory_usage/client.cc, where multiple snapshots are taken during the test.
test/core/util/memory_counters.h, line 43 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Is there ever any use-case for passing enable=false?
In places like fuzzers, we have a global boolean to decided whether to enable leak check and I believe that value can be mutated externally. This argument is trying to capture that.
yang-g
left a comment
There was a problem hiding this comment.
Reviewable status: 19 of 40 files reviewed, 13 unresolved discussions (waiting on @ericgribkoff and @markdroth)
src/core/lib/gprpp/thd.h, line 57 at r1 (raw file):
Previously, yang-g (Yang Gao) wrote…
Done. Removed.
I put tracked option back because the Fork Inc/DecThreadCount touches something global out of a lock and we still need to use tracked to exclude the detached thread.
src/core/lib/gprpp/thd_windows.cc, line 51 at r1 (raw file):
Previously, yang-g (Yang Gao) wrote…
I have removed tracked threads because python and php (which support forking) have moved to grpc_shutdown_blocking.
We are not doing fork thread counting and thus do not need tracked in this file.
src/core/lib/gprpp/thd_windows.cc, line 147 at r1 (raw file):
Previously, yang-g (Yang Gao) wrote…
Done.
It is a bit complicated because of the global function referenced the thread local variable. So I leave it as is.
|
@markdroth @ericgribkoff PTAL Thanks. |
markdroth
left a comment
There was a problem hiding this comment.
Remaining comments are minor; feel free to merge after addressing.
Reviewed 21 of 21 files at r2.
Reviewable status: 39 of 40 files reviewed, 3 unresolved discussions (waiting on @ericgribkoff, @markdroth, and @yang-g)
src/core/lib/iomgr/fork_posix.cc, line 38 at r3 (raw file):
#include "src/core/lib/iomgr/timer_manager.h" #include "src/core/lib/iomgr/wakeup_fd_posix.h" #include "src/core/lib/surface/init.h"
This include is probably not needed anymore.
test/core/json/fuzzer.cc, line 34 at r3 (raw file):
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { char* s; grpc_core::testing::LeakDetector leak_detector(leak_check);
The old code didn't use the leak_check variable. Was this a pre-existing bug, or should this be changed to use true instead?
This resolves the problem where sometimes
grpc_shutdownis called from an internal executor thread, which will cause a self-join situation.Part of the PR is to add detached thread support.
Some tests test the result of
grpc_shutdownand those are resolved by the explicit wait.This has a high risk.
This change is