Replace work serializer with a mutex in c-ares resolver#27858
Replace work serializer with a mutex in c-ares resolver#27858apolcyn merged 20 commits intogrpc:masterfrom
Conversation
markdroth
left a comment
There was a problem hiding this comment.
This looks reasonable overall. I do agree that not depending on the WorkSerializer in this code is a good idea, since we eventually want to move it out of the client channel anyway.
My main concern is about adding lock annotations so that the compiler can enforce correctness.
As a side note, is there any chance you can get #25108 rolled forward? I think it would make the kind of changes we're dealing with now a lot easier to understand.
Please let me know if you have any questions. Thanks!
Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 62 at r1 (raw file):
virtual GrpcPolledFd* NewGrpcPolledFdLocked( ares_socket_t as, grpc_pollset_set* driver_pollset_set, grpc_core::Mutex* mu) = 0;
Doesn't look like this parameter is needed. Neither implementation of this interface actually uses it.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 102 at r1 (raw file):
}; GrpcPolledFdWindows(ares_socket_t as, grpc_core::Mutex* mu,
No need for grpc_core:: prefix, since we're already in that namespace.
Same thing throughout.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 153 at r1 (raw file):
read_buf_ = GRPC_SLICE_MALLOC(4192); if (connect_done_) { work_serializer_->Run([this]() { ContinueRegisterForOnReadableLocked(); },
It looks like there are a bunch of places where the code was previously hopping into the WorkSerializer and is now not doing any synchronization. Don't we need to acquire the mutex in all of these places instead? If not, is it because we're already holding the mutex? In that case, why were we hopping into the WorkSerializer in these places to begin with; shouldn't we have already been running in the WorkSerializer?
More generally, please add lock annotations so the compiler can enforce code correctness here and humans can see what's expected. In particular, please add ABSL_GUARDED_BY() annotations on all data members guarded by the lock, and please add ABSL_EXCLUSIVE_LOCKS_REQUIRED() annotations on all methods that must be called while already holding the lock.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 48 at r1 (raw file):
struct grpc_ares_request { /** synchronizes access to this request, and also to associated
Please add ABSL_GUARDED_BY() annotations on all data members protected by this mutex.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):
} static void on_timeout_locked(grpc_ares_ev_driver* driver,
I'm not sure if lock annotations work on standalone C-style functions, but if possible, please add ABSL_EXCLUSIVE_LOCKS_REQUIRED() annotations on all functions that require that the lock be held while they are called.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 1138 at r1 (raw file):
} } (void)GRPC_ERROR_REF(error);
This can just go in-line in the following line:
grpc_core::ExecCtx::Run(DEBUG_LOCATION, r->on_resolve_address_done, GRPC_ERROR_REF(error));
apolcyn
left a comment
There was a problem hiding this comment.
Reviewable status: 1 of 10 files reviewed, 6 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 62 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Doesn't look like this parameter is needed. Neither implementation of this interface actually uses it.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 102 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
grpc_core::prefix, since we're already in that namespace.Same thing throughout.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 153 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
It looks like there are a bunch of places where the code was previously hopping into the
WorkSerializerand is now not doing any synchronization. Don't we need to acquire the mutex in all of these places instead? If not, is it because we're already holding the mutex? In that case, why were we hopping into theWorkSerializerin these places to begin with; shouldn't we have already been running in theWorkSerializer?More generally, please add lock annotations so the compiler can enforce code correctness here and humans can see what's expected. In particular, please add
ABSL_GUARDED_BY()annotations on all data members guarded by the lock, and please addABSL_EXCLUSIVE_LOCKS_REQUIRED()annotations on all methods that must be called while already holding the lock.
These sites were using the work serializer redundantly.
This was addressed by #28016
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 48 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add
ABSL_GUARDED_BY()annotations on all data members protected by this mutex.
Done.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I'm not sure if lock annotations work on standalone C-style functions, but if possible, please add
ABSL_EXCLUSIVE_LOCKS_REQUIRED()annotations on all functions that require that the lock be held while they are called.
Done - added lock annotations throughout.
The only thing I didn't do is add lock annotations to the windows code. AFAICT it looks like we don't actually use them on Windows, so I left them out.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 1138 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can just go in-line in the following line:
grpc_core::ExecCtx::Run(DEBUG_LOCATION, r->on_resolve_address_done, GRPC_ERROR_REF(error));
Done.
|
Sorry for the delay, I believe I've addressed comments though. PTAL |
markdroth
left a comment
There was a problem hiding this comment.
This looks great!
Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):
Previously, apolcyn wrote…
Done - added lock annotations throughout.
The only thing I didn't do is add lock annotations to the windows code. AFAICT it looks like we don't actually use them on Windows, so I left them out.
Did you link to the right place there? I don't see anything in that PR that says that we don't use lock annotations on Windows.
If you're sure that we really don't use them on any compiler that we might use on Windows, then I guess it's fine to omit them.
apolcyn
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! all files reviewed, all discussions resolved (waiting on @apolcyn)
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Did you link to the right place there? I don't see anything in that PR that says that we don't use lock annotations on Windows.
If you're sure that we really don't use them on any compiler that we might use on Windows, then I guess it's fine to omit them.
Sorry wrong link, I meant to point out that -Wthread-safety-analysis is only considered an error when the our bazel use_strict_warning config is used. Meanwhile, use_strict_warning is only used on mac and linux.
I've confirmed this with @veblush
|
cloud to prod failure is #19834 artifacts and distrib failures are ruby CBF: b/209603283 |
…)" This reverts commit ec600f3.
…er (grpc#27858)" (grpc#28324)" This reverts commit b972b76.
In order to add cancellation support to
grpc_httpcli_get, we need to first add cancellation support to non-client-channel resolver APIs.This change makes cancellation support in the non-client-channel c-ares resolver case simpler since we don't need to first jump into a work serializer in to initiate cancellation.
This change is