Skip to content

Replace work serializer with a mutex in c-ares resolver#27858

Merged
apolcyn merged 20 commits intogrpc:masterfrom
apolcyn:ares_resolver_mutex
Dec 8, 2021
Merged

Replace work serializer with a mutex in c-ares resolver#27858
apolcyn merged 20 commits intogrpc:masterfrom
apolcyn:ares_resolver_mutex

Conversation

@apolcyn
Copy link
Copy Markdown
Contributor

@apolcyn apolcyn commented Oct 28, 2021

In order to add cancellation support to grpc_httpcli_get, we need to first add cancellation support to non-client-channel resolver APIs.

This change makes cancellation support in the non-client-channel c-ares resolver case simpler since we don't need to first jump into a work serializer in to initiate cancellation.


This change is Reviewable

@apolcyn apolcyn added release notes: yes Indicates if PR needs to be in release notes lang/core and removed lang/core labels Oct 28, 2021
Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable overall. I do agree that not depending on the WorkSerializer in this code is a good idea, since we eventually want to move it out of the client channel anyway.

My main concern is about adding lock annotations so that the compiler can enforce correctness.

As a side note, is there any chance you can get #25108 rolled forward? I think it would make the kind of changes we're dealing with now a lot easier to understand.

Please let me know if you have any questions. Thanks!

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @apolcyn)


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 62 at r1 (raw file):

  virtual GrpcPolledFd* NewGrpcPolledFdLocked(
      ares_socket_t as, grpc_pollset_set* driver_pollset_set,
      grpc_core::Mutex* mu) = 0;

Doesn't look like this parameter is needed. Neither implementation of this interface actually uses it.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 102 at r1 (raw file):

  };

  GrpcPolledFdWindows(ares_socket_t as, grpc_core::Mutex* mu,

No need for grpc_core:: prefix, since we're already in that namespace.

Same thing throughout.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 153 at r1 (raw file):

    read_buf_ = GRPC_SLICE_MALLOC(4192);
    if (connect_done_) {
      work_serializer_->Run([this]() { ContinueRegisterForOnReadableLocked(); },

It looks like there are a bunch of places where the code was previously hopping into the WorkSerializer and is now not doing any synchronization. Don't we need to acquire the mutex in all of these places instead? If not, is it because we're already holding the mutex? In that case, why were we hopping into the WorkSerializer in these places to begin with; shouldn't we have already been running in the WorkSerializer?

More generally, please add lock annotations so the compiler can enforce code correctness here and humans can see what's expected. In particular, please add ABSL_GUARDED_BY() annotations on all data members guarded by the lock, and please add ABSL_EXCLUSIVE_LOCKS_REQUIRED() annotations on all methods that must be called while already holding the lock.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 48 at r1 (raw file):

struct grpc_ares_request {
  /** synchronizes access to this request, and also to associated

Please add ABSL_GUARDED_BY() annotations on all data members protected by this mutex.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):

}

static void on_timeout_locked(grpc_ares_ev_driver* driver,

I'm not sure if lock annotations work on standalone C-style functions, but if possible, please add ABSL_EXCLUSIVE_LOCKS_REQUIRED() annotations on all functions that require that the lock be held while they are called.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 1138 at r1 (raw file):

    }
  }
  (void)GRPC_ERROR_REF(error);

This can just go in-line in the following line:

grpc_core::ExecCtx::Run(DEBUG_LOCATION, r->on_resolve_address_done, GRPC_ERROR_REF(error));

Copy link
Copy Markdown
Contributor Author

@apolcyn apolcyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 10 files reviewed, 6 unresolved discussions (waiting on @markdroth)


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.h, line 62 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Doesn't look like this parameter is needed. Neither implementation of this interface actually uses it.

Done.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 102 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

No need for grpc_core:: prefix, since we're already in that namespace.

Same thing throughout.

Done.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_windows.cc, line 153 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

It looks like there are a bunch of places where the code was previously hopping into the WorkSerializer and is now not doing any synchronization. Don't we need to acquire the mutex in all of these places instead? If not, is it because we're already holding the mutex? In that case, why were we hopping into the WorkSerializer in these places to begin with; shouldn't we have already been running in the WorkSerializer?

More generally, please add lock annotations so the compiler can enforce code correctness here and humans can see what's expected. In particular, please add ABSL_GUARDED_BY() annotations on all data members guarded by the lock, and please add ABSL_EXCLUSIVE_LOCKS_REQUIRED() annotations on all methods that must be called while already holding the lock.

These sites were using the work serializer redundantly.

This was addressed by #28016


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h, line 48 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please add ABSL_GUARDED_BY() annotations on all data members protected by this mutex.

Done.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I'm not sure if lock annotations work on standalone C-style functions, but if possible, please add ABSL_EXCLUSIVE_LOCKS_REQUIRED() annotations on all functions that require that the lock be held while they are called.

Done - added lock annotations throughout.

The only thing I didn't do is add lock annotations to the windows code. AFAICT it looks like we don't actually use them on Windows, so I left them out.


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 1138 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This can just go in-line in the following line:

grpc_core::ExecCtx::Run(DEBUG_LOCATION, r->on_resolve_address_done, GRPC_ERROR_REF(error));

Done.

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Dec 8, 2021

Sorry for the delay, I believe I've addressed comments though. PTAL

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @apolcyn)


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):

Previously, apolcyn wrote…

Done - added lock annotations throughout.

The only thing I didn't do is add lock annotations to the windows code. AFAICT it looks like we don't actually use them on Windows, so I left them out.

Did you link to the right place there? I don't see anything in that PR that says that we don't use lock annotations on Windows.

If you're sure that we really don't use them on any compiler that we might use on Windows, then I guess it's fine to omit them.

Copy link
Copy Markdown
Contributor Author

@apolcyn apolcyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @apolcyn)


src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 245 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Did you link to the right place there? I don't see anything in that PR that says that we don't use lock annotations on Windows.

If you're sure that we really don't use them on any compiler that we might use on Windows, then I guess it's fine to omit them.

Sorry wrong link, I meant to point out that -Wthread-safety-analysis is only considered an error when the our bazel use_strict_warning config is used. Meanwhile, use_strict_warning is only used on mac and linux.

I've confirmed this with @veblush

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Dec 8, 2021

cloud to prod failure is #19834

artifacts and distrib failures are ruby CBF: b/209603283

@apolcyn apolcyn merged commit ec600f3 into grpc:master Dec 8, 2021
ctiller added a commit that referenced this pull request Dec 9, 2021
ctiller added a commit that referenced this pull request Dec 9, 2021
apolcyn added a commit to apolcyn/grpc that referenced this pull request Dec 9, 2021
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Dec 9, 2021
apolcyn added a commit that referenced this pull request Dec 9, 2021
…er" #28324" (#28325)

* Revert "Revert "Replace work serializer with a mutex in c-ares resolver (#27858)" (#28324)"

This reverts commit b972b76.

* Fix synchronization of grpc_ares_resolver_address
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bloat/improvement imported Specifies if the PR has been imported to the internal repository lang/core perf-change/medium release notes: yes Indicates if PR needs to be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants