Skip to content

Include the query type and name in all c-ares DNS error messages#22865

Merged
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:more_context_in_dns_errors
May 7, 2020
Merged

Include the query type and name in all c-ares DNS error messages#22865
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:more_context_in_dns_errors

Conversation

@apolcyn
Copy link
Copy Markdown
Contributor

@apolcyn apolcyn commented May 5, 2020

Fixes #21308.

Includes query type (e.g. "A", "AAAA", "SRV", "TXT"), and target name in potentially-surfaced errors. This additional context should help debugging DNS related errors with and without trace logs enabled. I've found myself wondering what e.g. the particular query type a c-ares tracer log line corresponded to more than once.

Example log lines with these errors messages:

D0505 08:20:12.050639568     170 grpc_ares_wrapper.cc:262]   (c-ares resolver) request:0x60e0000002e0 on_hostbyname_done_locked: C-ares status is not ARES_SUCCESS qtype=AAAA name=srv-ipv4-single-target-srv-disabled.resolver-tests-version-4.grpctestingexp. is_balancer=0: DNS server returned answer with no data
D0505 08:20:12.050749680     170 grpc_ares_ev_driver.cc:97]  (c-ares resolver) request:0x60e0000002e0 Ref ev_driver 0x613000000580
D0505 08:20:12.050772511     170 grpc_ares_ev_driver.cc:407] (c-ares resolver) request:0x60e0000002e0 notify read on: c-ares fd: 3
D0505 08:20:12.050802738     170 grpc_ares_ev_driver.cc:104] (c-ares resolver) request:0x60e0000002e0 Unref ev_driver 0x613000000580
D0505 08:20:12.050840668     170 resolver_component_test.cc:249] done=0, time_left=19.999420268
D0505 08:20:12.051168694     170 grpc_ares_ev_driver.cc:312] (c-ares resolver) request:0x60e0000002e0 readable on c-ares fd: 3
D0505 08:20:12.051269872     170 grpc_ares_wrapper.cc:203]   (c-ares resolver) request:0x60e0000002e0 on_hostbyname_done_locked qtype=A host=srv-ipv4-single-target-srv-disabled.resolver-tests-version-4.grpctestingexp. ARES_SUCCESS
D0505 08:20:12.051332281     170 grpc_ares_wrapper.cc:250]   (c-ares resolver) request:0x60e0000002e0 c-ares resolver gets a AF_INET result: 
  addr: 2.3.4.5
  port: 443

D0505 08:20:12.051414675     170 grpc_ares_ev_driver.cc:97]  (c-ares resolver) request:0x60e0000002e0 Ref ev_driver 0x613000000580
D0505 08:20:12.051426382     170 grpc_ares_ev_driver.cc:407] (c-ares resolver) request:0x60e0000002e0 notify read on: c-ares fd: 3
D0505 08:20:12.051438500     170 grpc_ares_ev_driver.cc:104] (c-ares resolver) request:0x60e0000002e0 Unref ev_driver 0x613000000580
D0505 08:20:12.051455326     170 resolver_component_test.cc:249] done=0, time_left=19.998800232
D0505 08:20:12.051613989     170 grpc_ares_ev_driver.cc:312] (c-ares resolver) request:0x60e0000002e0 readable on c-ares fd: 3
D0505 08:20:12.051676214     170 grpc_ares_wrapper.cc:368]   (c-ares resolver) request:0x60e0000002e0 on_txt_done_locked C-ares status is not ARES_SUCCESS qtype=TXT name=_grpc_config.srv-ipv4-single-target-srv-disabled.resolver-tests-version-4.grpctestingexp.: Domain name not found

@apolcyn apolcyn added lang/core release notes: yes Indicates if PR needs to be in release notes labels May 5, 2020
@apolcyn apolcyn marked this pull request as ready for review May 5, 2020 21:34
@apolcyn apolcyn requested a review from markdroth as a code owner May 5, 2020 21:34
Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding these fields to the logs is a good idea!

At some point, we really do need to convert all of the c-ares resolver code to idiomatic C++. This is basically the only code in the client_channel tree that is still C. :(

static void grpc_ares_request_ref_locked(grpc_ares_request* r);
static void grpc_ares_request_unref_locked(grpc_ares_request* r);

class GrpcAresQuery {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the purpose of this class. It seems to be a simple wrapper of grpc_ares_request, but it doesn't seem to provide any functionality that isn't already present in the latter. Why is this wrapper being added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea here was that a GrpcAresQuery tracks fields that are particular to a single c-ares query (or "search") - e.g. a single A, AAAA, SRV, or TXT resolution performed by the c-ares APIs. A grpc_ares_request, on the other hand, tracks fields that are necessary to drive a complete DNS resolution process - which consists of multiple "child" queries.

So, the main thing is that a grpc_ares_request and a GrpcAresQuery differ in scope and lifetime - the former is per-grpc-resolution and the latter is per-c-ares-query.

Alternatively we could just put the SRV target name and TXT target name as additional fields into the grpc_ares_request, but keeping them separate seemed cleaner (I think this will become more important after making grpc_ares_hostbyname_request a sub-type of GrpcAresQuery).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I guess this is fine for now. I would like to find a cleaner way to do this that does not require allocating so many different data structures, but I guess we can optimize that as part of the larger C++ conversion. For now, please add a TODO about this.

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented May 5, 2020

At some point, we really do need to convert all of the c-ares resolver code to idiomatic C++. This is basically the only code in the client_channel tree that is still C. :(

Yes, sorry this has taken a while. C++-izing this will be a more broad change though because we'll need to change the grpc_ares_wrapper.h API.

static void grpc_ares_request_ref_locked(grpc_ares_request* r);
static void grpc_ares_request_unref_locked(grpc_ares_request* r);

class GrpcAresQuery {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I guess this is fine for now. I would like to find a cleaner way to do this that does not require allocating so many different data structures, but I guess we can optimize that as part of the larger C++ conversion. For now, please add a TODO about this.

@apolcyn apolcyn force-pushed the more_context_in_dns_errors branch from 925fbfa to bbdfde5 Compare May 7, 2020 18:53
@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented May 7, 2020

Okay, I guess this is fine for now. I would like to find a cleaner way to do this that does not require allocating so many different data structures, but I guess we can optimize that as part of the larger C++ conversion. For now, please add a TODO about this.

Done, added a TODO

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented May 7, 2020

Thanks for the quick review, squashed down commmits

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented May 7, 2020

tests are green

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lang/core release notes: yes Indicates if PR needs to be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dns resolution failed: which address?

2 participants