Skip to content

Cancel c-ares queries after a timeout to avoid getting stuck#17354

Merged
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:fix_ares_deadlocks
Dec 1, 2018
Merged

Cancel c-ares queries after a timeout to avoid getting stuck#17354
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:fix_ares_deadlocks

Conversation

@apolcyn
Copy link
Copy Markdown
Contributor

@apolcyn apolcyn commented Nov 30, 2018

The background for this is there is a multi-channel benchmark involving a large number of channels that connect to a grpclb-in-DNS based benchmark service. Frequently, this benchmark actually fails to run because one or more client channels fails to connect (reach "channel connectivtity state == ready") to the benchmark backend within a 3-and-half minute timeout. I found that when this happens, there is at least one c-ares fd notify on read without a matching on_readable; at least one DNS query gets stuck. Currently, this makes a channel effectively stuck for the rest of its lifetime because there aren't any effective deadline for UDP queries made by c-ares. But I think this is just an inherent problem with UDP/DNS queries that they risk being dropped. By cancelling queries after a reasonable timeout, channels can retry them and become unstuck.

This PR is a draft (I'm not sure if the 10 second timeout is best), but I've verified that this change fixes the problem in the benchmark.

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                                                      FILE SIZE
 ++++++++++++++ GROWING                                                                        ++++++++++++++
  +0.0%     +66 [None]                                                                         +1.28Ki  +0.0%
  +6.4%    +166 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.cc    +166  +6.4%
      +280%     +70 grpc_ares_ev_driver_start_locked                                                   +70  +280%
      [NEW]     +38 on_timeout_locked                                                                  +38  [NEW]
      +5.5%     +31 grpc_ares_ev_driver_create_locked                                                  +31  +5.5%
      +189%     +17 grpc_ares_ev_driver_on_queries_complete_locked                                     +17  +189%
       +12%     +10 [Unmapped]                                                                         +10   +12%

  +0.0%    +232 TOTAL                                                                          +1.45Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,410      Total (=)      2,020,410

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,174,082      Total (>)     11,174,078

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a timeout here seems like a good idea. Just a couple of comments about the implementation.

@apolcyn apolcyn changed the title DRAFT: cancel c-ares queries after a timeout to avoid getting stuck Cancel c-ares queries after a timeout to avoid getting stuck Nov 30, 2018
@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +0.0%    +333 [None]                                                                               +1.75Ki  +0.0%
   +13%    +331 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.cc          +331   +13%
      [NEW]    +611 grpc_ares_ev_driver_create_locked                                                       +611  [NEW]
      [NEW]    +134 on_timeout_locked                                                                       +134  [NEW]
      +492%    +123 grpc_ares_ev_driver_start_locked                                                        +123  +492%
      +189%     +17 grpc_ares_ev_driver_on_queries_complete_locked                                           +17  +189%
      +9.9%      +8 [Unmapped]                                                                                +8  +9.9%
  +1.5%     +64 src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc             +64  +1.5%
       +11%     +64 grpc_core::(anonymous namespace)::AresDnsResolverFactory::CreateResolver(grpc_core::     +64   +11%
      +3.3%      +8 grpc_core::(anonymous namespace)::AresDnsResolver::StartResolvingLocked                   +8  +3.3%
  +0.3%     +16 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc             +16  +0.3%
      [NEW] +1.33Ki grpc_dns_lookup_ares_continue_after_check_localhost_and_ip_literals_locked           +1.33Ki  [NEW]
      [NEW]    +808 grpc_dns_lookup_ares_locked_impl                                                        +808  [NEW]
       +10%      +7 [Unmapped]                                                                                +7   +10%
      +1.8%      +1 grpc_resolve_address_invoke_dns_lookup_ares_locked                                        +1  +1.8%

  +0.0%    +744 TOTAL                                                                                +2.15Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



1 similar comment
@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +0.0%    +333 [None]                                                                               +1.75Ki  +0.0%
   +13%    +331 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.cc          +331   +13%
      [NEW]    +611 grpc_ares_ev_driver_create_locked                                                       +611  [NEW]
      [NEW]    +134 on_timeout_locked                                                                       +134  [NEW]
      +492%    +123 grpc_ares_ev_driver_start_locked                                                        +123  +492%
      +189%     +17 grpc_ares_ev_driver_on_queries_complete_locked                                           +17  +189%
      +9.9%      +8 [Unmapped]                                                                                +8  +9.9%
  +1.5%     +64 src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc             +64  +1.5%
       +11%     +64 grpc_core::(anonymous namespace)::AresDnsResolverFactory::CreateResolver(grpc_core::     +64   +11%
      +3.3%      +8 grpc_core::(anonymous namespace)::AresDnsResolver::StartResolvingLocked                   +8  +3.3%
  +0.3%     +16 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc             +16  +0.3%
      [NEW] +1.33Ki grpc_dns_lookup_ares_continue_after_check_localhost_and_ip_literals_locked           +1.33Ki  [NEW]
      [NEW]    +808 grpc_dns_lookup_ares_locked_impl                                                        +808  [NEW]
       +10%      +7 [Unmapped]                                                                                +7   +10%
      +1.8%      +1 grpc_resolve_address_invoke_dns_lookup_ares_locked                                        +1  +1.8%

  +0.0%    +744 TOTAL                                                                                +2.15Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few minor comments.

@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,410      Total (=)      2,020,410

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,174,079      Total (>)     11,174,077

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,410      Total (=)      2,020,410

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,174,078      Total (<)     11,174,083

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

1 similar comment
@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +0.0%    +338 [None]                                                                               +1.74Ki  +0.0%
   +14%    +358 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.cc          +358   +14%
      [NEW]    +611 grpc_ares_ev_driver_create_locked                                                       +611  [NEW]
      +600%    +150 grpc_ares_ev_driver_start_locked                                                        +150  +600%
      [NEW]    +134 on_timeout_locked                                                                       +134  [NEW]
      +189%     +17 grpc_ares_ev_driver_on_queries_complete_locked                                           +17  +189%
      +9.9%      +8 [Unmapped]                                                                                +8  +9.9%
  +1.5%     +64 src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc             +64  +1.5%
       +11%     +64 grpc_core::(anonymous namespace)::AresDnsResolverFactory::CreateResolver(grpc_core::     +64   +11%
      +3.3%      +8 grpc_core::(anonymous namespace)::AresDnsResolver::StartResolvingLocked                   +8  +3.3%
  +0.3%     +16 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc             +16  +0.3%
      [NEW] +1.33Ki grpc_dns_lookup_ares_continue_after_check_localhost_and_ip_literals_locked           +1.33Ki  [NEW]
      [NEW]    +808 grpc_dns_lookup_ares_locked_impl                                                        +808  [NEW]
       +10%      +7 [Unmapped]                                                                                +7   +10%
      +1.8%      +1 grpc_resolve_address_invoke_dns_lookup_ares_locked                                        +1  +1.8%

  +0.0%    +776 TOTAL                                                                                +2.17Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +0.0%    +338 [None]                                                                               +1.74Ki  +0.0%
   +14%    +358 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.cc          +358   +14%
      [NEW]    +611 grpc_ares_ev_driver_create_locked                                                       +611  [NEW]
      +600%    +150 grpc_ares_ev_driver_start_locked                                                        +150  +600%
      [NEW]    +134 on_timeout_locked                                                                       +134  [NEW]
      +189%     +17 grpc_ares_ev_driver_on_queries_complete_locked                                           +17  +189%
      +9.9%      +8 [Unmapped]                                                                                +8  +9.9%
  +1.5%     +64 src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc             +64  +1.5%
       +11%     +64 grpc_core::(anonymous namespace)::AresDnsResolverFactory::CreateResolver(grpc_core::     +64   +11%
      +3.3%      +8 grpc_core::(anonymous namespace)::AresDnsResolver::StartResolvingLocked                   +8  +3.3%
  +0.3%     +16 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc             +16  +0.3%
      [NEW] +1.33Ki grpc_dns_lookup_ares_continue_after_check_localhost_and_ip_literals_locked           +1.33Ki  [NEW]
      [NEW]    +808 grpc_dns_lookup_ares_locked_impl                                                        +808  [NEW]
       +10%      +7 [Unmapped]                                                                                +7   +10%
      +1.8%      +1 grpc_resolve_address_invoke_dns_lookup_ares_locked                                        +1  +1.8%

  +0.0%    +776 TOTAL                                                                                +2.17Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Nov 30, 2018

thanks for the review!

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +0.0%    +338 [None]                                                                               +1.74Ki  +0.0%
   +14%    +358 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.cc          +358   +14%
      [NEW]    +611 grpc_ares_ev_driver_create_locked                                                       +611  [NEW]
      +600%    +150 grpc_ares_ev_driver_start_locked                                                        +150  +600%
      [NEW]    +134 on_timeout_locked                                                                       +134  [NEW]
      +189%     +17 grpc_ares_ev_driver_on_queries_complete_locked                                           +17  +189%
      +9.9%      +8 [Unmapped]                                                                                +8  +9.9%
  +1.5%     +64 src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc             +64  +1.5%
       +11%     +64 grpc_core::(anonymous namespace)::AresDnsResolverFactory::CreateResolver(grpc_core::     +64   +11%
      +3.3%      +8 grpc_core::(anonymous namespace)::AresDnsResolver::StartResolvingLocked                   +8  +3.3%
  +0.3%     +16 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc             +16  +0.3%
      [NEW] +1.33Ki grpc_dns_lookup_ares_continue_after_check_localhost_and_ip_literals_locked           +1.33Ki  [NEW]
      [NEW]    +808 grpc_dns_lookup_ares_locked_impl                                                        +808  [NEW]
       +10%      +7 [Unmapped]                                                                                +7   +10%
      +1.8%      +1 grpc_resolve_address_invoke_dns_lookup_ares_locked                                        +1  +1.8%

  +0.0%    +776 TOTAL                                                                                +2.17Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,410      Total (=)      2,020,410

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,174,076      Total (<)     11,174,078

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,410      Total (=)      2,020,410

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,174,082      Total (>)     11,174,075

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,410      Total (=)      2,020,410

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,174,078      Total (>)     11,174,075

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +0.0%    +338 [None]                                                                               +1.74Ki  +0.0%
   +14%    +358 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver.cc          +358   +14%
      [NEW]    +611 grpc_ares_ev_driver_create_locked                                                       +611  [NEW]
      +600%    +150 grpc_ares_ev_driver_start_locked                                                        +150  +600%
      [NEW]    +134 on_timeout_locked                                                                       +134  [NEW]
      +189%     +17 grpc_ares_ev_driver_on_queries_complete_locked                                           +17  +189%
      +9.9%      +8 [Unmapped]                                                                                +8  +9.9%
  +1.5%     +64 src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc             +64  +1.5%
       +11%     +64 grpc_core::(anonymous namespace)::AresDnsResolverFactory::CreateResolver(grpc_core::     +64   +11%
      +3.3%      +8 grpc_core::(anonymous namespace)::AresDnsResolver::StartResolvingLocked                   +8  +3.3%
  +0.3%     +16 src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc             +16  +0.3%
      [NEW] +1.33Ki grpc_dns_lookup_ares_continue_after_check_localhost_and_ip_literals_locked           +1.33Ki  [NEW]
      [NEW]    +808 grpc_dns_lookup_ares_locked_impl                                                        +808  [NEW]
       +10%      +7 [Unmapped]                                                                                +7   +10%
      +1.8%      +1 grpc_resolve_address_invoke_dns_lookup_ares_locked                                        +1  +1.8%

  +0.0%    +776 TOTAL                                                                                +2.17Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,410      Total (=)      2,020,410

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,174,077      Total (<)     11,174,083

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@apolcyn apolcyn merged commit 5fed8e5 into grpc:master Dec 1, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Mar 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

disposition/FOR DISCUSSION lang/core release notes: yes Indicates if PR needs to be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants