Closed
Conversation
Contributor
Author
New failure in Windows builds: a lot of missing macro definitions. Failure on macOS builds:
copybara-service bot
pushed a commit
that referenced
this pull request
Jun 10, 2025
…ts are complete This is a slight simplification and bug fix intended to unblock the c-ares 1.34.5 upgrade: #39508

Currently there seems to be a bug causing //test/cpp/naming:resolver_component_tests_runner_invoker@experiment=no_event_engine_dns to fail on Windows in that PR:
- c-ares opens fd 1 to some server
- c-ares opens fd 2 to query a new server because the old one is not responding
- some answer comes in on fd 2, so c-ares loses interest in fd 1

We'll now shut down fd 1, which can have the effect of prematurely cancelling the overall request, depending on the ordering of events (reading another query's answer on fd 2 vs. when we run the on_readable callback following the shutdown of fd 1).

----
DO NOT SUBMIT. This PR is for testing purposes only.
[cl/769335111](http://cl/769335111)
PiperOrigin-RevId: 769335111
copybara-service bot
pushed a commit
that referenced
this pull request
Jun 10, 2025
… complete This is a slight simplification and bug fix intended to unblock the c-ares 1.34.5 upgrade: #39508

The bug was seen on a Windows test with the iomgr c-ares resolver: //test/cpp/naming:resolver_component_tests_runner_invoker@experiment=no_event_engine_dns

But it looks like the same bug exists in both the iomgr and EE variants:
- c-ares opens fd 1 to some server
- c-ares opens fd 2 to query a new server because the old one is not responding
- some answer comes in on fd 2, so c-ares loses interest in fd 1

We'll now shut down fd 1, which can have the effect of prematurely cancelling the overall request, depending on the ordering of events (reading another query's answer on fd 2 vs. when we run the on_readable callback following the shutdown of fd 1).

----
DO NOT SUBMIT. This PR is for testing purposes only.
[cl/769335111](http://cl/769335111)
PiperOrigin-RevId: 769335111
copybara-service bot
pushed a commit
that referenced
this pull request
Jun 10, 2025
… complete This is a slight simplification and bug fix intended to unblock c-ares 1.34.5 upgrade: #39508
copybara-service bot
pushed a commit
that referenced
this pull request
Jun 10, 2025
… complete This is a slight simplification and bug fix intended to unblock the c-ares 1.34.5 upgrade: #39508

The bug was seen on a Windows test with the iomgr c-ares resolver: //test/cpp/naming:resolver_component_tests_runner_invoker@experiment=no_event_engine_dns

But it looks like the same bug exists in both the iomgr and EE variants:
- c-ares opens fd 1 to some server
- c-ares opens fd 2 to query a new server because the old one is not responding
- some answer comes in on fd 2, so c-ares loses interest in fd 1

We'll now shut down fd 1, which can have the effect of prematurely cancelling the overall request (see https://github.com/grpc/grpc/blob/06de3cc2ec8e73199a1be98f390cdabaedbb7364/src/core/lib/event_engine/ares_resolver.cc#L590), depending on the ordering of events (reading another query's answer on fd 2 vs. when we run the on_readable callback following the shutdown of fd 1).

----
DO NOT SUBMIT. This PR is for testing purposes only.
[cl/769335111](http://cl/769335111)
PiperOrigin-RevId: 769335111
copybara-service bot
pushed a commit
that referenced
this pull request
Jun 11, 2025
… complete This is a slight simplification and bug fix intended to unblock c-ares 1.34.5 upgrade: #39508
copybara-service bot
pushed a commit
that referenced
this pull request
Jun 11, 2025
… complete This is a slight simplification and bug fix intended to unblock c-ares 1.34.5 upgrade: #39508
copybara-service bot
pushed a commit
that referenced
this pull request
Jul 15, 2025
… complete This is a slight simplification and bug fix intended to unblock the c-ares 1.34.5 upgrade: #39508

The bug can be seen by running any of the iomgr c-ares resolver `resolver_component_tests_runner_invoker` tests: //test/cpp/naming:resolver_component_tests_runner_invoker@experiment=no_event_engine_dns

But it looks like the same bug exists in both the iomgr and EE variants:
1) c-ares initiates A and AAAA queries
2) c-ares opens fd 1 to some server that acts as a black hole
3) the query to server 1 times out, so c-ares opens fd 2 to query a new server because the old one is not responding. c-ares 1.34.5 will internally increment a `consec_failures` field on the connection object when queries on fd 1 time out (see https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_process.c#L670)
4) an answer to one of the queries comes in on fd 2. After processing the answer, c-ares [checks for connections to close out](https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_process.c#L233) and closes connections that have `consec_failures > 0`, see https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_close_sockets.c#L119
5) now c-ares will not indicate any interest in fd 1 from the `ares_getsock` APIs

So we'll now shut down fd 1, which can have the effect of prematurely cancelling the overall request (see https://github.com/grpc/grpc/blob/06de3cc2ec8e73199a1be98f390cdabaedbb7364/src/core/lib/event_engine/ares_resolver.cc#L590), depending on the ordering of events (reading another query's answer on fd 2 vs. when we run the on_readable callback following the shutdown of fd 1).

I guess this was not an issue before c-ares 1.34.5 because earlier c-ares did not have `consec_failures` tracking and so did not close sockets that had timed out earlier after processing answers that came in on new fds, see https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/src/lib/ares_process.c#L1528

----
DO NOT SUBMIT. This PR is for testing purposes only.
[cl/769335111](http://cl/769335111)
PiperOrigin-RevId: 769335111
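The race in the walkthrough above can be condensed into a toy model. This is a hedged, self-contained sketch and not actual gRPC or c-ares code; `SimulateRace` and both of its parameters are invented purely for illustration of the event ordering the commit message describes:

```cpp
// Toy model of the premature-cancellation race (illustrative names only).
// Returns true if the overall DNS request gets prematurely cancelled.
//
// `shutdown_status_ok`  - whether the wrapper shuts down a no-longer-
//                         interesting fd with an OK status (EE behavior)
//                         or an error status (the iomgr behavior at issue).
// `answer_read_first`   - event ordering: whether the answer on fd 2 is
//                         read before fd 1's shutdown callback runs.
bool SimulateRace(bool shutdown_status_ok, bool answer_read_first) {
  bool request_done = false;
  bool request_cancelled = false;

  // 1) The query on fd 1 times out; c-ares 1.34.5 bumps consec_failures
  //    on that connection and retries via a new socket, fd 2.
  int consec_failures_fd1 = 1;

  // 2) An answer arrives on fd 2. After processing it, c-ares closes
  //    connections with consec_failures > 0, so fd 1 drops out of the
  //    set reported by the ares_getsock APIs.
  bool fd1_interesting = (consec_failures_fd1 == 0);

  if (answer_read_first) request_done = true;

  // 3) The wrapper notices fd 1 is no longer interesting and shuts it
  //    down. If the shutdown carries an error status and its on_readable
  //    callback runs before the request completes, the whole request is
  //    cancelled even though fd 2 has a good answer.
  if (!fd1_interesting && !shutdown_status_ok && !request_done) {
    request_cancelled = true;
  }
  return request_cancelled;
}
```

Under this model, only the combination of an error-status shutdown and the shutdown callback winning the race produces the cancellation, which matches why the failure is ordering-dependent and flaky.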
copybara-service bot
pushed a commit
that referenced
this pull request
Jul 15, 2025
… complete This is a bug fix intended to unblock the c-ares 1.34.5 upgrade: #39508

The bug can be seen by running any of the iomgr c-ares resolver `resolver_component_tests_runner_invoker` tests: //test/cpp/naming:resolver_component_tests_runner_invoker@experiment=no_event_engine_dns

In the general case, the bug works like this:
1) c-ares initiates A and AAAA queries
2) c-ares opens fd 1 to some server that acts as a black hole
3) the query to server 1 times out, so c-ares opens fd 2 to query a new server because the old one is not responding. c-ares 1.34.5 will internally increment a `consec_failures` field on the connection object when queries on fd 1 time out (see https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_process.c#L670)
4) an answer to one of the queries comes in on fd 2. After processing the answer, c-ares [checks for connections to close out](https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_process.c#L233) and closes connections that have `consec_failures > 0`, see https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_close_sockets.c#L119
5) now c-ares will not indicate any interest in fd 1 from the `ares_getsock` APIs

So we'll now shut down fd 1, which can have the effect of prematurely cancelling the overall request (see https://github.com/grpc/grpc/blob/06de3cc2ec8e73199a1be98f390cdabaedbb7364/src/core/lib/event_engine/ares_resolver.cc#L590), depending on the ordering of events (reading another query's answer on fd 2 vs. when we run the on_readable callback following the shutdown of fd 1).

I guess this was not an issue before c-ares 1.34.5 because earlier c-ares did not have `consec_failures` tracking and so did not close sockets that had timed out earlier after processing answers that came in on new fds, see https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/src/lib/ares_process.c#L1528

---
Note: the EE c-ares resolver does not have this early-cancellation issue because it shuts down no-longer-interesting fds with an OK status: https://github.com/grpc/grpc/blob/79769b35d04535259592ac1b0a98d65f63203f06/src/core/lib/event_engine/ares_resolver.cc#L547

----
DO NOT SUBMIT. This PR is for testing purposes only.
[cl/769335111](http://cl/769335111)
PiperOrigin-RevId: 769335111
copybara-service bot
pushed a commit
that referenced
this pull request
Jul 15, 2025
… complete This is a bug fix intended to unblock c-ares 1.34.5 upgrade: #39508
copybara-service bot
pushed a commit
that referenced
this pull request
Jul 16, 2025
… complete This is a bug fix intended to unblock c-ares 1.34.5 upgrade: #39508
copybara-service bot
pushed a commit
that referenced
this pull request
Jul 16, 2025
This is a bug fix intended to unblock the c-ares 1.34.5 upgrade: #39508

The bug can be seen by running any of the iomgr c-ares resolver `resolver_component_tests_runner_invoker` tests: //test/cpp/naming:resolver_component_tests_runner_invoker@experiment=no_event_engine_dns

In the general case, the bug works like this:
1) c-ares initiates A and AAAA queries
2) c-ares opens fd 1 to some server that acts as a black hole
3) the query to server 1 times out, so c-ares opens fd 2 to query a new server because the old one is not responding. c-ares 1.34.5 will internally increment a `consec_failures` field on the connection object when queries on fd 1 time out (see https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_process.c#L670)
4) an answer to one of the queries comes in on fd 2. After processing the answer, c-ares [checks for connections to close out](https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_process.c#L233) and closes connections that have `consec_failures > 0`, see https://github.com/c-ares/c-ares/blob/d3a507e920e7af18a5efb7f9f1d8044ed4750013/src/lib/ares_close_sockets.c#L119
5) now c-ares will not indicate any interest in fd 1 from the `ares_getsock` APIs

So we'll now shut down fd 1, which can have the effect of prematurely cancelling the overall request (see https://github.com/grpc/grpc/blob/06de3cc2ec8e73199a1be98f390cdabaedbb7364/src/core/lib/event_engine/ares_resolver.cc#L590), depending on the ordering of events (reading another query's answer on fd 2 vs. when we run the on_readable callback following the shutdown of fd 1).

I guess this was not an issue before c-ares 1.34.5 because earlier c-ares did not have `consec_failures` tracking and so did not close sockets that had timed out earlier after processing answers that came in on new fds, see https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/src/lib/ares_process.c#L1528

---
Note: the EE c-ares resolver does not have this early-cancellation issue because it shuts down no-longer-interesting fds with an OK status: https://github.com/grpc/grpc/blob/79769b35d04535259592ac1b0a98d65f63203f06/src/core/lib/event_engine/ares_resolver.cc#L547

PiperOrigin-RevId: 783612611
veblush
approved these changes
Jul 16, 2025
Contributor
veblush
left a comment
Please update
Line 29 in 27a36d7
bazelbuild/bazel-central-registry#5184
Note that this can be done in a separate PR.
asheshvidyut
pushed a commit
to asheshvidyut/grpc
that referenced
this pull request
Jul 22, 2025
This is a bug fix intended to unblock c-ares 1.34.5 upgrade: grpc#39508
asheshvidyut
pushed a commit
to asheshvidyut/grpc
that referenced
this pull request
Jul 22, 2025
Fixes grpc#39026

A few noteworthy changes here besides the version bump itself:
- Reorder the way we pull in `ares.h` in various places, so that we make sure to pull in the necessary Windows headers ahead of time. In older c-ares, the `ares.h` public header was more self-contained w.r.t. Windows headers, including `winsock2.h`. E.g. see the [old way of inclusion](https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/include/ares.h#L64) vs. the [new way of inclusion](https://github.com/c-ares/c-ares/blob/42ddbc14ec008e738fa44aa2c16e74cad93742c2/include/ares.h#L52). I'm not crazy about this fix but couldn't see a much cleaner way (we could maybe set `CARES_HAVE_WINSOCK2_H` in `port_platform.h` alternatively)
- Updated the hand-crafted `config_windows/ares_config.h` to set newly required build defs for the c-ares build itself to work on Windows
- Fix the Ruby macOS cross-compilation build to correctly set the `SYSTEM` Makefile var to `Darwin` (so that we pull in `config_darwin/ares_config.h`; so far we've been getting away with the Linux build mode)
- Change the Ruby macOS MINGW32 cross-compilation build to use our hand-crafted `config_windows/ares_config.h` header
- Explicitly link `iphlpapi` on Windows. I think we used to pick this up during the c-ares library build from [this pragma](https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/include/ares.h#L71), but now that's [hidden under a build define](https://github.com/c-ares/c-ares/blob/42ddbc14ec008e738fa44aa2c16e74cad93742c2/include/ares.h#L59)

Closes grpc#39508
PiperOrigin-RevId: 783875246
asheshvidyut
pushed a commit
to asheshvidyut/grpc
that referenced
this pull request
Jul 23, 2025
This is a bug fix intended to unblock c-ares 1.34.5 upgrade: grpc#39508
asheshvidyut
pushed a commit
to asheshvidyut/grpc
that referenced
this pull request
Jul 23, 2025
Fixes grpc#39026. Closes grpc#39508. (c-ares 1.34.5 upgrade)
paulosjca
pushed a commit
to paulosjca/grpc
that referenced
this pull request
Aug 23, 2025
This is a bug fix intended to unblock c-ares 1.34.5 upgrade: grpc#39508
paulosjca
pushed a commit
to paulosjca/grpc
that referenced
this pull request
Aug 23, 2025
Fixes grpc#39026. Closes grpc#39508. (c-ares 1.34.5 upgrade)
Fixes #39026
A few noteworthy changes here besides the version bump itself:
- Reorder the way we pull in `ares.h` in various places, so that we make sure to pull in the necessary Windows headers ahead of time. In older c-ares, the `ares.h` public header was more self-contained w.r.t. Windows headers, including `winsock2.h`. E.g. see the [old way of inclusion](https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/include/ares.h#L64) vs. the [new way of inclusion](https://github.com/c-ares/c-ares/blob/42ddbc14ec008e738fa44aa2c16e74cad93742c2/include/ares.h#L52). I'm not crazy about this fix but couldn't see a much cleaner way (we could maybe set `CARES_HAVE_WINSOCK2_H` in `port_platform.h` alternatively)
- Updated the hand-crafted `config_windows/ares_config.h` to set newly required build defs for the c-ares build itself to work on Windows
- Fix the Ruby macOS cross-compilation build to correctly set the `SYSTEM` Makefile var to `Darwin` (so that we pull in `config_darwin/ares_config.h`; so far we've been getting away with the Linux build mode)
- Change the Ruby macOS MINGW32 cross-compilation build to use our hand-crafted `config_windows/ares_config.h` header
- Explicitly link `iphlpapi` on Windows. I think we used to pick this up during the c-ares library build from this pragma, but now that's hidden under a build define
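The include-reordering point above can be sketched as a fragment. This is an illustrative, assumed file layout rather than the actual gRPC source:

```cpp
// Illustrative fragment only. With c-ares >= 1.34, ares.h no longer pulls
// in winsock2.h itself on Windows, so any file using the c-ares API has to
// bring in the Windows socket headers first.
#ifdef _WIN32
#include <winsock2.h>  // must now precede ares.h on Windows
#endif
#include <ares.h>      // the c-ares public header
```

An alternative mentioned in the description would be defining `CARES_HAVE_WINSOCK2_H` centrally (e.g. in `port_platform.h`) instead of reordering includes at every use site.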