[c-ares DNS resolver] Fix file descriptor use-after-close bug when c-ares writes succeed but subsequent read fails#33871
Conversation
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_posix.cc
Outdated
Show resolved
Hide resolved
| * - disable nagle */ | ||
| grpc_error_handle err; | ||
| err = grpc_set_socket_nonblocking(fd, true); | ||
| if (!err.ok()) return -1; |
There was a problem hiding this comment.
Do you need to close the socket when the set failed? Same below.
There was a problem hiding this comment.
Ah, looks like there is also this ares_set_socket_configure_callback which might be used for this purpose? And c-ares will call close socket when it returns -1: https://source.corp.google.com/piper///depot/google3/third_party/ares/src/lib/ares_process.c;l=1154;bpv=0;bpt=1;rcl=550417489
There was a problem hiding this comment.
Regarding the ares_set_socket_configure_callback, thanks for pointing out - but I think it may be simpler to just do the configuration in here?
There was a problem hiding this comment.
I would recommend utilizing it since it seems to be specifically designed for this purpose. It would also call our version's of Close() (though it does not make any difference). But I wouldn't block on that.
There was a problem hiding this comment.
done. And I agree now the config fallback does seem make this easier to read
yijiem
left a comment
There was a problem hiding this comment.
Thanks for the quick fix Alex!
|
|
||
| // fds that are used/owned by grpc - we (grpc) will close them rather than | ||
| // c-ares | ||
| std::set<ares_socket_t> owned_fds_; |
There was a problem hiding this comment.
nit: maybe to use std::unordered_set<ares_socket_t> since it has on-average constant time complexity on search and insertion and also the ordering is not needed here.
|
Thanks for the pointer to use |
Awesome! What's the repro'd error? It would be nice to link the error logs here. Thanks again! |
yijiem
left a comment
There was a problem hiding this comment.
The test strategy looks great! Thanks!
| ~SocketUseAfterCloseDetector(); | ||
|
|
||
| private: | ||
| std::thread* thread_; |
There was a problem hiding this comment.
nit: maybe use std::unique_ptr<std::thread> so that we don't need to manually delete it after join.
With the fix temporarily reverted the fix (e.g. by reverting For example: Spot checking a couple of different failure logs, notice how One failure: Another failure: |
…descriptor use-after-close bug when c-ares writes succeed but subsequent read fails" (#33934)" (#33942) Rolls forward #33871 Second and third commits here fix internal build issues In particular, add a `// IWYU pragma: no_include <ares_build.h>` since `ares.h` [includes that anyways](https://github.com/c-ares/c-ares/blob/bad62225b7f6b278b92e8e85a255600b629ef517/include/ares.h#L23) (and seems unlikely for that to change since it would be breaking)
Port #33871 to EE's GrpcPolledFdFactoryPosix. <!-- If you know who should review your pull request, please assign it to that person, otherwise the pull request would get assigned randomly. If your pull request is for a specific language, please add the appropriate lang label. -->
Port grpc#33871 to EE's GrpcPolledFdFactoryPosix. <!-- If you know who should review your pull request, please assign it to that person, otherwise the pull request would get assigned randomly. If your pull request is for a specific language, please add the appropriate lang label. -->
Normally, c-ares related fds are destroyed after all DNS resolution is finished in this code path. Also there are some fds that c-ares may fail to open or write to initially, and c-ares will close them internally before grpc ever knows about them.
But if:
Then c-ares will close the fd in this code path, but gRPC will have a reference on the fd and will still use it afterwards.
Fix here is to leverage the c-ares socket-override API to properly track fd ownership between c-ares and grpc.
Related: internal issue b/292203138