[c-ares] fix spin loop bug when c-ares gives up on a socket that still has data left in its read buffer #34185
Merged
apolcyn merged 19 commits into grpc:master on Aug 30, 2023
yijiem approved these changes on Aug 28, 2023
yijiem approved these changes on Aug 30, 2023
yijiem (Member) left a comment:
What a hack! Thanks Alex for adding the test to reproduce the issue!
// 4) Because the first two bytes were zero, c-ares attempts to malloc a
// zero-length buffer:
// https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/src/lib/ares_process.c#L428.
// 5) Because malloc(0) returns NULL, c-ares invokes handle_error and stops
Just a small nit: maybe say c-ares' default_malloc(0) returns NULL instead? Since it seems like some systems may return a valid pointer on malloc(0): https://github.com/c-ares/c-ares/blob/main/src/lib/ares_library_init.c#L38-L42
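To make the nit concrete, here is a minimal C++ sketch of the distinction, under stated assumptions. `ZeroGuardedMalloc` is a hypothetical name that models the behavior this comment attributes to c-ares' `default_malloc` (a zero-size request yields NULL); it is not copied from c-ares. `DecodeLengthPrefix` and `AllocateForMessage` are also hypothetical helpers, and they assume the "first two bytes" in the test comment are the two-byte big-endian length prefix that DNS-over-TCP messages carry.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: decodes a two-byte big-endian length prefix.
std::size_t DecodeLengthPrefix(const std::uint8_t prefix[2]) {
  return (static_cast<std::size_t>(prefix[0]) << 8) | prefix[1];
}

// Hypothetical model of an allocator that always returns NULL for a
// zero-size request, which is what the comment above says about c-ares'
// default_malloc. Plain malloc(0) may instead return a valid pointer,
// depending on the platform.
void* ZeroGuardedMalloc(std::size_t size) {
  if (size == 0) return nullptr;
  return std::malloc(size);
}

// Hypothetical helper: returns false when the buffer for the announced
// message length could not be allocated, i.e. the case that would send
// the caller down its error path.
bool AllocateForMessage(const std::uint8_t prefix[2]) {
  void* buffer = ZeroGuardedMalloc(DecodeLengthPrefix(prefix));
  if (buffer == nullptr) return false;
  std::free(buffer);
  return true;
}
```

With a prefix of `{0, 0}` the guarded allocator reports failure on every platform, which is why naming `default_malloc` in the test comment is more precise than naming `malloc`.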
apolcyn added a commit that referenced this pull request on Sep 18, 2023
If we get a readable event on an fd and both of the following happen:
c-ares does not read all bytes off the fd
c-ares removes the fd from the set ARES_GETSOCK_READABLE
... then we have a busy loop here, where we'd keep asking c-ares to process an fd that it no longer cares about.
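The busy loop can be illustrated with a small self-contained simulation. This is only a sketch: `FakeSocket`, `FakeResolver`, `DrainNaive`, and `DrainChecked` are hypothetical names, not gRPC or c-ares APIs, and the extra loop condition in `DrainChecked` is one possible way to stop the spin, not necessarily the change made in this PR. The iteration cap exists only so the demo terminates.

```cpp
#include <cstddef>

// Hypothetical stand-in for a socket: just a count of unread bytes.
struct FakeSocket {
  std::size_t unread_bytes;
  bool IsStillReadable() const { return unread_bytes > 0; }
};

// Hypothetical stand-in for the resolver's interest in the socket.
struct FakeResolver {
  bool wants_read = true;

  // Models the problematic path: read at most two bytes, then give up on
  // the socket and drop it from the readable set.
  void ProcessFd(FakeSocket& socket) {
    if (!wants_read) return;  // No longer interested: nothing is read.
    std::size_t n = socket.unread_bytes < 2 ? socket.unread_bytes : 2;
    socket.unread_bytes -= n;
    wants_read = false;
  }
};

// Loops while the socket has unread data, ignoring the resolver's interest.
// Returns the number of iterations performed (capped by max_iterations).
int DrainNaive(FakeResolver& resolver, FakeSocket& socket, int max_iterations) {
  int iterations = 0;
  do {
    resolver.ProcessFd(socket);
    ++iterations;
  } while (socket.IsStillReadable() && iterations < max_iterations);
  return iterations;
}

// Same loop, but it also stops once the resolver no longer wants the socket.
int DrainChecked(FakeResolver& resolver, FakeSocket& socket, int max_iterations) {
  int iterations = 0;
  do {
    resolver.ProcessFd(socket);
    ++iterations;
  } while (resolver.wants_read && socket.IsStillReadable() &&
           iterations < max_iterations);
  return iterations;
}
```

Starting from ten unread bytes, `DrainNaive` runs until it hits the cap because nothing consumes the remaining eight bytes, while `DrainChecked` exits after a single pass.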
This is indirectly related to a change in this code one month ago: #33942. Before that change, c-ares would close the socket when it called handle_error, and so IsFdStillReadableLocked would start returning false, causing us to get away with this loop. Now, because IsFdStillReadableLocked will keep returning true (because of our overridden close API), we'll loop forever.
The test illustrates one concrete example of how this bug can be hit.
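The role of the close override can be sketched in isolation. The names below (`FakeFd`, `RealClose`, `DeferredClose`, `ReadableAfterError`) are hypothetical, and the sketch assumes the overridden close simply leaves the descriptor open at the moment c-ares asks to close it; the description above only states that the override keeps the readability check returning true.

```cpp
#include <cstddef>

// Hypothetical model of a descriptor with unread data.
struct FakeFd {
  bool open = true;
  std::size_t unread_bytes = 0;
  // A closed descriptor is never reported as readable in this model.
  bool IsStillReadable() const { return open && unread_bytes > 0; }
};

using CloseFn = void (*)(FakeFd&);

// Models the older behavior: the error path really closes the socket.
void RealClose(FakeFd& fd) { fd.open = false; }

// Models an overridden close that leaves the descriptor open
// (an assumption made for this sketch).
void DeferredClose(FakeFd&) {}

// Simulates the error path invoking the given close function while data
// is still buffered, then reports whether the fd still looks readable.
bool ReadableAfterError(CloseFn close_fn) {
  FakeFd fd;
  fd.unread_bytes = 8;
  close_fn(fd);
  return fd.IsStillReadable();
}
```

In this model `ReadableAfterError(RealClose)` is false, which is how the old code escaped the loop, and `ReadableAfterError(DeferredClose)` is true, which is the condition that keeps the loop spinning.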
Note that the EE version of this code already gets this right.
Related: internal issue b/297538255