Skip to content

Fix DnsNameResolver TCP fallback test and message leaks#9647

Merged
normanmaurer merged 4 commits intonetty:4.1from
njhill:dnsresolver-leaks
Oct 14, 2019
Merged

Fix DnsNameResolver TCP fallback test and message leaks#9647
normanmaurer merged 4 commits intonetty:4.1from
njhill:dnsresolver-leaks

Conversation

@njhill
Copy link
Copy Markdown
Member

@njhill njhill commented Oct 8, 2019

Motivation

A memory leak related to DNS resolution was reported in #9634, specifically linked to the TCP retry fallback functionality that was introduced relatively recently. Upon inspection it's apparent that there
are some error paths where the original UDP response might not be fully released, and more significantly the TCP response actually leaks every time on the fallback success path.

It turns out that a bug in the unit test meant that the intended TCP fallback path was not actually exercised, so it did not expose the main leak in question.

Modifications

  • Fix DnsNameResolverTest#testTruncated0 dummy server fallback logic to first read transaction id of retried query and use it in replayed response
  • Adjust semantic of internal DnsQueryContext#finish method to always take refcount ownership of passed in envelope
  • Reorder some logic in DnsResponseHandler fallback handling to verify the context of the response is expected, and ensure that the query response are either released or propagated in all cases. This also reduces a number of redundant retain/release pairings

Result

Fixes #9634

@netty-bot
Copy link
Copy Markdown

Can one of the admins verify this patch?

@njhill
Copy link
Copy Markdown
Member Author

njhill commented Oct 8, 2019

The changes look bigger than they are due to some reordering and indentation changes, recommend viewing diff with w=1 :)

@normanmaurer
Copy link
Copy Markdown
Member

@netty-bot test this please

@normanmaurer
Copy link
Copy Markdown
Member

@njhill please rebase

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: release first before log.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should preserve the logic and fallback to the original truncated response if we did not receive a TCP response just like we did before:

https://github.com/netty/netty/pull/9647/files#diff-6678a05ef56f2434e7aa20acbe9d97e9L1277

Also beside this we should not log the exception if we received the response before. As the exception may be just a "connection reset" and so harmless. Also consider calling ctx.close() in any case to ensure the channel is closed in all cases.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now fixed. I didn’t add an explicit close since the channel is closed immediately in the listener below.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this a real java doc ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets rename this to trySuccess as this is what it does ;)

njhill added 3 commits October 9, 2019 15:42
Motivation

A memory leak related to DNS resolution was reported in netty#9634,
specifically linked to the TCP retry fallback functionality that was
introduced relatively recently. Upon inspection it's apparent that there
are some error paths where the original UDP response might not be fully
released, and more significantly the TCP response actually leaks every
time on the fallback success path.

It turns out that a bug in the unit test meant that the intended TCP
fallback path was not actually exercised, so it did not expose the main
leak in question.

Modifications

- Fix DnsNameResolverTest#testTruncated0 dummy server fallback logic to
first read transaction id of retried query and use it in replayed
response
- Adjust semantic of internal DnsQueryContext#finish method to always
take refcount ownership of passed in envelope
- Reorder some logic in DnsResponseHandler fallback handling to verify
the context of the response is expected, and ensure that the query
response are either released or propagated in all cases. This also
reduces a number of redundant retain/release pairings

Result

Fixes netty#9634
@njhill njhill force-pushed the dnsresolver-leaks branch from e1c0e3e to fa84a02 Compare October 9, 2019 22:42
@njhill
Copy link
Copy Markdown
Member Author

njhill commented Oct 9, 2019

@normanmaurer addressed comments, PTAL

@normanmaurer
Copy link
Copy Markdown
Member

@netty-bot test this please

@normanmaurer
Copy link
Copy Markdown
Member

@dziemba can you verify this one as well ?

@dziemba
Copy link
Copy Markdown

dziemba commented Oct 10, 2019

@normanmaurer I've built a local snapshot from this branch (fa84a02) and ran the reproduction script and some additional internal test suites we have.

Looks all good now, no more leaks 🎉

Thanks a lot for the quick fix!

@normanmaurer
Copy link
Copy Markdown
Member

@netty-bot test this please

@normanmaurer normanmaurer added this to the 4.1.43.Final milestone Oct 12, 2019
@normanmaurer normanmaurer merged commit 2e5dd28 into netty:4.1 Oct 14, 2019
@njhill njhill deleted the dnsresolver-leaks branch October 14, 2019 14:21
normanmaurer pushed a commit that referenced this pull request Oct 16, 2019
Motivation

A memory leak related to DNS resolution was reported in #9634,
specifically linked to the TCP retry fallback functionality that was
introduced relatively recently. Upon inspection it's apparent that there
are some error paths where the original UDP response might not be fully
released, and more significantly the TCP response actually leaks every
time on the fallback success path.

It turns out that a bug in the unit test meant that the intended TCP
fallback path was not actually exercised, so it did not expose the main
leak in question.

Modifications

- Fix DnsNameResolverTest#testTruncated0 dummy server fallback logic to
first read transaction id of retried query and use it in replayed
response
- Adjust semantic of internal DnsQueryContext#finish method to always
take refcount ownership of passed in envelope
- Reorder some logic in DnsResponseHandler fallback handling to verify
the context of the response is expected, and ensure that the query
response are either released or propagated in all cases. This also
reduces a number of redundant retain/release pairings

Result

Fixes #9634
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DNS Resolver leaking direct memory

4 participants