Skip to content

Resolver: fix error handling if we didn't receive a response#2551

Merged
arkodg merged 1 commit intomoby:masterfrom
thaJeztah:fix_error_handling
May 21, 2020
Merged

Resolver: fix error handling if we didn't receive a response#2551
arkodg merged 1 commit intomoby:masterfrom
thaJeztah:fix_error_handling

Conversation

@thaJeztah
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah commented May 20, 2020

Addresses moby/moby#41003
Addresses moby/moby#20494 (comment)

Commit d5e341e updated the DNS library
and updated the error handling.

Due to changes in the library, we now had to check the response itself
to check if the response was truncated (Truncated DNS replies should
be sent to the client so that the client can retry over TCP).

However, bea32b0 added an incorrect
nil check to fix a panic, which ignored situations where
an error was returned, but no response (for example, if we failed
to connect to the DNS server).

In that situation, the error would be ignored, and further down we
would consider the connection to have been succesfull, but the DNS
server not returning a result.

After a "successful" lookup (but no results), we break the loop,
and don't attempt lookups in other DNS servers.

Versions before bea32b0 would produce:

Name To resolve: bbc.co.uk.
[resolver] query bbc.co.uk. (A) from 172.21.0.2:36181, forwarding to udp:192.168.5.1
[resolver] read from DNS server failed, read udp 172.21.0.2:36181->192.168.5.1:53: i/o timeout
[resolver] query bbc.co.uk. (A) from 172.21.0.2:38582, forwarding to udp:8.8.8.8
[resolver] received A record "151.101.0.81" for "bbc.co.uk." from udp:8.8.8.8
[resolver] received A record "151.101.192.81" for "bbc.co.uk." from udp:8.8.8.8
[resolver] received A record "151.101.64.81" for "bbc.co.uk." from udp:8.8.8.8
[resolver] received A record "151.101.128.81" for "bbc.co.uk." from udp:8.8.8.8

Versions after that commit would ignore the error, and stop further lookups:

Name To resolve: bbc.co.uk.
[resolver] query bbc.co.uk. (A) from 172.21.0.2:59870, forwarding to udp:192.168.5.1
[resolver] external DNS udp:192.168.5.1 returned empty response for "bbc.co.uk."

This patch updates the logic to handle the error to log the error (and continue with the next DNS):

  • if an error is returned, and no response was received
  • if an error is returned, but it was not related to a truncated response

@thaJeztah
Copy link
Copy Markdown
Member Author

ping @arkodg @cpuguy83 @SamWhited PTAL; this introduced a regression in 19.03.9

Copy link
Copy Markdown
Contributor

@SamWhited SamWhited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, LGTM, thanks for the fix. I seem to have misunderstood what the previous logic was doing, but this makes sense.

@thaJeztah
Copy link
Copy Markdown
Member Author

seem to have misunderstood what the previous logic was doing, but this makes sense.

Looking at the original PR I recalled I stared at that line a couple of times, but failed to see the issue. Perhaps the comment needs a rewrite (I just tried a few times, but couldn't come up with something better 🤔)

@arkodg maybe we need a test for this, but I wasn't sure how to best test this; suggestions welcome.

@tiborvass
Copy link
Copy Markdown
Contributor

@thaJeztah isn't bea32b0 the commit that broke things?

@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented May 20, 2020

@thaJeztah can we recreate the empty response using a test similar to

func TestDNSProxyServFail(t *testing.T) {

@thaJeztah
Copy link
Copy Markdown
Member Author

@tiborvass Oh! You're right! Same line. I can update, but won't be near a computer for the next couple of hours; If you happen to have time, feel free to carry and update (or push to my branch directly)

@tiborvass
Copy link
Copy Markdown
Contributor

Ok sorry y'all, I couldn't figure out the test in libnetwork. So I'm adding a test in moby's integration.

@tiborvass
Copy link
Copy Markdown
Contributor

tiborvass commented May 21, 2020

Copy link
Copy Markdown
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for jumping on this @thaJeztah @tiborvass

Commit d5e341e updated the DNS library
and updated the error handling.

Due to changes in the library, we now had to check the response itself
to check if the response was truncated (Truncated DNS replies should
be sent to the client so that the client can retry over TCP).

However, bea32b0 added an incorrect
`nil` check to fix a panic, which ignored situations where
an error was returned, but no response (for example, if we failed
to connect to the DNS server).

In that situation, the error would be ignored, and further down we
would consider the connection to have been succesfull, but the DNS
server not returning a result.

After a "successful" lookup (but no results), we break the loop,
and don't attempt lookups in other DNS servers.

Versions before bea32b0 would produce:

    Name To resolve: bbc.co.uk.
    [resolver] query bbc.co.uk. (A) from 172.21.0.2:36181, forwarding to udp:192.168.5.1
    [resolver] read from DNS server failed, read udp 172.21.0.2:36181->192.168.5.1:53: i/o timeout
    [resolver] query bbc.co.uk. (A) from 172.21.0.2:38582, forwarding to udp:8.8.8.8
    [resolver] received A record "151.101.0.81" for "bbc.co.uk." from udp:8.8.8.8
    [resolver] received A record "151.101.192.81" for "bbc.co.uk." from udp:8.8.8.8
    [resolver] received A record "151.101.64.81" for "bbc.co.uk." from udp:8.8.8.8
    [resolver] received A record "151.101.128.81" for "bbc.co.uk." from udp:8.8.8.8

Versions after that commit would ignore the error, and stop further lookups:

    Name To resolve: bbc.co.uk.
    [resolver] query bbc.co.uk. (A) from 172.21.0.2:59870, forwarding to udp:192.168.5.1
    [resolver] external DNS udp:192.168.5.1 returned empty response for "bbc.co.uk."

This patch updates the logic to handle the error to log the error (and continue with the next DNS):

 - if an error is returned, and no response was received
 - if an error is returned, but it was not related to a truncated response

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Tibor Vass <tibor@docker.com>
@tiborvass tiborvass force-pushed the fix_error_handling branch from 7e3bcaf to 15ead89 Compare May 21, 2020 17:51
@tiborvass
Copy link
Copy Markdown
Contributor

@arkodg updated the commit message to mention the commit that actually caused regression bea32b0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants