dns: introduce ResolutionStatus for ResolveCb and fix #9927#10137
dns: introduce ResolutionStatus for ResolveCb and fix #9927#10137mattklein123 merged 15 commits intoenvoyproxy:masterfrom junr03:dns-status
Conversation
Signed-off-by: Jose Nino <jnino@lyft.com>
|
@mattklein123 I finally got time to come back to this. I started by just introducing the new argument for the callback, but leaving all behavior unchanged. I didn't want to conflate this with the two independent changes (don't update hosts on failure, allow different timeouts on failure) needed for the linked issues. Let me know if you'd rather look at it all together. |
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this is great. Flushing out a few comments from my first pass. Thank you!
/wait
| // can be done with this query and initiate a new one. | ||
| if (!cancelled_) { | ||
| callback_({}); | ||
| callback_(ResolutionStatus::Failure, {}); |
There was a problem hiding this comment.
Should you have another status like cancelled/destructing/etc.? This is not quite a failure? I'm not sure it matters to higher layer code but it might?
There was a problem hiding this comment.
I thought about this when I first introduced the destruction behavior. My first opinion is that higher layers only pivot behavior based on success/failure so I was going to keep it at that level of complexity for this change, and the subsequent PR where I fix #9927.
Then I was going to explore the need to add a Destruction value that allow us to set tighter timers than on "regular failure" when fixing for envoyproxy/envoy-mobile#673. But I am not sure if we need different behavior based on what particular failure scenario the resolver experienced. By the same token, people might want even more granularity in the future?
In short: I did not want to commit to more granular status in this initial PR, and I don't think it is going to be that bad to add more granularity, if desired, in subsequent PRs.
|
@mattklein123 updated. I have updated dns_impl_test for expectations around status in different conditions. I also updated higher level constructs to use status, but I didn't change any behavior yet based on this new information (other than adding more relevant stats where possible). I plan on addressing #10137 (comment) and hence #9927 in a subsequent PR. Let me know if you'd rather see it here. Edit: fixing #9927 here. |
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
| Network::Utility::getAddressWithPort(*(response.front().address_), | ||
| Network::Utility::portFromTcpUrl(dns_url_)); | ||
|
|
||
| if (respect_dns_ttl_ && response.front().ttl_ != std::chrono::seconds(0)) { |
There was a problem hiding this comment.
moved this for consistency with the strict dns cluster code. But happy to move back up.
| tls_.shutdownThread(); | ||
| } | ||
|
|
||
| TEST_F(LogicalDnsParamTest, TtlAsDnsRefreshRate) { |
There was a problem hiding this comment.
Adding coverage that did not exist.
mattklein123
left a comment
There was a problem hiding this comment.
Nice work. Just a test comment. Thank you!
/wait
test/common/network/dns_impl_test.cc
Outdated
| [&](DnsResolver::ResolutionStatus, | ||
| std::list<DnsResponse>&& results) -> void { |
There was a problem hiding this comment.
I'm not sure what the best way would be but is there some way to simplify all of thee calls to make sure have a success/failure expectation on all of them? Perhaps some helper function/object that does the resolve, does the expectations, and stores the resolution list?
There was a problem hiding this comment.
folded everything into resolveWithExpectations
Signed-off-by: Jose Nino <jnino@lyft.com>
| EXPECT_TRUE(hasAddress(address_list, "201.134.56.7")); | ||
| } | ||
|
|
||
| TEST_P(DnsImplTest, DnsIpAddressVersion) { |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for the awesome test cleanup. 1 question. Thank you!
/wait-any
test/common/network/dns_impl_test.cc
Outdated
| [&](const std::list<DnsResponse> & | ||
| /*results*/) -> void { throw EnvoyException("Envoy exception"); })); | ||
| EXPECT_EQ(nullptr, | ||
| resolver_->resolve( |
There was a problem hiding this comment.
qq: Can't we use the new resolve with expectations everywhere?
There was a problem hiding this comment.
yes! I thought it might be overkill given that there were far fewer call sites. But if I'm cleaning up something why not go the entire way :). Now everything (for real) is folded into helper functions.
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
|
Looks like the circle machines have a funny hosts file? |
|
Thanks for the fix. If that's the CI error you are seeing it seems like it must have to do with your change? |
|
Yep, definitely has to do with my change. I just wanted to make sure that I was surfacing something latent in circle ci and not an unintended regression. Previously the |
Description: updates envoy ref to include envoyproxy/envoy#10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673
Description: updates envoy ref to include envoyproxy/envoy#10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673 Signed-off-by: Michael Rebello <me@michaelrebello.com>
Description: updates envoy ref to include #10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673 Signed-off-by: JP Simard <jp@jpsim.com>
Description: updates envoy ref to include #10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673 Signed-off-by: JP Simard <jp@jpsim.com>
Description: this PR introduces ResolutionStatus to the ResolveCb. This status can be used to determine if DnsResolution was successful or not, and lets the callback target adjust accordingly. It is used to fix #9927. Additionally, it can be used to fix
envoyproxy/envoy-mobile#673 in a subsequent PR.
Risk Level: med. This PR changes the behavior of Strict Dns Cluster. The introduced argument is not used anywhere, so all behavior stays the same.
Testing: Updated tests to pass the new argument to the callback. But behavior stays the same.
Docs Changes: updated config docs and discovery docs.
Release Notes: added a release note.
Fixes #9927
Signed-off-by: Jose Nino jnino@lyft.com