network: perform post-DNS connection drain on a per-host basis#2240
network: perform post-DNS connection drain on a per-host basis#2240
Conversation
| pending_drain_ = false; | ||
| if (enable_drain_post_dns_refresh_) { | ||
| // Check if the set of hosts pending drain contains the current resolved host. | ||
| if (hosts_to_drain_.erase(resolved_host) == 0) { |
There was a problem hiding this comment.
Based on my reading of the reference docs for erase, this line removes the resolved host from the hosts to drain vector and if it was the last host in the vector, we return early? That doesn't seem right to me since for example if we only had a single host, we'd still want to drain its connections. Unless I'm misunderstanding when erase returns zero.
There was a problem hiding this comment.
You're correct that this would be wrong if this was a vector, but hosts_to_drain_ is an absl::flat_hash_set.
Here is what is being called:
https://github.com/abseil/abseil-cpp/blob/master/absl/container/flat_hash_set.h#L232
There was a problem hiding this comment.
Another way to read this is "return if nothing was erased".
There was a problem hiding this comment.
ok thanks for sharing that. I still haven't come around to appreciating this overall C++ pattern of returning iterators from functions, but this isn't the place to debate that.
There was a problem hiding this comment.
I agree working with iterators is a bit of a mental adjustment. That said, in this case, the function is simply returning the number of elements that were deleted as a size_type. :)
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Augustyniak
left a comment
There was a problem hiding this comment.
with a comment/question
| dns_cache->iterateHostMap( | ||
| [&](absl::string_view host, | ||
| const Extensions::Common::DynamicForwardProxy::DnsHostInfoSharedPtr&) { | ||
| hosts_to_drain_.emplace(host); |
There was a problem hiding this comment.
should we explicitly empty hosts_to_drain_ before we start adding hosts here? I think that we cannot end up in a state where we have dups in here but what if we had a host in the past that does not exist anymore (not sure whether this can happen)?
Imagine we had DNS refresh 1) that included host x and y and DNS refresh 2) that included y host only. We want hosts_to_drain_ to contain only y after the second DNS refresh starts
There was a problem hiding this comment.
I'm not actually sure if hosts can ever cease to exist in Envoy's cache implementation, since at the end of their TTL, I think they're simply refreshed. That said, previous re-resolutions should still exist even if this is triggered again before prior resolutions complete - in other words, I think the race where a host might persist exists irrespective of whether we call clear here.
In the case you describe:
The resolutions from 1) should still exist and be outstanding even if 2) is triggered, and thus we should still observe resolutions for x and y.
There was a problem hiding this comment.
Is it possible that 1) DNS refresh is cancelled and we never receive a completion callback associated with it?
There was a problem hiding this comment.
My understanding is the callback we're listening for should fire regardless of the result state of the resolution. @mattklein123 added it because other callbacks didn't necessarily.
| dns_cache->iterateHostMap( | ||
| [&](absl::string_view host, | ||
| const Extensions::Common::DynamicForwardProxy::DnsHostInfoSharedPtr&) { | ||
| hosts_to_drain_.emplace(host); |
There was a problem hiding this comment.
should we explicitly empty hosts_to_drain_ before we start adding hosts here? I think that we cannot end up in a state where we have dups in here but what if we had a host in the past that does not exist anymore (not sure whether this can happen)?
Imagine we had DNS refresh 1) that included host x and y and DNS refresh 2) that included y host only. We want hosts_to_drain_ to contain only y after the second DNS refresh is initialized
There was a problem hiding this comment.
hosts_to_drain_ is a set, and should not contain dups. I don't think emptying the set in here or above actually closes the window on the small window for a race where a host ceases to exist between us enumerating hosts and then triggering the refresh.
I think putting a time bound on outstanding resolutions would guard against a potential leak caused by that race, but that adds additional complexity, and given the narrow window, I'm not sure if it's really needed.
| dns_cache->iterateHostMap( | ||
| [&](absl::string_view host, | ||
| const Extensions::Common::DynamicForwardProxy::DnsHostInfoSharedPtr&) { | ||
| hosts_to_drain_.emplace(host); |
There was a problem hiding this comment.
Is it possible that 1) DNS refresh is cancelled and we never receive a completion callback associated with it?
…ol--fix-documentation-comment-issues * origin/main: Fix `bump_lyft_support_rotation.sh` posting to Slack (#2234) jni: fix mismatched return value types (#2252) Remove bintray as a maven source (#2248) test: refine connection drain test (#2245) cleanup: remove comment in Http::Client that no longer applies (#2203) cleanup: fix test with inaccurate description (#2202) network: perform post-DNS connection drain on a per-host basis (#2240) network: add enableDrainPostDnsRefresh to iOS (#2242) envoy: update to em-cherry (#2241) network: support draining connections after triggered DNS refresh (#2225) Bump rules_apple to 0.34.2 (#2236) Bump Lyft Support Rotation (#2232) Signed-off-by: JP Simard <jp@jpsim.com>
…on-6.0.0-pre.20220421.3 * origin/main: Upgrade rules_jvm_external from 4.1 -> 4.2 (#2254) CancelProofEnvoyStream (#2250) swift: add DrString tool & fix documentation comment issues (#2233) Fix `bump_lyft_support_rotation.sh` posting to Slack (#2234) jni: fix mismatched return value types (#2252) Remove bintray as a maven source (#2248) test: refine connection drain test (#2245) cleanup: remove comment in Http::Client that no longer applies (#2203) cleanup: fix test with inaccurate description (#2202) network: perform post-DNS connection drain on a per-host basis (#2240) Signed-off-by: JP Simard <jp@jpsim.com>
* main: envoy: update to efbbb04 (#2258) Update rules_java, rules_detekt, rules_jvm_external (#2249) Upgrade rules_jvm_external from 4.1 -> 4.2 (#2254) CancelProofEnvoyStream (#2250) swift: add DrString tool & fix documentation comment issues (#2233) Fix `bump_lyft_support_rotation.sh` posting to Slack (#2234) jni: fix mismatched return value types (#2252) Remove bintray as a maven source (#2248) test: refine connection drain test (#2245) cleanup: remove comment in Http::Client that no longer applies (#2203) cleanup: fix test with inaccurate description (#2202) network: perform post-DNS connection drain on a per-host basis (#2240) network: add enableDrainPostDnsRefresh to iOS (#2242) envoy: update to em-cherry (#2241) network: support draining connections after triggered DNS refresh (#2225) Signed-off-by: JP Simard <jp@jpsim.com>
Description: Limits connection draining to be based on specific hosts as each host's DNS resolution occurs following a triggered DNS refresh.
Risk Level: Moderate
Testing: Additional Unit Coverage
Signed-off-by: Mike Schore mike.schore@gmail.com