Start slow start window on first successful HC, make slow start re-enterable#23946
Start slow start window on first successful HC, make slow start re-enterable#23946mattklein123 merged 18 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
/retest |
|
Retrying Azure Pipelines: |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for working on this. I think this makes a lot of sense. Some small comments and also it looks like tidy is still broken. Thank you.
/wait
envoy/upstream/upstream.h
Outdated
| * Set the timestamp of when the host has transitioned from unhealthy to healthy state via an | ||
| * active healchecking. | ||
| */ | ||
| virtual void setLastHcPassTime(MonotonicTime&& last_hc_pass_time) PURE; |
There was a problem hiding this comment.
nit: realistically it will likely be faster to pass this by value as it's just an integer
| if (first_check_ || host_->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)) { | ||
| host_->setLastHcPassTime(time_source_.monotonicTime()); | ||
| } |
There was a problem hiding this comment.
I'm wondering if we should actually do this in the block below when we reach the healthy threshold? It seems like we should start slow start when the host actually becomes healthy? wdyt?
There was a problem hiding this comment.
yes, did not realise that hosts have Host::HealthFlag::FAILED_ACTIVE_HC flag set by default during initialisation
Co-authored-by: Matt Klein <mattklein123@gmail.com> Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
need to increase test coverage |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks makes sense to me.
/wait
changelogs/current.yaml
Outdated
| moved the strict_dns, original_dst, and logical_dns clusters to extensions. If you use these clusters and override extensions_build_config.bzl you will now need to include it explicitly. | ||
| moved the static, strict_dns and original_dst clusters to extensions. If you use these clusters and override extensions_build_config.bzl you will now need to include it explicitly. | ||
| - area: loadbalancing | ||
| change: | | ||
| When active HC is enabled per cluster, slow start calculation now starts after first passing HC. Cluster membership duration condition is dropped from slow start calculation. | ||
| Endpoint can now re-enter slow start if active HC is configured per cluster, on each "unhealthy -> healthy" state transition. |
There was a problem hiding this comment.
I think you have some kind of merge issue here?
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
There was a failure in unrelated test, merged latest main. Let's see if it helps |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
|
@mattklein123 so far the CI has been complaining about overall low test coverage for class RealHostDescription in logical_host.h. That class is not used anywhere in code and in tests. I have added test for RealHostDescription, but had to change visibility for the library that included it (now format check if complaining because of changed visibility) Wonder if better approach would have been to remove that class altogether? |
Yes please remove if it's dead code! Thank you. /wait |
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
| EXPECT_EQ("", data.host_description_->locality().zone()); | ||
| EXPECT_EQ("", data.host_description_->locality().sub_zone()); | ||
| EXPECT_EQ("foo.bar.com", data.host_description_->hostname()); | ||
| EXPECT_FALSE(data.host_description_->lastHcPassTime()); |
There was a problem hiding this comment.
i was wrong, class RealHostDescription is used in one function. So i reverted previous commit that changes visibility level and instead increased coverage in existing test. Those functions had 0% coverage. There were few more (mostly setters) with 0%, but this should be good enough to pass coverage threshold.
* origin/main: Start slow start window on first successful HC, make slow start re-enterable (envoyproxy#23946) contrib-sipproxy: rework sipproxy stats (envoyproxy#24165) build(deps): bump github/codeql-action from 2.1.32 to 2.1.35 (envoyproxy#24292) build(deps): bump node from `c59fb39` to `80844b6` in /examples/ext_authz/auth/http-service (envoyproxy#24274) build(deps): bump actions/upload-artifact from 2 to 3 (envoyproxy#24121) build(deps): bump mysql from `96439dd` to `66efaaa` in /examples/mysql (envoyproxy#24273) build(deps): bump grpcio from 1.50.0 to 1.51.1 in /examples/grpc-bridge/client (envoyproxy#24272) Allow mobile/library/common/jni/ to be built on non-Android platforms. (envoyproxy#24299) generic proxy: added drain support to generic proxy to doing graceful closes on connections when possible (envoyproxy#24220) Reduce Route/VirtualHost memory utilization by avoiding RateLimitPolicy instances when not needed (envoyproxy#24243) Reduce VirtualHost memory utilization by avoiding CatchAllVirtualCluster when not needed (envoyproxy#24182) Fix mobile/tools/check_format.sh to re-apply envoyproxy#2698 (envoyproxy#24300) upstream: don't require `source_address` in `upstream_bind_config` (envoyproxy#24250) Quiche roll 20221201150327 (envoyproxy#24290) Signed-off-by: JP Simard <jp@jpsim.com>
…/docs/sphinx-5.3.0 * origin/main: (50 commits) ci: Bump mobile build images (#24317) ci: update llvm on bazel CI (#24296) ci: Fix flaky coverage limit (#24320) ci: Fix change detection (part 2) (#24264) Start slow start window on first successful HC, make slow start re-enterable (#23946) contrib-sipproxy: rework sipproxy stats (#24165) build(deps): bump github/codeql-action from 2.1.32 to 2.1.35 (#24292) build(deps): bump node from `c59fb39` to `80844b6` in /examples/ext_authz/auth/http-service (#24274) build(deps): bump actions/upload-artifact from 2 to 3 (#24121) build(deps): bump mysql from `96439dd` to `66efaaa` in /examples/mysql (#24273) build(deps): bump grpcio from 1.50.0 to 1.51.1 in /examples/grpc-bridge/client (#24272) Allow mobile/library/common/jni/ to be built on non-Android platforms. (#24299) generic proxy: added drain support to generic proxy to doing graceful closes on connections when possible (#24220) Reduce Route/VirtualHost memory utilization by avoiding RateLimitPolicy instances when not needed (#24243) Reduce VirtualHost memory utilization by avoiding CatchAllVirtualCluster when not needed (#24182) Fix mobile/tools/check_format.sh to re-apply #2698 (#24300) upstream: don't require `source_address` in `upstream_bind_config` (#24250) Quiche roll 20221201150327 (#24290) Cleanup threading/ownership semantics of PlatformBridgeCertValidator (#2713) Add unit tests of the PlatformBridgeCertValidator (#2704) ... Signed-off-by: JP Simard <jp@jpsim.com>
…/docs/pygments-2.13.0 * origin/main: (25 commits) ci: Bump mobile build images (#24317) ci: update llvm on bazel CI (#24296) ci: Fix flaky coverage limit (#24320) ci: Fix change detection (part 2) (#24264) Start slow start window on first successful HC, make slow start re-enterable (#23946) contrib-sipproxy: rework sipproxy stats (#24165) build(deps): bump github/codeql-action from 2.1.32 to 2.1.35 (#24292) build(deps): bump node from `c59fb39` to `80844b6` in /examples/ext_authz/auth/http-service (#24274) build(deps): bump actions/upload-artifact from 2 to 3 (#24121) build(deps): bump mysql from `96439dd` to `66efaaa` in /examples/mysql (#24273) build(deps): bump grpcio from 1.50.0 to 1.51.1 in /examples/grpc-bridge/client (#24272) Allow mobile/library/common/jni/ to be built on non-Android platforms. (#24299) generic proxy: added drain support to generic proxy to doing graceful closes on connections when possible (#24220) Reduce Route/VirtualHost memory utilization by avoiding RateLimitPolicy instances when not needed (#24243) Reduce VirtualHost memory utilization by avoiding CatchAllVirtualCluster when not needed (#24182) Fix mobile/tools/check_format.sh to re-apply #2698 (#24300) upstream: don't require `source_address` in `upstream_bind_config` (#24250) Quiche roll 20221201150327 (#24290) Cleanup threading/ownership semantics of PlatformBridgeCertValidator (#2713) Add unit tests of the PlatformBridgeCertValidator (#2704) ... Signed-off-by: JP Simard <jp@jpsim.com>
Signed-off-by: Kateryna Nezdolii nezdolik@spotify.com
New changes to "slow start" logic:
Those changes make "slow start" in Envoy to align more with ELB slow start feature.
Commit Message:
Additional Description:
Risk Level: Medium
Testing: Done
Docs Changes: Done
Release Notes: Not Done
Platform Specific Features:
Fixes #18351