server: add support for range trigger overload actions#11697
server: add support for range trigger overload actions#11697akonradi wants to merge 22 commits intoenvoyproxy:masterfrom
Conversation
Make ThreadLocalOverloadState an interface that the Overload Manager can return an implementation of. This decouples the behavior specified by the header from the details of the implementation. Signed-off-by: Alex Konradi <akonradi@google.com>
Add a hook to allow the Overload Manager to be inserted in the timer creation process. The returned function is currently pass-through, but can be used in the future to create timers that are responsive to overload conditions. Signed-off-by: Alex Konradi <akonradi@google.com>
The only user of a null OverloadManager was the admin console. Switch that to using a no-op OverloadManager subclass so that the OM can be provided via reference in all cases. This makes the HCM initialization simpler since it doesn't need to worry about the null pointer. Signed-off-by: Alex Konradi <akonradi@google.com>
Use the newly-added timer factory on the overload manager to construct timers. The labels are currently unused, but will allow the overload manager to apply different modifications per-timer type. Signed-off-by: Alex Konradi <akonradi@google.com>
In order to implement linearly-decreasing connection timeouts in response to load, the Overload Manager needs a way to apply actions that take load into account. This commit extends the OM OverloadActionState so that instead of a binary 'Inactive'/'Active' state, it records one of 'Inactive'/'Scaling'/'Saturated' states, where 'Scaling' comes with a scaled value between 0 and 1 for the level of saturation. Signed-off-by: Alex Konradi <akonradi@google.com>
Add a config option and implementation for the Overload Manager to enable range triggers, which do not saturate immediately, but can provide a scaled value based on the resources being monitored. Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
4d08910 to
ee9338d
Compare
|
/assign antoniovicente |
|
I just realized that the overload action state work is probably separable from the timer indirection. Let me know if I should revert those commits here and put them on a separate PR. |
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
| if (trigger_new_state == OverloadActionState::inactive()) { | ||
| active_gauge_.set(0); | ||
| } else if (trigger_new_state == OverloadActionState::saturated()) { | ||
| active_gauge_.set(1); |
There was a problem hiding this comment.
I think it would be useful to also have a gauge stat which can capture the "scaling" state. Feel to leave as a TODO for now.
| double value = 1 [(validate.rules).double = {lte: 1.0 gte: 0.0}]; | ||
| } | ||
|
|
||
| message RangeTrigger { |
There was a problem hiding this comment.
Should reuse existing base types for ranges, see https://github.com/envoyproxy/envoy/blob/master/api/envoy/type/v3/range.proto#L39
There was a problem hiding this comment.
Can you enforce min/max value while also use the existing range types?
There was a problem hiding this comment.
I think what we actually have here is closer to https://github.com/envoyproxy/envoy/blob/master/api/envoy/type/v3/percent.proto. Maybe add a range there?
There was a problem hiding this comment.
This seems inconsistent with the existing ThresholdTrigger definition, which uses a double in the range [0, 1].
There was a problem hiding this comment.
htuch: we need a low percentage and a high percentage at which behavior should transition to linear scaling. We need to define a new message type anyway.
| } | ||
|
|
||
| message RangeTrigger { | ||
| // If the resource pressure is greater than this value, the trigger will enter the scaling state. |
There was a problem hiding this comment.
This is the first time the reader encounters "scaling state". It should either be explain in detail or a link to the architecture docs added with :ref: added.
antoniovicente
left a comment
There was a problem hiding this comment.
Looking forward to extensions to overload manager capabilities. Thanks for taking this on.
| // Drop new requests when overloaded as soon as we have decoded the headers. | ||
| if (connection_manager_.overload_stop_accepting_requests_ref_ == | ||
| Server::OverloadActionState::Active) { | ||
| Server::OverloadActionState::saturated()) { |
There was a problem hiding this comment.
Defining saturation/inactive via equality doesn't seem like the most clear possible signal. How about adding a bool isSaturated() const accessor?
if (connection_manager_.overload_stop_accepting_requests_ref_.isSaturated()) {
There was a problem hiding this comment.
That doesn't seem like it scales very well; we'd end up with isSaturated(), isInactive(), and isScaling() to cover the three possible ranges. The options I explored were
- using a
variant<inactive{}, scaling{double}, saturated{}>, but that makes call sites messier and wastes space with the tag - adding a
state()method that returns an enum that is one of{Inactive, Scaling, Saturated} - adding a
value()method that returns the scale value for the scaling case - but then what does it return for the other cases?
Here's another option: I could get rid of the == and != operators and force callers to rely on <, <=, >, and >=. It makes it clear that there's an ordering for the values and doesn't expose double equality comparison.
There was a problem hiding this comment.
I can see us wanting methods that tell us if the resource is saturated or if it is operating in the scaled linear region.
isInactive() seems like an anti-pattern. You want !isSaturated() or scaleFactor() == 0 depending on what kind of resource you're operating on.
Are the comparison operators really needed?
| CMP_OPERATOR(<=); | ||
| CMP_OPERATOR(>=); | ||
| CMP_OPERATOR(>); | ||
| #undef CMP_OPERATOR |
There was a problem hiding this comment.
Is there precedent for implementing all of these operators?
Are operators other than == and != used?
There was a problem hiding this comment.
No, only == and != are used currently, but we'll want the rest of these for comparing state objects in the scaling regime.
| } | ||
|
|
||
| OverloadTimerFactory getTimerFactory() override { | ||
| return [this](absl::string_view /*unused*/, std::function<void()> callback) { |
There was a problem hiding this comment.
Use of an enum type that we can cast to to int would provide more efficient lookup than string_view.
There was a problem hiding this comment.
Do we want this to be available as an extension point? I could see implementers of custom filters and such wanting to register their own timers for scaling in response to load.
There was a problem hiding this comment.
Does it matter what kind of timer is being created? The actual timeout value is an input to the enable method.
What if Timer had a method enableTimerRange(const std::chrono::milliseconds& min, const std::chrono::milliseconds& max) that scales timeout based on overload manager state?
| connection_idle_timer_ = read_callbacks_->connection().dispatcher().createTimer( | ||
| [this]() -> void { onIdleTimeout(); }); | ||
| connection_idle_timer_ = | ||
| overload_timer_factory_(kConnectionIdleTimerLabel, [this]() -> void { onIdleTimeout(); }); |
There was a problem hiding this comment.
Should we consider some changes to how timers are created and enabled?
The thing to realize here is the pattern of creating a timer and immediately enabling it. Should the factory create and enable the timer internally? Doing that would provide us with some freedom regarding representation of timers that have similar deadline deltas. The current Timer class that accepts a delta on enableTimer, we may want a class that takes the timeout delta as a constructor argument and provides methods to register/unregister for that timeout.
There was a problem hiding this comment.
I'm not against changing the timer interface, but I think that should be addressed in another PR. That would also allow us to get rid of the redundancies of the enableTimer and enableHRTimer methods - if you construct a timer with a high-resolution value, it is a HR timer, otherwise a regular timer.
There was a problem hiding this comment.
You can call enableTimer with different values on the same timer object.
The issue of standard vs HR timers is separate. I think that based on how timers are usually created we only need enableHRTimer. I have a pending change that may obsolete the need for enableHRTimer.
| })) { | ||
| if (overload_manager.registerForAction( | ||
| action_name, dispatcher, [this](Server::OverloadActionState state) { | ||
| active_ = (state == Server::OverloadActionState::saturated()); |
There was a problem hiding this comment.
variable naming consistency: active_ is true when state is saturated(). Should the variable be called saturated_?
There was a problem hiding this comment.
I was trying to touch this code as little as possible, hence the minimal change. I'd suggest that since the heap shrinker can't be partially active, it doesn't make sense to talk about saturation here, and to leave this as a boolean active_.
| double value = 1 [(validate.rules).double = {lte: 1.0 gte: 0.0}]; | ||
| } | ||
|
|
||
| message RangeTrigger { |
There was a problem hiding this comment.
Can you enforce min/max value while also use the existing range types?
| return false; | ||
| } | ||
| const auto trigger_new_state = it->second->actionState(); | ||
| if (trigger_new_state == OverloadActionState::inactive()) { |
There was a problem hiding this comment.
Comparing to OverloadActionState::inactive() seems like an anti-pattern.
| /** | ||
| * Factory function exposed by the overload manager for creating timers. | ||
| */ | ||
| using OverloadTimerFactory = std::function<Event::TimerPtr(absl::string_view, Event::TimerCb)>; |
There was a problem hiding this comment.
Can you document the meaning of the first argument to this factory method function signature?
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Log only when it changes region, not when it changes value within the scaling region. Signed-off-by: Alex Konradi <akonradi@google.com>
Use the "active" stat only for threshold triggers, and "scale_value" stat for range triggers. Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
|
Followed @eziskind's suggestion to have different stats for different triggers and added more documentation for the overload manager. |
This should not be used by consumers of the ThreadLocalOverloadState API, only by the OverloadManagerImpl. Signed-off-by: Alex Konradi <akonradi@google.com>
…imeouts Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
|
I'm realizing that inserting the Overload Manager in front of every HCM timer creation is premature. I'm going to revert that here and leave that for another PR, so this one can focus on the range trigger addition. |
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Change the Overload Manager API to extend the overload action state with non-binary values. This will allow future overload actions to take effect in response to increasing load, instead of to the existing inactive/active binary values. This PR also adds a range field to the overload trigger config, though no actions currently trigger on the 'scaling' state. The existing behavior is preserved by replacing all places where OverloadActionState::Active was being used with OverloadActionState::saturated(). This is a minimal re-hashing of #11697 for ##11427. Risk Level: medium Testing: ran unit and integration tests Docs Changes: none Release Notes: add "scaling" trigger for OverloadManager actions Signed-off-by: Alex Konradi <akonradi@google.com>
Commit Message:
Change the Overload Manager API to extend the overload action state with non-binary values. This will allow future overload actions to take effect in response to increasing load, instead of to the existing inactive/active binary values. This PR also adds a range field to the overload trigger config, though no actions currently trigger on the 'scaling' state.
Additional Description:
The existing behavior is preserved by replacing all places where
OverloadActionState::Activewas being used withOverloadActionState::saturated().Risk Level: medium
Testing: ran unit and integration tests
Docs Changes: none
Release Notes: add "range" trigger for OverloadManager actions
#11427
/cc @antoniovicente
/cc @eziskind