Add overload manager for Envoy#3954
Conversation
Signed-off-by: Elisha Ziskind <eziskind@google.com>
zuercher
left a comment
There was a problem hiding this comment.
This looks like it's headed in the right direction. I put down some initial review comments for you to look at in addition to the failing test.
| public: | ||
| ThresholdTriggerImpl(const envoy::config::overload::v2alpha::ThresholdTrigger& config) | ||
| : threshold_(config.value()) { | ||
| RELEASE_ASSERT(threshold_ >= 0, ""); |
There was a problem hiding this comment.
Configuration validation should normally throw EnvoyException rather than crashing the server with RELEASE_ASSERT.
This one in particular, however, you could implement with the PGV (see https://github.com/envoyproxy/envoy/blob/master/api/envoy/type/percent.proto#L13 for an example). I think you'll need to invoke MessageUtil::validate on the OverloadManager config object somewhere for this to work. That may come for free later when OverloadManager is added to the config hierarchy.
| trigger = std::make_unique<ThresholdTriggerImpl>(trigger_config.threshold()); | ||
| break; | ||
| case envoy::config::overload::v2alpha::Trigger::TRIGGER_ONEOF_NOT_SET: | ||
| RELEASE_ASSERT(false, "missing trigger"); |
There was a problem hiding this comment.
style nit: I think we prefer
default:
NOT_REACHED_GCOVR_EXCL_LINE;
| } | ||
|
|
||
| const auto result = triggers_.insert(std::make_pair(trigger_config.name(), std::move(trigger))); | ||
| RELEASE_ASSERT(result.second, "duplicate trigger"); |
There was a problem hiding this comment.
As mentioned above, use an EnvoyException here. Likewise in the RELEASE_ASSERT statements below, when used for config validation.
| OverloadManagerImpl(Event::Dispatcher& dispatcher, | ||
| const envoy::config::overload::v2alpha::OverloadManager& config); | ||
|
|
||
| void start() override; |
There was a problem hiding this comment.
Style nit: use // Server::OverloadManager to indicate the interface you're implementing here (and group the interface method overrides without blank lines)
| Resource(const std::string& name, ResourceMonitorPtr monitor, OverloadManagerImpl& manager) | ||
| : name_(name), monitor_(std::move(monitor)), manager_(manager), pending_update_(false) {} | ||
|
|
||
| void onSuccess(const ResourceUsage& usage) override; |
| void OverloadManagerImpl::updateResourcePressure(const std::string& resource, double pressure) { | ||
| auto action_range = resource_to_actions_.equal_range(resource); | ||
| typedef std::unordered_multimap<std::string, std::string> ActionMap; | ||
| typedef std::unordered_multimap<std::string, ActionCallback> CallbackMap; |
There was a problem hiding this comment.
Consider putting these typedefs in the class definition.
| typedef std::unordered_multimap<std::string, ActionCallback> CallbackMap; | ||
| std::for_each(action_range.first, action_range.second, [&](ActionMap::value_type& entry) { | ||
| const std::string& action = entry.second; | ||
| auto action_it = actions_.find(entry.second); |
There was a problem hiding this comment.
nit: Use action instead of entry.second.
|
|
||
| message Trigger { | ||
| // The name of the resource this is a trigger for. | ||
| string name = 1; |
There was a problem hiding this comment.
Should this have a validation rule to prevent empty strings?
| message ThresholdTrigger { | ||
| // If the resource metric (ie. pressure) is greater than or equal to this value, the trigger | ||
| // will fire. | ||
| double value = 1; |
There was a problem hiding this comment.
Related to a previous comment, if this has a limited range, it should be documented/validated here.
| * Register a callback to be invoked when the specified action changes state (ie. becomes | ||
| * activated or inactivated). The callback is posted with the given dispatcher. | ||
| * Must only be called during Envoy initialization when still in single-threaded mode. | ||
| */ |
There was a problem hiding this comment.
style nit: Please add param tags for the parameters.
Signed-off-by: Elisha Ziskind <eziskind@google.com>
The percent protobuf allows values 0..100, while we just want a fraction from 0 to 1. Added PGV validation rule. Signed-off-by: Elisha Ziskind <eziskind@google.com>
|
Thanks for the review - all feedback should be addressed now. |
zuercher
left a comment
There was a problem hiding this comment.
One more comment from me.
Probably @mattklein123 or @htuch should have a look now.
| const bool active = isActive(); | ||
|
|
||
| auto it = triggers_.find(name); | ||
| RELEASE_ASSERT(it != triggers_.end(), ""); |
There was a problem hiding this comment.
These look like program invariants to me, in which case they should probably stay plain ASSERTs. (per https://github.com/envoyproxy/envoy/blob/master/STYLE.md#error-handling).
I think this is true for most if not all the RELEASE_ASSERTs you have.
Signed-off-by: Elisha Ziskind <eziskind@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Looks great. Excited to see this land. Some questions and small comments.
| @@ -0,0 +1,63 @@ | |||
| syntax = "proto3"; | |||
There was a problem hiding this comment.
Can you make sure this shows up in the docs? I don't think it will without including this new package, but I'm not sure. Also, I think it would be worthwhile to add a new short arch overview section on this functionality and briefly describe the built-in capabilities. https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/arch_overview. Perhaps a new section on "Overload manager?"
Also we might also want some docs here with more details that finally link into the v2 docs: https://www.envoyproxy.io/docs/envoy/latest/configuration/configuration. Unclear if that is worth it or not if you have arch overview, so we can see how it goes?
Alternatively, if you want to do this in a follow up when you actually implement the listener actions, that's fine also. Can we check pending items like docs/stats/etc. in a tracking issue?
There was a problem hiding this comment.
I'm planning on adding docs in an upcoming PR when I hook the overload manager config into the bootstrap proto. We can use the original issue (#373) to track this unless you prefer a separate one.
|
|
||
| import "validate/validate.proto"; | ||
|
|
||
| // The Overload Manager provides an extensible framework to protect Envoy instances |
There was a problem hiding this comment.
Will need proto title for docs.
There was a problem hiding this comment.
Will do in a separate docs PR.
|
|
||
| message ResourceMonitor { | ||
| // The name of the resource monitor to instantiate. Must match a registered | ||
| // resource monitor type. |
There was a problem hiding this comment.
Can we list out the built-in ones? See other similar examples in docs for logging, filters, etc.
There was a problem hiding this comment.
ditto - deferring to a separate docs PR
| message OverloadAction { | ||
| // The name of the overload action. This is just a well-known string that listeners can | ||
| // use for registering callbacks. Custom overload actions should be named using reverse | ||
| // DNS to ensure uniqueness. |
There was a problem hiding this comment.
None yet. I'll be adding some in a future PR and will update the docs here as part of that.
| * changes state | ||
| */ | ||
| virtual void registerForAction(const std::string& action, Event::Dispatcher& dispatcher, | ||
| std::function<void(bool)> callback) PURE; |
There was a problem hiding this comment.
instead of bool can we create an enum with clear state names? In addition to docs this might be useful in the future if we need more states.
| virtual ~Trigger() {} | ||
|
|
||
| // Updates the current value of the metric and returns whether the trigger has changed state. | ||
| virtual bool updateValue(double value) = 0; |
There was a problem hiding this comment.
nit: s/= 0/PURE. Same elsewhere.
| namespace Envoy { | ||
| namespace Server { | ||
|
|
||
| class OverloadActionImpl { |
There was a problem hiding this comment.
nit: Since this doesn't actually implement any interface I would call this OverloadAction
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
|
tsan failure is unrelated to my change: |
|
@eziskind I kicked the test. Will take another pass tomorrow. |
mattklein123
left a comment
There was a problem hiding this comment.
Looks great, happy to see this land.
Add an overload manager module to protect an Envoy instance from resource starvation.
For issue #373 - see design doc linked from there
Risk level: low (not linked into envoy main yet)
Signed-off-by: Elisha Ziskind eziskind@google.com