Skip to content

Add overload manager for Envoy#3954

Merged
mattklein123 merged 6 commits intoenvoyproxy:masterfrom
eziskind:overload2
Jul 31, 2018
Merged

Add overload manager for Envoy#3954
mattklein123 merged 6 commits intoenvoyproxy:masterfrom
eziskind:overload2

Conversation

@eziskind
Copy link
Copy Markdown
Contributor

Add an overload manager module to protect an Envoy instance from resource starvation.
For issue #373 - see design doc linked from there

Risk level: low (not linked into envoy main yet)

Signed-off-by: Elisha Ziskind eziskind@google.com

Signed-off-by: Elisha Ziskind <eziskind@google.com>
Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's headed in the right direction. I put down some initial review comments for you to look at in addition to the failing test.

public:
ThresholdTriggerImpl(const envoy::config::overload::v2alpha::ThresholdTrigger& config)
: threshold_(config.value()) {
RELEASE_ASSERT(threshold_ >= 0, "");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration validation should normally throw EnvoyException rather than crashing the server with RELEASE_ASSERT.

This one in particular, however, you could implement with the PGV (see https://github.com/envoyproxy/envoy/blob/master/api/envoy/type/percent.proto#L13 for an example). I think you'll need to invoke MessageUtil::validate on the OverloadManager config object somewhere for this to work. That may come for free later when OverloadManager is added to the config hierarchy.

trigger = std::make_unique<ThresholdTriggerImpl>(trigger_config.threshold());
break;
case envoy::config::overload::v2alpha::Trigger::TRIGGER_ONEOF_NOT_SET:
RELEASE_ASSERT(false, "missing trigger");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: I think we prefer

default:
  NOT_REACHED_GCOVR_EXCL_LINE;

}

const auto result = triggers_.insert(std::make_pair(trigger_config.name(), std::move(trigger)));
RELEASE_ASSERT(result.second, "duplicate trigger");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, use an EnvoyException here. Likewise in the RELEASE_ASSERT statements below, when used for config validation.

OverloadManagerImpl(Event::Dispatcher& dispatcher,
const envoy::config::overload::v2alpha::OverloadManager& config);

void start() override;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: use // Server::OverloadManager to indicate the interface you're implementing here (and group the interface method overrides without blank lines)

Resource(const std::string& name, ResourceMonitorPtr monitor, OverloadManagerImpl& manager)
: name_(name), monitor_(std::move(monitor)), manager_(manager), pending_update_(false) {}

void onSuccess(const ResourceUsage& usage) override;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ResourceMonitor::Callbacks

void OverloadManagerImpl::updateResourcePressure(const std::string& resource, double pressure) {
auto action_range = resource_to_actions_.equal_range(resource);
typedef std::unordered_multimap<std::string, std::string> ActionMap;
typedef std::unordered_multimap<std::string, ActionCallback> CallbackMap;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider putting these typedefs in the class definition.

typedef std::unordered_multimap<std::string, ActionCallback> CallbackMap;
std::for_each(action_range.first, action_range.second, [&](ActionMap::value_type& entry) {
const std::string& action = entry.second;
auto action_it = actions_.find(entry.second);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use action instead of entry.second.


message Trigger {
// The name of the resource this is a trigger for.
string name = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a validation rule to prevent empty strings?

message ThresholdTrigger {
// If the resource metric (ie. pressure) is greater than or equal to this value, the trigger
// will fire.
double value = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to a previous comment, if this has a limited range, it should be documented/validated here.

* Register a callback to be invoked when the specified action changes state (ie. becomes
* activated or inactivated). The callback is posted with the given dispatcher.
* Must only be called during Envoy initialization when still in single-threaded mode.
*/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: Please add param tags for the parameters.

eziskind added 2 commits July 26, 2018 14:22
Signed-off-by: Elisha Ziskind <eziskind@google.com>
The percent protobuf allows values 0..100, while we just want a fraction from 0 to 1. Added PGV validation rule.

Signed-off-by: Elisha Ziskind <eziskind@google.com>
@eziskind
Copy link
Copy Markdown
Contributor Author

Thanks for the review - all feedback should be addressed now.

Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment from me.

Probably @mattklein123 or @htuch should have a look now.

const bool active = isActive();

auto it = triggers_.find(name);
RELEASE_ASSERT(it != triggers_.end(), "");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like program invariants to me, in which case they should probably stay plain ASSERTs. (per https://github.com/envoyproxy/envoy/blob/master/STYLE.md#error-handling).

I think this is true for most if not all the RELEASE_ASSERTs you have.

Signed-off-by: Elisha Ziskind <eziskind@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Excited to see this land. Some questions and small comments.

@@ -0,0 +1,63 @@
syntax = "proto3";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make sure this shows up in the docs? I don't think it will without including this new package, but I'm not sure. Also, I think it would be worthwhile to add a new short arch overview section on this functionality and briefly describe the built-in capabilities. https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/arch_overview. Perhaps a new section on "Overload manager?"

Also we might also want some docs here with more details that finally link into the v2 docs: https://www.envoyproxy.io/docs/envoy/latest/configuration/configuration. Unclear if that is worth it or not if you have arch overview, so we can see how it goes?

Alternatively, if you want to do this in a follow up when you actually implement the listener actions, that's fine also. Can we check pending items like docs/stats/etc. in a tracking issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning on adding docs in an upcoming PR when I hook the overload manager config into the bootstrap proto. We can use the original issue (#373) to track this unless you prefer a separate one.


import "validate/validate.proto";

// The Overload Manager provides an extensible framework to protect Envoy instances
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need proto title for docs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do in a separate docs PR.


message ResourceMonitor {
// The name of the resource monitor to instantiate. Must match a registered
// resource monitor type.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we list out the built-in ones? See other similar examples in docs for logging, filters, etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto - deferring to a separate docs PR

message OverloadAction {
// The name of the overload action. This is just a well-known string that listeners can
// use for registering callbacks. Custom overload actions should be named using reverse
// DNS to ensure uniqueness.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any built-in ones?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None yet. I'll be adding some in a future PR and will update the docs here as part of that.

* changes state
*/
virtual void registerForAction(const std::string& action, Event::Dispatcher& dispatcher,
std::function<void(bool)> callback) PURE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of bool can we create an enum with clear state names? In addition to docs this might be useful in the future if we need more states.

virtual ~Trigger() {}

// Updates the current value of the metric and returns whether the trigger has changed state.
virtual bool updateValue(double value) = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/= 0/PURE. Same elsewhere.

namespace Envoy {
namespace Server {

class OverloadActionImpl {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since this doesn't actually implement any interface I would call this OverloadAction

eziskind added 2 commits July 30, 2018 15:28
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
@eziskind
Copy link
Copy Markdown
Contributor Author

tsan failure is unrelated to my change:
@envoy//test/extensions/filters/network/thrift_proxy:filter_integration_test TIMEOUT in 315.0s

@mattklein123
Copy link
Copy Markdown
Member

@eziskind I kicked the test. Will take another pass tomorrow.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, happy to see this land.

@mattklein123 mattklein123 merged commit 83b9e2d into envoyproxy:master Jul 31, 2018
@eziskind eziskind deleted the overload2 branch July 31, 2018 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants