WatchDog Extension hook#12416
Conversation
Signed-off-by: Kevin Baichoo <kbaichoo@google.com> WIP, added calls to callbacks and structures to store registered callbacks. Signed-off-by: Kevin Baichoo <kbaichoo@google.com> Exposed watchdog actions from configuration, hooked up guarddog to be able to check the registry for callbacks. Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
…tion object (instead of a plain cb) for the registered actions in case we need internal state, set up scaffolding for test. Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com> Slight cleanup. Signed-off-by: Kevin Baichoo <kbaichoo@google.com> Added an additional GuardDogActionFactory to clarify different action behaviors, and use RELEASE_ASSERT with messages instead of different signals. Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
| // If not specified the default is 0. | ||
| type.v3.Percent multikill_threshold = 5; | ||
|
|
||
| // Register actions that will fire on given WatchDog events. |
There was a problem hiding this comment.
you may want to say something about any guarantees about ordering that you intend to provide. They are mostly relevant for the case of KILL/MULTIKILL where it may be useful to specify several debug related actions followed by an alternate FATAL action used to replace the default guarddog initiated fatal action.
antoniovicente
left a comment
There was a problem hiding this comment.
Looking pretty good. Some minor comments below.
source/server/guarddog_impl.h
Outdated
| const std::chrono::milliseconds loop_interval_; | ||
| Stats::Counter& watchdog_miss_counter_; | ||
| Stats::Counter& watchdog_megamiss_counter_; | ||
| using EventToActionsMap = std::unordered_map<WatchDogAction::WatchdogEvent, |
There was a problem hiding this comment.
Use absl::flat_hash_map instead of std::unordered_map as requested by clang-tidy checks.
source/server/guarddog_impl.cc
Outdated
| const auto& actions = config.wdActions(); | ||
| for (const auto& action : actions) { | ||
| if (action.event() == WatchDogAction::UNKNOWN) { | ||
| ENVOY_LOG_MISC(error, "WatchDogAction specified with UNKNOWN event"); |
There was a problem hiding this comment.
Should we throw an exception on unknown/unspecified event?
There was a problem hiding this comment.
Will throw a ProtoValidationException() as that seemed to be the EnvoyException class that suits this scenario.
test/server/guarddog_impl_test.cc
Outdated
| NiceMock<Stats::MockStore> stats; | ||
| NiceMock<Configuration::MockMain> config(0, 0, 0, 0, 0); | ||
| std::vector<std::string> actions; | ||
| NiceMock<Configuration::MockMain> config(0, 0, 0, 0, 0, actions); |
There was a problem hiding this comment.
You could also write:
NiceMockConfiguration::MockMain config(0, 0, 0, 0, 0, {});
There was a problem hiding this comment.
Ended up going with
NiceMockConfiguration::MockMain config(0, 0, 0, 0, 0, std::vector<std::string>{});
as with plain {} the compiler was struggling with the type deduction.
test/server/guarddog_impl_test.cc
Outdated
| // the events vector passed to it. | ||
| // Instances of this class will be registered for GuardDogEvent through | ||
| // TestGuardDogActionFactory. | ||
| class LogGuardDogAction : public Configuration::GuardDogAction { |
There was a problem hiding this comment.
nit: RecordGuardDogAction may be a more accurate name.
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
|
/ retest |
|
🐴 hold your horses - no failures detected, yet. |
include/envoy/server/guarddog.h
Outdated
| #include "envoy/common/time.h" | ||
| #include "envoy/config/bootstrap/v3/bootstrap.pb.h" | ||
| #include "envoy/server/watchdog.h" | ||
| #include "envoy/thread/thread.h" |
There was a problem hiding this comment.
nit: Are envoy/config/bootstrap/v3/bootstrap.pb.h and/or envoy/common/time.h used by this header?
There was a problem hiding this comment.
Good catch, must have happened at some point in development (but it's unnecessary)
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
|
/assign @dio I think you're the on-call maintainer, so I'm supposed to assign it to you. PTAL. Thanks! |
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
|
IIUC now, the on-call maintainer just triages and figures out who to assign the bug. I think I'm suppose to ask: @envoyproxy/senior-maintainers to take a look. PTAL. |
mattklein123
left a comment
There was a problem hiding this comment.
LGTM with small API/doc comments. Thank you!
/wait
| option (udpa.annotations.versioning).previous_message_type = "envoy.config.bootstrap.v2.Watchdog"; | ||
|
|
||
| message WatchdogAction { | ||
| // In priority events are fired from KILL, MULTIKILL, MEGAMISS and MISS. |
There was a problem hiding this comment.
What is an "in priority event" ? Typo? Clarify?
There was a problem hiding this comment.
The events are fired in this order: KILL, MULTIKILL, MEGAMISS, MISS.
Changed it to the above to clarify.
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM modulo the doc nits. Don't feel you have to spend a lot of time on that. We can come back and fix them later so feel free to revert part of what you changed if it's not obvious. Thank you!
/wait
docs/root/extending/extending.rst
Outdated
| * Transport sockets | ||
| * BoringSSL private key methods | ||
| * :ref:`Transport sockets <envoy_v3_api_field_extensions.transport_sockets.tls.v3.CommonTlsContext.CertificateProvider.typed_config>` | ||
| * :ref:`BoringSSL FIPS <arch_overview_ssl_fips>` |
There was a problem hiding this comment.
Is this an extension point? Did you mean to ref link to the private key methods or have a different title?
There was a problem hiding this comment.
will revert the BoringSSL FIPs since I'm not entirely sure. I was basing this on pr #6326 but might be missing some context.
docs/root/extending/extending.rst
Outdated
| * :ref:`Transport sockets <envoy_v3_api_field_extensions.transport_sockets.tls.v3.CommonTlsContext.CertificateProvider.typed_config>` | ||
| * :ref:`BoringSSL FIPS <arch_overview_ssl_fips>` | ||
| * :ref:`Watchdog action <envoy_v3_api_msg_config.bootstrap.v3.Watchdog.WatchdogAction>` | ||
| * :ref:`HTTP internal redirects <arch_overview_internal_redirects>` |
There was a problem hiding this comment.
Is this an extension point?
There was a problem hiding this comment.
This was based off of pr #10908.
In https://github.com/envoyproxy/envoy/blob/master/api/envoy/config/route/v3/route_components.proto#L1722 there's a core.v3.TypedExtensionConfig. I linked arch_overview_internal_redirects as in the PR where the change was made http_connection_management.rst was modified primarily in that section it seems to accommodate.
I will revert here, but if you think this makes sense I'm happy to add this in another PR.
There was a problem hiding this comment.
I see, OK. I guess I would mildly prefer that we ref link directly to the TypedExtensionConfig field to make it more obvious along with any arch docs. In any case, I don't want to hold up your PR, but any follow ups to improve the docs would be appreciated!
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Awesome work! Just a few small comments and we can ship.
/wait
| // For KILL/MULTIKILL there is a default PANIC that will kill the process, | ||
| // but it might be useful to specify several debug actions, and possibly |
There was a problem hiding this comment.
Sorry can you clarify this a bit? Does specifying any config for KILL/MULTIKILL override the default PANIC action? It's not clear from the doc.
There was a problem hiding this comment.
Changed it to specify the PANIC() call runs after the registered actions for that case if the process wasn't yet killed.
source/server/guarddog_impl.cc
Outdated
| if (action.event() == WatchDogAction::UNKNOWN) { | ||
| throw ProtoValidationException("WatchDogAction specified with UNKNOWN event", action); |
There was a problem hiding this comment.
Please a PGV annotation on the proto for enum defined only. That will make this impossible and checked by the schema.
There was a problem hiding this comment.
Great idea, thanks for pointing it out. Done.
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com> Commit Message: Updated the extending envoy documentation to include a missing extension point. Additional Description: See PR #12416 for additional context of this. Risk Level: low Testing: built the documents to ensure they generate correctly. Docs Changes: Release Notes:
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
* master: (67 commits) logger: support log control in admin interface and command line option for Fancy Logger (envoyproxy#12369) test: fix http_timeout_integration_test flake (envoyproxy#12654) [fuzz]added an input check in writefilter fuzzer and added test cases (envoyproxy#12628) add 'explicit' restriction. (envoyproxy#12643) scoped_rds_integration_test migrate from api v2 to api v3. (envoyproxy#12633) fuzz: added fuzz test for listener filter tls_inspector (envoyproxy#12617) testing: fix multiple race conditions in simulated time tests (envoyproxy#12527) [tls] Move handshaking behavior into SslSocketInfo. (envoyproxy#12571) header: getting rid of exception-throwing behaviors in header files [the rest] (envoyproxy#12611) router: add new ratelimited retry backoff strategy (envoyproxy#12202) [redis_proxy] added a constraint for route.prefix().size() (envoyproxy#12637) network: add tcp listener backlog config (envoyproxy#12625) runtime: debug log that condition is always true when fractionalPercent numerator > denominator (envoyproxy#12068) WatchDog Extension hook (envoyproxy#12416) router: add dynamic metadata header formatter (envoyproxy#11858) statsd: revert visibility to public (envoyproxy#12621) Fix regression of /build_* in gitignore (envoyproxy#12630) Added a missing extension point to documentation. (envoyproxy#12620) Reverts proxy protocol test on windows (envoyproxy#12619) caching: Improved the tests and coverage of the CacheFilter tree (envoyproxy#12544) ... Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Commit Message: Establish an extension point for actions to run based on Watch Dog Events.
Additional Description:
Risk Level: Medium
Testing: unit tests
Docs Changes: N/A
Release Notes: Included.
Fixes #11388