-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
Error Handling and Macro Usage Policy Updates and Clarifications
Description:
Envoy contains over 1000 ASSERTs in core code, and over 400 exist in router and HTTP components. In some cases, ASSERTs are not used as statements of true invariants, but used incorrectly to indicate possible errors on the data plane. Especially as Envoy moves towards a hardened posture against extensions, dependencies, and upstreams, macro usage and when proper error handling is needed should be clarified. There is some evidence that ASSERTs are not consistently used for true invariants, either because of wishful assumptions on incoming traffic or from mistaking the invariant (GHSA-gxvv-x4p2-rppp, #13756, #9996, #9509).
In addition to broadening and clarifying types of errors that should be handled, this proposal also suggests adding guidelines for using ENVOY_BUGs, which can be used in addition to error handling and as a way to detect violations in release mode. This could especially help debugging complex crashes, where a condition was violated at some earlier point in time.
A link the the full proposal is here.
Summary
A high level of the changes, with significant differences bolded.
Gracefully handle all errors that may be caused by
- Untrusted data plane traffic (from downstreams, upstreams, or extensions like filters)
- Raised by the Envoy process environment and are plausible to happen (reading bad data from a runtime file, system calls that require permissions, network errors).
- Configuration related errors after startup (e.g. loading configuration at runtime, this is implicit from above)
- Third party dependency return codes (e.g. use of a 3P parser, an external RPC that requires permissions). Some return codes may be handled simply by continuing (e.g. calling an RPC out of process can be handled by “continuing”).
Macros:
- RELEASE_ASSERT: fatal check
- ASSERT: compiled out of opt mode, fatal in debug mode
- ENVOY_BUG: compiled in opt mode and log, fatal in debug mode
- ENVOY_BUG_ALPHA (effectively ENVOY_BUG): Used for alpha/rapidly changing protocols that want to have detectability on probable invariants. - ENVOY_BUG_INTERNAL (compiled by default to ASSERT, can be compiled to ENVOY_BUG): provable, bounded internal invariant. No error handling needed. If this is a weakly held belief that would result in misbehavior if triggered, use ENVOY_BUG and add error handling.
Some clarifications/additions in macro usage philosophy:
- ENVOY_BUGs provide detectability and more confidence than an ASSERT. This is useful for non-trivial conditions, those with complex control flow, and in rapidly changing protocols. Testing should be added to ensures Envoy can continue to operate even if an ENVOY_BUG condition is hit. Otherwise, there is no point in having it compile in release mode and increment a stat on failure.
- Do not ASSERT on conditions imposed by the external environment. Add error handling.
- ASSERTs should be standalone and understandable to a reader. Add comments.
- ASSERTs should be robust to future changes. For example, an ASSERT can document the internal state of a class or an internal post-condition that always holds due to a preceding call.
In addition, the main addition to the proposal is having clear guidelines. A high level bucketing can show potential usages of ASSERT/RELEASE_ASSERT, ENVOY_BUG, and error handling. There is always room for cross-over, dotted lines indicate places where that may happen.
Open Questions
- As time goes on, ENVOY_BUGs should be monitored in production and triaged upstream if needed. Logging levels/severity levels for ENVOY_BUGs may be useful?
- Can we provide tools to help users monitor ENVOY_BUG performance? Should incrementing a stat be configurable in case this is too expensive?
Proposed Actions
- Reach consensus on some of the additional places errors should be handled
- Update STYLE.md guidelines with examples
- Add and document ENVOY_BUG_INTERNAL/ASSERT_INVARIANT for internal class invariants that is ASSERT by default but can be configured to ENVOY_BUG.
- Hopefully ask reviewers and developers to move towards these standards over time
Relevant Links
#9087
