This is a tracking issue and design overview for removing exceptions:
Title: Remove exceptions in Envoy's HTTP/1 and HTTP/2 codecs
Objective:
Remove exceptions in the presence of untrusted traffic, specifically on codec errors in the data plane, with careful attention to codec callbacks and resumption points to ensure that control flow is preserved. This is a moderate-risk change that will be protected by a runtime feature flag.
Background:
There are performance and security concerns around the use of C++ exceptions in the presence of untrusted traffic. Most simply, exceptions on the data plane are error-prone and open the door to denial-of-service attacks, both through resource exhaustion on exception paths and through queries of death resulting from uncaught exceptions (#10475). In addition, throwing exceptions across a language boundary, as happens in the C callbacks of http_parser and nghttp2, should generally be avoided.
Finally, Envoy deployments at scale are potentially vulnerable to NetSpectre attacks. While it comes with a performance cost, compiling the Envoy binary with Speculative Load Hardening (SLH) is a mitigation technique against Spectre v1 style exploits for TLS deployments. However, protecting exceptions is considerably harder and more complex than normal code, and SLH will not support exceptions. While Envoy’s data plane and control plane are compiled as a single binary, we can narrow our concern to code paths that can be trained with real executions, and then misspeculated with hostile data. This limits the attack surface and scope of the removal to Envoy’s data plane, and specifically, when dispatching incoming connection data to codecs.
Envoy’s HTTP/1 and HTTP/2 codec implementations make use of exceptions in codec callbacks for control flow. These exceptions, defined in exception.h, may signal non-recoverable HTTP protocol errors, client-side errors, buffer flooding, and, in the case of HTTP/1, premature responses. The HTTP/1 codec contains 13 exception callsites, while the HTTP/2 codec contains 7. Although the number of callsites is low, careful attention to the callbacks and resumption points is required to ensure that control flow is preserved.
Design Overview:
See the proof of concept: #10484
Our core design idea is to replace the use of exceptions for control flow with callback error codes and local variables that hold information about the error that occurred during the callback. Error status will have four Envoy-specific types mirroring the subclasses of exceptions. These error statuses will be handled in the HTTP Connection Manager and codec client where a connection dispatches incoming connection data to the codecs.
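As a rough illustration of the idea above, the sketch below shows a status value with four error categories and a callback that returns it instead of throwing. It is a minimal, self-contained stand-in, not the Status class from #10550: the enum names are assumptions derived from the error categories listed in the background, and `onHeaderValue` and its size limit are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical error categories, one per legacy exception kind (names assumed).
enum class StatusCode {
  Ok,
  CodecProtocolError,
  CodecClientError,
  BufferFloodError,
  PrematureResponseError,
};

// Minimal status value: an error category plus a human-readable message.
class Status {
public:
  Status() = default; // Defaults to Ok with an empty message.
  Status(StatusCode code, std::string message)
      : code_(code), message_(std::move(message)) {}

  bool ok() const { return code_ == StatusCode::Ok; }
  StatusCode code() const { return code_; }
  const std::string& message() const { return message_; }

private:
  StatusCode code_{StatusCode::Ok};
  std::string message_;
};

inline Status okStatus() { return Status(); }

inline Status codecProtocolError(std::string message) {
  return Status(StatusCode::CodecProtocolError, std::move(message));
}

// Hypothetical codec callback: where the legacy code would throw, this version
// returns an error status that the caller propagates out of dispatch.
inline Status onHeaderValue(const std::string& value, std::size_t max_size) {
  if (value.size() > max_size) {
    return codecProtocolError("headers size exceeds limit");
  }
  return okStatus();
}
```

A caller would check `ok()` after each callback and stop parsing on the first error, which is the point at which the legacy code would have unwound the stack.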
A local boolean dispatching_ in both codecs’ connections will be used to indicate a state under a dispatch call. Since we will only handle error statuses during a call to dispatch, it is only safe to replace the exceptions with an error status while we are dispatching. We will use debug ASSERTs to validate that the error statuses are set in a dispatching context. If some exceptional circumstance arises and the codec method throws outside of a dispatching context, we will instead RELEASE_ASSERT as this would have been an uncaught exception. In places where the input may be from filters, for example, RequestEncoder::encodeHeaders, proper error handling should be added to guard against intermediate filters causing errors.
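The dispatching guard described above might look roughly like the following. This is a simplified sketch under stated assumptions: `ConnectionSketch`, `onData`, `reportError`, and `hypotheticalNonDispatchCall` are invented for illustration, errors are carried as plain strings rather than status objects, and `std::abort` stands in for RELEASE_ASSERT.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a codec connection; not Envoy's actual class.
class ConnectionSketch {
public:
  // Returns an empty string on success, or the error recorded by a callback.
  std::string dispatch(const std::string& data) {
    dispatching_ = true;
    error_.clear();
    onData(data);
    dispatching_ = false;
    return error_;
  }

  // Hypothetical entry point that is reached outside of dispatch. An error
  // here has no dispatch caller to receive it, so the process aborts.
  void hypotheticalNonDispatchCall() { reportError("error outside dispatch"); }

  bool dispatching() const { return dispatching_; }

private:
  // Toy parser callback: treats empty input as a protocol error.
  void onData(const std::string& data) {
    if (data.empty()) {
      reportError("empty frame");
    }
  }

  void reportError(const std::string& message) {
    if (!dispatching_) {
      // Stand-in for RELEASE_ASSERT: this would have been an uncaught exception.
      std::fprintf(stderr, "fatal: %s\n", message.c_str());
      std::abort();
    }
    // Safe to record: dispatch() returns this error to its caller.
    error_ = message;
  }

  bool dispatching_{false};
  std::string error_;
};
```

The flag is what makes the replacement safe: an error can only be converted into a return value when there is a `dispatch` frame on the stack to return it.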
To facilitate incremental progress, the try/catch block around dispatch will remain while the HCM and codec client are updated to handle dispatch error statuses. The HCM and codec client will hold a local status variable that is set either from the return status of dispatch or from the corresponding caught exception:
envoy/source/common/http/codec_client.cc
Lines 125 to 134 in 7a7facf
    Envoy::Http::Status status;
    try {
      status = codec_->dispatch(data);
      // Exception removal is still in migration. Soon we won't need to catch these exceptions, as
      // they'll be propagated through the callbacks and returned from dispatch.
    } catch (CodecProtocolException& e) {
      status = Envoy::Http::codecProtocolError(e.what());
    } catch (PrematureResponseException& e) {
      status = Envoy::Http::prematureResponseError(e.what(), e.responseCode());
    }
Simultaneous handling of statuses and exceptions allows users to opt-out of the change during the deprecation period by using a runtime override that swaps out forked legacy versions of the codecs. To maintain the legacy codecs during the deprecation period, format rules run on presubmit that compare the legacy and new codecs to an expected “golden” diff that is checked in to Envoy. If a developer makes a codec change, they must ensure that they port the change to the legacy codec, or update the “golden” diff to reflect a change.
The runtime feature flag envoy.reloadable_features.new_codec_behavior and the legacy codec versions will remain for a short deprecation period of about 4-6 weeks while canary testing and fuzzing provide confidence in the change. Fuzz testing will detect any violations of the assumption that codec methods are only called in a dispatching context by crashing on debug ASSERTs placed around throw sites. A simple self-differential fuzz target will also be checked in to compare legacy and new codec behavior when dispatching incoming connection data. The error status type and message returned by the new codec’s dispatch should always match the type and message of the exception thrown by the legacy codec. While canary testing the change, we will monitor for crashes caused by RELEASE_ASSERTs at sites where exceptions were thrown outside of a dispatching context.
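The self-differential comparison could be sketched as follows. Both "codecs" here are toy functions that reject input containing a NUL byte; `ToyCodecProtocolException`, `Outcome`, `legacyDispatch`, `newDispatch`, and `codecsAgree` are all hypothetical names, and a real fuzz target would feed fuzzer-generated bytes to the actual legacy and new codecs instead.

```cpp
#include <stdexcept>
#include <string>

// Normalized result of one dispatch call. An empty type means success.
struct Outcome {
  std::string type;
  std::string message;
};

// Toy exception standing in for a legacy codec exception.
class ToyCodecProtocolException : public std::runtime_error {
public:
  using std::runtime_error::runtime_error;
};

// Toy legacy codec: signals errors by throwing.
void legacyDispatch(const std::string& data) {
  if (data.find('\0') != std::string::npos) {
    throw ToyCodecProtocolException("invalid character");
  }
}

// Toy new codec: signals the same errors through its return value.
Outcome newDispatch(const std::string& data) {
  if (data.find('\0') != std::string::npos) {
    return {"CodecProtocolError", "invalid character"};
  }
  return {};
}

// Runs the legacy codec and converts any exception into an Outcome.
Outcome runLegacy(const std::string& data) {
  try {
    legacyDispatch(data);
  } catch (const ToyCodecProtocolException& e) {
    return {"CodecProtocolError", e.what()};
  }
  return {};
}

// The differential check: both codecs must agree on error type and message.
bool codecsAgree(const std::string& data) {
  const Outcome legacy = runLegacy(data);
  const Outcome updated = newDispatch(data);
  return legacy.type == updated.type && legacy.message == updated.message;
}
```

A fuzz harness built on this shape would call `codecsAgree` on each input and crash on a mismatch, so that any divergence between the two codecs is reported with the input that triggered it.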
Tasks
- Implement Status and StatusOr: "Add Status and StatusOr classes" #10550
- Introduce common codec changes to handle error statuses returned from codec dispatching: "[http] Introduce error status return type for dispatch (no-op)" #10879
- Stage exception removal in HTTP/1: callback exception removal to error status; encodeHeaders, etc. exception removal to error handling
- Stage exception removal in HTTP/2
- When removals are approved, split H/1 and H/2 codecs with a runtime override: "[http] Initial codec splitting with test parametrization" #10591
- Write self-differential codec fuzz target
- Merge staged exception removal PRs
- Monitor and canary changes
- Remove deprecated codecs and runtime switch
@yanavlasov @envoyproxy/security-team