Skip to content

[http] (Proof of concept) Remove an exception from the H/1 codec#10484

Closed
asraa wants to merge 18 commits intoenvoyproxy:masterfrom
asraa:poc
Closed

[http] (Proof of concept) Remove an exception from the H/1 codec#10484
asraa wants to merge 18 commits intoenvoyproxy:masterfrom
asraa:poc

Conversation

@asraa
Copy link
Copy Markdown
Contributor

@asraa asraa commented Mar 23, 2020

This is a proof of concept to remove exceptions from the codec.

  • The flow control stays the same by returning non-zero error code from codec callbacks. Information about the error (for eg, header or trailer overflow) is stored in a local variable codec_exception_ and returned as an absl::Status. Note: this only happens while dispatching.
  • If an error status is detected after a call to dispatch in the HCM or codec, we close the connection in the same way we catch a codec exception.
  • If we are not dispatching, the codec error is treated as a RELEASE_ASSERT.

Testing: See H/1 codec (large diff) for many headers testing that returns an error status

asraa added 2 commits March 20, 2020 16:30
Signed-off-by: Asra Ali <asraa@google.com>
asraa added 2 commits March 23, 2020 13:06
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without diving into the weeds, the basic control flow here makes sense. Is the plan-of-record to have a forked codec with runtime switch?

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level this looks great to me. Flushing out a few high level comments. Thank you so much for working on this!

/wait

try {
codec_->dispatch(data);
auto status = codec_->dispatch(data);
if (!status.ok()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we differentiate between codec exception and premature response exception here to be clear and maybe assert that it's just the codec type for now if the other is not implemented yet?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Yes. It looks like I'd need to differentiate between codec/premature, and also have some information on the HttpCode of the premature response.

We could wrap absl::Status to have that data (and have our defined HttpCodes), or we could have an accessor in to the codecs, for eg getCodecException()/getPrematureResponseException. wdyt? @yanavlasov

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could wrap absl::Status to have that data (and have our defined HttpCodes)

My preference would be for the return value to contain the needed info, similar to the exception it is replacing?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping Status would make it lose some of its value. Why do we need to distinguish between the two failure conditions? For protocol error stats?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we do different logic depending on the exception type currently. This needs to be modeled somehow.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this seems like absl::Status might not be what we want. Specifically its error space does not map to Envoy use cases well. Here we use the "Internal error", which is not strictly correct. absl::Status error space is mostly a reflection of internal Google practices centered around stubby and seems to not be well suited for us.
I think we should define our own type very similar to absl::Status but with our own error space, which would let us map exception types naturally. absl::Status is very small if you take various helpers out. We can use the same handle-body pattern to make it fit into a register and get rid of some absl::Status stuff we will not need and perhaps add stuff that we will need.
However, this will require us to make our own StatusOr<> as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for doing what @yanavlasov suggests if that is the cleanest way of getting what we need. It doesn't seem that bad/hard to duplicate.

if (data.length() > 0) {
for (const Buffer::RawSlice& slice : data.getRawSlices()) {
total_parsed += dispatchSlice(static_cast<const char*>(slice.mem_), slice.len_);
dispatching_ = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider using Cleanup (or similar) here, to avoid any strangeness around not setting disaptching_ back to false in the early return cases. Granted, it might not matter but probably good to be consistent. If this is done you can probably just set dispatching_ above to true with a guard in a single place?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like this pattern as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is similar code in the H2 codec that should be changed to use cleanup also.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be set during maybeDirectDispatch? I think it wouldn't hurt.

Setting dispatch_ at the top of ConnectionImpl::dispatch + cleanup to unset seems like the way to go.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be set during maybeDirectDispatch? I think it wouldn't hurt.

As in, redundantly set to dispatching_ to true inside the method?

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Mar 23, 2020

the basic control flow here makes sense. Is the plan-of-record to have a forked codec with runtime switch?

It might be. Would love more comments. We could fork the codecs, but the change to dispatch's return value would still need to be changed. (Or we could have an inner wrapper to Connection::dispatch).

@mattklein123
Copy link
Copy Markdown
Member

Is the plan-of-record to have a forked codec with runtime switch?

We discussed this in our meeting last week. If we do this in small incremental changes I don't personally feel the need to make this runtime controllable. If we do want to do that I think yes we are going to need to copy the code. cc @yanavlasov @asraa @antoniovicente @alyssawilk

asraa added 2 commits March 24, 2020 08:45
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
if (data.length() > 0) {
for (const Buffer::RawSlice& slice : data.getRawSlices()) {
total_parsed += dispatchSlice(static_cast<const char*>(slice.mem_), slice.len_);
dispatching_ = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASSSERT(!dispatching_) would make sense here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

dispatching_ = false;
}
} else {
dispatching_ = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

if (data.length() > 0) {
for (const Buffer::RawSlice& slice : data.getRawSlices()) {
total_parsed += dispatchSlice(static_cast<const char*>(slice.mem_), slice.len_);
dispatching_ = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like this pattern as well.


static http_parser_settings settings_;

bool dispatching_ : 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer default initialize scalars with {} (this might be necessary to allow the suggested ASSERTs to work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was initialized like this in the H/2 codec, however in a new connection it is initialized as false... it makes more sense for me to initialize false, so I did {false}.

static http_parser_settings settings_;

bool dispatching_ : 1;
absl::Status codec_exception_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on the use of this new field?

We should also ASSERT(codec_exception_.ok()) in various places including at the top of dispatchSlice and places where we end up overwriting it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with custom type we can push this logic into the type and either make it ASSERT when an error condition is being overwritten or make it idempotent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some methods shouldn't be entered when there's an error already set. I agree that we can have the custom type complain when an error value is overwritten.

Signed-off-by: Asra Ali <asraa@google.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! some thoughts on my end

throw CodecProtocolException(absl::StrCat(header_type, " size exceeds limit"));
codec_exception_ = absl::InternalError(absl::StrCat(header_type, " size exceeds limit"));
// Crash if not dispatching_, otherwise exit early from the parser.
RELEASE_ASSERT(dispatching_, "CodecProtocolException occurred outside of dispatching");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to do this I'd prefer we have ASSERTS under completeLastHeader() and all friends that they're below dispatching as well.
Basically let's have fuzzing give us defense in depth for functions which should only be called here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this will likely break something, and I'm a little confused by our model of throwing exceptions -- RequestEncoderImpl::encodeHeaders gets called outside the scope of dispatching_ pretty often, and throws when there's missing path/method headers. I'm guessing that when it's called outside of dispatching_, we already have the guarantee that these exist?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, maybe I should put ASSERT's in the branches where we throw

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

know that this will likely break something, and I'm a little confused by our model of throwing exceptions -- RequestEncoderImpl::encodeHeaders gets called outside the scope of dispatching_ pretty often, and throws when there's missing path/method headers.

TBH I don't recall the history here. That "exception" shouldn't be an exception, it should just be an ASSERT/crash/etc. because it should never happen, so feel free to fix that however you want.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's only the encode pathways that may break something. I added ASSERTs!

Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Mar 25, 2020

Quick update: It looks like addressing quite a few comments here means we should have a custom Envoy::Status. Yan's working on that in parallel with me working on the plumbing to split the codec into a legacy and exception-removed one.

@stale
Copy link
Copy Markdown

stale bot commented Apr 1, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 1, 2020
Signed-off-by: Asra Ali <asraa@google.com>
@stale
Copy link
Copy Markdown

stale bot commented Apr 8, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot closed this Apr 8, 2020
asraa added 4 commits April 13, 2020 09:38
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa asraa reopened this Apr 13, 2020
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Apr 13, 2020
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Apr 13, 2020

Hey!! I updated this PR to address the comments, including adding Yan's StatusOr in. Some outstanding comments I responded to. WDYT?

Signed-off-by: Asra Ali <asraa@google.com>
return 0;
Envoy::StatusOr<int> statusor =
static_cast<ConnectionImpl*>(parser->data)->onHeaderField(at, length);
return statusor.ok() ? statusor.value() : 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on why we return 1 in the non-OK case? Same elsewhere. You might consider a human readable constant for the special return codes if that is helpful for readability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I'd prefer a human readable constant. For uniformity, I think it makes the most sense to use -1, and reserve 1 and 2 (which have special meanings for on_headers_complete)

Comment on lines +455 to +456
// Crash if not dispatching_, otherwise set error and exit early from the parser.
RELEASE_ASSERT(dispatching_, "CodecProtocolError occurred outside of dispatching");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must be dispatching during header completion, right? Meaning, I think it's very easy to convince ourselves that this must be the case? If so I'm not sure it's worth this assert here. (vs. encode paths which are a lot more confusing).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yes, completeLastHeader has a very tractable call stack, all of which occur in callbacks.

throw CodecProtocolException(absl::StrCat(header_type, " size exceeds limit"));
codec_exception_ = absl::InternalError(absl::StrCat(header_type, " size exceeds limit"));
// Crash if not dispatching_, otherwise exit early from the parser.
RELEASE_ASSERT(dispatching_, "CodecProtocolException occurred outside of dispatching");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

know that this will likely break something, and I'm a little confused by our model of throwing exceptions -- RequestEncoderImpl::encodeHeaders gets called outside the scope of dispatching_ pretty often, and throws when there's missing path/method headers.

TBH I don't recall the history here. That "exception" shouldn't be an exception, it should just be an ASSERT/crash/etc. because it should never happen, so feel free to fix that however you want.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks awesome. Let me know how you want to proceed with the full/final review staging.

Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Apr 15, 2020

@yanavlasov @antoniovicente What do you think of this POC? There are some minor nits about adding ASSERTs unresolved. I think I've gotten the general structure of the HTTP/1 change, and I could proceed replacing all throws for HTTP/1.

If we want to split off, we should decide what to do about the common changes -- dupe them in our branches, and resolve conflicts before merging? Or a separate PR with the common changes (dispatch return value, HCM, codec_client, and test changes)

asraa added 2 commits April 16, 2020 10:15
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa asraa closed this Apr 21, 2020
mattklein123 pushed a commit that referenced this pull request May 5, 2020
(This is the merge-able portion of #10484. It does not include the behavior changes in H/1. Only the necessary changes are there for H/1 and H/2 exception removal to happen separately.)

Description: This introduces a new return value, `Envoy::Http::Status` for `Connection::dispatch`. Currently, dispatch will always return an OK status. This facilitates the migration to remove exceptions in codecs and replace them with error statuses. The HCM and codec client can now handle statuses returned from dispatch, although they will continue to support exceptions (by translating them to the corresponding status) until legacy codecs are deprecated. 

Following this, PRs will stage exception removal in H/1 and H/2 codecs, codecs will be forked to have a legacy version, and a runtime override will allow users to continue to use legacy codecs during the migration. 

See issue #10878 for a full detailed plan and overview.

Risk: Medium-high. A no-op, but touches sensitive code.
Testing: All tests pass, this is a no-op
Issue: #10878

Signed-off-by: Asra Ali <asraa@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants