Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Http trailer support for HTTP/1.x#35337

Merged
stephentoub merged 20 commits intodotnet:masterfrom
caesar-chen:trailer_header
Mar 16, 2019
Merged

Http trailer support for HTTP/1.x#35337
stephentoub merged 20 commits intodotnet:masterfrom
caesar-chen:trailer_header

Conversation

@caesar-chen
Copy link
Contributor

I decided to separate HTTP/1.x and HTTP/2 trailing header support in 2 different PRs. For HTTP/1.x, trailing headers
are at the end of the chunked response message, while for HTTP/2, HEADERS frame starting the trailing header block
is sent after DATA frames have been sent (not part of the DATA frames). Thus it's hard to apply the new GenericLoopbackServer
to test for both cases.

Pending API review #34912, so I mark this PR as no merge.

/cc: @dotnet/ncl @geoffkizer @stephentoub @Tratcher

@caesar-chen caesar-chen added * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) area-System.Net.Http.SocketsHttpHandler labels Feb 14, 2019
@caesar-chen caesar-chen added this to the 3.0 milestone Feb 14, 2019
@caesar-chen caesar-chen self-assigned this Feb 14, 2019
@geoffkizer
Copy link

I think perhaps we're overcomplicating this.

The simplest approach here seems to be: Make TrailingHeaders work just like Headers, i.e. lazily instantiated, no setter.

This approach avoids handler lock-in, and has no additional overhead if you don't use trailers (the 99% case).

What it doesn't address is:
(1) If you do care about trailers, but there aren't actually any trailers on the response, then this will incur an additional allocation of an empty headers collection for TrailingHeaders. I'm not convinced this matters. This is a very uncommon case, and we already allocate quite a bit in this scenario anyway. One additional empty collection doesn't seem like it matters much.
(2) How can a user know when the trailers have been processed and made available? First of all, in the default case where the response is buffered, this isn't an issue, because trailers will always be made available immediately. It's only the non-buffered case where this is an issue. For this case, I think we can simply say that trailers are available once you have read to the end of the response stream. This seems straightforward and intuitive.

So, my suggestion is, let's just do the simple thing here.

@Tratcher
Copy link
Member

Sounds workable.

@stephentoub
Copy link
Member

For this case, I think we can simply say that trailers are available once you have read to the end of the response stream. This seems straightforward and intuitive.

What happens if someone accesses the collection before then? Presumably it would get a collection that could potentially be filled in/ mutated concurrently, so we'd need to be very clear that you couldn't actually access the collection until the stream ended, or else race conditions.

Other than that I'm fine with it. My primary concern is not negatively impacting the 99.9% where there are no trailers and the consumer doesn't care about trailers, and in such cases I would be concerned about allocating an additional collection. We just need to be sure then we don't accidentally force it into existence unnecessarily.

@davidsh
Copy link
Contributor

davidsh commented Feb 16, 2019

What happens if someone accesses the collection before then? Presumably it would get a collection that could potentially be filled in/ mutated concurrently, so we'd need to be very clear that you couldn't actually access the collection until the stream ended, or else race conditions.

This seems like a recipe for disaster. We should not give out a collection but then hope that someone doesn't modify it while our HTTP stack is trying to add elements to it.

If the headers aren't ready, we need to give out something immutable or NULL etc.

@geoffkizer
Copy link

HttpResponseMessage and its associated objects are not thread safe and should not be used in a concurrent manner.

I'm pretty sure there are plenty of ways you could get into trouble today if you do this. For example, if you try to read on the response stream from multiple threads concurrently, very bad things could happen.

@davidsh
Copy link
Contributor

davidsh commented Feb 16, 2019

For example, if you try to read on the response stream from multiple threads concurrently, very bad things could happen.

This specific type of thing should already be protected. Most of our network streams public API surface for Read/Write protect against starting a new Read/Write while one is in progress. They usually will throw an InvalidOperationException in those cases.

@geoffkizer
Copy link

To clarify a bit:

What happens if someone accesses the collection before then? Presumably it would get a collection that could potentially be filled in/ mutated concurrently, so we'd need to be very clear that you couldn't actually access the collection until the stream ended, or else race conditions.

We should ensure that the collection is never mutated concurrently. Rather, it is mutated at a very specific point, which is when the caller reads the response stream and there is no more data, so read returns 0 bytes to indicate EOF. I believe we've already agreed this is when trailers should be made available, independent of how they are actually surfaced.

If we do this, then it's fine to retrieve and access the collection either before or after receiving EOF. You just won't see any actual trailing headers until after you receive the EOF.

For HTTP/1.1, implementing this as described above is straightforward. For HTTP2, it's a little trickier because we can receive the trailing HEADERS frame concurrently to however the user is actually accessing the HttpResponseMessage object. That means we will probably need to store the received trailing headers on the Http2Stream initially, and only add them to the actual TraiingHeaders collection when we are returning EOF. But we need to do something like this anyway to implement the correct behavior for when trailing headers become available.

@geoffkizer
Copy link

This specific type of thing should already be protected. Most of our network streams public API surface for Read/Write protect against starting a new Read/Write while one is in progress.

As far as I'm aware, we don't do this for request/response streams in SocketsHttpHandler.

But regardless, I don't think HttpResponseMessage is intended to support concurrent access and I don't think we need to or want to provide guarantees here.

@caesar-chen
Copy link
Contributor Author

But regardless, I don't think HttpResponseMessage is intended to support concurrent access and I don't think we need to or want to provide guarantees here.

We will always return the HttpResponseMessage only after we have the Headers ready - we don't need to concern about the concurrent access since we have done with Headers collection. IMHO, it's consistent.

However, TrailingHeaders is not, and I think consistency is required - we should not return a half-done and modifiable collection to user. Therefore, as Stephen and Geoff have mentioned, two approaches to pick from:

a) Current pattern: use the setter. I can make it internal/private for now, and we can discuss later if we want to make setter public for all related properties.

b) Add an additional TrailingHeadersStatus property, so that users will know what's the current stage in parsing trailers. Notice: they still can ignore this new property, and nothing will prevent them from partying on the TrailingHeaders.

I vote for a) for simplicity and additional protection for concurrency. Thoughts?
@dotnet/ncl

@Tratcher
Copy link
Member

I think @geoffkizer had it nailed down pretty well. Make the headers lazy initialized without a setter like all the others and only populate them during the final Stream.ReadAsync. That way you're doing it on the application's thread and know the application isn't (shouldn't) be touching any of the other objects at the same time.

@caesar-chen
Copy link
Contributor Author

and only populate them during the final Stream.ReadAsync

Oh, sorry I missed that part. Sounds workable, let me fix in next commit.

@stephentoub
Copy link
Member

That way you're doing it on the application's thread

I'm not sure what that means, since the code could be:

var headers = resp.TrailingHeaders;
var read = stream.ReadAsync(...);
foreach (var header in headers) { ... }
await read;

so we'll need to be very clear about what the guarantees are. I suggest we explicitly say that consuming code should never access the collection property until EOF.

@stephentoub
Copy link
Member

We should also make sure that we don't inadvertently add to the collection while consuming trailing headers while draining a connection that's being returned back to the pool. Such trailers should be ignored.

@geoffkizer
Copy link

we'll need to be very clear about what the guarantees are. I suggest we explicitly say that consuming code should never access the collection property until EOF.

I don't think we need to be that strict (though it doesn't hurt to say that). The code you posted should work fine in this approach.

@caesar-chen
Copy link
Contributor Author

I will need this PR in for HTTP/2 work, PTAL, thanks.

/cc: @dotnet/ncl

if (descriptor.HeaderType == HttpHeaderType.Content)
if (isFromTrailer)
{
response.TrailingHeaders.TryAddWithoutValidation(descriptor, headerValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do the same descriptor.HeaderType == HttpHeaderType.Request ? descriptor.AsCustomHeader() : descriptor dance that's done in the else block below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me since the headers go to their own collection. I made that change so it is consistent with "normal" headers.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, but left a handful of comments/questions.

@wfurt wfurt self-assigned this Mar 11, 2019
@wfurt
Copy link
Member

wfurt commented Mar 15, 2019

your feedback should be covered @stephentoub. Please take another look when you get chance.

@stephentoub stephentoub merged commit ac9c746 into dotnet:master Mar 16, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* http trailer for http/1.1

* address feedback

* address feedback

* pr feedback

* stop blocking context

* fix framework test failiure

* address feedback

* seperate tests

* address some feedback

* more feedback from review

* correct ReadAsync

* feedback from review

* handle short ReadAsync()

* more feedback


Commit migrated from dotnet/corefx@ac9c746
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants