http1: allow configuring format of encoded http1 response headers#8499
http1: allow configuring format of encoded http1 response headers#8499snowp merged 25 commits intoenvoyproxy:masterfrom
Conversation
Adds a configuration option that will convert all header keys into Train-Case. This is useful to allow Envoy to respond with headers that match the casing of other systems, to ensure that the wire format of responses is unchanged when migrating to Envoy. Fixes envoyproxy#8463 Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
|
Mainly looking for API review at this point: is this too specific? Should we be aiming for something more flexible like arbitrary regex transformations with the ability to uppercase matches? For example, this could be represented by a series of regex matches: Open to ideas for how to best express this so that it can be useful for as many people as possible. |
mattklein123
left a comment
There was a problem hiding this comment.
Just flushing out a drive by API comment. Nice to see this being worked on!
api/envoy/api/v2/core/protocol.proto
Outdated
| string default_host_for_http_10 = 3; | ||
|
|
||
| message HeaderKeyFormat { | ||
| enum HeaderFormat { |
There was a problem hiding this comment.
enums are pretty fragile for stuff like this. I would recommend just making this a oneof with a required value and a single TrainCase message that is empty. In the future we can easily add configuration to the TrainCase message or add more complicated message types.
|
Pass-through would work in certain scenarios (h/1.1 all the way), but I had some concerns about supporting intermediate h2 hops, as we tend to use h2 wherever possible. Seems like @alyssawilk shared some of those concerns based on the conversation in #8463. |
Signed-off-by: Snow Pettersen <snowp@squareup.com>
api/envoy/api/v2/core/protocol.proto
Outdated
| string default_host_for_http_10 = 3; | ||
|
|
||
| message HeaderKeyFormat { | ||
| enum HeaderFormat { |
api/envoy/api/v2/core/protocol.proto
Outdated
| // Formats the header using Train-Case: the first character and any character following a | ||
| // special character will be capitalized if it's an alpha character. For example, | ||
| // "content-type" becomes "Content-Type", and "foo$b#$ar" becomes "Foo$B#$Ar". | ||
| TRAIN_CASE = 1; |
There was a problem hiding this comment.
Is "train case" a standard term? I haven't heard it before.
There was a problem hiding this comment.
i hadn't heard about it either before it was used in the issue that was filed. Googling around I can find some references to it (https://docs.rs/Inflector/0.7.0/inflector/cases/traincase/index.html, https://medium.com/infancyit/letter-case-special-styles-7fa369bf173d), though other sources have it mean something else (https://en.wikipedia.org/wiki/Letter_case#Special_case_styles says TRAIN-CASE, https://www.fluentcpp.com/2019/01/08/restmycase-a-c-library-for-formatting-string-cases/ says train_Case).
Open to naming suggestions here, given that this doesn't seem to be a very common term
There was a problem hiding this comment.
One option might be rfc7230_casing since we're basically trying to present the headers the way they are cased in the HTTP/1.1 RFCs.
There was a problem hiding this comment.
that sounds good to me. @PiotrSikora any concerns about taking the name of the RFC in vain?
There was a problem hiding this comment.
"TrainCase" is called PascalCase (aka Upper Camel Case).
We should also have the DONT_MODIFY case (probably a good default, and in theory it's backwards compatible, since HTTP headers are case-insensitive), because not all standard headers use PascalCase (e.g. Content-MD5 and TE), and use transcoding to PascalCase or lowercase only when transforming headers between HTTP/1.1 and HTTP/2.
There was a problem hiding this comment.
Nevermind, PascalCase is without hyphens.
There was a problem hiding this comment.
@htuch using RFC number here isn't appropriate, IMHO, because that would imply that the casing in defined in that RFC, whereas that's not the case (as far as I can tell).
There was a problem hiding this comment.
How about propercase_words? then we define a word as any sequence of alpha characters
api/envoy/api/v2/core/protocol.proto
Outdated
|
|
||
| // Describes how the keys for response headers should be formatted. By default, all header keys | ||
| // are lower cased. | ||
| HeaderKeyFormat response_header_key_format = 4; |
There was a problem hiding this comment.
Is this only for response? What about request headers forwarded to the upstream?
There was a problem hiding this comment.
i can add it for the request path too, but that seems to be a bit more involved since the http1 codec settings aren't currently being passed through to the connection pool/upstream http/1.1 codec (since all the current options apply only to the HCM).
How about i add the proto def /w not-implemented and leave that as a TODO?
There was a problem hiding this comment.
I think it makes sense to have it just be header_format, and keep it hidden until we have it implemented on both sides, because being allowd to set response header format where the settings are used for requests is weird.
WDYT?
There was a problem hiding this comment.
yeah i like that, will go with that
There was a problem hiding this comment.
Arguably though clients and backends might have different case senstivity concerns. E.g. one wants Rfc7230-Casing, the other wants 1337Spe4k-k451nG.
There was a problem hiding this comment.
Sure, but we have codec options in both places.
AFIK when there's protocol options on an HCM, this will affect how we serialize the response headers and when we set the protocol options in CDS it'll affect how we serialize request headers. Since we're not doing enforcement of request headers at the listener or response headers in the connection pool, I don't know what it would mean to have response headers in the connection pool Http1 options config.
I suggest we just have one name, and note it affects how we serialize the headers out at the codec.
There was a problem hiding this comment.
Yeah, I think the fact that the configuration for request and response headers wouldn't occur in the same location (HCM vs Cluster) makes having it be a single field make more sense than two fields with only one ever being relevant nicer
There was a problem hiding this comment.
Good point, a single name makes sense given the multiplicity of codec options on downstream/upstream.
|
@snowp ack, let's continue with explicit capitalization support then. |
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
|
This should be ready for review for whoever wants to pick this up |
alyssawilk
left a comment
There was a problem hiding this comment.
Very cool! First pass thoughts
|
|
||
| static_cast<StreamEncoderImpl*>(context)->encodeHeader(key_to_use, | ||
| header.value().getStringView()); | ||
| if (header_key_formatter_ != nullptr) { |
There was a problem hiding this comment.
I'd lean towards an Encode helper function which took the context, cormatter (which could be null) and string view header key
| namespace Envoy { | ||
| namespace Http { | ||
| namespace Http1 { | ||
| std::string ProperCaseHeaderKeyFormatter::format(absl::string_view key) const { |
There was a problem hiding this comment.
Looking at https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
This won't do A-IM, TE Content-MD5, HTTP2-Setting, and about 10 others the way an HTTP/1 user might expect.
I'd lean towards saying we do standard capitalization of standard HTTP 1.1 headers that Envoy knows about (e.g. TE) with a link to headers Envoy knows about. ProperCaseExceptForRegisteredStandardizedHeaderExceptions is too wordy, but I think that's what we want?
This way if someone later on desperately needs P3P cased they awy they want, they can just add it to the list of Envoy known headers. And when we eventually add the Envoy extension mechanism for arbitrary headers we keep talking about, folks can do their own custom builds with TheirWhackyCAPs if they want.
Alternately if we want to go for the simple case, I think we should document there are known standard headers which will still look unusual to HTTP/1 consumers, and someone can implement ProperCaseByDefaultButWithOverrides later now we have an API for things.
@mattklein123 WDYT?
There was a problem hiding this comment.
Maybe StandardOrProperCase? I can go either way on whether we want to handle well known headers, it sounds a tad bit more in line with what someone might expect but it might be overkill.
Looking at Go's net/http (https://golang.org/src/net/http/header.go#L214) they do the same kind of formatting this PR does when canonicalizing headers except that they only uppercase letters after a hyphen.
There was a problem hiding this comment.
I don't have a strong opinion here as long as it is well documented. The way that we have things set up at the API layer we can always make things configurable later if needed.
There was a problem hiding this comment.
A soft suggestion of having StandardOrProperCase behavior; and perhaps condensing the config name down to StandardCase (which would follow well-known casings for well-known headers, and standard casing patterns for H/1 otherwise).
tools/spelling_dictionary.txt
Outdated
| Welford's | ||
| Werror | ||
| XDS | ||
| ar |
There was a problem hiding this comment.
can we change foo$b#$ar to foo$b#$are for all the various explantions of that transform to avoid this addition? Feels like it'd be a common typo of "are" which we should catch.
mattklein123
left a comment
There was a problem hiding this comment.
Cool stuff, a few drive by comments.
| bool saw_content_length = false; | ||
| headers.iterate( | ||
| [](const HeaderEntry& header, void* context) -> HeaderMap::Iterate { | ||
| [this](const HeaderEntry& header, void* context) -> HeaderMap::Iterate { |
There was a problem hiding this comment.
FWIW, this function used a C-style callback originally for performance reasons when I was optimizing for the horse race case. I would leave a comment here about looking at perf, or potentially just pre-create the lambda at construction time so that in the keep alive case we don't need any extra allocations.
There was a problem hiding this comment.
Do you think it's worth having multiple versions of iterate, one with C-style callback and one with std::function? afaik std::function adds some overhead compared to C-style callbacks due to the additional indirection
There was a problem hiding this comment.
Turns out I was able to do this without changing the signature, since I could just use the encoder that was already being passed as the context to resolve the formatter.
| namespace Envoy { | ||
| namespace Http { | ||
| namespace Http1 { | ||
| std::string ProperCaseHeaderKeyFormatter::format(absl::string_view key) const { |
There was a problem hiding this comment.
I don't have a strong opinion here as long as it is well documented. The way that we have things set up at the API layer we can always make things configurable later if needed.
api/envoy/api/v2/core/protocol.proto
Outdated
| oneof header_format { | ||
| // Formats the header by proper casing words: the first character and any character following | ||
| // a special character will be capitalized if it's an alpha character. For example, | ||
| // "content-type" becomes "Content-Type", and "foo$b#$ar" becomes "Foo$B#$Ar". |
There was a problem hiding this comment.
@envoyproxy/api-shepherds API LGTM. I think if there are nuances around details with how this is done, e.g. whether it's - only or any punctuation, we can sort them out with ProperCaseWords fields later.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
alyssawilk
left a comment
There was a problem hiding this comment.
Looking good! I'd be inclined to bump risk to medium as there are now some extra refactors, but your call.
| // Formats the header by proper casing words: the first character and any character following | ||
| // a special character will be capitalized if it's an alpha character. For example, | ||
| // "content-type" becomes "Content-Type", and "foo$b#$are" becomes "Foo$B#$Are". | ||
| ProperCaseWords proper_case_words = 1; |
There was a problem hiding this comment.
Optional: Do you think it's worth calling out both that this is an effort to normalize HTTP/1 headers to what users expect, and that this will not behave "as expected" for all headers, e.g. TE? I can imagine that folks who have HTTP/1 casing issues will think this is the solution to their problem and I think documenting the simplicity vs the non-handled corner cases will probably save someone some trouble.
| Headers::get().TransferEncoding.get().size(), | ||
| Headers::get().TransferEncodingValues.Chunked.c_str(), | ||
| Headers::get().TransferEncodingValues.Chunked.size()); | ||
| encodeFormattedHeader(Headers::get().TransferEncoding.get().c_str(), |
There was a problem hiding this comment.
can we skip the c_str() here and in other places?
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
|
/azp run |
|
Azure Pipelines successfully started running 3 pipeline(s). |
|
Please merge master to fix coverage. /wait |
Signed-off-by: Snow Pettersen <snowp@squareup.com>
mattklein123
left a comment
There was a problem hiding this comment.
Nice, LGTM with some small comments.
/wait
| runHighWatermarkCallbacks(); | ||
| } | ||
|
|
||
| encode_header_cb_ = [](const HeaderEntry& header, void* context) -> HeaderMap::Iterate { |
There was a problem hiding this comment.
Sorry I think we can kill this member variable and revert this part of the change now, right?
| bool is_response_to_head_request_{false}; | ||
| bool is_content_length_allowed_{true}; | ||
| HeaderMap::ConstIterateCb encode_header_cb_; | ||
| HeaderKeyFormatter* header_key_formatter_; |
There was a problem hiding this comment.
nit: can this be const? (Both the pointer and the object)
| std::string ProperCaseHeaderKeyFormatter::format(absl::string_view key) const { | ||
| auto copy = std::string(key); | ||
|
|
||
| bool shouldCapitalize = true; |
Signed-off-by: Snow Pettersen <snowp@squareup.com>
…voyproxy#8499) Adds a configuration option that will convert all header keys into Proper-Case. This is useful to allow Envoy to respond with headers that match the casing of other systems, to ensure that the wire format of responses is unchanged when migrating to Envoy. Fixes envoyproxy#8463 Signed-off-by: Snow Pettersen <snowp@squareup.com>
|
So how to configure the custom header format, for example, I want to get this format header: Sec-WebSocket-Accept
not
Sec-Websocket-Accept |
Adds a configuration option that will convert all header keys into
Train-Case. This is useful to allow Envoy to respond with headers
that match the casing of other systems, to ensure that the wire format
of responses is unchanged when migrating to Envoy.
Signed-off-by: Snow Pettersen snowp@squareup.com
Risk Level: Mediun
Testing: UTs
Docs Changes: Proto comments
Release Notes: n/a
Fixes #8463