Fix/http desync#2181
Conversation
|
@benoitc all yours! |
gunicorn/http/message.py
Outdated
| elif name == "TRANSFER-ENCODING": | ||
| chunked = value.lower() == "chunked" | ||
| normalized_value = value.lower() | ||
| if normalized_value == "identity": |
There was a problem hiding this comment.
Gunicorn is a gateway so it shouldn't have a strong opinion about the transfer encoding. I think it's better to only test if the transfer encoding header is chuned or not.
if value.lower() == "chunked":
chunked = Trueand let it pass for other cases
There was a problem hiding this comment.
This removes the need of UnsupportedTransferEncoding also
There was a problem hiding this comment.
Your proposal can still be attacked by the following desync:
Content-Length: 5
Transfer-Encoding: chunked
Transfer-Encoding: Identity
Most proxy will use the chunked, but we will use the CL.
My understanding of Transfer-Encoding is that its a hop-by-hop header, meaning that we should parse it, use it and strip it before passing the data down the line.
For example, AWS ALB will not accept anything that is not identity or chunked. If you read my improvements section above, I suggest that we do something similar to other WSGI server: parse the request to the best of our ablity and only transfer data to the application using a CL. We do buffer the data in this application anyway and we don't support gzip, compress and deflate.
There was a problem hiding this comment.
reading a bit more about PEP333. It says: WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable. There is a great thread here to read: pallets/flask#367
There was a problem hiding this comment.
Gunicorn has a long history of supporting chunked encoding and last 20 is having the support of wsgi.input_terminated, i'm not inclined to change this actually.
if I understand your concern, in the case of your example
Content-Length: 5
Transfer-Encoding: chunked
Transfer-Encoding: Identity
TE should be Identity and not chunked? Or do we want it to be chunked (which is really weird).
My main concern is to make sure we support the last TE given whatever it is (there are other cases than "Identity" in some apis), so we must not return an error, if it's different.
|
Thanks for the patch! Seems to handle the case. Can you do th erequested changes? It would be good to make a release today. |
This is a first try at fixing #2176
I have looked at multiple server implementation and frankly I have seen a lot of different things especially concerning the TE parsing.
In summary the fixes are:
No whitespace is allowed between the header field-name and colonduplicate Content-Length header fields have been generated or combined by an upstream message processor, then the recipient MUST either reject the message as invalid or replace the duplicated field-values with a single valid Content-LengthidentityandchunkedTransport-EncodingFuture improvements:
Dict[str, List[str]]instead of aList[Tuple[str, str]]for headers