Skip to content

Fix/http desync#2181

Merged
benoitc merged 5 commits intobenoitc:masterfrom
Sytten:fix/http-desync
Nov 20, 2019
Merged

Fix/http desync#2181
benoitc merged 5 commits intobenoitc:masterfrom
Sytten:fix/http-desync

Conversation

@Sytten
Copy link
Copy Markdown
Contributor

@Sytten Sytten commented Nov 19, 2019

This is a first try at fixing #2176
I have looked at multiple server implementation and frankly I have seen a lot of different things especially concerning the TE parsing.
In summary the fixes are:

  • Don't rstrip the names of headers
    • Go server does that
    • RFC 7230 says No whitespace is allowed between the header field-name and colon
  • Only accept one Content-Length
    • Node does that
    • RFC 7230 says duplicate Content-Length header fields have been generated or combined by an upstream message processor, then the recipient MUST either reject the message as invalid or replace the duplicated field-values with a single valid Content-Length
    • It's easier to just deny that case than try to handle it and most likely induce a vulnerability
  • Only accept identity and chunked Transport-Encoding
    • In this implementation, the order does not matter (it probably should). The Go implementation only uses the first value of the header.
    • Seems to be in sync with the behaviour of AWS ALB
    • All other valid (gzip, compress, etc.) and invalid TE will return a 501, since we don't have readers for them I figured this was the right move, but feel free to correct me

Future improvements:

  • Handle the duplicates of headers earlier in the parsing
  • Use a Dict[str, List[str]] instead of a List[Tuple[str, str]] for headers
  • Drop the chunked TE for WSGI to app, this is what Waitress decided to do here and I think it's a good idea since we already buffer the whole thing in the server anyway

@Sytten
Copy link
Copy Markdown
Contributor Author

Sytten commented Nov 19, 2019

@benoitc all yours!

elif name == "TRANSFER-ENCODING":
chunked = value.lower() == "chunked"
normalized_value = value.lower()
if normalized_value == "identity":
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gunicorn is a gateway so it shouldn't have a strong opinion about the transfer encoding. I think it's better to only test if the transfer encoding header is chuned or not.

if value.lower() == "chunked":
  chunked = True

and let it pass for other cases

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes the need of UnsupportedTransferEncoding also

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal can still be attacked by the following desync:

Content-Length: 5
Transfer-Encoding: chunked
Transfer-Encoding: Identity

Most proxy will use the chunked, but we will use the CL.
My understanding of Transfer-Encoding is that its a hop-by-hop header, meaning that we should parse it, use it and strip it before passing the data down the line.
For example, AWS ALB will not accept anything that is not identity or chunked. If you read my improvements section above, I suggest that we do something similar to other WSGI server: parse the request to the best of our ablity and only transfer data to the application using a CL. We do buffer the data in this application anyway and we don't support gzip, compress and deflate.

Copy link
Copy Markdown
Contributor Author

@Sytten Sytten Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading a bit more about PEP333. It says: WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable. There is a great thread here to read: pallets/flask#367

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gunicorn has a long history of supporting chunked encoding and last 20 is having the support of wsgi.input_terminated, i'm not inclined to change this actually.

if I understand your concern, in the case of your example

Content-Length: 5
Transfer-Encoding: chunked
Transfer-Encoding: Identity

TE should be Identity and not chunked? Or do we want it to be chunked (which is really weird).

My main concern is to make sure we support the last TE given whatever it is (there are other cases than "Identity" in some apis), so we must not return an error, if it's different.

@benoitc
Copy link
Copy Markdown
Owner

benoitc commented Nov 20, 2019

Thanks for the patch! Seems to handle the case. Can you do th erequested changes? It would be good to make a release today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants