Backwards Compatibility issue: remove chunked encoding support from WebOb (unless explicitly flagged by WSGI server) #279

@digitalresistor

The problem with loosely defined specs

Unfortunately, the WSGI spec does not give you a reliable way to support Chunked Encoding if you read it loosely. To support it, you need to read every "should" in the spec as a "must".

That would have been the correct thing to do anyway, but alas we are stuck with what we've got for now.

For example, to showcase the issue:

from wsgiref.simple_server import make_server

def hello_world_app(environ, start_response):
    print('before read...')
    environ['wsgi.input'].read()
    print('after read...')

    status = '200 OK'  # HTTP Status
    headers = [('Content-type', 'text/plain; charset=utf-8')]  # HTTP Headers
    start_response(status, headers)

    # The returned object is going to be printed
    return [b"Hello World"]

httpd = make_server('', 8000, hello_world_app)
print("Serving on port 8000...")

# Serve until process is killed
httpd.serve_forever()

Sending it a POST request with no body:

curl -X POST http://localhost:8000/

causes the server to hang for as long as the client keeps the connection open.

  1. The WSGI PEPs specify that environ['wsgi.input'] should be file-like (i.e. have read/readline) (see PEP-3333).
  2. A server should allow read() to be called without an argument, and return the remainder of the client's input stream.
  3. A server should return empty bytestrings from any attempt to read from an empty or exhausted input stream.

See: https://www.python.org/dev/peps/pep-3333/#input-and-error-streams
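The behavior described in points 2 and 3 can be illustrated with a small sketch. It uses io.BytesIO as a stand-in for a compliant wsgi.input; that stand-in is an assumption for illustration only, since real servers supply their own stream object.

```python
import io

# io.BytesIO stands in for a spec-compliant wsgi.input (illustration only).
stream = io.BytesIO(b"name=value")

first = stream.read()   # no argument: returns the remainder of the input
second = stream.read()  # exhausted stream: returns an empty bytestring

print(first, second)
```

A stream that behaves like this can be read without knowing the body length up front, which is what Chunked Encoding support depends on.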

However, wsgiref passes the file object that wraps the client's socket straight through as wsgi.input in the environ.

This means that when you call .read() without a CONTENT_LENGTH, which may happen if the client is using Chunked Encoding (and WebOb tries its best to support that), the call blocks waiting for an end-of-file that never arrives while the connection stays open, and you end up hanging forever.

Good servers like waitress don't do this, so there we can always call read() on wsgi.input without causing any issues. However, that behavior is not guaranteed across servers.

Proposal

Choice number One (arguably the most correct choice)

Currently WebOb tries to be too smart for its own good and that gets it into trouble...

I'd like to turn back the clock and remove any and all support for Chunked Encoding unless there is a flag in the environment that states it is supported. #278 is one such proposal. mod_wsgi has another way, so we could potentially add some helper functions that check whether the environment is right and then automatically enable Chunked Encoding. However, if the server doesn't support it, Chunked Encoding is dropped completely, and the only way WebOb will read from wsgi.input is if there is a CONTENT_LENGTH set in the environment.
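A minimal sketch of what choice one could look like, not WebOb's actual implementation. The function name read_body_if_supported is hypothetical, and the flag key 'wsgi.input_terminated' is an assumption: the text above only says that some flag in the environment would signal support, without naming it.

```python
import io

def read_body_if_supported(environ):
    """Read the body only when the length is known or the server opts in.

    Hypothetical helper. Assumes CONTENT_LENGTH, when present, is a valid
    integer string.
    """
    content_length = environ.get('CONTENT_LENGTH')
    if content_length:
        # Bounded read: safe even when wsgi.input is the raw socket file.
        return environ['wsgi.input'].read(int(content_length))
    if environ.get('wsgi.input_terminated'):  # assumed flag name
        # The server promises the stream ends on its own, so an
        # unbounded read() cannot hang.
        return environ['wsgi.input'].read()
    # No length and no opt-in: treat the request as having no body.
    return b''
```

With this shape, a chunked request on a server that sets no flag is seen as having an empty body instead of blocking the worker.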

There have been previous reported issues with wsgiref's WSGI implementation:
#233, #116

Choice number Two

We completely ignore wsgiref, remove the limitations that were added in https://bitbucket.org/ianb/webob/issues/6 (ee7f027) and simply call .read() unconditionally if there is no CONTENT_LENGTH to limit us to.
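For comparison, a rough sketch of choice two under the same caveats: read_body_unconditionally is a hypothetical name, and this is not the code that the referenced commit changed.

```python
import io

def read_body_unconditionally(environ):
    """Read the body, trusting the server to signal the end of input.

    Hypothetical helper. Assumes CONTENT_LENGTH, when present, is a valid
    integer string.
    """
    content_length = environ.get('CONTENT_LENGTH')
    if content_length:
        return environ['wsgi.input'].read(int(content_length))
    # No length to limit the read. On a server that hands over the raw
    # socket file, this call blocks until the client disconnects.
    return environ['wsgi.input'].read()
```

This is simpler, but whether it is safe depends entirely on the server returning an empty bytestring once the input is exhausted.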

I don't know of a good way to detect whether we are running under wsgiref so that we can warn users of WebOb, though.

Issues

Chunked encoding is apparently being used more and more by mobile clients, versus your run-of-the-mill browser, but I imagine it has other uses too. While I am loath to remove support for something, I do believe it should be better implemented. I'd like to allow a request body on all request methods (the spec says it's allowed, it just doesn't have any defined semantics), but I don't want to break WebOb on servers that don't implement PEP-3333 fully.

While wsgiref is the easiest to point out as being faulty, I am not sure choice number two is a good idea, simply because there may be other WSGI servers that are not spec compliant...

I'd love feedback on this, and ideas on how to move this forward.


Related: #274, #233, #116, #278
