Skip to content

12412 allow opting out of parsing of multipart and form-encoding upload parsing in twisted.web#12413

Merged
itamarst merged 13 commits intotrunkfrom
12412-allow-opting-out-of-parsing-of-mime-upload-parsing-in-twistedweb
Jan 10, 2025
Merged

12412 allow opting out of parsing of multipart and form-encoding upload parsing in twisted.web#12413
itamarst merged 13 commits intotrunkfrom
12412-allow-opting-out-of-parsing-of-mime-upload-parsing-in-twistedweb

Conversation

@itamarst
Copy link
Contributor

@itamarst itamarst commented Jan 3, 2025

Scope and purpose

Fixes #12412

Gives users the ability to parse HTTP body uploads themselves without having to pay the costs of twisted.web's parsing of bodies.

@itamarst itamarst linked an issue Jan 3, 2025 that may be closed by this pull request
@codspeed-hq
Copy link

codspeed-hq bot commented Jan 3, 2025

CodSpeed Performance Report

Merging #12413 will degrade performances by 2.87%

Comparing 12412-allow-opting-out-of-parsing-of-mime-upload-parsing-in-twistedweb (3564bce) with trunk (30a679a)

Summary

❌ 1 regressions
✅ 33 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark trunk 12412-allow-opting-out-of-parsing-of-mime-upload-parsing-in-twistedweb Change
test_http11_server_chunked_request 1.8 ms 1.8 ms -2.87%

@itamarst itamarst marked this pull request as ready for review January 3, 2025 16:12
@chevah-robot chevah-robot requested a review from a team January 3, 2025 16:12
Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request's biggest problem is a sprawling public interface with waaay too many attributes, and I would really prefer that we avoid adding even more of those; particularly not mutable public attributes plus class vars which may or may not be honored in specific parts of the lifecycle of the request and may therefore represent inaccurate values depending on when they are consulted.

There is also an inherent problem here where only the code that configures the request at the top of the request hierarchy specifies the parseBody configuration, and resources lower down in the tree may or may not be able to function correctly with an empty args during a post request, creating a sort of silent-failure type error.

That said… doing something here without designing a completely new fancy interface that solves every problem at once will at least get us un-stuck.

The @type annotation on twisted.web.http.Request says that args is unconditionally a dict, but, technically that has always been a lie; args starts as None and only takes on a non-None value at a specific point in the lifecycle.

So, we can weasel our way into a change which:

  • is technically compatible
  • looks a lot like what you've written here
  • doesn't blow up in scope
  • doesn't create silent failure conditions for resources that handle POSTs

by:

  • accepting parseBody as a constructor argument and only a constructor argument; storing the state privately
  • adding a for-real type annotation to args to note that it may be (and may always have been) None
  • documenting when args might be None
  • making the official check for "you didn't parse any arguments" to be request.args is None

This does still leave an edge case open, for code that looks like this:

def render_POST(self, request):
    if not request.args:
        deleteEverything()
    else:
        doUpdates(request.args)

I think that is probably fine since it was technically already broken in some weird edge cases and you will not opt yourself into this behavior by accident; somebody has to pass the new argument explicitly to get the new behavior. The other option here is to make it simply incompatible and delete the args attribute entirely in this situation, which I would also be fine with but may be a bit too belt-and-suspenders a change.

@itamarst
Copy link
Contributor Author

itamarst commented Jan 4, 2025

I will get back to this next week, but... pretty sure request.args also gets values from the URL query string or whatever it's called (?a=b). So I'm not sure any assumptions about its contents can be made that rely only on body parsing.

@itamarst
Copy link
Contributor Author

itamarst commented Jan 4, 2025

And I don't think "breaking child Resources" is always a problem. It's a big switch, but the main use case would be applications that control all their Resources (Tahoe-LAFS, Synapse).

@glyph
Copy link
Member

glyph commented Jan 4, 2025

And I don't think "breaking child Resources" is always a problem. It's a big switch, but the main use case would be applications that control all their Resources (Tahoe-LAFS, Synapse).

I think, on balance, that this is a worthwhile tradeoff to start getting some forward motion on this perpetual morass. But I also really don't like that these applications that "control" all their own resources will have no warning if something is broken. The type checker can't tell you, you don't get an error at runtime, you just find that parameters are mysteriously missing when they were previously requested.

The ideal solution here would be to have a marker interface that would let you indicate that you take something other than IRequest, then have the traversal code notice if you return a legacy object and populate args. But that is probably a ton more work, and unlikely to impact any practical applications.

@adiroiban
Copy link
Member

I am also +1 for doing something here.

Maybe we can have this implementes using a private attribute that is set by the request fatory.
For example via a new t.w.server.SiteRoot class that has a custom requestFactory that will set this flag.

Ultimately, I think that the ideal API is to allow setting these flags via your own Resource implementations.

We currently have Resource.isLeaf. I hope we can have something like ResourceGen2.haveParseBody that will select the desire behaviour.


I already have something like this implemented in my fork of twisted.web. What I am doing is:

  • Create a new class StreamingSite(http.HTTPFactory):
  • The StreamingSite has a custom requestFactory
  • The custom requestFactory sets various flags for the forked Request class.
  • The getResourceFor is rewritten to do traversal only based on request headers with help from the forked Request.
  • Another change is that the StreamingSite, in comparison to the default t.w.server.Site , does not write a log file.

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks good.

I think that this is a good step toward improving the Twisted HTTP server.

I think that is best to keep the Request._parseBody API private and only expose the API via the Site.

Thanks again!

@itamarst
Copy link
Contributor Author

itamarst commented Jan 9, 2025

OK, this is basically the same design (all or nothing flag, no indication that it has been set unfortunately) but with no public attributes, just additional constructor parameters.

@glyph glyph removed request for adiroiban and glyph January 9, 2025 21:06
@glyph
Copy link
Member

glyph commented Jan 9, 2025

please review

@chevah-robot chevah-robot requested a review from a team January 9, 2025 21:06
@glyph
Copy link
Member

glyph commented Jan 9, 2025

no indication that it has been set unfortunately

I think a follow-up PR for a read-only attribute, or at least some way to determine if it's been set (given that we can't make .args be None, as previously discussed) might be worthwhile. But we can shed that bike when we come to it.

Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the last possible second I had the thought that perhaps a better way to address this more flexibly would be to have a bodyParsers argument that actually contained a callable that could do something more pluggable here, and we could have a LEGACY_MULTIPART_PARSER object that did this?

But, in that spirit, and taking a bunch less work, maybe rename parseBody to parseFormSubmitPOSTs or something to highlight its highly-specific semantics?

In any case, I am OK with this as is; please add something to the narrative docs that explains this, but it can be a single paragraph, and merge at your discretion.

@glyph glyph dismissed adiroiban’s stale review January 9, 2025 23:50

Feedback was addressed.

@itamarst
Copy link
Contributor Author

I don't think pluggable callables adds much, Tahoe-LAFS just overrides a method, can continue doing that. And it doesn't really enable an async API in any meaningful way.

@itamarst itamarst merged commit 68cfcb2 into trunk Jan 10, 2025
23 of 24 checks passed
@itamarst itamarst deleted the 12412-allow-opting-out-of-parsing-of-mime-upload-parsing-in-twistedweb branch January 10, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow opting out of parsing of MIME upload parsing in twisted.web

5 participants