Conversation
|
Does vendoring a stdlib function have license implications? I'd rather deal with the possibility of tinkering in the email module than copy the PSF license. |
|
@glyph I thought we've vendored stuff from the stdlib in Twisted before... is that not the case? |
|
Okay, I switched to using |
adiroiban
left a comment
There was a problem hiding this comment.
Thanks for the update.
I just left a very generic review.
I have never used treq so far.
It looks good. Only minor comments.
I prefer stdlib here... but multipart is also ok.
Regards
| # This seems to be the choice browsers make when encountering multiple | ||
| # content-type headers. | ||
| content_type, params = cgi.parse_header(content_types[-1]) | ||
| media_type, params = multipart.parse_options_header(content_types[-1]) |
There was a problem hiding this comment.
More of a FYI.
In Twisted we/I went for stdlib usage https://github.com/twisted/twisted/pull/12016/files#diff-3093a8f64dec64c2305f6322f276a5cee1437f0037efc660ce6524ffa58017a1R229-R234
My reasoning for using stdlib, is that this is quite common usage and if there is a bug, it's best to have it fixed in stdlib rather than rely on 3rd party packages.
There was a problem hiding this comment.
My reasoning for using stdlib, is that this is quite common usage and if there is a bug, it's best to have it fixed in stdlib rather than rely on 3rd party packages.
I'm curious what experience this is based on. In my experience, known bugs often remain in the stdlib for years (perhaps because there are relatively many barriers to landing a change in the stdlib, as well as the fact that the stdlib can only change with a release which happens relatively infrequently - plus such a release can not fix the issue for already-released versions of python).
If there is a third-party module of reasonable quality that is practical to use then I would generally prefer this over something from the stdlib.
There was a problem hiding this comment.
It depends on the source of the 3rd-party module. More dependencies = more attack surface, both in the literal sense (more people whose PyPI upload credentials can ruin your day), and in the metaphoric sense of unanticipated changes (what happens if a dependency goes AGPL3).
In this sense "the stdlib" or any existing dependency are roughly equivalent, and I'd probably prefer to depend more on our existing deps. Taking on a new dependency ought to give us a moment of pause, but really only a moment, especially if we know the maintainers or it's sufficiently widely used that we are not entirely on the hook for due diligence on its continued maintenance.
There was a problem hiding this comment.
In any case this is a blocker to 3.13 support and the dependency is only for tests, so we should probably not block on this minor concern :)
There was a problem hiding this comment.
To be clear, the dependency here is not just for tests. In 0a3d17b I switched to using the equivalent of cgi.parse_header() provided by multipart to avoid vendoring code under the PSF license as that seemed to give @glyph pause.
I picked multipart because it has a few familiar faces attached, e.g. @cjwatson, who has PyPI upload permissions. It is hosted under an individual's GitHub account rather than an org, so that's a downside.
I also considered pulling the relevant bits of cgi into a separate package and putting it on PyPI. I'd be happy to do that under the Twisted org if that's preferable to an external dependency.
I can only echo @exarkun's skepticism of using the stdlib for this. The stdlib has an anti-bytes bias that is often incorrect and has led to a lot of makework for us in Twisted. The cgi module has seen some baffling refactors, then deprecation and removal. Now we're told to use the email package, which isn't explicitly implementing the HTTP RFCs. Seems risky. Also, it was recently rewritten! It's possible for code to be too maintained.
|
|
||
| from io import BytesIO | ||
|
|
||
| from multipart import MultipartParser # type: ignore |
There was a problem hiding this comment.
it's ok to use multipart in the testing code, to double check the implementation... even if the code under tests would have use stdlib.
There was a problem hiding this comment.
For what it's worth, the cgi docs suggest the multipart package:
Deprecated since version 3.11, will be removed in version 3.13: This function, like the rest of the cgi module, is deprecated. It can be replaced with the functionality in the email package (e.g. email.message.EmailMessage/email.message.Message) which implements the same MIME RFCs, or with the multipart PyPI project.
While it suggests the email package too, I do not think that the email package is representative of real-world multipart/form-data parsers, in particular because it targets email-dialect MIME that involves nesting never seen in HTTP. (As just one example, Django's multipart parser doesn't support nesting. I guarantee no browser ever generates it.)
A more ideal test suite would test against a variety of real-world multipart parsers, but I don't think that'd have great ROI.
| self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone) | ||
|
|
||
|
|
||
| class EncodingFromHeadersTests(unittest.TestCase): |
There was a problem hiding this comment.
maybe add a docstring to describe the criteria for grouping tests into this class.
This should help future devs know if they should add a new tests here or add it somewhere else.
There was a problem hiding this comment.
Sure! In the future it's fine to remind me of the Twisted coding standard, no need to write so much. :)
twm
left a comment
There was a problem hiding this comment.
Thanks for the review @adiroiban! I'll give a little time for folks to discuss the third-party dependency situation.
| # This seems to be the choice browsers make when encountering multiple | ||
| # content-type headers. | ||
| content_type, params = cgi.parse_header(content_types[-1]) | ||
| media_type, params = multipart.parse_options_header(content_types[-1]) |
There was a problem hiding this comment.
To be clear, the dependency here is not just for tests. In 0a3d17b I switched to using the equivalent of cgi.parse_header() provided by multipart to avoid vendoring code under the PSF license as that seemed to give @glyph pause.
I picked multipart because it has a few familiar faces attached, e.g. @cjwatson, who has PyPI upload permissions. It is hosted under an individual's GitHub account rather than an org, so that's a downside.
I also considered pulling the relevant bits of cgi into a separate package and putting it on PyPI. I'd be happy to do that under the Twisted org if that's preferable to an external dependency.
I can only echo @exarkun's skepticism of using the stdlib for this. The stdlib has an anti-bytes bias that is often incorrect and has led to a lot of makework for us in Twisted. The cgi module has seen some baffling refactors, then deprecation and removal. Now we're told to use the email package, which isn't explicitly implementing the HTTP RFCs. Seems risky. Also, it was recently rewritten! It's possible for code to be too maintained.
| self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone) | ||
|
|
||
|
|
||
| class EncodingFromHeadersTests(unittest.TestCase): |
There was a problem hiding this comment.
Sure! In the future it's fine to remind me of the Twisted coding standard, no need to write so much. :)
|
|
||
| from io import BytesIO | ||
|
|
||
| from multipart import MultipartParser # type: ignore |
There was a problem hiding this comment.
For what it's worth, the cgi docs suggest the multipart package:
Deprecated since version 3.11, will be removed in version 3.13: This function, like the rest of the cgi module, is deprecated. It can be replaced with the functionality in the email package (e.g. email.message.EmailMessage/email.message.Message) which implements the same MIME RFCs, or with the multipart PyPI project.
While it suggests the email package too, I do not think that the email package is representative of real-world multipart/form-data parsers, in particular because it targets email-dialect MIME that involves nesting never seen in HTTP. (As just one example, Django's multipart parser doesn't support nesting. I guarantee no browser ever generates it.)
A more ideal test suite would test against a variety of real-world multipart parsers, but I don't think that'd have great ROI.
|
Thanks for the feedback. Much appreciated! I have approved the PR and I am ok with using My comment about Also, I remember some discussions for getting treq code include in Twisted and have it as a high-level API . From what I can see in this PR, I think that HTTP handling should be a core feature of any library, and this is why I would prefer to have http code implemented in stdlib. For example, in Twisted, HTTP protocol handling is part of main Twisted package, while for LDAP we have a separate package. Something similar for stdlib. I would expect Python to have a good enough HTTP handing, and use a separate package for LDAP handling. |
|
Any chance we could get a new release with this fix? Fedora rawhide just updated to Python 3.13b2 and treq is failing to build. This PR doesn't apply cleanly to 23.11.0. Thanks. |
|
I think that first we should fix the 3.13 support in twisted/twisted |
Unfortunately on Fedora we don't package multipart, but python-multipart which is a different project. And while not entirely sure of the stability of each project, the multipart used here looks abandoned upstream (no commits since 2021). |
|
|
That is awesome, thanks for the response here :). I'll work on getting it packaged for Fedora then. |
|
@opoplawski Thanks for the ping! I'm currently a bit loopy on cough medicine, but I should be able to manage a release this weekend or next week. |
Stop using
cgi, which was deprecated in Python 3.11 and dropped in Python 3.13. Replace it with themultipartpackage, which becomes a dependency.Fixes #355.