Eliminate use of `cgi` by twm · Pull Request #377 · twisted/treq

twm · 2024-01-02T04:54:19Z

Stop using cgi, which was deprecated in Python 3.11 and dropped in Python 3.13. Replace it with the multipart package, which becomes a dependency.

Fixes #355.

glyph · 2024-01-02T05:20:32Z

Does vendoring a stdlib function have license implications? I'd rather deal with the possibility of tinkering in the email module than copy the PSF license.

twm · 2024-01-02T05:23:19Z

@glyph I thought we've vendored stuff from the stdlib in Twisted before... is that not the case?

twm · 2024-01-02T05:59:49Z

Okay, I switched to using multipart for everything.

adiroiban

Thanks for the update.

I just left a very generic review.
I have never used treq so far.

It looks good. Only minor comments.

I prefer stdlib here... but multipart is also ok.

Regards

adiroiban · 2024-01-02T12:12:08Z

src/treq/content.py

    # This seems to be the choice browsers make when encountering multiple
    # content-type headers.
-    content_type, params = cgi.parse_header(content_types[-1])
+    media_type, params = multipart.parse_options_header(content_types[-1])


More of a FYI.

In Twisted we/I went for stdlib usage https://github.com/twisted/twisted/pull/12016/files#diff-3093a8f64dec64c2305f6322f276a5cee1437f0037efc660ce6524ffa58017a1R229-R234

My reasoning for using stdlib, is that this is quite common usage and if there is a bug, it's best to have it fixed in stdlib rather than rely on 3rd party packages.

My reasoning for using stdlib, is that this is quite common usage and if there is a bug, it's best to have it fixed in stdlib rather than rely on 3rd party packages.

I'm curious what experience this is based on. In my experience, known bugs often remain in the stdlib for years (perhaps because there are relatively many barriers to landing a change in the stdlib, as well as the fact that the stdlib can only change with a release which happens relatively infrequently - plus such a release can not fix the issue for already-released versions of python).

If there is a third-party module of reasonable quality that is practical to use then I would generally prefer this over something from the stdlib.

It depends on the source of the 3rd-party module. More dependencies = more attack surface, both in the literal sense (more people whose PyPI upload credentials can ruin your day), and in the metaphoric sense of unanticipated changes (what happens if a dependency goes AGPL3).

In this sense "the stdlib" or any existing dependency are roughly equivalent, and I'd probably prefer to depend more on our existing deps. Taking on a new dependency ought to give us a moment of pause, but really only a moment, especially if we know the maintainers or it's sufficiently widely used that we are not entirely on the hook for due diligence on its continued maintenance.

In any case this is a blocker to 3.13 support and the dependency is only for tests, so we should probably not block on this minor concern :)

To be clear, the dependency here is not just for tests. In 0a3d17b I switched to using the equivalent of cgi.parse_header() provided by multipart to avoid vendoring code under the PSF license as that seemed to give @glyph pause.

I picked multipart because it has a few familiar faces attached, e.g. @cjwatson, who has PyPI upload permissions. It is hosted under an individual's GitHub account rather than an org, so that's a downside.

I also considered pulling the relevant bits of cgi into a separate package and putting it on PyPI. I'd be happy to do that under the Twisted org if that's preferable to an external dependency.

I can only echo @exarkun's skepticism of using the stdlib for this. The stdlib has an anti-bytes bias that is often incorrect and has led to a lot of makework for us in Twisted. The cgi module has seen some baffling refactors, then deprecation and removal. Now we're told to use the email package, which isn't explicitly implementing the HTTP RFCs. Seems risky. Also, it was recently rewritten! It's possible for code to be too maintained.

adiroiban · 2024-01-02T12:13:04Z

src/treq/test/test_multipart.py


 from io import BytesIO

+from multipart import MultipartParser  # type: ignore


it's ok to use multipart in the testing code, to double check the implementation... even if the code under tests would have use stdlib.

For what it's worth, the cgi docs suggest the multipart package:

Deprecated since version 3.11, will be removed in version 3.13: This function, like the rest of the cgi module, is deprecated. It can be replaced with the functionality in the email package (e.g. email.message.EmailMessage/email.message.Message) which implements the same MIME RFCs, or with the multipart PyPI project.

While it suggests the email package too, I do not think that the email package is representative of real-world multipart/form-data parsers, in particular because it targets email-dialect MIME that involves nesting never seen in HTTP. (As just one example, Django's multipart parser doesn't support nesting. I guarantee no browser ever generates it.)

A more ideal test suite would test against a variety of real-world multipart parsers, but I don't think that'd have great ROI.

adiroiban · 2024-01-02T12:13:52Z

src/treq/test/test_content.py

+        self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone)
+
+
+class EncodingFromHeadersTests(unittest.TestCase):


maybe add a docstring to describe the criteria for grouping tests into this class.
This should help future devs know if they should add a new tests here or add it somewhere else.

Sure! In the future it's fine to remind me of the Twisted coding standard, no need to write so much. :)

twm

Thanks for the review @adiroiban! I'll give a little time for folks to discuss the third-party dependency situation.

twm · 2024-01-03T05:54:27Z

src/treq/content.py

    # This seems to be the choice browsers make when encountering multiple
    # content-type headers.
-    content_type, params = cgi.parse_header(content_types[-1])
+    media_type, params = multipart.parse_options_header(content_types[-1])


To be clear, the dependency here is not just for tests. In 0a3d17b I switched to using the equivalent of cgi.parse_header() provided by multipart to avoid vendoring code under the PSF license as that seemed to give @glyph pause.

I picked multipart because it has a few familiar faces attached, e.g. @cjwatson, who has PyPI upload permissions. It is hosted under an individual's GitHub account rather than an org, so that's a downside.

I also considered pulling the relevant bits of cgi into a separate package and putting it on PyPI. I'd be happy to do that under the Twisted org if that's preferable to an external dependency.

I can only echo @exarkun's skepticism of using the stdlib for this. The stdlib has an anti-bytes bias that is often incorrect and has led to a lot of makework for us in Twisted. The cgi module has seen some baffling refactors, then deprecation and removal. Now we're told to use the email package, which isn't explicitly implementing the HTTP RFCs. Seems risky. Also, it was recently rewritten! It's possible for code to be too maintained.

twm · 2024-01-03T05:54:53Z

src/treq/test/test_content.py

+        self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone)
+
+
+class EncodingFromHeadersTests(unittest.TestCase):


Sure! In the future it's fine to remind me of the Twisted coding standard, no need to write so much. :)

twm · 2024-01-03T06:08:52Z

src/treq/test/test_multipart.py


 from io import BytesIO

+from multipart import MultipartParser  # type: ignore


For what it's worth, the cgi docs suggest the multipart package:

Deprecated since version 3.11, will be removed in version 3.13: This function, like the rest of the cgi module, is deprecated. It can be replaced with the functionality in the email package (e.g. email.message.EmailMessage/email.message.Message) which implements the same MIME RFCs, or with the multipart PyPI project.

While it suggests the email package too, I do not think that the email package is representative of real-world multipart/form-data parsers, in particular because it targets email-dialect MIME that involves nesting never seen in HTTP. (As just one example, Django's multipart parser doesn't support nesting. I guarantee no browser ever generates it.)

A more ideal test suite would test against a variety of real-world multipart parsers, but I don't think that'd have great ROI.

adiroiban · 2024-01-05T11:40:25Z

Thanks for the feedback. Much appreciated!

I have approved the PR and I am ok with using multipart for treq.

My comment about multipart vs stdlib was more of a FYI for what was done in twisted/twisted to fix this issue.

Also, I remember some discussions for getting treq code include in Twisted and have it as a high-level API .

From what I can see in this PR, multipart is used in "production" only to parse the Content-Type header. I was hoping that HTTP and MIME have the same standard for Content-Type header format.

I think that HTTP handling should be a core feature of any library, and this is why I would prefer to have http code implemented in stdlib.

For example, in Twisted, HTTP protocol handling is part of main Twisted package, while for LDAP we have a separate package.

Something similar for stdlib. I would expect Python to have a good enough HTTP handing, and use a separate package for LDAP handling.

opoplawski · 2024-06-15T21:20:25Z

Any chance we could get a new release with this fix? Fedora rawhide just updated to Python 3.13b2 and treq is failing to build. This PR doesn't apply cleanly to 23.11.0. Thanks.

adiroiban · 2024-06-15T21:23:28Z

I think that first we should fix the 3.13 support in twisted/twisted

stratakis · 2024-06-18T13:24:27Z

Any chance we could get a new release with this fix? Fedora rawhide just updated to Python 3.13b2 and treq is failing to build. This PR doesn't apply cleanly to 23.11.0. Thanks.

Unfortunately on Fedora we don't package multipart, but python-multipart which is a different project. And while not entirely sure of the stability of each project, the multipart used here looks abandoned upstream (no commits since 2021).

cjwatson · 2024-06-18T14:04:51Z

multipart may not be super-active, but it isn't abandoned. I just released 0.2.5.

stratakis · 2024-06-18T14:26:09Z

multipart may not be super-active, but it isn't abandoned. I just released 0.2.5.

That is awesome, thanks for the response here :). I'll work on getting it packaged for Fedora then.

twm · 2024-06-27T20:55:56Z

@opoplawski Thanks for the ping! I'm currently a bit loopy on cough medicine, but I should be able to manage a release this weekend or next week.

twm added 6 commits January 1, 2024 21:05

Eliminate use of cgi.parse_multipart()

431173d

Vendor cgi.parse_header()

c1c1dc4

Eliminate use of cgi.parse_header()

b4fc499

Add change fragment

cd4c7f8

Reject empty quoted charset

5cc43b8

Make MyPy happy

852e343

twm force-pushed the bye-cgi-355 branch from bec1ce2 to 852e343 Compare January 2, 2024 05:05

Fix Python 3.7 compat

e03348f

twm requested a review from a team January 2, 2024 05:28

Avoid vendoring anything

0a3d17b

twm force-pushed the bye-cgi-355 branch from d8c8ccd to 0a3d17b Compare January 2, 2024 05:57

adiroiban approved these changes Jan 2, 2024

View reviewed changes

twm commented Jan 3, 2024

View reviewed changes

Merge branch 'trunk' into bye-cgi-355

c2210ca

glyph merged commit 8ccd453 into trunk Jan 6, 2024

glyph deleted the bye-cgi-355 branch January 6, 2024 01:51

pmisik mentioned this pull request Sep 18, 2024

Support for Python 3.13 buildbot/buildbot#7631

Closed

14 tasks


		from io import BytesIO

		from multipart import MultipartParser # type: ignore

		self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone)


		class EncodingFromHeadersTests(unittest.TestCase):

Conversation

twm commented Jan 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

glyph commented Jan 2, 2024

Uh oh!

twm commented Jan 2, 2024

Uh oh!

twm commented Jan 2, 2024

Uh oh!

adiroiban left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

glyph Jan 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

twm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adiroiban commented Jan 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

opoplawski commented Jun 15, 2024

Uh oh!

adiroiban commented Jun 15, 2024 • edited by glyph Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stratakis commented Jun 18, 2024

Uh oh!

cjwatson commented Jun 18, 2024

Uh oh!

stratakis commented Jun 18, 2024

Uh oh!

twm commented Jun 27, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

twm commented Jan 2, 2024 •

edited

Loading

glyph Jan 2, 2024 •

edited

Loading

adiroiban commented Jan 5, 2024 •

edited

Loading

adiroiban commented Jun 15, 2024 •

edited by glyph

Loading