Skip to content

#12108 Speed up the twisted.web HTTP client#12109

Merged
itamarst merged 28 commits intotrunkfrom
12108-speed-up-twistedweb-http-client
Mar 15, 2024
Merged

#12108 Speed up the twisted.web HTTP client#12109
itamarst merged 28 commits intotrunkfrom
12108-speed-up-twistedweb-http-client

Conversation

@itamarst
Copy link
Contributor

@itamarst itamarst commented Mar 6, 2024

Scope and purpose

Fixes #12108
Fixes #12112

Why this is faster:

  • Minor reductions in general effort.
  • Only calculate Failure.parents on demand, since it's expensive and often not used.
  • Failure.__init__ does a huge amount of unnecessary work for simple cases, so avoided it with alternative constructor. The constructor results in different pickling behavior, because of subclassing BaseException it seems!!!! See https://gist.github.com/itamarst/4e984bddef23a96619b37bfb929e5b24

I run my benchmark script with hyperfine to reduce noise from dictionary ordering randomness and the like.

Before:

$ hyperfine --warmup 1 "python http_client_benchmark.py"
Benchmark 1: python http_client_benchmark.py
  Time (mean ± σ):      3.992 s ±  0.056 s    [User: 3.979 s, System: 0.012 s]
  Range (min … max):    3.917 s …  4.118 s    10 runs

After:

$ hyperfine --warmup 1 "python http_client_benchmark.py"
Benchmark 1: python http_client_benchmark.py
  Time (mean ± σ):      3.266 s ±  0.032 s    [User: 3.253 s, System: 0.012 s]
  Range (min … max):    3.222 s …  3.336 s    10 runs

ORIGINAL APPROACH, NOW OBSOLETE

Important: This relies on Failure objects being effectively immutable, which I think is the case but should be something to consider when reviewing.

I run my benchmark script with hyperfine to reduce noise from dictionary ordering randomness and the like.

Before:

$ hyperfine --warmup 1 "python http_client_benchmark.py"
Benchmark 1: python http_client_benchmark.py
  Time (mean ± σ):      3.992 s ±  0.056 s    [User: 3.979 s, System: 0.012 s]
  Range (min … max):    3.917 s …  4.118 s    10 runs

After:

$ hyperfine --warmup 1 "python http_client_benchmark.py"
Benchmark 1: python http_client_benchmark.py
  Time (mean ± σ):      3.022 s ±  0.025 s    [User: 3.005 s, System: 0.016 s]
  Range (min … max):    2.982 s …  3.051 s    10 runs

Test script:

from twisted.web._newclient import HTTP11ClientProtocol, Request
from twisted.internet.testing import StringTransport
from twisted.web.http_headers import Headers
import time

RESPONSE = """HTTP/1.1 200 OK
Host: blah
Foo: bar
Gaz: baz
Content-length: 3

abc""".replace("\n", "\r\n").encode("utf-8")

start = time.time()
for i in range(100_000):
    protocol = HTTP11ClientProtocol()
    protocol.makeConnection(StringTransport())
    request = Request(b"GET", b"/foo/bar", Headers({b"Host": [b"example.com"]}), None)
    d = protocol.request(request)
    protocol.dataReceived(RESPONSE)
    l = []
    d.addCallback(l.append)
    assert l
print("requests/sec:", 100_000 / (time.time() - start))

Follow-up PRs will:

  • Optimize http_headers.py, which does a lot of repetitive and unnecessary work.
  • Possibly split up the identity transfer encoding into two variants, one with a fixed length and one without.

Contributor Checklist:

This process applies to all pull requests - no matter how small.
Have a look at our developer documentation before submitting your Pull Request.

Below is a non-exhaustive list (as a reminder):

  • The title of the PR should describe the changes and starts with the associated issue number, like “#9782 Remove twisted.news. #1234 Brief description”.
  • A release notes news fragment file was create in src/twisted/newsfragments/ (see: Release notes fragments docs.)
  • The automated tests were updated.
  • Once all checks are green, request a review by leaving a comment that contains exactly the string please review.
    Our bot will trigger the review process, by applying the pending review label
    and requesting a review from the Twisted dev team.

@itamarst itamarst linked an issue Mar 6, 2024 that may be closed by this pull request
@itamarst
Copy link
Contributor Author

itamarst commented Mar 6, 2024

please review

@chevah-robot chevah-robot requested a review from a team March 6, 2024 16:15
Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the changes.

For now, I just left this is a comment, and put back the team for review.

Happy to see performance improvements in the HTTP client.

I left a few comments inline.

As you commented in the description of the PR, my biggest concern is related to mutating the failure instance.


I remember seing instances in which a failure is mutated, but I don't remember the exact code or the ticket/issue number.

I think that for the purpose of the current code, it's ok to this change.

I expect that at least some downstream application/library that relies on Twisted is mutating failures

I think I have some places in my applications in which I mutate Failure.value. This was done for convenience...but I think the right thing to do is to have someting like newFailure = oldFailure.copy(Exception('new exception')) in which the traceback is kept.

I am doing this type of things mostly in high level application code.
The error messsage text generated by Twisted is not good enought for end users... but I still want to keep the traceback for troubleshoting


Maybe we can create a new FrozenFailure class and use it here.
The FrozenFailure will raise an error is someone tries to mutate it.

@adiroiban adiroiban requested a review from a team March 6, 2024 19:58
@itamarst
Copy link
Contributor Author

itamarst commented Mar 6, 2024

It's not clear to me there is a traceback in this case, since it's directly creating exceptions. Will validate tomorrow.

@adiroiban
Copy link
Member

It's not clear to me there is a traceback in this case, since it's directly creating exceptions. Will validate tomorrow.

You are right.
There is no traceback here, so traceback is not an issue.

Still, maybe it's a good idea to have different text messages, depending on the place from where this failure is returned.

@itamarst
Copy link
Contributor Author

itamarst commented Mar 11, 2024

So, the main issue is mutable Failure. FrozenFailure is certainly possible, but... it'd have to make the exception frozen somehow, when it's not clear how. So a generic version is hard. And any mutable exception attributes can be mutated even if __setattr__ is disabled... so a generic version seems difficult.

On the other hand, here it's just ResponseDone and ConnectionDone which don't have mutable attributes, really, and oughtn't be mutated. So might be doable in this case.

@itamarst
Copy link
Contributor Author

Actually, there's a number of places that just assume immutable exceptions already, e.g. the CONNECTION_DONE constant, so maybe just read-only Failure is sufficient.

@itamarst
Copy link
Contributor Author

OK, added a Failure subclass.

@glyph
Copy link
Member

glyph commented Mar 11, 2024

OK, added a Failure subclass.

This is probably fine, but thinking about it does send a chill down my spine.

I don't think that an immutable failure subclass is actually substitutable, or thereby compliant with the letter of the compatibility policy, as it stands. I'd kinda rather make all Failures be immutable if that is practical, and initially enforce it at type-time, which is not covered by the compatibility policy?

@exarkun
Copy link
Member

exarkun commented Mar 12, 2024

OK, added a Failure subclass.

This is probably fine, but thinking about it does send a chill down my spine.

cough CopyableFailure cough

I don't think that an immutable failure subclass is actually substitutable, or thereby compliant with the letter of the compatibility policy, as it stands. I'd kinda rather make all Failures be immutable if that is practical, and initially enforce it at type-time, which is not covered by the compatibility policy?

It's certainly not. I think only enforcing it at type-time is probably good enough. There will never be a 100% immutable Python object. If the type checker tells someone they're not allowed to change something and they go ahead and run code that changes it anyway, that seems like a choice that's up to them.

Ideally, there'd be a generic like typing.Readonly - as in, Readonly[Failure]. I don't think there is, but a ReadonlyFailure protocol that exposes all of Failure's public attributes, but only for reading is still simple, just more typing. Then I believe the definition of freeze is:

    def freeze(self) -> ReadonlyFailure: return self

@exarkun
Copy link
Member

exarkun commented Mar 12, 2024

(This PR also seems like a weird place to be solving this 24 year old design mistake)

@itamarst
Copy link
Contributor Author

itamarst commented Mar 12, 2024

python/typing#1123 has an example of how to define a Protocol with read-only attributes. But actually these days you can just use Final, which is simpler.

@itamarst
Copy link
Contributor Author

itamarst commented Mar 12, 2024

Given the Failure object is going to end up in places, e.g. Protocol.connectionLost for a body protocol of a Response, using typing to enforce read-only-ness might end up going pretty far in terms of changing the API.

@glyph
Copy link
Member

glyph commented Mar 12, 2024

cough CopyableFailure cough

Okay, perhaps a comparable amount of chills, but it is a proper Liskov substitute ;)

@glyph
Copy link
Member

glyph commented Mar 12, 2024

Then I believe the definition of freeze is:

My suggestion would be to just have Failure itself be frozen by default, because:

  1. Mutating a failure is a pretty weird operation that I would guess rarely happens?
  2. Any code doing the mutating would continue to work at runtime, unless it mutates one of these shared instances (but what mutation would such code be making, I wonder?) in which case behavior would change, but not in a technically incompatible way.
  3. Any code doing the mutating would become a type error, which seems like a reasonable way for users to catch it and fix things.

If my first assumption is wrong (this would not affect much code) then we should probably do another approach. Although then I'd really want to know what the common failure-mutation patterns are.

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes.

I only have a minor comment regarding the release notes wording.

Based on the latest discussions, I think that we can merge this PR as it is.

Also, I am also ok with merging this code with a standard Failure, just like it was in the first review request.

I have approved this PR, but maybe wait one day to receive more feedback from Jean-Paul and Glyph.


(This PR also seems like a weird place to be solving this 24 year old design mistake)

We don't have to make big API changes in this PR.
I think it is still usefull to discuss this issue.

I was not able to find an existing ticket/GitHub Issue discussing the read only design for Failure.

I guess we should create a new one now :)

I'd kinda rather make all Failures be immutable if that is practical,

I think that such a change should be done in a separate PR.

So my suggestion for this PR:

  • Create a new GitHub issue dedicated to the Failure read-only design (or find the existing one, if such an issue was already raised)
  • Commit this PR using using _FrozenFailure and add a note to remove it once the Failure read-only changes are made

Thanks again!

@itamarst
Copy link
Contributor Author

There is one existing Failure API that mutates it, https://docs.twisted.org/en/stable/api/twisted.python.failure.Failure.html#cleanFailure

@itamarst itamarst marked this pull request as ready for review March 14, 2024 12:54
@chevah-robot chevah-robot requested a review from a team March 14, 2024 12:54
@itamarst
Copy link
Contributor Author

please review

@itamarst itamarst requested a review from adiroiban March 14, 2024 12:54
@itamarst
Copy link
Contributor Author

MyPy failures don't appear to have anything to do with this branch...

@glyph
Copy link
Member

glyph commented Mar 14, 2024

MyPy failures don't appear to have anything to do with this branch...

Confirmed, fresh trunk checkout gives similar errors here. Investigating…

@glyph
Copy link
Member

glyph commented Mar 14, 2024

MyPy failures don't appear to have anything to do with this branch...

Confirmed, fresh trunk checkout gives similar errors here. Investigating…

filed #12113 to track and discuss

@glyph
Copy link
Member

glyph commented Mar 14, 2024

@itamarst an attempted fix is here #12115

@glyph
Copy link
Member

glyph commented Mar 14, 2024

OK, fix attempt seems to have worked, hopefully that can unblock this

@glyph
Copy link
Member

glyph commented Mar 15, 2024

Merging in the approved fix, so maybe we can get a successful run.

Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some coding-standard compliance suggestions

typeInfo = failureDict["type"]
failureDict["type"] = type(typeInfo["__name__"], (), typeInfo)
f.__dict__ = failureDict
f.__setstate__(failureDict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this API used ?

In src/twisted/logger/__init__.py I can see only these as public API

from ._json import (
    eventAsJSON,
    eventFromJSON,
    jsonFileLogObserver,
    eventsFromJSONLogFile,
)

and I don't see any test for this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests were failing without this, so I think so.

tb = tb.tb_next

@property
def parents(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that in trunk we alrady have Failure.parents as public API and without documentation.

Can we add a few words about Failure.parents in this PR?

The documentation should go in Failure.__doc__ and not here... but I can't leave a comment there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an implementation detail, it's better if people don't use it. The fact it's public is a historic quirk...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info.

If this should be private, then maybe we need a followup issue to mark it private and raise a deprecation warning for any public usage of the API ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failure.__getstate__ is ignored by pickle Speed up twisted.web HTTP client

6 participants