fix performance issue with ResponseStream wrapper#6937
Merged
Conversation
bentsku
approved these changes
Sep 26, 2022
Contributor
bentsku
left a comment
There was a problem hiding this comment.
LGTM! 🚀 Thanks a lot for fixing this deep bug, great catch. Super excited that we can soon see the results of the S3 migration. Awesome! 👍
21 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR fixes a performance issue that occurred with AWS responses that have streaming bodies. The change exposes the underlying iterator of
Responsedirectly through the_ResponseStreamwrapper, so invocations of the formfor item in streamno longer go through the pointlessread_intoabstraction.Detailed explanation
This issues surfaced with the new S3 provider, where
get_objectdownload would take exponentially longer the larger the body. When bodies are streamed, we wrap the response in a_ResponseStreamobject that exposes the response as a readable IO objectlocalstack/localstack/aws/client.py
Line 150 in 68c507e
The issue is that the
_ResponseStreamis invoked in the formfor item in stream. Through the RawIOBase abstraction, this creates an iterator by looping over calls toread_intowith a byte buffer of size 1.The culprit is this line in particular:
localstack/localstack/aws/client.py
Line 37 in 68c507e
Since the underlying response is one big blob in memory,
chunkwill actually contain the entire s3 object. Consequently, we're always popping a single byte off the entire chunk to populate the response, and copying the remaining buffer. This will take longer depending on how big chunk is, which is the exponential runtime behavior we were seeing.After the fix, read_into might actually be dead code now, since the underlying webserver always consumes the response stream using the
for data in streampattern. However,read_intois still necessary for theRawIOBaseimplementation, so I added tests to cover it.Reproduce
Here's a simple benchmark. On my machine this takes