Optimise push in S3 driver #4066
Conversation
Force-pushed from bf3d505 to e49e3a1
corhere left a comment
I feel like I'm missing something. What's the point of double-buffering if the flush operation is blocking? Is a small final chunk really that big of a deal that it's worth doubling memory usage to avoid it?
Why can't the write loop be as simple as this? Copy to buffer until full, flush buffer, repeat until there is nothing left to copy.
var n int
defer func() { w.size += int64(n) }()
reader := bytes.NewReader(p)
for reader.Len() > 0 {
	m, _ := w.buf.ReadFrom(reader)
	n += int(m)
	if w.buf.Available() == 0 {
		if err := w.flushPart(); err != nil {
			return n, err
		}
	}
}
return n, nil
registry/storage/driver/s3-aws/s3.go (Outdated)
	closed    bool
	committed bool
	cancelled bool
	ctx       context.Context
Is passing a context into the S3 API methods necessary for optimizing push? Storing a context in a struct is discouraged as it obfuscates the lifetime of the operation that the context is being applied to. I am not totally against doing so (io.WriteCloser falls within the exception criteria) but I would much prefer to have that discussion in its own PR.
Yes, I am aware of this indeed, and yes, it's sucky; the reason the context is a writer field is that the writer makes a bunch of S3 API calls that require the context to be passed to them (unless we opt not to use the *WithContext() SDK methods, at the expense of not being able to propagate cancellations from the driver calls).
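For context, here is a minimal sketch of the pattern being debated, assuming aws-sdk-go v1's *WithContext call style; the struct fields and the abortUpload method are illustrative stand-ins, not the PR's actual code:

```go
package main

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// writer captures the context it was constructed with and reuses it for every
// SDK call it makes, which ties the context's lifetime to the writer rather
// than to an individual call.
type writer struct {
	ctx    context.Context
	client *s3.S3
	bucket string
	key    string
}

func newWriter(ctx context.Context, client *s3.S3, bucket, key string) *writer {
	return &writer{ctx: ctx, client: client, bucket: bucket, key: key}
}

func (w *writer) abortUpload(uploadID string) error {
	// The *WithContext variant propagates cancellation of the stored context
	// into the S3 request.
	_, err := w.client.AbortMultipartUploadWithContext(w.ctx, &s3.AbortMultipartUploadInput{
		Bucket:   aws.String(w.bucket),
		Key:      aws.String(w.key),
		UploadId: aws.String(uploadID),
	})
	return err
}

func main() {
	sess := session.Must(session.NewSession())
	w := newWriter(context.Background(), s3.New(sess), "example-bucket", "example-key")
	_ = w.abortUpload("example-upload-id")
}
```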
Would it be feasible to split the context plumbing improvements out to a follow-up PR? I have a couple things to discuss which would otherwise bog down the review of the push optimization work in this PR. (That reminds me, I really need to finish writing that blog post about contexts...)
I don't mind, happy to rip it out; as for the AWS SDK calls that accept this context as a param, do you prefer dropping the *WithContext() calls or temporarily passing context.TODO() to them with some comment?
I'd prefer reverting the calls so they don't show up as changed lines in the PR diff. It's easier to review that way.
Done in 0463ade
Curious about how you wanna go about this. I'm worried you'll suggest some wild context wrappers 🙃
I'm worried you'll suggest some wild context wrappers 🙃
registry/storage/driver/s3-aws/s3.go (Outdated)
// NOTE(milosgajdos): writer maintains its own context
// passed to its constructor from storage driver.
func (w *writer) Cancel(_ context.Context) error {
This is exactly the kind of thing I'm talking about regarding context lifetimes.
Yes, I commented above. Wish we could somehow link related comments other than like this: #4066 (comment)
	parts:   parts,
	size:    size,
	ready:   d.NewBuffer(),
	pending: d.NewBuffer(),
Have you considered lazily allocating the pending buffer on demand? That'd cut down on memory consumption when writing files small enough to fit into a single chunk.
☝️ I see this was a "non-blocking" comment, but curious if you gave this some consideration.
I said in one of my comments that it's something I need to have a think about but would move it to the follow-up. It's a sound suggestion worth considering, indeed.
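For illustration, a minimal sketch of the lazy-allocation idea under discussion; the buffer, driver, and pendingBuf names are simplified stand-ins for the PR's types, not its actual code:

```go
package main

const defaultChunkSize = 10 * 1024 * 1024 // illustrative chunk size

// buffer and driver are minimal stand-ins; only the shape needed to show
// lazy allocation is included here.
type buffer struct{ data []byte }

type driver struct{ chunkSize int }

func (d *driver) NewBuffer() *buffer { return &buffer{data: make([]byte, 0, d.chunkSize)} }

type writer struct {
	driver  *driver
	ready   *buffer
	pending *buffer // left nil until a write actually spills past ready
}

// pendingBuf is a hypothetical helper: it allocates the pending buffer on
// first use, so files that fit into a single chunk never pay for a second
// chunk-sized allocation.
func (w *writer) pendingBuf() *buffer {
	if w.pending == nil {
		w.pending = w.driver.NewBuffer()
	}
	return w.pending
}

func main() {
	d := &driver{chunkSize: defaultChunkSize}
	w := &writer{driver: d, ready: d.NewBuffer()}
	_ = w.pendingBuf() // the second buffer is only allocated here
}
```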
Force-pushed from 2f7ce80 to 1fa0272
I've updated the …
This is a billion-dollar question that we may want to address in the future 😄 I have done a bit of … There is one more comment I need to think about a bit: lazy allocation of the pending buffer.
Force-pushed from acc7e9f to 29c98b3
corhere left a comment
Another mystery: why is the storage driver not simply implemented in terms of https://pkg.go.dev/github.com/aws/aws-sdk-go@v1.45.16/service/s3/s3manager#Uploader?
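For reference, a minimal sketch of what using s3manager.Uploader looks like; the bucket, key, and file names below are hypothetical, and the part size and concurrency values are just examples:

```go
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	// The uploader buffers the reader into PartSize chunks and runs the
	// multipart upload (including part retries) internally.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 10 * 1024 * 1024 // e.g. match the driver's chunk size
		u.Concurrency = 2
	})

	f, err := os.Open("layer.tar.gz") // hypothetical payload
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	_, err = uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("example-bucket"),
		Key:    aws.String("docker/registry/v2/blobs/example"),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```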
corhere left a comment
LGTM! I'll approve once you've squashed the PR. Don't worry about retaining my authorship credit on the suggestions.
I left a couple of optional suggestions for further improving the code. Emphasis on optional.
This commit cleans up and attempts to optimise the performance of image push in the S3 driver. There are two main changes:
* we refactor the S3 driver Writer: instead of using separate byte slices for the ready and pending parts, which data was constantly appended to, causing unnecessary allocations, we use optimised byte buffers and make sure they are written to efficiently
* we introduce a memory pool that is used for allocating the byte buffers introduced above
These changes should alleviate high memory pressure on the push path to S3.
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
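As a rough illustration of the memory-pool idea described in the commit message, here is a minimal sync.Pool-backed sketch; the names and the chunk size are illustrative assumptions, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

const chunkSize = 10 * 1024 * 1024 // illustrative; stands in for the driver's chunk size

// buffer wraps a reusable byte slice; pooling the wrapper (a pointer) avoids
// re-allocating chunk-sized backing arrays for every part that is flushed.
type buffer struct{ data []byte }

var bufPool = sync.Pool{
	New: func() any { return &buffer{data: make([]byte, 0, chunkSize)} },
}

func getBuffer() *buffer {
	b := bufPool.Get().(*buffer)
	b.data = b.data[:0] // reset length, keep capacity
	return b
}

func putBuffer(b *buffer) { bufPool.Put(b) }

func main() {
	b := getBuffer()
	b.data = append(b.data, "part data"...)
	fmt.Println(len(b.data), cap(b.data))
	putBuffer(b)
}
```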
Force-pushed from 1f93200 to b888b14
Alright, squashed, PTAL @corhere @thaJeztah @Jamstah
corhere left a comment
LGTM! I see a couple of minor formatting nits, but I am going to pretend I didn't notice them.
thaJeztah left a comment
Gave this a first glance, and looks great; I left some questions / comments (none really blocking, afaics)
	err = nil
}
return offset, err
If err is not nil (and not an io.EOF), should offset still be returned, or should it return 0, err?
If so, then perhaps something like this would be slightly more idiomatic:
if err != nil && err != io.EOF {
	return 0, err
}
return offset, nil
for len(b.data) < cap(b.data) && err == nil {
	var n int
	n, err = r.Read(b.data[len(b.data):cap(b.data)])
	offset += int64(n)
	b.data = b.data[:len(b.data)+n]
Just a question: is it worth benchmarking whether memoizing len(b.data) and cap(b.data) here would be beneficial (i.e., using an intermediate variable to store them, as we're calling them multiple times)? Maybe not; just curious.
Yeah, I'm not sure it's worth it. Maybe? 🤷♂️ len and cap just return an unexported field from the underlying struct so I'd expect negligible gain here 🤔
No; len(b.data) and cap(b.data) are just accessing fields on the slice header which costs the same as any other struct field read. Compiler explorer reveals that the compiler is able to elide bounds-checks for the expression b.data[len(b.data):cap(b.data)] which it cannot do when those values are copied to local variables. Copying the slice itself to a local variable sounds promising as a way to avoid dereferencing pointers inside the loop, but as compiler explorer shows, the generated AMD64 code immediately spills the slice header fields to the stack.
tl;dr introducing intermediate variables will actually deoptimize the function. Compilers are pretty smart these days.
Yes, sorry, by "unexported fields" I meant "fields", and yeah, "header" is the right terminology. Thanks!
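For anyone who wants to reproduce that comparison in compiler explorer, here is a minimal pair along those lines; the buffer type and the function names are made up for illustration, and the comments restate the observations from the discussion above:

```go
package bufdemo

import "io"

type buffer struct{ data []byte }

// readDirect slices with len/cap inline; per the discussion above, the
// compiler can prove len(b.data) <= cap(b.data) and elide the bounds check
// on the slice expression.
func readDirect(b *buffer, r io.Reader) (int, error) {
	return r.Read(b.data[len(b.data):cap(b.data)])
}

// readMemoized copies len/cap into locals first; per the discussion above,
// the compiler no longer sees that invariant and emits a bounds check for
// b.data[l:c].
func readMemoized(b *buffer, r io.Reader) (int, error) {
	l, c := len(b.data), cap(b.data)
	return r.Read(b.data[l:c])
}
```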
w.pendingPart = nil

buf := bytes.NewBuffer(w.ready.data)
if w.driver.MultipartCombineSmallPart && (w.pending.Len() > 0 && w.pending.Len() < int(w.driver.ChunkSize)) {
nit: looks like the extra braces are not needed here
if w.driver.MultipartCombineSmallPart && w.pending.Len() > 0 && w.pending.Len() < int(w.driver.ChunkSize) {
They are not needed, no, but I felt like it'd be nice if these were semantically grouped -- it would have been more obvious if I'd split them onto two separate lines. I'm happy to change it back, I just feel there are too many &&s in a single condition, causing cognitive overload 🤷♂️
p = nil
// try filling up the pending parts buffer
offset, err = w.pending.ReadFrom(reader)
n += int(offset)
As we're calling this in a for loop, I'm wondering if it would be beneficial to change n to an int64 and only cast it to an int when returning.
I considered this, indeed. I stuck with this at the time. There are a few returns here so I struggle to see the win 🤔 I'm willing to be persuaded though
Yeah, I was curious if benchmarking would show a difference; I just did a quick check, and there's no consistent difference, so not worth changing.
offset, err = w.pending.ReadFrom(reader)
n += int(offset)
if err != nil {
	return n, err
I'd have to dig deeper into the whole flow, but is it intentional to return both n and an err here? (The code higher up at line 1418 does a return 0, err on errors.)
I think this is an interesting shout. writer is supposed to implement https://pkg.go.dev/io#Writer
It returns the number of bytes written from p (0 <= n <= len(p)) and any error encountered that caused the write to stop early. Write must return a non-nil error if it returns n < len(p)
Now, the code on line 1418 does NOT write any bytes from p, which is the byte slice passed into Write. I think it's fine to return 0, err there and n, err here. WDYT @corhere
Returning the number of bytes written from p is a requirement of the io.Writer interface contract. Zero bytes have been written from p if control flow reaches line 1418.
Thanks! That's the part I wanted to dig deeper into, because I suspected that information had to be preserved (even in the error case). I guess the interface is just slightly confusing to require both error and value, but it makes sense in context.
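To spell out the contract being referenced, here is a tiny, self-contained writer (not the PR's code; the capWriter type is invented for illustration) showing why a partial write must return both the byte count and a non-nil error:

```go
package main

import (
	"errors"
	"fmt"
)

// capWriter accepts at most limit bytes in total. When a call can only
// partially succeed, it returns how many bytes of p it actually consumed
// together with a non-nil error, as io.Writer requires.
type capWriter struct {
	written int
	limit   int
}

var errFull = errors.New("capWriter: full")

func (w *capWriter) Write(p []byte) (int, error) {
	remaining := w.limit - w.written
	if remaining <= 0 {
		return 0, errFull // nothing from p was written
	}
	if len(p) <= remaining {
		w.written += len(p)
		return len(p), nil
	}
	w.written += remaining
	return remaining, errFull // n < len(p), so the error must be non-nil
}

func main() {
	w := &capWriter{limit: 5}
	n, err := w.Write([]byte("hello world"))
	fmt.Println(n, err) // 5 capWriter: full
}
```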
reader := bytes.NewReader(p)

for reader.Len() > 0 {
I guess reader := bytes.NewReader(p) could be inlined here, but that's really nitpicking.
Yeah, we could probably be more clever in this code, but let's keep it as is unless you insist 😄
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
PTAL @thaJeztah. All the important points have been addressed.
thaJeztah left a comment
LGTM
This PR attempts to optimise the performance of image push in the S3 driver. There are 2 main changes:
* we refactor the S3 driver writer: instead of using simple byte slices for the ready and pending parts, which data was constantly appended to, causing unnecessary allocations, we use optimised byte buffers and make sure these are used efficiently
* we introduce a memory pool that is used for allocating the byte buffers introduced above

These changes should alleviate high memory pressure on the push path to S3, especially when large images (layers) are pushed to the S3 store.
Related PR: