Skip to content

Stop ReadFromWithConcurrency submitting out of order writes#480

Closed
ncw wants to merge 1 commit into
pkg:masterfrom
ncw:fix-out-of-order-writes
Closed

Stop ReadFromWithConcurrency submitting out of order writes#480
ncw wants to merge 1 commit into
pkg:masterfrom
ncw:fix-out-of-order-writes

Conversation

@ncw

@ncw ncw commented Nov 11, 2021

Copy link
Copy Markdown
Contributor

This is for discussion. This is causing a real problem in rclone rclone/rclone#5806 but there may be a better way of fixing.


Before this change, ReadFromWithConcurrency would very often submit writes to the server out of order due to small timing differences when lots of blocks were queued and ready to send.

This causes more work for the server as seeking about the file isn't free. Writing chunks of a file out of order can also cause fragmentation and consequently loss of performance on that file.

This patch causes the write to the channel to be handled synchronously but the acks are received asynchronously allowing for many write packets to be in flight as before.

Before this change, ReadFromWithConcurrency would very often submit
writes to the server out of order due to small timing differences when
lots of blocks were queued and ready to send.

This causes more work for the server as seeking about the file isn't
free. Writing chunks of a file out of order can also cause
fragmentation and consequently loss of performance on that file.

This patch causes the write to the channel to be handled synchronously
but the acks are received asynchronously allowing for many write
packets to be in flight as before.
@ncw

ncw commented Nov 11, 2021

Copy link
Copy Markdown
Contributor Author

I speed tested this over a long wide pipe and I couldn't see any difference in speed before and after the patch, both about 40 MiB/s falling all the way back to 900KiB/s with concurrent writes disabled as you'd expect.

@ncw

ncw commented Nov 12, 2021

Copy link
Copy Markdown
Contributor Author

Thinking about this some more

  • this idea could be used in other reading and writing parts of the library
  • can replies to sftp messages come back out of order? If not then starting concurrency go routines isn't necessary, only one is needed

@drakkan

drakkan commented Nov 14, 2021

Copy link
Copy Markdown
Collaborator

Hi,

thanks for the contribution, it seems fine to me, however this code was written by @puellanivis, at the moment she has some personal issues, please wait some time for her to have a chance to have a look to the code, thanks for your patience

@puellanivis

Copy link
Copy Markdown
Collaborator

Indeed, sequential ordering of requests is probably a good idea, and I think I was writing it into the client updates.

There are some things I think that should be addressed, but as @drakkan mentioned, I'm currently out. (In hospital) and haven't been able to really do anything yet. This also happened as I was moving into a new apartment, and now missed the appointment for connecting my internet. 😫

@ncw

ncw commented Nov 14, 2021

Copy link
Copy Markdown
Contributor Author

Thanks for taking a look and happy to make changes when you are up to it. No hurry though and look after yourself.

@puellanivis

Copy link
Copy Markdown
Collaborator

Oh, I did want to mention, I did similar work already in PR 443, it might help to review that PR, and compare to what you're doing here. That'll end up being one of the first things I do myself.

@ncw

ncw commented Nov 24, 2021

Copy link
Copy Markdown
Contributor Author

Any thoughts on this PR?

@puellanivis puellanivis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had recommended reviewing PR-443 and paralleling what was done there, as it would be the first thing I would do. Doing so helped me spot a bug in the existing code, that we can fix as part of this PR.

Also, a number of little “trivial” things could have been caught ahead of review that would nevertheless would make the code inconsistent between two highly similar tasks.

Comment thread client.go

func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) {
typ, data, err := f.c.sendPacket(ch, &sshFxpWritePacket{
func (f *File) startWriteChunkAt(ch chan result, b []byte, off int64) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function no longer provides any further functionality than just calling the f.c.dispatchRequest itself. Consider how in PR-443, I inlined the calls to dispatchRequest primarily because I also needed to get the ID used, in order to pass it into the later unmarshalStatus.

🤔 Actually, the existing code has a bug, that lead you astray here. See the later comment.

Comment thread client.go
return nil
}

func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure I see the benefits of reimplementing writeChunkAt in terms of the new functions defined. Almost all of this is still already handled by sendPacket.

And, the existing writeChunkAt does not return len(b) if err != nil. This new implementation would return len(b) regardless of if there were an error or not.

This is one of the main reasons I didn’t bother refactoring readChunkAt in PR-443. Refactoring comes with risks, and it’s easy to introduce bugs accidentally. I prefer better to give a new path, and when all references to the old path are gone, then you can remove it safely. Until then, who knows what functionality people are accidentally depending on…

Comment thread client.go
n, err := r.Read(b)
if n > 0 {
read += int64(n)
ch := make(chan result, 1)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use newResChanPool(concurrency) as shown in PR-443, to avoid unnecessarily allocation of new channels for every write, rather than reusing ones that have already been received on. (The chan result used by dispatchRequest is a “guaranteed one receive, and only one receive” design, which allows us pool them.)

Comment thread client.go
n, err := r.Read(b)
if n > 0 {
read += int64(n)
ch := make(chan result, 1)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ch is a bit too generic of a name. There a number of channels already in scope here, and using “ch” suggests that this is The Channel—the singular one available in scope. Since this is one of many relevant channels in scope, a more specific name is desirable. Since this is the channel we will be receiving our results on, using res is a more appropriate name, same as PR-443.

Comment thread client.go

switch typ {
case sshFxpStatus:
id, _ := unmarshalUint32(data)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow. There’s a bug in this existing code. (My bad.) This id value here has to be the same as the ID field in the packet above. This is why the id is passed in PR-443 through the work channel, so that it can be properly tested for.

This mostly works because we’re never supposed to get an id mismatch here, but it’s still possible, and we should be handling it correctly if it does. (Basically, everyone has just been lucky so far.)

I noticed this bug while wondering why this PR didn’t also include the id in the channel, like PR-443 did. Answer: you probably weren’t aware of how important it was, since all you had was this buggy code I wrote. 😆

Comment thread client.go

select {
case workCh <- work{b, n, off}:
case workCh <- work{b, n, off, ch}:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR-443 ordered the parameters here differently. I would like to see code parallel each other as much as possible for things like this. (i.e. having a “standard” order for these fields is good, even if the original choice is entirely arbitrary.)

Also, note as per the bug I discovered in my code, we need to be passing the id properly as well, the same as PR-443.

Finally, the field n is no unused, since there is no reference to the field after this assignment. Please remove it the same as we would any other unused variable assignment.

Comment thread client.go
b []byte
n int
off int64
ch chan result

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, ch here is too non-specific when there are more than one relevant channels in scope.

@puellanivis

Copy link
Copy Markdown
Collaborator

Functionality was merged in with #482

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants