
Fix compressed read retries in compressed blobs.#273

Merged
rubensf merged 1 commit into master from downcompfix on Feb 9, 2021

Conversation

@rubensf (Contributor) commented Feb 4, 2021

The current configuration will try to use the offset of the compressed
blob as the new offset for retrying reads. However, the compressed RE
API wants the offset to refer to the uncompressed blob. Since the
compressed offset will usually be a lower number than the uncompressed
offset, we generally end up with a bigger blob than expected.

Since these Read interruptions are mostly transient (e.g. context
deadline exceeded), our higher-level retries have been mostly hiding
the issue.

Also took the chance to beef up some comments around the tricky
threading in the compression code.

I'll eventually add retry tests for compressed blob writes too :)

@rubensf rubensf requested a review from ola-rozenfeld February 4, 2021 21:59
@google-cla bot added the "cla: yes" (The author signed a CLA) label on Feb 4, 2021
@rubensf rubensf force-pushed the downcompfix branch 2 times, most recently from 5f84729 to 03972bb on February 4, 2021 22:05
@rubensf (Contributor Author) commented Feb 4, 2021

In the process, I've removed the retrier based on a wrong digest, since that was most likely the cause of our flakes...

Wdyt?

// have to go somewhere or they'll block execution.
io.Copy(ioutil.Discard, r)
}
w.Close()
Contributor:
Where is this Close happening now?

Btw, I forget: did we add the Windows tests to presubmit? I'll patch this commit in and test it on Windows just in case. Also, please run a test with features=race, just in case.

Contributor Author:
Under if d.Size == sz { in the new file... if err := wt.Close().

And I thought the CI was doing features=race already, sighs. Caught a bug, thanks for reminding me!

And I don't think we have Windows CI.

t.Errorf("client.ReadBlob(ctx, digest) gave diff (-want, +got):\n%s", diff)
}
})
for _, comp := range []bool{false, true} {
Contributor:
How much longer did it make the test? If it added a few seconds, then suggestion: make compression retry test a separate Test function from this one that includes the sleeping, because these two are really orthogonal. Ditto for files.

Contributor Author:
The non-race tests still take < 10s (less than 1s of noticeable increase).

I'm not sure they're orthogonal - I'd argue they should be the same test: as much as possible, comp vs noncomp should behave the same way.

Contributor:
Sure, if it doesn't make it noticeably longer, this is fine!

if limit > 0 && limit < sz {
sz = limit
}
if err = w.Reset(); err != nil {
Contributor:
Wait, if you moved this Reset outside of the closure, that tells me you no longer need it to be resettable. And you're removing the functionality of retrying on failed digest verification, too.
So, suggestion: base this PR on a revert of my #271. Should be much simpler, imo.

Contributor Author:
I mentioned in a comment that I'm getting rid of the retries - I think they were useful only because of this. I was wondering wdyt?

Maybe it'd be much simpler to rebase this on a revert, but I've already gone down this path :p

Contributor:
We'll have to clean this up in the fixit next week, then. There's a lot of stuff here that no longer needs to exist and will just be confusing. We no longer need the whole ResettableWriter struct and interface, and the refactoring I did that made everything more complicated...
But I'm okay with leaving it as a cleanup task for the next CL.


@rubensf rubensf merged commit 55dae3d into master Feb 9, 2021
@rubensf rubensf deleted the downcompfix branch February 9, 2021 19:59
pl4nty pushed a commit to pl4nty/remote-apis-sdks that referenced this pull request Apr 18, 2025
Labels: cla: yes (The author signed a CLA)