
Fix compressed read retries in compressed blobs.#273

Merged
rubensf merged 1 commit into master from downcompfix on Feb 9, 2021

Conversation

@rubensf (Contributor) commented Feb 4, 2021

The current configuration will try to use the offset of the compressed
blob as the new offset for retrying reads. However, the compressed RE
API wants the offset to refer to the uncompressed blob. Since the
compressed offset will usually be a lower number than the uncompressed
offset, we generally end up with a bigger blob than expected.

Since these Read interruptions are mostly transient (e.g. context
deadline exceeded), our higher-level retries have been mostly hiding
the issue.

Also took the chance to beef up some comments around the tricky
threading in the compression code.

I'll eventually add retry tests for compressed blob writes too :)

@rubensf rubensf requested a review from ola-rozenfeld February 4, 2021 21:59
@google-cla bot added the "cla: yes" (The author signed a CLA) label on Feb 4, 2021
@rubensf rubensf force-pushed the downcompfix branch 2 times, most recently from 5f84729 to 03972bb on February 4, 2021 22:05
@rubensf (Contributor Author) commented Feb 4, 2021

In the process, I've removed the retrier based on a wrong digest, since that was most likely the cause of our flakes...

Wdyt?

// have to go somewhere or they'll block execution.
io.Copy(ioutil.Discard, r)
}
w.Close()
Contributor:
Where is this Close happening now?

Btw, I forget: did we add the Windows tests to presubmit? I'll patch this commit in and test it on Windows just in case. Also, please run a test with features=race, just in case.

Contributor Author:
Under if d.Size == sz { in the new file... if err := wt.Close().

And I thought the CI was doing features=race already, sighs. Caught a bug, thanks for reminding me!

And I don't think we have Windows CI.

t.Errorf("client.ReadBlob(ctx, digest) gave diff (-want, +got):\n%s", diff)
}
})
for _, comp := range []bool{false, true} {
Contributor:
How much longer did it make the test? If it added a few seconds, then suggestion: make compression retry test a separate Test function from this one that includes the sleeping, because these two are really orthogonal. Ditto for files.

Contributor Author:
The non-race tests still take < 10s (less than 1s of noticeable increase).

I'm not sure they're orthogonal - I'd argue they should be the same test: as much as possible, comp vs noncomp should behave the same way.

Contributor:
Sure, if it doesn't make it noticeably longer, this is fine!

if limit > 0 && limit < sz {
sz = limit
}
if err = w.Reset(); err != nil {
Contributor:
Wait, if you moved this Reset outside of the closure, that tells me you no longer need it to be resettable. And you're removing the functionality of retrying on failed digest verification, too.
So, suggestion: base this PR on a revert of my #271. Should be much simpler, imo.

Contributor Author:
I mentioned in a comment that I'm getting rid of the retries - I think they were useful only because of this. I was wondering wdyt?

Maybe it'd be much simpler to rebase this on a revert, but I've already gone down this path :p

Contributor:
We'll have to clean this up in the fixit next week, then. There's a lot of stuff here that no longer needs to exist and will just be confusing. We no longer need the whole ResettableWriter struct and interface, and the refactoring I did that made everything more complicated...
But I'm okay with leaving it as a cleanup task for the next CL.


@rubensf rubensf merged commit 55dae3d into master Feb 9, 2021
@rubensf rubensf deleted the downcompfix branch February 9, 2021 19:59
pl4nty pushed a commit to pl4nty/remote-apis-sdks that referenced this pull request Apr 18, 2025
Labels: cla: yes (The author signed a CLA)