Skip to content

Fix flightcontrol test and usage bugs#2273

Merged
tonistiigi merged 4 commits intomoby:masterfrom
sipsma:fix-2270
Aug 16, 2021
Merged

Fix flightcontrol test and usage bugs#2273
tonistiigi merged 4 commits intomoby:masterfrom
sipsma:fix-2270

Conversation

@sipsma
Copy link
Copy Markdown
Collaborator

@sipsma sipsma commented Jul 21, 2021

Fixes #2270

@tonistiigi The possible bug involving the executor host's file turned out to not be an issue because in that code branch the hostname var was in practice always set to defaultHostname. So this primarily fixes the bug in computeBlobchain and a few other smaller possible ones.

@sipsma sipsma requested a review from tonistiigi July 21, 2021 23:57
@sipsma sipsma force-pushed the fix-2270 branch 2 times, most recently from febcbb7 to ab62668 Compare July 22, 2021 00:35
Comment thread solver/jobs.go
return nil, err
}
res, err := s.g.Do(ctx, "cachemap", func(ctx context.Context) (ret interface{}, retErr error) {
res, err := s.g.Do(ctx, fmt.Sprintf("cachemap-%d", index), func(ctx context.Context) (ret interface{}, retErr error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks pretty bad. I wonder if it might have caused some cache misses. We should pick this commit to v0.9/

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks like it's existed since v0.3, so I guess it's managed to exist for a while without creating too many noticeable issues. Let me know if you want me to open the backport PR though, should it only be to v0.9 or also any versions before that?

Comment thread cache/blobs.go Outdated
if err := ensureCompression(ctx, sr, desc, compressionType, s); err != nil {
return nil, err
}
refInfo := sr.Info()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look quite equivalent. Previously if differ was in progress, !createIfNeeded would still wait for completion. And !forceCompression should not call differ twice, even in mediatype is different.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And !forceCompression should not call differ twice, even in mediatype is different

Fixed

Previously if differ was in progress, !createIfNeeded would still wait for completion

Agree that used to be the behavior, but was it on purpose? It seems strange to me for the behavior of createIfNeeded to be dependent on the value of createIfNeeded in other builds that happen to be running at the same time. If a user sets createIfNeeded to false, I feel like it makes the most sense to just never block on the differ running, especially since it's related to timing and thus never a deterministic thing anyways.

Let me know if I'm missing something, I can make it behave in the old way but it will convolute the code a bit more.

@sipsma sipsma force-pushed the fix-2270 branch 4 times, most recently from d913dde to 22468dc Compare July 27, 2021 02:18
@sipsma
Copy link
Copy Markdown
Collaborator Author

sipsma commented Jul 27, 2021

I added a unit test for GetRemote to the commit w/ changes to computeBlobChain (apparently didn't have coverage before).

After pushing the PR I see it keeps failing in CI only on Windows. It doesn't appear related to my change at all, it seems to just be because it's hitting a codepath previously never hit (it gets "invalid windows mount type: 'bind'" when trying to extract a blob to snapshotter).

So I updated it to just skip that test on Windows. Let me know if that's alright for now.

sipsma added 4 commits August 16, 2021 17:26
Previously, the flightcontrol group was being given a key just set to
the ref's ID, which meant that concurrent calls using different values
of compressionType, createIfNeeded and forceCompression would
incorrectly be de-duplicated.

The change here splits up the flightcontrol group into a few separate
calls and ensures that all the correct input variables are put into the
flightcontrol keys.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
The test was making an assertion that is no longer expected to always be
true after moby#2195, which purposely made flightcontrol less deterministic.
This lead to occasional failures.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@sipsma
Copy link
Copy Markdown
Collaborator Author

sipsma commented Aug 16, 2021

@tonistiigi let me know what you think of the updates

Copy link
Copy Markdown
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify this also works well with #2246 , whichever gets merged first.

Comment thread cache/blobs.go
@sipsma
Copy link
Copy Markdown
Collaborator Author

sipsma commented Aug 16, 2021

Please verify this also works well with #2246 , whichever gets merged first.

Yep, at first glance it looks conceptually compatible even if the merge conflicts will be slightly ugly. I also left a note on that other PR to update the new TestGetRemote added here in case this is merged first, which should help catch any regressions.

@sipsma sipsma requested a review from tonistiigi August 16, 2021 22:29
@tonistiigi tonistiigi merged commit 124126e into moby:master Aug 16, 2021
@tonistiigi
Copy link
Copy Markdown
Member

I think for v0.9 let's take only the cachemap (and maybe pull) fix.

@sipsma sipsma mentioned this pull request Oct 1, 2021
@crazy-max crazy-max added this to the v0.10.0 milestone Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race condition in flightcontrol causing occasional test failure

3 participants