-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Fix async deletion in Convert #11628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hi @apostasie. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
core/images/converter/converter.go
Outdated
| var res images.Image | ||
| if dstRef != srcRef { | ||
| _ = is.Delete(ctx, dstRef) | ||
| _ = is.Delete(ctx, dstRef, images.SynchronousDelete()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is dstRef getting deleted here at all?
If there is a race, then synchronous delete would not be the correct solution. GC can happen at any type and if the results of GC changes expected state, that means that references aren't being properly held. References should be kept using a lease or an image reference.
What are the resources getting removed? Why would ensuring those resources are deleted before taking a new reference help with a race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is dstRef getting deleted here at all?
I am not the author of that code.
I assume Create will fail if the image already exist?
If there is a race
There is clearly a race.
References should be kept using a lease or an image reference.
Convert gets a lease then returns an image ref that we hold on, right?
What are the resources getting removed?
The destination image is getting removed.
Why would ensuring those resources are deleted before taking a new reference help with a race condition?
Are you suggesting that async Delete will not do anything at all?
Anyhow, if you believe the existing code is fine, that's ok, I will just close.
LMK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a reproducer if you want to investigate this on your own
func main() {
const dockerContainerdaddress = "/run/user/501/containerd/containerd.sock"
ctx := context.Background()
srcRef := "docker.io/library/busybox:latest"
dstRef := "somethingelse"
client, err := containerd.New(dockerContainerdaddress, containerd.WithDefaultNamespace("default"))
if err != nil {
fmt.Println("client", err)
os.Exit(1)
}
platMC, _ := platformutil.NewMatchComparer(false, []string{runtime.GOOS + "/" + runtime.GOARCH})
var result *images.Image
result, err = converter.Convert(ctx, client, dstRef, srcRef, converter.WithPlatform(platMC))
if err != nil {
fmt.Println("convert", err)
os.Exit(1)
}
_, _, _, missing, err := images.Check(ctx, client.ContentStore(), result.Target, platMC)
if err != nil {
fmt.Println("check", err)
os.Exit(1)
}
fmt.Println("miss", missing)
}Add an extra sleep in converter.Convert to simulate pressure:
_ = is.Delete(ctx, dstRef)
+ time.Sleep(5 * time.Second)
res, err = is.Create(ctx, dstImg)Now:
nerdctl rmi -f $(nerdctl images -q)
nerdctl pull busybox
go run main.go
nerdctl images
go run main.go
nerdctl images
First go run:
miss []
REPOSITORY TAG IMAGE ID CREATED PLATFORM SIZE BLOB SIZE
somethingelse latest 901847821d9c Less than a second ago linux/arm64 4.174MB 1.846MB
busybox latest 37f7b378a29c 5 seconds ago linux/arm64 4.174MB 1.846MB
Second go run:
miss [{application/vnd.oci.image.index.v1+json sha256:901847821d9c85076405fcbd24c04c2a4860c542a8831a4b56ac66aea50a09d3 755 [] map[] [] <nil> }]
WARN[0000] content digest sha256:901847821d9c85076405fcbd24c04c2a4860c542a8831a4b56ac66aea50a09d3: not found
REPOSITORY TAG IMAGE ID CREATED PLATFORM SIZE BLOB SIZE
busybox latest 37f7b378a29c 10 seconds ago linux/arm64 4.174MB 1.846MB
My sync-delete patch actually is not helping for that specific reproducer, so, updated to just remove Delete entirely which seems to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether or not there is an underlying issue somewhere related to Delete is anyone's guess I suppose?
Anyhow, not particularly interested in looking into it.
I just need Convert to work.
Signed-off-by: apostasie <spam_blackhole@farcloser.world>
|
Hello, is there any progress on this PR? Can I do anything to help? Thank you |
Hard to tell. Underlying issue is still there in containerd - though we did ensure nerdctl no longer relies on this codepath to avoid it. |
nerdctl has struggled with a large number of occurrences of a racy issue, documented in: containerd/nerdctl#3513
more specifically in the context of a (simple) call to Convert:
containerd/nerdctl#3509
where an image we just created would sometimes disappear immediately.
It looks to me like this here is the root cause?
cc @AkihiroSuda and @lingdie