fix(dockerFetcher): resolve deadlock issue in dockerFetcher open #12126

Merged
fuweid merged 2 commits into containerd:main from lujinda:fix_fetch_deadlock
Jul 19, 2025

@lujinda
Contributor

lujinda commented Jul 18, 2025

In some environments, new image pulls hang after several failed pull attempts. On investigation, we found that the containerd goroutine stack dump contained a large number of entries like this:

goroutine 12564 [select, 606 minutes]:
golang.org/x/sync/semaphore.(*Weighted).Acquire(0xc000442730, {0x55a867c9c9c8, 0xc001d18240}, 0x1)
        /go/src/github.com/containerd/containerd/vendor/golang.org/x/sync/semaphore/semaphore.go:74 +0x334
github.com/containerd/containerd/v2/core/remotes/docker.(*dockerBase).Acquire(...)
        /go/src/github.com/containerd/containerd/core/remotes/docker/resolver.go:493
github.com/containerd/containerd/v2/core/remotes/docker.dockerFetcher.open({0xc0008b94d0?}, {0x55a867c9c9c8, 0xc001d18240}, 0xc0008b94d0, {0xc000044700, 0x34}, 0x0, 0x1)
        /go/src/github.com/containerd/containerd/core/remotes/docker/fetcher.go:459 +0x317
github.com/containerd/containerd/v2/core/remotes/docker.dockerFetcher.Fetch.func1(0x0)
        /go/src/github.com/containerd/containerd/core/remotes/docker/fetcher.go:285 +0xb68
github.com/containerd/containerd/v2/core/remotes/docker.(*httpReadSeeker).reader(0xc0007a1340)
        /go/src/github.com/containerd/containerd/core/remotes/docker/httpreadseeker.go:156 +0xb5
github.com/containerd/containerd/v2/core/remotes/docker.(*httpReadSeeker).Read(0xc0007a1340, {0xc001f30000, 0x100000, 0x100000})
        /go/src/github.com/containerd/containerd/core/remotes/docker/httpreadseeker.go:52 +0x3f
io.ReadAtLeast({0x7f4fc81b4480, 0xc0007a1340}, {0xc001f30000, 0x100000, 0x100000}, 0x100000)
        /usr/local/go/src/io/io.go:335 +0x90
github.com/containerd/containerd/v2/core/content.copyWithBuffer({0x7f4fc81b44a0, 0xc00229b300}, {0x7f4fc81b4480, 0xc0007a1340})
        /go/src/github.com/containerd/containerd/core/content/helpers.go:317 +0x1a5
github.com/containerd/containerd/v2/core/content.Copy({0x55a867c9c9c8, 0xc002571bc0}, {0x55a867ca2c88, 0xc00229b300}, {0x7f4fc81b4480, 0xc0007a1340}, 0x6f9, {0xc002400a00, 0x47}, {0x0, ...})
        /go/src/github.com/containerd/containerd/core/content/helpers.go:194 +0x275
github.com/containerd/containerd/v2/core/remotes.Fetch({0x55a867c9c9c8, 0xc002571bc0}, {0x7f4fc81b2b30, 0xc0004323c0}, {0x55a867c8a2e0, 0xc0008150a0}, {{0xc000044700, 0x34}, {0xc002400a00, 0x47}, ...})
        /go/src/github.com/containerd/containerd/core/remotes/handlers.go:153 +0x899
github.com/containerd/containerd/v2/client.(*Client).fetch.FetchHandler.func8({0x55a867c9c9c8, 0xc002571b00}, {{0xc000044700, 0x34}, {0xc002400a00, 0x47}, 0x6f9, {0x0, 0x0, 0x0}, ...})
        /go/src/github.com/containerd/containerd/core/remotes/handlers.go:105 +0x2fa
github.com/containerd/containerd/v2/core/images.HandlerFunc.Handle(0xc0006d6918?, {0x55a867c9c9c8?, 0xc002571b00?}, {{0xc000044700, 0x34}, {0xc002400a00, 0x47}, 0x6f9, {0x0, 0x0, ...}, ...})
        /go/src/github.com/containerd/containerd/core/images/handlers.go:59 +0x63
github.com/containerd/containerd/v2/client.(*Client).fetch.Handlers.func9({0x55a867c9c9c8, 0xc002571b00}, {{0xc000044700, 0x34}, {0xc002400a00, 0x47}, 0x6f9, {0x0, 0x0, 0x0}, ...})
        /go/src/github.com/containerd/containerd/core/images/handlers.go:69 +0x13e
github.com/containerd/containerd/v2/core/images.HandlerFunc.Handle(0xc000bb93e0?, {0x55a867c9c9c8?, 0xc002571b00?}, {{0xc000044700, 0x34}, {0xc002400a00, 0x47}, 0x6f9, {0x0, 0x0, ...}, ...})
        /go/src/github.com/containerd/containerd/core/images/handlers.go:59 +0x63
github.com/containerd/containerd/v2/core/unpack.(*Unpacker).Unpack.func1({0x55a867c9ca00, 0xc001103180}, {{0xc000044700, 0x34}, {0xc002400a00, 0x47}, 0x6f9, {0x0, 0x0, 0x0}, ...})
        /go/src/github.com/containerd/containerd/core/unpack/unpacker.go:206 +0x48e
github.com/containerd/containerd/v2/core/images.HandlerFunc.Handle(0x55a8689a5c90?, {0x55a867c9ca00?, 0xc001103180?}, {{0xc000044700, 0x34}, {0xc002400a00, 0x47}, 0x6f9, {0x0, 0x0, ...}, ...})
        /go/src/github.com/containerd/containerd/core/images/handlers.go:59 +0x63
github.com/containerd/containerd/v2/core/images.Dispatch.func1()
        /go/src/github.com/containerd/containerd/core/images/handlers.go:166 +0xd6
golang.org/x/sync/errgroup.(*Group).add.func1()
        /go/src/github.com/containerd/containerd/vendor/golang.org/x/sync/errgroup/errgroup.go:130 +0x7e
created by golang.org/x/sync/errgroup.(*Group).add in goroutine 12518
        /go/src/github.com/containerd/containerd/vendor/golang.org/x/sync/errgroup/errgroup.go:98 +0x79

We also observed some error messages in the containerd logs:

rpc error: code = Canceled desc = failed to pull and unpack image "xxxxxx": failed to copy: httpReadSeeker: failed open: context canceled.

Through code analysis, we identified a potential deadlock risk in the open method of dockerFetcher.
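
To illustrate the kind of risk meant here, below is a minimal sketch, not the code from this PR. It assumes the general failure mode is a concurrency slot that is acquired but not released on an error path. The stack trace above blocks in `semaphore.(*Weighted).Acquire`; to keep the sketch standard-library only, a buffered channel stands in for that semaphore. `limiter`, `leakyOpen`, `safeOpen`, and `stuckAfterFailures` are hypothetical names, and the real `open` method is more involved (the slot is tied to the lifetime of the response body).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// limiter is a hypothetical stand-in for a weighted semaphore, built on a
// buffered channel so the sketch needs only the standard library.
type limiter chan struct{}

// Acquire takes one slot, or gives up when ctx is done.
func (l limiter) Acquire(ctx context.Context) error {
	select {
	case l <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Release returns one slot.
func (l limiter) Release() { <-l }

var errOpen = errors.New("open failed")

// leakyOpen takes a slot and forgets to return it when the request fails.
func leakyOpen(ctx context.Context, l limiter, fail bool) error {
	if err := l.Acquire(ctx); err != nil {
		return err
	}
	if fail {
		return errOpen // bug: the slot is never released on this path
	}
	l.Release()
	return nil
}

// safeOpen releases the slot on every path once Acquire has succeeded.
func safeOpen(ctx context.Context, l limiter, fail bool) error {
	if err := l.Acquire(ctx); err != nil {
		return err
	}
	defer l.Release()
	if fail {
		return errOpen
	}
	return nil
}

// stuckAfterFailures runs some failing opens, then reports whether one more
// open fails to get a slot before a short timeout.
func stuckAfterFailures(open func(context.Context, limiter, bool) error, slots, failures int) bool {
	l := make(limiter, slots)
	for i := 0; i < failures; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
		_ = open(ctx, l, true)
		cancel()
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return open(ctx, l, false) != nil
}

func main() {
	fmt.Println("leaky stuck:", stuckAfterFailures(leakyOpen, 2, 2))
	fmt.Println("safe stuck:", stuckAfterFailures(safeOpen, 2, 2))
}
```

In the leaky variant, every failed attempt permanently consumes a slot, so once the number of failures reaches the limit, later callers wait in `Acquire` until their context is canceled. That matches the symptom described above: pulls start hanging only after several failures.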

Thx

Signed-off-by: jinda.ljd <jinda.ljd@alibaba-inc.com>
@k8s-ci-robot

Hi @lujinda. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Jul 18, 2025
@dmcgowan dmcgowan added this pull request to the merge queue Jul 18, 2025
@dmcgowan dmcgowan removed this pull request from the merge queue due to a manual request Jul 18, 2025
@dmcgowan
Member

I'm going to push one more condition to check here

Signed-off-by: Derek McGowan <derek@mcg.dev>
@dmcgowan dmcgowan requested a review from fuweid July 18, 2025 22:31
@dmcgowan dmcgowan added the cherry-pick/2.1.x (Change to be cherry picked to release/2.1 branch) label Jul 18, 2025
@lujinda
Contributor Author

lujinda commented Jul 19, 2025

I'm going to push one more condition to check here

Thank you for the push. It appears that the Release call was overlooked when closing the body.
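
One common way to tie a release to closing a body is to wrap the body so that `Close` also gives the slot back, exactly once. The following is a minimal sketch under that assumption, not the change that was pushed here. `releasingBody` and `closeTwice` are hypothetical names, and a buffered channel stands in for the semaphore.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// releasingBody is a hypothetical wrapper: closing it closes the wrapped
// body and then runs release exactly once, even if Close is called again.
type releasingBody struct {
	io.ReadCloser
	once    sync.Once
	release func()
}

// Close closes the underlying body and releases the slot on the first call.
func (b *releasingBody) Close() error {
	err := b.ReadCloser.Close()
	b.once.Do(b.release)
	return err
}

// closeTwice reads a body that holds one slot, closes it twice, and returns
// the data read plus the number of slots still held afterwards.
func closeTwice() (string, int) {
	slots := make(chan struct{}, 1)
	slots <- struct{}{} // acquire the only slot
	body := &releasingBody{
		ReadCloser: io.NopCloser(strings.NewReader("layer bytes")),
		release:    func() { <-slots },
	}
	data, _ := io.ReadAll(body)
	body.Close()
	body.Close() // a second Close must not release (or block) again
	return string(data), len(slots)
}

func main() {
	data, held := closeTwice()
	fmt.Printf("read %q, slots still held: %d\n", data, held)
}
```

The `sync.Once` guard matters because callers may close a body more than once; without it, a second release could block or free a slot that belongs to another request.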

@fuweid fuweid added this pull request to the merge queue Jul 19, 2025
Merged via the queue into containerd:main with commit 8f3a6ed Jul 19, 2025
83 of 85 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Jul 19, 2025
@fuweid
Member

fuweid commented Jul 19, 2025

/cherry-pick release/2.1

@k8s-infra-cherrypick-robot

@fuweid: new pull request created: #12127

@brandond
Contributor

Does this also affect 2.0?

@austinvazquez austinvazquez added the cherry-picked/2.1.x (PR commits are cherry picked into the release/2.1 branch) label and removed the cherry-pick/2.1.x (Change to be cherry picked to release/2.1 branch) label Jul 21, 2025
@austinvazquez
Member

Does this also affect 2.0?

@brandond, no, I do not believe so. The feature that introduced the deadlock (multipart layer fetch) was released in 2.1.

Labels

area/distribution (Image Distribution), cherry-picked/2.1.x (PR commits are cherry picked into the release/2.1 branch), kind/bug, needs-ok-to-test, size/M
