Fix potential double-locking of RWMutex in device manager #136660

Open
gzb1128 wants to merge 2 commits into kubernetes:master from gzb1128:fix-device-manager-double-locking
@gzb1128

gzb1128 commented Jan 31, 2026

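The podDevices() function called containerDevices() while holding
the read lock, but containerDevices() also acquires the same read
lock. Go's sync.RWMutex does not support recursive read locking:
once a writer is waiting, new RLock() calls block until that writer
has acquired and released the lock. If a writer calls Lock() between
the two RLock() calls, the second RLock() waits for the writer while
the writer waits for the first read lock to be released, which is a
deadlock.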
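
The hazard described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: the function name nestedReadWouldBlock is hypothetical, and it uses TryRLock() (available since Go 1.18) as a non-blocking probe so the example itself does not hang. It holds an outer read lock, queues a writer, and then checks whether a second read lock could still be taken.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nestedReadWouldBlock reports whether a second read lock on mu would block
// while the caller already holds a read lock and a writer is queued.
// The name is hypothetical; it is not part of the kubelet code.
func nestedReadWouldBlock() bool {
	var mu sync.RWMutex

	mu.RLock() // outer read lock, standing in for the one taken by podDevices()

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // the writer queues behind the outer reader
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer is queued. From that point TryRLock fails,
	// which is the situation in which a blocking RLock would never return,
	// because the writer is waiting for the outer read lock held here.
	blocked := false
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if mu.TryRLock() {
			mu.RUnlock()
			time.Sleep(time.Millisecond)
			continue
		}
		blocked = true
		break
	}

	mu.RUnlock() // release the outer lock so the writer can finish
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("nested RLock would block:", nestedReadWouldBlock())
}
```

With a blocking RLock() in place of the probe, the goroutine would wait on the writer and the writer would wait on the goroutine, so neither could make progress.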

Introduce containerDevicesLocked() that expects the caller to
already hold the lock, and use it from podDevices() to avoid
nested locking.
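
The shape of the fix can be sketched as follows. This is an illustration under assumptions, not the PR's diff: deviceStore, its field layout, and newExampleStore are hypothetical stand-ins for the device manager's types, and only the method names podDevices(), containerDevices(), and containerDevicesLocked() come from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// deviceSet and deviceStore are simplified, hypothetical stand-ins for the
// kubelet device manager types; the field layout is an assumption.
type deviceSet map[string]struct{}

type deviceStore struct {
	mu   sync.RWMutex
	devs map[string]map[string]deviceSet // pod UID -> container name -> device IDs
}

// containerDevices is the entry point for callers that do not hold the lock.
func (s *deviceStore) containerDevices(podUID, contName string) deviceSet {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.containerDevicesLocked(podUID, contName)
}

// containerDevicesLocked expects the caller to already hold s.mu.
// It never touches the mutex, so it is safe to call under an RLock.
func (s *deviceStore) containerDevicesLocked(podUID, contName string) deviceSet {
	out := deviceSet{}
	for id := range s.devs[podUID][contName] {
		out[id] = struct{}{}
	}
	return out
}

// podDevices aggregates devices across containers under a single read lock,
// calling the Locked variant instead of re-acquiring the lock per container.
func (s *deviceStore) podDevices(podUID string) deviceSet {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := deviceSet{}
	for contName := range s.devs[podUID] {
		for id := range s.containerDevicesLocked(podUID, contName) {
			out[id] = struct{}{}
		}
	}
	return out
}

// newExampleStore builds a small fixture: one pod with two containers.
func newExampleStore() *deviceStore {
	return &deviceStore{devs: map[string]map[string]deviceSet{
		"pod-1": {
			"ctr-a": {"dev-0": {}, "dev-1": {}},
			"ctr-b": {"dev-2": {}},
		},
	}}
}

func main() {
	s := newExampleStore()
	fmt.Println(len(s.podDevices("pod-1"))) // prints 3
}
```

Keeping the public method as a thin wrapper around the Locked variant means the lookup logic lives in one place, and the "Locked" suffix documents the locking precondition at every call site.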

Related to #127826

Release note: NONE

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 31, 2026
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Hi @gzb1128. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 31, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gzb1128
Once this PR has been reviewed and has the lgtm label, please assign ffromani for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jan 31, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 31, 2026
@gzb1128
Author

gzb1128 commented Jan 31, 2026

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jan 31, 2026
@gzb1128
Author

gzb1128 commented Jan 31, 2026

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 31, 2026
The podDevices() function was calling containerDevices() while
holding the read lock, but containerDevices() also attempts to
acquire the same lock. This could cause a deadlock when a writer
tries to acquire the lock between the two RLock() calls.

Introduce containerDevicesLocked() that expects the caller to
already hold the lock, and use it from podDevices() to avoid
nested locking.

Signed-off-by: gzb1128 <591605936@qq.com>
@gzb1128 gzb1128 force-pushed the fix-device-manager-double-locking branch from ca5149f to e43bd9c Compare January 31, 2026 07:33
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 31, 2026
@gzb1128
Author

gzb1128 commented Jan 31, 2026

/kind bug

@gzb1128
Author

gzb1128 commented Jan 31, 2026

@ffromani from the original PR #136235. We noticed the original PR has been stagnant for 2 weeks, so we created this new PR to keep the fix moving forward. Would appreciate your review.

Add tests to verify the correctness of the RWMutex double-locking fix:

- TestPodDevices: Verify multi-container device aggregation
- TestContainerDevices: Verify basic functionality
- TestPodDevicesConcurrentAccess: Verify concurrent read-write safety

The concurrent test uses 10 readers and 5 writers competing for the
lock to ensure no deadlock occurs under the fixed implementation.

Run with -race flag to detect data races: go test -race ./...
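
A concurrent check along the lines described in this commit message could look like the sketch below. It is not the PR's test code: the store type, finishesWithoutDeadlock, the iteration count, and the 10-second timeout are all assumptions made for illustration. Only the 10-reader, 5-writer shape is taken from the text above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// store is a hypothetical, simplified stand-in for the device manager state.
type store struct {
	mu   sync.RWMutex
	devs map[string][]string // container name -> device IDs
}

// containerDevicesLocked expects the caller to hold s.mu.
func (s *store) containerDevicesLocked(name string) []string { return s.devs[name] }

// podDevices counts devices across containers under one read lock.
func (s *store) podDevices() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	n := 0
	for name := range s.devs {
		n += len(s.containerDevicesLocked(name))
	}
	return n
}

// insert adds a device under the write lock.
func (s *store) insert(name, id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.devs[name] = append(s.devs[name], id)
}

// finishesWithoutDeadlock runs readers and writers concurrently and reports
// whether all of them completed before a fixed timeout (an assumed value).
func finishesWithoutDeadlock(readers, writers, iters int) bool {
	s := &store{devs: map[string][]string{}}
	var wg sync.WaitGroup
	for r := 0; r < readers; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				s.podDevices()
			}
		}()
	}
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				s.insert(fmt.Sprintf("ctr-%d", w), fmt.Sprintf("dev-%d", i))
			}
		}(w)
	}
	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done:
		return true
	case <-time.After(10 * time.Second):
		return false // treated as a likely deadlock
	}
}

func main() {
	fmt.Println("completed:", finishesWithoutDeadlock(10, 5, 200))
}
```

A timeout-based check like this can only show that no deadlock occurred in a given run; it does not prove that one is impossible, which is why running it repeatedly and with -race is still useful.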
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 31, 2026
@ffromani
Contributor

> @ffromani from the original PR #136235. We noticed the original PR has been stagnant for 2 weeks, so we created this new PR to keep the fix moving forward. Would appreciate your review.

Thanks, but 2 weeks is pretty normal (trending toward the lower end) for high-churn open-source projects. Let's see how we can move forward.

@gzb1128
Author

gzb1128 commented Feb 12, 2026

@ffromani

Thank you for the response. You're right - 2 weeks is indeed normal for such a large project, and I apologize for being impatient. I should have waited for the original PR #136235 to progress.

The code and tests are ready in this PR. However, I fully respect your decision as the reviewer and the original PR author's work. If you prefer to continue with the original PR, I'm happy to close this one. Please let me know how you'd like to proceed.

Labels

area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

3 participants