fix: close boltdb on metadata and mount plugin close#13348
Conversation
Co-authored-by: Rob Murray <rob.murray@docker.com> Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses delayed shutdown/teardown by ensuring BoltDB-backed state in the metadata DB and mount manager is explicitly closed during plugin/server shutdown, rather than relying on OS cleanup of lock files.
Changes:
- Add
(*metadata.DB).Close()to close the underlying BoltDB while coordinating with the metadata GC lock. - Update mount manager
Close()to also close its underlying BoltDB handle. - Update/add unit tests to validate DB closure via the manager/DB wrappers and assert
ErrDatabaseNotOpenon subsequent transactions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| core/mount/manager/manager.go | Extend mount manager shutdown to close the underlying BoltDB (needs coordination with GC lock). |
| core/mount/manager/manager_test.go | Refactor tests to avoid DB sharing across subtests; add coverage asserting BoltDB is closed via manager close. |
| core/metadata/db.go | Add DB.Close() that closes the underlying transactor if it is an io.Closer, coordinated via wlock. |
| core/metadata/db_test.go | Update cleanups to call DB.Close() and add a test asserting the underlying BoltDB is closed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dmcgowan
left a comment
There was a problem hiding this comment.
The GC case can be handled in a follow up as its not directly related. When there is a GC error, it can detect context cancellation and just exit. Its mostly cosmetic to avoid seeing a message about gc failure because database closed, that would likely be a rare race anyway.
|
/cherry-pick 2.3 |
|
@austinvazquez: cannot checkout DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick release/2.3 |
|
@austinvazquez: new pull request created: #13379 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Issue
When containerd shuts down its metadata or mount plugin never explicitly close the underlying bboltdb file deferring cleanup of the lock file to the underlying OS. The issue encountered is this causes variable delay for use cases where graceful teardown is desirable to reliably reset containerd.
Summary
This change adds closing of the metadata and mount boltdb to server shutdown.
Garbage collection consideration
The GC plugin currently does not implement Close, thus is skipped in server stop.
Closure of the metadata database should occur after acquiring the read-write lock to ensure no GC cycle is occurring concurrently. Transactions post-closure will result in
ErrDatabaseNotOpen. This should be fine as these resources can be scheduled for collection on the next containerd process invocation.Testing
Updated existing unit tests to close via plugin/manager rather than directly on the raw bolt handle.
Added new test cases to verify the underlying bolt Db is closed by asserting a new transaction returns ErrDatabaseNotOpen.