Fix: Introduce StagingLockfile to resolve overlay staging lock memory leak#2336
Conversation
mtrmac
left a comment
There was a problem hiding this comment.
Thanks!
The mechanism of the implementation looks good overall.
The primary purpose of doing this separate implementation was to ensure that the global stagingLockFile map does not grow without bounds. So far, I don’t see that this package reliably achieves that — it does seem to be true for the drivers/overlay caller, but there is no guarantee.
I’d expect
- explicit documentation on the constructors (
GetStagingLockFile/CreateAndLock/TryLockExisting?) for what the caller must do to release memory again. - ~All tests that test the expected ways to use the API (not
AssertPanics) to end with an assertion thatstagingLockFileis empty = we deallocated everything.- And I rather suspect that some of the implementation is leaking entries, at least on error paths - e.g. the retry path in
CreateAndLock.
- And I rather suspect that some of the implementation is leaking entries, at least on error paths - e.g. the retry path in
I think it’s very likely that the staging lock package doesn’t need all of the existing operations (no need for blocking Lock or plain Unlock, and maybe more such opportunities?); along with removal of the recursive read locking capability, that could allow simplifying things.
Note to self: I didn’t review the test coverage as a whole, checking whether it covers the primary use cases.
drivers/overlay/overlay.go
Outdated
| } | ||
|
|
||
| lock, err := lockfile.GetLockFile(filepath.Join(layerDir, stagingLockFile)) | ||
| lock, err := staging_lockfile.GetStagingLockFile(filepath.Join(layerDir, stagingLockFile)) |
There was a problem hiding this comment.
This has the create lock vs. cleanup race, doesn’t it? (Sure, it’s pre-existing.)
giuseppe
left a comment
There was a problem hiding this comment.
could you please squash patches that refactor code added as part of a previous patch in the PR?
That might have been my fault, I have asked for copies / moves of pre-existing code to be extra commits. There are various ways to structure this (maybe start with extracting |
As a guess, consider the following invariant
I think that might suffice to do everything we need, at the cost of 1 in-process mutex + file lock; but it’s very possible I have overlooked something. |
ok then no problem, having the move on its own commit could be useful for git bisect |
b15a9af to
ac2aba9
Compare
40a3d3e to
360f6a3
Compare
|
@mtrmac I have addressed your comments and added a test that tests |
mtrmac
left a comment
There was a problem hiding this comment.
Thanks, this looks great.
One important problem in CreateAndLock; and it might be more convenient to change the CreateAndLock API [but we can tune that later when adding new users]. Otherwise, mostly nits.
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
This commit refactors the StagingLockfile component: - Fix test, functions names. - Removed blocking Lock, plain Unlock and Read lock mechanism. - Updated comments to reflect the current logic and usage. Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
|
I fixed the last nits. |
mtrmac
left a comment
There was a problem hiding this comment.
LGTM, nice work.
(Hoping for another review due to how critical this is.)
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe, Honny1 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR introduces
StagingLockfile, a new internal lockfile designed to manage locking for temporary directories more effectively and to resolve a memory leak. The leak was caused by pruned lockfiles from overlay staging remaining in the global cache.Key methods provided by
StagingLockfile:CreateAndLock: Creates and locks a new unique file.TryLockPath: Attempts to lock an existing file. If the file does not exist, it will be created.UnlockAndDelete: Unlocks and deletes the file (fixes the memory leak).An internal
filelockcomponent has been introduced to encapsulate common primitives for lock file operations. This allowsStagingLockfileandLockfileto share core locking mechanisms, reducing code duplication.