Skip to content

Conversation

@ambarve
Copy link
Contributor

@ambarve ambarve commented Dec 4, 2023

If a snapshot removal fails (during garbage collection), the entire garbage collection operation is cancelled. This is problematic because once cleanup of any snapshot fails no other snapshots will be cleaned and the disk usage will just keep increasing.
Solution is to return snapshot removal errors wrapped as "ErrFailedPrecondition" errors. The garbage collectors continues cleanup if the error is of this type.

If a snapshot removal fails (during garbage collection), the entire garbage collection operation is
cancelled. This is problematic because once cleanup of any snapshot fails no other snapshots will be cleaned
and the disk usage will just keep increasing.
Solution is to return snapshot removal errors wrapped as "ErrFailedPrecondition" errors. The garbage
collectors continues cleanup if the error is of this type.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
@k8s-ci-robot
Copy link

Hi @ambarve. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dmcgowan
Copy link
Member

dmcgowan commented Dec 5, 2023

It is worth considering implementing the Cleanup method. The garbage collection calls Cleanup at the end of GC and outside the GC lock, it is a good time to run disk cleanup and a failure can just be ignored and tried again the next time Cleanup is called. Besides the performance improvements by shortening the GC write lock, it provides guarantees that a name is reusable after GC (assuming the snapshots are able to be fully renamed on remove).

This change LGTM as a solution the problem you are seeing though.

@ambarve
Copy link
Contributor Author

ambarve commented Dec 5, 2023

@dmcgowan Thanks for the suggestion of implementing the Cleanup method. I think it will also help simplify the code between Windows & CimFS snapshotters.
I think we can merge this change for now and I will open a separate PR (once CimFS snapshotter is merged) to impelement Cleanup method in both Windows & CimFS snapshotters.

@samuelkarp
Copy link
Member

/ok-to-test

@dmcgowan dmcgowan added this pull request to the merge queue Dec 5, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 5, 2023
@samuelkarp samuelkarp added this pull request to the merge queue Dec 5, 2023
Merged via the queue into containerd:main with commit d55bfab Dec 5, 2023
@thaJeztah thaJeztah added the cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch label Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch ok-to-test platform/windows Windows size/XS snapshotters

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants