libimage: fix manifest race during listing#2400
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
mtrmac
left a comment
There was a problem hiding this comment.
ACK to the general idea (we don’t hold the storage lock long enough to get a consistent snapshot, and if a manifest list goes away between MultiList and querying the “dangling” property, it’s reasonable to consider the components dangling).
I’m worried about silently returning invalid data on I/O errors, especially when that can lead to removing images, though. Can this be restricted to only silently ignore the two relevant errors?
| if err != nil { | ||
| return nil, err | ||
| } | ||
| // ignore errors, common errors are |
There was a problem hiding this comment.
ctx is unused as of now, lint test is failing.
I wasn't sure if these would be the only ones it can encounter and I wanted to be on the safer side on |
… ugh, this can theoretically violate causality:
Is that possible? Worth worrying about? Right now we don’t have the infrastructure to do anything else… I guess we would want something like |
if we talk about operations like So overall I don't think we need to worry to about this case to much here. All I care about is to make the commands at least consistent so that they at least don't randomly fail which is far worse IMO. |
|
WFM. I have little sympathy for a user adding a reference to an untagged image, while running |
|
LGTM |
mtrmac
left a comment
There was a problem hiding this comment.
LGTM, feel free to merge as is.
I saw a flake in parallel podman testing, podman images can fail if the manifest was removed at the right time. In general listing should never be able to fail when another image or manifest is removed in parallel. Change the logic to convert to manifest and only collect the digests in the success case and ignore all other errors to make the listing more robust. I observed the following error from podman images: Error: locating image "xxx" for loading instance list: locating image with ID "xxx": image not known Signed-off-by: Paul Holzinger <pholzing@redhat.com>
All other errors are returned wrapped with the image ID so do the same when the manifest blobl decoding fails. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
|
created a podman test PR just to see if this breaks anything podman-container-tools/podman#25840 |
|
Podman PR looks good, PTAL again |
|
/lgtm |
|
Thanks! |
I saw a flake in parallel podman testing, podman images can fail if the manifest was removed at the right time. In general listing should never be able to fail when another image or manifest is removed in parallel.
Change the logic to convert to manifest and only collect the digests in the success case and ignore all other errors to make the listing more robust.
I observed the following error from podman images: Error: locating image "xxx" for loading instance list: locating image with ID "xxx": image not known