store: simplify imagestore implementation#1784
store: simplify imagestore implementation#1784openshift-merge-bot[bot] merged 4 commits intocontainers:mainfrom
Conversation
f7ad9de to
a5de3f7
Compare
|
Can you run a test against podman and Buildah as well? |
f6e38aa to
5385212
Compare
bc35798 to
814fddc
Compare
|
this is ready for review. Both Podman and Buildah pass Podman test PR: containers/podman#21070 |
|
LGTM |
mtrmac
left a comment
There was a problem hiding this comment.
There are parts of this, like removing all the d.imageStore conditions throughout overlay, which I very much like.
On the general principle, I suppose that’s pre-existing, but I just can’t see how this works. All of the read-only stores need to be RO locked, don’t they? And locks are handled in the generic code, drivers can’t do that for themselves. So any kind of iteration over stores in dir2 seems completely broken to me.
So… what’s really going on? Is dir2 only to handle cross-store parent layers during an overlay mount, or something? If that were so, the driver code should not be touching the other stores during any other operation; and still that cross-store parent situation would need to be somehow explicitly tracked and locked.
I’m afraid I can’t do a full review now. Maybe there is some way to split this into much smaller PRs?
drivers/overlay/overlay.go
Outdated
| if d.imageStore != "" { | ||
| _ = os.RemoveAll(filepath.Join(d.imageStore, stagingDir)) | ||
| } | ||
| _ = os.RemoveAll(filepath.Join(d.home, stagingDir)) |
There was a problem hiding this comment.
How does this work? stagingDir is here a per-driver directory, but getStagingDir creates per-layer-ID staging directories.
Similarly the EDIT check use of use in ListLayers seems inappropriate now — or quite possibly it needs to be retained for compatibility with older writers, but then it should be explicitly documented as such a non-obvious case.
Stylitically, I’d prefer:
- Define a separate
globalStagingDir/perLayerStagingDirconstants; don’t use a single string for two different purposes! - As I always want, document top-level symbols.
store.go
Outdated
| stores := driver.AdditionalImageStores() | ||
| if s.imageStoreDir != "" { | ||
| stores = append([]string{s.graphRoot}, stores...) | ||
| } |
There was a problem hiding this comment.
I would greatly appreciate consolidating this data further:
- Only ever call
AdditionalImageStores()once inload - Only ever have one
if s.imageStoreDir != ""condition, somewhere around there. - Together, populate something like
s.rwImageLayerStoreDirs/s.roImageLayerStoreDirs. And then have all other graph iterations be trivial loops.
| func (s *store) createGraphDriverLocked() (drivers.Driver, error) { | ||
| driverRoot := s.imageStoreDir | ||
| imageStoreBase := s.graphRoot | ||
| if driverRoot == "" { | ||
| driverRoot = s.graphRoot | ||
| imageStoreBase = "" | ||
| } | ||
| config := drivers.Options{ | ||
| Root: driverRoot, | ||
| ImageStore: imageStoreBase, | ||
| Root: s.graphRoot, | ||
| ImageStore: s.imageStoreDir, | ||
| RunRoot: s.runRoot, |
There was a problem hiding this comment.
AFAICS this effectively reverses the semantics of the two fields. While I 100% support the resulting simpler code and semantics, I’m not smart enough to review such changes in a GitHub PR without ~re-doing the work.
One possible approach:
- Define completely new fields with the new semantics. Set them in the first commit.
- In the first and/or subsequent commits, migrate the users to the new fields.
- Now that the old fields have no readers, possibly rename the fields back to the original names, with the new semantics,
store.go
Outdated
|
|
||
| ro := store != s.imageStoreDir && store != s.graphRoot | ||
|
|
||
| rls, err := s.newLayerStore(rlpath, glpath, s.graphDriver, false, ro) |
There was a problem hiding this comment.
Why is this introducing the “second read-write layer store” concept? Yes, it mirrors the “second read-write image store” thing (which I found dubious in #1578), but AFAICS the Delete code which uses the “second RW image store” never deletes layers from the second RW layer store.
So either that should be added as well, or (hopefully) we don’t need this complexity.
There was a problem hiding this comment.
this is needed because we have this weird specification that a layer must be deleted from the current store if it is found there. We even have a test for it :/
https://github.com/containers/storage/blob/main/tests/split-store.bats#L96-L101
I am fine if we decide to break this specification, but let's do it separately after we verify with CRI-O folks that it is not necessary.
There was a problem hiding this comment.
Fine… but I can’t see how that works. layersToRemove is only applied to one rlstore. That uses getLayerStoreLocked, and AFAICS never gets to write to this special member of roLayerStoresUseGetters.
I’m probably being stupid.
| newpath := filepath.Join(homedir, d.String(), "dir", filepath.Base(id)) | ||
| if _, err := os.Stat(newpath); err != nil { | ||
| additionalHomes := d.homes[1:] | ||
| if d.imageStore != "" { |
There was a problem hiding this comment.
Can’t this condition be determined at driver initialization time?
Every single consumer of d.homes is either indexing [0] or [:1], suggesting that the data can be recorded in some more convenient way.
drivers/vfs/driver.go
Outdated
| homedir = d.homes[0] | ||
| } | ||
| newpath := filepath.Join(homedir, d.String(), "dir", filepath.Base(id)) | ||
| if _, err := os.Stat(newpath); err != nil { |
There was a problem hiding this comment.
There was a problem hiding this comment.
this is still outstanding.
I don't see how this is different than the previous version, what locking do we need here?
|
|
||
| newpath := path.Join(homedir, id) | ||
|
|
||
| if _, err := os.Stat(newpath); err != nil { |
There was a problem hiding this comment.
There was a problem hiding this comment.
this is still outstanding.
I don't see how this is different than the previous version, what locking do we need here?
There was a problem hiding this comment.
Yes, see the top-level comment #1784 (review) .
One immediate thing that springs to mind is deleting the parent layer while the child is being created/mounted/unmounted. I don’t know that we can fully defend against those kinds of things (we don’t hold layers locked for the whole time a child is mounted, and mount reference counts are, I think, basically per-machine and not visible over network shares).
Or maybe a layer with the same ID (pulled layers have deterministic IDs) being simultaneously created in both stores, making the path returned by d.dir[2]() change over time — e.g. for getStagingDir as an example.
There might well be more, or maybe these are handled by some mechanism I don’t know about.
Assuming the concern is real:
- I’d like to see the scope of the problem restricted to the unavoidable minimum (whatever that is); in all situations where the generic code is iterating over
getROLayerStoresLockedto find the right layer, and handling locking, the driver should not be iterating over the secondary layer stores again / redundantly without locking. Structure the code, or document it, in a way that is likely to be maintained correctly over time. - In the situations where the problem remains outstanding afterwards and is not fixable short-term, we should loudly document the problem, the situations where it arises, and the constraints, for future maintainers (compare
LOCKING BUGinlayers.go) - Alternatively, if this is never going to be fixed, and users need to use the feature appropriately (“the image store must never be modified while it is used with outer stores” ?), clearly document that in end-user documentation, and maybe point at it in the code to acknowledge and justify the unsynchronized accesses.
- (As another option, a few years ago overlay used to do its own per-layer locking. I have removed that because I thought it’s redundant; actually external users can get a pointer to the driver object, so it is not 100% redundant, and doing that locking here might also be an option. I would prefer not to add that cost to every operation.)
I could see an argument that this is out of scope of this PR … I’m thinking that with this touching dir2 and dir, while the feature is still comparatively new, now is the best chance for fixing this we’ll ever have. Admittedly the timing is not super convenient now, but it will probably never be ideal.
There was a problem hiding this comment.
the assumption with additional image stores is that images/layers are not removed. There are cases where we don't grab any lock, e.g. a read-only file system, so that is already the current state.
When we use a read-only layer, the layer could disappear also while a container is running, and there is nothing that prevents the removal now:
# mkdir -p /tmp/store/{rw,ro,empty}
# podman --root /tmp/store/rw pull busybox
# mount --bind --read-only /tmp/store/rw /tmp/store/ro
# podman --root /tmp/store/empty --storage-opt overlay.imagestore=/tmp/store/ro run -d busybox top
5d4baa2643c77deca49ff9b1a0a10df5c2965626b725b8a2225db54706e69980
# podman --root /tmp/store/rw rmi busybox
Untagged: docker.io/library/busybox:latest
Deleted: 9211bbaa0dbd68fed073065eb9f0a6ed00a75090a9235eca2554c62d1e75c58f
There was a problem hiding this comment.
Yeah, the silent deletion of dependent parents seems hard to fix. It’s also not really documented that the users must prevent that, AFAICT.
What about concurrent creations, and other operations?
There was a problem hiding this comment.
I've not thought about them, but are they very different than delete though? We would end up with the same layer in multiple stores and use it from the first one where we find it.
There was a problem hiding this comment.
We could see e.g. a partial pull of a layer being started on our store, triggering the creation of a staging directory; before it finishes, the other image store creates the same layer; and now the code trying too commit the partially-pulled layer sees the new layer, determines a different staging directory path, and rejects the commit attempt. (I’m not saying that this is the only problem.)
There was a problem hiding this comment.
let's address this in a separate PR. I've played with it this morning, but I am not sure yet how this should be done.
Do you've any suggestions?
f3ffbd6 to
5c00dd3
Compare
store.go
Outdated
|
|
||
| ro := store != s.imageStoreDir && store != s.graphRoot | ||
|
|
||
| rls, err := s.newLayerStore(rlpath, glpath, s.graphDriver, false, ro) |
There was a problem hiding this comment.
Fine… but I can’t see how that works. layersToRemove is only applied to one rlstore. That uses getLayerStoreLocked, and AFAICS never gets to write to this special member of roLayerStoresUseGetters.
I’m probably being stupid.
19fc9a6 to
be5720d
Compare
|
Is this related to the Zstd work? |
no, this is not related to zstd |
|
I don't think it is too important for Podman 5.0, but it solves an issue that was reported some time ago: #1779 |
|
@kolyshkin @nalind PTAL |
|
@saschagrunert PTAL |
591d900 to
7d1b3f3
Compare
|
@haircommander @saschagrunert this PR affects the "split store" functionality in CRI-O |
|
cc @kannon92 |
7d1b3f3 to
2b50bc4
Compare
|
rebased |
this is a preparation for the next commit, which will use this feature. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2b50bc4 to
2a625d5
Compare
simplify the imagestore implementation and use it as a first-class store. It reduces the amount of code to deal with an image store in the driver (only overlay supports it at the moment). Closes: containers#1779 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when an image store is used, lock it as well as the graphroot. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2a625d5 to
d1cc476
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe, saschagrunert The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
simplify the imagestore implementation and use it as a first-class store. It reduces the amount of code to deal with an image store in the driver (only overlay supports it at the moment).
Closes: #1779
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com