overlay: use private merged directory for AIS#2036
overlay: use private merged directory for AIS#2036openshift-merge-bot[bot] merged 3 commits intocontainers:mainfrom
Conversation
|
not ready for review, I am still validating it |
6fa7b85 to
b4ad83d
Compare
b4ad83d to
5c98373
Compare
store.go
Outdated
| exists := store.Exists(to) | ||
| store.stopReading() | ||
| if exists { | ||
| return nil, fmt.Errorf("mounting read/only store images is not allowed: %w", ErrLayerUnknown) |
There was a problem hiding this comment.
I didn’t read this carefully, just a skim: Doesn’t this mean that it will be completely impossible to push an image which contains base layers in an additional image store and some child layers in the primary store, because the base layers will not be exportable?
Or is the assumption that the graph driver will iterate through the additional stores without the top levels knowing? In that case 1) this might not be reachable, but also 2) that completely gives up on store locking; why is that OK?
(I know I don’t understand the AIS indirections, possible combinations, and locking, and quite possibly I would save everyone time by just staying out of this.)
There was a problem hiding this comment.
the Diff/Changes operation can trigger a mount, we didn't notice the race before because native overlay doesn't need it (we just access the layer diff), but how can we ensure two processes do not step on each other when mount/unmount are involved?
There was a problem hiding this comment.
Oh I don’t know how to solve this.
I continue to be confused about how AIS is supposed to work, I’m just noting that the PR, as is now, means that either things that used to work will stop working, or the PR is adding unnecessary code (and perhaps locking problems).
It seems to me that the basic situation is not even really additional-store related that much. See the LOCKING BUG comments in layers.go, isn’t that the same issue? (Again, saying that doesn’t mean I know how to fix it.)
There was a problem hiding this comment.
should I relax the condition for allow mount from other stores but keep the writing lock for the rw store?
There was a problem hiding this comment.
If the primary driver is consulting AIS layers anyway, I don’t think that checking the AIS layers here, again, makes things any better.
But really my first intuition is that the graph driver should not be accessing the AIS layers itself, in the first place — but then that’s clearly not reasonable for mounting a child layer depending on a parent in AIS.
I’m just too confused about the code to have an opinion.
| return mergedDir, nil | ||
| } | ||
|
|
||
| func (d *Driver) getMergedDir(id, dir string, inAdditionalStore bool) string { |
There was a problem hiding this comment.
This one seems worthy of a doc comment, say
/// getMergedDir returns the directory path that should be used as the mount point for the overlayfs
| } | ||
|
|
||
| func (d *Driver) getMergedDir(id, dir string, inAdditionalStore bool) string { | ||
| if inAdditionalStore { |
There was a problem hiding this comment.
Also
| if inAdditionalStore { | |
| // The merged directory may not exist in an additional store, so use a transient mount point in `/run` instead. | |
| if inAdditionalStore { |
or so?
There was a problem hiding this comment.
… and if locking is a concern, and is happening in a non-obvious way, that also should be written down somewhere.
There was a problem hiding this comment.
the reason for using the path under /run is that we hold only a read-only lock for the additional stores, so different processes can race when mounting/unmounting there and that is the root cause for #2033
| if inAdditionalStore { | ||
| // check the base name for extra safety | ||
| if strings.HasPrefix(mountpoint, d.runhome) && filepath.Base(mountpoint) == "merged" { | ||
| err := os.RemoveAll(filepath.Dir(mountpoint)) |
There was a problem hiding this comment.
There should never actually be any content underneath the merged directory, right? We should be able to safely just use os.Remove I think.
There was a problem hiding this comment.
I can split this one in os.Remove(mountpoint); os.Remove(filepath.Dir(mountpoint)), I thought the RemoveAll version was more readable.
There was a problem hiding this comment.
It's clearly more readable to use RemoveAll, but the two-step Remove is a much stronger "extra safety" than checking the pathname. I don't have a strong opinion though.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
5c98373 to
120642a
Compare
120642a to
05659bf
Compare
There was a problem hiding this comment.
A higher level comment, looking at the store changes:
-
This adds exclusive locking to
Diff. IIRC previously we have tried fairly hard to make that only hold a shared lock. I don’t know anything specific about the performance impact, apart from vaguely remembering this is supposed to be a big deal. Do we know that this is actually fine? -
A blocker for formal reasons: If I understand it correctly, the overlay graph driver how requires
layerStore.Diff/layerStore.Changesto hold the layer store lock for writing for the primary store. If that’s correct, I think that must be documented in thelayerStoremethods (// Requires startReading or startWriting.) — and due to the unusual design, including the reason; typically we only require “writing” when modifyinglayerStorestate, so if this is the overlay driver requiring exclusivity, that reason must be documented there as well.- On second thought, wouldn’t it be viable for
overlayto introduce its own special-purpose lock? Such a lock could only be used use it in the relevant cases, instead of this being visible at the top level.
- On second thought, wouldn’t it be viable for
| @@ -2930,15 +2930,40 @@ func (s *store) Unmount(id string, force bool) (bool, error) { | |||
| } | |||
|
|
|||
| func (s *store) Changes(from, to string) ([]archive.Change, error) { | |||
There was a problem hiding this comment.
I couldn’t find anything, apart from cmd here, that is using Changes. It’s tempting to think that it should be deprecated or somehow made private instead of expanding functionality. That’s probably not this PR.
The obvious impact is that serializing the
Thinking about this a bit more, for concurrency, we would want the mounts to be reference-counted; and automatically cleaned up on crash; and at that point sharing cross-process is ~free. And we already have that kind of infrastructure in … except that it does not apply to the read-only AIS layers, at all. And, of course, the |
… and it’s actually being used in this |
when NaiveDiff is used, the Diff/Changes operations can trigger the mount of the layer. Prevent that multiple processes step on each other and one of them performs an unmount while the other one is still accessing the mount. Closes: containers#2033 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
05659bf to
1e60d87
Compare
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
but a Diff can trigger a mount, as it is in the race condition I am trying to fix; so the locking here shouldn't be too different from what we do in
it is every driver except plain native overlay to require that, and vfs since there is no "mount" operation for vfs.
Every driver, except vfs and native overlay, require a "mount" operation to access the files in the layer, so IMO it is better to make it the default behavior |
use a private "merged" directory when mounting from an additional store. Operations like "Diff()" and "Changes()" cause an implicit mount when the naive differ is used. The issue was not observed earlier because native overlay can achieve these operations without using a mount. Since these mounts are performed read-only, and overlay supports multiple mounts using the same lowerdirs, use a private location for the "merged" directory. The location is owned by the current writeable store, that is locked for writing. Closes: containers#2033 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
1e60d87 to
683e806
Compare
|
tests pass both for Podmand and Buildah: |
|
@cgwalters @rhatdan @mtrmac if there are no blockers, let's get this one and #2031 merged so we can enable the Podman CI, we can fix/improve things later but at least we have a more solid base knowing the e2e tests pass |
|
I'm not going to pretend I understand all this code well, but narrowing in on this comment:
The model we have with AIS is basically that there's only one writable store, correct? In that case, I think I understand how this fixes the issue.
And you're saying we're not using that code though here? I'm not sure if we are or not, but it looks to me like this patch is mostly just modifying the location of the mounts. |
|
I guess bottom line I can just say |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Right, and we can’t do an exclusive lock to mount, then downgrade to a shared lock for creating the diff (that part could work), and then upgrade again to unmount (we don’t have code that could do this, at least ATM, and I’m not immediately sure that can be done without risking deadlocks). I understand, at least in general terms, the difficulty.
Right now, making If composefs So I think we need to figure out a way to keep this read-locked for overlay, and until we do, composefs just isn’t ready. But I’m fully open to being overridden on this.
The reason I am considering this paragraph “blocker” is that I want (This is the only “blocker” from my side, to the extent that I’m ignorant about |
I was mistaken here. We are using the mounts reference counting … but that reference counting at the primary store also, AFAICS, assumes that layer records exist on the primary store; looking at that yesterday, I didn’t see how it can work for counting additional-store layers that are not known to the primary store. But, also, I’m unfamiliar with the details, and strongly time-constrained right now; this might well be fine. |
|
One thing I would say here is I see an intersection with containers/container-libs#131 ...if what we mounted was always a single flattened composefs image, the only cross-store dependency would be refcounting of the underlying files behind the composefs (cc composefs/composefs#125 ), but with a read-only additional store we would be guaranteed that the underlying objects wouldn't go away. |
this is probably already broken with fuse-overlayfs, just nobody hit the race, because with fuse-overlayfs we are using the naive diff (thus causing the mount) |
If this is pointing at the start of … but I’m a bit lost at how that happens. AFAICS all calls (in Podman) are |
… and mistaken again? The I’m sorry. I’ve not been helping. |
|
/lgtm |
... as modified by #2036 . Also document a LOCKING BUG I don't now care to fix. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
#2080 . |
use a private "merged" directory when mounting from an additional store.
Operations like "Diff()" and "Changes()" cause an implicit mount when the naive differ is used.
The issue was not observed earlier because native overlay can achieve these operations without using a mount.
Since these mounts are performed read-only, and overlay supports multiple mounts using the same lowerdirs, use a private location for the "merged" directory. The location is owned by the current writeable store, that is locked for writing.
Closes: #2033