Skip to content

store: add support to split filesystem using imagestore#1549

Merged
rhatdan merged 1 commit intocontainers:mainfrom
flouthoc:pull-store
Apr 17, 2023
Merged

store: add support to split filesystem using imagestore#1549
rhatdan merged 1 commit intocontainers:mainfrom
flouthoc:pull-store

Conversation

@flouthoc
Copy link
Collaborator

@flouthoc flouthoc commented Apr 3, 2023

Allow storage users to split the filesystem of containers vs image store, imagestore if configured will pull images in image storage instead of the graphRoot while keeping the other parts still in the originally configured graphRoot.

@flouthoc flouthoc marked this pull request as draft April 3, 2023 07:48
@rhatdan
Copy link
Member

rhatdan commented Apr 3, 2023

I would prefer this to be ImageStore rather then PullStore. You even differentiate between the two within the code.

@rhatdan
Copy link
Member

rhatdan commented Apr 3, 2023

	imgStoreRoot := s.pullStore
	if imgStoreRoot == "" {
		imgStoreRoot = s.graphRoot
	}

@flouthoc
Copy link
Collaborator Author

flouthoc commented Apr 3, 2023

imagestore SGTM I'll amend this.

@flouthoc flouthoc changed the title store: add support to split filesystem using pull-store store: add support to split filesystem using imagestore Apr 4, 2023
@flouthoc flouthoc force-pushed the pull-store branch 5 times, most recently from e8973b8 to 72dca55 Compare April 4, 2023 10:03
@flouthoc flouthoc marked this pull request as ready for review April 4, 2023 10:03
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the RO layer be found when Image Store is used?

e.g. another client is configured to not use the Image Store, and will create all the layers in the regular store. How would it work when the Image Store is set

@flouthoc
Copy link
Collaborator Author

flouthoc commented Apr 4, 2023

Will the RO layer be found when Image Store is used?

e.g. another client is configured to not use the Image Store, and will create all the layers in the regular store. How would it work when the Image Store is set

I tried replaying this scenario, no RO layer will be not present if user switches to --imagestore in between, since --imagestore switches to a completely new image store. In such cases if users wants to use layers from previous store he should use additional-image-store and use original store as an additional-image-store manually.

@giuseppe
Copy link
Member

giuseppe commented Apr 4, 2023

Will the RO layer be found when Image Store is used?
e.g. another client is configured to not use the Image Store, and will create all the layers in the regular store. How would it work when the Image Store is set

I tried replaying this scenario, no RO layer will be not present if user switches to --imagestore in between, since --imagestore switches to a completely new image store. In such cases if users wants to use layers from previous store he should use additional-image-store and use original store as an additional-image-store manually.

I'd prefer the opposite to happen. If a --imagestore is specified, then images are pulled there and it is internally handled as an additional image store. The lookup would work in the same way it does now, first looking in the current layers then looking it up in the additional image stores.

When I build from a Containerfile, where would the layers end up?

@flouthoc
Copy link
Collaborator Author

flouthoc commented Apr 5, 2023

@giuseppe I made a modification where when --image-store is used exisintg graphRoot is automatically used as an secondary additionalStore internal allowing users to use images both from newly added image-store and the original graphRoot, I have a added an integration test: split-store - use graphRoot as an additional store by default which tests this behavior. Do you think this suffice the use-case mentioned above.

@flouthoc flouthoc force-pushed the pull-store branch 3 times, most recently from 18482ed to c57b081 Compare April 5, 2023 09:03
@flouthoc flouthoc requested a review from mrunalp April 5, 2023 11:39
@flouthoc
Copy link
Collaborator Author

flouthoc commented Apr 5, 2023

Tagging @mrunalp for inputs.

@mrunalp
Copy link
Contributor

mrunalp commented Apr 7, 2023

Looks fine to me. Did we test with two disks?

@flouthoc
Copy link
Collaborator Author

flouthoc commented Apr 8, 2023

I think it should work without any issues but let me test it out with two separate disk to confirm.

@flouthoc
Copy link
Collaborator Author

Okay I did test it with imagestore on a different disk than my original graphroot and it works fine. I used an external usb stick as imagestore and my current disk as original graphroot.

# Check if we mounted external usb
[fl@fedora ~]$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0  32.3M  1 loop /var/lib/snapd/snap/snapd/12704
loop1         7:1    0  55.4M  1 loop /var/lib/snapd/snap/core18/2128
loop2         7:2    0  13.9M  1 loop /var/lib/snapd/snap/universal-ctags/552
sdb           8:16   1  14.3G  0 disk 
├─sdb1        8:17   1  14.3G  0 part /run/media/fl/12a7bc12-59b6-4aa2-b40a-76f67faf28b7
└─sdb2        8:18   1     1M  0 part /run/media/fl/UEFI_NTFS
zram0       252:0    0     8G  0 disk [SWAP]
nvme0n1     259:0    0 476.9G  0 disk 
├─nvme0n1p1 259:1    0   600M  0 part /boot/efi
├─nvme0n1p2 259:2    0     1G  0 part /boot
└─nvme0n1p3 259:3    0 475.4G  0 part /home

# Check if external use contains directory for podman-storage
[fl@fedora ~]$ ls  /run/media/fl/12a7bc12-59b6-4aa2-b40a-76f67faf28b7
lost+found  podman-storage

# Check our storage.conf
[fl@fedora ~]$ cat /home/fl/.config/containers/storage.conf
[storage]

imagestore = "/run/media/fl/12a7bc12-59b6-4aa2-b40a-76f67faf28b7/podman-storage"

# First set imagestore to `none` and pull ubuntu
# Second set imagestore to `usb` and pull alpine

# podman images should show image from both original graph root and usb-stick
[fl@fedora bin]$ ./podman images
WARN[0000] The storage 'driver' option should be set in /home/fl/.config/containers/storage.conf. A driver was picked automatically. 
REPOSITORY                TAG         IMAGE ID      CREATED      SIZE        R/O
docker.io/library/alpine  latest      9ed4aefc74f6  11 days ago  7.34 MB     false
docker.io/library/ubuntu  latest      08d22c0ceb15  4 weeks ago  80.3 MB     true

# run container where image is not on imagestore but on original graph root
[fl@fedora bin]$ ./podman run --pull=never -it --rm ubuntu sh
WARN[0000] The storage 'driver' option should be set in /home/fl/.config/containers/storage.conf. A driver was picked automatically. 
# exit
[fl@fedora bin]$ 

# run container where image is on imagestore but not on original graph root
[fl@fedora bin]$ ./podman run --pull=never -it --rm alpine sh
WARN[0000] The storage 'driver' option should be set in /home/fl/.config/containers/storage.conf. A driver was picked automatically. 
/ # exit
[fl@fedora bin]$ 

@flouthoc
Copy link
Collaborator Author

@giuseppe @mrunalp ^

@rhatdan
Copy link
Member

rhatdan commented Apr 10, 2023

You need to update the containers-storage.conf.5.md

flouthoc added a commit to flouthoc/storage that referenced this pull request Apr 21, 2023
When we added splitstore support via `--imagestore` the `graphRoot` was
added as an `rw` additionalImageStore so lets do same for the `vfs`
driver.

See PR for more details:
* containers#1549
* containers#1578

Signed-off-by: Aditya R <arajan@redhat.com>
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably too sleep-deprived but I just can’t understand how this works, starting with dir2.

Why are all callers of dir2 special-casing this? I would expect a single location in this package to contain some code to find an existing image, which callers can then use without any conditionals. And I would expect a single location in this package to contain some code to determine where to create new images or store more data, which callers can then use without any conditionals.

This is clearly not that, and I can’t understand how this is intended to work.

Again, maybe it is just me. But please someone take another look.

}

func (d *Driver) dir2(id string) (string, bool) {
func (d *Driver) dir2(id string) (string, string, bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the return values of this function?

I would strongly recommend a habit of documenting the intended purpose, all parameters, and all return values, for every function written.

}
workDirBase := dir
if imagestore != "" {
if _, err := os.Stat(dir); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this checking dir? Based on the Stat just a few lines above, this is not going to fail.

Comment on lines +841 to +843
"WorkDir": path.Join(workDirBase, "work"),
"MergedDir": path.Join(workDirBase, "merged"),
"UpperDir": path.Join(workDirBase, "diff"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all using the values dependent on s.imageStore.

But create is using workDirBase/"work", but dir/"merged". UpdateLayerIDMap is using dir/"work". Everything seems to be using dir/"diff".

What’s going on here?

If it is hard to keep track of the right directory, introduce methods like workDir(imageID) and have everything call them.

newpath := path.Join(d.home, id)
imageStore := ""
if d.imageStore != "" {
imageStore = path.Join(d.imageStore, id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we now have an “image store” variable, which is an individual per-image value, but d.imageStore is a per-store global.

And this function creates this {s.imageStore}/{id} value, but never checks it for any purpose. Why?

  • If {d.home}/{id} exists, this function returns that as the first return value
  • {d.home}/{id} does not exist but one of the “additional image stores” contain an /{id} path, this function returns that additional one.

What role does {s.imageStore}/{id} play here?

Before, this function fairly clearly returned “which directory contains the image data if the image exists, or where the data should be created if it doesn’t exist anywhere; is the image in an additional store”.

Now it returns … ???

In particular the various parent lookups only check in the {s.imageStore}/{id} location, outright failing if the image is not there. But this function does not check at all whether anything exists in that location.

@flouthoc
Copy link
Collaborator Author

I'll open a new PR to address these refactor comments.

@nalind
Copy link
Member

nalind commented May 10, 2023

I am probably too sleep-deprived but I just can’t understand how this works, starting with dir2.

Why are all callers of dir2 special-casing this? I would expect a single location in this package to contain some code to find an existing image, which callers can then use without any conditionals. And I would expect a single location in this package to contain some code to determine where to create new images or store more data, which callers can then use without any conditionals.

This is clearly not that, and I can’t understand how this is intended to work.

Again, maybe it is just me. But please someone take another look.

Looking at it now, I don't think I fully understand it, either. With the option set, when the overlay driver creates and removes layers under the graph root, continuing to record things like the list of lowers there, it also seems to be creating and removing directories under the image store location, and it's neither obvious nor commented why it puts things in one place or the other. If my mostly-guesswork view of what the overlay driver is doing is correct, then the overlay driver is not correctly setting disk quotas.
I definitely don't understand what's supposed to be happening differently in the vfs driver.

flouthoc added a commit to flouthoc/podman that referenced this pull request Jun 17, 2023
Add support for `--imagestore` in podman which allows users to split the filesystem of containers vs image store, imagestore if configured will pull images in image storage instead of the graphRoot while keeping the other parts still in the originally configured graphRoot.

This is an implementation of
containers/storage#1549 in podman.

Signed-off-by: Aditya R <arajan@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants