
Avoiding inode exhaustion by supporting idmapped mounts without contiguous mappings #2345

@giuseppe

Description

Discussed in containers/podman#26205

Originally posted by nealef May 27, 2025

Problem

We have a business need that requires the use of dynamically created UID/GID mappings when running containers in a rootful environment. Unfortunately, this results in the exhaustion of inodes in the filesystem hosting the image store. What follows is a detailed look at the problem; some things we tried without changing podman/containers code; and a look at an experimental solution we've come up with.

Background

Apologies for teaching your grandmother to suck eggs by going into detail but it helped me better understand the problem.

The following commit added code to check the UID and GID mapping specifications of a layer when looking for candidates to start a container. As indicated in the commit summary, this was done for performance reasons:

commit 13f745092f2685877ec13f0f984d89b3096d494b
Author: Daniel J Walsh [dwalsh@redhat.com](mailto:dwalsh@redhat.com)
Date:   Sat Jun 2 05:34:11 2018 -0400
 
    Vendor in latest containers/storage
 
    This vendor will improve the performance of using userns
    since it will save aside the image layer of the chown, so
    followup runnings of podman will use the new layer rather
    then chowning again.
 
    Signed-off-by: Daniel J Walsh [dwalsh@redhat.com](mailto:dwalsh@redhat.com)
 
    Closes: containers/podman#881
    Approved by: mheon

The code in question is found in containers/storage/store.go:

    layerMatchesMappingOptions := func(layer *Layer, options types.IDMappingOptions) bool {
        // If the driver supports shifting and the layer has no mappings, we can use it.
        if s.canUseShifting(options.UIDMap, options.GIDMap) && len(layer.UIDMap) == 0 && len(layer.GIDMap) == 0 {
            return true
        }
        // If we want host mapping, and the layer uses mappings, it's not the best match.
        if options.HostUIDMapping && len(layer.UIDMap) != 0 {
            return false
        }
        if options.HostGIDMapping && len(layer.GIDMap) != 0 {
            return false
        }
        // Compare the maps.
        return reflect.DeepEqual(layer.UIDMap, options.UIDMap) && reflect.DeepEqual(layer.GIDMap, options.GIDMap)
    }

What this means is that when a layer is required to start a container, the image store is checked for a matching layer. Candidate layers are then checked to see whether their UID and GID maps match those specified on the podman run command.

For example, if I pull the image docker.io/library/almalinux:8 it will be placed in the image store /var/lib/containers/storage. Of specific interest is the location /var/lib/containers/storage/overlay where you would find:

drwx------. 6 root root     69 May 22 01:09 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35

This directory contains a number of subdirectories notably diff and merged:

 # ls -l /var/lib/containers/storage/overlay/ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35/diff/
total 8
lrwxrwxrwx.  1 root root    7 Oct 9 2021 bin -> usr/bin
drwxr-xr-x.  2 root root    6 May 19 05:57 dev
drwxr-xr-x. 43 root root 4096 May 19 05:57 etc
drwxr-xr-x.  2 root root    6 Oct 9 2021 home
lrwxrwxrwx.  1 root root    7 Oct 9 2021 lib -> usr/lib
lrwxrwxrwx.  1 root root    9 Oct 9 2021 lib64 -> usr/lib64
drwxr-xr-x.  2 root root    6 Oct 9 2021 media
drwxr-xr-x.  2 root root    6 Oct 9 2021 mnt
drwxr-xr-x.  2 root root    6 Oct 9 2021 opt
dr-xr-xr-x.  2 root root    6 May 19 05:56 proc
dr-xr-x---.  2 root root   91 May 19 05:57 root
drwxr-xr-x. 12 root root 162 May 19 05:57 run
lrwxrwxrwx.  1 root root    8 Oct 9 2021 sbin -> usr/sbin
drwxr-xr-x.  2 root root    6 Oct 9 2021 srv
dr-xr-xr-x.  2 root root    6 May 19 05:56 sys
drwxrwxrwt.  2 root root    6 Oct 9 2021 tmp
drwxr-xr-x. 12 root root 144 May 19 05:56 usr
drwxr-xr-x. 19 root root 4096 May 19 05:56 var

# ls -l /var/lib/containers/storage/overlay/ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35/merged
total 0

This represents a layer of an image that is ready to be instantiated in a running container. The layer contains no UID/GID mapping information.

A subsequent podman run will cause podman to search for candidate layers to be used to run a container. podman will find this “template” (for want of a better term) to be used to build a runnable container. As we have no UID/GID mapping specification set, it will select this layer to be used in the container.

Once the container is up and running the storage area now contains:

drwx------. 5 root root     69 May 22 01:15 8d6ff43a9e5cb407ef22b323c7f7df2ba425d93a758d347a481bf6f1c5a2fcaa
drwx------. 6 root root     69 May 22 01:09 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35

Note that there are now two entries: (1) the template and (2) the layer in use (the “running” layer). Its diff and merged subdirectories now contain:

# ls -l /var/lib/containers/storage/overlay/8d6ff43a9e5cb407ef22b323c7f7df2ba425d93a758d347a481bf6f1c5a2fcaa/diff/
total 0
drwxr-xr-x. 3 root root 42 May 22 01:15 run

#  ls -l /var/lib/containers/storage/overlay/8d6ff43a9e5cb407ef22b323c7f7df2ba425d93a758d347a481bf6f1c5a2fcaa/merged
total 8
lrwxrwxrwx.  1 root root    7 Oct 9 2021 bin -> usr/bin
drwxr-xr-x.  2 root root    6 May 19 05:57 dev
drwxr-xr-x. 43 root root 4096 May 19 05:57 etc
drwxr-xr-x.  2 root root    6 Oct 9 2021 home
lrwxrwxrwx.  1 root root    7 Oct 9 2021 lib -> usr/lib
lrwxrwxrwx.  1 root root    9 Oct 9 2021 lib64 -> usr/lib64
drwxr-xr-x.  2 root root    6 Oct 9 2021 media
drwxr-xr-x.  2 root root    6 Oct 9 2021 mnt
drwxr-xr-x.  2 root root    6 Oct 9 2021 opt
dr-xr-xr-x.  2 root root    6 May 19 05:56 proc
dr-xr-x---.  2 root root   91 May 19 05:57 root
drwxr-xr-x.  1 root root   42 May 22 01:15 run
lrwxrwxrwx.  1 root root    8 Oct 9 2021 sbin -> usr/sbin
drwxr-xr-x.  2 root root    6 Oct 9 2021 srv
dr-xr-xr-x.  2 root root    6 May 19 05:56 sys
drwxrwxrwt.  2 root root    6 Oct 9 2021 tmp
drwxr-xr-x. 12 root root 144 May 19 05:56 usr
drwxr-xr-x. 19 root root 4096 May 19 05:56 var

Once the container exits, this running layer is cleaned up and we end up with /var/lib/containers/storage/overlay containing:

drwx------. 6 root root     69 May 22 01:09 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35

Using this same image we now run with different UID and GID mappings:

podman run --rm -it --user 531571:36847 --gidmap 36848:1102383:28688 \
    --gidmap 0:1065536:36846 --gidmap 19997967:199997967:1 --gidmap 36847:36847:1 \
    --uidmap 0:1065537:65536 --uidmap 19997967:19997967:1 --uidmap 531571:531571:1 \
    almalinux:8 pwd
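Each --uidmap/--gidmap triple above has the form containerID:hostID:size: container IDs in [containerID, containerID+size) are translated to host IDs starting at hostID. That is why the template directories below are owned by 1065537 or 1131074 on the host: both are just container ID 0 passed through different triples. A minimal sketch of the translation (local types for illustration):

```go
package main

import "fmt"

// idMap holds one containerID:hostID:size triple, as passed to
// --uidmap / --gidmap.
type idMap struct{ containerID, hostID, size int }

// toHost translates a container ID to a host ID using the first
// matching range, returning -1 if the ID is unmapped.
func toHost(id int, maps []idMap) int {
	for _, m := range maps {
		if id >= m.containerID && id < m.containerID+m.size {
			return m.hostID + (id - m.containerID)
		}
	}
	return -1
}

func main() {
	// The --uidmap triples from the run above.
	uidMaps := []idMap{
		{0, 1065537, 65536},
		{19997967, 19997967, 1},
		{531571, 531571, 1},
	}
	fmt.Println(toHost(0, uidMaps))      // container root becomes host 1065537
	fmt.Println(toHost(531571, uidMaps)) // the --user uid maps to itself
}
```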

This time when podman is looking for candidate layers to use it will note that our original “template” doesn’t have the UID/GID mapping we want. So, believing this won’t be the only time this layer will be used with the same mapping specification, it will prepare a second “template” and then instantiate a running container using that template.

drwx------. 5 1065537 1065536     69 May 22 01:29 b2f697229618ed8f7d611ffb40e16dff3ad722c0b508a91638badb9c1f175e2b <- The layer of a running container
drwx------. 6 1065537 1065536     82 May 22 01:29 fd94567821ceabbdef41787671ededd835db4f260b40795211eb3361c4d6f441. <- The new template
drwx------. 6 root    root        69 May 22 01:09 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35

When the container completes and is torn down, we end up with the new template in the store, ready to be used again. However, if I now run with a different mapping:

podman run --rm -it --user 531571:36847 --gidmap 36848:1167919:28688 \
    --gidmap 0:1131072:36846 --gidmap 19997819:199997819:1 --gidmap 36847:36847:1 \
    --uidmap 0:1131074:65536 --uidmap 19997819:19997819:1 --uidmap 531571:531571:1 \
    docker.io/library/almalinux:8 pwd

We see, when running:

drwx------. 6 1131074 1131072     82 May 22 01:32 0de4baac4eb3fa9de37f727fe2601213b8b008dc2f48f115641941c312a2f1e4 <- Running container’s layer
drwx------. 5 1131074 1131072     69 May 22 01:32 6160f347c865728c2ce5bdba92b22a37bed9606ff910e0dded8d1d9ddbbf5b26 <- New template
drwx------. 6 1065537 1065536     82 May 22 01:29 fd94567821ceabbdef41787671ededd835db4f260b40795211eb3361c4d6f441 <- Previous template
drwx------. 6 root    root        69 May 22 01:09 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35

When the container completes, the storage area now has:

drwx------. 6 1131074 1131072     82 May 22 01:32 0de4baac4eb3fa9de37f727fe2601213b8b008dc2f48f115641941c312a2f1e4
drwx------. 6 1065537 1065536     82 May 22 01:29 fd94567821ceabbdef41787671ededd835db4f260b40795211eb3361c4d6f441
drwx------. 6 root    root        69 May 22 01:09 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35

And thus we grow the store and consume inodes. Repeat this process hundreds or thousands of times, each with a distinct mapping, and we exhaust the inodes of the filesystem.
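The growth is easy to watch from the host. A small helper along the lines the listings above already use (df -i reports the filesystem's inode counters; the find count is the number that jumps by thousands per leftover template):

```shell
# inode_report: print inode usage of the filesystem holding $1 and the
# number of files under it (the count that grows per cached template).
inode_report() {
    df -i "$1" | tail -1 | awk '{print "inodes used:", $3, "free:", $4}'
    printf 'files under %s: ' "$1"
    find "$1" -type f | wc -l
}

# Usage (as root, with the default store path from this report):
# inode_report /var/lib/containers/storage
```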

Circumventions

In this section I describe a couple of approaches we looked at to circumvent this behaviour.

Additional Store Area

This is a containers/storage feature, configured in /etc/containers/storage.conf, that can be used to hold images:

[storage.options]
# Storage options to be passed to underlying storage drivers
# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
    "/var/lib/containers/additional",
]

If an image is pulled into the additional storage area then when it is run only the running layers end up in /var/lib/containers/storage/overlay and are removed once the container completes running. This happens for whatever UID/GID mapping specification is on the podman run command line. Thus we see no growth in inodes.

A podman images command shows slightly different output, with the addition of an R/O column.

Drawbacks

There are a number of negatives to this approach:

  1. Building images based on images within this additional store will place the new image in the regular storage area, and when it is used we see the inode growth problem arise.
  2. An image may be placed in the additional store area using the --output option on the podman command. However, if there is anything in the Dockerfile that requires modification of a layer (notably removing a file), then podman build objects and the action is ignored. This should come as no surprise, as within the code and the configuration file this area is referred to as “read only”.
  3. An image can’t be removed from the additional area without a parameter on the command line.
  4. An image, once tagged, appears twice in the podman images output, which would be confusing to end-users.

Separate Image Store

containers/storage also allows an imagestore area to be specified. This separates image layers “at rest” from those instantiated at run time in the graphroot, its primary location of container storage.

imagestore = "/var/lib/containers/imagestore"

This also looked promising, as the separation might prevent inode growth. However, I was never able to get things running to test the effects of UID/GID mapping, because a podman run yields:

Error: creating container storage: creating an ID-mapped copy of layer “0d099e256fa7cec3dea6e429f092e60a0df725aeda556c37ca5229191ea283ba”: error during chown: storage-chown-by-maps: lchown …

strace shows the chown failing with -EROFS. Again, although the comments in the storage.conf file and man page don't state it, the code also treats this area as read-only.

Options

So, as you can see, the approaches we tried were faulty in concept or application. Therefore, we looked at changing the code itself.

According to the code and documentation, podman is “working as designed”. However, I would argue there are good reasons to change the behaviour of podman (or rather the containers component it brings in during the package build) to make this performance feature, introduced with commit 13f745092f2685877ec13f0f984d89b3096d494b, configurable, with it enabled by default.

To this end we added a new option to the podman run command, --nocache, that causes the code in store.go to stop looking for layers once any candidate is found, regardless of that layer's UID/GID mappings (we're happy with an imperfect match):

--- a/vendor/github.com/containers/storage/store.go
+++ b/vendor/github.com/containers/storage/store.go
@@ -1717,6 +1717,12 @@ func (s *store) imageTopLayerForMapping(image *Image, ristore roImageStore, rlst
                                                continue
                                        }
                                }
+
+                               // If we have disabled caching - making 13f745092f2685877ec13f0f984d89b3096d494b configurable
+                               if options.NoCache {
+                                       return cLayer, nil
+                               }
+
                                // If the layer matches the desired mappings, it's a perfect match,
                                // so we're actually done here.
                                if layerMatchesMappingOptions(cLayer, options) {

To achieve this we changed the definition of the IDMappingOptions type to include a new field `NoCache`:

--- a/vendor/github.com/containers/storage/types/idmappings.go
+++ b/vendor/github.com/containers/storage/types/idmappings.go
@@ -46,6 +46,7 @@ type IDMappingOptions struct {
        GIDMap         []idtools.IDMap
        AutoUserNs     bool
        AutoUserNsOpts AutoUserNsOptions
+       NoCache        bool
 }

Additional code was required in cmd/podman/containers/run.go to define the new flag and in the spec generation to use this value. If this approach is useful I can provide the entire code as a draft PR.
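The run.go side is straightforward plumbing: define the boolean flag and carry its value into the mapping options used by the store. podman itself wires flags through cobra/pflag; the sketch below uses the standard flag package (and a trimmed local IDMappingOptions) purely to keep it self-contained, so names and structure are illustrative rather than the actual patch:

```go
package main

import (
	"flag"
	"fmt"
)

// IDMappingOptions mirrors only the field relevant here: the NoCache
// flag added to containers/storage/types by the patch above.
type IDMappingOptions struct {
	NoCache bool
}

// parseRunFlags sketches the cmd/podman/containers/run.go side:
// define the flag and carry its value into the mapping options.
func parseRunFlags(args []string) (IDMappingOptions, error) {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	noCache := fs.Bool("nocache", false,
		"accept the first candidate layer regardless of its UID/GID mappings")
	if err := fs.Parse(args); err != nil {
		return IDMappingOptions{}, err
	}
	return IDMappingOptions{NoCache: *noCache}, nil
}

func main() {
	opts, _ := parseRunFlags([]string{"--nocache"})
	fmt.Println(opts.NoCache) // true
}
```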

Findings

When we use this code and add the --nocache option to the run command we observe the desired behaviour:

  1. Following a pull of the image:
# ls -l /var/lib/containers/storage/overlay
total 0
brw-------. 1 root root 253, 6 May 26 22:44 backingFsBlockDev
drwx------. 6 root root     69 May 26 22:44 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35
drwxr-xr-x. 2 root root     40 May 26 22:44 l
# find /var/lib/containers/storage -type f | wc -l
6870
  2. When the container is up and running after using the --nocache option:
# podman run --nocache --rm -it --user 531573:46847 --gidmap 36848:1102383:28688 --gidmap 0:1065536:36846 --gidmap 19999137:19999137:1 --gidmap 36847:36847:1 --uidmap 0:1065537:65536 --uidmap 19999137:19999137:1 --uidmap 531573:531573:1 almalinux:8

# ls -l /var/lib/containers/storage/overlay
total 0
drwx------. 6 1065537 1065536     82 May 26 22:45 10c20bf529acbb8968a24d6bf328db90a8a092d0da3116889d3a2c2da6385218
brw-------. 1 root    root    253, 6 May 26 22:45 backingFsBlockDev
drwx------. 6 root    root        69 May 26 22:44 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35
drwxr-xr-x. 2 root    root        74 May 26 22:45 l
# find /var/lib/containers/storage -type f | wc -l
20589
  3. After the container completes:
# ls -l /var/lib/containers/storage/overlay
total 0
brw-------. 1 root root 253, 6 May 26 22:47 backingFsBlockDev
drwx------. 6 root root     69 May 26 22:44 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35
drwxr-xr-x. 2 root root     40 May 26 22:47 l
# find /var/lib/containers/storage -type f | wc -l
6872
  4. When the container is up and running without using --nocache:
# podman run --rm -it --user 531573:46847 --gidmap 36848:1102383:28688 --gidmap 0:1065536:36846 --gidmap 19999137:19999137:1 --gidmap 36847:36847:1 --uidmap 0:1065537:65536 --uidmap 19999137:19999137:1 --uidmap 531573:531573:1 almalinux:8

# ls -l /var/lib/containers/storage/overlay
total 0
drwx------. 5 1065537 1065536     69 May 26 22:48 8dae43dc002ffe0187f7928b412ae58aa1036825f93308b99e7fb8d1ef742aa4
brw-------. 1 root    root    253, 6 May 26 22:48 backingFsBlockDev
drwx------. 6 1065537 1065536     82 May 26 22:48 c4a56b7281326bc6a99f39c0df213cdd4c6286ffe6c4e2f73f9bcaff4d3ba3c9
drwx------. 6 root    root        69 May 26 22:44 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35
drwxr-xr-x. 2 root    root       108 May 26 22:48 l
# find /var/lib/containers/storage -type f | wc -l
20592
  5. After the container completes:
# ls -l /var/lib/containers/storage/overlay
total 0
brw-------. 1 root    root    253, 6 May 26 22:49 backingFsBlockDev
drwx------. 6 1065537 1065536     82 May 26 22:48 c4a56b7281326bc6a99f39c0df213cdd4c6286ffe6c4e2f73f9bcaff4d3ba3c9
drwx------. 6 root    root        69 May 26 22:44 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35
drwxr-xr-x. 2 root    root        74 May 26 22:49 l
# find /var/lib/containers/storage -type f | wc -l
13729
  6. After a container is run without --nocache and with a different ID mapping:
# podman run --rm -it --user 531571:36847 --gidmap 36848:1167919:28688 --gidmap 0:1131072:36846 --gidmap 19997819:199997819:1 --gidmap 36847:36847:1 --uidmap 0:1131074:65536 --uidmap 19997819:19997819:1 --uidmap 531571:531571:1 almalinux:8

# ls -l /var/lib/containers/storage/overlay
total 0
drwx------. 6 1131074 1131072     82 May 26 22:51 52c5601ca33cd4861934161721f2e9110380e93c327cd39a4cbc36917770484d
brw-------. 1 root    root    253, 6 May 26 22:51 backingFsBlockDev
drwx------. 6 1065537 1065536     82 May 26 22:48 c4a56b7281326bc6a99f39c0df213cdd4c6286ffe6c4e2f73f9bcaff4d3ba3c9
drwx------. 6 root    root        69 May 26 22:44 ff4f19608a1944c0c2807cd533515673285a9632dc74bf020e83e18630d1ae35
drwxr-xr-x. 2 root    root       108 May 26 22:51 l
# find /var/lib/containers/storage -type f | wc -l
20586

Conclusion

Making the behaviour configurable appears to meet the requirement of preventing unbounded growth of inode use as containers are run, though the approach is probably naïve. Even so, it may point to a more robust solution. Among other things, such a change also means that podman inspect needs to cater for the new field within IDMappingOptions, and similar logic would be needed for the podman container create command.
