As Docker continues to grow in popularity as a lightweight containerization platform, one aspect many users end up grappling with is managing ever-increasing storage demands. Left unchecked, the default overlay2 storage driver can accumulate significant amounts of unused cached data and image resources.
To maintain high performance, availability and stability, it is critical to understand how overlay2 works and the best practices for keeping its footprint under control in dynamic container environments. In this guide, we take an in-depth look at overlay2 storage management through the lens of containerization experts running large-scale production deployments.
The Overlay2 Storage Driver Under the Hood
Overlay2 is designed to balance fast container provisioning and disk efficiency by leveraging overlay filesystems rather than full duplication of files. As Docker documentation notes, copy-on-write semantics are implemented by overlay and overlay2 drivers to share resources where possible across images/containers.
When new images are pulled, overlay2 allocates a dedicated directory for each layer on the host under /var/lib/docker/overlay2; these read-only layer directories serve as the "lowerdir". It then constructs unified mount points called "merged" directories to provide a single view across all layers. Changes made during container runtime are tracked in yet another directory, the "upperdir".
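These directories can be observed directly with docker inspect. A minimal sketch, assuming the Docker CLI is on PATH and a container named web exists (a placeholder name; substitute one running on your host):

```shell
#!/bin/sh
# Print the overlay2 directories Docker assembled for one container:
# LowerDir (read-only layers), UpperDir (writable layer), MergedDir
# (the unified view) and WorkDir (overlayfs scratch space).
CONTAINER="web"                          # placeholder container name
FORMAT='{{ json .GraphDriver.Data }}'

if command -v docker >/dev/null 2>&1; then
    docker inspect --format "$FORMAT" "$CONTAINER"
else
    echo "docker not available on this host" >&2
fi
```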
This approach minimizes I/O and storage demands by having multiple containers share the same base image layers. New containers based on existing images can avoid unnecessary file duplication. However, there are still some areas where disk usage can build up over time:
Unused Layers: overlay2 maintains all downloaded image layers even if they are no longer referenced or needed. This includes any layers that were intermediate steps used to assemble final images. So while 2 containers may share the same Ubuntu base layer, there could be 10 other variants of the Ubuntu layer from previous iterations that stick around.
Dangling Images: Images that are dangling have no associated running containers or tags pointing to them. They consume disk space while serving no direct purpose for current workloads.
Stopped Containers: Containers retain their entire filesystem state when they exit or are stopped. This includes the container's writable layer, all contents and any modifications made at runtime.
Cached Data: Docker stores certain types of ephemeral cached data under overlay2 as layers to improve preparation of new containers. This cache can grow independently of running containers.
While overlay2's architecture minimizes duplication and copying through sharing, these areas illustrate how substantial amounts of disk space can still accumulate over time in dynamic container environments.
Overlay2 Storage Growth in Production Environments
In real-world production environments, overlay2 has been observed to rapidly consume significant chunks of available storage. One analysis found that an additional 58% of storage had to be allocated to account for Docker metadata and runtime layers, and a detailed study of Docker storage overheads noted the container engine expanding to fill 10% or more of total capacity within just a few weeks.
Actual growth rates depend heavily on rates of change. Frequently updated images, or containers that make many modifications to their root filesystems during their lifetime, incur a far greater overlay2 footprint than relatively static containers. Admins need to plan for realistic scenarios, with growth varying from 3X to over 10X the actual container data being stored in extreme cases.
Overlay2 size thus needs active management – it does not automatically shrink or readjust when images and containers fall out of use. Merely stopping containers or untagging images leaves the majority of their data still occupying disk blocks.
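One way to see this in practice is to compare Docker's own accounting against the raw size of the overlay2 tree. A sketch, assuming a default Linux installation rooted at /var/lib/docker (reading that tree typically requires root):

```shell
#!/bin/sh
# Compare Docker's per-type usage report with the raw bytes held by
# overlay2 layer directories. Paths assume a default Linux install.
DOCKER_ROOT="/var/lib/docker"

if command -v docker >/dev/null 2>&1; then
    docker system df                          # usage and reclaimable space per type
fi
if [ -d "$DOCKER_ROOT/overlay2" ]; then
    du -sh "$DOCKER_ROOT/overlay2" 2>/dev/null   # raw size of layer directories
fi
```

Running the pair before and after a cleanup shows how much of the overlay2 tree a given removal actually released.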
Impact of Uncontrolled Overlay2 Growth
Allowing unchecked expansion of overlay2's storage footprint can negatively impact containers and the entire Docker host in several ways:

- Performance Degradation – Docker management operations rely heavily on the underlying storage system. As overlay2 usage grows from gigabytes to terabytes, noticeable lag is introduced when browsing container metadata, accessing image layers, launching new containers, and so on. This in turn reduces application performance.

- Capacity Issues – Filling up storage drives threatens stability and reliability. Once capacity limits are hit, Docker can struggle to deploy containers, fail image pulls and potentially crash or refuse to start due to lack of available space.

- Resource Constraints – Storage devoted to overlay2 caching and access overhead directly takes away capacity available to user applications, and inefficient piles of overlay2 data consume extra memory and CPU cycles for storage I/O.
Based on real-world encounters with production overlay2 bloat, Docker experts now recognize actively purging unused data as essential. Just as unused application logs or temporary files get cleaned to free capacity, outdated or expired containers and images need pruning too.
Comparison to Other Docker Storage Drivers
It is worth comparing overlay2 to older alternatives like aufs and the original overlay driver to better understand the tradeoffs. Aufs implements similar copy-on-write semantics but is an out-of-tree union filesystem that was never merged into the mainline Linux kernel, and Docker has since deprecated it. The original overlay driver supports only a single lower layer, so it emulates multi-layer images with hard links – an approach notorious for exhausting inodes on busy hosts. Overlay2, by contrast, leverages native multiple-lower-layer support added to the Linux kernel (version 4.0 and later), consuming far fewer inodes while still sharing data across image layers.
Overall, overlay2 strikes a solid balance for a wide range of container deployments, delivering strong performance while keeping the host storage footprint contained relative to the alternatives. But it still requires active management.
Overlay2 Cleanup Approaches
Now that we have covered how overlay2 storage usage grows over time and the importance of periodically cleaning up accumulated container data, what are the recommended methods to practically reclaim capacity?
1. Remove Exited Containers
First priority is removing exited containers with docker container prune, which deletes all stopped containers and reclaims their writable layers. (Note that associated volumes are not removed; those are covered in step 4.) Add -f to bypass the confirmation prompt so cleanup can be scripted:
$ docker container prune -f
Deleted Containers:
c1d3baa5eeb93
09210703edab3
Total reclaimed space: 1.2GB
2. Leverage Docker System Prune
The docker system prune command provides a consolidated cleanup operation encompassing multiple types of unused data:
$ docker system prune -a -f
Deleted Containers:
c1d3baa5eeb93
09210703edab3
Deleted Networks:
my-unused-network-1
Deleted Images:
untagged: ubuntu:15.04
untagged: base-image:latest
Total reclaimed space: 1.9GB
The -a flag in particular widens the sweep from dangling images to every image not referenced by at least one container.
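The prune commands also accept time-based filters, which allow an age-gated policy instead of an all-or-nothing sweep. A sketch – the one-week window here is an arbitrary illustrative choice, not a Docker default:

```shell
#!/bin/sh
# Remove images unused by any container AND older than the retention
# window. The "until" filter spares anything created more recently.
RETENTION="168h"   # one week; illustrative policy, not a default

if command -v docker >/dev/null 2>&1; then
    docker image prune --all --force --filter "until=${RETENTION}"
fi
```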
3. Periodic Manual Image Cleanup
In addition to system prune, keeping a catalog of images organized by age can help guide a more surgical cleanup. List images with their creation dates via docker images:
$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED        SIZE
ubuntu       latest   2d3b407c994b   6 weeks ago    117MB
fedora       34       dd35ab2afd12   12 weeks ago   220MB
custom       base     865558c23d12   24 weeks ago   1.15GB
Then selectively remove oldest or least recently used images manually: docker rmi <image>. While more effort than system prune, this prevents losing versions you may want to keep.
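Dangling (untagged) layers are the safest candidates for scripted removal, since tagged versions stay untouched. A minimal sketch:

```shell
#!/bin/sh
# Remove only dangling (untagged) images. -q emits bare image IDs;
# xargs -r skips running docker rmi entirely when the list is empty.
if command -v docker >/dev/null 2>&1; then
    docker images --filter "dangling=true" -q | xargs -r docker rmi
fi
```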
4. Remove Unused Local Volumes
Check whether old volumes without associated containers can be deleted to reclaim capacity:
$ docker volume ls
DRIVER    VOLUME NAME
local     my_vol
$ docker volume prune
Deleted Volumes:
my_vol
Total reclaimed space: 2.1GB
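To preview what a prune would remove, first list the volumes no container currently references. A sketch; note that on recent Docker versions volume prune removes only anonymous volumes unless --all is given:

```shell
#!/bin/sh
# List volumes that no container references ("dangling"), then prune.
# --force skips the confirmation prompt so this can be scripted.
FILTER="dangling=true"

if command -v docker >/dev/null 2>&1; then
    docker volume ls --filter "$FILTER" -q
    docker volume prune --force
fi
```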
5. Reset the System as a Last Resort
If repeated pruning and removals prove inadequate, resetting the Docker Engine by erasing /var/lib/docker completely wipes the slate clean. This destroys all containers, images and volumes, so it should only be used if absolutely necessary and is not recommended in production scenarios.
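For completeness, a guarded sketch of such a reset on a systemd-managed Linux host. The confirmation variable is our own safety addition, not a Docker convention:

```shell
#!/bin/sh
# Full engine reset: destroys every container, image, volume and cache
# on this host. Guarded behind an explicit environment variable.
reset_docker() {
    systemctl stop docker &&
    rm -rf /var/lib/docker &&
    systemctl start docker
}

if [ "${CONFIRM_RESET:-no}" = "yes" ]; then
    reset_docker
else
    echo "set CONFIRM_RESET=yes to wipe /var/lib/docker" >&2
fi
```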
Proactive Cleaning Best Practices
While the above outlines specific cleanup mechanisms, what are some best practices Docker experts recommend for staying ahead of overlay2 storage growth before it becomes an issue?
Integrate Pruning Into Build Pipelines
Adding prune operations directly into CI/CD pipelines means every image build or major deployment also handles cleanup intrinsically. No need for separate scheduled cleanup jobs.
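A cleanup stage of this kind can be as small as one scripted step appended after the deploy job. A sketch; the 72-hour retention window is an illustrative policy, not a Docker default:

```shell
#!/bin/sh
# Post-deploy cleanup stage for a CI pipeline: drop stopped containers,
# unused networks, old images and stale build cache in one pass.
RETENTION="72h"   # illustrative retention policy

if command -v docker >/dev/null 2>&1; then
    docker system prune --force --filter "until=${RETENTION}"
    docker builder prune --force --filter "until=${RETENTION}"
fi
```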
Disk Usage Monitoring & Quotas
Keep an eye on overlay2 storage consumption via tools like docker system df and configure alerts when hitting thresholds. Consider enforcing quotas on /var/lib/docker if supported by the filesystem.
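A minimal threshold check along these lines can run from cron. A sketch; the 80% threshold is a placeholder, and the "alert" branch stands in for a real monitoring hook:

```shell
#!/bin/sh
# Alert when the filesystem backing /var/lib/docker crosses a usage
# threshold. Wire the "alert" output into real monitoring in practice.
THRESHOLD_PCT=80   # placeholder threshold

# Usage percentage of the filesystem holding the given path.
fs_usage_pct() {
    df -P "$1" 2>/dev/null | awk 'NR==2 { gsub(/%/, ""); print $5 }'
}

# Prints "alert" when usage meets or exceeds the limit, else "ok".
check_usage() {
    if [ "$1" -ge "$2" ]; then echo "alert"; else echo "ok"; fi
}

usage=$(fs_usage_pct /var/lib/docker)
if [ -n "$usage" ]; then
    check_usage "$usage" "$THRESHOLD_PCT"
fi
```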
Prioritize Tagging Images
Tag images appropriately so outdated versions can be easily identified for removal later instead of dangling untagged layers building up.
Limit Base Images
Consolidate on common base images across containers to maximize layer sharing opportunities instead of bloated specialty images.
Volume Management Lifecycle
Manage volume lifecycles closer to how application data is handled; remove volumes associated with old containers proactively.
Conclusion
As Docker's default storage driver, overlay2 employs several optimizations to minimize resource demands. However, production deployments at scale still require careful monitoring and a clear cleanup regimen to prevent uncontrolled growth accumulating over time.
Integrate overlay2 cleanup into deployment pipelines, monitor growth rates actively, use Pareto-style image/container categorization to guide surgical pruning, and reset the Docker daemon only as a last resort when all else fails. Keeping overlay2's footprint contained delivers better performance, stability and easier capacity planning.


