Managing Docker on a single host is a script. Managing it on five, twenty, or fifty hosts the same way is a Sunday you do not get back. Ansible turns the entire flow, from installing Docker on a fresh Rocky 10 box to rolling a new container image across an Ubuntu 24.04 fleet, into one repeatable role plus a handful of playbooks.
This guide walks through the full Ansible + Docker lab we use on production-style fleets: a single OS-dispatching role that installs Docker the right way on Rocky Linux 10 and Ubuntu 24.04, plus seven tested playbooks covering single containers, image build and push, docker compose stacks, rolling updates with healthcheck-gated serial rollout, named volume backups, and clean teardown. Every command was tested in a real lab in April 2026, not invented. The companion repo at c4geeks/ansible ships the role and playbooks ready to clone.
Tested April 2026 on Rocky Linux 10.1 + Ubuntu 24.04 LTS with ansible-core 2.16.14, community.docker 5.2.0, Docker Engine 29.4.1, docker compose plugin 2.40.x
Prerequisites
The lab is opinionated but small. Three Linux hosts, all on the same LAN or a flat private network:
- One control host running Rocky Linux 10 or Ubuntu 24.04. This is where Ansible runs from. 2 vCPU, 2 GB RAM is plenty.
- One Rocky Linux 10 managed host. 2 vCPU, 4 GB RAM, 30 GB disk.
- One Ubuntu 24.04 LTS managed host. 2 vCPU, 4 GB RAM, 30 GB disk.
- Passwordless SSH from control to each managed host (key auth, not password).
- Sudo with NOPASSWD for the managed-host user during testing. Tighten this later with the CIS hardening playbook.
If you only have one box to spare, the inventory-solo.ini shipped in the repo points everything at localhost with ansible_connection=local. Every playbook in this guide works against that inventory unchanged. If Ansible itself is not installed on the control yet, the Ansible installation guide covers the EPEL and APT paths.
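For reference, a single-host inventory of that shape might look like the sketch below. This is illustrative, not a verbatim copy of the shipped inventory-solo.ini, which may differ in detail:

```ini
; inventory-solo.ini (sketch) - everything points at localhost
[control]
localhost ansible_connection=local

[docker_hosts]
localhost ansible_connection=local

[docker_hosts:vars]
ansible_python_interpreter=/usr/bin/python3
```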
Step 1: Set reusable shell variables
Every command from here references shell variables so you change one block at the top of your SSH session and paste the rest as-is. Export these before running anything:
```shell
export PROJECT_DIR="${HOME}/ansible-docker"
export CONTROL_HOST="ansible-control-01"
export ROCKY_HOST="docker-rocky-01"
export UBUNTU_HOST="docker-ubuntu-01"
export GHCR_USER="your-github-username"
export GHCR_PAT="ghp_replace_with_a_real_pat"
```
Confirm the variables stuck before doing anything destructive:
```shell
echo "Project: ${PROJECT_DIR}"
echo "Control: ${CONTROL_HOST}"
echo "Managed: ${ROCKY_HOST}, ${UBUNTU_HOST}"
```
The values hold only for the current shell. If you reconnect or jump into sudo -i, re-run the export block.
Step 2: Bootstrap the project and confirm reachability
Pull the companion repo on the control host. Everything sits under intermediate/ansible-docker/:
```shell
git clone https://github.com/c4geeks/ansible.git "${PROJECT_DIR}-src"
cp -r "${PROJECT_DIR}-src/intermediate/ansible-docker" "${PROJECT_DIR}"
cd "${PROJECT_DIR}"
ansible-galaxy collection install -r collections/requirements.yml
```
Open inventory.ini and replace the example IPs with your real hosts. The shipped inventory expects three lines:
```ini
[control]
ansible-control-01 ansible_host=10.0.1.10

[docker_hosts]
docker-rocky-01 ansible_host=10.0.1.11 ansible_user=rocky os_family=RedHat
docker-ubuntu-01 ansible_host=10.0.1.12 ansible_user=ubuntu os_family=Debian

[docker_hosts:vars]
ansible_python_interpreter=/usr/bin/python3
```
The os_family tag flows into the role’s task dispatcher in step 3. Ping every managed host before touching anything else. This is the first place SSH or Python misconfiguration shows up:
```shell
ansible -i inventory.ini docker_hosts -m ping
```
Both hosts should answer with a SUCCESS and "ping": "pong":

If a host returns UNREACHABLE, the fix is almost always missing SSH keys. ssh-copy-id rocky@${ROCKY_HOST} from the control fixes 90% of cases. The other 10% is sudo: confirm the managed user can run sudo -n true without a prompt.
Step 3: Install Docker the right way on Rocky and Ubuntu
Distro packages alone are not enough on either OS family. Ubuntu 24.04 ships docker.io in the universe repo, but it lags upstream by months and ships no docker compose plugin. Rocky 10 ships podman by default; Docker is not in AppStream at all. The role pulls Docker CE from the upstream repo for both, plus docker-compose-plugin and docker-buildx-plugin so compose v2 and buildx work without extra packages.
The role at roles/docker/ uses an OS dispatcher pattern. tasks/main.yml picks install_redhat.yml or install_debian.yml from the value of ansible_os_family, then runs post_install.yml for the daemon.json template and group setup, then verify.yml for the smoke test. The dispatcher is two lines:
```yaml
- name: Dispatch to OS-family install path
  ansible.builtin.include_tasks: "install_{{ ansible_os_family | lower }}.yml"
```
Run the install playbook against both managed hosts at once. The role is idempotent, so running it twice in a row produces zero changes on the second run.
```shell
ansible-playbook -i inventory.ini playbooks/01-install-docker.yml
```
On a fresh lab the first run takes about three minutes; subsequent runs finish in under twenty seconds. The verify task captures the engine and compose versions on every host:

Two things in the install path are worth flagging because they are real-world traps. First, on Rocky 10 the kernel ships without xt_addrtype, which the Docker daemon needs at startup. The fix is to install the matching kernel-modules-extra package:
```yaml
- name: Install kernel-modules-extra (provides xt_addrtype for Docker on Rocky 10)
  ansible.builtin.dnf:
    name: "kernel-modules-extra-{{ ansible_kernel }}"
    state: present
  notify: Load xt_addrtype
```
Without this, systemctl start docker on Rocky 10 fails with iptables: Extension addrtype revision 0 not supported, missing kernel module. The role bakes the fix in so you never see the error.
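The notify on that task points at a handler. A minimal sketch of what such a handler presumably looks like, assuming the role loads the module with community.general.modprobe (the exact shipped handler may differ):

```yaml
# roles/docker/handlers/main.yml (sketch; handler name must match the notify above)
- name: Load xt_addrtype
  community.general.modprobe:
    name: xt_addrtype
    state: present
```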
Second, after adding the user to the docker group the SSH session still has the old group cache, so a subsequent task that talks to the daemon as a non-root user gets permission denied. The role uses meta: reset_connection right after the group change to drop and re-establish the SSH connection, picking up the new group:
```yaml
- name: Reset SSH connection so the docker group takes effect
  ansible.builtin.meta: reset_connection
```
The daemon.json template at roles/docker/templates/daemon.json.j2 sets log rotation (10 MB per file, 3 files), the overlay2 storage driver, and a 65536 nofile ulimit. Override any of these in group_vars/docker_hosts.yml if your shop wants different defaults.
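A template consistent with those defaults could look like the sketch below. This is illustrative, not necessarily the exact shipped file, and the override variable names are assumptions:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "{{ docker_log_max_size | default('10m') }}",
    "max-file": "{{ docker_log_max_file | default('3') }}"
  },
  "storage-driver": "overlay2",
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65536,
      "Soft": 65536
    }
  }
}
```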
Step 4: Run a single container with docker_container
The community.docker.docker_container module covers about ninety percent of single-host needs. The module exposes more than fifty parameters; you almost never want them all. The keepers for production work are name, image, state, restart_policy, published_ports, volumes, env, and healthcheck.
The shipped playbooks/02-run-container.yml deploys an Uptime Kuma container on the Rocky managed host. It also creates a named volume so the data survives container recreation:
```yaml
- name: Create named volume for Uptime Kuma data
  community.docker.docker_volume:
    name: uptime-kuma-data
    state: present

- name: Run Uptime Kuma container
  community.docker.docker_container:
    name: uptime-kuma
    image: louislam/uptime-kuma:1
    state: started
    pull: missing
    restart_policy: unless-stopped
    published_ports:
      - "3001:3001"
    volumes:
      - "uptime-kuma-data:/app/data"
    healthcheck:
      test: ["CMD", "extra/healthcheck"]
      interval: 60s
      timeout: 30s
      retries: 5
      start_period: 30s
```
Apply it:
```shell
ansible-playbook -i inventory.ini playbooks/02-run-container.yml
```
Two parameters earn extra discussion. The pull parameter has three modes: always (refetch every run, marks the task changed every time even when the digest has not moved), missing (the default, pulls only if the local image is absent), and never (assume the image already exists, fail otherwise). For most fleets missing is the right default; reach for always only when you want the playbook to act as a forced rollout.
The restart_policy: unless-stopped setting matters because always will revive a container after you explicitly docker stop it. unless-stopped respects manual stops, which is what you want during incident response.
Step 5: Build and push images with docker_image
The community.docker.docker_image module covers build, tag, pull, and push in one place. The shipped playbooks/03-build-and-push.yml stages a tiny Python web server, builds it as a tagged image, then optionally authenticates to a registry and pushes.
```yaml
- name: Build image with buildx (single arch)
  community.docker.docker_image:
    name: "{{ sample_app_image }}"
    tag: "{{ sample_app_tag }}"
    source: build
    build:
      path: "{{ build_context }}"
      pull: true
    force_source: false

- name: Authenticate to GHCR
  community.docker.docker_login:
    registry_url: "{{ registry_host }}"
    username: "{{ registry_user }}"
    password: "{{ registry_token }}"
    reauthorize: true
  no_log: true

- name: Push tagged image
  community.docker.docker_image:
    name: "{{ sample_app_image }}:{{ sample_app_tag }}"
    push: true
    source: local
```
Two flags trip people up. force_source: true rebuilds the image every run regardless of whether the build context changed. That is the right default in CI but wrong in playbooks meant to run on a cadence; leave it false in normal operation. no_log: true on the login task prevents the registry token from showing up in any verbose run; never skip it.
If you do not have a private registry yet, run the playbook with an empty token to skip the push and confirm the build half works:
```shell
ansible-playbook -i inventory.ini playbooks/03-build-and-push.yml -e 'registry_token='
```
The when: registry_token | length > 0 guard on the login and push tasks lets you run the same playbook with or without a token without editing it.
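Applied to the login task, the guard is a single extra line. The sketch below restates the task from step 5 with the condition attached; the shipped playbook may phrase it slightly differently:

```yaml
- name: Authenticate to GHCR
  community.docker.docker_login:
    registry_url: "{{ registry_host }}"
    username: "{{ registry_user }}"
    password: "{{ registry_token }}"
    reauthorize: true
  no_log: true
  when: registry_token | length > 0   # skip cleanly when no token is supplied
```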
Step 6: Networks and named volumes
Networks and volumes get their own modules so they are first-class objects in your inventory rather than side effects of docker run. Use community.docker.docker_network for bridges, overlays, or macvlan, and community.docker.docker_volume for named volumes:
```yaml
- name: Create app bridge network
  community.docker.docker_network:
    name: app-net
    driver: bridge
    ipam_config:
      - subnet: 172.30.0.0/24

- name: Create app data volume
  community.docker.docker_volume:
    name: app-data
    driver: local
```
Named volumes back up cleanly via the volume backup playbook in step 10. Bind mounts work but live alongside the host filesystem; if you ever need to migrate the host, named volumes are the cleaner choice.
Step 7: Manage docker compose stacks with docker_compose_v2
For anything beyond a single container, render a compose file from a Jinja2 template and let community.docker.docker_compose_v2 bring it up. The legacy community.docker.docker_compose module wrapped compose v1 and has since been removed from current releases of the collection; always use the v2 variant on current Ansible. The shipped templates/observability/compose.yml.j2 defines an observability stack with Uptime Kuma plus cAdvisor on a shared bridge network:
```yaml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: unless-stopped
    ports:
      - "{{ uptime_kuma_port }}:3001"
    volumes:
      - uptime-kuma-data:/app/data
    networks:
      - observability

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    ports:
      - "{{ cadvisor_port }}:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
    networks:
      - observability

volumes:
  uptime-kuma-data:

networks:
  observability:
    driver: bridge
```
The deploy play renders the template, then brings up the stack on every managed host:
```yaml
- name: Render compose.yml
  ansible.builtin.template:
    src: "{{ playbook_dir }}/../templates/observability/compose.yml.j2"
    dest: "{{ stack_dir }}/compose.yml"
    mode: "0644"

- name: Bring up the stack
  community.docker.docker_compose_v2:
    project_src: "{{ stack_dir }}"
    state: present
    pull: missing
```
Run it across the fleet:
```shell
ansible-playbook -i inventory.ini playbooks/04-compose-stack.yml
```
The stack lands healthy on both Rocky and Ubuntu in around forty seconds:

One trap: if you delete a service from compose.yml.j2 and re-run, the orphan container keeps running because docker_compose_v2 does not remove orphans by default. Add remove_orphans: true to the module call when you intentionally drop a service.
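With the flag added, the bring-up task from the deploy play gains one line (a sketch of the adjusted task, not a new shipped playbook):

```yaml
- name: Bring up the stack and prune services removed from the template
  community.docker.docker_compose_v2:
    project_src: "{{ stack_dir }}"
    state: present
    pull: missing
    remove_orphans: true   # stop and remove containers for services no longer in compose.yml
```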
Step 8: Real deployment, real dashboard
The compose stack from step 7 is a working observability rig. Open Uptime Kuma on the first managed host at port 3001 and you land on the admin setup wizard:

Create the admin, then add three monitors as a smoke test of the stack: an HTTP check on http://cadvisor:8080/healthz (services on the same compose network see each other by service name), an HTTP self-check on http://localhost:3001, and a TCP port check on localhost:2375. The third one will go red on purpose: the Docker daemon does not listen on TCP unless you explicitly enable it, so the failure proves the monitor system actually works:

The same compose pattern deploys Mattermost, Gitea, MediaWiki, or any other self-hosted app you care about. Swap the template, keep the play. The Mattermost setup walkthrough shows the same compose pattern adapted for a single-tenant chat server.
Step 9: Registry authentication with Ansible Vault
Hardcoding a registry token in group_vars/all.yml is fine for a five-minute demo and a fireable offense in production. Ansible Vault encrypts the secret at rest and decrypts it only when the playbook runs. Start by copying the example file the repo ships:
```shell
cp vault/registry.yml.example vault/registry.yml
$EDITOR vault/registry.yml
```
Replace the placeholder values with your real GHCR username and a personal access token scoped to write:packages. Then encrypt the file with a vault password you will remember:
```shell
ansible-vault encrypt vault/registry.yml
```
The file is now safe to commit alongside the playbooks. To use it during a build-and-push run, point Ansible at the encrypted vars file and ask for the vault password:
```shell
ansible-playbook -i inventory.ini playbooks/03-build-and-push.yml \
  -e @vault/registry.yml --ask-vault-pass
```
Two patterns matter here. The no_log: true on the docker_login task prevents the token from leaking into -vvv output. The --ask-vault-pass flag is fine for a single laptop; on shared CI you switch to --vault-password-file with a secret-manager-backed file. The Ansible Vault tutorial goes deeper on rekeying, multiple vault IDs, and CI integration.
If you prefer Docker Hub over GHCR, swap the registry host: change registry_host: ghcr.io to registry_host: docker.io in group_vars/all.yml and put your Hub username and access token in the same vault file. The role does not care which registry it points at.
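The swap amounts to two variables. The sketch below shows both files side by side; the variable names come from the playbook in step 5, and the placeholder values are obviously illustrative:

```yaml
# group_vars/all.yml
registry_host: docker.io

# vault/registry.yml (encrypt this file with ansible-vault)
registry_user: your-hub-username
registry_token: replace_with_a_real_hub_access_token
```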
Step 10: Rolling updates across the fleet with serial
The point of running Ansible against a fleet is to update the fleet without taking everything down at once. The serial: 1 keyword on a play tells Ansible to process hosts one at a time, with the next host starting only after the previous one finishes successfully. Combined with a healthcheck wait, that is a full rolling deployment in twenty lines.
The shipped playbooks/05-rolling-update.yml demonstrates the pattern with a tiny Python app that prints its hostname:
```yaml
- name: Rolling update across docker hosts (one at a time, healthcheck-gated)
  hosts: docker_hosts
  become: true
  serial: 1
  max_fail_percentage: 0
  tasks:
    - name: Pull new image
      community.docker.docker_image:
        name: "{{ new_image }}"
        source: pull
        force_source: true

    - name: Recreate the container
      community.docker.docker_container:
        name: "{{ container_name }}"
        image: "{{ new_image }}"
        state: started
        recreate: true
        published_ports:
          - "{{ app_port }}:{{ app_port }}"
        volumes:
          - "/opt/hello/app.py:/app.py:ro"
        command: ["python", "/app.py"]

    - name: Wait for healthcheck
      ansible.builtin.uri:
        url: "{{ healthcheck_url }}"
        status_code: 200
      register: hc
      retries: 12
      delay: 5
      until: hc.status == 200
```
Run it:
```shell
ansible-playbook -i inventory.ini playbooks/05-rolling-update.yml
```
The recap proves the rollout is sequential, not parallel. Every TASK header runs against one host at a time, with the second host’s PLAY block starting only after the first host’s healthcheck passed:

Three settings make this production-grade. max_fail_percentage: 0 halts the rollout the instant any host fails, so a bad image does not propagate. The uri healthcheck retries twelve times with five-second delays, giving slow-starting apps a full minute to come up. recreate: true on the container forces a stop-remove-recreate even when the image tag has not changed, which is what you want for env or volume changes.
For fleets behind a load balancer, add pre_tasks that drain the host from upstream and post_tasks that re-add it, with the rollout sandwiched in between. The pattern is the same; the LB calls vary by vendor.
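As a hedged sketch of that sandwich, the play skeleton below shows where the drain and re-add hooks sit. The lb-drain and lb-enable commands are placeholders for whatever your load balancer's API or CLI actually requires:

```yaml
- name: Rolling update with load-balancer drain (skeleton)
  hosts: docker_hosts
  become: true
  serial: 1
  max_fail_percentage: 0
  pre_tasks:
    - name: Drain this host from the load balancer   # placeholder: vendor-specific command
      ansible.builtin.command: /usr/local/bin/lb-drain {{ inventory_hostname }}
      delegate_to: localhost
      become: false
  tasks:
    - name: Rollout tasks go here (same as 05-rolling-update.yml)
      ansible.builtin.debug:
        msg: "pull, recreate, healthcheck-wait"
  post_tasks:
    - name: Re-add this host to the load balancer   # placeholder: vendor-specific command
      ansible.builtin.command: /usr/local/bin/lb-enable {{ inventory_hostname }}
      delegate_to: localhost
      become: false
```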
Step 11: Back up named volumes from Ansible
Stateful containers need backups, otherwise none of the Ansible automation matters when a disk fails. The shipped playbooks/06-backup-volumes.yml spins up a throwaway Alpine container with the named volume mounted read-only and the backup directory mounted writable, then tars the volume contents into a stamped archive:
```yaml
- name: Snapshot volume to tarball
  ansible.builtin.command: >-
    docker run --rm
    -v {{ item }}:/data:ro
    -v {{ backup_root }}:/backup
    alpine
    tar czf /backup/{{ item }}-{{ backup_stamp }}.tar.gz -C /data .
  args:
    creates: "{{ backup_root }}/{{ item }}-{{ backup_stamp }}.tar.gz"
  loop: "{{ volumes_to_backup }}"
```
The creates: guard makes the task idempotent: re-running on the same timestamp is a no-op, but a new run produces a fresh archive. To restore, do the inverse: stop the container, recreate the volume, run the same Alpine container with tar xzf against the archive, then start the container again.
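The restore steps above can be sketched as tasks. This is illustrative, not a shipped playbook; variable names such as volume_name and restore_archive are assumptions:

```yaml
- name: Stop the app container before restore
  community.docker.docker_container:
    name: "{{ container_name }}"
    state: stopped

- name: Ensure the named volume exists
  community.docker.docker_volume:
    name: "{{ volume_name }}"
    state: present

- name: Unpack the archive into the volume   # inverse of the backup task
  ansible.builtin.command: >-
    docker run --rm
    -v {{ volume_name }}:/data
    -v {{ backup_root }}:/backup:ro
    alpine
    tar xzf /backup/{{ restore_archive }} -C /data

- name: Start the app container again
  community.docker.docker_container:
    name: "{{ container_name }}"
    state: started
```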
Schedule the backup playbook from cron on the control host or a CI runner. A weekly off-host copy to S3 or another backup target is the natural extension; both fit cleanly under the same playbook.
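One way to wire up the schedule from Ansible itself, assuming a weekly 03:15 Sunday window and the paths used in this guide (the cron entry is illustrative):

```yaml
- name: Schedule weekly volume backups on the control host
  ansible.builtin.cron:
    name: "ansible docker volume backup"
    weekday: "0"
    hour: "3"
    minute: "15"
    job: "cd {{ project_dir | default('~/ansible-docker') }} && ansible-playbook -i inventory.ini playbooks/06-backup-volumes.yml >> /var/log/docker-backup.log 2>&1"
```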
Step 12: Common errors and fixes from real test runs
Every error below was captured live during the lab build for this guide in April 2026. Each of them comes up in real-world searches week after week.
Error: Got permission denied while trying to connect to the Docker daemon socket
The user was added to the docker group but the SSH session has the old group cache. Drop the connection and reconnect, or in a playbook add meta: reset_connection right after the group change. The role’s post_install.yml already does this; if you wrote your own play, mirror the pattern.
Error: iptables Extension addrtype revision 0 not supported, missing kernel module
Rocky Linux 10’s stock kernel ships without the xt_addrtype netfilter module, so the Docker daemon refuses to start because it cannot install its NAT rules. Install the matching kernel-modules-extra package, run modprobe xt_addrtype, then restart Docker. The shipped role does this preemptively for any RedHat-family host.
Error: module docker_compose_v2 not found
The community.docker collection is too old or installed for the wrong Python interpreter. Pin to community.docker >= 4.0.0 in collections/requirements.yml (the shipped file already does this) and reinstall with ansible-galaxy collection install -r collections/requirements.yml --force. If the collection is installed system-wide but Ansible runs in a venv, set collections_path in ansible.cfg to include the right directory.
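The pin itself is a few lines of YAML. A sketch consistent with the description above (the shipped file may pin a different floor version):

```yaml
# collections/requirements.yml
collections:
  - name: community.docker
    version: ">=4.0.0"
```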
Error: docker_image task reports changed=true on every run
You set force_source: true on the build task, which forces a rebuild every time. Switch to force_source: false for run-on-cadence playbooks, or wrap the rebuild in a condition that fires only when source files change. Same applies to pull: always on docker_container: prefer pull: missing outside CI.
Error: Watchtower restarting client version 1.25 is too old
The original containrrr/watchtower image stopped receiving updates and its embedded Docker API client predates the API version the modern daemon negotiates. If you want Watchtower-style auto-updates, pull from the maintained nickfedor/watchtower fork or roll your own scheduled docker_image + docker_container playbook. The role’s compose template ships without Watchtower for this reason.
Module reference card
Bookmark this table. The six community.docker.* modules below cover almost every Ansible+Docker task you will write. The “skip” column lists parameters that exist but rarely need to be set.
| Module | Top parameters | Skip these |
|---|---|---|
| docker_container | name, image, state, restart_policy, published_ports, volumes, env, healthcheck | command_handling, default_host_ip, comparisons (defaults are sane) |
| docker_image | name, tag, source, build, push, force_source | archive_path, load_path (rare) |
| docker_compose_v2 | project_src, state, pull, files, remove_orphans | definition (use file-based, not inline) |
| docker_login | registry_url, username, password, reauthorize | email (deprecated), state (defaults to present) |
| docker_network | name, driver, ipam_config | force, appends (defaults work) |
| docker_volume | name, driver, driver_options | recreate (rare) |
The whole lab in this article runs from one role plus seven playbooks, totaling under 400 lines of YAML. Clone the repo folder, edit the inventory, and you have a tested pattern that scales from a single homelab box to fifty mixed-OS production hosts. Layer Molecule tests on top when you start treating the role as a library, and the same patterns scale further.