{"id":168411,"date":"2026-06-04T13:36:42","date_gmt":"2026-06-04T10:36:42","guid":{"rendered":"https:\/\/computingforgeeks.com\/?p=168411"},"modified":"2026-06-04T13:36:42","modified_gmt":"2026-06-04T10:36:42","slug":"ansible-kubernetes-cluster","status":"publish","type":"post","link":"https:\/\/computingforgeeks.com\/ansible-kubernetes-cluster\/","title":{"rendered":"Ansible with Kubernetes: Deploy and Manage a Cluster"},"content":{"rendered":"<p>Ansible and Kubernetes meet at two points, and they pull in different directions.<\/p>\n\n<p>The first is provisioning: turning a pile of fresh Ubuntu machines into a working cluster. kubeadm assembles the cluster, but something has to disable swap, load kernel modules, install containerd, lay down the package repo, and run kubeadm in the right order on the right hosts. That something is Ansible. The second point is day-to-day management. Once the cluster runs, you create namespaces, push Deployments, install Helm charts, and drain nodes for patching. The <code>kubernetes.core<\/code> collection does all of that declaratively, from the same control node, in the same playbook language.<\/p>\n\n<p>This guide covers both. We provision a kubeadm cluster with a set of Ansible roles, then manage real workloads on it with <code>kubernetes.core<\/code>: a Deployment, a Helm release, a node drain, and a worker added live. 
If Ansible itself is new on your controller, set it up first with the <a href=\"https:\/\/computingforgeeks.com\/install-ansible-rocky-linux-ubuntu\/\">install Ansible guide<\/a>; this article is part of the wider <a href=\"https:\/\/computingforgeeks.com\/ansible-automation-guide\/\">Ansible automation guide<\/a>.<\/p>\n\n<p><em>Tested in June 2026 on Ubuntu 24.04 with Kubernetes 1.36 and the kubernetes.core 6.4 collection.<\/em><\/p>\n\n<h2>How Ansible and Kubernetes fit together<\/h2>\n\n<p>Keep the two jobs separate in your head, because they use different tools.<\/p>\n\n<p><strong>Provisioning<\/strong> runs against the nodes over SSH. Ansible becomes root, installs packages, and shells out to kubeadm. This is ordinary server automation that happens to end in a cluster. <strong>Management<\/strong> runs against the Kubernetes API, not the nodes. The <code>kubernetes.core<\/code> modules talk to the API server with the Python Kubernetes client, so they run on the Ansible controller itself and need a kubeconfig, not SSH. One repo holds both: roles for the first job, playbooks in a <code>manage\/<\/code> directory for the second.<\/p>\n\n<h2>Lab layout<\/h2>\n\n<p>Four machines, all Ubuntu 24.04:<\/p>\n\n<ul>\n<li><strong>Ansible controller<\/strong>, where you run the playbooks. It never joins the cluster.<\/li>\n<li><strong>One control-plane node<\/strong> (the kubeadm &#8220;first&#8221; node).<\/li>\n<li><strong>Two worker nodes<\/strong> to start. We add a third later without touching the first two.<\/li>\n<\/ul>\n\n<p>The controller reaches every node as a sudo-capable user over an SSH key, which is the only prerequisite the roles assume. Give each node 2 vCPUs and at least 2 GB of RAM; the control plane is happier with 4 GB. 
kubeadm&#8217;s preflight checks reject a control-plane node with fewer than two CPUs.<\/p>\n\n<h2>Set up the Ansible controller<\/h2>\n\n<p>The controller needs Ansible, the Python Kubernetes client in the <em>same<\/em> environment Ansible runs from, the <code>kubernetes.core<\/code> collection, and Helm. Install pipx, then layer the pieces on top:<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>sudo apt update\nsudo apt install -y pipx python3-venv\npipx install --include-deps ansible\npipx inject ansible kubernetes\nansible-galaxy collection install kubernetes.core\ncurl -fsSL https:\/\/raw.githubusercontent.com\/helm\/helm\/main\/scripts\/get-helm-3 | bash<\/code><\/pre>\n\n\n<p>The <code>pipx inject<\/code> step is the one people miss. The <code>kubernetes.core<\/code> modules import the <code>kubernetes<\/code> Python library at runtime, and they look for it in Ansible&#8217;s own virtualenv. Installing it with a separate <code>pip<\/code> puts it somewhere Ansible cannot see, and every task fails with &#8220;Failed to import the required Python library (kubernetes)&#8221;. Inject it into the Ansible venv and the problem disappears.<\/p>\n\n<p>Confirm the collection is installed:<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>ansible-galaxy collection list | grep kubernetes<\/code><\/pre>\n\n\n<p>The collection and its version print on a single line:<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>kubernetes.core                          6.4.0<\/code><\/pre>\n\n\n<p>That confirms Ansible can see the collection. The command does not check the Python client; if the client is missing, the import error quoted above appears the first time a <code>kubernetes.core<\/code> task runs.<\/p>\n\n<h2>Build the inventory<\/h2>\n\n<p>Group the nodes into a control plane and workers. 
The roles key off these group names, so the names matter.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>[control_plane]\nk8s-cp1 ansible_host=192.168.1.168\n\n[workers]\nk8s-w1 ansible_host=192.168.1.169\nk8s-w2 ansible_host=192.168.1.170\n\n[k8s_cluster:children]\ncontrol_plane\nworkers<\/code><\/pre>\n\n\n<p>A handful of cluster-wide settings live in <code>group_vars\/all.yml<\/code>. This is also where the one networking decision that bites people gets made.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n# Kubernetes minor version. This is the pkgs.k8s.io repo path; bump it to upgrade.\nk8s_minor: &quot;v1.36&quot; # https:\/\/kubernetes.io\/releases\/\n\n# Pod network CIDR handed to kubeadm and Calico.\n# MUST NOT overlap your node\/LAN subnet (the lab nodes are on 192.168.1.0\/24).\npod_network_cidr: &quot;10.244.0.0\/16&quot;<\/code><\/pre>\n\n\n<p>The pod network CIDR must not overlap the subnet your nodes sit on. Calico&#8217;s own default is <code>192.168.0.0\/16<\/code>, and plenty of home and office LANs live inside that range. If they overlap, pod traffic and node traffic fight over the same addresses and routing breaks in ways that are miserable to debug. The lab nodes here are on <code>192.168.1.0\/24<\/code>, so the pods get <code>10.244.0.0\/16<\/code> instead. 
Pick any private range that your network does not already use.<\/p>\n\n<h2>Prepare every node<\/h2>\n\n<p>The <code>common<\/code> role runs on the whole cluster and does everything kubeadm expects to already be true: swap off, the bridge and overlay modules loaded, the networking sysctls set, containerd installed with the systemd cgroup driver, and the Kubernetes packages held at a fixed version.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n# Prereqs that every node (control plane and workers) needs before kubeadm runs.\n\n- name: Disable swap for the running session\n  ansible.builtin.command: swapoff -a\n  changed_when: false\n\n- name: Disable swap permanently in fstab\n  ansible.posix.mount:\n    path: &quot;{{ item }}&quot;\n    state: absent\n  loop:\n    - swap\n    - none\n  when: ansible_swaptotal_mb | int &gt; 0\n\n- name: Load kernel modules now\n  community.general.modprobe:\n    name: &quot;{{ item }}&quot;\n    state: present\n  loop:\n    - overlay\n    - br_netfilter\n\n- name: Load kernel modules on boot\n  ansible.builtin.copy:\n    dest: \/etc\/modules-load.d\/k8s.conf\n    content: |\n      overlay\n      br_netfilter\n    mode: &quot;0644&quot;\n\n- name: Apply sysctl settings for Kubernetes networking\n  ansible.posix.sysctl:\n    name: &quot;{{ item.key }}&quot;\n    value: &quot;{{ item.value }}&quot;\n    sysctl_file: \/etc\/sysctl.d\/k8s.conf\n    reload: true\n  loop:\n    - { key: net.bridge.bridge-nf-call-iptables, value: &quot;1&quot; }\n    - { key: net.bridge.bridge-nf-call-ip6tables, value: &quot;1&quot; }\n    - { key: net.ipv4.ip_forward, value: &quot;1&quot; }\n\n- name: Install containerd and apt prerequisites\n  ansible.builtin.apt:\n    name:\n      - containerd\n      - apt-transport-https\n      - ca-certificates\n      - curl\n      - gpg\n    state: present\n    update_cache: true\n\n- name: Create containerd config directory\n  ansible.builtin.file:\n    path: \/etc\/containerd\n    state: directory\n    mode: 
&quot;0755&quot;\n\n- name: Generate default containerd config\n  ansible.builtin.shell: containerd config default &gt; \/etc\/containerd\/config.toml\n  args:\n    creates: \/etc\/containerd\/config.toml\n\n- name: Use the systemd cgroup driver in containerd\n  ansible.builtin.lineinfile:\n    path: \/etc\/containerd\/config.toml\n    regexp: &#x27;^(\\s*)SystemdCgroup\\s*=&#x27;\n    line: &#x27;            SystemdCgroup = true&#x27;\n  notify: Restart containerd\n\n- name: Add the Kubernetes apt signing key\n  ansible.builtin.get_url:\n    url: &quot;https:\/\/pkgs.k8s.io\/core:\/stable:\/{{ k8s_minor }}\/deb\/Release.key&quot;\n    dest: \/etc\/apt\/keyrings\/kubernetes-apt-keyring.asc\n    mode: &quot;0644&quot;\n\n- name: Add the Kubernetes apt repository\n  ansible.builtin.apt_repository:\n    repo: &quot;deb [signed-by=\/etc\/apt\/keyrings\/kubernetes-apt-keyring.asc] https:\/\/pkgs.k8s.io\/core:\/stable:\/{{ k8s_minor }}\/deb\/ \/&quot;\n    filename: kubernetes\n    state: present\n\n- name: Install kubelet, kubeadm and kubectl\n  ansible.builtin.apt:\n    name:\n      - kubelet\n      - kubeadm\n      - kubectl\n    state: present\n    update_cache: true\n\n- name: Hold the Kubernetes packages at their current version\n  ansible.builtin.dpkg_selections:\n    name: &quot;{{ item }}&quot;\n    selection: hold\n  loop:\n    - kubelet\n    - kubeadm\n    - kubectl\n\n- name: Enable and start kubelet\n  ansible.builtin.systemd:\n    name: kubelet\n    enabled: true\n    state: started\n\n- name: Flush handlers so containerd restarts before kubeadm runs\n  ansible.builtin.meta: flush_handlers<\/code><\/pre>\n\n\n<p>One edit earns its place above all the others. Containerd ships with <code>SystemdCgroup<\/code> set to false, and on a systemd host it has to be true, or the kubelet and containerd disagree about who owns the cgroups and pods never leave <code>ContainerCreating<\/code>. 
On containerd 2.x the setting lives under the CRI runc options in <code>\/etc\/containerd\/config.toml<\/code>, which is why the role generates the default config first and edits that one line. The package hold near the end (the <code>dpkg_selections<\/code> task, equivalent to <code>apt-mark hold<\/code>) stops an unattended <code>apt upgrade<\/code> from upgrading kubelet, kubeadm, and kubectl behind your back.<\/p>\n\n<h2>Bring up the control plane<\/h2>\n\n<p>The <code>control_plane<\/code> role initialises the cluster, installs a kubeconfig for your login user, lays down the Calico CNI, and stashes a join command for the workers to pick up. It is written to be safe to run twice: the <code>creates:<\/code> guard on <code>kubeadm init<\/code> means a second run never re-initialises a live cluster.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n# Initialise the control plane, lay down kubeconfig, install Calico, and\n# publish a join command the workers will pick up.\n\n- name: Check whether the control plane is already initialised\n  ansible.builtin.stat:\n    path: \/etc\/kubernetes\/admin.conf\n  register: kubeadm_admin\n\n- name: Pull control-plane images ahead of init\n  ansible.builtin.command: kubeadm config images pull\n  when: not kubeadm_admin.stat.exists\n  changed_when: true\n\n- name: Initialise the cluster with kubeadm\n  ansible.builtin.command: &gt;\n    kubeadm init\n    --pod-network-cidr={{ pod_network_cidr }}\n    --apiserver-advertise-address={{ ansible_host }}\n    --node-name={{ inventory_hostname }}\n  args:\n    creates: \/etc\/kubernetes\/admin.conf\n  register: kubeadm_init\n\n- name: Create .kube directory for the login user\n  ansible.builtin.file:\n    path: &quot;\/home\/{{ ansible_user }}\/.kube&quot;\n    state: directory\n    owner: &quot;{{ ansible_user }}&quot;\n    group: &quot;{{ ansible_user }}&quot;\n    mode: &quot;0750&quot;\n\n- name: Install kubeconfig for the login user\n  ansible.builtin.copy:\n    src: \/etc\/kubernetes\/admin.conf\n    dest: &quot;\/home\/{{ ansible_user 
}}\/.kube\/config&quot;\n    remote_src: true\n    owner: &quot;{{ ansible_user }}&quot;\n    group: &quot;{{ ansible_user }}&quot;\n    mode: &quot;0600&quot;\n\n- name: Detect the latest Calico release\n  ansible.builtin.uri:\n    url: https:\/\/api.github.com\/repos\/projectcalico\/calico\/releases\/latest\n    return_content: true\n  register: calico_release\n\n- name: Set the Calico version fact\n  ansible.builtin.set_fact:\n    calico_version: &quot;{{ calico_release.json.tag_name }}&quot;\n\n- name: Install the Calico operator CRDs\n  ansible.builtin.command: &gt;\n    kubectl --kubeconfig \/etc\/kubernetes\/admin.conf apply --server-side --force-conflicts\n    -f https:\/\/raw.githubusercontent.com\/projectcalico\/calico\/{{ calico_version }}\/manifests\/operator-crds.yaml\n  register: crds_apply\n  changed_when: &quot;&#x27;created&#x27; in crds_apply.stdout or &#x27;configured&#x27; in crds_apply.stdout&quot;\n\n- name: Install the Tigera (Calico) operator\n  ansible.builtin.command: &gt;\n    kubectl --kubeconfig \/etc\/kubernetes\/admin.conf apply --server-side --force-conflicts\n    -f https:\/\/raw.githubusercontent.com\/projectcalico\/calico\/{{ calico_version }}\/manifests\/tigera-operator.yaml\n  register: tigera_apply\n  changed_when: &quot;&#x27;created&#x27; in tigera_apply.stdout or &#x27;configured&#x27; in tigera_apply.stdout&quot;\n\n- name: Wait for the Installation CRD to register\n  ansible.builtin.command: &gt;\n    kubectl --kubeconfig \/etc\/kubernetes\/admin.conf wait --for condition=established --timeout=90s\n    crd\/installations.operator.tigera.io\n  changed_when: false\n\n- name: Render the Calico Installation manifest\n  ansible.builtin.template:\n    src: calico-custom-resources.yaml.j2\n    dest: \/root\/calico-custom-resources.yaml\n    mode: &quot;0644&quot;\n\n- name: Apply the Calico Installation\n  ansible.builtin.command: &gt;\n    kubectl --kubeconfig \/etc\/kubernetes\/admin.conf apply\n    -f 
\/root\/calico-custom-resources.yaml\n  register: calico_install\n  changed_when: &quot;&#x27;created&#x27; in calico_install.stdout or &#x27;configured&#x27; in calico_install.stdout&quot;\n\n- name: Generate a worker join command\n  ansible.builtin.command: kubeadm token create --print-join-command\n  register: join_cmd\n  changed_when: false\n\n- name: Stash the join command for the worker play\n  ansible.builtin.set_fact:\n    kubeadm_join_command: &quot;{{ join_cmd.stdout }}&quot;\n\n- name: Fetch the admin kubeconfig to the Ansible controller\n  ansible.builtin.fetch:\n    src: \/etc\/kubernetes\/admin.conf\n    dest: &quot;{{ playbook_dir }}\/admin.conf&quot;\n    flat: true<\/code><\/pre>\n\n\n<p>Calico ships as two pieces now. The operator CRDs go on first, then the operator itself, and only then does the <code>Installation<\/code> resource make sense to the API. Apply them out of order and you get <code>error: ... installations.operator.tigera.io not found<\/code>, which is exactly the trap the explicit CRD step and the <code>kubectl wait<\/code> avoid. The <code>Installation<\/code> manifest is a short template so the pod CIDR stays in one place:<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>apiVersion: operator.tigera.io\/v1\nkind: Installation\nmetadata:\n  name: default\nspec:\n  calicoNetwork:\n    ipPools:\n      - name: default-ipv4-ippool\n        blockSize: 26\n        cidr: {{ pod_network_cidr }}\n        encapsulation: VXLANCrossSubnet\n        natOutgoing: Enabled\n        nodeSelector: all()\n---\napiVersion: operator.tigera.io\/v1\nkind: APIServer\nmetadata:\n  name: default\nspec: {}<\/code><\/pre>\n\n\n<p>The operator reconciles that <code>Installation<\/code> into a running Calico deployment a few seconds after the API server comes up, and the pod CIDR matches the one kubeadm was handed.<\/p>\n\n<h2>Join the workers<\/h2>\n\n<p>The worker role is short. 
It checks whether the node already belongs to a cluster, and if not, runs the join command the control-plane play stashed in a host fact. The <code>stat<\/code> guard is what makes re-runs cheap: a node that already joined is skipped, not rejoined.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n- name: Check whether this node already joined the cluster\n  ansible.builtin.stat:\n    path: \/etc\/kubernetes\/kubelet.conf\n  register: kubelet_conf\n\n- name: Join the node to the cluster\n  ansible.builtin.command: &quot;{{ hostvars[groups[&#x27;control_plane&#x27;][0]][&#x27;kubeadm_join_command&#x27;] }}&quot;\n  when: not kubelet_conf.stat.exists\n  changed_when: true<\/code><\/pre>\n\n\n<p>That is the whole worker role. The join runs at most once per node, which is what makes growing the cluster later a no-op for the nodes already in it.<\/p>\n\n<h2>Run the bootstrap<\/h2>\n\n<p>One playbook ties the three roles together in order: prepare every node, build the control plane, join the workers, then wait for the whole cluster to report Ready.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n- name: Prepare every node for Kubernetes\n  hosts: k8s_cluster\n  become: true\n  roles:\n    - common\n\n- name: Bring up the control plane\n  hosts: control_plane\n  become: true\n  roles:\n    - control_plane\n\n- name: Join the worker nodes\n  hosts: workers\n  become: true\n  roles:\n    - worker\n\n- name: Wait for all nodes to report Ready\n  hosts: control_plane\n  become: true\n  tasks:\n    - name: Wait for nodes to be Ready\n      ansible.builtin.command: &gt;\n        kubectl --kubeconfig \/etc\/kubernetes\/admin.conf\n        wait --for=condition=Ready nodes --all --timeout=180s\n      register: nodes_ready\n      changed_when: false\n\n    - name: Show the cluster\n      ansible.builtin.command: kubectl --kubeconfig \/etc\/kubernetes\/admin.conf get nodes -o wide\n      register: get_nodes\n      changed_when: false\n\n    - name: Cluster nodes\n      
ansible.builtin.debug:\n        var: get_nodes.stdout_lines<\/code><\/pre>\n\n\n<p>Run it from the controller:<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>ansible-playbook bootstrap.yml<\/code><\/pre>\n\n\n<p>The first run pulls the control-plane images and takes a few minutes; later runs are quick. When it finishes, every node is Ready and running the same Kubernetes version.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1960\" height=\"884\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-nodes-ready.png\" alt=\"Ansible playbook output showing three Kubernetes 1.36 nodes in Ready state\" class=\"wp-image-168405\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-nodes-ready.png 1960w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-nodes-ready-300x135.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-nodes-ready-1024x462.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-nodes-ready-768x346.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-nodes-ready-1536x693.png 1536w\" sizes=\"auto, (max-width: 1960px) 100vw, 1960px\" \/><\/figure>\n\n\n<p>That is a working three-node cluster, built from blank Ubuntu installs and driven entirely from the controller, with no manual SSH into any node. If you would rather understand the kubeadm steps by hand before automating them, the <a href=\"https:\/\/computingforgeeks.com\/install-kubernetes-kubeadm-rocky-almalinux\/\">kubeadm install walkthrough<\/a> covers the same flow one command at a time.<\/p>\n\n<h2>Manage workloads with kubernetes.core<\/h2>\n\n<p>From here the job changes. 
The <code>kubernetes.core.k8s<\/code> module sends manifests to the API server and reconciles them, the same way <code>kubectl apply<\/code> does, except it lives in a playbook you can template, loop, and gate on conditions. The playbook below creates a namespace, a ConfigMap, a Secret, a three-replica Deployment that consumes both, and a NodePort Service, then waits until the Deployment reports Available.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n# Manage workloads on the cluster with the kubernetes.core collection.\n# Runs on the Ansible controller and talks to the API server over the kubeconfig.\n- name: Deploy a demo web app with Ansible\n  hosts: localhost\n  connection: local\n  gather_facts: false\n  vars:\n    kubeconfig: &quot;{{ lookup(&#x27;env&#x27;, &#x27;HOME&#x27;) }}\/ansible-k8s\/admin.conf&quot;\n    app_namespace: demo\n  tasks:\n    - name: Create the namespace\n      kubernetes.core.k8s:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        api_version: v1\n        kind: Namespace\n        name: &quot;{{ app_namespace }}&quot;\n        state: present\n\n    - name: Publish the landing page as a ConfigMap\n      kubernetes.core.k8s:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        state: present\n        definition:\n          apiVersion: v1\n          kind: ConfigMap\n          metadata:\n            name: web-content\n            namespace: &quot;{{ app_namespace }}&quot;\n          data:\n            index.html: |\n              &lt;h1&gt;Deployed by Ansible&lt;\/h1&gt;\n              &lt;p&gt;nginx on Kubernetes, managed end to end with kubernetes.core.&lt;\/p&gt;\n\n    - name: Store an app secret\n      kubernetes.core.k8s:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        state: present\n        definition:\n          apiVersion: v1\n          kind: Secret\n          metadata:\n            name: web-secret\n            namespace: &quot;{{ app_namespace }}&quot;\n          type: Opaque\n          
stringData:\n            api-key: rotate-me-in-vault\n\n    - name: Deploy the web application\n      kubernetes.core.k8s:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        state: present\n        definition:\n          apiVersion: apps\/v1\n          kind: Deployment\n          metadata:\n            name: web\n            namespace: &quot;{{ app_namespace }}&quot;\n            labels:\n              app: web\n          spec:\n            replicas: 3\n            selector:\n              matchLabels:\n                app: web\n            template:\n              metadata:\n                labels:\n                  app: web\n              spec:\n                containers:\n                  - name: nginx\n                    image: nginx:1.27\n                    ports:\n                      - containerPort: 80\n                    volumeMounts:\n                      - name: content\n                        mountPath: \/usr\/share\/nginx\/html\n                    env:\n                      - name: API_KEY\n                        valueFrom:\n                          secretKeyRef:\n                            name: web-secret\n                            key: api-key\n                volumes:\n                  - name: content\n                    configMap:\n                      name: web-content\n\n    - name: Expose the app on a NodePort\n      kubernetes.core.k8s:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        state: present\n        definition:\n          apiVersion: v1\n          kind: Service\n          metadata:\n            name: web\n            namespace: &quot;{{ app_namespace }}&quot;\n          spec:\n            type: NodePort\n            selector:\n              app: web\n            ports:\n              - port: 80\n                targetPort: 80\n                nodePort: 30080\n\n    - name: Wait until the deployment is Available\n      kubernetes.core.k8s_info:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n     
   api_version: apps\/v1\n        kind: Deployment\n        name: web\n        namespace: &quot;{{ app_namespace }}&quot;\n        wait: true\n        wait_condition:\n          type: Available\n          status: &quot;True&quot;\n        wait_timeout: 150\n\n    - name: List the running pods\n      kubernetes.core.k8s_info:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        kind: Pod\n        namespace: &quot;{{ app_namespace }}&quot;\n        label_selectors:\n          - app=web\n      register: web_pods\n\n    - name: Show pod names and the nodes they landed on\n      ansible.builtin.debug:\n        msg: &quot;{{ web_pods.resources | map(attribute=&#x27;metadata.name&#x27;) | zip(web_pods.resources | map(attribute=&#x27;spec.nodeName&#x27;)) | list }}&quot;<\/code><\/pre>\n\n\n<p>Notice it runs against <code>localhost<\/code> with <code>connection: local<\/code>. These tasks never SSH anywhere; they reach the API over the kubeconfig that <code>bootstrap.yml<\/code> fetched to the controller. 
The <code>kubernetes.core.k8s_info<\/code> calls at the end give you read access in the same language, with a <code>wait_condition<\/code> that blocks until the Deployment reports Available instead of guessing with a sleep.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>ansible-playbook manage\/01-deploy-app.yml<\/code><\/pre>\n\n\n<p>The pods land across the workers, and the ConfigMap content is served on NodePort 30080 of every node.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1960\" height=\"1350\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-deploy-app.png\" alt=\"Ansible deploying an nginx Deployment, Service and pods on Kubernetes with kubernetes.core\" class=\"wp-image-168406\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-deploy-app.png 1960w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-deploy-app-300x207.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-deploy-app-1024x705.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-deploy-app-768x529.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-deploy-app-1536x1058.png 1536w\" sizes=\"auto, (max-width: 1960px) 100vw, 1960px\" \/><\/figure>\n\n\n<p>Because the module reconciles state, re-running the playbook after an edit changes only what differs. Bump <code>replicas<\/code> to 5, run it again, and the three existing pods stay while Kubernetes adds two. 
That is the declarative model the <a href=\"https:\/\/docs.ansible.com\/ansible\/latest\/collections\/kubernetes\/core\/k8s_module.html\" target=\"_blank\" rel=\"noreferrer noopener\">kubernetes.core documentation<\/a> builds on, and it is why this beats a pile of shell calls to <code>kubectl<\/code>.<\/p>\n\n<h2>Install a Helm chart with Ansible<\/h2>\n\n<p>Most real clusters run Helm charts, and Ansible drives Helm without dropping to the shell. The <code>kubernetes.core.helm<\/code> and <code>helm_repository<\/code> modules add a repo and install or upgrade a release. metrics-server is a good first one: it is what <code>kubectl top<\/code> needs, and a kubeadm cluster does not ship it.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n# Install a Helm chart with Ansible. metrics-server powers `kubectl top`.\n- name: Install metrics-server with Helm\n  hosts: localhost\n  connection: local\n  gather_facts: false\n  vars:\n    kubeconfig: &quot;{{ lookup(&#x27;env&#x27;, &#x27;HOME&#x27;) }}\/ansible-k8s\/admin.conf&quot;\n  tasks:\n    - name: Add the metrics-server Helm repository\n      kubernetes.core.helm_repository:\n        name: metrics-server\n        repo_url: https:\/\/kubernetes-sigs.github.io\/metrics-server\/\n\n    - name: Install or upgrade the metrics-server release\n      kubernetes.core.helm:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        name: metrics-server\n        chart_ref: metrics-server\/metrics-server\n        release_namespace: kube-system\n        state: present\n        update_repo_cache: true\n        # kubeadm issues self-signed kubelet certs, so skip TLS verification to it.\n        values:\n          args:\n            - --kubelet-insecure-tls\n\n    - name: Wait for the metrics-server rollout\n      kubernetes.core.k8s_info:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        api_version: apps\/v1\n        kind: Deployment\n        name: metrics-server\n        namespace: kube-system\n        wait: true\n  
      wait_condition:\n          type: Available\n          status: &quot;True&quot;\n        wait_timeout: 150<\/code><\/pre>\n\n\n<p>The <code>--kubelet-insecure-tls<\/code> argument is the gotcha. kubeadm gives each kubelet a self-signed serving certificate, and metrics-server refuses to scrape it unless you tell it to skip that verification. Without the flag the pod runs but <code>kubectl top<\/code> answers &#8220;Metrics API not available&#8221; forever. Install it, give it a scrape cycle, and node metrics appear.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>ansible-playbook manage\/02-helm-metrics-server.yml\nkubectl top nodes<\/code><\/pre>\n\n\n<p>Node CPU and memory now report through the metrics API:<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1960\" height=\"1114\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-helm-metrics-server.png\" alt=\"Ansible Helm playbook installing metrics-server with kubectl top nodes output\" class=\"wp-image-168407\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-helm-metrics-server.png 1960w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-helm-metrics-server-300x171.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-helm-metrics-server-1024x582.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-helm-metrics-server-768x437.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-helm-metrics-server-1536x873.png 1536w\" sizes=\"auto, (max-width: 1960px) 100vw, 1960px\" \/><\/figure>\n\n\n<p>That is the case for running Helm through Ansible rather than by hand: the release is declared in a playbook you can re-run, template, and keep in version control next to everything else.<\/p>\n\n<h2>Drain a 
node and roll a deployment<\/h2>\n\n<p>Patching a node means moving its pods elsewhere first. <code>kubernetes.core.k8s_drain<\/code> cordons and drains in one task, and the matching <code>uncordon<\/code> brings the node back. The playbook drains a worker, confirms it is unschedulable, returns it to service, then triggers a rolling restart of the web Deployment by stamping a fresh annotation on the pod template.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>---\n# Day-2 operations: drain a node for maintenance, bring it back, roll a deployment.\n- name: Node maintenance and a rolling restart with Ansible\n  hosts: localhost\n  connection: local\n  gather_facts: false\n  vars:\n    kubeconfig: &quot;{{ lookup(&#x27;env&#x27;, &#x27;HOME&#x27;) }}\/ansible-k8s\/admin.conf&quot;\n    target_node: k8s-w2\n    app_namespace: demo\n  tasks:\n    - name: Cordon and drain the node\n      kubernetes.core.k8s_drain:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        name: &quot;{{ target_node }}&quot;\n        state: drain\n        delete_options:\n          ignore_daemonsets: true\n          delete_emptydir_data: true\n          terminate_grace_period: 30\n          wait_timeout: 120\n\n    - name: Confirm the node is unschedulable\n      kubernetes.core.k8s_info:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        kind: Node\n        name: &quot;{{ target_node }}&quot;\n      register: drained_node\n\n    - name: Node scheduling state\n      ansible.builtin.debug:\n        msg: &quot;{{ target_node }} unschedulable = {{ drained_node.resources[0].spec.unschedulable | default(false) }}&quot;\n\n    # Real maintenance (kernel patch, reboot) would happen here.\n\n    - name: Bring the node back into the scheduler\n      kubernetes.core.k8s_drain:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        name: &quot;{{ target_node }}&quot;\n        state: uncordon\n\n    - name: Trigger a rolling restart of the web deployment\n      kubernetes.core.k8s:\n   
     kubeconfig: &quot;{{ kubeconfig }}&quot;\n        state: patched\n        kind: Deployment\n        name: web\n        namespace: &quot;{{ app_namespace }}&quot;\n        definition:\n          spec:\n            template:\n              metadata:\n                annotations:\n                  ansible.computingforgeeks.com\/restartedAt: &quot;{{ now(utc=true).isoformat() }}&quot;\n\n    - name: Wait for the rollout to finish\n      kubernetes.core.k8s_info:\n        kubeconfig: &quot;{{ kubeconfig }}&quot;\n        api_version: apps\/v1\n        kind: Deployment\n        name: web\n        namespace: &quot;{{ app_namespace }}&quot;\n        wait: true\n        wait_condition:\n          type: Available\n          status: &quot;True&quot;\n        wait_timeout: 150<\/code><\/pre>\n\n\n<p>Wrap the drain and uncordon around a real maintenance task and you have a repeatable patch window: drain, reboot the node with the <code>reboot<\/code> module, wait for it, uncordon. The rolling-restart trick at the end is the same one <code>kubectl rollout restart<\/code> uses under the surface, expressed as a patch.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>ansible-playbook manage\/03-day2-operations.yml<\/code><\/pre>\n\n\n<p>The node leaves and rejoins the scheduler, and the Deployment rolls one pod at a time:<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1960\" height=\"1018\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-day2-drain-node.png\" alt=\"Ansible day-2 playbook draining and uncordoning a Kubernetes worker node\" class=\"wp-image-168408\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-day2-drain-node.png 1960w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-day2-drain-node-300x156.png 300w, 
https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-day2-drain-node-1024x532.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-day2-drain-node-768x399.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-day2-drain-node-1536x798.png 1536w\" sizes=\"auto, (max-width: 1960px) 100vw, 1960px\" \/><\/figure>\n\n\n<p>The <code>k8s_drain<\/code> module handles the eviction along with the daemonset and emptydir edge cases that a hand-rolled wrapper around <code>kubectl drain<\/code> usually forgets.<\/p>\n\n<h2>Add a worker node<\/h2>\n\n<p>This is where the idempotent roles pay off. To grow the cluster, add the new node under <code>[workers]<\/code> in the inventory and run the same bootstrap playbook. Nothing else changes.<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>[workers]\nk8s-w1 ansible_host=192.168.1.169\nk8s-w2 ansible_host=192.168.1.170\nk8s-w3 ansible_host=192.168.1.157<\/code><\/pre>\n\n\n<p>Then re-run the bootstrap, unchanged:<\/p>\n\n\n<pre class=\"wp-block-code code\"><code>ansible-playbook bootstrap.yml<\/code><\/pre>\n\n\n<p>The existing nodes report <code>changed=0<\/code> because their state already matches. 
Only the new node runs the prep tasks and the join, and it is Ready in under a minute.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1960\" height=\"976\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-add-worker-node.png\" alt=\"Ansible re-run joining a new worker with kubectl get nodes showing four Ready nodes\" class=\"wp-image-168409\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-add-worker-node.png 1960w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-add-worker-node-300x149.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-add-worker-node-1024x510.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-add-worker-node-768x382.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-ansible-kubernetes-add-worker-node-1536x765.png 1536w\" sizes=\"auto, (max-width: 1960px) 100vw, 1960px\" \/><\/figure>\n\n\n<p>The same loop scales the other way for cloud fleets: instead of editing the inventory by hand, pull the node list from your provider with <a href=\"https:\/\/computingforgeeks.com\/ansible-dynamic-inventory-tutorial\/\">dynamic inventory<\/a> and let the count drive itself.<\/p>\n\n<h2>Troubleshooting<\/h2>\n\n<h3>Failed to import the required Python library (kubernetes)<\/h3>\n\n<p>The <code>kubernetes.core<\/code> modules cannot find the Python client. It is almost always installed into the wrong environment. If you installed Ansible with pipx, run <code>pipx inject ansible kubernetes<\/code> so the client lands in Ansible&#8217;s venv. 
With a system Ansible, install <code>python3-kubernetes<\/code> from apt instead.<\/p>\n\n<h3>error: &#8230; installations.operator.tigera.io not found<\/h3>\n\n<p>Calico&#8217;s <code>Installation<\/code> resource was applied before its CRD existed. Recent Calico splits the CRDs into <code>operator-crds.yaml<\/code>, which has to go on before <code>tigera-operator.yaml<\/code>. The role applies them in that order and then runs <code>kubectl wait --for condition=established<\/code> on the CRD, so the race cannot happen.<\/p>\n\n<h3>Pods stuck in ContainerCreating, nodes never Ready<\/h3>\n\n<p>Two usual causes. Either containerd is still on the cgroupfs driver (check that <code>SystemdCgroup = true<\/code> is set in <code>\/etc\/containerd\/config.toml<\/code> and containerd was restarted), or the pod CIDR overlaps your LAN and the CNI cannot route. Confirm the value in <code>group_vars\/all.yml<\/code> is a range your network does not use.<\/p>\n\n<h3>kubectl top says &#8220;Metrics API not available&#8221;<\/h3>\n\n<p>metrics-server is running but cannot scrape the kubelets. On a kubeadm cluster it needs <code>--kubelet-insecure-tls<\/code>, set through Helm values as shown above. Give it thirty seconds after the rollout for the first scrape before deciding it is broken.<\/p>\n\n<h2>Take it to production<\/h2>\n\n<p>The cluster here has one control-plane node, which is fine for a lab and wrong for anything you depend on. The same roles extend in a few clear steps. Run three control-plane nodes behind a load balancer and pass <code>--control-plane-endpoint<\/code> to <code>kubeadm init<\/code> so the API has a stable address to fail over to. Keep the Kubernetes minor pinned in <code>group_vars<\/code> and bump it deliberately, one minor at a time, rather than letting apt decide. 
Most of all, stop storing Secrets as plain text in a playbook: the <code>stringData<\/code> field in the deploy example is readable to anyone with the repo, so move those values behind <a href=\"https:\/\/computingforgeeks.com\/ansible-vault-tutorial\/\">Ansible Vault<\/a> and reference them as variables. The full set of roles and playbooks, ready to clone, lives in the <a href=\"https:\/\/github.com\/c4geeks\/ansible\/tree\/main\/integrations\/ansible-kubernetes\" target=\"_blank\" rel=\"noreferrer noopener\">companion repository<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Ansible and Kubernetes meet at two points, and they pull in different directions. The first is provisioning: turning a pile of fresh Ubuntu machines into a working cluster. kubeadm assembles the cluster, but something has to disable swap, load kernel modules, install containerd, lay down the package repo, and run kubeadm in the right order &#8230; <a title=\"Ansible with Kubernetes: Deploy and Manage a Cluster\" class=\"read-more\" href=\"https:\/\/computingforgeeks.com\/ansible-kubernetes-cluster\/\" aria-label=\"Read more about Ansible with Kubernetes: Deploy and Manage a Cluster\">Read 
more<\/a><\/p>\n","protected":false},"author":3,"featured_media":168540,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[606,329,316,317],"tags":[314,212,218,318],"cfg_series":[39825],"class_list":["post-168411","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ansible","category-automation","category-containers","category-kubernetes","tag-ansible","tag-automation","tag-containers","tag-kubernetes","cfg_series-ansible-mastery"],"_links":{"self":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168411","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/comments?post=168411"}],"version-history":[{"count":2,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168411\/revisions"}],"predecessor-version":[{"id":168413,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168411\/revisions\/168413"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media\/168540"}],"wp:attachment":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media?parent=168411"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/categories?post=168411"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/tags?post=168411"},{"taxonomy":"cfg_series","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/cfg_series?post=168411"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}