{"id":168641,"date":"2026-06-10T09:20:51","date_gmt":"2026-06-10T06:20:51","guid":{"rendered":"https:\/\/computingforgeeks.com\/?p=168641"},"modified":"2026-06-10T09:20:51","modified_gmt":"2026-06-10T06:20:51","slug":"postgresql-high-availability-patroni-etcd-haproxy","status":"publish","type":"post","link":"https:\/\/computingforgeeks.com\/postgresql-high-availability-patroni-etcd-haproxy\/","title":{"rendered":"Set Up PostgreSQL High Availability with Patroni and HAProxy"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Before you reach for a high availability tool, the real problem to solve is the connection string. A single PostgreSQL server is easy to point an application at, but the moment that server dies, every client that cached its address is stuck. Replication alone does not fix this. You can have a perfectly healthy standby and still be down, because nothing promoted it and nothing told the application where to go. PostgreSQL high availability is two jobs: decide who the primary is, and give clients one address that always points at it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The combination that solves both, and the one most teams settle on, is Patroni for automated failover, etcd as the consensus store that keeps everyone honest, and HAProxy as the single front door. We built and broke this cluster in June 2026 on PostgreSQL 18 with Patroni 4.1.3, on Rocky Linux 10 (RHEL family) and Debian 13, with the package paths verified on Ubuntu 24.04 and 26.04 too, so the commands, the failover timing, and the per-distro differences below are all real. 
This guide walks the whole stack: a three-node Patroni cluster, an etcd quorum behind it, HAProxy splitting reads from writes, and a floating virtual IP so the proxy itself is not a single point of failure.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The PostgreSQL high availability architecture<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Each component earns its place by removing a specific failure mode. It helps to see the whole shape before touching a terminal.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>          application\n              |\n        VIP 10.0.1.10            (Keepalived: floats between ha1\/ha2)\n              |\n   +----------+----------+\n   |  ha1          ha2    |       HAProxy  (VRRP MASTER \/ BACKUP)\n   |  :5000 writes -> \/primary  (200 only on the leader)\n   |  :5001 reads  -> \/replica  (200 only on standbys)\n   +----------+----------+\n              | health check on :8008\n   +----------+-----------+-----------+\n   |  pg1        pg2          pg3      |\n   |  Patroni    Patroni      Patroni  |\n   |  etcd       etcd         etcd     |   3-node DCS quorum\n   |  PostgreSQL PostgreSQL   PostgreSQL|\n   +-----------------------------------+\n        primary  <-- streaming --> standbys<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The roles break down like this:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Patroni<\/strong> runs on every database node. It manages PostgreSQL, watches its health, and races for a leader lock. Whoever holds the lock is the primary. If the lock expires, Patroni promotes a standby and demotes the old primary the moment it comes back.<\/li>\n<li><strong>etcd<\/strong> is the distributed configuration store that holds the leader lock. It uses the Raft consensus algorithm, so only one node can hold the lock at a time. This is what makes split-brain impossible at the cluster layer. 
Run an odd number of members, three at minimum, so a majority (quorum) survives the loss of one node.<\/li>\n<li><strong>HAProxy<\/strong> is the single address clients connect to. It does not guess who the primary is. It asks Patroni&#8217;s REST API on each node and routes accordingly, so when the primary changes, the backend changes but the client&#8217;s endpoint never does.<\/li>\n<li><strong>Keepalived<\/strong> gives HAProxy the same treatment HAProxy gives PostgreSQL. A floating virtual IP moves between two HAProxy nodes using VRRP, so a dead proxy is not a dead cluster.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The trade-off worth naming up front: this is five machines to run one logical database. If you only need a warm standby for disaster recovery and can tolerate a manual promotion, plain <a href=\"https:\/\/computingforgeeks.com\/postgresql-replication-rocky-almalinux\/\">streaming replication<\/a> is simpler and cheaper. Reach for the full stack when an unplanned primary loss has to heal itself in under a minute with no human in the loop.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prerequisites<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Five nodes give the cleanest separation: three for the database and etcd, two for the proxy layer. You can collapse the proxy onto the database nodes in a lab, but keeping them apart is what you want in production.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Three database nodes (<code>pg1<\/code>, <code>pg2<\/code>, <code>pg3<\/code>) and two proxy nodes (<code>ha1<\/code>, <code>ha2<\/code>), each a recent Linux server with at least 2 vCPU and 2 GB RAM.<\/li>\n<li>Static IPs on the same subnet, plus one free address for the virtual IP. 
This guide uses <code>10.0.1.11-13<\/code> for the database nodes, <code>10.0.1.21-22<\/code> for the proxies, and <code>10.0.1.10<\/code> for the VIP.<\/li>\n<li>Tested on: PostgreSQL 18, Patroni 4.1.3, etcd 3.4-3.6, HAProxy 3.0, Keepalived 2.2, on Rocky Linux 10 \/ AlmaLinux 10, Debian 13, and Ubuntu 24.04 \/ 26.04.<\/li>\n<li>Time synchronisation (chrony) running on every node. Clock skew breaks lease timing.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">If PostgreSQL is new to you, the single-server install guides for <a href=\"https:\/\/computingforgeeks.com\/install-postgresql-rocky-almalinux\/\">Rocky and AlmaLinux<\/a> and for <a href=\"https:\/\/computingforgeeks.com\/install-postgresql-ubuntu-debian\/\">Ubuntu and Debian<\/a> cover the basics this guide assumes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Set the cluster variables<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The same addresses and passwords appear in dozens of commands. Export them once at the top of each shell session so you edit one block and paste the rest unchanged. Set these on whichever node you are working on, replacing the example values with your own.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>export PG1=10.0.1.11\nexport PG2=10.0.1.12\nexport PG3=10.0.1.13\nexport VIP=10.0.1.10\nexport SUBNET=10.0.1.0\/24\n# Pick real passwords. These three accounts drive replication and failover.\nexport SUPERUSER_PWD='ChangeMe-Super#2026'\nexport REPL_PWD='ChangeMe-Repl#2026'\nexport REWIND_PWD='ChangeMe-Rewind#2026'<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">These hold for the current session only. Re-export them if you reconnect or switch to a root shell.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Install PostgreSQL, Patroni and etcd<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This is the one step where the distributions genuinely diverge. The package names, the repository setup, and one important gotcha differ between the RHEL family and the Debian family. 
After this section the steps are the same on every distribution, apart from a few file paths and the firewall and SELinux commands noted along the way. Run the install on all three database nodes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">On the RHEL family (Rocky Linux, AlmaLinux)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Add the PGDG repository and EPEL, which carries some of Patroni&#8217;s Python dependencies:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo dnf install -y https:\/\/download.postgresql.org\/pub\/repos\/yum\/reporpms\/EL-10-x86_64\/pgdg-redhat-repo-latest.noarch.rpm\nsudo dnf install -y https:\/\/dl.fedoraproject.org\/pub\/epel\/epel-release-latest-10.noarch.rpm<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">etcd and Patroni live in the PGDG &#8220;extras&#8221; repository, which ships disabled. Enable it, then import its signing key. On Enterprise Linux 10 this is the classic config-manager syntax, not the newer dnf5 form, and the key import is required because the extras repo verifies its own metadata:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo dnf config-manager --set-enabled pgdg-rhel10-extras\nsudo rpm --import \/etc\/pki\/rpm-gpg\/PGDG-RPM-GPG-KEY-RHEL<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Skip that <code>rpm --import<\/code> and the next command fails with <code>repomd.xml GPG signature verification error: Signing key not found<\/code>. With the key in place, install the stack:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo dnf install -y postgresql18-server postgresql18-contrib etcd patroni patroni-etcd<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Note the <code>patroni-etcd<\/code> package. On the RHEL family it pulls the etcd driver Patroni needs. Do not run <code>initdb<\/code> or enable the <code>postgresql<\/code> service here. 
Patroni initialises the data directory itself.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">On Debian and Ubuntu<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Add the PGDG apt repository keyed to your release codename (the variable resolves to <code>trixie<\/code>, <code>bookworm<\/code>, <code>noble<\/code>, or <code>resolute<\/code> automatically):<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo apt-get update\nsudo apt-get install -y curl ca-certificates\nsudo install -d \/usr\/share\/postgresql-common\/pgdg\nsudo curl -fsSL -o \/usr\/share\/postgresql-common\/pgdg\/apt.postgresql.org.asc https:\/\/www.postgresql.org\/media\/keys\/ACCC4CF8.asc\n. \/etc\/os-release\necho \"deb [signed-by=\/usr\/share\/postgresql-common\/pgdg\/apt.postgresql.org.asc] https:\/\/apt.postgresql.org\/pub\/repos\/apt ${VERSION_CODENAME}-pgdg main\" | sudo tee \/etc\/apt\/sources.list.d\/pgdg.list\nsudo apt-get update<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Install PostgreSQL, Patroni, and etcd. etcd was renamed on recent Debian and Ubuntu releases, so the package is <code>etcd-server<\/code> and <code>etcd-client<\/code>, not the old <code>etcd<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo apt-get install -y postgresql-18 patroni etcd-server etcd-client python3-etcd<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">That <code>python3-etcd<\/code> package is easy to miss and Patroni will not tell you politely. The Debian Patroni package does not pull the etcd driver the way the RHEL <code>patroni-etcd<\/code> package does, so without it Patroni starts and immediately dies with <code>Can not find suitable configuration of distributed configuration store. Available implementations: consul, kubernetes<\/code>. Installing <code>python3-etcd<\/code> puts etcd back on that list.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Now the gotcha that catches everyone. 
Installing <code>postgresql-18<\/code> on Debian or Ubuntu automatically creates and starts a cluster called <code>main<\/code> on port 5432. Patroni needs that port free and an empty data directory, so drop the auto-created cluster on every node:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>pg_lsclusters\nsudo pg_dropcluster --stop 18 main<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The RHEL packages have no equivalent step, because they never auto-initialise a cluster. This single difference is the most common reason a Debian Patroni node refuses to bootstrap.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What actually differs between the families<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Keep this table handy. Every path and name below shows up again in the config files, and getting one wrong is the difference between a clean start and a cryptic Python traceback.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Item<\/th><th>RHEL family (Rocky, Alma)<\/th><th>Debian, Ubuntu<\/th><\/tr><\/thead><tbody>\n<tr><td>PostgreSQL package<\/td><td><code>postgresql18-server<\/code><\/td><td><code>postgresql-18<\/code><\/td><\/tr>\n<tr><td>etcd driver for Patroni<\/td><td><code>patroni-etcd<\/code><\/td><td><code>python3-etcd<\/code><\/td><\/tr>\n<tr><td>etcd package<\/td><td><code>etcd<\/code><\/td><td><code>etcd-server<\/code> <code>etcd-client<\/code><\/td><\/tr>\n<tr><td>Auto-created cluster<\/td><td>none<\/td><td>drop with <code>pg_dropcluster --stop 18 main<\/code><\/td><\/tr>\n<tr><td>PostgreSQL binaries (<code>bin_dir<\/code>)<\/td><td><code>\/usr\/pgsql-18\/bin<\/code><\/td><td><code>\/usr\/lib\/postgresql\/18\/bin<\/code><\/td><\/tr>\n<tr><td>Data directory<\/td><td><code>\/var\/lib\/pgsql\/18\/data<\/code><\/td><td><code>\/var\/lib\/postgresql\/18\/main<\/code><\/td><\/tr>\n<tr><td>etcd config file<\/td><td><code>\/etc\/etcd\/etcd.conf<\/code><\/td><td><code>\/etc\/default\/etcd<\/code><\/td><\/tr>\n<tr><td>Patroni config 
file<\/td><td><code>\/etc\/patroni\/patroni.yml<\/code><\/td><td><code>\/etc\/patroni\/config.yml<\/code><\/td><\/tr>\n<tr><td>Firewall<\/td><td>firewalld<\/td><td>ufw<\/td><\/tr>\n<tr><td>Mandatory access control<\/td><td>SELinux (enforcing)<\/td><td>AppArmor (no action needed)<\/td><\/tr>\n<\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Open the firewall ports<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The cluster speaks on a handful of ports: PostgreSQL on 5432, the Patroni REST API on 8008, and etcd on 2379 (clients) and 2380 (peers). Open these on all three database nodes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">On the RHEL family, firewalld is the tool:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo firewall-cmd --permanent --add-port={5432,8008,2379,2380}\/tcp\nsudo firewall-cmd --reload<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">On Debian and Ubuntu, the same ports through ufw:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo ufw allow proto tcp to any port 5432,8008,2379,2380<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The two proxy nodes need 5000, 5001, and 7000 open for the HAProxy front ends and stats page, plus the VRRP protocol for Keepalived. We will cover those when we set up the proxy layer.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Bootstrap the etcd quorum<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">etcd comes first, because Patroni has nowhere to store the leader lock without it. Configure all three members as a static cluster. On the RHEL family the config lives in <code>\/etc\/etcd\/etcd.conf<\/code>; on Debian and Ubuntu it is <code>\/etc\/default\/etcd<\/code>. The contents are the same environment variables either way. 
Here is <code>pg1<\/code> (set the matching self-address on each node):<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>ETCD_NAME=pg1\nETCD_DATA_DIR=\/var\/lib\/etcd\nETCD_LISTEN_PEER_URLS=http:\/\/10.0.1.11:2380\nETCD_LISTEN_CLIENT_URLS=http:\/\/10.0.1.11:2379,http:\/\/127.0.0.1:2379\nETCD_INITIAL_ADVERTISE_PEER_URLS=http:\/\/10.0.1.11:2380\nETCD_ADVERTISE_CLIENT_URLS=http:\/\/10.0.1.11:2379\nETCD_INITIAL_CLUSTER=pg1=http:\/\/10.0.1.11:2380,pg2=http:\/\/10.0.1.12:2380,pg3=http:\/\/10.0.1.13:2380\nETCD_INITIAL_CLUSTER_STATE=new\nETCD_INITIAL_CLUSTER_TOKEN=pg-etcd<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Repeat on <code>pg2<\/code> and <code>pg3<\/code>, changing <code>ETCD_NAME<\/code> and the three self-addresses. The <code>ETCD_INITIAL_CLUSTER<\/code> line stays identical on all three. Start etcd on every node within a few seconds of each other so the initial election succeeds:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo systemctl enable --now etcd<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Confirm all three members joined and every endpoint is healthy:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>etcdctl --endpoints=http:\/\/${PG1}:2379,http:\/\/${PG2}:2379,http:\/\/${PG3}:2379 endpoint health -w table<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">All three should report <code>true<\/code> in the HEALTH column:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>+-----------------------+--------+------------+-------+\n|       ENDPOINT        | HEALTH |    TOOK    | ERROR |\n+-----------------------+--------+------------+-------+\n| http:\/\/10.0.1.11:2379 |   true | 2.21561ms  |       |\n| http:\/\/10.0.1.12:2379 |   true | 2.32060ms  |       |\n| http:\/\/10.0.1.13:2379 |   true | 1.62423ms  |       |\n+-----------------------+--------+------------+-------+<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Three is the minimum for a real quorum. 
With three members the cluster survives one failure and keeps a majority; drop to two and a single loss takes the whole DCS offline, which would freeze every failover decision.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Configure Patroni<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Patroni reads one YAML file per node. The only fields that change between nodes are <code>name<\/code> and the two <code>connect_address<\/code> lines. Write this to <code>\/etc\/patroni\/patroni.yml<\/code> on the RHEL family, or <code>\/etc\/patroni\/config.yml<\/code> on Debian and Ubuntu. The example below is the RHEL family version for <code>pg1<\/code>; the inline comment marks the one block Debian and Ubuntu change.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>scope: pg-cluster\nnamespace: \/service\/\nname: pg1\n\nrestapi:\n  listen: 0.0.0.0:8008\n  connect_address: 10.0.1.11:8008\n\netcd3:\n  hosts:\n    - 10.0.1.11:2379\n    - 10.0.1.12:2379\n    - 10.0.1.13:2379\n\nbootstrap:\n  dcs:\n    ttl: 30\n    loop_wait: 10\n    retry_timeout: 10\n    maximum_lag_on_failover: 1048576\n    synchronous_mode: true\n    postgresql:\n      use_pg_rewind: true\n      use_slots: true\n      parameters:\n        wal_level: replica\n        hot_standby: \"on\"\n        max_wal_senders: 10\n        max_replication_slots: 10\n        wal_keep_size: 256MB\n        wal_log_hints: \"on\"\n  initdb:\n    - encoding: UTF8\n    - data-checksums\n  pg_hba:\n    - host all all 127.0.0.1\/32 scram-sha-256\n    - host all all 10.0.1.0\/24 scram-sha-256\n    - host replication replicator 10.0.1.0\/24 scram-sha-256\n\npostgresql:\n  listen: 0.0.0.0:5432\n  connect_address: 10.0.1.11:5432\n  data_dir: \/var\/lib\/pgsql\/18\/data          # Debian\/Ubuntu: \/var\/lib\/postgresql\/18\/main\n  bin_dir: \/usr\/pgsql-18\/bin                 # Debian\/Ubuntu: \/usr\/lib\/postgresql\/18\/bin\n  authentication:\n    superuser:\n      username: postgres\n      password: 'ChangeMe-Super#2026'\n    replication:\n 
     username: replicator\n      password: 'ChangeMe-Repl#2026'\n    rewind:\n      username: rewind_user\n      password: 'ChangeMe-Rewind#2026'\n  parameters:\n    password_encryption: scram-sha-256\n\nwatchdog:\n  mode: automatic\n  device: \/dev\/watchdog\n  safety_margin: 5\n\ntags:\n  nofailover: false\n  noloadbalance: false\n  clonefrom: false\n  nosync: false<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Two settings carry most of the weight here. <code>synchronous_mode: true<\/code> tells Patroni to keep one standby in lockstep with the primary and only ever promote that standby, which is what gives you a zero data loss failover. <code>use_pg_rewind: true<\/code> lets a failed primary rejoin as a standby without a full re-clone once it recovers. The <code>watchdog<\/code> block is the last line of defence against split-brain, covered in the next section.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Set the file ownership so the <code>postgres<\/code> user can read it, then move on to the watchdog before starting anything:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo chown postgres:postgres \/etc\/patroni\/patroni.yml\nsudo chmod 600 \/etc\/patroni\/patroni.yml<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Wire up the watchdog<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">etcd&#8217;s lock stops two nodes from both believing they are primary at the cluster layer, but there is a nastier edge case: a primary that is hung or paused long enough to lose its lock, then wakes up and keeps accepting writes for a few seconds before it notices. A hardware or software watchdog closes that window by resetting the node if Patroni stops feeding it a heartbeat. The Linux <code>softdog<\/code> module provides one on any machine, virtual or physical.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Load it at boot and let Patroni hand ownership of the device to the <code>postgres<\/code> user. 
Run this on all three database nodes:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>echo softdog | sudo tee \/etc\/modules-load.d\/softdog.conf\nsudo modprobe softdog\nsudo install -d \/etc\/systemd\/system\/patroni.service.d\nprintf '[Service]\\nExecStartPre=+\/sbin\/modprobe softdog\\nExecStartPre=+\/bin\/chown postgres \/dev\/watchdog\\n' | sudo tee \/etc\/systemd\/system\/patroni.service.d\/watchdog.conf\nsudo systemctl daemon-reload<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With <code>mode: automatic<\/code> in the config, Patroni uses the watchdog when it is present and carries on without it when it is not, which is the right default for mixed environments. For maximum safety in production, switch that to <code>mode: required<\/code> so a node refuses to become leader if it cannot arm the watchdog. The trade-off is real: <code>required<\/code> means a node with a misconfigured watchdog will sit out rather than serve, so test it before you rely on it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Start the cluster<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Start <code>pg1<\/code> first and let it become the leader. It runs <code>initdb<\/code>, creates the three roles, and takes the lock:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo systemctl enable --now patroni<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Give it half a minute, then start Patroni on <code>pg2<\/code> and <code>pg3<\/code>. They clone from the leader with <code>pg_basebackup<\/code> and come up as standbys. 
Check the cluster from any node (use <code>config.yml<\/code> in place of <code>patroni.yml<\/code> on Debian and Ubuntu):<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo patronictl -c \/etc\/patroni\/patroni.yml list<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">One leader, one synchronous standby, and a streaming replica, all on the same timeline with zero lag:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"2200\" height=\"1068\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-patroni-cluster-status.png\" alt=\"patronictl list showing a three-node PostgreSQL HA cluster with one leader, one sync standby, and the etcd quorum healthy\" class=\"wp-image-168636\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-patroni-cluster-status.png 2200w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-patroni-cluster-status-300x146.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-patroni-cluster-status-1024x497.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-patroni-cluster-status-768x373.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-patroni-cluster-status-1536x746.png 1536w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-patroni-cluster-status-2048x994.png 2048w\" sizes=\"auto, (max-width: 2200px) 100vw, 2200px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>Sync Standby<\/code> role is the visible proof that <code>synchronous_mode<\/code> is working. 
That node has every committed transaction the primary has, which is exactly why Patroni will only ever promote it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Put HAProxy in front<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">HAProxy is what turns three database nodes into one address. The clever part is the health check: rather than guess which node is primary, HAProxy calls Patroni&#8217;s REST API on port 8008. Patroni answers <code>200<\/code> on <code>\/primary<\/code> only on the leader and <code>503<\/code> everywhere else, and the mirror image on <code>\/replica<\/code>. So a front end that checks <code>\/primary<\/code> naturally pools only the leader, and one that checks <code>\/replica<\/code> pools only the standbys. When the primary changes, the checks flip and HAProxy follows within a couple of seconds.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Install HAProxy on both proxy nodes:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo dnf install -y haproxy        # RHEL family\nsudo apt-get install -y haproxy    # Debian, Ubuntu<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Write the same <code>\/etc\/haproxy\/haproxy.cfg<\/code> to both nodes. 
Port 5000 carries writes to the primary, 5001 load-balances reads across the standbys, and 7000 serves the stats page:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>global\n    maxconn 2000\n    log \/dev\/log local0\n\ndefaults\n    log     global\n    mode    tcp\n    retries 2\n    timeout client  30m\n    timeout server  30m\n    timeout connect 4s\n    timeout check   5s\n\nlisten stats\n    mode http\n    bind *:7000\n    stats enable\n    stats uri \/\n    stats refresh 5s\n\n# Writes: only the Patroni leader answers 200 on \/primary\nlisten postgres_primary\n    bind *:5000\n    option httpchk\n    http-check send meth OPTIONS uri \/primary\n    http-check expect status 200\n    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions\n    server pg1 10.0.1.11:5432 maxconn 200 check port 8008\n    server pg2 10.0.1.12:5432 maxconn 200 check port 8008\n    server pg3 10.0.1.13:5432 maxconn 200 check port 8008\n\n# Reads: every healthy standby answers 200 on \/replica\nlisten postgres_replicas\n    bind *:5001\n    option httpchk\n    http-check send meth OPTIONS uri \/replica\n    http-check expect status 200\n    balance roundrobin\n    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions\n    server pg1 10.0.1.11:5432 maxconn 200 check port 8008\n    server pg2 10.0.1.12:5432 maxconn 200 check port 8008\n    server pg3 10.0.1.13:5432 maxconn 200 check port 8008<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>on-marked-down shutdown-sessions<\/code> directive is small but critical. It kills every existing connection to a server the instant that server fails its check, so when the primary goes down, clients are dropped immediately rather than left hanging on a demoted node. Leave it out and an application can keep trying to write to the old primary for the length of its connection timeout. 
Note also the modern check syntax: HAProxy 2.2 and newer use <code>http-check send meth OPTIONS uri \/primary<\/code> rather than the old one-line <code>option httpchk OPTIONS \/primary<\/code>, and Patroni dropped the legacy <code>\/master<\/code> endpoint in favour of <code>\/primary<\/code>, so older configs you find online will quietly health-check the wrong path.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">SELinux on the RHEL family needs one boolean flipped before HAProxy can reach the backends, because the confined HAProxy domain will not open arbitrary outbound ports by default:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo setsebool -P haproxy_connect_any 1<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Open the proxy ports and start the service on both nodes:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo firewall-cmd --permanent --add-port={5000,5001,7000}\/tcp && sudo firewall-cmd --reload   # RHEL family\nsudo systemctl enable --now haproxy<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The stats page at <code>http:\/\/10.0.1.21:7000\/<\/code> shows the routing decision in real time. 
The leader is green and UP in the <code>postgres_primary<\/code> pool, while the standbys correctly report DOWN there (they answer 503 on <code>\/primary<\/code>) and UP in the <code>postgres_replicas<\/code> pool.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"1000\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-haproxy-stats-page.png\" alt=\"HAProxy statistics page showing the PostgreSQL primary UP in the writes pool and the standbys UP in the reads pool, with role-based health checks\" class=\"wp-image-168637\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-haproxy-stats-page.png 1400w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-haproxy-stats-page-300x214.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-haproxy-stats-page-1024x731.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-haproxy-stats-page-768x549.png 768w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">That red row in the writes pool is not an error. A standby is supposed to fail the <code>\/primary<\/code> check, and HAProxy showing it DOWN there is the routing working as designed. If you set this up with one proxy, you would be done, but the proxy would now be your single point of failure. That is what Keepalived solves.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Add a floating VIP with Keepalived<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Keepalived runs a virtual IP across both proxy nodes using VRRP. One node holds the VIP as MASTER; if it or its HAProxy dies, the BACKUP takes the address over in about a second. The application only ever talks to the VIP, so it never knows a proxy failed. 
The same VRRP failover underpins a general-purpose <a href=\"https:\/\/computingforgeeks.com\/haproxy-keepalived-cluster-rocky-10\/\">HAProxy and Keepalived HA cluster<\/a> for any service, not just a database.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Install Keepalived on both proxy nodes, then write <code>\/etc\/keepalived\/keepalived.conf<\/code>. This is <code>ha1<\/code>, the MASTER:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>vrrp_script chk_haproxy {\n    script \"\/usr\/bin\/pidof haproxy\"\n    interval 2\n    weight 2\n    fall 2\n    rise 2\n}\n\nvrrp_instance VI_PG {\n    state MASTER\n    interface eth0\n    virtual_router_id 51\n    priority 101\n    advert_int 1\n    authentication {\n        auth_type PASS\n        auth_pass cfgha26\n    }\n    virtual_ipaddress {\n        10.0.1.10\/24\n    }\n    track_script {\n        chk_haproxy\n    }\n}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">On <code>ha2<\/code>, use the identical file with two changes: <code>state BACKUP<\/code> and <code>priority 100<\/code>. The <code>chk_haproxy<\/code> script means the VIP only lives on a node whose HAProxy is actually running, so a crashed proxy hands the address over even if the node itself stays up. Match the <code>interface<\/code> name to your hardware (<code>ip -br addr<\/code> will tell you). Start it on both nodes:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo systemctl enable --now keepalived<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">If you copy the config in from another machine on an SELinux system and the service refuses to start with a permission error, run <code>sudo restorecon -v \/etc\/keepalived\/keepalived.conf<\/code> to fix its security label. A file edited in place gets the right label automatically.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Test the routing through the VIP<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Everything now hangs off one address. 
Point a write at the VIP on port 5000 and it lands on the primary; point a read at port 5001 and it lands on a standby. The <code>pg_is_in_recovery()<\/code> flag confirms which is which (false on the primary, true on a standby):<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>export PGPASSWORD=\"$SUPERUSER_PWD\"\npsql \"host=${VIP} port=5000 user=postgres dbname=postgres\" -c \"SELECT inet_server_addr(), pg_is_in_recovery();\"\npsql \"host=${VIP} port=5001 user=postgres dbname=postgres\" -c \"SELECT inet_server_addr(), pg_is_in_recovery();\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Writes resolve to the primary and reads spread across the standbys, all through the single VIP:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"2200\" height=\"1118\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-vip-connection-routing.png\" alt=\"psql connecting through the HAProxy virtual IP, with writes routed to the PostgreSQL primary and reads load-balanced to a replica\" class=\"wp-image-168638\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-vip-connection-routing.png 2200w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-vip-connection-routing-300x152.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-vip-connection-routing-1024x520.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-vip-connection-routing-768x390.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-vip-connection-routing-1536x781.png 1536w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-vip-connection-routing-2048x1041.png 2048w\" sizes=\"auto, (max-width: 2200px) 100vw, 2200px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">This is the payoff of the whole 
architecture. Your application&#8217;s connection string is <code>10.0.1.10:5000<\/code> for writes and <code>10.0.1.10:5001<\/code> for reads, and it never changes again.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prove the failover<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A cluster that has never failed over is a cluster you do not trust yet. Seed a row so we have something to lose, then kill the primary outright and watch what happens.<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>psql \"host=${VIP} port=5000 user=postgres dbname=postgres\" -c \"CREATE DATABASE appdb;\"\npsql \"host=${VIP} port=5000 user=postgres dbname=appdb\" -c \"CREATE TABLE t(id serial primary key, note text);\"\npsql \"host=${VIP} port=5000 user=postgres dbname=appdb\" -c \"INSERT INTO t(note) VALUES ('written-before-failover');\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Now simulate a hard failure. Power off the primary node, or stop both services on it. From a surviving node, watch the cluster react:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>watch -n2 'sudo patronictl -c \/etc\/patroni\/patroni.yml list'<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Within roughly twenty to twenty-five seconds, Patroni notices the lock has expired, promotes the synchronous standby, and bumps the timeline. 
The async replica re-attaches to the new leader:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"2200\" height=\"652\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-automatic-failover.png\" alt=\"patronictl output before and after an automatic PostgreSQL failover, showing the sync standby promoted to leader on a new timeline\" class=\"wp-image-168639\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-automatic-failover.png 2200w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-automatic-failover-300x89.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-automatic-failover-1024x303.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-automatic-failover-768x228.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-automatic-failover-1536x455.png 1536w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/06\/wm-postgresql-ha-automatic-failover-2048x607.png 2048w\" sizes=\"auto, (max-width: 2200px) 100vw, 2200px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Because <code>synchronous_mode<\/code> guarantees the promoted node already held every committed transaction, the failover is lossless. The same connection string, retried, now lands on the new primary, the pre-failover row is intact, and new writes succeed:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>psql \"host=${VIP} port=5000 user=postgres dbname=appdb\" -c \"SELECT * FROM t;\"\npsql \"host=${VIP} port=5000 user=postgres dbname=appdb\" -c \"INSERT INTO t(note) VALUES ('written-after-failover');\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Bring the dead node back online and Patroni rejoins it automatically. 
Thanks to <code>use_pg_rewind<\/code>, it rewinds onto the new timeline and resumes streaming as a standby instead of demanding a full re-clone. The cluster is three healthy nodes again with no manual intervention.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One behaviour to know for planned maintenance: a manual <code>patronictl switchover<\/code> under <code>synchronous_mode<\/code> will only accept the current synchronous standby as the candidate. Aim it at an async replica and Patroni refuses with <code>candidate name does not match with sync_standby<\/code>. That is the safety guarantee doing its job, not a bug.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Hardening and where to go from here<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The cluster works, but a few choices separate a lab from production:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Encrypt etcd and replication.<\/strong> This guide ran etcd over plain HTTP for clarity. In production, give etcd TLS peer and client certificates and require <code>scram-sha-256<\/code> with TLS on the PostgreSQL replication connections. The leader lock and your WAL stream both cross the network.<\/li>\n<li><strong>Decide your synchronous posture deliberately.<\/strong> <code>synchronous_mode: true<\/code> buys zero data loss, but if the lone sync standby is also down, a strict configuration will refuse writes to protect consistency. Weigh that against <code>synchronous_mode_strict<\/code> and the number of standbys you keep in sync. The trade-off is durability versus write availability, and only you know which your application needs.<\/li>\n<li><strong>Add connection pooling.<\/strong> HAProxy balances connections but does not pool them, and PostgreSQL handles a flood of short-lived connections poorly. PgBouncer in front of (or alongside) each node keeps the connection count sane under load. 
It does not handle failover itself, which is exactly why it sits behind HAProxy rather than replacing it.<\/li>\n<li><strong>Watch the right things.<\/strong> Scrape Patroni&#8217;s <code>\/metrics<\/code> endpoint and alert on replication lag, the number of healthy etcd members, and timeline changes. A timeline bump you did not expect is a failover you did not notice.<\/li>\n<li><strong>Promote the watchdog to required.<\/strong> Once you have confirmed <code>softdog<\/code> arms cleanly on every node, switch <code>watchdog<\/code> to <code>mode: required<\/code> for the strongest split-brain guarantee.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">When this cluster outgrows a single primary&#8217;s write capacity, the next move is not a bigger box but read offloading and sharding: point read-heavy services at the 5001 port to spread load across standbys, and when even that is not enough, look at Citus for horizontal scale. For now, you have a database that survives losing any one node, heals itself, and presents one unchanging address to everything that depends on it. If you came from a simpler setup, comparing the write throughput here against a single server using the same method as our <a href=\"https:\/\/computingforgeeks.com\/database-benchmark-postgresql-mysql-mariadb\/\">PostgreSQL benchmark guide<\/a> is a good way to size the synchronous-commit overhead before you go live. If you also run MariaDB, the <a href=\"https:\/\/computingforgeeks.com\/mariadb-high-availability-galera-haproxy-keepalived\/\">same architecture built on Galera<\/a> is the sister guide to this one.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Before you reach for a high availability tool, the real problem to solve is the connection string. A single PostgreSQL server is easy to point an application at, but the moment that server dies, every client that cached its address is stuck. Replication alone does not fix this. 
You can have a perfectly healthy standby &#8230; <a title=\"Set Up PostgreSQL High Availability with Patroni and HAProxy\" class=\"read-more\" href=\"https:\/\/computingforgeeks.com\/postgresql-high-availability-patroni-etcd-haproxy\/\" aria-label=\"Read more about Set Up PostgreSQL High Availability with Patroni and HAProxy\">Read more<\/a><\/p>\n","protected":false},"author":17,"featured_media":168640,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[461,37631],"tags":[324,282,688],"cfg_series":[],"class_list":["post-168641","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-databases","category-postgresql","tag-databases","tag-linux","tag-postgresql"],"_links":{"self":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168641","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/comments?post=168641"}],"version-history":[{"count":2,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168641\/revisions"}],"predecessor-version":[{"id":168653,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168641\/revisions\/168653"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media\/168640"}],"wp:attachment":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media?parent=168641"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/categories?post=168641"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/tags?post=168641"},{
"taxonomy":"cfg_series","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/cfg_series?post=168641"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}