Using Patroni to Build a Highly Available Postgres Cluster—Part 1: etcd
The last PG Phriday article focused on the architecture of a Patroni cluster—the how and why of the design. This time around, it’s all about actually building one. I’ve often heard that operating Postgres can be intimidating, and Patroni is on a level above that. Well, I won’t argue on the second count, but I can try to at least ease some of the pain.
To avoid an overwhelming deluge consisting of twenty pages of instructions, I’ve split this article into a series of three along these lines:
Etcd
Postgres and Patroni
HAProxy
This establishes each of the three layers that make up the full Patroni stack, and provides a convenient reference for each later on.
With that out of the way, let’s get started!
Why etcd?
The last article should have made it abundantly clear that the DCS is the nexus of communication and status for the whole cluster. As a result, it’s important to install it first and certify that it’s operational. Etcd is the default and the example most often deployed in Patroni clusters. It’s also the key/value storage system Kubernetes uses as a default, so it should be reliable enough for our needs.
Don’t forget to keep a browser tab open to the etcd documentation.
What you’ll need
If you want to follow along with this demonstration, you’ll need:
The ability to create three VMs. Whether it’s Amazon EC2 instances, Microsoft Hyper-V, Xen, QEMU, Proxmox, Oracle VirtualBox, or even VMware Fusion, make sure you have a hypervisor and know how to use it.
Three VMs running Debian Stable version 13. At the time of writing, this should be the Trixie release.
SSH access as a root-capable user on each VM.
An internet connection. If you have the first three, it’s likely you have this as well.
Believe it or not, that should be all that’s necessary. While these instructions focus on Debian packaging when possible, feel free to substitute Red Hat equivalents if you’d rather be adventurous. Most of these instructions should work on any Linux system if you’re familiar with your platform of choice and know how to improvise.
If you want to make your life easier, add some lines to /etc/hosts on the VMs to give each a name. IP addresses are great, but they’re not as convenient as “pg1”. Here’s an example:
192.168.6.10 pg1
192.168.6.11 pg2
192.168.6.12 pg3
Unless otherwise noted, execute the commands described in this guide on each of the VMs.
Preparing each VM
Prior to installing etcd, let’s create a user named “etcd” to own the service and related data using a quick useradd command:
sudo useradd --system --create-home -s /bin/bash -d /var/lib/etcd etcd
It’s important to create the user as a “system” user, as these are often treated differently by systemd.
Installing etcd
The first lesson is that most of these tools are not “properly” packaged. By that, I mean there are no recent official .deb or .rpm packages. The etcd maintainers provide nothing beyond zip archives, tarballs, and source code. That means the first step is to visit the etcd GitHub release page and find the URL for the latest release.
With that URL, install with these commands:
export DL_FILE=etcd-v3.6.7-linux-amd64.tar.gz
wget https://github.com/etcd-io/etcd/releases/download/v3.6.7/${DL_FILE}
sudo tar -xf ${DL_FILE} -C /usr/local
sudo chown -R root:root /usr/local/etcd-v3.6.7-linux-amd64
sudo ln -s etcd-v3.6.7-linux-amd64 /usr/local/etcd
Then we want to invoke the Debian alternatives system to make the binaries easier to use:
sudo update-alternatives \
--install /usr/sbin/etcd etcd /usr/local/etcd/etcd 100
sudo update-alternatives \
--install /usr/bin/etcdctl etcdctl /usr/local/etcd/etcdctl 100
Finally, create a systemd service file to control the etcd service:
cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=%m
ExecStart=/usr/sbin/etcd --config-file /etc/etcd/etcd.yaml
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
Don’t start the service yet; we haven’t configured it.
Configuring etcd
For etcd, the “hard” part is getting the configuration right. Pay particular attention to the following parameters: name, advertise-client-urls, initial-advertise-peer-urls, listen-client-urls, and listen-peer-urls. These should all reflect the name or IP address of the server you’re configuring! The easiest way to do this is to use environment variables as shown in the example.
sudo mkdir /etc/etcd
sudo chown etcd:etcd /etc/etcd
# Make sure to set these for the node being configured
export MY_HOST=pg1
export MY_IP=192.168.6.10
cat <<EOF | sudo tee /etc/etcd/etcd.yaml
name: ${MY_HOST}
advertise-client-urls: http://${MY_HOST}:2379
data-dir: /var/lib/etcd/postgresql
initial-advertise-peer-urls: http://${MY_HOST}:2380
initial-cluster: pg1=http://pg1:2380,pg2=http://pg2:2380,pg3=http://pg3:2380
initial-cluster-state: new
initial-cluster-token: patroni_cluster
listen-client-urls: http://${MY_IP}:2379,http://127.0.0.1:2379
listen-peer-urls: http://${MY_IP}:2380
dial-timeout: 20s
read-timeout: 20s
write-timeout: 20s
EOF
The listen URLs must be IP addresses because etcd won’t resolve hostnames for these parameters; it has to bind to a concrete local address.
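The MY_HOST and MY_IP exports above can also be derived automatically, assuming the /etc/hosts entries from earlier resolve each short name on every VM. This is just a convenience sketch; the hostname and getent lookups are assumptions about your environment, so double-check the echoed values before generating the config.

```shell
#!/usr/bin/env bash
# Sketch: derive MY_HOST and MY_IP from the local hostname, assuming
# the /etc/hosts entries created earlier (pg1, pg2, pg3) resolve here.
resolve_ip() {
  # Take the first address getent returns for the given name
  getent hosts "$1" | awk '{ print $1; exit }'
}

MY_HOST=$(hostname -s)
MY_IP=$(resolve_ip "${MY_HOST}")
export MY_HOST MY_IP
echo "Configuring etcd node ${MY_HOST} at ${MY_IP}"
```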
Once the configuration file exists on each of the servers, enable and start the etcd service itself.
sudo systemctl enable etcd
sudo systemctl start etcd
Since the configuration file states that this is a “new” cluster, the cluster won’t consider itself bootstrapped until all three servers are online and connected to each other. Give it a minute or two before continuing.
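Rather than guessing at a fixed wait, a small retry loop can poll until the cluster responds. The helper below is generic shell; the commented etcdctl invocation is the health check used later in this guide and assumes etcdctl is on your PATH.

```shell
#!/usr/bin/env bash
# Generic retry helper: run a command once per second until it
# succeeds or the attempts run out.
wait_until() {
  local tries=$1; shift
  local i
  for (( i = 0; i < tries; i++ )); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# On a real node (assumes etcdctl on the PATH and the cluster above):
# wait_until 120 etcdctl endpoint health --cluster
```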
Validating the service
Once etcd starts on all nodes, it’s a good idea to verify that it’s working as expected before handing it over to Patroni. Start by launching the etcdctl tool to view the cluster member list, which should include all three nodes:
etcdctl member list
11be7ea7eac6dbc8, started, pg2, http://pg2:2380, http://pg2:2379, false
2a0c85329dcad4bd, started, pg3, http://pg3:2380, http://pg3:2379, false
608fb0f2dfc0e470, started, pg1, http://pg1:2380, http://pg1:2379, false
We can see here that all nodes are accounted for, but this doesn’t actually show the state of each node, just that they have joined the cluster. For that, we need a different command:
etcdctl endpoint health --cluster
http://pg1:2379 is healthy: successfully committed proposal: took = 1.69574ms
http://pg2:2379 is healthy: successfully committed proposal: took = 1.774651ms
http://pg3:2379 is healthy: successfully committed proposal: took = 1.80241ms
And this is what we see if a node is offline:
etcdctl endpoint health --cluster
http://pg1:2379 is healthy: successfully committed proposal: took = 1.50183ms
http://pg3:2379 is healthy: successfully committed proposal: took = 3.106761ms
http://pg2:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Finally, write a sample value to the DCS, retrieve it from a different node, and delete it from a third. This ensures that all nodes can write based on the consensus between them.
# Execute on node 1
etcdctl put testkey "Hello World"
OK
# Execute on node 2
etcdctl get testkey
testkey
Hello World
# Execute on node 3
etcdctl del testkey
1
This full lifecycle proves etcd is operating as expected, all nodes are fully operational, and the cluster is ready for Patroni.
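If you end up scripting these checks for monitoring, the health output is easy to parse. Here’s a small sketch using the degraded-cluster sample output shown above; the count_healthy helper is our own convenience function, not part of etcdctl.

```shell
#!/usr/bin/env bash
# Count healthy endpoints in `etcdctl endpoint health --cluster` output.
# Pure text processing, so it can be tried against the sample above.
count_healthy() {
  grep -c 'is healthy'
}

# The degraded-cluster output from earlier, with pg2 offline:
sample='http://pg1:2379 is healthy: successfully committed proposal: took = 1.50183ms
http://pg2:2379 is unhealthy: failed to commit proposal: context deadline exceeded
http://pg3:2379 is healthy: successfully committed proposal: took = 3.106761ms'

echo "$sample" | count_healthy   # prints 2
```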
Finishing up
You should have three VMs equipped with an etcd service at this stage, which makes a convenient stopping point before the next article. If you were wondering why we’re installing everything on three nodes, it’s because three is the minimum viable HA cluster size with any real meaning.
While it’s possible to run a two-node cluster, quorum requires a majority to guarantee consensus. Any node that disagrees is simply treated as incorrect and should re-synchronize with the majority. So a two-node cluster must always keep both nodes online or it can’t be trusted. A three-node cluster has a spare: it’s possible to stop a single node while the other two maintain consensus.
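The quorum arithmetic is simple enough to express directly: a majority of n members is floor(n/2) + 1, so the number of members that can fail while quorum survives is n minus that. A quick sketch:

```shell
#!/usr/bin/env bash
# Majority quorum for an n-member cluster, and how many member
# failures that cluster can tolerate while still reaching consensus.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "members=$n quorum=$(quorum $n) tolerated_failures=$(tolerance $n)"
done
```

Note that a two-node cluster tolerates zero failures, exactly like a single node, which is why three is the practical minimum.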
As a result, no real cluster has fewer than three nodes. Please note that this only applies to the etcd layer! Consider this cluster design:
This is the same as our original diagram, but with only two Postgres / Patroni elements. This is perfectly valid because the DCS layer itself maintains the quorum, so we don’t have to enforce that same constraint on Postgres or Patroni. This means we could theoretically operate two Postgres nodes in different regions under the assumption that there’s an externally managed DCS layer.
In the case of this demonstration, however, we don’t have that luxury. Decoupling Patroni from etcd that way requires a five-node cluster: three for etcd, and two for Patroni and Postgres. That’s actually the superior approach for more sophisticated architectures, since multiple Patroni clusters can share a single etcd resource.
We may explore that kind of advanced use case in the future, but for now, experiment with your new etcd cluster and we’ll see you next week!



