{"id":168103,"date":"2026-06-01T09:00:00","date_gmt":"2026-06-01T06:00:00","guid":{"rendered":"https:\/\/computingforgeeks.com\/?p=168103"},"modified":"2026-05-27T00:35:40","modified_gmt":"2026-05-26T21:35:40","slug":"qdrant-kubernetes-cluster","status":"publish","type":"post","link":"https:\/\/computingforgeeks.com\/qdrant-kubernetes-cluster\/","title":{"rendered":"Multi-node Qdrant Cluster on Kubernetes"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">A single Qdrant pod handles plenty of throughput, but it has the same failure profile as any single Postgres or Redis: one bad disk, one OOM kill, one unattended kernel panic, and the service is down. Production deployments solve this the same way they solve it for other stateful databases: multiple nodes, replicated shards, and an orchestrator that can route around the dead ones. On Kubernetes that orchestrator is built in, and Qdrant ships a Helm chart that wires up the StatefulSet, headless service, and PVCs for you. If you do not already have a cluster handy, our <a href=\"https:\/\/computingforgeeks.com\/install-kubernetes-kubeadm-ubuntu-2604\/\">kubeadm install on Ubuntu<\/a> is the most straightforward path to a three-node lab on bare metal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This guide builds a real 3-node Qdrant cluster on a fresh k3s install across three Ubuntu 24.04 VMs, deploys it via the official Helm chart, sharded with replication, and then proves the HA claims with two experiments: force-delete a pod under continuous query load, then roll out a new image version while a second loop watches. Both experiments returned zero non-2xx responses. 
Every output block in this guide is captured from that cluster.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Tested May 2026 on Ubuntu 24.04.4 LTS, k3s v1.35.5, Qdrant 1.18.1 via Helm chart v1.18.0, qdrant-client 1.18.0, fastembed 0.8.0.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why multi-node Qdrant, and what the cluster gives you<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Three things change when Qdrant runs as a cluster instead of a single pod. First, the consensus layer (raft) keeps the cluster&#8217;s metadata coherent: which collections exist, which peers own which shards, what the replication factor is. Second, the data plane is sharded so a single collection can spread across more nodes than would fit on one host. Third, replication factor &gt; 1 means each shard exists on at least two peers, so a node failure does not lose data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The minimum useful Qdrant cluster is 3 peers. Two is enough for replication but not for raft (you need a majority to elect a leader; 2 nodes deadlock on a split). Five gives more headroom but doubles the storage cost. Three is the right starting point for most production workloads.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Three collection-level knobs control the data plane:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Setting<\/th><th>What it does<\/th><th>Typical value<\/th><\/tr><\/thead><tbody><tr><td><code>shard_number<\/code><\/td><td>How many shards split the collection. Higher = better parallelism, more overhead.<\/td><td>2\u00d7 peer count for small, 6+ for large<\/td><\/tr><tr><td><code>replication_factor<\/code><\/td><td>Copies of each shard. 2 tolerates 1 dead peer, 3 tolerates 2.<\/td><td>2 for prod, 3 for paranoid<\/td><\/tr><tr><td><code>write_consistency_factor<\/code><\/td><td>How many replicas must ack a write. 
Equal to replication factor = strongest consistency.<\/td><td>Equal to <code>replication_factor<\/code><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">For the demo collection in this guide: <code>shard_number=6<\/code> and <code>replication_factor=2<\/code>. That produces 12 shard-replicas across 3 peers, exactly 4 per peer.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Stand up the Kubernetes cluster<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Anything that gives you 3 Ready nodes and a default storage class works. The test bed for this guide is k3s on three Ubuntu 24.04 VMs (each 4 vCPU \/ 4 GB RAM \/ 20 GB disk), but EKS, GKE, or kubeadm-built clusters all behave the same once <code>kubectl get nodes<\/code> reports Ready.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">On the control-plane node:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>curl -sfL https:\/\/get.k3s.io | \\\n  sudo INSTALL_K3S_EXEC='server --cluster-init --disable=traefik' sh -\nsudo cat \/var\/lib\/rancher\/k3s\/server\/node-token<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">On each worker, with <code>CP_IP<\/code> set to the control plane&#8217;s IP and <code>TOKEN<\/code> set to the value above:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>curl -sfL https:\/\/get.k3s.io | \\\n  sudo K3S_URL=https:\/\/${CP_IP}:6443 K3S_TOKEN=${TOKEN} sh -<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Verify on the control plane:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>sudo kubectl get nodes -o wide\nNAME                  STATUS   ROLES                AGE   VERSION\ncfg-qdrant-k3s-1241   Ready    control-plane,etcd   8m    v1.35.5+k3s1\ncfg-qdrant-k3s-1242   Ready    &lt;none&gt;               7m    v1.35.5+k3s1\ncfg-qdrant-k3s-1243   Ready    &lt;none&gt;               7m    v1.35.5+k3s1<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">k3s ships a working <code>local-path<\/code> StorageClass and Klipper LoadBalancer by default, which 
covers the chart&#8217;s PVCs and service. If you are on EKS, swap in <code>gp3<\/code> as the StorageClass and the AWS Load Balancer Controller for the service.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Deploy Qdrant via Helm<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The official chart lives at <code>https:\/\/qdrant.github.io\/qdrant-helm<\/code> and the App Version follows the Qdrant version: chart 1.18.0 deploys Qdrant 1.18.x. Install Helm and add the repo:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>curl -sfL https:\/\/raw.githubusercontent.com\/helm\/helm\/main\/scripts\/get-helm-3 \\\n  | sudo bash\nhelm repo add qdrant https:\/\/qdrant.github.io\/qdrant-helm\nhelm repo update<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The chart defaults to one replica with no cluster mode. Override that with a values file:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code># values.yaml\nreplicaCount: 3\nimage:\n  tag: v1.18.1\n\n# Spread pods across nodes\npodAntiAffinity:\n  enabled: true\n\n# Cluster mode: enables raft consensus + sharding\ncluster:\n  enabled: true\n\npersistence:\n  size: 4Gi\n  storageClassName: local-path\n\n# Strong api-key; in prod use a secretRef instead of an inline value\napiKey: cfg-lab-cluster-key-2026\n\nresources:\n  requests: {cpu: 250m, memory: 512Mi}\n  limits:   {cpu: 1000m, memory: 1Gi}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Install into a namespace:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>kubectl create namespace qdrant\nhelm install qdrant qdrant\/qdrant -n qdrant -f values.yaml<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The chart creates a StatefulSet with 3 ordered replicas (<code>qdrant-0<\/code>, <code>qdrant-1<\/code>, <code>qdrant-2<\/code>), a regular ClusterIP service that load-balances client requests, and a headless service that gives each pod a stable DNS name like <code>qdrant-0.qdrant-headless<\/code>. 
The headless DNS is how peers find each other for raft.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Wait for all three pods to land:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>kubectl get pods -n qdrant -o wide\nNAME       READY   STATUS    RESTARTS   AGE   IP          NODE\nqdrant-0   1\/1     Running   0          58s   10.42.1.3   cfg-qdrant-k3s-1242\nqdrant-1   1\/1     Running   2          58s   10.42.0.6   cfg-qdrant-k3s-1241\nqdrant-2   1\/1     Running   0          58s   10.42.2.3   cfg-qdrant-k3s-1243\n\nkubectl get pvc -n qdrant\nNAME                      STATUS   CAPACITY   STORAGECLASS\nqdrant-storage-qdrant-0   Bound    4Gi        local-path\nqdrant-storage-qdrant-1   Bound    4Gi        local-path\nqdrant-storage-qdrant-2   Bound    4Gi        local-path<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">qdrant-1 showing 2 restarts at startup is normal: the second peer hits a brief window where the first peer&#8217;s API is up but the raft consensus has not yet bootstrapped. The chart&#8217;s restart policy handles this without intervention.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"920\" height=\"800\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-k3s-3-node-cluster.png\" alt=\"Qdrant 3-node k3s cluster with kubectl get nodes pods pvc\" class=\"wp-image-168098\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-k3s-3-node-cluster.png 920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-k3s-3-node-cluster-300x261.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-k3s-3-node-cluster-768x668.png 768w\" sizes=\"auto, (max-width: 920px) 100vw, 920px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Verify the raft cluster formed<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Three pods up does not mean the cluster is healthy. 
They could be running but not aware of each other. The <code>\/cluster<\/code> endpoint reveals the real raft state:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>kubectl port-forward -n qdrant svc\/qdrant 6333:6333 &amp;\ncurl -sS http:\/\/localhost:6333\/cluster \\\n    -H \"api-key: cfg-lab-cluster-key-2026\" | jq<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">A healthy 3-peer cluster reports the peer list, the current term and commit index, and the leader&#8217;s peer ID:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>{\n  \"result\": {\n    \"status\": \"enabled\",\n    \"peer_id\": 6466783665211289,\n    \"peers\": {\n      \"6466783665211289\": {\"uri\": \"http:\/\/qdrant-0.qdrant-headless:6335\/\"},\n      \"2653139728083735\": {\"uri\": \"http:\/\/qdrant-1.qdrant-headless:6335\/\"},\n      \"1945335202684494\": {\"uri\": \"http:\/\/qdrant-2.qdrant-headless:6335\/\"}\n    },\n    \"raft_info\": {\n      \"term\": 1,\n      \"commit\": 9,\n      \"leader\": 6466783665211289,\n      \"role\": \"Leader\",\n      \"is_voter\": true\n    },\n    \"consensus_thread_status\": {\n      \"consensus_thread_status\": \"working\",\n      \"last_update\": \"2026-05-26T16:31:01Z\"\n    }\n  },\n  \"status\": \"ok\"\n}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Three peers, term 1 (consensus held on first election, no churn), no pending operations, consensus thread &#8220;working&#8221;. This is what you want to see; anything else (status disabled, term &gt; 5 after bootstrap, &#8220;writing&#8221; persistently) means raft is unhappy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Create a sharded + replicated collection<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Cluster mode unlocks two collection params that did nothing on a single-node cluster: <code>shard_number<\/code> and <code>replication_factor<\/code>. 
Combined, they decide how the data plane is laid out:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>from qdrant_client import QdrantClient, models\n\nclient = QdrantClient(url=\"http:\/\/localhost:6333\",\n                      api_key=\"cfg-lab-cluster-key-2026\")\n\nclient.create_collection(\n    \"articles\",\n    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),\n    shard_number=6,\n    replication_factor=2,\n    write_consistency_factor=2,   # equals replication_factor: both replicas must ack\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Push 5,000 BGE-small embeddings into the new collection (the companion repo has the loader script). Then ask each pod for its local shard distribution:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>for POD in qdrant-0 qdrant-1 qdrant-2; do\n  IP=$(kubectl get pod -n qdrant \"$POD\" -o jsonpath='{.status.podIP}')\n  curl -sS \"http:\/\/${IP}:6333\/collections\/articles\/cluster\" \\\n    -H \"api-key: cfg-lab-cluster-key-2026\" \\\n    | jq -c '{peer_id: .result.peer_id,\n              local_shards: [.result.local_shards[].shard_id]}'\ndone<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The output, summarized per pod and per shard, shows the placement we expected. Each shard appears on exactly two peers:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>qdrant-0 local: shards [1, 2, 4, 5]   peer 6466783665211289\nqdrant-1 local: shards [0, 1, 3, 4]   peer 2653139728083735\nqdrant-2 local: shards [0, 2, 3, 5]   peer 1945335202684494\n\n  shard 0: qdrant-1, qdrant-2\n  shard 1: qdrant-0, qdrant-1\n  shard 2: qdrant-0, qdrant-2\n  shard 3: qdrant-1, qdrant-2\n  shard 4: qdrant-0, qdrant-1\n  shard 5: qdrant-0, qdrant-2<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">That distribution survives any single peer going down: every shard has a second copy on a different peer, so the data is reachable. 
The headless service plus the chart&#8217;s automatic peer-to-peer routing means a query sent to any pod transparently reaches whatever shards it needs.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"920\" height=\"800\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-cluster-raft-shards.png\" alt=\"Qdrant raft consensus state and 6-shard distribution across 3 pods\" class=\"wp-image-168099\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-cluster-raft-shards.png 920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-cluster-raft-shards-300x261.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-cluster-raft-shards-768x668.png 768w\" sizes=\"auto, (max-width: 920px) 100vw, 920px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">HA in practice: force-delete a pod under load<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The point of replication factor 2 is that one peer can disappear without losing data or returning errors. 
Test it by killing a pod with prejudice (<code>--grace-period=0 --force<\/code>) while a tight query loop hammers the service:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code># Run 60 queries in a tight loop, ~0.2s apart\nfor i in $(seq 1 60); do\n    curl -sS -o \/dev\/null -w '%{http_code} ' --max-time 5 \\\n        http:\/\/localhost:6333\/collections\/articles\/points\/query \\\n        -H \"api-key: $KEY\" -H \"Content-Type: application\/json\" -d \"$BODY\"\n    sleep 0.2\ndone &amp;\n\n# In parallel: kill qdrant-2 hard\nkubectl delete pod qdrant-2 -n qdrant --grace-period=0 --force<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The real run on the test cluster recorded 60 successful queries during the outage, zero failures:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>== Disaster: kill qdrant-2 pod ==\n16:37:03.972 START\n16:37:03.972 pod deletion command sent\n   pod \"qdrant-2\" force deleted from qdrant namespace\n16:37:16.588 END  60 queries during the outage\n  during failure: ok=60  fail=0\n  status codes: 200: 60\n\n== State after the failure ==\nqdrant-0   1\/1   Running   0\nqdrant-1   1\/1   Running   2\nqdrant-2   0\/1   Running   0    # restarting, raft re-joining\n(8s later)\nqdrant-2   1\/1   Running   0    # back, shards re-synced<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The key behaviour: the StatefulSet recreated qdrant-2 in 8 seconds, the pod&#8217;s PVC was reattached (so the local segments were not lost), raft re-joined as a Follower, and shard state caught up automatically. 
All while queries kept flowing through qdrant-0 and qdrant-1.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"920\" height=\"800\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-pod-failure-zero-downtime.png\" alt=\"Qdrant 60 queries succeed during force-delete of one pod, zero downtime\" class=\"wp-image-168100\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-pod-failure-zero-downtime.png 920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-pod-failure-zero-downtime-300x261.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-pod-failure-zero-downtime-768x668.png 768w\" sizes=\"auto, (max-width: 920px) 100vw, 920px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">If you run the same probe from outside the cluster via <code>kubectl port-forward<\/code>, the picture is different. The port-forward attaches to one specific pod; when that pod is the one you kill, the tunnel breaks and you see <code>connection refused<\/code> until you restart the port-forward. That is a port-forward artifact, not a cluster artifact. In production, the traffic enters through a LoadBalancer or Ingress that routes only to ready endpoints, which is what a loop run inside the cluster against the <code>qdrant<\/code> service simulates.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Rolling upgrade with zero downtime<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The StatefulSet is updated with the <code>RollingUpdate<\/code> strategy: one pod at a time, highest ordinal first, each one Ready before the next is touched. Combine that with replication factor 2 and the rest of the cluster stays available throughout. 
To test, start a continuous in-cluster query loop:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>kubectl run -n qdrant query-loop --image=curlimages\/curl --restart=Never -- \\\n  sh -c '\n    end=$(($(date +%s) + 600))\n    while [ $(date +%s) -lt $end ]; do\n      code=$(curl -sS -o \/dev\/null -w \"%{http_code}\" --max-time 5 \\\n        http:\/\/qdrant:6333\/collections\/articles \\\n        -H \"api-key: cfg-lab-cluster-key-2026\")\n      echo \"$(date \"+%H:%M:%S\") $code\"\n      sleep 0.5\n    done\n  '\n\n# Tail the logs in another shell\nkubectl logs -n qdrant -f query-loop &gt; \/tmp\/loop.log &amp;<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Then trigger the upgrade:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>helm upgrade qdrant qdrant\/qdrant -n qdrant --reuse-values \\\n    --set image.tag=v1.18.0\nkubectl rollout status statefulset\/qdrant -n qdrant<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The rollout takes about 21 seconds per pod (terminate, image pull is cached, start, raft re-join, become ready) for a total of ~64 seconds across all three. The in-cluster loop captured 128 queries during that window. 
Every single one returned HTTP 200:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>=== Results ===\nTotal queries: 128\nHTTP 200    : 128\nNon-200 codes: (none)\n\n=== Image now: docker.io\/qdrant\/qdrant:v1.18.0 ===\nqdrant-0   1\/1   Running   0   21s   cfg-qdrant-k3s-1242\nqdrant-1   1\/1   Running   0   38s   cfg-qdrant-k3s-1241\nqdrant-2   1\/1   Running   0   55s   cfg-qdrant-k3s-1243<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The continuous-query loop alongside the rollout output makes the zero-downtime claim auditable on a single terminal:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"920\" height=\"800\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-rolling-upgrade-zero-downtime.png\" alt=\"Qdrant rolling upgrade 128 of 128 queries returned HTTP 200\" class=\"wp-image-168101\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-rolling-upgrade-zero-downtime.png 920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-rolling-upgrade-zero-downtime-300x261.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/05\/wm-qdrant-rolling-upgrade-zero-downtime-768x668.png 768w\" sizes=\"auto, (max-width: 920px) 100vw, 920px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Roll back with the same command and the opposite tag. The chart re-uses the PVCs across upgrades so the on-disk state survives every restart. That is what makes the StatefulSet model fit Qdrant cleanly: peer identity is stable (<code>qdrant-0<\/code> always rebinds the same PVC), so rejoining the cluster is a no-op from raft&#8217;s perspective.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Scaling: more peers, more shards<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Scale the StatefulSet up to add peers. 
The chart re-renders with the new <code>replicaCount<\/code> and the StatefulSet controller adds pods one at a time. Each new pod attaches its own PVC, joins raft, and becomes available for new shard placements:<\/p>\n\n\n\n<pre class=\"wp-block-code code\"><code>helm upgrade qdrant qdrant\/qdrant -n qdrant --reuse-values \\\n    --set replicaCount=5\nkubectl rollout status statefulset\/qdrant -n qdrant<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Existing shards do not automatically rebalance onto the new peers. To move shards, use the <code>\/collections\/{name}\/cluster<\/code> endpoint with a <code>move_shard<\/code> operation, which copies a shard to a new peer, waits for sync, then drops the old copy. The chart does not automate this because shard moves are heavy: a 10 GB shard takes minutes to copy and saturates the network.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Scaling down is the inverse: drop replicas with <code>helm upgrade --set replicaCount=N<\/code>, but first move shards off the peers being removed. Removing peers that still hold shards loses data whenever every replica of some shard lives on the removed peers, which becomes possible as soon as you drop <code>replication_factor<\/code> or more peers at once.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Gotchas worth remembering<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Five real traps from this build:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Distroless image has no <code>curl<\/code> or shell.<\/strong> The Qdrant container exposes the binary only. <code>kubectl exec qdrant-0 -- curl<\/code> fails with <code>exec: \"curl\": executable file not found<\/code>. Use <code>kubectl port-forward<\/code> from a node, or <code>kubectl run --image=curlimages\/curl<\/code> for in-cluster probes.<\/li><li><strong><code>kubectl port-forward svc\/qdrant<\/code> attaches to one pod, not the service.<\/strong> When that pod is killed during a rolling upgrade or failure test, the tunnel breaks and you see <code>connection refused<\/code> until you reconnect. 
Use an Ingress\/LoadBalancer for real client traffic, or an in-cluster loop for HA testing.<\/li><li><strong>qdrant-1 restarts twice on first cluster bootstrap.<\/strong> Peer 2 starts before peer 1&#8217;s raft has stabilized; CrashLoopBackOff for ~30s is the normal path, not a bug. Wait for it.<\/li><li><strong>Shards do not rebalance automatically when you add peers.<\/strong> The new pods join raft and serve writes, but existing shards stay where they are until you move them explicitly. Plan shard moves during low-traffic windows because the copy saturates the network.<\/li><li><strong>local-path PVCs are tied to a specific node.<\/strong> If <code>cfg-qdrant-k3s-1242<\/code> dies and never comes back, <code>qdrant-0<\/code>&#8216;s PVC is orphaned and the pod stays Pending. On managed clouds (EBS, PD), the PVC moves with the pod. On bare metal, use a network storage class like Longhorn, OpenEBS, or Rook-Ceph for portable PVCs.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">From here to a production cluster<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">What we built is a working baseline. Three concrete next steps if you take this to production:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Front the chart&#8217;s ClusterIP service with an Ingress (TLS) or a LoadBalancer.<\/strong> The api-key travels in headers; serve it over HTTPS only. 
The patterns from the secure-qdrant-tls-nginx guide apply unchanged to a Kubernetes deployment when you put cert-manager in front.<\/li><li><strong>Pair the chart&#8217;s PVCs with a network storage class.<\/strong> Local-path on a single VM is fine for a lab; for real workloads use EBS (EKS), PD (GKE), or Longhorn on bare metal so a node loss does not strand a PVC.<\/li><li><strong>Wire the snapshot backup process from the previous guide to a CronJob.<\/strong> The Qdrant snapshot endpoint works identically in a cluster (it captures per-peer state), and a Kubernetes CronJob with an AWS or GCS credential makes the backup completely declarative.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The two HA experiments above (force-delete and rolling upgrade) are the regression test for whether the cluster&#8217;s safety properties still hold. Run them after every chart upgrade, every Kubernetes upgrade, and every cluster topology change. A 60-query loop and a force-deleted pod are a small price to pay for confidence that the next real incident will look like the test. To put a real ingress in front of the chart&#8217;s ClusterIP, the <a href=\"https:\/\/computingforgeeks.com\/install-nginx-ingress-kubernetes\/\">Nginx Ingress on Kubernetes<\/a> walkthrough covers the cert-manager piece and the annotations Qdrant&#8217;s API expects.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A single Qdrant pod handles plenty of throughput, but it has the same failure profile as any single Postgres or Redis: one bad disk, one OOM kill, one unattended kernel panic, and the service is down. 
Production deployments solve this the same way they solve it for other stateful databases: multiple nodes, replicated shards, and &#8230; <a title=\"Multi-node Qdrant Cluster on Kubernetes\" class=\"read-more\" href=\"https:\/\/computingforgeeks.com\/qdrant-kubernetes-cluster\/\" aria-label=\"Read more about Multi-node Qdrant Cluster on Kubernetes\">Read more<\/a><\/p>\n","protected":false},"author":3,"featured_media":168102,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[39034,316,461,35913,317,50],"tags":[17245,218,324,669],"cfg_series":[39865],"class_list":["post-168103","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-containers","category-databases","category-devops","category-kubernetes","category-linux-tutorials","tag-ai","tag-containers","tag-databases","tag-dev","cfg_series-qdrant-mastery"],"_links":{"self":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/comments?post=168103"}],"version-history":[{"count":1,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168103\/revisions"}],"predecessor-version":[{"id":168125,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/168103\/revisions\/168125"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media\/168102"}],"wp:attachment":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media?parent=168103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/computingforgeeks.c
om\/wp-json\/wp\/v2\/categories?post=168103"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/tags?post=168103"},{"taxonomy":"cfg_series","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/cfg_series?post=168103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}