Description
Is there an existing issue already for this bug?
- I have searched for an existing issue, and could not find anything. I believe this is a new bug.
I have read the troubleshooting guide
- I have read the troubleshooting guide and I think this is a new bug.
I am running a supported version of CloudNativePG
- I am running a supported version of CloudNativePG and I think this is a new bug.
Contact Details
No response
Version
1.27 (latest patch)
What version of Kubernetes are you using?
1.33
What is your Kubernetes environment?
Other
How did you install the operator?
YAML manifest
What happened?
After migrating from CNPG 1.26 to 1.27 I see the following:
┌─────────────────────────────────────────────────────────────────────────────── Pods(namespace)[9] ───────────────────────────────────────────────────────────────────────────────┐
│ NAME↑ PF READY STATUS RESTARTS IP NODE AGE │
│ postgresql-cluster-1 ● 1/1 Running 0 IP1 host1.net 7d2h │
│ postgresql-cluster-2 ● 1/1 Running 0 IP2 host2.net 7d2h │
│ postgresql-cluster-3 ● 1/1 Running 0 IP3 host3.net 52m │
Looking inside the pod specs, postgresql-cluster-3 is running the 1.27 image, while the others (postgresql-cluster-1 and postgresql-cluster-2) are still on 1.26.
The cluster phase is stuck in "Waiting for the instances to become active".
Judging by the operator log, it can't proceed any further because of a PostgreSQL config mismatch between the pods. Config diff:
➜ ~ diff pg3 pg2
5d4
< cnpg.synchronous_standby_names_metadata = '{"method":"ANY","number":1,"standbyNames":["postgresql-cluster-2","postgresql-cluster-3"]}'
Looking at the config diff, it seems the configurations will never match, since cnpg.synchronous_standby_names_metadata was only introduced in 1.27.
So the only workaround is either to delete the remaining 1.26 pods by hand, or to remove the following settings from the cluster spec:
maxSyncReplicas: 1
minSyncReplicas: 1
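For reference, the extra GUC that 1.27 writes is a JSON document describing the synchronous replication quorum. A minimal sketch of decoding it (the helper function below is hypothetical, for illustration only; the JSON value itself is taken verbatim from the diff above):

```python
import json

# Value of cnpg.synchronous_standby_names_metadata from the config diff above
raw = '{"method":"ANY","number":1,"standbyNames":["postgresql-cluster-2","postgresql-cluster-3"]}'

def to_synchronous_standby_names(metadata: str) -> str:
    """Render the equivalent PostgreSQL synchronous_standby_names value.

    Hypothetical helper; identifier quoting is simplified for illustration.
    """
    m = json.loads(metadata)
    standbys = ", ".join(f'"{name}"' for name in m["standbyNames"])
    return f'{m["method"]} {m["number"]} ({standbys})'

print(to_synchronous_standby_names(raw))
# ANY 1 ("postgresql-cluster-2", "postgresql-cluster-3")
```

In other words, the GUC only mirrors the quorum that maxSyncReplicas: 1 / minSyncReplicas: 1 already produces, which is why removing those settings makes the configs comparable again.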
Cluster resource
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  creationTimestamp: "2025-05-29T09:37:55Z"
  generation: 3
  labels:
    app.kubernetes.io/name: postgresql-cluster
  name: postgresql-cluster
  namespace: NS
  resourceVersion: "433189235"
  uid: 7ab5fa98-8e5b-4fb1-82a8-6196027bd65e
spec:
  affinity:
    additionalPodAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: cnpg.io/podRole
            operator: In
            values:
            - instance
        topologyKey: host.dev/rack
    enablePodAntiAffinity: false
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: host.dev/rack
            operator: Exists
  backup:
    barmanObjectStore:
      data:
        compression: gzip
        encryption: AES256
        jobs: 2
      destinationPath: s3://bucket
      endpointCA:
        key: ca.crt
        name: name
      endpointURL: https://s3.net
      s3Credentials:
        accessKeyId:
          key: keyId
          name: s3-creds
        secretAccessKey:
          key: secret
          name: s3-creds
      wal:
        compression: gzip
        encryption: AES256
        maxParallel: 2
    retentionPolicy: 7d
    target: prefer-standby
  bootstrap:
    recovery:
      database: DB
      owner: OWNER
      recoveryTarget:
        backupID: 20250527T220200
        targetTime: "2025-05-28 10:36:00.000000"
      secret:
        name: bootstrap-secret
      source: cluster
  certificates:
    serverCASecret: CA
    serverTLSSecret: SECRET
  description: DESC
  enablePDB: true
  enableSuperuserAccess: false
  externalClusters:
  - barmanObjectStore:
      destinationPath: s3://BUCKET
      endpointCA:
        key: ca.crt
        name: NAME
      endpointURL: https://s3.net
      s3Credentials:
        accessKeyId:
          key: keyId
          name: s3-creds
        secretAccessKey:
          key: secret
          name: s3-creds
      wal:
        maxParallel: 8
    name: postgresql-cluster
  failoverDelay: 0
  imageName: pg16_image
  instances: 3
  logLevel: trace
  maxSyncReplicas: 1
  minSyncReplicas: 1
  monitoring:
    customQueriesConfigMap:
    - key: queries
      name: cnpg-default-monitoring
    disableDefaultQueries: false
    enablePodMonitor: false
  postgresGID: ...
  postgresUID: ...
  postgresql:
    parameters:
      archive_mode: "on"
      archive_timeout: 5min
      dynamic_shared_memory_type: posix
      full_page_writes: "on"
      hot_standby_feedback: "on"
      log_destination: csvlog
      log_directory: /controller/log
      log_filename: postgres
      log_rotation_age: "0"
      log_rotation_size: "0"
      log_truncate_on_rotation: "false"
      logging_collector: "on"
      max_connections: "400"
      max_parallel_workers: "32"
      max_replication_slots: "32"
      max_worker_processes: "32"
      password_encryption: scram-sha-256
      pg_failover_slots.drop_extra_slots: "on"
      pg_failover_slots.synchronize_slot_names: name_like:%
      pg_failover_slots.worker_nap_time: "60000"
      pg_stat_statements.max: "10000"
      pg_stat_statements.track: all
      pg_stat_statements.track_utility: "off"
      shared_memory_type: mmap
      shared_preload_libraries: ""
      ssl_max_protocol_version: TLSv1.3
      ssl_min_protocol_version: TLSv1.3
      wal_keep_size: 512MB
      wal_level: logical
      wal_log_hints: "on"
      wal_receiver_timeout: 5s
      wal_sender_timeout: 5s
    pg_hba:
    - hostssl support streaming_replica all cert
    syncReplicaElectionConstraint:
      enabled: true
      nodeLabelsAntiAffinity:
      - host.net/rack
  primaryUpdateMethod: switchover
  primaryUpdateStrategy: unsupervised
  probes:
    liveness:
      isolationCheck:
        connectionTimeout: 1000
        enabled: true
        requestTimeout: 1000
  replicationSlots:
    highAvailability:
      enabled: true
      slotPrefix: _cnpg_
    synchronizeReplicas:
      enabled: true
    updateInterval: 30
  resources:
    limits:
      cpu: "2"
      memory: 7Gi
    requests:
      cpu: 1800m
      memory: 7Gi
  smartShutdownTimeout: 180
  startDelay: 3600
  stopDelay: 1800
  storage:
    resizeInUseVolumes: true
    size: 10Gi
    storageClass: CLASS
  switchoverDelay: 3600
status:
  availableArchitectures:
  - goArch: amd64
    hash: 0ac9a3dc1e7e0122ae5a89f03626ad5cbf3ba637a1232a6f95576d4801a043d9
  certificates:
    clientCASecret: CA
    expirations:
      some1: 2025-10-26 09:37:51 +0000 UTC
      some2: 2025-08-27 09:32:56 +0000 UTC
      some3: 2025-08-27 09:32:56 +0000 UTC
      some4: 2029-05-02 08:00:00 +0000 UTC
    replicationTLSSecret: secret
    serverAltDNSNames:
    - postgresql-cluster-rw
    - postgresql-cluster-rw.NAMESPACE
    - postgresql-cluster-rw.NAMESPACE.svc
    - postgresql-cluster-rw.NAMESPACE.svc.cluster.local
    - postgresql-cluster-r
    - postgresql-cluster-r.NAMESPACE
    - postgresql-cluster-r.NAMESPACE.svc
    - postgresql-cluster-r.NAMESPACE.svc.cluster.local
    - postgresql-cluster-ro
    - postgresql-cluster-ro.NAMESPACE
    - postgresql-cluster-ro.NAMESPACE.svc
    - postgresql-cluster-ro.NAMESPACE.svc.cluster.local
    serverCASecret: SECRET
    serverTLSSecret: SECRET
  cloudNativePGCommitHash: 1dc9a2909
  cloudNativePGOperatorHash: 0ac9a3dc1e7e0122ae5a89f03626ad5cbf3ba637a1232a6f95576d4801a043d9
  conditions:
  - lastTransitionTime: "2025-08-14T12:44:47Z"
    message: Cluster Is Not Ready
    reason: ClusterIsNotReady
    status: "False"
    type: Ready
  - lastTransitionTime: "2025-08-08T13:54:53Z"
    message: Continuous archiving is working
    reason: ContinuousArchivingSuccess
    status: "True"
    type: ContinuousArchiving
  - lastTransitionTime: "2025-08-14T22:02:56Z"
    message: Backup was successful
    reason: LastBackupSucceeded
    status: "True"
    type: LastBackupSucceeded
  - lastTransitionTime: "2025-08-14T12:44:42Z"
    message: A single, unique system ID was found across reporting instances.
    reason: Unique
    status: "True"
    type: ConsistentSystemID
  configMapResourceVersion:
    metrics:
      cnpg-default-monitoring: "411429773"
  currentPrimary: postgresql-cluster-1
  currentPrimaryTimestamp: "2025-08-08T13:54:52.003603Z"
  firstRecoverabilityPoint: "2025-08-07T22:02:24Z"
  firstRecoverabilityPointByMethod:
    barmanObjectStore: "2025-08-07T22:02:24Z"
  healthyPVC:
  - postgresql-cluster-1
  - postgresql-cluster-2
  - postgresql-cluster-3
  image: PG16_IMG
  instanceNames:
  - postgresql-cluster-1
  - postgresql-cluster-2
  - postgresql-cluster-3
  instances: 3
  instancesReportedState:
    postgresql-cluster-1:
      ip: IP1
      isPrimary: true
      timeLineID: 30
    postgresql-cluster-2:
      ip: IP2
      isPrimary: false
      timeLineID: 30
    postgresql-cluster-3:
      ip: IP3
      isPrimary: false
      timeLineID: 30
  instancesStatus:
    healthy:
    - postgresql-cluster-1
    - postgresql-cluster-2
    - postgresql-cluster-3
  lastSuccessfulBackup: "2025-08-14T22:02:54Z"
  lastSuccessfulBackupByMethod:
    barmanObjectStore: "2025-08-14T22:02:54Z"
  latestGeneratedNode: 3
  managedRolesStatus: {}
  pgDataImageInfo:
    image: PG16_IMG
    majorVersion: 16
  phase: Waiting for the instances to become active
  phaseReason: Some instances are not yet active. Please wait.
  poolerIntegrations:
    pgBouncerIntegration:
      secrets:
      - postgresql-cluster-pooler
  pvcCount: 3
  readService: postgresql-cluster-r
  readyInstances: 3
  secretsResourceVersion:
    applicationSecretVersion: "366149351"
    barmanEndpointCA: "382525586"
    clientCaSecretVersion: "366149447"
    replicationSecretVersion: "366149448"
    serverCaSecretVersion: "382525586"
    serverSecretVersion: "416429560"
  switchReplicaClusterStatus: {}
  systemID: "7412960607476973597"
  targetPrimary: postgresql-cluster-1
  targetPrimaryTimestamp: "2025-08-08T13:54:47.926682Z"
  timelineID: 30
  topology:
    instances:
      postgresql-cluster-1:
        host.dev/rack: rack1
      postgresql-cluster-2:
        host.dev/rack: rack2
      postgresql-cluster-3:
        host.dev/rack: rack3
    nodesUsed: 3
    successfullyExtracted: true
  writeService: postgresql-cluster-rw
Relevant log output
I found the following relevant operator log entries (level=trace):
2025-08-15 15:32:33.784 msg=try getting connection
2025-08-15 15:32:33.784 msg=Reconciliation loop start
2025-08-15 15:32:33.784 msg=Reconciliation loop end
2025-08-15 15:32:33.784 msg=Released logical plugin connection
2025-08-15 15:32:33.784 msg=Waiting for all Pods to have the same PostgreSQL configuration
2025-08-15 15:32:33.784 msg=haven't found any instance to create
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.784 msg=Skipping cluster annotations reconciliation, because they are already present on pod
2025-08-15 15:32:33.784 msg=Skipping cluster label reconciliation, because they are already present on pod
2025-08-15 15:32:33.784 msg=Skipping cluster annotations reconciliation, because they are already present on pod
2025-08-15 15:32:33.784 msg=Skipping cluster label reconciliation, because they are already present on pod
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.784 msg=Skipping reconciliation, no changes to be done
2025-08-15 15:32:33.494 msg=correctly loaded the plugin client
2025-08-15 15:32:33.459 msg=correctly loaded the plugin client
2025-08-15 15:32:33.292 msg=Acquired logical plugin connection
Code of Conduct
- I agree to follow this project's Code of Conduct
Metadata
Assignees
Labels
bug 🐛 Something isn't working
Type
Projects
Status
Done