
[Bug]: Former master cannot perform pg_rewind after failover if plugin is used for WAL archiving #9067

@Endevir

Description


Is there an existing issue already for this bug?

  • I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • I am running a supported version of CloudNativePG and I think this is a new bug.

Contact Details

No response

Version

1.27 (latest patch)

What version of Kubernetes are you using?

1.34

What is your Kubernetes environment?

Self-managed: kind (evaluation)

How did you install the operator?

YAML manifest

What happened?

Hi!
While testing operator behavior under sudden network partitioning, I encountered a situation where the former primary fails to perform pg_rewind after network connectivity is restored, so the former primary is unable to catch up and rejoin the cluster as a replica.

This only occurs when plugins are used for WAL archiving/restoration.

Precise steps for reproduction:

  1. Create a 3-node cluster with synchronous replication in kind
  2. Create a pgbench deployment to put constant load on the cluster
  3. Disconnect the kind node hosting the current primary (e.g. docker network disconnect kind cnpg-worker3)
  4. Wait until failover is performed
  5. Reconnect the kind node hosting the former primary (e.g. docker network connect kind cnpg-worker3)
  6. Observe that the former primary fails to perform pg_rewind with --restore-target-wal because it is unable to fetch WAL files

pg_rewind invokes manager wal-restore with the FULL path to the target file (unlike normal archive recovery, where %p is passed relative to the data directory):

/controller/manager wal-restore --log-destination /controller/log/postgres.json 00000001000000010000005F /var/lib/postgresql/data/pgdata/pg_wal/00000001000000010000005F

which is not handled correctly at

walFound, err := restoreWALViaPlugins(ctx, cluster, walName, path.Join(pgData, destinationPath))

so the gRPC call to the plugin is made with the PGDATA part of the restore file path duplicated:

/var/lib/postgresql/data/pgdata/var/lib/postgresql/data/pgdata/pg_wal/00000001000000010000005F
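The duplication follows directly from an unconditional path.Join: since pg_rewind already hands the manager an absolute destination (as in the command above), prefixing it with pgData concatenates the two. A minimal sketch of a guard, using a hypothetical helper name resolveWALDestination that is not from the CloudNativePG codebase:

```go
package main

import (
	"fmt"
	"path"
)

// resolveWALDestination illustrates the fix: only prefix pgData when the
// destination passed to wal-restore is relative; an absolute destination
// (what pg_rewind passes with --restore-target-wal) is used as-is.
func resolveWALDestination(pgData, destinationPath string) string {
	if path.IsAbs(destinationPath) {
		return destinationPath
	}
	return path.Join(pgData, destinationPath)
}

func main() {
	pgData := "/var/lib/postgresql/data/pgdata"

	// Relative destination, as postgres passes during archive recovery.
	fmt.Println(resolveWALDestination(pgData, "pg_wal/00000001000000010000005F"))

	// Absolute destination, as pg_rewind passes it.
	fmt.Println(resolveWALDestination(pgData, pgData+"/pg_wal/00000001000000010000005F"))

	// The unconditional join reproduces the duplicated path from the plugin logs.
	fmt.Println(path.Join(pgData, pgData+"/pg_wal/00000001000000010000005F"))
}
```

Running this prints the correct path twice, then the duplicated /var/lib/postgresql/data/pgdata/var/lib/postgresql/data/pgdata/pg_wal/00000001000000010000005F seen in the plugin logs below.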

Cluster resource

Cluster:


apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  plugins:
  - name: cnpg-extensions.yandex.cloud      # Should always be "cnpg-extensions.yandex.cloud" for wal-g plugin
    isWALArchiver: true                     # Set to true to enable continuous WAL archiving
    parameters:
      backupConfig: example-backup-config   # Name of BackupConfig used to create backups
  postgresql:
    parameters:
      max_connections: "250"
      shared_buffers: 4GB
      effective_cache_size: 12GB
      maintenance_work_mem: 1GB
      checkpoint_completion_target: "0.9"
      wal_buffers: 16MB
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      work_mem: 16513kB
      huge_pages: "off"
      min_wal_size: 1GB
      max_wal_size: 4GB
      max_worker_processes: "4"
      max_parallel_workers_per_gather: "2"
      max_parallel_workers: "4"
      max_parallel_maintenance_workers: "2"
    synchronous:
      method: any
      number: 1
      dataDurability: required
  storage:
    size: 1Gi


PGBench deployment:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbench-runner-cnpg
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pgbench-runner-cnpg
  template:
    metadata:
      labels:
        app: pgbench-runner-cnpg
    spec:
      restartPolicy: Always  # important to avoid infinite restart of completed pods
      containers:
        - name: pgbench
          image: postgres:17
          resources:
            requests:
              cpu: "10m"
              memory: "128Mi"
            limits: {}
          env:
            - name: PGHOST
              valueFrom:
                secretKeyRef:
                  name: cluster-example-app
                  key: host
            - name: PGPORT
              valueFrom:
                secretKeyRef:
                  name: cluster-example-app
                  key: port
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: cluster-example-app
                  key: user
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: cluster-example-app
                  key: password
            - name: PGDATABASE
              valueFrom:
                secretKeyRef:
                  name: cluster-example-app
                  key: dbname
          command: ["sh", "-c"]
          args:
            - |
              echo "Initializing benchmark database if needed...";
              pgbench -i -s 60 -d $PGDATABASE || true;
              echo "Running pgbench...";
              pgbench -j 6 -c 60 -b tpcb-like --time 600 -d $PGDATABASE

Relevant log output

Postgres instance logs:


{"level":"info","ts":"2025-11-05T15:32:38.4179151Z","msg":"Starting up pg_rewind","logger":"instance-manager","logging_pod":"cluster-separate-wal-storage-1","controller":"instance-cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"cluster-separate-wal-storage","namespace":"default"},"namespace":"default","name":"cluster-separate-wal-storage","reconcileID":"9063eeb8-8b27-45ec-a1cf-45305a5d63ff","logging_pod":"cluster-separate-wal-storage-1","pgdata":"/var/lib/postgresql/data/pgdata","options":["-P","--source-server","host=cluster-separate-wal-storage-rw user=streaming_replica port=5432 sslkey=/controller/certificates/streaming_replica.key sslcert=/controller/certificates/streaming_replica.crt sslrootcert=/controller/certificates/server-ca.crt application_name=cluster-separate-wal-storage-1 sslmode=verify-ca dbname=postgres","--target-pgdata","/var/lib/postgresql/data/pgdata","--restore-target-wal"]}

{"level":"info","ts":"2025-11-05T15:32:38.46084317Z","logger":"pg_rewind","msg":"pg_rewind: connected to server","pipe":"stderr","logging_pod":"cluster-separate-wal-storage-1"}

{"level":"info","ts":"2025-11-05T15:32:38.463751188Z","logger":"pg_rewind","msg":"pg_rewind: servers diverged at WAL location 1/7F1322D0 on timeline 1","pipe":"stderr","logging_pod":"cluster-separate-wal-storage-1"}

{"level":"info","ts":"2025-11-05T15:32:39.396083475Z","logger":"pg_rewind","msg":"pg_rewind: error: could not restore file \"00000001000000010000005F\" from archive","pipe":"stderr","logging_pod":"cluster-separate-wal-storage-1"}

{"level":"info","ts":"2025-11-05T15:32:39.396124895Z","logger":"pg_rewind","msg":"pg_rewind: error: could not find previous WAL record at 1/5FFFE728","pipe":"stderr","logging_pod":"cluster-separate-wal-storage-1"}

{"level":"error","ts":"2025-11-05T15:32:39.398028104Z","msg":"Failed to execute pg_rewind","logger":"instance-manager","logging_pod":"cluster-separate-wal-storage-1","controller":"instance-cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"cluster-separate-wal-storage","namespace":"default"},"namespace":"default","name":"cluster-separate-wal-storage","reconcileID":"9063eeb8-8b27-45ec-a1cf-45305a5d63ff","logging_pod":"cluster-separate-wal-storage-1","options":["-P","--source-server","host=cluster-separate-wal-storage-rw user=streaming_replica port=5432 sslkey=/controller/certificates/streaming_replica.key sslcert=/controller/certificates/streaming_replica.crt sslrootcert=/controller/certificates/server-ca.crt application_name=cluster-separate-wal-storage-1 sslmode=verify-ca dbname=postgres","--target-pgdata","/var/lib/postgresql/data/pgdata","--restore-target-wal"],"error":"exit status 1","stacktrace":"github.com/cloudnative-pg/machinery/pkg/log.(*logger).Error\n\tpkg/mod/github.com/cloudnative-pg/machinery@v0.3.1/pkg/log/log.go:125\ngithub.com/cloudnative-pg/cloudnative-pg/pkg/management/postgres.(*Instance).Rewind\n\tpkg/management/postgres/instance.go:1165\ngithub.com/cloudnative-pg/cloudnative-pg/internal/management/controller.(*InstanceReconciler).verifyPgDataCoherenceForPrimary\n\tinternal/management/controller/instance_startup.go:140\ngithub.com/cloudnative-pg/cloudnative-pg/internal/management/controller.(*InstanceReconciler).initialize\n\tinternal/management/controller/instance_controller.go:457\ngithub.com/cloudnative-pg/cloudnative-pg/internal/management/controller.(*InstanceReconciler).Reconcile\n\tinternal/management/controller/instance_controller.go:193\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:202"}



Plugin logs (using cnpg-plugin-wal-g):

{"level":"error","ts":"2025-11-05T15:32:39Z","logger":"plugin_wal","msg":"Failed to run wal-g fetch 00000001000000010000005F /var/lib/postgresql/data/pgdata/var/lib/postgresql/data/pgdata/pg_wal/00000001000000010000005F","method":"Restore","stdout":"","stderr":"ERROR: 2025/11/05 15:32:39.100804 open /var/lib/postgresql/data/pgdata/var/lib/postgresql/data/pgdata/pg_wal/00000001000000010000005F: no such file or directory\n","error":"subprocess run cmd result exit status 1"}

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels

bug 🐛 Something isn't working

Status

Done
