Skip to content

fix(pooler): clear stale TLS status fields when configuration is removed#9397

Merged
leonardoce merged 3 commits intomainfrom
dev/9392
Dec 15, 2025
Merged

fix(pooler): clear stale TLS status fields when configuration is removed#9397
leonardoce merged 3 commits intomainfrom
dev/9392

Conversation

@mnencia
Copy link
Member

@mnencia mnencia commented Dec 10, 2025

fix(pooler): clear stale TLS status fields when configuration is removed

The pooler status update logic only set status fields when resources were
present, but never cleared them when absent. This caused stale TLS
configuration (ServerCA, ClientCA, ClientTLS, ServerTLS) to persist in
status even after being removed from the spec, breaking pooler connectivity.

This particularly impacts users upgrading to v1.28 from any prior version.
Before v1.28, the ServerTLS status field was always set to the cluster's
server certificate. In v1.28, this field was repurposed for optional manual
TLS authentication to PostgreSQL. On upgrade, the stale pre-v1.28 value
persists, causing pgbouncer to use the wrong certificate and fail with
"ssl/tls alert unsupported certificate" errors. This breaks all application
connectivity through the pooler.

The same issue affects any user who configures custom TLS settings and later
removes them - pgbouncer continues using the stale certificates until manually
intervened.

This fix adds explicit clearing of optional TLS status fields when the
corresponding resources are not configured, ensuring status accurately reflects
the current specification.

Closes #9392

@mnencia mnencia requested review from a team, NiccoloFei, jsilvela and litaocdl as code owners December 10, 2025 11:20
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Dec 10, 2025
@cnpg-bot cnpg-bot added backport-requested ◀️ This pull request should be backported to all supported releases release-1.25 release-1.26 release-1.27 labels Dec 10, 2025
@github-actions
Copy link
Contributor

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this pr, remove the label: backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this pr to a certain release branch, remove the specific branch label: release-x.y

@dosubot dosubot bot added bug 🐛 Something isn't working ok to merge 👌 This PR can be merged labels Dec 10, 2025
@mnencia
Copy link
Member Author

mnencia commented Dec 10, 2025

/test

@github-actions
Copy link
Contributor

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20097122053

@armru
Copy link
Member

armru commented Dec 10, 2025

/test

@github-actions
Copy link
Contributor

@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20101357278

Copilot AI review requested due to automatic review settings December 10, 2025 15:40
@mnencia
Copy link
Member Author

mnencia commented Dec 10, 2025

/test limit=local ft=upgrade tl=5

@github-actions
Copy link
Contributor

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20104344326

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical issue where stale ServerTLS status from v1.27 causes pgbouncer to use incorrect certificates after upgrading to v1.28, resulting in "unsupported certificate" errors. In v1.27, ServerTLS was always set to the cluster's server certificate, but in v1.28 it was repurposed for optional manual TLS authentication to PostgreSQL.

  • Clears stale ServerTLS (and other TLS/CA secret) status fields when the corresponding secrets are not configured
  • Adds comprehensive unit test for the v1.27 to v1.28 migration scenario
  • Adds E2E test to verify pooler connectivity after operator upgrade
  • Fixes test helper to use actual database name instead of hardcoded "app"

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
internal/controller/pooler_status.go Adds else clauses to clear ServerCA, ClientCA, ClientTLS, and ServerTLS status when secrets are nil, with clear migration comments for ServerTLS
internal/controller/pooler_status_test.go Adds unit test simulating v1.27 pooler with stale ServerTLS status being cleared on reconciliation
tests/e2e/upgrade_test.go Adds verification that pooler connections work after operator upgrade
tests/e2e/asserts_test.go Fixes test helper to dynamically retrieve database name from cluster config instead of hardcoding "app"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mnencia
Copy link
Member Author

mnencia commented Dec 10, 2025

/test

@github-actions
Copy link
Contributor

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20105931475

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 11, 2025
@mnencia mnencia force-pushed the dev/9392 branch 2 times, most recently from daa14db to f168f72 Compare December 14, 2025 22:10
@mnencia mnencia changed the title fix(pooler): clear stale ServerTLS status on upgrade from v1.27 fix(pooler): clear stale TLS status fields when configuration is removed Dec 14, 2025
mnencia and others added 3 commits December 15, 2025 10:04
In v1.27, ServerTLS status field was always set to the cluster's
server certificate. In v1.28, this field was repurposed for optional
manual TLS authentication to PostgreSQL.

On upgrade, the stale value causes pgbouncer to use the wrong
certificate, resulting in "unsupported certificate" errors.

This fix clears the field when manual TLS is not configured.

Closes #9392

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
The test was hardcoded to use "app" as database name, but clusters
can be configured with different names (e.g., "appdb" in upgrade tests).

Now retrieves the database name from cluster configuration.

Closes #9392

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
@leonardoce leonardoce merged commit 0ffc5bf into main Dec 15, 2025
35 of 38 checks passed
@leonardoce leonardoce deleted the dev/9392 branch December 15, 2025 09:18
cnpg-bot pushed a commit that referenced this pull request Dec 15, 2025
…ved (#9397)

The pooler status update logic only set status fields when resources
were present, but never cleared them when absent. This caused stale TLS
configuration (ServerCA, ClientCA, ClientTLS, ServerTLS) to persist in
status even after being removed from the spec, breaking pooler
connectivity.

This particularly impacts users upgrading to v1.28 from any prior
version.
Before v1.28, the ServerTLS status field was always set to the cluster's
server certificate. In v1.28, this field was repurposed for optional
manual
TLS authentication to PostgreSQL. On upgrade, the stale pre-v1.28 value
persists, causing pgbouncer to use the wrong certificate and fail with
"ssl/tls alert unsupported certificate" errors. This breaks all
application
connectivity through the pooler.

The same issue affects any user who configures custom TLS settings and
later removes them - pgbouncer continues using the stale certificates until
manually intervened.

This fix adds explicit clearing of optional TLS status fields when the
corresponding resources are not configured, ensuring status accurately
reflects
the current specification.

Closes #9392

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
(cherry picked from commit 0ffc5bf)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-requested ◀️ This pull request should be backported to all supported releases bug 🐛 Something isn't working lgtm This PR has been approved by a maintainer ok to merge 👌 This PR can be merged release-1.28 size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Pooler connection refused when upgrading to 1.28.0

6 participants