fix(pooler): clear stale TLS status fields when configuration is removed#9397
fix(pooler): clear stale TLS status fields when configuration is removed#9397leonardoce merged 3 commits intomainfrom
Conversation
|
❗ By default, the pull request is configured to backport to all release branches.
|
|
/test |
|
@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20097122053 |
|
/test |
|
@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20101357278 |
|
/test limit=local ft=upgrade tl=5 |
|
@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20104344326 |
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical issue where stale ServerTLS status from v1.27 causes pgbouncer to use incorrect certificates after upgrading to v1.28, resulting in "unsupported certificate" errors. In v1.27, ServerTLS was always set to the cluster's server certificate, but in v1.28 it was repurposed for optional manual TLS authentication to PostgreSQL.
- Clears stale ServerTLS (and other TLS/CA secret) status fields when the corresponding secrets are not configured
- Adds comprehensive unit test for the v1.27 to v1.28 migration scenario
- Adds E2E test to verify pooler connectivity after operator upgrade
- Fixes test helper to use actual database name instead of hardcoded "app"
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| internal/controller/pooler_status.go | Adds else clauses to clear ServerCA, ClientCA, ClientTLS, and ServerTLS status when secrets are nil, with clear migration comments for ServerTLS |
| internal/controller/pooler_status_test.go | Adds unit test simulating v1.27 pooler with stale ServerTLS status being cleared on reconciliation |
| tests/e2e/upgrade_test.go | Adds verification that pooler connections work after operator upgrade |
| tests/e2e/asserts_test.go | Fixes test helper to dynamically retrieve database name from cluster config instead of hardcoding "app" |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/test |
|
@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20105931475 |
daa14db to
f168f72
Compare
In v1.27, ServerTLS status field was always set to the cluster's server certificate. In v1.28, this field was repurposed for optional manual TLS authentication to PostgreSQL. On upgrade, the stale value causes pgbouncer to use the wrong certificate, resulting in "unsupported certificate" errors. This fix clears the field when manual TLS is not configured. Closes #9392 Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
The test was hardcoded to use "app" as database name, but clusters can be configured with different names (e.g., "appdb" in upgrade tests). Now retrieves the database name from cluster configuration. Closes #9392 Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
…ved (#9397) The pooler status update logic only set status fields when resources were present, but never cleared them when absent. This caused stale TLS configuration (ServerCA, ClientCA, ClientTLS, ServerTLS) to persist in status even after being removed from the spec, breaking pooler connectivity. This particularly impacts users upgrading to v1.28 from any prior version. Before v1.28, the ServerTLS status field was always set to the cluster's server certificate. In v1.28, this field was repurposed for optional manual TLS authentication to PostgreSQL. On upgrade, the stale pre-v1.28 value persists, causing pgbouncer to use the wrong certificate and fail with "ssl/tls alert unsupported certificate" errors. This breaks all application connectivity through the pooler. The same issue affects any user who configures custom TLS settings and later removes them - pgbouncer continues using the stale certificates until manually intervened. This fix adds explicit clearing of optional TLS status fields when the corresponding resources are not configured, ensuring status accurately reflects the current specification. Closes #9392 Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com> Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com> Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com> (cherry picked from commit 0ffc5bf)
fix(pooler): clear stale TLS status fields when configuration is removed
The pooler status update logic only set status fields when resources were
present, but never cleared them when absent. This caused stale TLS
configuration (ServerCA, ClientCA, ClientTLS, ServerTLS) to persist in
status even after being removed from the spec, breaking pooler connectivity.
This particularly impacts users upgrading to v1.28 from any prior version.
Before v1.28, the ServerTLS status field was always set to the cluster's
server certificate. In v1.28, this field was repurposed for optional manual
TLS authentication to PostgreSQL. On upgrade, the stale pre-v1.28 value
persists, causing pgbouncer to use the wrong certificate and fail with
"ssl/tls alert unsupported certificate" errors. This breaks all application
connectivity through the pooler.
The same issue affects any user who configures custom TLS settings and later
removes them - pgbouncer continues using the stale certificates until manually
intervened.
This fix adds explicit clearing of optional TLS status fields when the
corresponding resources are not configured, ensuring status accurately reflects
the current specification.
Closes #9392