fix(metrics): skip pg_stat_archiver metrics on non-archiving standbys #9411

Merged
gbartolini merged 1 commit into main from dev/9101
Dec 16, 2025

Conversation

@armru (Member) commented Dec 12, 2025

After a PostgreSQL instance is demoted from primary to standby, the pg_stat_archiver view retains stale data from when the instance was archiving WAL files. This causes misleading metrics and false alerts (e.g., "last archive age > 7 minutes") on standby nodes that do not actively archive WAL.

This fix adds a predicate_query to the pg_stat_archiver metrics collection so that the query runs only on instances that actually perform WAL archiving:

  • Primary instances (not in recovery)
  • Designated primary replicas (archive_mode='always' in replica clusters)

Regular standby nodes will no longer export pg_stat_archiver metrics, preventing stale data from triggering false alerts.
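
As a rough illustration of the approach described above, a metrics entry gated by a predicate could look like the sketch below. This is not the literal change merged in this PR: the exact SQL of the predicate, the selected columns, and the metric name `seconds_since_last_archival` are assumptions made for the example. Only the `predicate_query` field, the `pg_stat_archiver` view, and the two conditions (not in recovery, or `archive_mode='always'`) come from the description.

```yaml
# Sketch of a metrics entry; the SQL and metric names are illustrative.
pg_stat_archiver:
  # Assumed predicate: returns a single boolean. The main query runs only
  # when it is true, i.e. the instance is not in recovery (primary) or
  # archive_mode is 'always' (designated primary of a replica cluster).
  predicate_query: |
    SELECT NOT pg_catalog.pg_is_in_recovery()
        OR pg_catalog.current_setting('archive_mode') = 'always'
  # Illustrative subset of the pg_stat_archiver columns.
  query: |
    SELECT archived_count,
           failed_count,
           COALESCE(EXTRACT(EPOCH FROM (now() - last_archived_time)), -1)
             AS seconds_since_last_archival
    FROM pg_catalog.pg_stat_archiver
  metrics:
    - archived_count:
        usage: "COUNTER"
        description: "Number of WAL files successfully archived"
    - failed_count:
        usage: "COUNTER"
        description: "Number of failed attempts to archive WAL files"
    - seconds_since_last_archival:
        usage: "GAUGE"
        description: "Seconds since the last successful archival, -1 if none"
```

With a predicate of this shape, a regular standby evaluates it to false, so no archiver series are exported from that instance and an age-based alert has no stale value to fire on.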

Fixes #9101

Copilot AI review requested due to automatic review settings December 12, 2025 13:09
@armru armru requested a review from a team as a code owner December 12, 2025 13:09
@dosubot bot added the size:XS label Dec 12, 2025
@cnpg-bot added the backport-requested ◀️, release-1.25, release-1.27, and release-1.28 labels Dec 12, 2025
@github-actions (Contributor) commented

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this PR, remove the label 'backport-requested ◀️' or add the label 'do not backport'
  • To stop backporting this PR to a certain release branch, remove the specific branch label: release-x.y

@dosubot bot added the bug 🐛 label Dec 12, 2025

Copilot AI left a comment

Pull request overview

This PR fixes an issue where PostgreSQL instances demoted from primary to standby retain stale pg_stat_archiver metrics, leading to false alerts on non-archiving standby nodes.

Key Changes:

  • Added a predicate_query to pg_stat_archiver metrics collection that ensures metrics are only gathered from instances actively performing WAL archiving
  • The predicate correctly identifies primary instances and designated primary replicas (with archive_mode='always')

@dosubot bot added the lgtm label Dec 12, 2025
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
@gbartolini gbartolini merged commit e2d6bd0 into main Dec 16, 2025
20 of 24 checks passed
@gbartolini gbartolini deleted the dev/9101 branch December 16, 2025 15:36
cnpg-bot pushed a commit that referenced this pull request Dec 16, 2025
(cherry picked from commit e2d6bd0)
cnpg-bot pushed a commit that referenced this pull request Dec 16, 2025
(cherry picked from commit e2d6bd0)
cnpg-bot pushed a commit that referenced this pull request Dec 16, 2025
(cherry picked from commit e2d6bd0)
@leonardoce (Contributor) commented

Should we decide to loosen the requirement, I implemented #9475 to align the E2E tests.

mnencia pushed a commit that referenced this pull request Jan 20, 2026
(cherry picked from commit e2d6bd0)
Linked issue: [Bug]: pg_stat_archiver metrics not reset after demotion (former primary still reports stale archive age)

7 participants