
Exporting replication_target_status metric to use in prometheus alert #9385

Merged
aayushchouhan09 merged 1 commit into noobaa:master from aayushchouhan09:targ-alert on Jan 23, 2026


@aayushchouhan09 aayushchouhan09 commented Jan 20, 2026

Describe the Problem

Currently, there is no metric that reports whether a replication target bucket is reachable.

Explain the Changes

  1. We export a new metric, NooBaa_replication_target_status, which a Prometheus alert rule can use to detect an unreachable replication target.
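To make the exported series concrete, here is a minimal, dependency-free sketch of a gauge keyed by source and target bucket, rendered in the Prometheus text exposition format. It is not NooBaa's implementation: `LabelledGauge` is a hypothetical stand-in for a Prometheus client gauge, the help text is invented, and the `NooBaa_` prefix on the name is taken from the description above rather than from the code.

```javascript
'use strict';

// Hypothetical, dependency-free stand-in for a labelled Prometheus gauge.
// It only illustrates the shape of the series (1 = reachable, 0 = unreachable).
class LabelledGauge {
    constructor(name, help, label_names) {
        this.name = name;
        this.help = help;
        this.label_names = label_names;
        this.values = new Map(); // serialized label values -> { labels, value }
    }

    // Overwrite the value for one label set (one source/target pair).
    set(labels, value) {
        const key = JSON.stringify(this.label_names.map(l => labels[l]));
        this.values.set(key, { labels, value });
    }

    // Render in the Prometheus text exposition format.
    // No label-value escaping is done; S3-style bucket names need none.
    render() {
        const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} gauge`];
        for (const { labels, value } of this.values.values()) {
            const label_str = this.label_names.map(l => `${l}="${labels[l]}"`).join(',');
            lines.push(`${this.name}{${label_str}} ${value}`);
        }
        return lines.join('\n');
    }
}

const replication_target_status = new LabelledGauge(
    'NooBaa_replication_target_status',
    'Replication target bucket reachability (1 = reachable, 0 = unreachable)',
    ['source_bucket', 'target_bucket']
);

// Map the boolean reachability onto the 1/0 gauge value.
function set_replication_target_status(source_bucket, target_bucket, is_reachable) {
    replication_target_status.set({ source_bucket, target_bucket }, is_reachable ? 1 : 0);
}
```

With one pair marked unreachable, the rendered output contains `NooBaa_replication_target_status{source_bucket="src-bucket",target_bucket="dst-bucket"} 0`. Encoding reachability as 1/0 lets an alert rule key off a simple comparison such as `NooBaa_replication_target_status == 0`; the actual rule lives in the operator PR and may differ.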

Issues: Fixed #xxx / Gap #xxx

  1. JIRA: https://issues.redhat.com/browse/RHSTOR-8110
  2. Operator PR: noobaa-operator#1778 (adds a Prometheus alert rule for an unreachable target)

Testing Instructions:

  1. Check the operator PR.
  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Added monitoring for replication target reachability, exposing a way to update target status so destinations are tracked as reachable or unreachable.
  • Bug Fixes

    • Replication scanning and sync flows now update target reachability on success and on errors, improving status accuracy and reliability.
    • Added a utility to programmatically set replication target reachability.


@aayushchouhan09 aayushchouhan09 requested review from a team, alphaprinz, liranmauda, naveenpaul1 and tangledbytes and removed request for a team January 20, 2026 06:16
coderabbitai bot commented Jan 20, 2026

Walkthrough

Adds a Prometheus Gauge replication_target_status and a public setter; introduces a utility to update it; and integrates explicit reachable/unreachable updates and error handling into replication scanner flows.

Changes

  • Metric definition (src/server/analytic_services/prometheus_reports/noobaa_core_report.js): Added the replication_target_status Gauge with source_bucket and target_bucket labels, and a set_replication_target_status(source, target, is_reachable) method.
  • Replication status utility (src/server/utils/replication_utils.js): Added and exported update_replication_target_status(source_bucket, target_bucket, is_reachable), which unwraps the bucket names and calls the new metric setter.
  • Replication scanners (src/server/bg_services/replication_scanner.js, src/server/bg_services/log_replication_scanner.js): If the destination bucket is missing, or bucket-diff / head-objects processing fails, mark the target unreachable; on a successful diff or processing run, mark it reachable. Adds try-catch paths with explicit status updates.
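The utility described above can be sketched as follows, with JSDoc comments. This assumes bucket names may arrive either as plain strings or as wrapper objects exposing `unwrap()`; `SensitiveString` and `core_report` below are hypothetical stand-ins, and `unwrap_name` is an illustrative helper, not a function from the PR.

```javascript
'use strict';

// Hypothetical stand-in for a sensitive-string wrapper around a bucket name.
class SensitiveString {
    constructor(value) { this._value = value; }
    unwrap() { return this._value; }
}

// Hypothetical stand-in for the core report's metric setter; it records calls.
const recorded = [];
const core_report = {
    set_replication_target_status(source, target, is_reachable) {
        recorded.push({ source, target, value: is_reachable ? 1 : 0 });
    },
};

/**
 * Return the raw bucket name from a plain string or an unwrap()-able wrapper.
 * @param {string | { unwrap(): string }} name
 * @returns {string}
 */
function unwrap_name(name) {
    return typeof name === 'string' ? name : name.unwrap();
}

/**
 * Record whether a replication target bucket is reachable from its source bucket.
 * @param {string | { unwrap(): string }} source_bucket - source bucket name
 * @param {string | { unwrap(): string }} target_bucket - target bucket name
 * @param {boolean} is_reachable - true if the last scan reached the target
 */
function update_replication_target_status(source_bucket, target_bucket, is_reachable) {
    core_report.set_replication_target_status(
        unwrap_name(source_bucket),
        unwrap_name(target_bucket),
        Boolean(is_reachable)
    );
}
```

Unwrapping in one place keeps the metric labels as plain strings, so the scanners can pass whatever bucket-name type they already hold.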

Sequence Diagram(s)

sequenceDiagram
    participant Scanner as Replication Scanner
    participant Utils as Replication Utils
    participant Report as NooBaa Core Report
    participant Prometheus as Prometheus

    Scanner->>Scanner: Resolve source/destination buckets
    alt Destination bucket missing
        Scanner->>Utils: update_replication_target_status(source, dest_id, false)
        Utils->>Report: set_replication_target_status(source, dest_id, 0)
        Report->>Prometheus: Gauge.set(0)
    else Perform diff / head-objects
        Scanner->>Scanner: attempt bucket-diff / head-objects
        alt Ops succeed
            Scanner->>Utils: update_replication_target_status(source, dest, true)
            Utils->>Report: set_replication_target_status(source, dest, 1)
            Report->>Prometheus: Gauge.set(1)
        else Ops fail
            Scanner->>Utils: update_replication_target_status(source, dest, false)
            Utils->>Report: set_replication_target_status(source, dest, 0)
            Report->>Prometheus: Gauge.set(0)
        end
    end
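The scanner flow in the diagram (missing destination marks unreachable, a successful diff marks reachable, a failed diff marks unreachable) can be sketched like this. `scan_rule`, `find_bucket`, `diff_buckets`, and `update_status` are hypothetical names, not NooBaa functions; the point is that the catch block records the failure and then rethrows instead of swallowing the error.

```javascript
'use strict';

// Hypothetical per-rule status handling in a replication scan.
// Dependencies are injected so the sketch runs without NooBaa.
async function scan_rule(src_name, dst_name, { find_bucket, diff_buckets, update_status }) {
    const dst_bucket = find_bucket(dst_name);
    if (!dst_bucket) {
        // Destination bucket missing: report unreachable and skip this rule.
        update_status(src_name, dst_name, false);
        return null;
    }
    try {
        const diff = await diff_buckets(src_name, dst_name);
        // The diff reached the target, so it is reachable.
        update_status(src_name, dst_name, true);
        return diff;
    } catch (err) {
        // Report unreachable, then rethrow so the caller still sees the error.
        update_status(src_name, dst_name, false);
        throw err;
    }
}
```

Rethrowing keeps the scanner's existing error handling intact; the status update is a side effect, not a replacement for it.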

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

size/M

Suggested reviewers

  • liranmauda
  • tangledbytes
  • naveenpaul1
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately describes the main change, exporting a new replication_target_status metric for Prometheus alerts, which is reflected in all modified files.


@alphaprinz alphaprinz left a comment

One comment about a swallowed exception; otherwise LGTM.

Signed-off-by: Aayush Chouhan <achouhan@redhat.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/server/bg_services/log_replication_scanner.js`:
- Around lines 168-194: The code currently marks the target reachable after
iterating over the candidates, regardless of whether bucketDiff.get_buckets_diff
was ever called. Introduce a boolean flag (e.g., diffTried or diffAttempted)
before the for-loop, set it to true right before/when bucketDiff.get_buckets_diff
is invoked inside the loop (the block that produces keys_diff_map and merges it
into diff_keys), and only call
replication_utils.update_replication_target_status(src_bucket.name,
dst_bucket.name, true) after the loop if that flag is true. Keep the existing
catch behavior (marking the target unreachable and rethrowing) unchanged, so an
exception during get_buckets_diff still marks the target unreachable.

@aayushchouhan09 aayushchouhan09 merged commit b859495 into noobaa:master Jan 23, 2026
18 of 19 checks passed
@aayushchouhan09 aayushchouhan09 deleted the targ-alert branch January 23, 2026 17:24


2 participants