Added a CLI command to take a single DB backup #1740
Conversation
Walkthrough
Initializes CNPG Backup objects' `Spec`; adds a CLI subcommand to create CNPG backups via VolumeSnapshots; and implements recovery-aware DB reconciler branching to bootstrap clusters from VolumeSnapshots, including snapshot listing and cleanup during recovery.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Admin as Admin CLI
    participant System as system.RunDBBackup
    participant CNPG as CNPG Backup CR (controller)
    participant Reconciler as DB Reconciler
    participant VolumeSnap as VolumeSnapshot API
    participant NooBaa as NooBaa pods
    participant Status as Cluster Status
    Admin->>System: run db-backup
    System->>CNPG: create Backup CR (target: Standby, method: VolumeSnapshot)
    CNPG->>VolumeSnap: request/create VolumeSnapshot (async)
    VolumeSnap-->>CNPG: snapshot created / ready
    CNPG-->>System: Backup status updated with snapshot ref
    System->>Admin: poll/report snapshot readiness + kubectl tips
    Note over Reconciler,Status: Recovery-aware bootstrap flow
    Reconciler->>Status: detect DBRecovery present?
    alt Recovery configured
        Reconciler->>NooBaa: ensure pods stopped for restore
        NooBaa-->>Reconciler: pods stopped
        Reconciler->>VolumeSnap: list/filter snapshots (by cluster/age)
        Reconciler->>Reconciler: cleanup scheduled backups if conflicting
        Reconciler->>CNPG: set Bootstrap.Recovery (snapshot source)
        Reconciler->>Status: set RecoveryStatus (snapshot name/time, Pending)
    else Fresh init
        Reconciler->>CNPG: set Bootstrap.InitDB/import flow
        Reconciler->>Status: set Creating
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Force-pushed from baba9e2 to 53939e9
Actionable comments posted: 4
🧹 Nitpick comments (1)
pkg/system/db_reconciler.go (1)
182-198: Unreachable fallback code for snapshot name.
At line 187, the condition `r.NooBaa.Spec.DBSpec.DBRecovery != nil` is always true in this branch because we're inside the `else` block that starts at line 165, which is only entered when `r.NooBaa.Spec.DBSpec.DBRecovery != nil` (the inverse of line 154's condition). Therefore, lines 189-192 are unreachable.
Consider simplifying:

```diff
-	snapshotName := ""
-	if r.NooBaa.Spec.DBSpec.DBRecovery != nil {
-		snapshotName = r.NooBaa.Spec.DBSpec.DBRecovery.VolumeSnapshotName
-	} else if r.NooBaa.Status.DBStatus.RecoveryStatus != nil {
-		// fall back to previously recorded snapshot name, if any
-		snapshotName = r.NooBaa.Status.DBStatus.RecoveryStatus.SnapshotName
-	}
+	snapshotName := r.NooBaa.Spec.DBSpec.DBRecovery.VolumeSnapshotName
 	r.NooBaa.Status.DBStatus.RecoveryStatus = &nbv1.DBRecoveryStatus{
 		Status:       nbv1.DBRecoveryStatusRunning,
 		SnapshotName: snapshotName,
 		RecoveryTime: &metav1.Time{Time: time.Now()},
 	}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- pkg/cnpg/cnpg.go (1 hunks)
- pkg/system/db_reconciler.go (10 hunks)
- pkg/system/system.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/cnpg/cnpg.go (2)
pkg/hac/hac.go (1)
- Name (22-22)
pkg/options/options.go (1)
- Namespace (57-57)
pkg/system/db_reconciler.go (1)
pkg/cnpg/cnpg.go (2)
- GetCnpgScheduledBackupObj (598-611)
- GetCnpgBackupObj (613-626)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: run-operator-tests
- GitHub Check: run-core-config-map-tests
- GitHub Check: run-cli-tests
- GitHub Check: cnpg-deployment-test
- GitHub Check: run-kms-dev-test
- GitHub Check: run-kms-key-rotate-test
- GitHub Check: run-admission-test
- GitHub Check: run-kms-tls-token-test
- GitHub Check: run-kms-kmip-test
- GitHub Check: run-azure-vault-test
- GitHub Check: golangci-lint
- GitHub Check: run-kms-tls-sa-test
- GitHub Check: run-hac-test
🔇 Additional comments (8)
pkg/cnpg/cnpg.go (1)
613-626: LGTM!
The initialization of `ObjectMeta` with `Name` and `Namespace` and the empty `BackupSpec{}` aligns with how similar factory functions in this file are implemented (e.g., `GetCnpgScheduledBackupObj`, `GetCnpgClusterObj`). This ensures the backup object is properly initialized for downstream usage.
pkg/system/system.go (1)
124-134: LGTM!
The command definition follows the established patterns in this file and provides a clear interface for creating database backups with an optional custom name.
pkg/system/db_reconciler.go (6)
79-81: LGTM!
The recovery status is correctly marked as completed only when the cluster is ready and a recovery was in progress.
307-312: LGTM!
Good addition of the nil check for `VolumeSnapshot` before accessing its fields, preventing a potential nil pointer dereference.
325-336: LGTM!
The cleanup function properly handles the case where the `ScheduledBackup` doesn't exist by checking with `KubeCheckQuiet` before deletion and ignoring `NotFound` errors.
338-386: LGTM!
The recovery flow is well-structured:
- Sets initial recovery status
- Ensures dependent pods are stopped before recovery
- Configures the bootstrap with VolumeSnapshot source
- Cleans up scheduled backups to prevent conflicts during recovery
426-448: LGTM!
The naming change to the `-scheduled-backup` suffix and targeting standby replicas for backups are sensible choices that minimize impact on the primary database instance.
512-544: LGTM!
The function correctly:
- Lists VolumeSnapshots using an efficient label selector
- Filters to only include scheduled backup snapshots
- Sorts by creation timestamp for proper retention ordering
Force-pushed from e356914 to 70def4a
Actionable comments posted: 0
♻️ Duplicate comments (1)
pkg/system/db_reconciler.go (1)
474-499: Past review comment still applies.
The concern about the log message at line 477 potentially overstating the number of snapshots to be deleted (when the recovery snapshot is among them) remains valid. The actual functionality is correct, but the log message could be misleading.
🧹 Nitpick comments (1)
pkg/system/db_reconciler.go (1)
209-214: Minor: Error message could be more accurate.
The error message says "cluster creation is still in progress" but the condition also covers the `Recovering` status. Consider updating for clarity:

```diff
 if currentDBClusterStatus == nbv1.DBClusterStatusCreating ||
 	currentDBClusterStatus == nbv1.DBClusterStatusRecovering {
-	r.cnpgLog("the cluster spec was changed but the cluster creation is still in progress, skipping update")
-	return fmt.Errorf("cluster creation is still in progress, skipping update")
+	r.cnpgLog("the cluster spec was changed but the cluster %s is still in progress, skipping update", currentDBClusterStatus)
+	return fmt.Errorf("cluster %s is still in progress, skipping update", currentDBClusterStatus)
 }
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- pkg/cnpg/cnpg.go (1 hunks)
- pkg/system/db_reconciler.go (4 hunks)
- pkg/system/system.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/system/system.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/system/db_reconciler.go (1)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (2)
- DBClusterStatusCreating (557-557)
- DBClusterStatusRecovering (566-566)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: run-operator-tests
- GitHub Check: run-admission-test
- GitHub Check: golangci-lint
- GitHub Check: run-kms-tls-sa-test
- GitHub Check: run-kms-key-rotate-test
- GitHub Check: run-azure-vault-test
- GitHub Check: run-kms-kmip-test
- GitHub Check: run-hac-test
🔇 Additional comments (3)
pkg/cnpg/cnpg.go (1)
613-626: LGTM!
The explicit initialization of `Spec: cnpgv1.BackupSpec{}` is consistent with the pattern used in `GetCnpgScheduledBackupObj` (line 608) and provides a clear, defensive initialization.
pkg/system/db_reconciler.go (2)
339-387: LGTM!
The recovery flow is well-structured:
- Sets recovery status to pending
- Ensures core/endpoint pods are stopped before proceeding
- Configures bootstrap recovery from the specified VolumeSnapshot
- Cleans up scheduled backups to prevent conflicts during recovery
The use of `storagesnapshotv1.GroupName` for the API group is correct.
446-448: Cold volume-snapshot backups on the primary will require stopping it.
CloudNativePG safely falls back to the primary instance when `BackupTargetStandby` is specified but no standby exists. However, setting `Method = VolumeSnapshot` with `Online = false` (cold backup) means that volume snapshots will stop the primary during backup operations, impacting database availability and writes, even for single-instance clusters.
Signed-off-by: Danny Zaken <dannyzaken@gmail.com>
Force-pushed from 70def4a to edef022
Actionable comments posted: 0
♻️ Duplicate comments (1)
pkg/system/system.go (1)
955-957: Avoid hardcoding the CNPG backup API group in the kubectl hint.
The example command:

```go
log.Printf("kubectl -n %s get backups.postgresql.cnpg.noobaa.io %s", sys.Namespace, backupName)
```

hardcodes `backups.postgresql.cnpg.noobaa.io`. This can diverge from the actual API group used at runtime (e.g., when `options.UseCnpgApiGroup` is set and the group is `postgresql.cnpg.io` instead), while the Backup object itself relies on the CNPG helper constants.
Consider deriving the group from the CNPG package so the hint always matches the real CRD:

```diff
-	log.Printf("kubectl -n %s get backups.postgresql.cnpg.noobaa.io %s", sys.Namespace, backupName)
+	log.Printf("kubectl -n %s get backups.%s %s", sys.Namespace, cnpg.CnpgAPIGroup, backupName)
```

Please confirm that `cnpg.CnpgAPIGroup` (or the equivalent helper in your codebase) already accounts for `options.UseCnpgApiGroup`.
🧹 Nitpick comments (2)
pkg/system/system.go (2)
124-134: CmdDBBackup factory is consistent with existing CLI patterns.
The command definition (`Use`, `Short`, `Run`, `Args`, and the `--name` flag) matches the style of other system subcommands and should integrate smoothly.
If you want to polish UX later, consider adding a `Long` description with example usage and the default backup-name pattern.
913-952: RunDBBackup core flow looks sound; minor readability nit.
The handler correctly:
- Verifies the NooBaa CR exists.
- Ensures CNPG `DBSpec` and `DBBackup.VolumeSnapshot` (with a non-empty class) are configured.
- Builds a `cnpgv1.Backup` targeting the CNPG cluster with `method=volumeSnapshot`, `target=standby`, and `online=false` via `Online *bool`. (pkg.go.dev)
- Uses `KubeCreateFailExisting` and fails fast on create errors.

Only minor nit: `offlineBackup` is set to `false` and then assigned to `backup.Spec.Online`. This is logically correct (it means "offline/cold backup"), but the variable name is a bit confusing.
You might consider renaming `offlineBackup` to something like `onlineBackup` or `online := false` to better reflect the field it's bound to. Please double-check against the CNPG `BackupSpec.Online` semantics in the version you vendor.
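The naming nit can be seen in a minimal stand-in; the `backupSpec` type below is hypothetical (the real field is `cnpgv1.BackupSpec.Online`, a `*bool`):

```go
package main

import "fmt"

// backupSpec is a hypothetical stand-in for cnpgv1.BackupSpec,
// whose Online field is a *bool in the real API.
type backupSpec struct {
	Online *bool
}

func main() {
	// Reviewed pattern: the variable is named for the opposite of the
	// field it populates, which reads as a double negative.
	offlineBackup := false
	spec := backupSpec{Online: &offlineBackup}
	fmt.Println("online:", *spec.Online)

	// Suggested rename: bind the variable name to the field's meaning.
	// Online=false still means a cold (offline) backup.
	online := false
	spec = backupSpec{Online: &online}
	fmt.Println("online:", *spec.Online)
}
```

Both variants produce the same spec; the rename only removes the mismatch between the variable name and the field it feeds.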
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/cnpg/cnpg.go(1 hunks)pkg/system/db_reconciler.go(4 hunks)pkg/system/system.go(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/cnpg/cnpg.go
- pkg/system/db_reconciler.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/system/system.go (4)
pkg/cli/cli.go (1)
- Run (61-66)
pkg/util/util.go (5)
- Logger (905-907)
- KubeCheck (571-592)
- KubeCreateFailExisting (390-392)
- Context (910-912)
- KubeClient (271-282)
pkg/options/options.go (2)
- SystemName (50-50)
- Namespace (57-57)
pkg/cnpg/cnpg.go (1)
- GetCnpgBackupObj (613-626)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: run-cli-tests
- GitHub Check: run-kms-dev-test
- GitHub Check: cnpg-deployment-test
- GitHub Check: run-kms-key-rotate-test
- GitHub Check: run-admission-test
- GitHub Check: run-core-config-map-tests
- GitHub Check: run-operator-tests
- GitHub Check: run-azure-vault-test
- GitHub Check: run-kms-tls-sa-test
- GitHub Check: run-kms-kmip-test
- GitHub Check: golangci-lint
- GitHub Check: run-kms-tls-token-test
- GitHub Check: run-hac-test
🔇 Additional comments (2)
pkg/system/system.go (2)
52-57: db-backup subcommand is wired correctly into `system`.
Adding `CmdDBBackup()` to `Cmd()`'s subcommands cleanly exposes `noobaa system db-backup` without altering other flows.
963-985: Use label-based lookup for VolumeSnapshots instead of name-based polling, and propagate the poll context to the client call.
Two issues with the current polling approach:

- Name-based lookup is fragile
The code currently queries for a single VolumeSnapshot by name matching `backupName`. However, according to CNPG documentation, VolumeSnapshots are identified by labels like `cnpg.io/backupName` and `cnpg.io/cluster`, not by a predictable naming scheme. Polling by exact name risks timing out even when the backup succeeds.

- Poll context isn't propagated to the client
The `ctx` parameter from the poll function callback is not used; the `Get` call still uses `util.Context()` instead, so HTTP requests aren't tied to the polling timeout.

Replace the name-based `Get` with a label-based `List`:

```diff
-	err := wait.PollUntilContextCancel(pollCtx, interval*time.Second, true, func(ctx context.Context) (bool, error) {
-		volumeSnapshot := &storagesnapshotv1.VolumeSnapshot{}
-		volumeSnapshotErr := util.KubeClient().Get(util.Context(),
-			client.ObjectKey{Namespace: sys.Namespace, Name: backupName},
-			volumeSnapshot)
-		if errors.IsNotFound(volumeSnapshotErr) {
-			log.Printf("⏳ Volume snapshot %s not found yet", backupName)
-			return false, nil
-		}
-		if volumeSnapshotErr != nil {
-			return false, volumeSnapshotErr
-		}
-		return true, nil
-	})
+	err := wait.PollUntilContextCancel(pollCtx, interval*time.Second, true, func(ctx context.Context) (bool, error) {
+		volumeSnapshots := &storagesnapshotv1.VolumeSnapshotList{}
+		selector, _ := labels.Parse(
+			fmt.Sprintf("cnpg.io/cluster=%s,cnpg.io/backupName=%s", sys.Name+pgClusterSuffix, backupName),
+		)
+		volumeSnapshotErr := util.KubeClient().List(ctx, volumeSnapshots, &client.ListOptions{
+			Namespace:     sys.Namespace,
+			LabelSelector: selector,
+		})
+		if volumeSnapshotErr != nil {
+			return false, volumeSnapshotErr
+		}
+		if len(volumeSnapshots.Items) == 0 {
+			log.Printf("⏳ Volume snapshots for backup %s not found yet", backupName)
+			return false, nil
+		}
+		return true, nil
+	})
```

Also update line 985's user-facing hint to use label-based selection:

```go
log.Printf(
	"kubectl -n %s get volumesnapshots.snapshot.storage.k8s.io -l 'cnpg.io/cluster=%s,cnpg.io/backupName=%s'",
	sys.Namespace, sys.Name+pgClusterSuffix, backupName,
)
```

Confirm that `pgClusterSuffix` is defined in your codebase and that the label names match the CNPG version you are vendoring.
jackyalbo
left a comment
LGTM, should we add a test to the cli tests?
Depends on #1737
Explain the changes
- `--name` flag to pass a custom name. The default name is `noobaa-db-pg-cluster-backup-[date and time]`, e.g. `noobaa-db-pg-cluster-backup-20251201181523`
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
New Features
Improvements
Bug Fixes