Skip to content

Add DB backup logic to noobaa reconciliation#1726

Merged
dannyzaken merged 1 commit intonoobaa:masterfrom
dannyzaken:danny-db-backup
Nov 18, 2025
Merged

Add DB backup logic to noobaa reconciliation#1726
dannyzaken merged 1 commit intonoobaa:masterfrom
dannyzaken:danny-db-backup

Conversation

@dannyzaken
Copy link
Member

@dannyzaken dannyzaken commented Nov 16, 2025

Explain the changes

  • If DB backup conf is set on Noobaa CR, pass to the Cluster CR
  • Added reconcileDBBackup step to ReconcileCNPGCluster
    • Create or update ScheduledBackup CR
    • Delete ScheduledBackup CR if backup is removed from Noobaa CR.
  • Added reconcileBackupRetention
    • lists existing volume snapshots and delete old ones according to maxSnapshots
  • update BackupStatusunder NooBaaDBStatus

Issues: Fixed #xxx / Gap #xxx

  1. https://issues.redhat.com/browse/RHSTOR-8124

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • CNPG backup support: reconciliation of backups and scheduled backups, retention/pruning, snapshot management, and status tracking (last/next backup, totals).
    • Registered CNPG Backup and ScheduledBackup resource types so they are recognized and manageable.
    • Cron schedule conversion to CNPG-compatible format.
  • Chores

    • Added RBAC for volume snapshot access and ensured snapshot types are available to the runtime.

@dannyzaken dannyzaken marked this pull request as draft November 16, 2025 17:15
@coderabbitai
Copy link

coderabbitai bot commented Nov 16, 2025

Walkthrough

Registers CNPG backup CRD types, adds CNPG backup object constructors, integrates backup reconciliation (create/update/delete, retention, schedule conversion) into the DB reconciler, registers VolumeSnapshot types in the runtime scheme, and grants RBAC for VolumeSnapshot operations.

Changes

Cohort / File(s) Summary
CNPG Type Registrations
pkg/apis/noobaa/v1alpha1/cnpg_types.go
Adds ScheduledBackup, ScheduledBackupList, Backup, and BackupList to CNPGSchemeBuilder.Register.
CNPG Constructors
pkg/cnpg/cnpg.go
Adds GetCnpgScheduledBackupObj(namespace, name), GetCnpgBackupObj(namespace, name), and GetCnpgBackupListObj(namespace) to construct CNPG backup objects with TypeMeta/Kind.
DB Backup Reconciliation
pkg/system/db_reconciler.go
Adds reconcileDBBackup, reconcileScheduledBackup, reconcileBackupRetention, cleanupDBBackup, helpers (getBackupResourceName, listBackupResourcesOrderByCreate), integrates backup reconciliation into main flow, updates status fields, and adds convertToSixFieldCron(schedule string).
Scheme Registration for VolumeSnapshot
pkg/util/util.go
Registers external-snapshotter VolumeSnapshot types with the runtime scheme (imports and init registration).
RBAC Role
deploy/role.yaml
Adds RBAC rule allowing get,list,watch,delete on volumesnapshots in snapshot.storage.k8s.io API group.
Bundle Hash Update
pkg/bundle/deploy.go
Updates Sha256_deploy_role_yaml constant to reflect changed deploy/role.yaml.
Go Module
go.mod
Promotes github.com/kubernetes-csi/external-snapshotter/client/v8 to a direct dependency (v8.2.0) and removes it from indirect requires.

Sequence Diagram(s)

sequenceDiagram
    participant Reconciler
    participant K8sAPI as "Kubernetes API"
    participant CNPG as "CNPG ScheduledBackup API"
    participant Snapshot as "VolumeSnapshot API"

    Reconciler->>K8sAPI: Read Cluster & DBBackup spec
    alt DBBackup spec present
        Reconciler->>Reconciler: convertToSixFieldCron(schedule)
        Reconciler->>CNPG: Get ScheduledBackup (by name)
        alt exists
            Reconciler->>CNPG: Update ScheduledBackup (schedule, retention)
        else not exists
            Reconciler->>CNPG: Create ScheduledBackup
        end
        CNPG->>K8sAPI: Create/Update ScheduledBackup resource
        K8sAPI-->>CNPG: Result
        Reconciler->>Reconciler: reconcileBackupRetention()
        Reconciler->>Snapshot: List/Delete old VolumeSnapshots
    else DBBackup spec absent
        Reconciler->>CNPG: Delete ScheduledBackup (if exists)
        CNPG->>K8sAPI: Delete resource
        K8sAPI-->>CNPG: Result
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to:
    • convertToSixFieldCron(): cron parsing/validation edge cases.
    • reconcileScheduledBackup() / reconcileBackupRetention(): idempotency, ordering, and snapshot deletion safety.
    • Scheme registration and RBAC: ensure API version and imports match the snapshot client used.
    • get/list helpers and status updates in pkg/system/db_reconciler.go for correctness and error handling.

Possibly related PRs

  • 5.20 rebase #1682 — Modifies pkg/system/db_reconciler.go and cluster readiness/reconciliation flow; likely related to backup reconciliation integration.

Suggested reviewers

  • nimrod-becker

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding database backup logic to the noobaa reconciliation process, which is the primary purpose of this pull request.
Description check ✅ Passed The description covers the key changes and includes a linked issue, but the Testing Instructions section is incomplete with no actual instructions provided, and both documentation and testing checkboxes are unchecked.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9968968 and dd9cb97.

📒 Files selected for processing (3)
  • pkg/apis/noobaa/v1alpha1/cnpg_types.go (1 hunks)
  • pkg/cnpg/cnpg.go (1 hunks)
  • pkg/system/db_reconciler.go (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/system/db_reconciler.go (4)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (2)
  • NooBaa (41-56)
  • DBBackupSpec (383-392)
pkg/system/reconciler.go (1)
  • Reconciler (59-135)
pkg/cnpg/cnpg.go (1)
  • GetCnpgScheduledBackupObj (598-611)
pkg/util/util.go (1)
  • KubeCheckQuiet (618-631)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: run-kms-dev-test
  • GitHub Check: run-kms-kmip-test
  • GitHub Check: run-kms-tls-sa-test
  • GitHub Check: run-core-config-map-tests
  • GitHub Check: golangci-lint
  • GitHub Check: run-admission-test
  • GitHub Check: cnpg-deployment-test
  • GitHub Check: run-cli-tests
  • GitHub Check: run-azure-vault-test
  • GitHub Check: run-hac-test
  • GitHub Check: run-kms-key-rotate-test
  • GitHub Check: run-kms-tls-token-test
  • GitHub Check: run-operator-tests
🔇 Additional comments (4)
pkg/cnpg/cnpg.go (1)

598-610: ScheduledBackup helper is consistent with existing CNPG helpers

Object construction (TypeMeta, ObjectMeta, empty Spec) matches the existing CNPG helpers and correctly uses CnpgAPIVersion. No issues here.

pkg/apis/noobaa/v1alpha1/cnpg_types.go (1)

36-45: ScheduledBackup type registrations look correct

Registering ScheduledBackup and ScheduledBackupList with the noobaa CNPG API group mirrors the existing Cluster/ImageCatalog registrations and matches the intended API surface.

pkg/system/db_reconciler.go (2)

7-7: New imports are appropriate

strings and k8s.io/apimachinery/pkg/api/errors are both used in the new cron conversion and backup cleanup logic; no unused-import concerns.

Also applies to: 16-16


99-103: Backup reconciliation is gated on a ready CNPG cluster

Hooking reconcileDBBackup() after the readiness check ensures ScheduledBackup resources are only reconciled when the cluster is usable, which aligns well with the CNPG backup model.

@dannyzaken dannyzaken marked this pull request as ready for review November 17, 2025 09:41
@dannyzaken dannyzaken requested review from a team, alphaprinz, naveenpaul1 and nimrod-becker and removed request for a team November 17, 2025 09:41
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/system/db_reconciler.go (2)

69-104: Backup reconciliation is gated on cluster readiness — consider earlier cleanup of old backups

Calling reconcileDBBackup only after isClusterReady means we never create/update scheduled backups for an unready cluster, which is safe. The side effect is that if a DBBackup spec is removed while the cluster remains not-ready, cleanup of an existing ScheduledBackup will be delayed until readiness. If you want the ScheduledBackup to be deleted immediately regardless of cluster health, you could special‑case the “DBBackup removed” path to call cleanupDBBackup before the readiness check (while still keeping creation/update gated on readiness).


403-446: ScheduledBackup reconciliation logic is sound; consider ownership and flags

The new reconcileDBBackup, reconcileScheduledBackup, and cleanupDBBackup functions correctly:

  • Create/update a ScheduledBackup with the converted schedule, the CNPG cluster name, method volumeSnapshot, and online=false.
  • Delete an existing ScheduledBackup when DBBackup is unset, tolerating NotFound.

Two optional improvements you might consider:

  • Set .spec.backupOwnerReference on the ScheduledBackup (e.g., "cluster" or "self") so CNPG attaches ownership to created Backup objects instead of using the default "none".
  • For readability, the offlineBackup variable could be renamed (e.g., online := false) so that scheduledBackup.Spec.Online = &online reads more directly.

Behavior-wise this looks good for an initial implementation.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd9cb97 and 23338f7.

📒 Files selected for processing (3)
  • pkg/apis/noobaa/v1alpha1/cnpg_types.go (1 hunks)
  • pkg/cnpg/cnpg.go (1 hunks)
  • pkg/system/db_reconciler.go (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/apis/noobaa/v1alpha1/cnpg_types.go
  • pkg/cnpg/cnpg.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/system/db_reconciler.go (4)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (2)
  • NooBaa (41-56)
  • DBBackupSpec (383-392)
pkg/system/reconciler.go (1)
  • Reconciler (59-135)
pkg/cnpg/cnpg.go (1)
  • GetCnpgScheduledBackupObj (598-611)
pkg/util/util.go (1)
  • KubeCheckQuiet (618-631)
🔇 Additional comments (3)
pkg/system/db_reconciler.go (3)

3-19: Imports for cron parsing and Kubernetes errors look appropriate

strings and k8s.io/apimachinery/pkg/api/errors are both used in the new backup/cron logic; import set is coherent and minimal.


279-295: Cluster backup stanza wiring is straightforward and handles removal correctly

Mapping DBBackup.VolumeSnapshot.VolumeSnapshotClass into Cluster.Spec.Backup.VolumeSnapshot.ClassName and explicitly setting Cluster.Spec.Backup = nil when DBBackup is absent gives predictable behavior and plays well with wasClusterSpecChanged. The nil guard on VolumeSnapshot also prevents a crash on malformed specs. No further issues here.


614-629: Including PostgresConfiguration and Backup in change detection is appropriate

Extending wasClusterSpecChanged to compare PostgresConfiguration.Parameters and Backup ensures we actually update the CNPG cluster when DB configuration or backup settings change, without over‑widening the comparison. This aligns well with the new backup wiring.

@dannyzaken dannyzaken changed the title initial db backup logic in system reconciliation Add DB backup logic to noobaa reconciliation Nov 17, 2025
- If DB backup conf is set on Noobaa CR, pass to the Cluster CR
- Added reconcileDBBackup step to ReconcileCNPGCluster
  - Create or update ScheduledBackup CR
  - Delete ScheduledBackup CR if backup is removed from Noobaa CR.

Signed-off-by: Danny Zaken <dannyzaken@gmail.com>

update

Signed-off-by: Danny Zaken <dannyzaken@gmail.com>

implemented logic to delete old volume snapshots

When reconciling the DB cluster, listing existing volume snapshots and deleting old ones according to maxSnapshots

Signed-off-by: Danny Zaken <dannyzaken@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23338f7 and b5825fe.

📒 Files selected for processing (7)
  • deploy/role.yaml (1 hunks)
  • go.mod (1 hunks)
  • pkg/apis/noobaa/v1alpha1/cnpg_types.go (1 hunks)
  • pkg/bundle/deploy.go (2 hunks)
  • pkg/cnpg/cnpg.go (1 hunks)
  • pkg/system/db_reconciler.go (6 hunks)
  • pkg/util/util.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/apis/noobaa/v1alpha1/cnpg_types.go
  • pkg/cnpg/cnpg.go
  • pkg/bundle/deploy.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/util/util.go (1)
pkg/apis/apis.go (1)
  • AddToScheme (11-13)
pkg/system/db_reconciler.go (4)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (2)
  • NooBaa (41-56)
  • DBBackupStatus (576-588)
pkg/system/reconciler.go (1)
  • Reconciler (59-135)
pkg/cnpg/cnpg.go (2)
  • GetCnpgScheduledBackupObj (598-611)
  • GetCnpgBackupListObj (623-631)
pkg/util/util.go (1)
  • KubeCheckQuiet (620-633)
🪛 Checkov (3.2.334)
deploy/role.yaml

[medium] 1-171: Minimize Roles and ClusterRoles that grant permissions to bind RoleBindings or ClusterRoleBindings

(CKV_K8S_157)


[medium] 1-171: Minimize Roles and ClusterRoles that grant permissions to escalate Roles or ClusterRoles

(CKV_K8S_158)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: run-core-config-map-tests
  • GitHub Check: cnpg-deployment-test
  • GitHub Check: run-operator-tests
  • GitHub Check: run-cli-tests
  • GitHub Check: run-kms-key-rotate-test
  • GitHub Check: golangci-lint
  • GitHub Check: run-azure-vault-test
  • GitHub Check: run-hac-test
  • GitHub Check: run-kms-dev-test
  • GitHub Check: run-kms-kmip-test
  • GitHub Check: run-kms-tls-sa-test
  • GitHub Check: run-admission-test
  • GitHub Check: run-kms-tls-token-test

Comment on lines +288 to +301
if r.CNPGCluster.Spec.Backup == nil {
r.CNPGCluster.Spec.Backup = &cnpgv1.BackupConfiguration{
VolumeSnapshot: &cnpgv1.VolumeSnapshotConfiguration{
ClassName: backupSpec.VolumeSnapshot.VolumeSnapshotClass,
Online: &offlineBackup,
},
}
} else {
r.CNPGCluster.Spec.Backup.VolumeSnapshot.ClassName = backupSpec.VolumeSnapshot.VolumeSnapshotClass
}
} else {
// Remove backup configuration if not specified
r.CNPGCluster.Spec.Backup = nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Guard VolumeSnapshot before dereferencing

r.CNPGCluster.Spec.Backup can already exist with a different backup method (e.g., Barman) where VolumeSnapshot is still nil. In that case the else branch would dereference VolumeSnapshot and crash the operator. Please ensure the pointer is allocated (and that Online is refreshed) before using it.

-	} else {
-		r.CNPGCluster.Spec.Backup.VolumeSnapshot.ClassName = backupSpec.VolumeSnapshot.VolumeSnapshotClass
-	}
+	} else {
+		if r.CNPGCluster.Spec.Backup.VolumeSnapshot == nil {
+			r.CNPGCluster.Spec.Backup.VolumeSnapshot = &cnpgv1.VolumeSnapshotConfiguration{}
+		}
+		r.CNPGCluster.Spec.Backup.VolumeSnapshot.ClassName = backupSpec.VolumeSnapshot.VolumeSnapshotClass
+		r.CNPGCluster.Spec.Backup.VolumeSnapshot.Online = &offlineBackup
+	}
🤖 Prompt for AI Agents
In pkg/system/db_reconciler.go around lines 288 to 301, the code assumes
r.CNPGCluster.Spec.Backup.VolumeSnapshot is non-nil and directly sets ClassName,
which will panic if Backup exists but VolumeSnapshot is nil (e.g., when another
backup method was previously used); ensure you check and allocate
r.CNPGCluster.Spec.Backup.VolumeSnapshot when Backup is non-nil and
VolumeSnapshot is nil, set its ClassName and replace or refresh the Online
pointer value (e.g., assign a new bool value) so you never dereference a nil
VolumeSnapshot or reuse a stale Online pointer.

Comment on lines +504 to +521
r.cnpgLog("deleting snapshot %s", snapshotName)
// if encountering an error we only report it and continue with the reconciliation
if err := r.Client.Delete(r.Ctx, &storagesnapshotv1.VolumeSnapshot{
ObjectMeta: metav1.ObjectMeta{
Name: snapshotName,
Namespace: r.CNPGCluster.Namespace,
},
}); err != nil {
r.cnpgLogError("got error deleting snapshot %s. error: %v", snapshotName, err)
// skipping the deletion of the backup so we can try to delete again next time
continue
}

r.cnpgLog("deleting backup %s", backup.Name)
if err := r.Client.Delete(r.Ctx, &backup); err != nil {
r.cnpgLogError("got error deleting backup %s. error: %v", backup.Name, err)
continue
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle already-deleted snapshots during retention

If an admin (or another controller) removes the VolumeSnapshot out-of-band, Delete returns NotFound and we never progress to deleting the Backup CR. That leaves stale backups around forever and keeps retention above the requested cap. Please treat IsNotFound as a success path so the backup object can still be pruned.

-		if err := r.Client.Delete(r.Ctx, &storagesnapshotv1.VolumeSnapshot{
+		if err := r.Client.Delete(r.Ctx, &storagesnapshotv1.VolumeSnapshot{
 			ObjectMeta: metav1.ObjectMeta{
 				Name:      snapshotName,
 				Namespace: r.CNPGCluster.Namespace,
 			},
-		}); err != nil {
-			r.cnpgLogError("got error deleting snapshot %s. error: %v", snapshotName, err)
-			// skipping the deletion of the backup so we can try to delete again next time
-			continue
-		}
+		}); err != nil && !errors.IsNotFound(err) {
+			r.cnpgLogError("got error deleting snapshot %s. error: %v", snapshotName, err)
+			// skipping the deletion of the backup so we can try to delete again next time
+			continue
+		}
🤖 Prompt for AI Agents
In pkg/system/db_reconciler.go around lines 504 to 521, the code treats any
error from deleting the VolumeSnapshot as a failure and skips deleting the
associated Backup; change the error handling so that if the delete returns a
NotFound error (use k8s.io/apimachinery/pkg/api/errors.IsNotFound(err)) you
consider it successful and continue to the backup deletion path, while only
continuing/aborting on other error types; ensure you import the api errors
package if not present and log that the snapshot was already gone when
IsNotFound is true.

}

r.cnpgLog("deleting snapshot %s", snapshotName)
// if encountering an error we only report it and continue with the reconciliation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encountered?

Copy link
Contributor

@alphaprinz alphaprinz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Had just one annoying typo.
Also, would there be some written KB/upstream doc explaining how to restore from a backup?

@dannyzaken dannyzaken merged commit a392e52 into noobaa:master Nov 18, 2025
15 of 16 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jan 5, 2026
2 tasks
@coderabbitai coderabbitai bot mentioned this pull request Jan 13, 2026
2 tasks
@coderabbitai coderabbitai bot mentioned this pull request Mar 17, 2026
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants