Add DB backup logic to noobaa reconciliation#1726
Conversation
WalkthroughRegisters CNPG backup CRD types, adds CNPG backup object constructors, integrates backup reconciliation (create/update/delete, retention, schedule conversion) into the DB reconciler, registers VolumeSnapshot types in the runtime scheme, and grants RBAC for VolumeSnapshot operations. Changes
Sequence Diagram(s)sequenceDiagram
participant Reconciler
participant K8sAPI as "Kubernetes API"
participant CNPG as "CNPG ScheduledBackup API"
participant Snapshot as "VolumeSnapshot API"
Reconciler->>K8sAPI: Read Cluster & DBBackup spec
alt DBBackup spec present
Reconciler->>Reconciler: convertToSixFieldCron(schedule)
Reconciler->>CNPG: Get ScheduledBackup (by name)
alt exists
Reconciler->>CNPG: Update ScheduledBackup (schedule, retention)
else not exists
Reconciler->>CNPG: Create ScheduledBackup
end
CNPG->>K8sAPI: Create/Update ScheduledBackup resource
K8sAPI-->>CNPG: Result
Reconciler->>Reconciler: reconcileBackupRetention()
Reconciler->>Snapshot: List/Delete old VolumeSnapshots
else DBBackup spec absent
Reconciler->>CNPG: Delete ScheduledBackup (if exists)
CNPG->>K8sAPI: Delete resource
K8sAPI-->>CNPG: Result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/apis/noobaa/v1alpha1/cnpg_types.go(1 hunks)pkg/cnpg/cnpg.go(1 hunks)pkg/system/db_reconciler.go(7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/system/db_reconciler.go (4)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (2)
NooBaa(41-56)DBBackupSpec(383-392)pkg/system/reconciler.go (1)
Reconciler(59-135)pkg/cnpg/cnpg.go (1)
GetCnpgScheduledBackupObj(598-611)pkg/util/util.go (1)
KubeCheckQuiet(618-631)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: run-kms-dev-test
- GitHub Check: run-kms-kmip-test
- GitHub Check: run-kms-tls-sa-test
- GitHub Check: run-core-config-map-tests
- GitHub Check: golangci-lint
- GitHub Check: run-admission-test
- GitHub Check: cnpg-deployment-test
- GitHub Check: run-cli-tests
- GitHub Check: run-azure-vault-test
- GitHub Check: run-hac-test
- GitHub Check: run-kms-key-rotate-test
- GitHub Check: run-kms-tls-token-test
- GitHub Check: run-operator-tests
🔇 Additional comments (4)
pkg/cnpg/cnpg.go (1)
598-610: ScheduledBackup helper is consistent with existing CNPG helpersObject construction (TypeMeta, ObjectMeta, empty Spec) matches the existing CNPG helpers and correctly uses
CnpgAPIVersion. No issues here.pkg/apis/noobaa/v1alpha1/cnpg_types.go (1)
36-45: ScheduledBackup type registrations look correctRegistering
ScheduledBackupandScheduledBackupListwith the noobaa CNPG API group mirrors the existing Cluster/ImageCatalog registrations and matches the intended API surface.pkg/system/db_reconciler.go (2)
7-7: New imports are appropriate
stringsandk8s.io/apimachinery/pkg/api/errorsare both used in the new cron conversion and backup cleanup logic; no unused-import concerns.Also applies to: 16-16
99-103: Backup reconciliation is gated on a ready CNPG clusterHooking
reconcileDBBackup()after the readiness check ensures ScheduledBackup resources are only reconciled when the cluster is usable, which aligns well with the CNPG backup model.
07fab47 to
23338f7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/system/db_reconciler.go (2)
69-104: Backup reconciliation is gated on cluster readiness — consider earlier cleanup of old backupsCalling
reconcileDBBackuponly afterisClusterReadymeans we never create/update scheduled backups for an unready cluster, which is safe. The side effect is that if aDBBackupspec is removed while the cluster remains not-ready, cleanup of an existingScheduledBackupwill be delayed until readiness. If you want the ScheduledBackup to be deleted immediately regardless of cluster health, you could special‑case the “DBBackup removed” path to callcleanupDBBackupbefore the readiness check (while still keeping creation/update gated on readiness).
403-446: ScheduledBackup reconciliation logic is sound; consider ownership and flagsThe new
reconcileDBBackup,reconcileScheduledBackup, andcleanupDBBackupfunctions correctly:
- Create/update a
ScheduledBackupwith the converted schedule, the CNPG cluster name, methodvolumeSnapshot, andonline=false.- Delete an existing
ScheduledBackupwhenDBBackupis unset, toleratingNotFound.Two optional improvements you might consider:
- Set
.spec.backupOwnerReferenceon the ScheduledBackup (e.g.,"cluster"or"self") so CNPG attaches ownership to createdBackupobjects instead of using the default"none".- For readability, the
offlineBackupvariable could be renamed (e.g.,online := false) so thatscheduledBackup.Spec.Online = &onlinereads more directly.Behavior-wise this looks good for an initial implementation.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/apis/noobaa/v1alpha1/cnpg_types.go(1 hunks)pkg/cnpg/cnpg.go(1 hunks)pkg/system/db_reconciler.go(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/apis/noobaa/v1alpha1/cnpg_types.go
- pkg/cnpg/cnpg.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/system/db_reconciler.go (4)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (2)
NooBaa(41-56)DBBackupSpec(383-392)pkg/system/reconciler.go (1)
Reconciler(59-135)pkg/cnpg/cnpg.go (1)
GetCnpgScheduledBackupObj(598-611)pkg/util/util.go (1)
KubeCheckQuiet(618-631)
🔇 Additional comments (3)
pkg/system/db_reconciler.go (3)
3-19: Imports for cron parsing and Kubernetes errors look appropriate
stringsandk8s.io/apimachinery/pkg/api/errorsare both used in the new backup/cron logic; import set is coherent and minimal.
279-295: Cluster backup stanza wiring is straightforward and handles removal correctlyMapping
DBBackup.VolumeSnapshot.VolumeSnapshotClassintoCluster.Spec.Backup.VolumeSnapshot.ClassNameand explicitly settingCluster.Spec.Backup = nilwhenDBBackupis absent gives predictable behavior and plays well withwasClusterSpecChanged. The nil guard onVolumeSnapshotalso prevents a crash on malformed specs. No further issues here.
614-629: Including PostgresConfiguration and Backup in change detection is appropriateExtending
wasClusterSpecChangedto comparePostgresConfiguration.ParametersandBackupensures we actually update the CNPG cluster when DB configuration or backup settings change, without over‑widening the comparison. This aligns well with the new backup wiring.
23338f7 to
6ce5340
Compare
- If DB backup conf is set on Noobaa CR, pass to the Cluster CR - Added reconcileDBBackup step to ReconcileCNPGCluster - Create or update ScheduledBackup CR - Delete ScheduledBackup CR if backup is removed from Noobaa CR. Signed-off-by: Danny Zaken <dannyzaken@gmail.com> update Signed-off-by: Danny Zaken <dannyzaken@gmail.com> implemented logic to delete old volume snapshots When reconciling the DB cluster, listing existing volume snapshots and deleting old ones according to maxSnapshots Signed-off-by: Danny Zaken <dannyzaken@gmail.com>
6ce5340 to
b5825fe
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
deploy/role.yaml(1 hunks)go.mod(1 hunks)pkg/apis/noobaa/v1alpha1/cnpg_types.go(1 hunks)pkg/bundle/deploy.go(2 hunks)pkg/cnpg/cnpg.go(1 hunks)pkg/system/db_reconciler.go(6 hunks)pkg/util/util.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/apis/noobaa/v1alpha1/cnpg_types.go
- pkg/cnpg/cnpg.go
- pkg/bundle/deploy.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/util/util.go (1)
pkg/apis/apis.go (1)
AddToScheme(11-13)
pkg/system/db_reconciler.go (4)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (2)
NooBaa(41-56)DBBackupStatus(576-588)pkg/system/reconciler.go (1)
Reconciler(59-135)pkg/cnpg/cnpg.go (2)
GetCnpgScheduledBackupObj(598-611)GetCnpgBackupListObj(623-631)pkg/util/util.go (1)
KubeCheckQuiet(620-633)
🪛 Checkov (3.2.334)
deploy/role.yaml
[medium] 1-171: Minimize Roles and ClusterRoles that grant permissions to bind RoleBindings or ClusterRoleBindings
(CKV_K8S_157)
[medium] 1-171: Minimize Roles and ClusterRoles that grant permissions to escalate Roles or ClusterRoles
(CKV_K8S_158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: run-core-config-map-tests
- GitHub Check: cnpg-deployment-test
- GitHub Check: run-operator-tests
- GitHub Check: run-cli-tests
- GitHub Check: run-kms-key-rotate-test
- GitHub Check: golangci-lint
- GitHub Check: run-azure-vault-test
- GitHub Check: run-hac-test
- GitHub Check: run-kms-dev-test
- GitHub Check: run-kms-kmip-test
- GitHub Check: run-kms-tls-sa-test
- GitHub Check: run-admission-test
- GitHub Check: run-kms-tls-token-test
| if r.CNPGCluster.Spec.Backup == nil { | ||
| r.CNPGCluster.Spec.Backup = &cnpgv1.BackupConfiguration{ | ||
| VolumeSnapshot: &cnpgv1.VolumeSnapshotConfiguration{ | ||
| ClassName: backupSpec.VolumeSnapshot.VolumeSnapshotClass, | ||
| Online: &offlineBackup, | ||
| }, | ||
| } | ||
| } else { | ||
| r.CNPGCluster.Spec.Backup.VolumeSnapshot.ClassName = backupSpec.VolumeSnapshot.VolumeSnapshotClass | ||
| } | ||
| } else { | ||
| // Remove backup configuration if not specified | ||
| r.CNPGCluster.Spec.Backup = nil | ||
| } |
There was a problem hiding this comment.
Guard VolumeSnapshot before dereferencing
r.CNPGCluster.Spec.Backup can already exist with a different backup method (e.g., Barman) where VolumeSnapshot is still nil. In that case the else branch would dereference VolumeSnapshot and crash the operator. Please ensure the pointer is allocated (and that Online is refreshed) before using it.
- } else {
- r.CNPGCluster.Spec.Backup.VolumeSnapshot.ClassName = backupSpec.VolumeSnapshot.VolumeSnapshotClass
- }
+ } else {
+ if r.CNPGCluster.Spec.Backup.VolumeSnapshot == nil {
+ r.CNPGCluster.Spec.Backup.VolumeSnapshot = &cnpgv1.VolumeSnapshotConfiguration{}
+ }
+ r.CNPGCluster.Spec.Backup.VolumeSnapshot.ClassName = backupSpec.VolumeSnapshot.VolumeSnapshotClass
+ r.CNPGCluster.Spec.Backup.VolumeSnapshot.Online = &offlineBackup
+ }🤖 Prompt for AI Agents
In pkg/system/db_reconciler.go around lines 288 to 301, the code assumes
r.CNPGCluster.Spec.Backup.VolumeSnapshot is non-nil and directly sets ClassName,
which will panic if Backup exists but VolumeSnapshot is nil (e.g., when another
backup method was previously used); ensure you check and allocate
r.CNPGCluster.Spec.Backup.VolumeSnapshot when Backup is non-nil and
VolumeSnapshot is nil, set its ClassName and replace or refresh the Online
pointer value (e.g., assign a new bool value) so you never dereference a nil
VolumeSnapshot or reuse a stale Online pointer.
| r.cnpgLog("deleting snapshot %s", snapshotName) | ||
| // if encountering an error we only report it and continue with the reconciliation | ||
| if err := r.Client.Delete(r.Ctx, &storagesnapshotv1.VolumeSnapshot{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: snapshotName, | ||
| Namespace: r.CNPGCluster.Namespace, | ||
| }, | ||
| }); err != nil { | ||
| r.cnpgLogError("got error deleting snapshot %s. error: %v", snapshotName, err) | ||
| // skipping the deletion of the backup so we can try to delete again next time | ||
| continue | ||
| } | ||
|
|
||
| r.cnpgLog("deleting backup %s", backup.Name) | ||
| if err := r.Client.Delete(r.Ctx, &backup); err != nil { | ||
| r.cnpgLogError("got error deleting backup %s. error: %v", backup.Name, err) | ||
| continue | ||
| } |
There was a problem hiding this comment.
Handle already-deleted snapshots during retention
If an admin (or another controller) removes the VolumeSnapshot out-of-band, Delete returns NotFound and we never progress to deleting the Backup CR. That leaves stale backups around forever and keeps retention above the requested cap. Please treat IsNotFound as a success path so the backup object can still be pruned.
- if err := r.Client.Delete(r.Ctx, &storagesnapshotv1.VolumeSnapshot{
+ if err := r.Client.Delete(r.Ctx, &storagesnapshotv1.VolumeSnapshot{
ObjectMeta: metav1.ObjectMeta{
Name: snapshotName,
Namespace: r.CNPGCluster.Namespace,
},
- }); err != nil {
- r.cnpgLogError("got error deleting snapshot %s. error: %v", snapshotName, err)
- // skipping the deletion of the backup so we can try to delete again next time
- continue
- }
+ }); err != nil && !errors.IsNotFound(err) {
+ r.cnpgLogError("got error deleting snapshot %s. error: %v", snapshotName, err)
+ // skipping the deletion of the backup so we can try to delete again next time
+ continue
+ }🤖 Prompt for AI Agents
In pkg/system/db_reconciler.go around lines 504 to 521, the code treats any
error from deleting the VolumeSnapshot as a failure and skips deleting the
associated Backup; change the error handling so that if the delete returns a
NotFound error (use k8s.io/apimachinery/pkg/api/errors.IsNotFound(err)) you
consider it successful and continue to the backup deletion path, while only
continuing/aborting on other error types; ensure you import the api errors
package if not present and log that the snapshot was already gone when
IsNotFound is true.
| } | ||
|
|
||
| r.cnpgLog("deleting snapshot %s", snapshotName) | ||
| // if encountering an error we only report it and continue with the reconciliation |
alphaprinz
left a comment
There was a problem hiding this comment.
LGTM. Had just one annoying typo.
Also, would there be some written KB/upstream doc explaining how to restore from a backup?
Explain the changes
reconcileDBBackupstep toReconcileCNPGClusterreconcileBackupRetentionmaxSnapshotsBackupStatusunderNooBaaDBStatusIssues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
New Features
Chores