Velero backup controller impl #1762
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

This PR introduces a BackupJob controller system with Velero backup strategy support. It adds BackupJob API types, implements reconciliation logic for managing backup jobs via Velero, includes credential resolution for S3 storage, CRD definitions, RBAC permissions, and example resources with end-to-end test coverage.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant BJobCtrl as BackupJob<br/>Reconciler
    participant VStrat as Velero Strategy<br/>Reconciler
    participant K8s as Kubernetes API
    participant Velero as Velero Service
    participant S3 as S3 Storage

    User->>K8s: Create BackupJob
    K8s->>BJobCtrl: Trigger reconcile
    BJobCtrl->>BJobCtrl: Validate StrategyRef
    BJobCtrl->>VStrat: Route to Velero handler

    rect rgb(200, 220, 255)
        Note over VStrat: Credential Resolution
        VStrat->>K8s: Read Bucket resource
        VStrat->>K8s: Read BucketAccess resource
        VStrat->>K8s: Read storage secret
        VStrat->>VStrat: Extract S3 credentials
    end

    rect rgb(200, 240, 200)
        Note over VStrat: Provision Velero Resources
        VStrat->>K8s: Create/update cloud-credentials<br/>(in cozy-velero namespace)
        VStrat->>K8s: Create/update BackupStorageLocation
        VStrat->>K8s: Create/update VolumeSnapshotLocation
    end

    rect rgb(255, 240, 200)
        Note over VStrat: Create Velero Backup
        VStrat->>K8s: Create Velero Backup resource<br/>(with BackupJob owner labels)
        K8s->>Velero: Trigger backup job
        Velero->>S3: Upload backup artifacts
        Velero->>K8s: Update Backup status
    end

    rect rgb(240, 200, 200)
        Note over VStrat: Finalize
        VStrat->>K8s: Create Backup resource<br/>(with Velero reference)
        VStrat->>K8s: Update BackupJob status<br/>(Succeeded/Failed)
        K8s->>BJobCtrl: Status updated
    end

    BJobCtrl->>User: BackupJob Complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Summary of Changes

Hello @androndo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces the foundational components for integrating Velero as a backup strategy within the Cozystack backup system. It defines the necessary API types for a Velero backup strategy and implements a dedicated controller to manage Velero-based backup jobs.
Code Review
This pull request introduces a new Velero backup strategy controller. The implementation is a good start, but there are several critical and high-severity issues that need to be addressed before this can be considered production-ready.
Key issues include:
- Critical bugs in the application type validation logic.
- A hardcoded storageLocation which should be dynamically resolved.
- A dangerous implementation that backs up all Virtual Machines in a namespace instead of the specific one requested.
- Several incorrect assumptions about the Velero Backup CRD structure, leading to non-functional logic for error handling and artifact creation.
- Minor issues like incorrect log messages and hardcoded requeue times.

Please review the detailed comments for each issue.
Force-pushed: 6347c38 to 5649fb1 (Compare)
Force-pushed: 5649fb1 to 6ad416a (Compare)
Force-pushed: 0fe45eb to 0d71a23 (Compare)
Force-pushed: 1094574 to 52838ab (Compare)
Actionable comments posted: 2
🧹 Nitpick comments (5)
packages/system/backup-controller/templates/rbac.yaml (2)
24-26: Consider scoping secrets permissions. The secrets permissions are quite broad (create/get/list/watch/update/patch across all namespaces). The controller creates secrets in the cozy-velero namespace. Consider if this could be scoped more narrowly using resourceNames, or if there's a need for such broad access.
30-32: Consider adding delete permission for Velero resources. The controller creates BackupStorageLocations, VolumeSnapshotLocations, and Secrets per BackupJob, but there's no cleanup mechanism visible. Consider whether delete permission should be added for garbage collection when BackupJobs are deleted, or document that these resources are intentionally retained.

internal/backupcontroller/velerostrategy_controller.go (3)
214-216: Use constant for requeue duration. Line 216 uses a hardcoded 5 * time.Second while the rest of the file uses the defaultRequeueAfter or defaultActiveJobPollingInterval constants.

🔎 Proposed fix

```diff
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
 }
```
381-396: Avoid blocking sleep in reconciler. Using time.Sleep in a reconciler blocks the worker goroutine. For conflict retries, consider returning with requeue instead, or use a client with built-in retry. The same pattern appears in createVolumeSnapshotLocation (lines 424-439). Alternatively, use a server-side apply or patch approach to avoid conflicts entirely:

```go
// Example using Patch instead of Update to reduce conflicts
patch := client.MergeFrom(foundBSL.DeepCopy())
foundBSL.Spec = bsl.Spec
if err := r.Patch(ctx, foundBSL, patch); err != nil {
	return fmt.Errorf("failed to patch BackupStorageLocation: %w", err)
}
```
450-477: Consider deduplicating conditions. The markBackupJobFailed function appends to conditions without checking for duplicates. If the reconciler retries after a transient error during status update, it could append multiple "Ready=False" conditions. Consider using a helper like meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
- api/backups/v1alpha1/backupjob_types.go
- cmd/backup-controller/main.go
- examples/desired-backup.yaml
- go.mod
- internal/backupcontroller/backupjob_controller.go
- internal/backupcontroller/velerostrategy_controller.go
- packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
- packages/system/backup-controller/templates/rbac.yaml
- packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_jobs.yaml
🧰 Additional context used
📓 Path-based instructions (7)
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in the charts/ directory
Files:
packages/system/backup-controller/templates/rbac.yaml
packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
packages/system/backup-controller/templates/rbac.yaml
packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
examples/desired-backup.yaml
api/**
📄 CodeRabbit inference engine (AGENTS.md)
Define Kubernetes CRDs in the api directory following kubebuilder conventions
Files:
api/backups/v1alpha1/backupjob_types.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
api/backups/v1alpha1/backupjob_types.go
internal/backupcontroller/backupjob_controller.go
internal/backupcontroller/velerostrategy_controller.go
cmd/backup-controller/main.go
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/backupcontroller/backupjob_controller.go
internal/backupcontroller/velerostrategy_controller.go
cmd/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Files:
cmd/backup-controller/main.go
go.mod
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT modify go.mod manually; use the go get command instead
Files:
go.mod
🧠 Learnings (5)
📚 Learning: 2025-07-10T12:14:27.197Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1161
File: packages/apps/virtual-machine/templates/dashboard-resourcemap.yaml:6-12
Timestamp: 2025-07-10T12:14:27.197Z
Learning: Kubernetes RBAC rules with resourceNames work correctly for list/watch verbs. When resourceNames is specified in an RBAC rule, it properly restricts access to only those named resources, even for list and watch operations. Examples: `kubectl get resource resourcename -w` watches for changes on a single resource, and `kubectl get resource --field-selector .metadata.name=resourcename` lists a specific resource. The Kubernetes API server correctly distinguishes such requests from their less specific counterparts.
Applied to files:
packages/system/backup-controller/templates/rbac.yaml
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to pkg/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in pkg directory
Applied to files:
cmd/backup-controller/main.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to internal/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Applied to files:
cmd/backup-controller/main.go
🧬 Code graph analysis (2)
internal/backupcontroller/backupjob_controller.go (3)
- api/backups/v1alpha1/backupjob_types.go (1): BackupJob (96-102)
- api/backups/strategy/v1alpha1/job_types.go (1): JobStrategyKind (25-25)
- api/backups/strategy/v1alpha1/velero_types.go (1): VeleroStrategyKind (24-24)
internal/backupcontroller/velerostrategy_controller.go (4)
- api/backups/v1alpha1/backupjob_types.go (1): BackupJob (96-102)
- internal/backupcontroller/backupjob_controller.go (1): BackupJobReconciler (20-24)
- api/backups/strategy/v1alpha1/velero_types.go (1): Velero (32-38)
- api/backups/v1alpha1/backup_types.go (5): Backup (103-109), BackupSpec (51-77), BackupArtifact (35-48), BackupStatus (80-93), BackupPhaseReady (30-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (10)
examples/desired-backup.yaml (1)
1-20: LGTM! The example manifest is well-structured and demonstrates proper usage of the BackupJob CR with all required references (applicationRef, storageRef, strategyRef). The manual trigger label is a nice touch for filtering.
api/backups/v1alpha1/backupjob_types.go (1)
88-89: LGTM! The kubebuilder markers correctly enable the status subresource and add a Phase print column with priority=0 for default visibility. This follows kubebuilder conventions and aligns with the controller's need to update BackupJob status.
packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml (1)
17-21: LGTM! The CRD manifest correctly adds the Phase printer column and status subresource, consistent with the kubebuilder markers in api/backups/v1alpha1/backupjob_types.go. This enables the controller to update status independently and provides visibility into BackupJob phase via kubectl.
Also applies to: 240-241
internal/backupcontroller/backupjob_controller.go (1)
26-67: LGTM! The reconcile logic is well-structured with:
- Proper logger initialization from context
- Defensive nil checks for StrategyRef.APIGroup
- Clear logging at appropriate verbosity levels
- Clean switch dispatch to strategy-specific handlers

The event recorder field is properly wired for use in the Velero strategy controller.
cmd/backup-controller/main.go (2)
50-56: LGTM! Scheme registrations for strategyv1alpha1 and velerov1 are correctly added in the init function, enabling the controller to work with Velero and strategy custom resources. This follows controller-runtime patterns.
162-169: LGTM! The BackupJobReconciler is correctly wired with the manager's client, scheme, and event recorder. The setup follows the same pattern as PlanReconciler and adheres to kubebuilder conventions. Based on learnings, this follows controller-runtime patterns for Go code in the cmd directory.
internal/backupcontroller/velerostrategy_controller.go (3)
72-105: LGTM on validation logic. The validation logic correctly checks for supported application and storage types with clear error messages. The early return pattern on failures is well-implemented.
611-666: Backup resource creation looks correct. The createBackupResource function properly:
- Creates the artifact with Velero URI scheme
- Extracts takenAt from Velero backup timestamps
- Sets driver metadata for Velero backup reference
- Uses owner references to link back to BackupJob
- Handles optional PlanRef
584-594: Label selector is correctly aligned with VirtualMachine labeling convention. The verification confirms that VirtualMachine resources created by the Helm chart apply the app.kubernetes.io/instance label using {{ .Release.Name }}, which follows the virtual-machine-{name} prefix convention defined in the resource definition. The backup controller's label selector using "virtual-machine-" + backupJob.Spec.ApplicationRef.Name correctly matches this convention.
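For illustration, a minimal Go sketch of the selector this convention implies; the function name and wrapper are hypothetical, only the label key and the virtual-machine- prefix come from the review:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// instanceSelector builds the label selector described above, assuming the
// Helm release for an application named appName is "virtual-machine-<appName>".
func instanceSelector(appName string) *metav1.LabelSelector {
	return &metav1.LabelSelector{
		MatchLabels: map[string]string{
			"app.kubernetes.io/instance": "virtual-machine-" + appName,
		},
	}
}

func main() {
	fmt.Println(instanceSelector("my-vm").MatchLabels)
}
```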
go.mod (1)
20-20: Velero v1.17.1 is a valid and current release. No code-level issues identified. Verify compatibility with your cluster's Velero version during deployment testing.
Actionable comments posted: 0
🧹 Nitpick comments (2)
internal/backupcontroller/velerostrategy_controller.go (2)
204-205: Use the defined constant for consistency. Line 205 hardcodes 5 * time.Second instead of using the defaultActiveJobPollingInterval constant defined at line 48.

🔎 Proposed fix

```diff
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
```
399-440: Consider extracting the retry logic to reduce duplication. The retry logic in lines 416-431 is identical to lines 373-388 in createBackupStorageLocation. Consider extracting this into a generic helper function to improve maintainability.

💡 Example refactor

```go
func (r *BackupJobReconciler) updateWithRetry(ctx context.Context, obj client.Object, updateSpec func() error) error {
	logger := log.FromContext(ctx)
	for i := 0; i < 3; i++ {
		key := client.ObjectKeyFromObject(obj)
		if err := r.Get(ctx, key, obj); err != nil {
			return fmt.Errorf("failed to get object for update: %w", err)
		}
		if err := updateSpec(); err != nil {
			if errors.IsConflict(err) && i < 2 {
				logger.V(1).Info("conflict updating object, retrying", "attempt", i+1)
				time.Sleep(100 * time.Millisecond)
				continue
			}
			return err
		}
		return nil
	}
	return fmt.Errorf("failed after 3 retry attempts")
}
```
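As an alternative sketch, client-go already ships a conflict-retry helper; retry.RetryOnConflict and retry.DefaultRetry are real client-go APIs, while the wrapper name and mutate callback below are illustrative assumptions:

```go
package backupsketch

import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateWithConflictRetry re-fetches obj and re-applies mutate whenever the
// apiserver returns a 409 Conflict, using client-go's default backoff instead
// of a hand-rolled loop with time.Sleep.
func updateWithConflictRetry(ctx context.Context, c client.Client, obj client.Object, mutate func()) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
			return err
		}
		mutate()
		return c.Update(ctx, obj)
	})
}
```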
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/backupcontroller/velerostrategy_controller.go
🧰 Additional context used
📓 Path-based instructions (2)
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/backupcontroller/velerostrategy_controller.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
internal/backupcontroller/velerostrategy_controller.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
🧬 Code graph analysis (1)
internal/backupcontroller/velerostrategy_controller.go (4)
- internal/backupcontroller/backupjob_controller.go (1): BackupJobReconciler (20-24)
- api/backups/v1alpha1/backupjob_types.go (1): BackupJob (96-102)
- api/backups/strategy/v1alpha1/velero_types.go (1): Velero (32-38)
- api/backups/v1alpha1/backup_types.go (3): Backup (103-109), BackupSpec (51-77), BackupStatus (80-93)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (10)
internal/backupcontroller/velerostrategy_controller.go (10)
3-44: LGTM! Clean type definitions and imports. The imports are well-organized and the type definitions for S3Credentials and bucketInfo are clear and appropriate for the credential resolution flow.
46-59: LGTM! Well-defined constants and helper functions. The constants are appropriately scoped and the helper functions follow Kubernetes patterns.
61-116: LGTM! Well-structured reconciliation flow with proper error handling. The early exit logic, validation checks, and status initialization are all correctly implemented. The handling of NotFound errors when the BackupJob is deleted (lines 105-109, 182-186) is particularly good.
208-295: LGTM! Credential resolution follows COSI patterns correctly. The function properly resolves S3 credentials through the Bucket → BucketAccess → Secret chain. The use of the "bucket-" prefix for BucketAccess names (line 247) aligns with the COSI convention and previous feedback. Error handling appropriately separates field absence from parsing errors.
297-353: LGTM! Proper Secret management for Velero credentials. The credential format (lines 311-319) matches Velero's AWS plugin expectations. Setting Data = nil (line 338) before updating with StringData is the correct Kubernetes pattern to ensure the StringData takes precedence. Event recording provides good observability.
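A minimal sketch of the pattern this comment describes, assuming a controller-runtime client; the function and key names are illustrative, not the PR's actual identifiers:

```go
package backupsketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateCredentialsSecret clears Data before setting StringData: on update,
// the apiserver merges StringData into Data, so keys left in Data would
// otherwise survive. Nil-ing Data makes the new values a full replacement.
func updateCredentialsSecret(ctx context.Context, c client.Client, secret *corev1.Secret, cloudConfig string) error {
	secret.Data = nil // ensure StringData below replaces, not merges with, old keys
	secret.StringData = map[string]string{"cloud": cloudConfig}
	return c.Update(ctx, secret)
}
```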
355-397: LGTM! Proper conflict handling for Velero resource updates. The retry logic (lines 373-388) correctly handles conflicts from Velero's concurrent status updates. The DeepEqual check (line 371) prevents unnecessary updates.
442-469: LGTM! Comprehensive failure handling. The function properly updates all relevant status fields and handles the case where the BackupJob is deleted during reconciliation (lines 459-462). The condition handling follows Kubernetes conventions.
471-566: LGTM! Well-orchestrated Velero resource creation. The dynamic creation of BackupStorageLocation and VolumeSnapshotLocation based on resolved Bucket credentials is correctly implemented. The namespace-prefixed location names (line 477) avoid conflicts in the shared cozy-velero namespace. Comprehensive event recording provides good observability.
603-666: LGTM! Proper Backup resource creation with correct owner references. The function correctly materializes a Backup resource from the Velero Backup data. The owner reference now includes Controller: true (line 635), addressing previous feedback. Timestamp selection logic (lines 612-617) appropriately prefers the Velero StartTimestamp over CreationTimestamp.
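For reference, controller-runtime's controllerutil helper sets exactly such a controlling owner reference in one call; this sketch assumes owner and object live in the same namespace, which the helper requires:

```go
package backupsketch

import (
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// setOwner marks owner as the controlling owner of obj (Controller: true),
// so deleting the owner garbage-collects obj and other controllers refuse
// to adopt it.
func setOwner(owner, obj client.Object, scheme *runtime.Scheme) error {
	return controllerutil.SetControllerReference(owner, obj, scheme)
}
```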
582-586: No action needed. The label selector correctly matches VirtualMachine labels as enforced by the Helm chart template, which applies app.kubernetes.io/instance: {{ .Release.Name }} to all VirtualMachine resources. The CRD definition specifies a virtual-machine- prefix, ensuring the label value follows the expected virtual-machine-<name> convention consistently.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)
205-205: Use the defined constant for consistency. This line uses a hardcoded 5 * time.Second while the rest of the file uses the defaultRequeueAfter or defaultActiveJobPollingInterval constants. Use the constant for consistency and maintainability.

🔎 Proposed fix

```diff
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
internal/backupcontroller/velerostrategy_controller.go
packages/system/backup-controller/templates/strategy.yaml
🧰 Additional context used
📓 Path-based instructions (4)
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in the charts/ directory
Files:
packages/system/backup-controller/templates/strategy.yaml
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
packages/system/backup-controller/templates/strategy.yaml
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/backupcontroller/velerostrategy_controller.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
internal/backupcontroller/velerostrategy_controller.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (10)
packages/system/backup-controller/templates/strategy.yaml (1)
1-5: LGTM! The Velero strategy template is correctly structured with an empty spec, which is appropriate for a default strategy resource. This aligns with the controller implementation that fetches and uses strategy references.
internal/backupcontroller/velerostrategy_controller.go (9)
1-22: LGTM! The package declaration and imports are appropriate for a Velero backup controller implementation, including necessary K8s APIs, controller-runtime packages, and Velero v1 types.
24-59: LGTM! The type definitions and helper functions are well-structured. Good to see the hardcoded requeue duration now defined as a constant (defaultRequeueAfter), addressing previous feedback about maintainability.
208-295: LGTM! The credential resolution logic correctly follows the flow from Bucket → BucketAccess → Secret → S3 credentials. The BucketAccess naming convention (bucket- prefix) and the separate error handling for err and !found cases address previous feedback effectively.
297-381: LGTM! The secret creation and update logic is well-implemented. The comparison logic (lines 338-364) ensures that updates only occur when data actually changes, reducing unnecessary operations and potential conflicts. This addresses the "better s3 secret update" improvement mentioned in the commit message.
383-468: LGTM! The storage location creation/update functions properly handle idempotency with conflict retry logic. The retry mechanism (3 attempts with 100ms delay) is adequate for handling concurrent updates from Velero's status controller.
470-497: LGTM! The failure handling properly updates status, adds conditions, and gracefully handles NotFound errors when the BackupJob is deleted during reconciliation.
499-629: Significant improvements to backup orchestration. The dynamic storage location setup (lines 505-594) and scoped label selector (lines 610-614) address critical issues from previous reviews. The storage location is no longer hardcoded, and the backup is properly scoped to the specific VM instead of backing up the entire namespace.
631-694: LGTM! The Backup resource creation correctly materializes artifacts with a simple URI format (lines 635-637), sets appropriate timestamps, and includes proper owner references with Controller: true (line 663). These changes address previous feedback about non-existent fields and missing controller ownership.
77-84: The validation and backup strategy are correct; no changes needed. The apps.cozystack.io/VirtualMachine refers to a Helm application definition, not a wrapper resource. The Helm chart for this application creates kubevirt.io/v1/VirtualMachine resources in the same namespace. The validation at lines 77-84 correctly ensures the ApplicationRef points to a VirtualMachine Helm application, and the backup at line 606 correctly includes the actual kubevirt.io VirtualMachine resources that were created by the Helm chart deployment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)
633-637: Verify label selector matches deployed VirtualMachine instances. The label selector assumes app.kubernetes.io/instance equals virtual-machine-{ApplicationRef.Name}. As flagged in previous reviews, if VirtualMachine Helm releases use {{ .Release.Name }} for this label, backups may miss VMs deployed with different release names.
Consider one of these approaches:
- Enforce a naming convention that guarantees the release name matches the pattern
- Query the actual VirtualMachine resource to determine its labels at runtime
- Use a more stable selector like app.kubernetes.io/name: "virtual-machine" combined with namespace filtering

Run the following script to verify the label pattern on deployed VirtualMachine resources:
```bash
#!/bin/bash
# Check if VirtualMachine resources in the cluster have the expected label pattern
echo "=== Checking VirtualMachine resources for label patterns ==="
kubectl get virtualmachines.kubevirt.io --all-namespaces -o json | \
  jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name) labels: \(.metadata.labels)"'

echo ""
echo "=== Checking for app.kubernetes.io/instance label values ==="
kubectl get virtualmachines.kubevirt.io --all-namespaces -o json | \
  jq -r '.items[] | select(.metadata.labels."app.kubernetes.io/instance" != null) | "\(.metadata.namespace)/\(.metadata.name): \(.metadata.labels."app.kubernetes.io/instance")"'
```
🧹 Nitpick comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)
233-233: Use the defined constant for consistency. Line 233 uses a hardcoded 5 * time.Second for requeuing, but the constant defaultRequeueAfter is defined at the top of the file for this purpose. Use the constant for maintainability.

🔎 Proposed fix

```diff
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultRequeueAfter}, nil
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/backupcontroller/velerostrategy_controller.go
🧰 Additional context used
📓 Path-based instructions (2)
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/backupcontroller/velerostrategy_controller.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
internal/backupcontroller/velerostrategy_controller.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (7)
internal/backupcontroller/velerostrategy_controller.go (7)
26-78: Good: Helper functions and constants improve maintainability. The introduction of loggerWithDebug, structured credential types, and consolidated constants (like defaultRequeueAfter) addresses previous feedback about hardcoded values and improves code clarity. The helper functions like boolPtr and storageS3SecretName follow Go best practices.
95-102: Validation correctly enforces VirtualMachine support only. The validation logic correctly restricts ApplicationRef to VirtualMachine with APIGroup apps.cozystack.io. Previous comments mentioned HelmRelease support, but that appears to have been removed from scope. The current implementation is consistent with this constraint.
236-323: LGTM: Credential discovery follows COSI patterns. The function correctly resolves S3 credentials from a Bucket storageRef by following the COSI flow: Bucket → BucketAccess → credentials secret → BucketInfo. The stable naming pattern (bucket-<bucketname>) and error handling have been properly addressed per previous feedback.
325-409: Well-implemented idempotent secret management. The function correctly creates or updates Velero credentials in the expected AWS config format. The data comparison logic (lines 376-388) prevents unnecessary updates, and proper events are emitted for observability. The secret format accommodates SeaweedFS as an S3-compatible backend.
411-496: Robust conflict handling for concurrent Velero updates. Both createBackupStorageLocation and createVolumeSnapshotLocation implement proper idempotent create-or-update logic with conflict retries. The 3-retry mechanism with 100ms delays handles concurrent status updates from Velero's controller, and reflect.DeepEqual prevents unnecessary spec updates.
498-520: LGTM: Standard failure handling pattern. The function correctly sets completion timestamp, phase, message, and adds a Ready=False condition. This follows Kubernetes conventions for status reporting.
654-717: Well-structured Backup resource creation. The function correctly creates a Backup resource with proper owner references (including Controller: true), a simplified artifact URI that doesn't depend on non-existent Velero status fields, and accurate driver metadata. Previous concerns about the missing Controller field and incorrect field access have been addressed.
Force-pushed: 239c02a to dee4e10 (Compare)
Actionable comments posted: 3
🧹 Nitpick comments (3)
hack/e2e-apps/backup.bats (3)
26-30: Consider using mktemp for temporary credential files. Writing to a fixed path /tmp/bucket-backup-credentials.json can cause race conditions in parallel test runs and leaves credentials in a predictable location.

🔎 Proposed fix

```diff
+CRED_FILE=$(mktemp /tmp/bucket-backup-credentials.XXXXXX.json)
+trap "rm -f $CRED_FILE" EXIT
+
 # Get bucket credentials for later S3 verification
-kubectl -n ${namespace} get secret bucket-${bucket_name} -ojsonpath='{.data.BucketInfo}' | base64 -d > /tmp/bucket-backup-credentials.json
-ACCESS_KEY=$(jq -r '.spec.secretS3.accessKeyID' /tmp/bucket-backup-credentials.json)
-SECRET_KEY=$(jq -r '.spec.secretS3.accessSecretKey' /tmp/bucket-backup-credentials.json)
-BUCKET_NAME=$(jq -r '.spec.bucketName' /tmp/bucket-backup-credentials.json)
-ENDPOINT=$(jq -r '.spec.secretS3.endpoint' /tmp/bucket-backup-credentials.json)
+kubectl -n ${namespace} get secret bucket-${bucket_name} -ojsonpath='{.data.BucketInfo}' | base64 -d > "$CRED_FILE"
+ACCESS_KEY=$(jq -r '.spec.secretS3.accessKeyID' "$CRED_FILE")
+SECRET_KEY=$(jq -r '.spec.secretS3.accessSecretKey' "$CRED_FILE")
+BUCKET_NAME=$(jq -r '.spec.bucketName' "$CRED_FILE")
+ENDPOINT=$(jq -r '.spec.secretS3.endpoint' "$CRED_FILE")
```
121-127: Consider using jq for more robust JSON parsing. Word splitting on kubectl jsonpath output works for K8s resource names but is fragile. Using jq with a proper array iteration would be more robust and handle edge cases better.

🔎 Proposed fix

```diff
 # Find backup by pattern matching namespace-backupjob
-for backup in $(kubectl -n cozy-velero get backups.velero.io -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
-  if echo "$backup" | grep -q "^${namespace}-${backupjob_name}-"; then
-    VELERO_BACKUP_NAME=$backup
-    VELERO_BACKUP_PHASE=$(kubectl -n cozy-velero get backups.velero.io $backup -o jsonpath='{.status.phase}' 2>/dev/null || echo "")
-    break
-  fi
-done
+VELERO_BACKUP_NAME=$(kubectl -n cozy-velero get backups.velero.io -o json 2>/dev/null | \
+  jq -r --arg pattern "^${namespace}-${backupjob_name}-" \
+  '.items[].metadata.name | select(test($pattern))' | head -1)
```
133-141: Consider adding explicit failure handling for better test diagnostics. The timeout correctly waits for either Completed or Failed, but if the backup fails, the test will fail silently at the assertion. Adding a check for the Failed state with a diagnostic message would help debug issues.

🔎 Proposed improvement

```diff
 # Verify Velero Backup is Completed
 VELERO_BACKUP_PHASE=$(kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}')
+if [ "$VELERO_BACKUP_PHASE" = "Failed" ]; then
+  echo "Velero backup failed. Failure reason:"
+  kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.failureReason}'
+  false
+fi
 [ "$VELERO_BACKUP_PHASE" = "Completed" ]
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hack/e2e-apps/backup.bats
🔇 Additional comments (4)
hack/e2e-apps/backup.bats (4)
3-8: LGTM! Test variables are well-named and clearly define the resources under test.
67-72: Acceptable test setup with comprehensive wait conditions. The sleep 5 before waits is a common E2E pattern to handle resource propagation delays. The wait conditions properly cover all dependent resources (HelmRelease, DataVolume, PVC, VM).
98-106: LGTM! The BackupJob lifecycle is properly tested by waiting for the Running → Succeeded phase transition. The explicit assertion at lines 105-106 adds clarity even though it's technically redundant after the wait.
181-183: LGTM! Port-forward cleanup correctly handles the background process with proper error suppression.
hack/e2e-apps/backup.bats (Outdated)

```bash
ENDPOINT_HOST=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f1)
ENDPOINT_PORT=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f2)
if [ -z "$ENDPOINT_PORT" ]; then
  ENDPOINT_PORT=8333
fi
```
Endpoint port parsing has a bug for URLs without explicit ports.
When ENDPOINT is https://host.example.com (no port), cut -d: -f2 returns the whole host string rather than an empty one (cut prints the entire line when the delimiter is absent), so the -z check won't trigger, leaving an invalid port.
🔎 Proposed fix

```diff
 # Extract endpoint host and port for port-forwarding
-ENDPOINT_HOST=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f1)
-ENDPOINT_PORT=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f2)
-if [ -z "$ENDPOINT_PORT" ]; then
+ENDPOINT_NO_PROTO=$(echo "$ENDPOINT" | sed 's|https\?://||')
+ENDPOINT_HOST=$(echo "$ENDPOINT_NO_PROTO" | cut -d: -f1)
+if echo "$ENDPOINT_NO_PROTO" | grep -q ':'; then
+  ENDPOINT_PORT=$(echo "$ENDPOINT_NO_PROTO" | cut -d: -f2)
+else
   ENDPOINT_PORT=8333
 fi
```
fi🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around lines 145 to 149, the port extraction uses
cut -d: -f2 which returns the wrong value for URLs without an explicit port
(e.g. https://host.example.com) so the -z check fails; change the logic to first
strip the scheme (remove http:// or https://), then check if the remaining host
string contains a colon and only then extract the part after the colon as
ENDPOINT_PORT; if no colon is present set ENDPOINT_PORT=8333. Ensure the check
distinguishes presence of ":" before extracting the port so URLs without ports
fall back to the default.
Force-pushed: 134228d to 5620045 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)
633-637: Verify the label selector matches VirtualMachine deployments. The label selector assumes VMs have app.kubernetes.io/instance: virtual-machine-{ApplicationRef.Name}. This works for Cozystack's Helm-deployed VMs but may miss VMs deployed differently.
This was flagged in a previous review. The current implementation follows Cozystack's Helm naming conventions, but consider documenting this assumption or adding validation that the target VM exists with the expected label.
🧹 Nitpick comments (5)
internal/backupcontroller/velerostrategy_controller.go (5)
232-234: Use the defined constant for requeue duration. Line 233 uses a hardcoded 5 * time.Second instead of the already-defined defaultActiveJobPollingInterval constant.

🔎 Proposed fix

```diff
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
 }
```
247-251: Consider extracting the hardcoded API version. The version "v1alpha1" is hardcoded. If the Bucket API evolves, this could become a maintenance burden.
381-387: Consider using bytes.Equal instead of reflect.DeepEqual for byte slices. For comparing byte slices, bytes.Equal is more idiomatic and slightly more efficient than reflect.DeepEqual.

🔎 Proposed fix

```diff
+import "bytes"
+
 for k, newVal := range newData {
 	existingVal, exists := existingData[k]
-	if !exists || !reflect.DeepEqual(existingVal, newVal) {
+	if !exists || !bytes.Equal(existingVal, newVal) {
 		dataChanged = true
 		break
 	}
```
506-512: Consider using meta.SetStatusCondition for idempotent condition updates. Appending conditions directly could lead to duplicates if this function is called multiple times before the status update succeeds. Using meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta ensures idempotent updates.

🔎 Proposed fix

```diff
+import "k8s.io/apimachinery/pkg/api/meta"
+
 // Add condition
-backupJob.Status.Conditions = append(backupJob.Status.Conditions, metav1.Condition{
+meta.SetStatusCondition(&backupJob.Status.Conditions, metav1.Condition{
 	Type:               "Ready",
 	Status:             metav1.ConditionFalse,
 	Reason:             "BackupFailed",
 	Message:            message,
 	LastTransitionTime: now,
 })
```
706-708: Remove redundant nil check. The artifact variable is unconditionally assigned on line 658, so this nil check will always be true.

🔎 Proposed fix

```diff
 if backupJob.Spec.PlanRef != nil {
 	backup.Spec.PlanRef = backupJob.Spec.PlanRef
 }
-if artifact != nil {
-	backup.Status.Artifact = artifact
-}
+backup.Status.Artifact = artifact

 if err := r.Create(ctx, backup); err != nil {
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
- api/backups/v1alpha1/backupjob_types.go
- cmd/backup-controller/main.go
- examples/desired-backup.yaml
- go.mod
- hack/e2e-apps/backup.bats
- internal/backupcontroller/backupjob_controller.go
- internal/backupcontroller/velerostrategy_controller.go
- packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
- packages/system/backup-controller/templates/rbac.yaml
- packages/system/backup-controller/templates/strategy.yaml
- packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_jobs.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/system/backup-controller/templates/strategy.yaml
- hack/e2e-apps/backup.bats
- packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
🧰 Additional context used
📓 Path-based instructions (7)
go.mod
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT modify go.mod manually; use the go get command instead
Files:
go.mod
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/backupcontroller/backupjob_controller.go
internal/backupcontroller/velerostrategy_controller.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
internal/backupcontroller/backupjob_controller.go
api/backups/v1alpha1/backupjob_types.go
internal/backupcontroller/velerostrategy_controller.go
cmd/backup-controller/main.go
api/**
📄 CodeRabbit inference engine (AGENTS.md)
Define Kubernetes CRDs in the api directory following kubebuilder conventions
Files:
api/backups/v1alpha1/backupjob_types.go
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
examples/desired-backup.yaml
packages/system/backup-controller/templates/rbac.yaml
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in the charts/ directory
Files:
packages/system/backup-controller/templates/rbac.yaml
cmd/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Files:
cmd/backup-controller/main.go
🧠 Learnings (6)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to go.mod : Do NOT modify go.mod manually; use `go get` command instead
Applied to files:
go.mod
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-10T12:14:27.197Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1161
File: packages/apps/virtual-machine/templates/dashboard-resourcemap.yaml:6-12
Timestamp: 2025-07-10T12:14:27.197Z
Learning: Kubernetes RBAC rules with resourceNames work correctly for list/watch verbs. When resourceNames is specified in an RBAC rule, it properly restricts access to only those named resources, even for list and watch operations. Examples: `kubectl get resource resourcename -w` watches for changes on a single resource, and `kubectl get resource --field-selector .metadata.name=resourcename` lists a specific resource. The Kubernetes API server correctly distinguishes such requests from their less specific counterparts.
Applied to files:
packages/system/backup-controller/templates/rbac.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to pkg/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in pkg directory
Applied to files:
cmd/backup-controller/main.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to internal/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Applied to files:
cmd/backup-controller/main.go
🧬 Code graph analysis (3)
internal/backupcontroller/backupjob_controller.go (4)
- pkg/apiserver/apiserver.go (1): Scheme (53-53)
- api/backups/v1alpha1/backupjob_types.go (1): BackupJob (96-102)
- api/backups/strategy/v1alpha1/job_types.go (1): JobStrategyKind (25-25)
- api/backups/strategy/v1alpha1/velero_types.go (1): VeleroStrategyKind (24-24)
internal/backupcontroller/velerostrategy_controller.go (4)
- internal/backupcontroller/backupjob_controller.go (1): BackupJobReconciler (20-24)
- api/backups/v1alpha1/backupjob_types.go (1): BackupJob (96-102)
- api/backups/strategy/v1alpha1/velero_types.go (1): Velero (32-38)
- api/backups/v1alpha1/backup_types.go (5): Backup (103-109), BackupSpec (51-77), BackupArtifact (35-48), BackupStatus (80-93), BackupPhaseReady (30-30)
cmd/backup-controller/main.go (2)
- internal/backupcontroller/backupjob_controller.go (1): BackupJobReconciler (20-24)
- pkg/apiserver/apiserver.go (1): Scheme (53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (12)
api/backups/v1alpha1/backupjob_types.go (1)
88-89: LGTM! The kubebuilder annotations correctly enable the status subresource and expose a "Phase" column in kubectl output. This aligns with the BackupJobReconciler's status update pattern in the controller implementation.
examples/desired-backup.yaml (1)
1-20: LGTM! The example manifest correctly demonstrates a manual BackupJob creation with all required references (applicationRef, storageRef, strategyRef) properly configured. The API groups align with what the controller validates.
internal/backupcontroller/velerostrategy_controller.go (3)
63-69: LGTM! Constants are well-defined and centralized. The defaultRequeueAfter addresses the past review feedback about hardcoded durations.
429-444: LGTM! The retry-on-conflict logic correctly handles concurrent Velero status updates. The 3-attempt limit with 100ms backoff is reasonable.
680-688: LGTM! The owner reference now correctly sets Controller: true, addressing the previous review feedback about garbage collection.

cmd/backup-controller/main.go (2)
162-169: LGTM! The BackupJobReconciler is correctly wired with the manager, including the EventRecorder for emitting Kubernetes events. This follows controller-runtime patterns as per coding guidelines.
54-55: LGTM! Scheme registrations for strategy/v1alpha1 and velero/v1 are correctly added, enabling the reconciler to work with these API types.
packages/system/backup-controller/templates/rbac.yaml (1)
30-32: LGTM! The Velero resource permissions correctly cover the resources the controller needs to manage (backups, backupstoragelocations, volumesnapshotlocations, restores).
internal/backupcontroller/backupjob_controller.go (3)
26-39: LGTM! The reconcile function follows controller-runtime best practices: structured logging, proper NotFound handling (returning nil error), and clear error paths.
41-52: LGTM! The two-step APIGroup validation (nil check followed by value check) is defensive and prevents nil pointer dereference. Logging at V(1) for skip cases keeps the logs clean during normal operation.
55-66: LGTM! The strategy dispatch logic correctly routes to the appropriate reconciler based on the StrategyRef.Kind, with informative logging for unsupported kinds.
go.mod (1)
20-20: Velero v1.17.1 is the latest stable release and has no known product CVEs. Verification confirms that Velero v1.17.1, released November 7, 2025, is the current stable version. No product-specific security advisories apply to this release (the only notable Velero CVE, CVE-2020-3996, affects much older versions prior to 1.4.3/1.5.2).
```yaml
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "get", "list", "watch", "update", "patch"]
```
🧩 Analysis chain
🏁 Script executed:
# Search for cleanup logic related to secret deletion
find packages/system/backup-controller -type f -name "*.go" -o -name "*.yaml" | head -20

Repository: cozystack/cozystack
Length of output: 815
🏁 Script executed:
# Check for any cleanup or deletion logic in the backup controller
rg -n "delete.*secret|Secret.*Delete|cleanup|orphan|finalizer" packages/system/backup-controller/ -t go -t yaml

Repository: cozystack/cozystack
Length of output: 45
🏁 Script executed:
# Search for secret creation logic to understand the controller's workflow
rg -n "cozy-velero|create.*[Ss]ecret|Secret.*create" packages/system/backup-controller/ -t go -t yaml

Repository: cozystack/cozystack
Length of output: 45
🏁 Script executed:
# Check if there are owner references or lifecycle management patterns
rg -n "owner.*reference|OwnerReference|finalizer|garbage.*collection" packages/system/backup-controller/ -t go

Repository: cozystack/cozystack
Length of output: 45
🏁 Script executed:
# Look at the backup-controller deployment to see where the actual controller code comes from
cat packages/system/backup-controller/templates/deployment.yaml | head -50

Repository: cozystack/cozystack
Length of output: 1488
🏁 Script executed:
# Search the entire repo for backupcontroller Go code
find . -type f -name "*.go" | xargs grep -l "backupcontroller\|backup-controller" | head -10

Repository: cozystack/cozystack
Length of output: 344
🏁 Script executed:
# Look for any internal packages related to backup
find . -type d -name "*backup*" | grep -E "internal|pkg|cmd"

Repository: cozystack/cozystack
Length of output: 146
🏁 Script executed:
# Check the BackupJob controller for deletion/cleanup logic
cat internal/backupcontroller/backupjob_controller.go

Repository: cozystack/cozystack
Length of output: 2608
🏁 Script executed:
# Check the Velero strategy controller for secret management
cat internal/backupcontroller/velerostrategy_controller.go

Repository: cozystack/cozystack
Length of output: 28771
Add delete permission and implement cleanup for orphaned Velero credential secrets.
The concern is valid: the controller creates secrets in the cozy-velero namespace but cannot use owner references due to cross-namespace limitations. No cleanup logic currently exists when BackupJobs are deleted, causing secrets to be orphaned. This will accumulate over time. Either implement a finalizer on BackupJob to delete associated secrets, or add cleanup logic triggered on BackupJob deletion. This requires adding delete permission for secrets in the RBAC rules.
🤖 Prompt for AI Agents
packages/system/backup-controller/templates/rbac.yaml lines 24-26: the RBAC
rules grant create/get/list/watch/update/patch on secrets but lack the "delete"
verb required to remove orphaned Velero credential secrets; update the verbs
array to include "delete" and then implement cleanup in the controller by adding
a BackupJob finalizer (or handling BackupJob deletion events) that enumerates
and deletes the secrets the controller created in the cozy-velero namespace (use
labels or predictable names to identify them) and removes the finalizer after
successful deletion so secrets are cleaned up on BackupJob removal.
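A minimal Go sketch of the finalizer flow this comment asks for, using controller-runtime's controllerutil helpers; the finalizer name, namespace, and secret-naming assumption are illustrative, not the PR's actual values:

```go
package backupsketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const cleanupFinalizer = "backups.cozystack.io/velero-credentials" // hypothetical finalizer name

// reconcileDeletion deletes the credentials secret the controller created in
// the cozy-velero namespace, then removes the finalizer so the BackupJob can
// be garbage-collected. secretName is whatever predictable name the
// controller used when creating the secret.
func reconcileDeletion(ctx context.Context, c client.Client, job client.Object, secretName string) error {
	if !controllerutil.ContainsFinalizer(job, cleanupFinalizer) {
		return nil // nothing to clean up
	}
	secret := &corev1.Secret{}
	secret.Name = secretName
	secret.Namespace = "cozy-velero"
	if err := c.Delete(ctx, secret); client.IgnoreNotFound(err) != nil {
		return err // transient failure: keep the finalizer and retry
	}
	controllerutil.RemoveFinalizer(job, cleanupFinalizer)
	return c.Update(ctx, job)
}
```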
Actionable comments posted: 3
🧹 Nitpick comments (2)
hack/e2e-apps/backup.bats (2)
17-17: pkill pattern may terminate unrelated port-forwards. The pattern kubectl.*port-forward.*seaweedfs-s3 could inadvertently kill port-forwards from other tests or sessions. Consider storing the PID when the port-forward starts (line 207) and using it here for precise cleanup.

🔎 Proposed fix

```diff
+# Track port-forward PID at file scope for teardown
+PORT_FORWARD_PID=""
+
 teardown() {
   # Clean up resources (runs even if test fails)
   # ...
-  # Clean up port-forward if still running
-  pkill -f "kubectl.*port-forward.*seaweedfs-s3" 2>/dev/null || true
+  # Clean up port-forward if PID is known
+  if [ -n "$PORT_FORWARD_PID" ]; then
+    kill $PORT_FORWARD_PID 2>/dev/null || true
+    wait $PORT_FORWARD_PID 2>/dev/null || true
+  fi
```

Then at line 206-207, capture the PID:

```diff
 kubectl -n tenant-root port-forward service/seaweedfs-s3 8333:8333 > /dev/null 2>&1 &
-PORT_FORWARD_PID=$!
+PORT_FORWARD_PID=$!  # Now accessible in teardown
```
194-202: Timeout loop succeeds on "Failed" status.The grep pattern
'Completed\|Failed'at line 195 will exit the loop successfully even if the backup reaches "Failed" status. While line 201 will correctly fail the test, the timeout message could be misleading.🔎 Proposed refinement
# Wait for Velero Backup to complete (with timeout) timeout 300 sh -ec " - until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed\|Failed'; do + until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed'; do + # Exit early if backup failed + if kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Failed'; then + echo 'Backup failed' >&2 + exit 1 + fi sleep 5 done "
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hack/e2e-apps/backup.bats
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (1)
hack/e2e-apps/backup.bats (1)
206-206: Verify the hardcoded `tenant-root` namespace is correct. The port-forward targets `tenant-root`, but the test uses namespace `tenant-test`. This might be intentional if the SeaweedFS S3 service runs in a shared namespace, but please confirm this is the expected configuration.
```bash
kubectl -n ${namespace} delete bucket.apps.cozystack.io ${bucket_name} --wait=false 2>/dev/null || true

# Clean up temporary files
rm -f /tmp/bucket-backup-credentials.json
```
Incomplete temporary file cleanup.
Line 90 creates /tmp/bucket-backup-credentials-fixed.json, but teardown only removes /tmp/bucket-backup-credentials.json.
🔎 Proposed fix

```diff
 # Clean up temporary files
-rm -f /tmp/bucket-backup-credentials.json
+rm -f /tmp/bucket-backup-credentials.json /tmp/bucket-backup-credentials-fixed.json
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-rm -f /tmp/bucket-backup-credentials.json
+rm -f /tmp/bucket-backup-credentials.json /tmp/bucket-backup-credentials-fixed.json
```
🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around line 46, the teardown only removes
/tmp/bucket-backup-credentials.json but the test creates
/tmp/bucket-backup-credentials-fixed.json at line 90; update the cleanup to
remove both temporary files (or adjust teardown to delete any matching
/tmp/bucket-backup-credentials*.json) so both
/tmp/bucket-backup-credentials.json and
/tmp/bucket-backup-credentials-fixed.json are removed during teardown.
hack/e2e-apps/backup.bats (outdated)
```bash
sleep 10

# Set up MinIO client
mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY --insecure
```
Protocol mismatch: HTTPS used where HTTP is expected.
The mc alias is configured with https://localhost:8333, but line 89 patches the bucket endpoint to use http://seaweedfs-s3.... If the SeaweedFS S3 service actually serves HTTP (not HTTPS), the connection will fail even with --insecure. The --insecure flag only skips certificate validation for HTTPS connections; it doesn't handle HTTP/HTTPS protocol mismatches.
🔎 Proposed fix

```diff
 # Set up MinIO client
-mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY --insecure
+mc alias set local http://localhost:8333 $ACCESS_KEY $SECRET_KEY
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
-mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY --insecure
+mc alias set local http://localhost:8333 $ACCESS_KEY $SECRET_KEY
```
🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around line 216, the mc alias is using
https://localhost:8333 which mismatches the patched S3 endpoint that serves
HTTP; update the alias to use http://localhost:8333 (and remove the unnecessary
--insecure flag) so the protocol matches the SeaweedFS S3 service and the
connection succeeds.
```go
timestamp := j.Status.StartedAt.Time.Format("2006-01-02-15-04-05")
veleroBackupName := fmt.Sprintf("%s-%s-%s", j.Namespace, j.Name, timestamp)
```
This isn't necessary. BackupJobs have unique names within their namespace, you don't need to put a timestamp in the name of the backup.velero.io resource.
```diff
-timestamp := j.Status.StartedAt.Time.Format("2006-01-02-15-04-05")
-veleroBackupName := fmt.Sprintf("%s-%s-%s", j.Namespace, j.Name, timestamp)
+veleroBackupName := fmt.Sprintf("%s-%s", j.Namespace, j.Name)
```
```go
backup := &backupsv1alpha1.Backup{
	ObjectMeta: metav1.ObjectMeta{
		Name: fmt.Sprintf("%s-backup", backupJob.Name),
```
```diff
-Name: fmt.Sprintf("%s-backup", backupJob.Name),
+Name: backupJob.Name,
```
Actionable comments posted: 3
♻️ Duplicate comments (1)
hack/e2e-apps/backup.bats (1)
213-215: Fix S3 endpoint protocol: use HTTP instead of HTTPS. Line 215 configures the MinIO client with `https://localhost:8333`, but the SeaweedFS S3 service listens only on HTTP (port 8333, with `httpsPort: 0` disabled in configuration). The `MC_INSECURE=1` flag only skips certificate validation for HTTPS; it cannot resolve a protocol mismatch. Change line 215 to use `http://`:

Suggested change

```diff
 # Set up MinIO client with insecure flag (use environment variable for all commands)
 export MC_INSECURE=1
-mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY
+mc alias set local http://localhost:8333 $ACCESS_KEY $SECRET_KEY
```
🧹 Nitpick comments (1)
hack/e2e-apps/backup.bats (1)
177-177: Consider replacing arbitrary sleeps with polling loops for more reliable tests. Lines 177 and 211 use fixed 30-second sleeps to wait for asynchronous operations. These arbitrary delays can make tests unnecessarily slow (if the operation completes faster) or flaky (if the operation takes longer than 30 seconds). Polling loops with timeouts provide more reliable and efficient waiting.

🔎 Example polling pattern for line 177

```diff
-print_log "Wait a bit for the backup to be created and appear in the API"
-sleep 30
+print_log "Wait for backup to appear in API"
+timeout 60 sh -ec "until kubectl -n cozy-velero get backups.velero.io 2>/dev/null | grep -q '^${namespace}-${backupjob_name}-'; do sleep 2; done"
```

For line 211, consider polling the S3 backup path instead:

```diff
-# Wait a bit for backup data to be written to S3
-sleep 30
+# Poll for backup data in S3
+timeout 60 sh -ec "until mc ls local/${BUCKET_NAME}/backups/ 2>/dev/null | grep -q '${VELERO_BACKUP_NAME}'; do sleep 5; done"
```

Also applies to: 211-211
| print_log "Wait for BackupJob to start" | ||
| kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running | ||
|
|
||
| print_log "Wait for BackupJob to complete" | ||
| kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded |
Race condition: waiting for transient Running phase may cause flaky test failures.
The test waits for phase=Running (line 158) before waiting for phase=Succeeded (line 161). If the BackupJob transitions directly to Succeeded (or completes faster than the polling interval), the Running wait will timeout even though the backup succeeded.
🔎 Proposed fix

```diff
-print_log "Wait for BackupJob to start"
-kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running
-
 print_log "Wait for BackupJob to complete"
 kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print_log "Wait for BackupJob to start" | |
| kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running | |
| print_log "Wait for BackupJob to complete" | |
| kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded | |
| print_log "Wait for BackupJob to complete" | |
| kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded |
🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around lines 157-161, the test currently waits for
the transient Running phase before waiting for Succeeded which can race if the
BackupJob skips or quickly passes Running; remove the intermediate wait for
Running and only wait for the backupjob to reach Succeeded (with the
existing/adjusted timeout) so the check succeeds whether the job goes through
Running or directly to Succeeded.
Actionable comments posted: 3
🧹 Nitpick comments (6)
internal/template/template.go (2)
39-56: LGTM! Helper correctly traverses JSON structures. The `mapAtStrings` function correctly handles the recursive traversal of maps, slices, and strings. Non-string types (numbers, booleans, null) appropriately pass through unchanged.

Note: The function mutates the input structures in place, which is efficient but could be surprising. Consider documenting this behavior if the function becomes public.

58-68: LGTM! Standard text/template usage. The `template` function correctly uses Go's `text/template` package for string templating. Error handling is appropriate. If additional template functions are needed in the future (e.g., `{{ lower .Name }}`), consider using `.Funcs(template.FuncMap{...})` before parsing, as sketched below.
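For illustration, a standalone sketch of registering a `FuncMap` before parsing (this is not the project's `template` helper; the names are assumptions):

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

func main() {
	// Funcs must be called before Parse so the parser knows about "lower".
	tmpl := template.Must(
		template.New("example").
			Funcs(template.FuncMap{"lower": strings.ToLower}).
			Parse("{{ lower .Name }}"),
	)
	_ = tmpl.Execute(os.Stdout, map[string]any{"Name": "CozyStack"}) // prints "cozystack"
}
```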
internal/template/template_test.go (2)

65-67: Use structured comparison instead of JSON string comparison. Comparing JSON-marshaled strings (lines 65-66) is fragile: field ordering or whitespace differences could cause false failures even when the objects are semantically equal.

🔎 Proposed fix using reflect.DeepEqual or cmp.Diff

```diff
-if string(b1) != string(b2) {
-	t.Fatalf("transformed value not equal to reference value, expected: %s, got: %s", string(b1), string(b2))
+if !reflect.DeepEqual(reference, *got) {
+	t.Fatalf("transformed value not equal to reference value\nexpected: %+v\ngot: %+v", reference, *got)
 }
```

Or use `github.com/google/go-cmp/cmp`:

```diff
+if diff := cmp.Diff(reference, *got); diff != "" {
+	t.Fatalf("transformed value mismatch (-want +got):\n%s", diff)
+}
```
11-68: Expand test coverage to include error cases and edge cases. The test only covers the happy path. Consider adding tests for (see the sketch after this list):

- Template parsing errors (malformed syntax: `{{ .Missing`)
- Template execution errors (missing keys: `{{ .Undefined }}`)
- Empty input objects
- Nil template context
- Non-templated strings (pass-through behavior)
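A table-driven sketch of what such cases could look like, assuming the test lives in the same package as `Template` (the `testing` import is implied) and assuming the error-propagation fix discussed in this review is applied; with the current implementation, malformed templates pass through unchanged:

```go
func TestTemplateErrorCases(t *testing.T) {
	cases := []struct {
		name    string
		input   map[string]any
		ctx     map[string]any
		wantErr bool
	}{
		{name: "malformed syntax", input: map[string]any{"f": "{{ .Missing"}, wantErr: true},
		{name: "non-templated string passes through", input: map[string]any{"f": "plain"}, wantErr: false},
		{name: "empty object", input: map[string]any{}, wantErr: false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			// Template is the generic helper under test: Template[T any](obj *T, ctx map[string]any) (*T, error)
			_, err := Template(&tc.input, tc.ctx)
			if (err != nil) != tc.wantErr {
				t.Fatalf("Template() error = %v, wantErr %v", err, tc.wantErr)
			}
		})
	}
}
```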
internal/backupcontroller/velerostrategy_controller.go (2)
218-218: Use constant instead of hardcoded duration. Line 218 uses a hardcoded `5 * time.Second` instead of the defined constant `defaultRequeueAfter` or `defaultActiveJobPollingInterval`. Use the constant for consistency with the rest of the file.

🔎 Proposed fix

```diff
 // Still in progress (InProgress, New, etc.)
-return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
```
140-166: Consider handling multiple backups in list. While the labels should ensure only one Velero Backup exists per BackupJob, there's no handling or logging if `len(veleroBackupList.Items) > 1`. Consider adding a debug or warning log to catch unexpected cases where multiple backups share the same labels (e.g., due to bugs or manual intervention).

🔎 Proposed enhancement

```diff
 if len(veleroBackupList.Items) == 0 {
 	// Create Velero Backup
 	...
 } else {
+	if len(veleroBackupList.Items) > 1 {
+		logger.V(1).Info("found multiple Velero Backups for BackupJob, using first one",
+			"count", len(veleroBackupList.Items))
+	}
 	// Assign the first backup from the list
 	veleroBackup = &veleroBackupList.Items[0]
 }
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- api/backups/strategy/v1alpha1/velero_types.go
- api/backups/v1alpha1/backupjob_types.go
- api/backups/v1alpha1/groupversion_info.go
- internal/backupcontroller/backupjob_controller.go
- internal/backupcontroller/velerostrategy_controller.go
- internal/template/template.go
- internal/template/template_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- api/backups/v1alpha1/backupjob_types.go
🧰 Additional context used
📓 Path-based instructions (3)
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
- internal/template/template_test.go
- internal/template/template.go
- internal/backupcontroller/backupjob_controller.go
- internal/backupcontroller/velerostrategy_controller.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
- internal/template/template_test.go
- internal/template/template.go
- api/backups/strategy/v1alpha1/velero_types.go
- internal/backupcontroller/backupjob_controller.go
- internal/backupcontroller/velerostrategy_controller.go
- api/backups/v1alpha1/groupversion_info.go
api/**
📄 CodeRabbit inference engine (AGENTS.md)
Define Kubernetes CRDs in the api directory following kubebuilder conventions
Files:
- api/backups/strategy/v1alpha1/velero_types.go
- api/backups/v1alpha1/groupversion_info.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
🧬 Code graph analysis (4)
internal/template/template_test.go (1)
- internal/template/template.go (1): `Template` (9-37)

api/backups/strategy/v1alpha1/velero_types.go (2)
- internal/template/template.go (1): `Template` (9-37)
- api/backups/v1alpha1/backup_types.go (1): `BackupSpec` (51-77)

internal/backupcontroller/backupjob_controller.go (4)
- api/backups/v1alpha1/backupjob_types.go (1): `BackupJob` (101-107)
- api/backups/v1alpha1/groupversion_info.go (1): `GroupVersion` (34-34)
- api/backups/strategy/v1alpha1/job_types.go (1): `JobStrategyKind` (25-25)
- api/backups/strategy/v1alpha1/velero_types.go (1): `VeleroStrategyKind` (25-25)

internal/backupcontroller/velerostrategy_controller.go (5)
- internal/backupcontroller/backupjob_controller.go (1): `BackupJobReconciler` (25-31)
- api/backups/v1alpha1/backupjob_types.go (6): `BackupJob` (101-107), `BackupJobPhaseSucceeded` (36-36), `BackupJobPhaseFailed` (37-37), `OwningJobNamespaceLabel` (26-26), `OwningJobNameLabel` (25-25), `BackupJobPhaseRunning` (35-35)
- api/backups/strategy/v1alpha1/velero_types.go (1): `Velero` (33-39)
- api/backups/v1alpha1/backup_types.go (5): `Backup` (103-109), `BackupList` (114-118), `BackupArtifact` (35-48), `BackupSpec` (51-77), `BackupStatus` (80-93)
- internal/template/template.go (1): `Template` (9-37)
🔇 Additional comments (7)
api/backups/v1alpha1/groupversion_info.go (1)
28-34: LGTM! Clean refactor to constants.The introduction of
thisGroupandthisVersionconstants eliminates string literal duplication and improves maintainability. The constants are appropriately unexported and the publicGroupVersionAPI remains unchanged.internal/backupcontroller/backupjob_controller.go (4)
25-31: LGTM! Reconciler struct enhanced with dynamic client support. The additions of `dynamic.Interface`, `meta.RESTMapper`, and `Recorder` follow controller-runtime patterns and enable dynamic resource management and event emission.

33-46: LGTM! Improved logging and error handling. The enhanced logging with context (namespace, name) and appropriate verbosity levels (V(1) for debug) improves observability. Error handling correctly distinguishes not-found from other errors.

48-61: LGTM! Robust StrategyRef validation. The nil check for `StrategyRef.APIGroup` and the APIGroup matching logic are correct. The detailed logging (expected vs. got) aids debugging when strategies don't match.
77-92: LGTM! Standard controller setup with dynamic client. The initialization of `dynamic.Interface` and `meta.RESTMapper` follows controller-runtime patterns. Error handling is correct, and the HTTP client is appropriately created for the RESTMapper. A sketch of that pattern follows.
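Roughly, the setup pattern being described looks like this (a sketch, assuming the reconciler embeds `dynamic.Interface` and `meta.RESTMapper` as described above; the API import path is an assumption):

```go
import (
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"

	backupsv1alpha1 "github.com/cozystack/cozystack/api/backups/v1alpha1" // assumed import path
)

func (r *BackupJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
	cfg := mgr.GetConfig()

	dyn, err := dynamic.NewForConfig(cfg) // dynamic client for arbitrary application resources
	if err != nil {
		return err
	}
	r.Interface = dyn // embedded dynamic.Interface field

	httpClient, err := rest.HTTPClientFor(cfg) // the RESTMapper shares the manager's rest.Config
	if err != nil {
		return err
	}
	mapper, err := apiutil.NewDynamicRESTMapper(cfg, httpClient)
	if err != nil {
		return err
	}
	r.RESTMapper = mapper // embedded meta.RESTMapper field

	return ctrl.NewControllerManagedBy(mgr).
		For(&backupsv1alpha1.BackupJob{}).
		Complete(r)
}
```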
api/backups/strategy/v1alpha1/velero_types.go (2)

51-59: No action required. This is a new API feature (v1alpha1) with no backward compatibility concerns. Existing Velero resources have empty specs and will continue to work with the new VeleroSpec structure.

9-9: Velero v1.17.1 is the current latest stable version with no security advisories. The import is actively used for `VeleroSpec`.
```go
templateFunc := func(in string) string {
	out, err := template(in, templateContext)
	if err != nil {
		return in
	}
	return out
}
```
Template errors are silently swallowed.
The templateFunc closure returns the original input string when template() fails, hiding template parsing or execution errors from the caller. This makes debugging difficult—users will see unexpanded template syntax (e.g., {{ .Missing }}) in the output without any indication that an error occurred.
Consider either:
- Collecting errors during traversal and returning them, or
- Logging the error, or
- Returning a sentinel value that causes `Template` to fail
🔎 Proposed fix to propagate template errors

One approach is to collect errors during traversal:

```diff
+var templateErrors []error
 templateFunc := func(in string) string {
 	out, err := template(in, templateContext)
 	if err != nil {
+		templateErrors = append(templateErrors, fmt.Errorf("template error in %q: %w", in, err))
 		return in
 	}
 	return out
 }
 unstructured = mapAtStrings(unstructured, templateFunc)
+if len(templateErrors) > 0 {
+	return nil, fmt.Errorf("template errors: %v", templateErrors)
+}
```

Note: This requires importing "fmt" and adjusting the closure to capture templateErrors.
Committable suggestion skipped: line range outside the PR's diff.
## What this PR does

This patch implements the Reconcile function for BackupJobs with a Velero strategy ref.

### Release note

```release-note
[backups] Implement the Velero backup strategy controller.
```

Signed-off-by: Andrey Kolkov <androndo@gmail.com>
0cd5666 to 5f21c6c
Actionable comments posted: 0
♻️ Duplicate comments (6)
hack/e2e-apps/backup.bats (3)
213-215: Verify HTTPS/HTTP compatibility with SeaweedFS S3. Using `https://localhost:8333` assumes SeaweedFS S3 serves HTTPS. If it serves HTTP, the connection will fail even with `MC_INSECURE=1` (which only skips certificate validation, not protocol mismatch).

```shell
#!/bin/bash
# Check what protocol SeaweedFS S3 service uses
kubectl get svc seaweedfs-s3 -n tenant-root -o yaml | grep -A10 'ports:'

# Check if there's TLS configuration
kubectl get svc seaweedfs-s3 -n tenant-root -o jsonpath='{.spec.ports[*].name}'
```
157-161: Race condition: waiting for transient `Running` phase may cause flaky failures. If the BackupJob transitions quickly from creation to `Succeeded` (or skips `Running` entirely), the wait on line 158 will timeout even though the backup succeeded. The `Succeeded` wait at line 161 is sufficient.

🔎 Proposed fix

```diff
-print_log "Wait for BackupJob to start"
-kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running
-
 print_log "Wait for BackupJob to complete"
 kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
```
191-197: Redundant wait logic with potential infinite loop. Lines 191-194 contain an unbounded `until` loop, but line 197 duplicates the wait with a proper timeout. Either remove the unbounded loop or add a timeout to it.

🔎 Proposed fix: remove redundant unbounded loop

```diff
 print_log "Verify Velero Backup was found"
 [ -n "$VELERO_BACKUP_NAME" ]

-echo '# Wait for Velero Backup to complete' >&3
-until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed\|Failed'; do
-  sleep 5
-done
-
 print_log "Verify Velero Backup is Completed"
 timeout 90 sh -ec "until [ \"\$(kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' 2>/dev/null)\" = \"Completed\" ]; do sleep 30; done"
```

internal/backupcontroller/velerostrategy_controller.go (2)
145-166: Critical bug: `veleroBackup` is never assigned from the list when backups exist. When `len(veleroBackupList.Items) > 0`, the code proceeds to use `veleroBackup` (lines 166, 180, 190, 211) but it remains an uninitialized empty struct from line 130. This causes incorrect phase checks and invalid data passed to `createBackupResource`.

🔎 Proposed fix

```diff
 if len(veleroBackupList.Items) == 0 {
 	// Create Velero Backup
 	logger.Debug("Velero Backup not found, creating new one")
 	if err := r.createVeleroBackup(ctx, j, veleroStrategy); err != nil {
 		logger.Error(err, "failed to create Velero Backup")
 		return r.markBackupJobFailed(ctx, j, fmt.Sprintf("failed to create Velero Backup: %v", err))
 	}
 	// After successful Velero backup creation, set phase to Running
 	if j.Status.Phase != backupsv1alpha1.BackupJobPhaseRunning {
 		logger.Debug("setting BackupJob phase to Running after successful Velero backup creation")
 		j.Status.Phase = backupsv1alpha1.BackupJobPhaseRunning
 		if err := r.Status().Update(ctx, j); err != nil {
 			logger.Error(err, "failed to update BackupJob phase to Running")
 			return ctrl.Result{}, err
 		}
 	}
 	logger.Debug("created Velero Backup, requeuing", "veleroBackupName", veleroBackupName)
 	// Requeue to check status
 	return ctrl.Result{RequeueAfter: defaultRequeueAfter}, nil
 }
+
+// Assign the backup from the list
+veleroBackup = &veleroBackupList.Items[0]

 logger.Debug("found existing Velero Backup", "veleroBackupName", veleroBackupName, "phase", veleroBackup.Status.Phase)
```
519-524: Missing error check after fetching application resource. The error from `r.Resource(...).Get()` at line 519 is not checked before using `app` at line 521. If the application doesn't exist, `app` will be nil and the template call will fail with an unclear error.

🔎 Proposed fix

```diff
 app, err := r.Resource(mapping.Resource).Namespace(ns).Get(ctx, backupJob.Spec.ApplicationRef.Name, metav1.GetOptions{})
+if err != nil {
+	return fmt.Errorf("failed to get application resource %s/%s: %w", backupJob.Spec.ApplicationRef.Kind, backupJob.Spec.ApplicationRef.Name, err)
+}
 veleroBackupSpec, err := template.Template(&strategy.Spec.Template.Spec, app.Object)
```

packages/system/backup-controller/templates/rbac.yaml (1)
24-26: Missing `delete` permission for secrets cleanup. The controller creates secrets in the `cozy-velero` namespace (via `createS3CredsForVelero`), but without `delete` permission, orphaned secrets will accumulate when BackupJobs are deleted. Cross-namespace owner references cannot be used for automatic garbage collection.

🔎 Proposed fix

```diff
 - apiGroups: [""]
   resources: ["secrets"]
-  verbs: ["create", "get", "list", "watch", "update", "patch"]
+  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
```
🧹 Nitpick comments (8)
internal/template/template_test.go (1)
11-68: Consider adding edge case coverage and error handling tests. The test validates basic templating functionality correctly, but consider adding:

- Tests for malformed templates (e.g., `{{ .NonExistent }}`)
- Tests for empty strings and nil values
- Tests for special characters in template values
- Table-driven tests to cover multiple scenarios efficiently

The JSON string comparison approach is acceptable but be aware it relies on deterministic field ordering.
api/backups/v1alpha1/backupjob_types.go (1)
24-27: Consider reviewing the label naming convention. The label keys use an unconventional format with CamelCase components:

- `backups.cozystack.io/owned-by.BackupJobName`
- `backups.cozystack.io/owned-by.BackupJobNamespace`

Kubernetes label keys typically use lowercase with hyphens. Consider whether this format:

- Follows existing conventions in your codebase
- Should use a format like `backups.cozystack.io/owned-by-backupjob-name` instead
- Might create confusion between the "owned-by" and "BackupJobName" portions
If this format is intentional and consistent with your project's conventions, this comment can be ignored.
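For comparison, a lowercase-hyphenated variant might read like this (purely illustrative names, not what the PR uses):

```go
// Hypothetical lowercase-hyphenated alternatives to the PR's label keys.
const (
	OwningJobNameLabel      = "backups.cozystack.io/owned-by-backupjob-name"
	OwningJobNamespaceLabel = "backups.cozystack.io/owned-by-backupjob-namespace"
)
```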
internal/backupcontroller/velerostrategy_controller.go (4)
217-218: Use constant instead of hardcoded value. Line 218 uses `5 * time.Second` directly, but `defaultActiveJobPollingInterval` is defined for this purpose.

🔎 Proposed fix

```diff
 // Still in progress (InProgress, New, etc.)
-return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
```
575-578: Simplify unnecessary `fmt.Sprintf`. `fmt.Sprintf("%s", backupJob.Name)` is redundant; use `backupJob.Name` directly.

🔎 Proposed fix

```diff
 backup := &backupsv1alpha1.Backup{
 	ObjectMeta: metav1.ObjectMeta{
-		Name:      fmt.Sprintf("%s", backupJob.Name),
+		Name:      backupJob.Name,
 		Namespace: backupJob.Namespace,
```
605-607: Remove unnecessary nil check. `artifact` is always non-nil since it's unconditionally assigned at line 557. This check is dead code.

🔎 Proposed fix

```diff
-if artifact != nil {
-	backup.Status.Artifact = artifact
-}
+backup.Status.Artifact = artifact
```
396-438: Good retry pattern for conflict handling. The retry loops with conflict detection for BackupStorageLocation and VolumeSnapshotLocation updates handle concurrent status updates from Velero correctly. Consider extracting the common retry logic into a helper function to reduce duplication, as sketched below.

Also applies to: 440-481
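One way to factor out the duplicated loops is client-go's `retry.RetryOnConflict`, which re-runs the mutation only on resourceVersion conflicts (a sketch, not the PR's code):

```go
import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateWithConflictRetry re-fetches obj and applies mutate on every attempt,
// retrying only when the API server reports a resourceVersion conflict.
func updateWithConflictRetry(ctx context.Context, c client.Client, obj client.Object, mutate func() error) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
			return err
		}
		if err := mutate(); err != nil {
			return err
		}
		return c.Update(ctx, obj)
	})
}
```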
packages/system/backup-controller/templates/rbac.yaml (1)
30-32: Consider adding `delete` permission for Velero resources. For cleanup consistency, you may want `delete` on Velero resources if the controller should clean up orphaned BackupStorageLocations, VolumeSnapshotLocations, or Velero Backups when BackupJobs are removed.

🔎 Proposed fix

```diff
 - apiGroups: ["velero.io"]
   resources: ["backups", "backupstoragelocations", "volumesnapshotlocations", "restores"]
-  verbs: ["create", "get", "list", "watch", "update", "patch"]
+  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
```

hack/e2e-apps/backup.bats (1)
127-131: Consider removing the arbitrary `sleep 5`. The `sleep 5` on line 127 before waiting for resources is likely unnecessary if the subsequent `wait` commands have proper timeouts. If needed for HelmRelease propagation, add a comment explaining the reason.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (16)
- api/backups/strategy/v1alpha1/velero_types.go
- api/backups/v1alpha1/backupjob_types.go
- api/backups/v1alpha1/groupversion_info.go
- cmd/backup-controller/main.go
- examples/desired-backup.yaml
- go.mod
- hack/e2e-apps/backup.bats
- hack/e2e-apps/bucket.bats
- internal/backupcontroller/backupjob_controller.go
- internal/backupcontroller/velerostrategy_controller.go
- internal/template/template.go
- internal/template/template_test.go
- packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
- packages/system/backup-controller/templates/rbac.yaml
- packages/system/backup-controller/templates/strategy.yaml
- packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_jobs.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/template/template.go
- cmd/backup-controller/main.go
- go.mod
- packages/system/backup-controller/templates/strategy.yaml
- internal/backupcontroller/backupjob_controller.go
- hack/e2e-apps/bucket.bats
🧰 Additional context used
📓 Path-based instructions (5)
api/**
📄 CodeRabbit inference engine (AGENTS.md)
Define Kubernetes CRDs in the api directory following kubebuilder conventions
Files:
- api/backups/v1alpha1/groupversion_info.go
- api/backups/v1alpha1/backupjob_types.go
- api/backups/strategy/v1alpha1/velero_types.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
- api/backups/v1alpha1/groupversion_info.go
- api/backups/v1alpha1/backupjob_types.go
- internal/backupcontroller/velerostrategy_controller.go
- internal/template/template_test.go
- api/backups/strategy/v1alpha1/velero_types.go
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
- internal/backupcontroller/velerostrategy_controller.go
- internal/template/template_test.go
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
- examples/desired-backup.yaml
- packages/system/backup-controller/templates/rbac.yaml
- packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
Files:
- packages/system/backup-controller/templates/rbac.yaml
- packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
🧠 Learnings (3)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-10T12:14:27.197Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1161
File: packages/apps/virtual-machine/templates/dashboard-resourcemap.yaml:6-12
Timestamp: 2025-07-10T12:14:27.197Z
Learning: Kubernetes RBAC rules with resourceNames work correctly for list/watch verbs. When resourceNames is specified in an RBAC rule, it properly restricts access to only those named resources, even for list and watch operations. Examples: `kubectl get resource resourcename -w` watches for changes on a single resource, and `kubectl get resource --field-selector .metadata.name=resourcename` lists a specific resource. The Kubernetes API server correctly distinguishes such requests from their less specific counterparts.
Applied to files:
packages/system/backup-controller/templates/rbac.yaml
🧬 Code graph analysis (3)
internal/backupcontroller/velerostrategy_controller.go (4)
- internal/backupcontroller/backupjob_controller.go (1): `BackupJobReconciler` (25-31)
- api/backups/strategy/v1alpha1/velero_types.go (1): `Velero` (33-39)
- api/backups/v1alpha1/backup_types.go (6): `Backup` (103-109), `BackupList` (114-118), `BackupArtifact` (35-48), `BackupSpec` (51-77), `BackupStatus` (80-93), `BackupPhaseReady` (30-30)
- internal/template/template.go (1): `Template` (9-37)

internal/template/template_test.go (1)
- internal/template/template.go (1): `Template` (9-37)

api/backups/strategy/v1alpha1/velero_types.go (2)
- internal/template/template.go (1): `Template` (9-37)
- api/backups/v1alpha1/backup_types.go (1): `BackupSpec` (51-77)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (14)
api/backups/v1alpha1/groupversion_info.go (1)
28-34: LGTM! Good refactoring to use constants.The introduction of
thisGroupandthisVersionconstants improves maintainability. Note that the kubebuilder annotation on line 19 still contains the hardcoded string "backups.cozystack.io", but this is unavoidable since annotations are comments and cannot reference constants.examples/desired-backup.yaml (1)
1-20: LGTM! Well-structured example manifest. The BackupJob example clearly demonstrates the CR structure with all three required references (applicationRef, storageRef, strategyRef). The use of the `backups.cozystack.io/triggered-by: manual` label is helpful for distinguishing manual backups.

packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml (2)
17-20: LGTM! Phase column addition enhances kubectl output. The additionalPrinterColumns for Phase aligns with the kubebuilder annotation in the Go type definition and will provide useful visibility into BackupJob status.
240-241: LGTM! Status subresource properly enabled. The status subresource addition aligns with the `+kubebuilder:subresource:status` annotation in `api/backups/v1alpha1/backupjob_types.go` and follows Kubernetes best practices for status handling.

api/backups/v1alpha1/backupjob_types.go (1)
93-94: LGTM! Kubebuilder annotations properly configured. The status subresource and Phase print column annotations align correctly with the CRD definition in `packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml`.

api/backups/strategy/v1alpha1/velero_types.go (2)
9-9: LGTM on Velero v1 import. The import alias `velerov1` is clear and follows Go naming conventions for external API imports.
50-59: Clean template design for Velero backup specifications. The `VeleroSpec` → `VeleroTemplate` → `velerov1.BackupSpec` hierarchy provides a clear structure for templating Velero backups (see the sketch below). This aligns well with the `template.Template` generic function in `internal/template/template.go` that processes these specs.
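Roughly, the hierarchy in question (field and JSON tag names are inferred from the rest of this review, not copied from the PR):

```go
// VeleroSpec wraps a templated Velero backup specification.
type VeleroSpec struct {
	Template VeleroTemplate `json:"template"`
}

// VeleroTemplate embeds the upstream Velero BackupSpec so that its string
// fields can be run through template.Template before the Backup is created.
type VeleroTemplate struct {
	Spec velerov1.BackupSpec `json:"spec"`
}
```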
internal/backupcontroller/velerostrategy_controller.go (4)

28-41: LGTM on logger wrapper. The `loggerWithDebug` wrapper provides a convenient `Debug()` method that maps to `V(1).Info()`, improving code readability while maintaining proper log levels.
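A wrapper along these lines (a sketch; the actual type lives in the controller package):

```go
import "github.com/go-logr/logr"

// loggerWithDebug adds a Debug convenience method on top of logr.Logger.
type loggerWithDebug struct {
	logr.Logger
}

// Debug logs at verbosity level 1, matching logger.V(1).Info(...).
func (l loggerWithDebug) Debug(msg string, keysAndValues ...any) {
	l.V(1).Info(msg, keysAndValues...)
}
```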
43-63: LGTM on credential structs. The `S3Credentials` and `bucketInfo` structs are well-defined for handling COSI bucket credentials. The nested structure in `bucketInfo` correctly mirrors the expected JSON format from BucketInfo secrets.
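A plausible shape for these structs, matching the extraction code shown later in this review (the JSON tag names are assumptions):

```go
// S3Credentials is the flattened result handed to the Velero secret writer.
type S3Credentials struct {
	BucketName      string
	Endpoint        string
	Region          string
	AccessKeyID     string
	AccessSecretKey string
}

// bucketInfo mirrors the JSON stored under secret.data.BucketInfo by COSI.
type bucketInfo struct {
	Spec struct {
		BucketName string `json:"bucketName"`
		SecretS3   struct {
			Endpoint        string `json:"endpoint"`
			Region          string `json:"region"`
			AccessKeyID     string `json:"accessKeyID"`
			AccessSecretKey string `json:"accessSecretKey"`
		} `json:"secretS3"`
	} `json:"spec"`
}
```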
65-79: LGTM on constants and helpers. Good practice defining `defaultRequeueAfter` and `defaultActiveJobPollingInterval` as constants. The `storageS3SecretName` function provides consistent naming for Velero credential secrets.
310-394: LGTM on S3 credentials management. Good implementation of idempotent secret creation with proper change detection using `reflect.DeepEqual`. The event recording provides a useful audit trail.
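The create-or-update shape being praised, reduced to its essentials (a sketch with assumed names, not the PR's function):

```go
import (
	"context"
	"reflect"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureSecret creates the secret if it is missing and updates it only when
// the data actually changed, keeping the write path idempotent.
func ensureSecret(ctx context.Context, c client.Client, desired *corev1.Secret) error {
	existing := &corev1.Secret{}
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), existing)
	if apierrors.IsNotFound(err) {
		return c.Create(ctx, desired)
	}
	if err != nil {
		return err
	}
	if reflect.DeepEqual(existing.Data, desired.Data) {
		return nil // nothing to do
	}
	existing.Data = desired.Data
	return c.Update(ctx, existing)
}
```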
packages/system/backup-controller/templates/rbac.yaml (1)

6-17: LGTM on core backup permissions. Proper separation of read-only access to `plans` and full lifecycle access to `backupjobs` and `backups` with status subresource.

hack/e2e-apps/backup.bats (2)
9-47: Good teardown implementation for reliable cleanup. The `teardown` function properly cleans up all resources (Velero backups, BSL/VSL, secrets, BackupJob, VM, Bucket) even on test failure. Using pattern matching to find Velero backups is a good approach given the generated names.

70-83: LGTM on bucket setup and readiness checks. Good use of multiple `wait` commands to ensure all bucket-related resources (HelmRelease, BucketClaim, BucketAccess) are ready before proceeding.
## What this PR does

This patch narrows the scope of the Velero backup strategy controller to simply template Velero Backups according to the application being backed up and the template in the strategy. Creating storage locations is now out of scope.

### Release note

```release-note
[backups] Implement templating for Velero backups and remove creation of backup storage locations and volume snapshot locations.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
5f21c6c to f4228ff
Actionable comments posted: 2
♻️ Duplicate comments (2)
hack/e2e-apps/backup.bats.disabled (1)
157-161: Race condition: waiting for transient Running phase may cause flaky test failures. The test waits for `phase=Running` (line 158) before waiting for `phase=Succeeded` (line 161). If the BackupJob transitions quickly or directly to Succeeded (faster than the polling interval), the Running wait will timeout even though the backup succeeded.

🔎 Proposed fix

```diff
-print_log "Wait for BackupJob to start"
-kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running
-
 print_log "Wait for BackupJob to complete"
 kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
```

internal/template/template.go (1)
19-25: Template errors are silently swallowed. The `templateFunc` closure returns the original input string when `template()` fails (line 22), hiding template parsing or execution errors from the caller. Users will see unexpanded template syntax (e.g., `{{ .Missing }}`) without any indication that an error occurred, making debugging extremely difficult.

🔎 Proposed fix to propagate template errors

Collect errors during traversal and return them:

```diff
+var templateErrors []error
 templateFunc := func(in string) string {
 	out, err := template(in, templateContext)
 	if err != nil {
+		templateErrors = append(templateErrors, fmt.Errorf("template error in %q: %w", in, err))
 		return in
 	}
 	return out
 }
 unstructured = mapAtStrings(unstructured, templateFunc)
+if len(templateErrors) > 0 {
+	return nil, fmt.Errorf("template errors: %v", templateErrors)
+}
 b, err = json.Marshal(unstructured)
```

Note: Requires importing "fmt" at the top of the file.
🧹 Nitpick comments (8)
hack/e2e-apps/backup.bats.disabled (2)
191-201: Redundant wait logic creates confusion. Lines 191-194 implement an `until` loop without an explicit timeout, while line 197 adds a separate timeout wrapper that also waits for Completed. This creates redundant and confusing wait logic.

🔎 Proposed fix to consolidate wait logic

```diff
-echo '# Wait for Velero Backup to complete' >&3
-until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed\|Failed'; do
-  sleep 5
-done
-
 print_log "Verify Velero Backup is Completed"
 timeout 90 sh -ec "until [ \"\$(kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' 2>/dev/null)\" = \"Completed\" ]; do sleep 30; done"
```

204-208: Port-forward background process may leak. The `timeout 100s` wrapper spawns the port-forward in the background but relies on teardown's `pkill` (line 17) to clean it up. If the test script exits abnormally, the process may leak.

🔎 Proposed improvement using a subshell trap

```diff
-bash -c 'timeout 100s kubectl port-forward service/seaweedfs-s3 -n tenant-root 8333:8333 > /dev/null 2>&1 &'
+(trap 'kill $PORT_FWD_PID 2>/dev/null || true' EXIT
+ kubectl port-forward service/seaweedfs-s3 -n tenant-root 8333:8333 > /dev/null 2>&1 &
+ PORT_FWD_PID=$!
+ sleep 100
+) &
```

Alternatively, store the PID and ensure cleanup in teardown.
internal/template/template.go (1)
9-37: Consider adding documentation for the Template function. The generic `Template` function performs complex operations (JSON marshaling, recursive string transformation, unmarshaling). Adding a doc comment would help users understand its behavior, especially regarding error handling and the template context format.

Example documentation

```go
// Template applies text/template transformations to all string fields in obj using templateContext.
// It marshals obj to JSON, recursively applies template substitutions to string values,
// and unmarshals back to type T. Template syntax errors in individual strings currently
// result in the original string being preserved (see known limitation).
//
// Parameters:
//   - obj: pointer to the object to template
//   - templateContext: variables available to templates (e.g., map[string]any{"name": "foo"})
//
// Returns a new templated object of type T, or an error if marshaling/unmarshaling fails.
func Template[T any](obj *T, templateContext map[string]any) (*T, error) {
```
25-31: Consider naming the embedded interfaces for clarity. Embedding `dynamic.Interface` and `meta.RESTMapper` directly works but can be confusing for readers. Named fields would improve clarity and make the struct self-documenting.

🔎 Proposed refactor

```diff
 type BackupJobReconciler struct {
 	client.Client
-	dynamic.Interface
-	meta.RESTMapper
+	DynamicClient dynamic.Interface
+	RESTMapper    meta.RESTMapper
 	Scheme   *runtime.Scheme
 	Recorder record.EventRecorder
 }
```
226-226: Use the defined constant for requeue duration. Line 226 uses a hardcoded `5 * time.Second` instead of the `defaultActiveJobPollingInterval` constant defined at line 67. This was noted in past reviews.

🔎 Proposed fix

```diff
 // Still in progress (InProgress, New, etc.)
-return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
```
498-505: Consider using meta.SetStatusCondition for idempotent condition updates. Appending to `Conditions` directly could create duplicates if this function is called multiple times due to update failures. Using `meta.SetStatusCondition` from `k8s.io/apimachinery/pkg/api/meta` ensures idempotent updates.

🔎 Proposed fix

```diff
+apimeta "k8s.io/apimachinery/pkg/api/meta"
 ...
 // Add condition
-backupJob.Status.Conditions = append(backupJob.Status.Conditions, metav1.Condition{
+apimeta.SetStatusCondition(&backupJob.Status.Conditions, metav1.Condition{
 	Type:               "Ready",
 	Status:             metav1.ConditionFalse,
 	Reason:             "BackupFailed",
 	Message:            message,
 	LastTransitionTime: now,
 })
```
586-589: Simplify redundant string formatting. `fmt.Sprintf("%s", backupJob.Name)` is equivalent to just `backupJob.Name`.

🔎 Proposed fix

```diff
 backup := &backupsv1alpha1.Backup{
 	ObjectMeta: metav1.ObjectMeta{
-		Name:      fmt.Sprintf("%s", backupJob.Name),
+		Name:      backupJob.Name,
 		Namespace: backupJob.Namespace,
```
616-618: Remove unnecessary nil check. The `artifact` variable is always non-nil as it's assigned at line 568. This check is redundant.

🔎 Proposed fix

```diff
 if backupJob.Spec.PlanRef != nil {
 	backup.Spec.PlanRef = backupJob.Spec.PlanRef
 }
-if artifact != nil {
-	backup.Status.Artifact = artifact
-}
+backup.Status.Artifact = artifact

 if err := r.Create(ctx, backup); err != nil {
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- api/backups/strategy/v1alpha1/velero_types.go
- api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
- api/backups/v1alpha1/backupjob_types.go
- api/backups/v1alpha1/groupversion_info.go
- hack/e2e-apps/backup.bats.disabled
- internal/backupcontroller/backupjob_controller.go
- internal/backupcontroller/velerostrategy_controller.go
- internal/template/template.go
- internal/template/template_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/template/template_test.go
🧰 Additional context used
📓 Path-based instructions (3)
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
- internal/backupcontroller/backupjob_controller.go
- internal/template/template.go
- internal/backupcontroller/velerostrategy_controller.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
- internal/backupcontroller/backupjob_controller.go
- api/backups/strategy/v1alpha1/velero_types.go
- internal/template/template.go
- api/backups/v1alpha1/backupjob_types.go
- api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
- internal/backupcontroller/velerostrategy_controller.go
- api/backups/v1alpha1/groupversion_info.go
api/**
📄 CodeRabbit inference engine (AGENTS.md)
Define Kubernetes CRDs in the api directory following kubebuilder conventions
Files:
- api/backups/strategy/v1alpha1/velero_types.go
- api/backups/v1alpha1/backupjob_types.go
- api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
- api/backups/v1alpha1/groupversion_info.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.
Applied to files:
internal/backupcontroller/velerostrategy_controller.go
🧬 Code graph analysis (3)
internal/backupcontroller/backupjob_controller.go (4)
- api/backups/v1alpha1/backupjob_types.go (1): `BackupJob` (101-107)
- api/backups/v1alpha1/groupversion_info.go (1): `GroupVersion` (34-34)
- api/backups/strategy/v1alpha1/job_types.go (1): `JobStrategyKind` (25-25)
- api/backups/strategy/v1alpha1/velero_types.go (1): `VeleroStrategyKind` (25-25)

api/backups/strategy/v1alpha1/zz_generated.deepcopy.go (2)
- internal/template/template.go (1): `Template` (9-37)
- api/backups/strategy/v1alpha1/velero_types.go (1): `VeleroTemplate` (57-59)

internal/backupcontroller/velerostrategy_controller.go (4)
- internal/backupcontroller/backupjob_controller.go (1): `BackupJobReconciler` (25-31)
- api/backups/v1alpha1/backupjob_types.go (1): `BackupJob` (101-107)
- api/backups/strategy/v1alpha1/velero_types.go (1): `Velero` (33-39)
- api/backups/v1alpha1/backup_types.go (6): `BackupList` (114-118), `Backup` (103-109), `BackupArtifact` (35-48), `BackupSpec` (51-77), `BackupStatus` (80-93), `BackupPhaseReady` (30-30)
🔇 Additional comments (13)
api/backups/strategy/v1alpha1/zz_generated.deepcopy.go (1)
1-236: LGTM! Auto-generated deepcopy methods are correct. The controller-gen tool has properly generated deep-copy methods for the new VeleroTemplate type and updated existing methods to handle nested struct copying via DeepCopyInto calls.

As per coding guidelines, auto-generated files (zz_generated.*.go) should not be manually edited.
api/backups/strategy/v1alpha1/velero_types.go (2)
9-9: LGTM! Velero integration properly imported. The import alias `velerov1` follows Go conventions and enables integration with Velero's BackupSpec type.

51-59: LGTM! VeleroSpec and VeleroTemplate types are well-designed. The VeleroTemplate wrapper around `velerov1.BackupSpec` enables templating of Velero backup specifications while maintaining type safety. The structure follows kubebuilder conventions with appropriate JSON tags.

hack/e2e-apps/backup.bats.disabled (1)
9-47: LGTM! Comprehensive teardown ensures proper cleanup. The teardown function properly cleans up all resources including Velero backups, storage locations, secrets, and port-forwards. This ensures test isolation.
api/backups/v1alpha1/groupversion_info.go (1)
28-34: LGTM! Constants improve maintainability. Extracting group and version as constants (`thisGroup`, `thisVersion`) improves maintainability and enables consistent reuse across the API package, particularly for label prefixes mentioned in the AI summary.

api/backups/v1alpha1/backupjob_types.go (2)
24-27: LGTM! Label constants are well-defined. The label constants use the group prefix pattern correctly for namespacing. The `owned-by.BackupJobName` and `owned-by.BackupJobNamespace` suffixes clearly convey ownership semantics.

93-94: LGTM! Kubebuilder annotations are correctly placed. The `+kubebuilder:subresource:status` enables the status subresource for proper status updates, and the `+kubebuilder:printcolumn` annotation adds useful Phase visibility when listing BackupJobs.
77-92: LGTM! SetupWithManager properly initializes dependencies. The dynamic client and REST mapper are correctly initialized from the manager's config. Error handling is appropriate, and the controller registration follows controller-runtime conventions.

48-59: LGTM! Defensive nil check and API group validation. The nil check for `StrategyRef.APIGroup` prevents panics, and the API group matching ensures only supported strategies are processed. Logging at V(1) level is appropriate for skip scenarios.
28-41: LGTM! Debug logger wrapper is clean and useful. The `loggerWithDebug` wrapper provides a convenient `Debug()` method that maps to `V(1).Info()`, reducing verbosity in the calling code while maintaining proper log levels.

318-402: LGTM! Idempotent secret management with proper change detection. The function correctly handles create/update scenarios, compares data before updating to avoid unnecessary writes, and emits appropriate events.

404-446: LGTM! Conflict retry pattern is well implemented. The 3-attempt retry with 100ms backoff handles concurrent Velero status updates gracefully. The spec comparison avoids unnecessary updates.

515-562: LGTM! Velero Backup creation is well-structured. The function properly resolves the application resource, templates the backup spec, and creates the Velero Backup with appropriate labels for ownership tracking. Error handling is in place. A condensed sketch of that flow follows.
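Condensed, that creation step might look like this (a sketch; names like `templatedSpec` and `veleroBackupName` are assumptions, not the PR's variables):

```go
veleroBackup := &velerov1.Backup{
	ObjectMeta: metav1.ObjectMeta{
		Name:      veleroBackupName,
		Namespace: "cozy-velero",
		Labels: map[string]string{
			// Ownership labels let later reconciles find this Backup again.
			backupsv1alpha1.OwningJobNameLabel:      job.Name,
			backupsv1alpha1.OwningJobNamespaceLabel: job.Namespace,
		},
	},
	// templatedSpec is the output of template.Template over the strategy's BackupSpec.
	Spec: *templatedSpec,
}
if err := r.Create(ctx, veleroBackup); err != nil {
	return err
}
```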
```go
// resolveBucketStorageRef discovers S3 credentials from a Bucket storageRef
// It follows this flow:
// 1. Get the Bucket resource (apps.cozystack.io/v1alpha1)
// 2. Find the BucketAccess that references this bucket
// 3. Get the secret from BucketAccess.spec.credentialsSecretName
// 4. Decode BucketInfo from secret.data.BucketInfo and extract S3 credentials
func (r *BackupJobReconciler) resolveBucketStorageRef(ctx context.Context, storageRef corev1.TypedLocalObjectReference, namespace string) (*S3Credentials, error) {
	logger := getLogger(ctx)

	// Step 1: Get the Bucket resource
	bucket := &unstructured.Unstructured{}
	bucket.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   *storageRef.APIGroup,
		Version: "v1alpha1",
		Kind:    storageRef.Kind,
	})

	if *storageRef.APIGroup != "apps.cozystack.io" {
		return nil, fmt.Errorf("Unsupported storage APIGroup: %v, expected apps.cozystack.io", storageRef.APIGroup)
	}
	bucketKey := client.ObjectKey{Namespace: namespace, Name: storageRef.Name}

	if err := r.Get(ctx, bucketKey, bucket); err != nil {
		return nil, fmt.Errorf("failed to get Bucket %s: %w", storageRef.Name, err)
	}

	// Step 2: Determine the bucket claim name
	// For apps.cozystack.io Bucket, the BucketClaim name is typically the same as the Bucket name
	// or follows a pattern. Based on the templates, it's usually the Release.Name which equals the Bucket name
	bucketName := storageRef.Name

	// Step 3: Get BucketAccess by name (assuming BucketAccess name matches bucketName)
	bucketAccess := &unstructured.Unstructured{}
	bucketAccess.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "objectstorage.k8s.io",
		Version: "v1alpha1",
		Kind:    "BucketAccess",
	})

	bucketAccessKey := client.ObjectKey{Name: "bucket-" + bucketName, Namespace: namespace}
	if err := r.Get(ctx, bucketAccessKey, bucketAccess); err != nil {
		return nil, fmt.Errorf("failed to get BucketAccess %s in namespace %s: %w", bucketName, namespace, err)
	}

	// Step 4: Get the secret name from BucketAccess
	secretName, found, err := unstructured.NestedString(bucketAccess.Object, "spec", "credentialsSecretName")
	if err != nil {
		return nil, fmt.Errorf("failed to get credentialsSecretName from BucketAccess: %w", err)
	}
	if !found || secretName == "" {
		return nil, fmt.Errorf("credentialsSecretName not found in BucketAccess %s", bucketAccessKey.Name)
	}

	// Step 5: Get the secret
	secret := &corev1.Secret{}
	secretKey := client.ObjectKey{Namespace: namespace, Name: secretName}
	if err := r.Get(ctx, secretKey, secret); err != nil {
		return nil, fmt.Errorf("failed to get secret %s: %w", secretName, err)
	}

	// Step 6: Decode BucketInfo from secret.data.BucketInfo
	bucketInfoData, found := secret.Data["BucketInfo"]
	if !found {
		return nil, fmt.Errorf("BucketInfo key not found in secret %s", secretName)
	}

	// Parse JSON value
	var info bucketInfo
	if err := json.Unmarshal(bucketInfoData, &info); err != nil {
		return nil, fmt.Errorf("failed to unmarshal BucketInfo from secret %s: %w", secretName, err)
	}

	// Step 7: Extract and return S3 credentials
	creds := &S3Credentials{
		BucketName:      info.Spec.BucketName,
		Endpoint:        info.Spec.SecretS3.Endpoint,
		Region:          info.Spec.SecretS3.Region,
		AccessKeyID:     info.Spec.SecretS3.AccessKeyID,
		AccessSecretKey: info.Spec.SecretS3.AccessSecretKey,
	}

	logger.Debug("resolved S3 credentials from Bucket storageRef",
		"bucket", storageRef.Name,
		"bucketName", creds.BucketName,
		"endpoint", creds.Endpoint)

	return creds, nil
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify if these functions are called anywhere in the codebase
echo "=== Checking for usages of resolveBucketStorageRef ==="
rg -n "resolveBucketStorageRef" --type go
echo ""
echo "=== Checking for usages of createS3CredsForVelero ==="
rg -n "createS3CredsForVelero" --type go
echo ""
echo "=== Checking for usages of createBackupStorageLocation ==="
rg -n "createBackupStorageLocation" --type go
echo ""
echo "=== Checking for usages of createVolumeSnapshotLocation ==="
rg -n "createVolumeSnapshotLocation" --type go
```

Repository: cozystack/cozystack
Length of output: 1658
**Remove dead code functions or add TODO comments explaining future intent.**

The functions `resolveBucketStorageRef`, `createS3CredsForVelero`, `createBackupStorageLocation`, and `createVolumeSnapshotLocation` are defined but never called anywhere in the codebase. Since the PR removed automatic creation of `BackupStorageLocation` and `VolumeSnapshotLocation` resources, these functions appear to be unused remnants and should be removed. If they're intended for future use, add TODO comments explaining the planned feature to preserve intent.
```go
	bucket := &unstructured.Unstructured{}
	bucket.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   *storageRef.APIGroup,
		Version: "v1alpha1",
		Kind:    storageRef.Kind,
	})

	if *storageRef.APIGroup != "apps.cozystack.io" {
		return nil, fmt.Errorf("Unsupported storage APIGroup: %v, expected apps.cozystack.io", storageRef.APIGroup)
	}
```
**Nil pointer dereference risk before validation.**

Line 241 dereferences `*storageRef.APIGroup` before the APIGroup validation on line 246, and that validation itself dereferences the pointer without a nil check. If `APIGroup` is nil, this will panic.
🔎 Proposed fix

```diff
 func (r *BackupJobReconciler) resolveBucketStorageRef(ctx context.Context, storageRef corev1.TypedLocalObjectReference, namespace string) (*S3Credentials, error) {
 	logger := getLogger(ctx)
+	if storageRef.APIGroup == nil {
+		return nil, fmt.Errorf("storageRef.APIGroup is nil")
+	}
+	if *storageRef.APIGroup != "apps.cozystack.io" {
+		return nil, fmt.Errorf("unsupported storage APIGroup: %s, expected apps.cozystack.io", *storageRef.APIGroup)
+	}
+
 	// Step 1: Get the Bucket resource
 	bucket := &unstructured.Unstructured{}
 	bucket.SetGroupVersionKind(schema.GroupVersionKind{
 		Group:   *storageRef.APIGroup,
 		Version: "v1alpha1",
 		Kind:    storageRef.Kind,
 	})
-
-	if *storageRef.APIGroup != "apps.cozystack.io" {
-		return nil, fmt.Errorf("Unsupported storage APIGroup: %v, expected apps.cozystack.io", storageRef.APIGroup)
-	}
```

🤖 Prompt for AI Agents

```
In internal/backupcontroller/velerostrategy_controller.go around lines 239 to
248, the code dereferences storageRef.APIGroup before checking for nil, which
can panic; first check if storageRef.APIGroup == nil (and return a clear error)
before using *storageRef.APIGroup, then only construct the GroupVersionKind or
compare values after that nil check; reorder the logic so the nil validation
happens before any dereference and use the validated *storageRef.APIGroup where
needed.
```
## What this PR does

Implement a controller for BackupJobs referencing a Velero strategy. Creates a Backup.velero.io according to the template in the `Velero.strategy.backups.cozystack.io`.

### Release note

```release-note
[backups] Implement a backup strategy controller for Velero strategies.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
Co-authored-by: Andrey Kolkov <androndo@gmail.com>
Co-authored-by: Timofei Larkin <lllamnyp@gmail.com>
(cherry picked from commit 069a3ca)
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
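As a rough illustration of the pattern this commit describes — stamping out a `velero.io/v1` Backup linked back to the owning BackupJob — here is a minimal sketch using controller-runtime and unstructured objects. The label keys, the `cozy-velero` namespace, and the `includedNamespaces` choice are assumptions for illustration, not the exact template fields from this PR:

```go
package backupsketch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createVeleroBackup is a hypothetical sketch: it creates a velero.io/v1
// Backup labeled so the owning BackupJob can be found later. Label keys and
// spec fields are illustrative assumptions.
func createVeleroBackup(ctx context.Context, c client.Client, jobName, jobNamespace string) error {
	backup := &unstructured.Unstructured{}
	backup.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "velero.io",
		Version: "v1",
		Kind:    "Backup",
	})
	backup.SetName(jobNamespace + "-" + jobName)
	backup.SetNamespace("cozy-velero") // assumed Velero install namespace
	backup.SetLabels(map[string]string{
		"backups.cozystack.io/backupjob-name":      jobName,      // assumed key
		"backups.cozystack.io/backupjob-namespace": jobNamespace, // assumed key
	})
	// Back up everything in the job's namespace; a real strategy template
	// would carry its own include/exclude rules.
	if err := unstructured.SetNestedStringSlice(backup.Object,
		[]string{jobNamespace}, "spec", "includedNamespaces"); err != nil {
		return err
	}
	return c.Create(ctx, backup)
}
```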
…d backup system (#1867)

## What this PR does

Update changelog for v1.0.0-alpha.1 to include missing features:

- **Cozystack Operator**: New operator for Package and PackageSource management (#1740, #1741, #1755, #1756, #1760, #1761)
- **Backup System**: Comprehensive backup functionality with Velero integration (#1640, #1685, #1687, #1708, #1719, #1720, #1737, #1762)
- Add @androndo to contributors
- Update Full Changelog link to v0.38.0...v1.0.0-alpha.1

### Release note

```release-note
[docs] Update changelog for v1.0.0-alpha.1: add cozystack-operator and backup system
```
## What this PR does

### Release note

### User scenario
Create BackupJob with Bucket:
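The manifest that accompanied this step did not survive extraction. As a stand-in, a hedged sketch of building such a BackupJob programmatically — the `v1alpha1` version and the `spec.storageRef` layout are inferred from the controller code reviewed above, and the names are made up:

```go
package backupsketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// newBackupJob is a hypothetical sketch: it builds a BackupJob whose
// storageRef points at an apps.cozystack.io Bucket, which is the shape
// resolveBucketStorageRef above expects. Version and field names are assumed.
func newBackupJob(name, namespace, bucketName string) *unstructured.Unstructured {
	job := &unstructured.Unstructured{}
	job.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "backups.cozystack.io",
		Version: "v1alpha1", // assumed
		Kind:    "BackupJob",
	})
	job.SetName(name)
	job.SetNamespace(namespace)
	// TypedLocalObjectReference fields, matching what the controller reads.
	_ = unstructured.SetNestedMap(job.Object, map[string]interface{}{
		"apiGroup": "apps.cozystack.io",
		"kind":     "Bucket",
		"name":     bucketName,
	}, "spec", "storageRef")
	return job
}
```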
The controller discovers S3 creds and spawns the related Velero resources to perform the backup:
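The resource listing here was likewise lost in extraction. Sketching just one of the spawned resources: a `velero.io/v1` BackupStorageLocation wired to the resolved credentials, reusing the `S3Credentials` sketch shown earlier. Velero's `provider`/`objectStorage`/`config` fields are standard; the `cloud-credentials` secret name and `cozy-velero` namespace are assumptions:

```go
package backupsketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// newBackupStorageLocation is a hypothetical sketch: it maps S3Credentials
// onto Velero's BackupStorageLocation spec. Secret name and namespace assumed.
func newBackupStorageLocation(name string, creds *S3Credentials) *unstructured.Unstructured {
	bsl := &unstructured.Unstructured{}
	bsl.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "velero.io",
		Version: "v1",
		Kind:    "BackupStorageLocation",
	})
	bsl.SetName(name)
	bsl.SetNamespace("cozy-velero") // assumed Velero install namespace
	_ = unstructured.SetNestedMap(bsl.Object, map[string]interface{}{
		"provider": "aws",
		"objectStorage": map[string]interface{}{
			"bucket": creds.BucketName,
		},
		"config": map[string]interface{}{
			"region":           creds.Region,
			"s3Url":            creds.Endpoint,
			"s3ForcePathStyle": "true",
		},
		"credential": map[string]interface{}{
			"name": "cloud-credentials", // assumed secret name
			"key":  "cloud",
		},
	}, "spec")
	return bsl
}
```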
After that, the backup becomes available: