
Conversation


@androndo androndo commented Dec 25, 2025

What this PR does

Release note

Implemented Velero backups for Cozystack VirtualMachines.

User scenario

Create a BackupJob that references a Bucket:

apiVersion: backups.cozystack.io/v1alpha1
kind: BackupJob
metadata:
  name: desired-backup
  namespace: tenant-root
  labels:
    backups.cozystack.io/triggered-by: manual
spec:
  applicationRef:
    apiGroup: apps.cozystack.io
    kind: VirtualMachine
    name: vm1
  storageRef:
    apiGroup: apps.cozystack.io
    kind: Bucket
    name: test-bucket
  strategyRef:
    apiGroup: strategy.backups.cozystack.io
    kind: Velero
    name: velero-strategy-default

The controller discovers the S3 credentials and spawns the related Velero resources to perform the backup:

kubectl describe backupjob desired-backup -n tenant-root | grep -A 20 "Events:"
Events:
  Type    Reason                         Age   From               Message
  ----    ------                         ----  ----               -------
  Normal  SecretCreated                  11s   backup-controller  Created Velero credentials secret cozy-velero/backup-tenant-root-desired-backup-s3-credentials
  Normal  BackupStorageLocationCreated   10s   backup-controller  Created or updated BackupStorageLocation cozy-velero/tenant-root-desired-backup
  Normal  VolumeSnapshotLocationCreated  10s   backup-controller  Created or updated VolumeSnapshotLocation cozy-velero/tenant-root-desired-backup
  Normal  VeleroBackupCreated            10s   backup-controller  Created Velero Backup cozy-velero/tenant-root-desired-backup-2025-12-31-11-27-04

Once the backup completes, the BackupJob reports Succeeded:
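The listing below presumably comes from a command along these lines (the exact invocation is an assumption; the PR adds a Phase printer column to BackupJob):

kubectl get backupjobs -A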

NAMESPACE     NAME             PHASE
tenant-root   desired-backup   Succeeded

Summary by CodeRabbit

Release Notes

  • New Features

    • Added BackupJob resource for creating and managing backups of virtual machines.
    • Integrated Velero backup strategy for automated backup orchestration.
    • Added Phase column display for backup job status visibility.
    • Added example configuration for backup job creation.
  • Dependencies

    • Updated telemetry and protobuf libraries; added Velero v1.17.1 support.


coderabbitai bot commented Dec 25, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

This PR introduces a BackupJob controller system with Velero backup strategy support. It adds BackupJob API types, implements reconciliation logic for managing backup jobs via Velero, includes credential resolution for S3 storage, CRD definitions, RBAC permissions, and example resources with end-to-end test coverage.

Changes

Cohort — File(s) — Summary

API Types & Type Definitions — api/backups/v1alpha1/backupjob_types.go, api/backups/strategy/v1alpha1/velero_types.go, api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
Added BackupJob constants and kubebuilder annotations (status subresource, phase printer column); replaced VeleroSpec with a Template field; added a VeleroTemplate type with BackupSpec; generated deep-copy methods for VeleroTemplate.

Controller Implementation — internal/backupcontroller/backupjob_controller.go, internal/backupcontroller/velerostrategy_controller.go
Extended BackupJobReconciler with a dynamic client, REST mapper, and event recorder; introduced comprehensive Velero strategy reconciliation with credential resolution from Bucket/BucketAccess, S3 credential provisioning, idempotent backup storage/snapshot location creation, and Velero Backup lifecycle management with status propagation.

CRD & RBAC Definitions — packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml, packages/system/backup-controller/templates/rbac.yaml, packages/system/backup-controller/templates/strategy.yaml
Added the Phase printer column and status subresource to the BackupJob CRD; expanded RBAC with permissions for backupjobs, backups, Velero resources (backups, storage/snapshot locations, restores), and read access to referenced resources across apps.cozystack.io, objectstorage.k8s.io, and kubevirt.io; added a default Velero strategy template.

Templating Utility — internal/template/template.go, internal/template/template_test.go
Introduced a generic Template function that recursively applies text templating to string fields in structured objects; includes a test validating templating on PodTemplateSpec (a sketch follows this table).

Configuration & Dependencies — cmd/backup-controller/main.go, api/backups/v1alpha1/groupversion_info.go, go.mod
Wired BackupJobReconciler into the manager; refactored group/version constants; added the velero v1.17.1 dependency and updated transitive dependencies (prometheus, opentelemetry, grpc, protobuf).

Examples & Tests — examples/desired-backup.yaml, hack/e2e-apps/backup.bats.disabled
Added a BackupJob manifest example (desired-backup) with VirtualMachine and Bucket references; added a comprehensive e2e test validating backup creation, Velero integration, and S3 artifact verification.
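The walkthrough describes the templating utility only at a high level. As a rough illustration, a recursive string-templating helper along these lines could back it — a minimal sketch; the function names, signature, and use of reflection are assumptions, not the PR's actual internal/template implementation:

package template

import (
	"bytes"
	"reflect"
	"text/template"
)

// Apply walks obj recursively and renders every settable string field as a
// text/template against data. obj must be a pointer for fields to be settable.
func Apply(obj any, data any) error {
	return walk(reflect.ValueOf(obj), data)
}

func walk(v reflect.Value, data any) error {
	switch v.Kind() {
	case reflect.Pointer:
		if !v.IsNil() {
			return walk(v.Elem(), data)
		}
	case reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			if err := walk(v.Field(i), data); err != nil {
				return err
			}
		}
	case reflect.Slice, reflect.Array:
		for i := 0; i < v.Len(); i++ {
			if err := walk(v.Index(i), data); err != nil {
				return err
			}
		}
	case reflect.String:
		if !v.CanSet() {
			return nil
		}
		tpl, err := template.New("field").Parse(v.String())
		if err != nil {
			return err
		}
		var buf bytes.Buffer
		if err := tpl.Execute(&buf, data); err != nil {
			return err
		}
		v.SetString(buf.String())
	}
	return nil
}

In the spirit of the test mentioned above, one would pass a pointer such as &corev1.PodTemplateSpec{...} together with a data map, and every templated string field would be rendered in place.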

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant BJobCtrl as BackupJob<br/>Reconciler
    participant VStrat as Velero Strategy<br/>Reconciler
    participant K8s as Kubernetes API
    participant Velero as Velero Service
    participant S3 as S3 Storage

    User->>K8s: Create BackupJob
    K8s->>BJobCtrl: Trigger reconcile
    BJobCtrl->>BJobCtrl: Validate StrategyRef
    BJobCtrl->>VStrat: Route to Velero handler
    
    rect rgb(200, 220, 255)
    Note over VStrat: Credential Resolution
    VStrat->>K8s: Read Bucket resource
    VStrat->>K8s: Read BucketAccess resource
    VStrat->>K8s: Read storage secret
    VStrat->>VStrat: Extract S3 credentials
    end
    
    rect rgb(200, 240, 200)
    Note over VStrat: Provision Velero Resources
    VStrat->>K8s: Create/update cloud-credentials<br/>(in cozy-velero namespace)
    VStrat->>K8s: Create/update BackupStorageLocation
    VStrat->>K8s: Create/update VolumeSnapshotLocation
    end
    
    rect rgb(255, 240, 200)
    Note over VStrat: Create Velero Backup
    VStrat->>K8s: Create Velero Backup resource<br/>(with BackupJob owner labels)
    K8s->>Velero: Trigger backup job
    Velero->>S3: Upload backup artifacts
    Velero->>K8s: Update Backup status
    end
    
    rect rgb(240, 200, 200)
    Note over VStrat: Finalize
    VStrat->>K8s: Create Backup resource<br/>(with Velero reference)
    VStrat->>K8s: Update BackupJob status<br/>(Succeeded/Failed)
    K8s->>BJobCtrl: Status updated
    end
    
    BJobCtrl->>User: BackupJob Complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🐰 A backup plan, oh what a sight,
Velero's magic keeps data bright,
Jobs and credentials dance in sync,
No more backup-time headaches, methinks!
To S3 and beyond we go,

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name — Status — Explanation / Resolution
Docstring Coverage — ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
Title check — ❓ Inconclusive — The title 'Velero backup controller impl' is partially related to the main changes but is abbreviated and vague. It refers to a real component (the Velero backup controller) being implemented, but uses informal shorthand ('impl') and doesn't clearly convey the scope or the BackupJob implementation. Resolution: consider a more descriptive title such as 'Implement Velero backup controller for BackupJob resources' or 'Add BackupJob controller with Velero strategy support'.
✅ Passed checks (1 passed)
Check name — Status — Explanation
Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings


@gemini-code-assist

Summary of Changes

Hello @androndo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the foundational components for integrating Velero as a backup strategy within the Cozystack backup system. It defines the necessary API types for a Velero backup strategy and implements a dedicated controller to manage BackupJob resources that leverage this strategy. The controller's primary function is to translate BackupJob requests into Velero Backup operations, particularly for Virtual Machine applications, and to reflect the status of these operations back to the BackupJob resource.

Highlights

  • New Velero Strategy API: Introduced a new Velero custom resource definition (CRD) and its corresponding Go types (Velero, VeleroList, VeleroSpec, VeleroStatus) under strategy.backups.cozystack.io/v1alpha1 to define Velero-based backup strategies (a sketch of the spec types follows this list).
  • Velero Backup Controller: Implemented a new controller, BackupVeleroStrategyReconciler, responsible for reconciling BackupJob resources that specify a 'Velero' strategy. This controller orchestrates the creation and monitoring of Velero Backup objects.
  • Virtual Machine Backup Support: The controller currently supports backing up VirtualMachine applications, including those managed by HelmRelease through label selectors, by creating corresponding Velero Backup resources.
  • BackupJob Status Management: The controller updates the BackupJob's status, including StartedAt, CompletedAt, Phase (Running, Succeeded, Failed), and Message, based on the lifecycle and status of the underlying Velero Backup resource.
  • Autogenerated DeepCopy Methods: Added autogenerated deepcopy methods for the new Velero strategy types to ensure proper handling of object copies within the Kubernetes ecosystem.
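Going by the walkthrough's note that VeleroSpec now carries a Template field and that VeleroTemplate embeds a Velero BackupSpec, the types presumably look roughly like this — a sketch; the json tags and exact shape are assumptions:

import (
	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

// VeleroTemplate wraps the Velero BackupSpec used as a template for
// backups produced by this strategy.
type VeleroTemplate struct {
	BackupSpec velerov1.BackupSpec `json:"backupSpec,omitempty"`
}

// VeleroSpec defines the desired state of the Velero strategy resource.
type VeleroSpec struct {
	Template VeleroTemplate `json:"template,omitempty"`
}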


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new Velero backup strategy controller. The implementation is a good start, but there are several critical and high-severity issues that need to be addressed before this can be considered production-ready.

Key issues include:

  • Critical bugs in the application type validation logic.
  • Hardcoded storageLocation which should be dynamically resolved.
  • A dangerous implementation that backs up all Virtual Machines in a namespace instead of the specific one requested.
  • Several incorrect assumptions about the Velero Backup CRD structure, leading to non-functional logic for error handling and artifact creation.
  • Minor issues like incorrect log messages and hardcoded requeue times.

Please review the detailed comments for each issue.

@androndo androndo force-pushed the velero-backup-controller-impl branch from 6347c38 to 5649fb1 on December 26, 2025 07:45
@androndo androndo changed the base branch from main to feat/velero-backup-controller December 26, 2025 07:46
@androndo androndo force-pushed the velero-backup-controller-impl branch from 5649fb1 to 6ad416a on December 26, 2025 07:55
@lllamnyp lllamnyp self-assigned this Dec 29, 2025
@androndo androndo force-pushed the velero-backup-controller-impl branch 2 times, most recently from 0fe45eb to 0d71a23 on December 29, 2025 16:24
Base automatically changed from feat/velero-backup-controller to main December 29, 2025 18:13
@androndo androndo force-pushed the velero-backup-controller-impl branch 3 times, most recently from 1094574 to 52838ab on December 31, 2025 05:52
@androndo androndo marked this pull request as ready for review December 31, 2025 07:29
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Dec 31, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
packages/system/backup-controller/templates/rbac.yaml (2)

24-26: Consider scoping secrets permissions.

The secrets permissions are quite broad (create/get/list/watch/update/patch across all namespaces). The controller creates secrets in the cozy-velero namespace. Consider if this could be scoped more narrowly using resourceNames or if there's a need for such broad access.


30-32: Consider adding delete permission for Velero resources.

The controller creates BackupStorageLocations, VolumeSnapshotLocations, and Secrets per BackupJob, but there's no cleanup mechanism visible. Consider whether delete permission should be added for garbage collection when BackupJobs are deleted, or document that these resources are intentionally retained.

internal/backupcontroller/velerostrategy_controller.go (3)

214-216: Use constant for requeue duration.

Line 216 uses a hardcoded 5 * time.Second while the rest of the file uses defaultRequeueAfter or defaultActiveJobPollingInterval constants.

🔎 Proposed fix
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
 }

381-396: Avoid blocking sleep in reconciler.

Using time.Sleep in a reconciler blocks the worker goroutine. For conflict retries, consider returning with requeue instead, or use a client with built-in retry. The same pattern appears in createVolumeSnapshotLocation (lines 424-439).
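One built-in option is retry.RetryOnConflict from k8s.io/client-go/util/retry, which retries with backoff instead of sleeping inline. A sketch under assumptions — the helper name and wiring are illustrative:

import (
	"context"
	"fmt"

	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateBSLSpec refetches the BackupStorageLocation on each attempt and
// retries the update when the API server reports a conflict.
func updateBSLSpec(ctx context.Context, c client.Client, desired *velerov1.BackupStorageLocation) error {
	found := &velerov1.BackupStorageLocation{}
	err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		if err := c.Get(ctx, client.ObjectKeyFromObject(desired), found); err != nil {
			return err
		}
		found.Spec = desired.Spec
		return c.Update(ctx, found)
	})
	if err != nil {
		return fmt.Errorf("failed to update BackupStorageLocation: %w", err)
	}
	return nil
}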

Alternatively, use a server-side apply or patch approach to avoid conflicts entirely:

// Example using Patch instead of Update to reduce conflicts
patch := client.MergeFrom(foundBSL.DeepCopy())
foundBSL.Spec = bsl.Spec
if err := r.Patch(ctx, foundBSL, patch); err != nil {
    return fmt.Errorf("failed to patch BackupStorageLocation: %w", err)
}

450-477: Consider deduplicating conditions.

The markBackupJobFailed function appends to conditions without checking for duplicates. If the reconciler retries after a transient error during status update, it could append multiple "Ready=False" conditions. Consider using a helper like meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2d6e50b and fc594e3.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
  • api/backups/v1alpha1/backupjob_types.go
  • cmd/backup-controller/main.go
  • examples/desired-backup.yaml
  • go.mod
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
  • packages/system/backup-controller/templates/rbac.yaml
  • packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_jobs.yaml
🧰 Additional context used
📓 Path-based instructions (7)
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/backup-controller/templates/rbac.yaml
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
**/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Files:

  • packages/system/backup-controller/templates/rbac.yaml
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
  • examples/desired-backup.yaml
api/**

📄 CodeRabbit inference engine (AGENTS.md)

Define Kubernetes CRDs in the api directory following kubebuilder conventions

Files:

  • api/backups/v1alpha1/backupjob_types.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • api/backups/v1alpha1/backupjob_types.go
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
  • cmd/backup-controller/main.go
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
cmd/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Files:

  • cmd/backup-controller/main.go
go.mod

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT modify go.mod manually; use go get command instead

Files:

  • go.mod
🧠 Learnings (5)
📚 Learning: 2025-07-10T12:14:27.197Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1161
File: packages/apps/virtual-machine/templates/dashboard-resourcemap.yaml:6-12
Timestamp: 2025-07-10T12:14:27.197Z
Learning: Kubernetes RBAC rules with resourceNames work correctly for list/watch verbs. When resourceNames is specified in an RBAC rule, it properly restricts access to only those named resources, even for list and watch operations. Examples: `kubectl get resource resourcename -w` watches for changes on a single resource, and `kubectl get resource --field-selector .metadata.name=resourcename` lists a specific resource. The Kubernetes API server correctly distinguishes such requests from their less specific counterparts.

Applied to files:

  • packages/system/backup-controller/templates/rbac.yaml
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to pkg/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in pkg directory

Applied to files:

  • cmd/backup-controller/main.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to internal/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Applied to files:

  • cmd/backup-controller/main.go
🧬 Code graph analysis (2)
internal/backupcontroller/backupjob_controller.go (3)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (96-102)
api/backups/strategy/v1alpha1/job_types.go (1)
  • JobStrategyKind (25-25)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • VeleroStrategyKind (24-24)
internal/backupcontroller/velerostrategy_controller.go (4)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (96-102)
internal/backupcontroller/backupjob_controller.go (1)
  • BackupJobReconciler (20-24)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • Velero (32-38)
api/backups/v1alpha1/backup_types.go (5)
  • Backup (103-109)
  • BackupSpec (51-77)
  • BackupArtifact (35-48)
  • BackupStatus (80-93)
  • BackupPhaseReady (30-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (10)
examples/desired-backup.yaml (1)

1-20: LGTM!

The example manifest is well-structured and demonstrates proper usage of the BackupJob CR with all required references (applicationRef, storageRef, strategyRef). The manual trigger label is a nice touch for filtering.

api/backups/v1alpha1/backupjob_types.go (1)

88-89: LGTM!

The kubebuilder markers correctly enable the status subresource and add a Phase print column with priority=0 for default visibility. This follows kubebuilder conventions and aligns with the controller's need to update BackupJob status.
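For reference, the two markers presumably read roughly as follows (a sketch; the exact JSONPath is an assumption):

// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"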

packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml (1)

17-21: LGTM!

The CRD manifest correctly adds the Phase printer column and status subresource, consistent with the kubebuilder markers in api/backups/v1alpha1/backupjob_types.go. This enables the controller to update status independently and provides visibility into BackupJob phase via kubectl.

Also applies to: 240-241

internal/backupcontroller/backupjob_controller.go (1)

26-67: LGTM!

The reconcile logic is well-structured with:

  • Proper logger initialization from context
  • Defensive nil checks for StrategyRef.APIGroup
  • Clear logging at appropriate verbosity levels
  • Clean switch dispatch to strategy-specific handlers

The event recorder field is properly wired for use in the Velero strategy controller.

cmd/backup-controller/main.go (2)

50-56: LGTM!

Scheme registrations for strategyv1alpha1 and velerov1 are correctly added in the init function, enabling the controller to work with Velero and strategy custom resources. This follows controller-runtime patterns.


162-169: LGTM!

The BackupJobReconciler is correctly wired with the manager's client, scheme, and event recorder. The setup follows the same pattern as PlanReconciler and adheres to kubebuilder conventions. Based on learnings, this follows controller-runtime patterns for Go code in the cmd directory.

internal/backupcontroller/velerostrategy_controller.go (3)

72-105: LGTM on validation logic.

The validation logic correctly checks for supported application and storage types with clear error messages. The early return pattern on failures is well-implemented.


611-666: Backup resource creation looks correct.

The createBackupResource function properly:

  • Creates the artifact with Velero URI scheme
  • Extracts takenAt from Velero backup timestamps
  • Sets driver metadata for Velero backup reference
  • Uses owner references to link back to BackupJob
  • Handles optional PlanRef

584-594: Label selector is correctly aligned with VirtualMachine labeling convention.

The verification confirms that VirtualMachine resources created by the Helm chart apply the app.kubernetes.io/instance label using {{ .Release.Name }}, which follows the virtual-machine-{name} prefix convention defined in the resource definition. The backup controller's label selector using "virtual-machine-" + backupJob.Spec.ApplicationRef.Name correctly matches this convention.
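In controller-runtime terms, the selector described here amounts to something like the following sketch (the helper name is illustrative; the label key and prefix come from the review above):

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// vmSelector builds the label selector for the referenced application's
// VirtualMachine resources, per the virtual-machine-<name> convention.
func vmSelector(appName string) *metav1.LabelSelector {
	return &metav1.LabelSelector{
		MatchLabels: map[string]string{
			"app.kubernetes.io/instance": "virtual-machine-" + appName,
		},
	}
}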

go.mod (1)

20-20: Velero v1.17.1 is a valid and current release.

No code-level issues identified. Verify compatibility with your cluster's Velero version during deployment testing.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/backupcontroller/velerostrategy_controller.go (2)

204-205: Use the defined constant for consistency.

Line 205 hardcodes 5 * time.Second instead of using the defaultActiveJobPollingInterval constant defined at line 48.

🔎 Proposed fix
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil

399-440: Consider extracting the retry logic to reduce duplication.

The retry logic in lines 416-431 is identical to lines 373-388 in createBackupStorageLocation. Consider extracting this into a generic helper function to improve maintainability.

💡 Example refactor
func (r *BackupJobReconciler) updateWithRetry(ctx context.Context, obj client.Object, updateSpec func() error) error {
	logger := log.FromContext(ctx)
	for i := 0; i < 3; i++ {
		key := client.ObjectKeyFromObject(obj)
		if err := r.Get(ctx, key, obj); err != nil {
			return fmt.Errorf("failed to get object for update: %w", err)
		}
		if err := updateSpec(); err != nil {
			if errors.IsConflict(err) && i < 2 {
				logger.V(1).Info("conflict updating object, retrying", "attempt", i+1)
				time.Sleep(100 * time.Millisecond)
				continue
			}
			return err
		}
		return nil
	}
	return fmt.Errorf("failed after 3 retry attempts")
}
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between fc594e3 and 8a218df.

📒 Files selected for processing (1)
  • internal/backupcontroller/velerostrategy_controller.go
🧰 Additional context used
📓 Path-based instructions (2)
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/backupcontroller/velerostrategy_controller.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/backupcontroller/velerostrategy_controller.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
🧬 Code graph analysis (1)
internal/backupcontroller/velerostrategy_controller.go (4)
internal/backupcontroller/backupjob_controller.go (1)
  • BackupJobReconciler (20-24)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (96-102)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • Velero (32-38)
api/backups/v1alpha1/backup_types.go (3)
  • Backup (103-109)
  • BackupSpec (51-77)
  • BackupStatus (80-93)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (10)
internal/backupcontroller/velerostrategy_controller.go (10)

3-44: LGTM! Clean type definitions and imports.

The imports are well-organized and the type definitions for S3Credentials and bucketInfo are clear and appropriate for the credential resolution flow.


46-59: LGTM! Well-defined constants and helper functions.

The constants are appropriately scoped and the helper functions follow Kubernetes patterns.


61-116: LGTM! Well-structured reconciliation flow with proper error handling.

The early exit logic, validation checks, and status initialization are all correctly implemented. The handling of NotFound errors when the BackupJob is deleted (lines 105-109, 182-186) is particularly good.


208-295: LGTM! Credential resolution follows COSI patterns correctly.

The function properly resolves S3 credentials through the Bucket → BucketAccess → Secret chain. The use of the "bucket-" prefix for BucketAccess names (line 247) aligns with the COSI convention and previous feedback. Error handling appropriately separates field absence from parsing errors.
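As an illustration of the final step in that chain, parsing the BucketInfo payload out of the credentials secret might look like this — the field names follow the BucketInfo JSON used by the e2e test (.spec.bucketName, .spec.secretS3.*); the helper and type names are assumptions, not the PR's actual code:

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// bucketInfo mirrors the COSI BucketInfo document stored in the
// BucketAccess credentials secret.
type bucketInfo struct {
	Spec struct {
		BucketName string `json:"bucketName"`
		SecretS3   struct {
			Endpoint        string `json:"endpoint"`
			AccessKeyID     string `json:"accessKeyID"`
			AccessSecretKey string `json:"accessSecretKey"`
		} `json:"secretS3"`
	} `json:"spec"`
}

// parseBucketInfo extracts S3 connection details from the secret's
// BucketInfo key.
func parseBucketInfo(secret *corev1.Secret) (*bucketInfo, error) {
	raw, ok := secret.Data["BucketInfo"]
	if !ok {
		return nil, fmt.Errorf("secret %s/%s has no BucketInfo key", secret.Namespace, secret.Name)
	}
	info := &bucketInfo{}
	if err := json.Unmarshal(raw, info); err != nil {
		return nil, fmt.Errorf("failed to parse BucketInfo: %w", err)
	}
	return info, nil
}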


297-353: LGTM! Proper Secret management for Velero credentials.

The credential format (lines 311-319) matches Velero's AWS plugin expectations. Setting Data = nil (line 338) before updating with StringData is the correct Kubernetes pattern to ensure the StringData takes precedence. Event recording provides good observability.
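For context, Velero's AWS plugin consumes an AWS shared-credentials file; rendering that payload presumably looks something like this sketch (the "cloud" key follows Velero's usual credentials-secret convention; the function and parameter names are assumed):

import "fmt"

// veleroCredentials renders the AWS shared-credentials payload Velero's
// AWS plugin expects, keyed under "cloud".
func veleroCredentials(accessKeyID, secretAccessKey string) map[string]string {
	content := fmt.Sprintf(
		"[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n",
		accessKeyID, secretAccessKey,
	)
	return map[string]string{"cloud": content}
}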


355-397: LGTM! Proper conflict handling for Velero resource updates.

The retry logic (lines 373-388) correctly handles conflicts from Velero's concurrent status updates. The DeepEqual check (line 371) prevents unnecessary updates.


442-469: LGTM! Comprehensive failure handling.

The function properly updates all relevant status fields and handles the case where the BackupJob is deleted during reconciliation (lines 459-462). The condition handling follows Kubernetes conventions.


471-566: LGTM! Well-orchestrated Velero resource creation.

The dynamic creation of BackupStorageLocation and VolumeSnapshotLocation based on resolved Bucket credentials is correctly implemented. The namespace-prefixed location names (line 477) avoid conflicts in the shared cozy-velero namespace. Comprehensive event recording provides good observability.


603-666: LGTM! Proper Backup resource creation with correct owner references.

The function correctly materializes a Backup resource from the Velero Backup data. The owner reference now includes Controller: true (line 635), addressing previous feedback. Timestamp selection logic (lines 612-617) appropriately prefers the Velero StartTimestamp over CreationTimestamp.


582-586: No action needed. The label selector correctly matches VirtualMachine labels as enforced by the Helm chart template, which applies app.kubernetes.io/instance: {{ .Release.Name }} to all VirtualMachine resources. The CRD definition specifies a virtual-machine- prefix, ensuring the label value follows the expected virtual-machine-<name> convention consistently.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)

205-205: Use the defined constant for consistency.

This line uses a hardcoded 5 * time.Second while the rest of the file uses defaultRequeueAfter or defaultActiveJobPollingInterval constants. Use the constant for consistency and maintainability.

🔎 Proposed fix
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8a218df and 0f71138.

📒 Files selected for processing (2)
  • internal/backupcontroller/velerostrategy_controller.go
  • packages/system/backup-controller/templates/strategy.yaml
🧰 Additional context used
📓 Path-based instructions (4)
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/backup-controller/templates/strategy.yaml
**/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Files:

  • packages/system/backup-controller/templates/strategy.yaml
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/backupcontroller/velerostrategy_controller.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/backupcontroller/velerostrategy_controller.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (10)
packages/system/backup-controller/templates/strategy.yaml (1)

1-5: LGTM!

The Velero strategy template is correctly structured with an empty spec, which is appropriate for a default strategy resource. This aligns with the controller implementation that fetches and uses strategy references.

internal/backupcontroller/velerostrategy_controller.go (9)

1-22: LGTM!

The package declaration and imports are appropriate for a Velero backup controller implementation, including necessary K8s APIs, controller-runtime packages, and Velero v1 types.


24-59: LGTM!

The type definitions and helper functions are well-structured. Good to see the hardcoded requeue duration now defined as a constant (defaultRequeueAfter), addressing previous feedback about maintainability.


208-295: LGTM!

The credential resolution logic correctly follows the flow from Bucket → BucketAccess → Secret → S3 credentials. The BucketAccess naming convention (bucket- prefix) and the separate error handling for err and !found cases address previous feedback effectively.


297-381: LGTM!

The secret creation and update logic is well-implemented. The comparison logic (lines 338-364) ensures that updates only occur when data actually changes, reducing unnecessary operations and potential conflicts. This addresses the "better s3 secret update" improvement mentioned in the commit message.


383-468: LGTM!

The storage location creation/update functions properly handle idempotency with conflict retry logic. The retry mechanism (3 attempts with 100ms delay) is adequate for handling concurrent updates from Velero's status controller.


470-497: LGTM!

The failure handling properly updates status, adds conditions, and gracefully handles NotFound errors when the BackupJob is deleted during reconciliation.


499-629: Significant improvements to backup orchestration.

The dynamic storage location setup (lines 505-594) and scoped label selector (lines 610-614) address critical issues from previous reviews. The storage location is no longer hardcoded, and the backup is properly scoped to the specific VM instead of backing up the entire namespace.


631-694: LGTM!

The Backup resource creation correctly materializes artifacts with a simple URI format (lines 635-637), sets appropriate timestamps, and includes proper owner references with Controller: true (line 663). These changes address previous feedback about non-existent fields and missing controller ownership.


77-84: The validation and backup strategy are correct; no changes needed.

The apps.cozystack.io/VirtualMachine refers to a Helm application definition, not a wrapper resource. The Helm chart for this application creates kubevirt.io/v1/VirtualMachine resources in the same namespace. The validation at lines 77-84 correctly ensures the ApplicationRef points to a VirtualMachine Helm application, and the backup at line 606 correctly includes the actual kubevirt.io VirtualMachine resources that were created by the Helm chart deployment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)

633-637: Verify label selector matches deployed VirtualMachine instances.

The label selector assumes app.kubernetes.io/instance equals virtual-machine-{ApplicationRef.Name}. As flagged in previous reviews, if VirtualMachine Helm releases use {{ .Release.Name }} for this label, backups may miss VMs deployed with different release names.

Consider one of these approaches:

  1. Enforce a naming convention that guarantees release name matches the pattern
  2. Query the actual VirtualMachine resource to determine its labels at runtime
  3. Use a more stable selector like app.kubernetes.io/name: "virtual-machine" combined with namespace filtering

Run the following script to verify the label pattern on deployed VirtualMachine resources:

#!/bin/bash
# Check if VirtualMachine resources in the cluster have the expected label pattern

echo "=== Checking VirtualMachine resources for label patterns ==="
kubectl get virtualmachines.kubevirt.io --all-namespaces -o json | \
  jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name) labels: \(.metadata.labels)"'

echo ""
echo "=== Checking for app.kubernetes.io/instance label values ==="
kubectl get virtualmachines.kubevirt.io --all-namespaces -o json | \
  jq -r '.items[] | select(.metadata.labels."app.kubernetes.io/instance" != null) | "\(.metadata.namespace)/\(.metadata.name): \(.metadata.labels."app.kubernetes.io/instance")"'
🧹 Nitpick comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)

233-233: Use the defined constant for consistency.

Line 233 uses a hardcoded 5 * time.Second for requeuing, but the constant defaultRequeueAfter is defined at the top of the file for this purpose. Use the constant for maintainability.

🔎 Proposed fix
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultRequeueAfter}, nil
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 0f71138 and 263afb4.

📒 Files selected for processing (1)
  • internal/backupcontroller/velerostrategy_controller.go
🧰 Additional context used
📓 Path-based instructions (2)
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/backupcontroller/velerostrategy_controller.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/backupcontroller/velerostrategy_controller.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (7)
internal/backupcontroller/velerostrategy_controller.go (7)

26-78: Good: Helper functions and constants improve maintainability.

The introduction of loggerWithDebug, structured credential types, and consolidated constants (like defaultRequeueAfter) addresses previous feedback about hardcoded values and improves code clarity. The helper functions like boolPtr and storageS3SecretName follow Go best practices.


95-102: Validation correctly enforces VirtualMachine support only.

The validation logic correctly restricts ApplicationRef to VirtualMachine with APIGroup apps.cozystack.io. Previous comments mentioned HelmRelease support, but that appears to have been removed from scope. The current implementation is consistent with this constraint.


236-323: LGTM: Credential discovery follows COSI patterns.

The function correctly resolves S3 credentials from a Bucket storageRef by following the COSI flow: Bucket → BucketAccess → credentials secret → BucketInfo. The stable naming pattern (bucket-<bucketname>) and error handling have been properly addressed per previous feedback.


325-409: Well-implemented idempotent secret management.

The function correctly creates or updates Velero credentials in the expected AWS config format. The data comparison logic (lines 376-388) prevents unnecessary updates, and proper events are emitted for observability. The secret format accommodates SeaweedFS as an S3-compatible backend.


411-496: Robust conflict handling for concurrent Velero updates.

Both createBackupStorageLocation and createVolumeSnapshotLocation implement proper idempotent create-or-update logic with conflict retries. The 3-retry mechanism with 100ms delays handles concurrent status updates from Velero's controller, and reflect.DeepEqual prevents unnecessary spec updates.


498-520: LGTM: Standard failure handling pattern.

The function correctly sets completion timestamp, phase, message, and adds a Ready=False condition. This follows Kubernetes conventions for status reporting.


654-717: Well-structured Backup resource creation.

The function correctly creates a Backup resource with proper owner references (including Controller: true), simplified artifact URI that doesn't depend on non-existent Velero status fields, and accurate driver metadata. Previous concerns about missing Controller field and incorrect field access have been addressed.

@dosubot dosubot bot removed the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 1, 2026
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jan 1, 2026
@androndo androndo force-pushed the velero-backup-controller-impl branch from 239c02a to dee4e10 on January 1, 2026 18:51
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
hack/e2e-apps/backup.bats (3)

26-30: Consider using mktemp for temporary credential files.

Writing to a fixed path /tmp/bucket-backup-credentials.json can cause race conditions in parallel test runs and leaves credentials in a predictable location.

🔎 Proposed fix
+  CRED_FILE=$(mktemp /tmp/bucket-backup-credentials.XXXXXX.json)
+  trap "rm -f $CRED_FILE" EXIT
+
   # Get bucket credentials for later S3 verification
-  kubectl -n ${namespace} get secret bucket-${bucket_name} -ojsonpath='{.data.BucketInfo}' | base64 -d > /tmp/bucket-backup-credentials.json
-  ACCESS_KEY=$(jq -r '.spec.secretS3.accessKeyID' /tmp/bucket-backup-credentials.json)
-  SECRET_KEY=$(jq -r '.spec.secretS3.accessSecretKey' /tmp/bucket-backup-credentials.json)
-  BUCKET_NAME=$(jq -r '.spec.bucketName' /tmp/bucket-backup-credentials.json)
-  ENDPOINT=$(jq -r '.spec.secretS3.endpoint' /tmp/bucket-backup-credentials.json)
+  kubectl -n ${namespace} get secret bucket-${bucket_name} -ojsonpath='{.data.BucketInfo}' | base64 -d > "$CRED_FILE"
+  ACCESS_KEY=$(jq -r '.spec.secretS3.accessKeyID' "$CRED_FILE")
+  SECRET_KEY=$(jq -r '.spec.secretS3.accessSecretKey' "$CRED_FILE")
+  BUCKET_NAME=$(jq -r '.spec.bucketName' "$CRED_FILE")
+  ENDPOINT=$(jq -r '.spec.secretS3.endpoint' "$CRED_FILE")

121-127: Consider using jq for more robust JSON parsing.

Word splitting on kubectl jsonpath output works for K8s resource names but is fragile. Using jq with a proper array iteration would be more robust and handle edge cases better.

🔎 Proposed fix
   # Find backup by pattern matching namespace-backupjob
-  for backup in $(kubectl -n cozy-velero get backups.velero.io -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
-    if echo "$backup" | grep -q "^${namespace}-${backupjob_name}-"; then
-      VELERO_BACKUP_NAME=$backup
-      VELERO_BACKUP_PHASE=$(kubectl -n cozy-velero get backups.velero.io $backup -o jsonpath='{.status.phase}' 2>/dev/null || echo "")
-      break
-    fi
-  done
+  VELERO_BACKUP_NAME=$(kubectl -n cozy-velero get backups.velero.io -o json 2>/dev/null | \
+    jq -r --arg pattern "^${namespace}-${backupjob_name}-" \
+    '.items[].metadata.name | select(test($pattern))' | head -1)

133-141: Consider adding explicit failure handling for better test diagnostics.

The timeout correctly waits for either Completed or Failed, but if the backup fails, the test will fail silently at the assertion. Adding a check for the Failed state with a diagnostic message would help debug issues.

🔎 Proposed improvement
   # Verify Velero Backup is Completed
   VELERO_BACKUP_PHASE=$(kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}')
+  if [ "$VELERO_BACKUP_PHASE" = "Failed" ]; then
+    echo "Velero backup failed. Failure reason:"
+    kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.failureReason}'
+    false
+  fi
   [ "$VELERO_BACKUP_PHASE" = "Completed" ]
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 263afb4 and dee4e10.

📒 Files selected for processing (1)
  • hack/e2e-apps/backup.bats
🔇 Additional comments (4)
hack/e2e-apps/backup.bats (4)

3-8: LGTM!

Test variables are well-named and clearly define the resources under test.


67-72: Acceptable test setup with comprehensive wait conditions.

The sleep 5 before waits is a common E2E pattern to handle resource propagation delays. The wait conditions properly cover all dependent resources (HelmRelease, DataVolume, PVC, VM).


98-106: LGTM!

The BackupJob lifecycle is properly tested by waiting for the Running → Succeeded phase transition. The explicit assertion at lines 105-106 adds clarity even though it's technically redundant after the wait.


181-183: LGTM!

Port-forward cleanup correctly handles the background process with proper error suppression.

Comment on lines 145 to 149
ENDPOINT_HOST=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f1)
ENDPOINT_PORT=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f2)
if [ -z "$ENDPOINT_PORT" ]; then
  ENDPOINT_PORT=8333
fi

⚠️ Potential issue | 🟡 Minor

Endpoint port parsing has a bug for URLs without explicit ports.

When ENDPOINT is https://host.example.com (no port), the scheme-stripped string contains no colon, so cut -d: -f2 returns the whole host string rather than an empty value. The -z check won't trigger, leaving an invalid port.

🔎 Proposed fix
   # Extract endpoint host and port for port-forwarding
-  ENDPOINT_HOST=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f1)
-  ENDPOINT_PORT=$(echo $ENDPOINT | sed 's|https\?://||' | cut -d: -f2)
-  if [ -z "$ENDPOINT_PORT" ]; then
+  ENDPOINT_NO_PROTO=$(echo "$ENDPOINT" | sed 's|https\?://||')
+  ENDPOINT_HOST=$(echo "$ENDPOINT_NO_PROTO" | cut -d: -f1)
+  if echo "$ENDPOINT_NO_PROTO" | grep -q ':'; then
+    ENDPOINT_PORT=$(echo "$ENDPOINT_NO_PROTO" | cut -d: -f2)
+  else
     ENDPOINT_PORT=8333
   fi
🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around lines 145 to 149, the port extraction uses
cut -d: -f2 which returns the wrong value for URLs without an explicit port
(e.g. https://host.example.com) so the -z check fails; change the logic to first
strip the scheme (remove http:// or https://), then check if the remaining host
string contains a colon and only then extract the part after the colon as
ENDPOINT_PORT; if no colon is present set ENDPOINT_PORT=8333. Ensure the check
distinguishes presence of ":" before extracting the port so URLs without ports
fall back to the default.

@androndo androndo force-pushed the velero-backup-controller-impl branch 2 times, most recently from 134228d to 5620045 on January 2, 2026 08:17
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/backupcontroller/velerostrategy_controller.go (1)

633-637: Verify the label selector matches VirtualMachine deployments.

The label selector assumes VMs have app.kubernetes.io/instance: virtual-machine-{ApplicationRef.Name}. This works for Cozystack's Helm-deployed VMs but may miss VMs deployed differently.

This was flagged in a previous review. The current implementation follows Cozystack's Helm naming conventions, but consider documenting this assumption or adding validation that the target VM exists with the expected label.

🧹 Nitpick comments (5)
internal/backupcontroller/velerostrategy_controller.go (5)

232-234: Use the defined constant for requeue duration.

Line 233 uses a hardcoded 5 * time.Second instead of the already-defined defaultActiveJobPollingInterval constant.

🔎 Proposed fix
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
 }

247-251: Consider extracting the hardcoded API version.

The version "v1alpha1" is hardcoded. If the Bucket API evolves, this could become a maintenance burden.


381-387: Consider using bytes.Equal instead of reflect.DeepEqual for byte slices.

For comparing byte slices, bytes.Equal is more idiomatic and slightly more efficient than reflect.DeepEqual.

🔎 Proposed fix
+import "bytes"
+
 			for k, newVal := range newData {
 				existingVal, exists := existingData[k]
-				if !exists || !reflect.DeepEqual(existingVal, newVal) {
+				if !exists || !bytes.Equal(existingVal, newVal) {
 					dataChanged = true
 					break
 				}

506-512: Consider using meta.SetStatusCondition for idempotent condition updates.

Appending conditions directly could lead to duplicates if this function is called multiple times before the status update succeeds. Using meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta ensures idempotent updates.

🔎 Proposed fix
+import "k8s.io/apimachinery/pkg/api/meta"
+
 	// Add condition
-	backupJob.Status.Conditions = append(backupJob.Status.Conditions, metav1.Condition{
+	meta.SetStatusCondition(&backupJob.Status.Conditions, metav1.Condition{
 		Type:               "Ready",
 		Status:             metav1.ConditionFalse,
 		Reason:             "BackupFailed",
 		Message:            message,
 		LastTransitionTime: now,
 	})

706-708: Remove redundant nil check.

The artifact variable is unconditionally assigned on line 658, so this nil check will always be true.

🔎 Proposed fix
 	if backupJob.Spec.PlanRef != nil {
 		backup.Spec.PlanRef = backupJob.Spec.PlanRef
 	}

-	if artifact != nil {
-		backup.Status.Artifact = artifact
-	}
+	backup.Status.Artifact = artifact

 	if err := r.Create(ctx, backup); err != nil {
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between dee4e10 and 5620045.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • api/backups/v1alpha1/backupjob_types.go
  • cmd/backup-controller/main.go
  • examples/desired-backup.yaml
  • go.mod
  • hack/e2e-apps/backup.bats
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
  • packages/system/backup-controller/templates/rbac.yaml
  • packages/system/backup-controller/templates/strategy.yaml
  • packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_jobs.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/system/backup-controller/templates/strategy.yaml
  • hack/e2e-apps/backup.bats
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
🧰 Additional context used
📓 Path-based instructions (7)
go.mod

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT modify go.mod manually; use go get command instead

Files:

  • go.mod
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/backupcontroller/backupjob_controller.go
  • api/backups/v1alpha1/backupjob_types.go
  • internal/backupcontroller/velerostrategy_controller.go
  • cmd/backup-controller/main.go
api/**

📄 CodeRabbit inference engine (AGENTS.md)

Define Kubernetes CRDs in the api directory following kubebuilder conventions

Files:

  • api/backups/v1alpha1/backupjob_types.go
**/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Files:

  • examples/desired-backup.yaml
  • packages/system/backup-controller/templates/rbac.yaml
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/backup-controller/templates/rbac.yaml
cmd/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Files:

  • cmd/backup-controller/main.go
🧠 Learnings (6)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to go.mod : Do NOT modify go.mod manually; use `go get` command instead

Applied to files:

  • go.mod
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-10T12:14:27.197Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1161
File: packages/apps/virtual-machine/templates/dashboard-resourcemap.yaml:6-12
Timestamp: 2025-07-10T12:14:27.197Z
Learning: Kubernetes RBAC rules with resourceNames work correctly for list/watch verbs. When resourceNames is specified in an RBAC rule, it properly restricts access to only those named resources, even for list and watch operations. Examples: `kubectl get resource resourcename -w` watches for changes on a single resource, and `kubectl get resource --field-selector .metadata.name=resourcename` lists a specific resource. The Kubernetes API server correctly distinguishes such requests from their less specific counterparts.

Applied to files:

  • packages/system/backup-controller/templates/rbac.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to pkg/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in pkg directory

Applied to files:

  • cmd/backup-controller/main.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to internal/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Applied to files:

  • cmd/backup-controller/main.go
🧬 Code graph analysis (3)
internal/backupcontroller/backupjob_controller.go (4)
pkg/apiserver/apiserver.go (1)
  • Scheme (53-53)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (96-102)
api/backups/strategy/v1alpha1/job_types.go (1)
  • JobStrategyKind (25-25)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • VeleroStrategyKind (24-24)
internal/backupcontroller/velerostrategy_controller.go (4)
internal/backupcontroller/backupjob_controller.go (1)
  • BackupJobReconciler (20-24)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (96-102)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • Velero (32-38)
api/backups/v1alpha1/backup_types.go (5)
  • Backup (103-109)
  • BackupSpec (51-77)
  • BackupArtifact (35-48)
  • BackupStatus (80-93)
  • BackupPhaseReady (30-30)
cmd/backup-controller/main.go (2)
internal/backupcontroller/backupjob_controller.go (1)
  • BackupJobReconciler (20-24)
pkg/apiserver/apiserver.go (1)
  • Scheme (53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (12)
api/backups/v1alpha1/backupjob_types.go (1)

88-89: LGTM!

The kubebuilder annotations correctly enable the status subresource and expose a "Phase" column in kubectl output. This aligns with the BackupJobReconciler's status update pattern in the controller implementation.
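
For reference, the pair of markers under discussion looks roughly like this (the exact JSONPath is an assumption inferred from the status-update pattern described above):

```go
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
type BackupJob struct {
	// ...
}
```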

examples/desired-backup.yaml (1)

1-20: LGTM!

The example manifest correctly demonstrates a manual BackupJob creation with all required references (applicationRef, storageRef, strategyRef) properly configured. The API groups align with what the controller validates.

internal/backupcontroller/velerostrategy_controller.go (3)

63-69: LGTM!

Constants are well-defined and centralized. The defaultRequeueAfter addresses the past review feedback about hardcoded durations.


429-444: LGTM!

The retry-on-conflict logic correctly handles concurrent Velero status updates. The 3-attempt limit with 100ms backoff is reasonable.


680-688: LGTM!

The owner reference now correctly sets Controller: true, addressing the previous review feedback about garbage collection.
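
For readers following along, the idiomatic one-liner from sigs.k8s.io/controller-runtime/pkg/controller/controllerutil that produces an owner reference with Controller: true (and BlockOwnerDeletion: true) is sketched below; whether the PR uses this helper or builds the reference by hand is not visible in this excerpt:

```go
// SetControllerReference marks backupJob as the managing controller of the
// Velero backup object, enabling garbage collection within one namespace.
if err := controllerutil.SetControllerReference(backupJob, veleroBackup, r.Scheme); err != nil {
	return err
}
```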

cmd/backup-controller/main.go (2)

162-169: LGTM!

The BackupJobReconciler is correctly wired with the manager, including the EventRecorder for emitting Kubernetes events. This follows controller-runtime patterns as per coding guidelines.


54-55: LGTM!

Scheme registrations for strategy/v1alpha1 and velero/v1 are correctly added, enabling the reconciler to work with these API types.

packages/system/backup-controller/templates/rbac.yaml (1)

30-32: LGTM!

The Velero resource permissions correctly cover the resources the controller needs to manage (backups, backupstoragelocations, volumesnapshotlocations, restores).

internal/backupcontroller/backupjob_controller.go (3)

26-39: LGTM!

The reconcile function follows controller-runtime best practices: structured logging, proper NotFound handling (returning nil error), and clear error paths.


41-52: LGTM!

The two-step APIGroup validation (nil check followed by value check) is defensive and prevents nil pointer dereference. Logging at V(1) for skip cases keeps the logs clean during normal operation.


55-66: LGTM!

The strategy dispatch logic correctly routes to the appropriate reconciler based on the StrategyRef.Kind, with informative logging for unsupported kinds.

go.mod (1)

20-20: Velero v1.17.1 is the latest stable release and has no known product CVEs.

Verification confirms that Velero v1.17.1, released November 7, 2025, is the current stable version. No product-specific security advisories apply to this release (the only notable Velero CVE, CVE-2020-3996, affects much older versions prior to 1.4.3/1.5.2).

Comment on lines +24 to +26
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "get", "list", "watch", "update", "patch"]
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for cleanup logic related to secret deletion
find packages/system/backup-controller -type f -name "*.go" -o -name "*.yaml" | head -20

Repository: cozystack/cozystack

Length of output: 815


🏁 Script executed:

# Check for any cleanup or deletion logic in the backup controller
rg -n "delete.*secret|Secret.*Delete|cleanup|orphan|finalizer" packages/system/backup-controller/ -t go -t yaml

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

# Search for secret creation logic to understand the controller's workflow
rg -n "cozy-velero|create.*[Ss]ecret|Secret.*create" packages/system/backup-controller/ -t go -t yaml

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

# Check if there are owner references or lifecycle management patterns
rg -n "owner.*reference|OwnerReference|finalizer|garbage.*collection" packages/system/backup-controller/ -t go

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

# Look at the backup-controller deployment to see where the actual controller code comes from
cat packages/system/backup-controller/templates/deployment.yaml | head -50

Repository: cozystack/cozystack

Length of output: 1488


🏁 Script executed:

# Search the entire repo for backupcontroller Go code
find . -type f -name "*.go" | xargs grep -l "backupcontroller\|backup-controller" | head -10

Repository: cozystack/cozystack

Length of output: 344


🏁 Script executed:

# Look for any internal packages related to backup
find . -type d -name "*backup*" | grep -E "internal|pkg|cmd"

Repository: cozystack/cozystack

Length of output: 146


🏁 Script executed:

# Check the BackupJob controller for deletion/cleanup logic
cat internal/backupcontroller/backupjob_controller.go

Repository: cozystack/cozystack

Length of output: 2608


🏁 Script executed:

# Check the Velero strategy controller for secret management
cat internal/backupcontroller/velerostrategy_controller.go

Repository: cozystack/cozystack

Length of output: 28771


Add delete permission and implement cleanup for orphaned Velero credential secrets.

The concern is valid: the controller creates secrets in the cozy-velero namespace but cannot use owner references due to cross-namespace limitations. No cleanup logic currently exists when BackupJobs are deleted, causing secrets to be orphaned. This will accumulate over time. Either implement a finalizer on BackupJob to delete associated secrets, or add cleanup logic triggered on BackupJob deletion. This requires adding delete permission for secrets in the RBAC rules.

🤖 Prompt for AI Agents
packages/system/backup-controller/templates/rbac.yaml lines 24-26: the RBAC
rules grant create/get/list/watch/update/patch on secrets but lack the "delete"
verb required to remove orphaned Velero credential secrets; update the verbs
array to include "delete" and then implement cleanup in the controller by adding
a BackupJob finalizer (or handling BackupJob deletion events) that enumerates
and deletes the secrets the controller created in the cozy-velero namespace (use
labels or predictable names to identify them) and removes the finalizer after
successful deletion so secrets are cleaned up on BackupJob removal.
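
A minimal sketch of the finalizer option mentioned above, assuming the finalizer name, the cozy-velero namespace, the module import path, and that storageS3SecretName yields the same name used at creation time (the real helper's signature may differ):

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	backupsv1alpha1 "github.com/cozystack/cozystack/api/backups/v1alpha1" // module path assumed
)

// secretCleanupFinalizer is a hypothetical finalizer name for this sketch.
const secretCleanupFinalizer = "backups.cozystack.io/velero-secret-cleanup"

// cleanupVeleroSecret deletes the credentials secret this controller created
// in cozy-velero, then removes the finalizer so deletion can proceed.
func (r *BackupJobReconciler) cleanupVeleroSecret(ctx context.Context, j *backupsv1alpha1.BackupJob) error {
	if !controllerutil.ContainsFinalizer(j, secretCleanupFinalizer) {
		return nil
	}
	secret := &corev1.Secret{}
	secret.Name = storageS3SecretName(j) // assumed naming helper
	secret.Namespace = "cozy-velero"
	if err := r.Delete(ctx, secret); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	controllerutil.RemoveFinalizer(j, secretCleanupFinalizer)
	return r.Update(ctx, j)
}
```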

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
hack/e2e-apps/backup.bats (2)

17-17: pkill pattern may terminate unrelated port-forwards.

The pattern kubectl.*port-forward.*seaweedfs-s3 could inadvertently kill port-forwards from other tests or sessions. Consider storing the PID when the port-forward starts (line 207) and using it here for precise cleanup.

🔎 Proposed fix
+# Track port-forward PID at file scope for teardown
+PORT_FORWARD_PID=""
+
 teardown() {
   # Clean up resources (runs even if test fails)
   # ...
   
-  # Clean up port-forward if still running
-  pkill -f "kubectl.*port-forward.*seaweedfs-s3" 2>/dev/null || true
+  # Clean up port-forward if PID is known
+  if [ -n "$PORT_FORWARD_PID" ]; then
+    kill $PORT_FORWARD_PID 2>/dev/null || true
+    wait $PORT_FORWARD_PID 2>/dev/null || true
+  fi

Then at line 206-207, capture the PID:

 kubectl -n tenant-root port-forward service/seaweedfs-s3 8333:8333 > /dev/null 2>&1 &
-PORT_FORWARD_PID=$!
+PORT_FORWARD_PID=$!  # Now accessible in teardown

194-202: Timeout loop succeeds on "Failed" status.

The grep pattern 'Completed\|Failed' at line 195 will exit the loop successfully even if the backup reaches "Failed" status. While line 201 will correctly fail the test, the timeout message could be misleading.

🔎 Proposed refinement
   # Wait for Velero Backup to complete (with timeout)
   timeout 300 sh -ec "
-    until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed\|Failed'; do
+    until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed'; do
+      # Exit early if backup failed
+      if kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Failed'; then
+        echo 'Backup failed' >&2
+        exit 1
+      fi
       sleep 5
     done
   "
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5620045 and c936b5a.

📒 Files selected for processing (1)
  • hack/e2e-apps/backup.bats
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (1)
hack/e2e-apps/backup.bats (1)

206-206: Verify the hardcoded tenant-root namespace is correct.

The port-forward targets tenant-root, but the test uses namespace tenant-test. This might be intentional if the SeaweedFS S3 service runs in a shared namespace, but please confirm this is the expected configuration.

kubectl -n ${namespace} delete bucket.apps.cozystack.io ${bucket_name} --wait=false 2>/dev/null || true

# Clean up temporary files
rm -f /tmp/bucket-backup-credentials.json
Contributor

⚠️ Potential issue | 🟡 Minor

Incomplete temporary file cleanup.

Line 90 creates /tmp/bucket-backup-credentials-fixed.json, but teardown only removes /tmp/bucket-backup-credentials.json.

🔎 Proposed fix
   # Clean up temporary files
-  rm -f /tmp/bucket-backup-credentials.json
+  rm -f /tmp/bucket-backup-credentials.json /tmp/bucket-backup-credentials-fixed.json
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-rm -f /tmp/bucket-backup-credentials.json
+rm -f /tmp/bucket-backup-credentials.json /tmp/bucket-backup-credentials-fixed.json
🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around line 46, the teardown only removes
/tmp/bucket-backup-credentials.json but the test creates
/tmp/bucket-backup-credentials-fixed.json at line 90; update the cleanup to
remove both temporary files (or adjust teardown to delete any matching
/tmp/bucket-backup-credentials*.json) so both
/tmp/bucket-backup-credentials.json and
/tmp/bucket-backup-credentials-fixed.json are removed during teardown.

sleep 10

# Set up MinIO client
mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY --insecure
Contributor

⚠️ Potential issue | 🟠 Major

Protocol mismatch: HTTPS used where HTTP is expected.

The mc alias is configured with https://localhost:8333, but line 89 patches the bucket endpoint to use http://seaweedfs-s3.... If the SeaweedFS S3 service actually serves HTTP (not HTTPS), the connection will fail even with --insecure. The --insecure flag only skips certificate validation for HTTPS connections; it doesn't handle HTTP/HTTPS protocol mismatches.

🔎 Proposed fix
   # Set up MinIO client
-  mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY --insecure
+  mc alias set local http://localhost:8333 $ACCESS_KEY $SECRET_KEY
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY --insecure
+mc alias set local http://localhost:8333 $ACCESS_KEY $SECRET_KEY
🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around line 216, the mc alias is using
https://localhost:8333 which mismatches the patched S3 endpoint that serves
HTTP; update the alias to use http://localhost:8333 (and remove the unnecessary
--insecure flag) so the protocol matches the SeaweedFS S3 service and the
connection succeeds.

Comment on lines 151 to 152
timestamp := j.Status.StartedAt.Time.Format("2006-01-02-15-04-05")
veleroBackupName := fmt.Sprintf("%s-%s-%s", j.Namespace, j.Name, timestamp)
Member

This isn't necessary. BackupJobs have unique names within their namespace, so you don't need to put a timestamp in the name of the backup.velero.io resource.

Suggested change
-timestamp := j.Status.StartedAt.Time.Format("2006-01-02-15-04-05")
-veleroBackupName := fmt.Sprintf("%s-%s-%s", j.Namespace, j.Name, timestamp)
+veleroBackupName := fmt.Sprintf("%s-%s", j.Namespace, j.Name)


backup := &backupsv1alpha1.Backup{
	ObjectMeta: metav1.ObjectMeta{
		Name: fmt.Sprintf("%s-backup", backupJob.Name),
Member

Suggested change
-Name: fmt.Sprintf("%s-backup", backupJob.Name),
+Name: backupJob.Name,

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
hack/e2e-apps/backup.bats (1)

213-215: Fix S3 endpoint protocol: use HTTP instead of HTTPS.

Line 215 configures the MinIO client with https://localhost:8333, but the SeaweedFS S3 service listens only on HTTP (port 8333 with httpsPort: 0 disabled in configuration). The MC_INSECURE=1 flag only skips certificate validation for HTTPS; it cannot resolve a protocol mismatch. Change line 215 to use http://:

Suggested change
   # Set up MinIO client with insecure flag (use environment variable for all commands)
   export MC_INSECURE=1
-  mc alias set local https://localhost:8333 $ACCESS_KEY $SECRET_KEY
+  mc alias set local http://localhost:8333 $ACCESS_KEY $SECRET_KEY
🧹 Nitpick comments (1)
hack/e2e-apps/backup.bats (1)

177-177: Consider replacing arbitrary sleeps with polling loops for more reliable tests.

Lines 177 and 211 use fixed 30-second sleeps to wait for asynchronous operations. These arbitrary delays can make tests unnecessarily slow (if the operation completes faster) or flaky (if the operation takes longer than 30 seconds). Polling loops with timeouts provide more reliable and efficient waiting.

🔎 Example polling pattern for line 177
-  print_log "Wait a bit for the backup to be created and appear in the API"
-  sleep 30
+  print_log "Wait for backup to appear in API"
+  timeout 60 sh -ec "until kubectl -n cozy-velero get backups.velero.io 2>/dev/null | grep -q '^${namespace}-${backupjob_name}-'; do sleep 2; done"

For line 211, consider polling the S3 backup path instead:

-  # Wait a bit for backup data to be written to S3
-  sleep 30
+  # Poll for backup data in S3
+  timeout 60 sh -ec "until mc ls local/${BUCKET_NAME}/backups/ 2>/dev/null | grep -q '${VELERO_BACKUP_NAME}'; do sleep 5; done"

Also applies to: 211-211

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c936b5a and 7abd680.

📒 Files selected for processing (2)
  • hack/e2e-apps/backup.bats
  • hack/e2e-apps/bucket.bats

Comment on lines +157 to +161
print_log "Wait for BackupJob to start"
kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running

print_log "Wait for BackupJob to complete"
kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
Contributor

⚠️ Potential issue | 🟠 Major

Race condition: waiting for transient Running phase may cause flaky test failures.

The test waits for phase=Running (line 158) before waiting for phase=Succeeded (line 161). If the BackupJob transitions directly to Succeeded (or completes faster than the polling interval), the Running wait will time out even though the backup succeeded.

🔎 Proposed fix
-  print_log "Wait for BackupJob to start"
-  kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running
-
   print_log "Wait for BackupJob to complete"
   kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print_log "Wait for BackupJob to start"
kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running
print_log "Wait for BackupJob to complete"
kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
print_log "Wait for BackupJob to complete"
kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
🤖 Prompt for AI Agents
In hack/e2e-apps/backup.bats around lines 157-161, the test currently waits for
the transient Running phase before waiting for Succeeded which can race if the
BackupJob skips or quickly passes Running; remove the intermediate wait for
Running and only wait for the backupjob to reach Succeeded (with the
existing/adjusted timeout) so the check succeeds whether the job goes through
Running or directly to Succeeded.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (6)
internal/template/template.go (2)

39-56: LGTM! Helper correctly traverses JSON structures.

The mapAtStrings function correctly handles the recursive traversal of maps, slices, and strings. Non-string types (numbers, booleans, null) appropriately pass through unchanged.

Note: The function mutates the input structures in place, which is efficient but could be surprising. Consider documenting this behavior if the function becomes public.
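
Roughly the shape of such a helper, reconstructed from the description above (a sketch, not the PR's exact code):

```go
// mapAtStrings walks a decoded-JSON value, applying f to every string.
// Maps and slices are mutated in place; other scalars pass through unchanged.
func mapAtStrings(v any, f func(string) string) any {
	switch t := v.(type) {
	case map[string]any:
		for k, val := range t {
			t[k] = mapAtStrings(val, f)
		}
		return t
	case []any:
		for i, val := range t {
			t[i] = mapAtStrings(val, f)
		}
		return t
	case string:
		return f(t)
	default: // numbers, booleans, nil
		return v
	}
}
```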


58-68: LGTM! Standard text/template usage.

The template function correctly uses Go's text/template package for string templating. Error handling is appropriate.

If additional template functions are needed in the future (e.g., {{ lower .Name }}), consider using .Funcs(template.FuncMap{...}) before parsing.
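
A self-contained sketch of that extension point, using a hypothetical lower helper registered before parsing:

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

func main() {
	// Register extra functions with Funcs before Parse.
	tmpl := template.Must(template.New("t").
		Funcs(template.FuncMap{"lower": strings.ToLower}).
		Parse("{{ lower .Name }}"))
	// Prints "vm1".
	_ = tmpl.Execute(os.Stdout, map[string]any{"Name": "VM1"})
}
```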

internal/template/template_test.go (2)

65-67: Use structured comparison instead of JSON string comparison.

Comparing JSON-marshaled strings (lines 65-66) is fragile—field ordering or whitespace differences could cause false failures even when the objects are semantically equal.

🔎 Proposed fix using reflect.DeepEqual or cmp.Diff
-	if string(b1) != string(b2) {
-		t.Fatalf("transformed value not equal to reference value, expected: %s, got: %s", string(b1), string(b2))
+	if !reflect.DeepEqual(reference, *got) {
+		t.Fatalf("transformed value not equal to reference value\nexpected: %+v\ngot: %+v", reference, *got)
 	}

Or use github.com/google/go-cmp/cmp:

+	if diff := cmp.Diff(reference, *got); diff != "" {
+		t.Fatalf("transformed value mismatch (-want +got):\n%s", diff)
+	}

11-68: Expand test coverage to include error cases and edge cases.

The test only covers the happy path. Consider adding tests for:

  • Template parsing errors (malformed syntax: {{ .Missing)
  • Template execution errors (missing keys: {{ .Undefined }})
  • Empty input objects
  • Nil template context
  • Non-templated strings (pass-through behavior; see the sketch below)
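
A table-driven sketch covering two of these cases, written as if it lives in the same package and assuming the Template signature quoted later in this review, with parse errors propagated rather than swallowed:

```go
func TestTemplateEdgeCases(t *testing.T) {
	type obj struct{ Name string }
	cases := []struct {
		name    string
		input   string
		wantErr bool
	}{
		{"malformed syntax", "{{ .Missing", true}, // unterminated action
		{"plain string passes through", "no templates here", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			got, err := Template(&obj{Name: tc.input}, map[string]any{})
			if (err != nil) != tc.wantErr {
				t.Fatalf("err = %v, wantErr = %v", err, tc.wantErr)
			}
			if err == nil && got.Name != tc.input {
				t.Errorf("expected pass-through, got %q", got.Name)
			}
		})
	}
}
```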
internal/backupcontroller/velerostrategy_controller.go (2)

218-218: Use constant instead of hardcoded duration.

Line 218 uses a hardcoded 5 * time.Second instead of the defined constant defaultRequeueAfter or defaultActiveJobPollingInterval. Use the constant for consistency with the rest of the file.

🔎 Proposed fix
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil

140-166: Consider handling multiple backups in list.

While the labels should ensure only one Velero Backup exists per BackupJob, there's no handling or logging if len(veleroBackupList.Items) > 1. Consider adding a debug or warning log to catch unexpected cases where multiple backups share the same labels (e.g., due to bugs or manual intervention).

🔎 Proposed enhancement
 	if len(veleroBackupList.Items) == 0 {
 		// Create Velero Backup
 		...
 	} else {
+		if len(veleroBackupList.Items) > 1 {
+			logger.V(1).Info("found multiple Velero Backups for BackupJob, using first one", 
+				"count", len(veleroBackupList.Items))
+		}
 		// Assign the first backup from the list
 		veleroBackup = &veleroBackupList.Items[0]
 	}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7abd680 and 0cd5666.

📒 Files selected for processing (7)
  • api/backups/strategy/v1alpha1/velero_types.go
  • api/backups/v1alpha1/backupjob_types.go
  • api/backups/v1alpha1/groupversion_info.go
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
  • internal/template/template.go
  • internal/template/template_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/backups/v1alpha1/backupjob_types.go
🧰 Additional context used
📓 Path-based instructions (3)
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/template/template_test.go
  • internal/template/template.go
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/template/template_test.go
  • internal/template/template.go
  • api/backups/strategy/v1alpha1/velero_types.go
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
  • api/backups/v1alpha1/groupversion_info.go
api/**

📄 CodeRabbit inference engine (AGENTS.md)

Define Kubernetes CRDs in the api directory following kubebuilder conventions

Files:

  • api/backups/strategy/v1alpha1/velero_types.go
  • api/backups/v1alpha1/groupversion_info.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
🧬 Code graph analysis (4)
internal/template/template_test.go (1)
internal/template/template.go (1)
  • Template (9-37)
api/backups/strategy/v1alpha1/velero_types.go (2)
internal/template/template.go (1)
  • Template (9-37)
api/backups/v1alpha1/backup_types.go (1)
  • BackupSpec (51-77)
internal/backupcontroller/backupjob_controller.go (4)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (101-107)
api/backups/v1alpha1/groupversion_info.go (1)
  • GroupVersion (34-34)
api/backups/strategy/v1alpha1/job_types.go (1)
  • JobStrategyKind (25-25)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • VeleroStrategyKind (25-25)
internal/backupcontroller/velerostrategy_controller.go (5)
internal/backupcontroller/backupjob_controller.go (1)
  • BackupJobReconciler (25-31)
api/backups/v1alpha1/backupjob_types.go (6)
  • BackupJob (101-107)
  • BackupJobPhaseSucceeded (36-36)
  • BackupJobPhaseFailed (37-37)
  • OwningJobNamespaceLabel (26-26)
  • OwningJobNameLabel (25-25)
  • BackupJobPhaseRunning (35-35)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • Velero (33-39)
api/backups/v1alpha1/backup_types.go (5)
  • Backup (103-109)
  • BackupList (114-118)
  • BackupArtifact (35-48)
  • BackupSpec (51-77)
  • BackupStatus (80-93)
internal/template/template.go (1)
  • Template (9-37)
🔇 Additional comments (7)
api/backups/v1alpha1/groupversion_info.go (1)

28-34: LGTM! Clean refactor to constants.

The introduction of thisGroup and thisVersion constants eliminates string literal duplication and improves maintainability. The constants are appropriately unexported and the public GroupVersion API remains unchanged.

internal/backupcontroller/backupjob_controller.go (4)

25-31: LGTM! Reconciler struct enhanced with dynamic client support.

The additions of dynamic.Interface, meta.RESTMapper, and Recorder follow controller-runtime patterns and enable dynamic resource management and event emission.


33-46: LGTM! Improved logging and error handling.

The enhanced logging with context (namespace, name) and appropriate verbosity levels (V(1) for debug) improves observability. Error handling correctly distinguishes not-found from other errors.


48-61: LGTM! Robust StrategyRef validation.

The nil check for StrategyRef.APIGroup and the APIGroup matching logic are correct. The detailed logging (expected vs. got) aids debugging when strategies don't match.


77-92: LGTM! Standard controller setup with dynamic client.

The initialization of dynamic.Interface and meta.RESTMapper follows controller-runtime patterns. Error handling is correct, and the HTTP client is appropriately created for the RESTMapper.

api/backups/strategy/v1alpha1/velero_types.go (2)

51-59: No action required. This is a new API feature (v1alpha1) with no backward compatibility concerns. Existing Velero resources have empty specs and will continue to work with the new VeleroSpec structure.


9-9: Velero v1.17.1 is the current latest stable version with no security advisories. The import is actively used for VeleroSpec.

Comment on lines +19 to +25
templateFunc := func(in string) string {
out, err := template(in, templateContext)
if err != nil {
return in
}
return out
}
Contributor

⚠️ Potential issue | 🟠 Major

Template errors are silently swallowed.

The templateFunc closure returns the original input string when template() fails, hiding template parsing or execution errors from the caller. This makes debugging difficult—users will see unexpanded template syntax (e.g., {{ .Missing }}) in the output without any indication that an error occurred.

Consider either:

  1. Collecting errors during traversal and returning them, or
  2. Logging the error, or
  3. Returning a sentinel value that causes Template to fail
🔎 Proposed fix to propagate template errors

One approach is to collect errors during traversal:

+	var templateErrors []error
 	templateFunc := func(in string) string {
 		out, err := template(in, templateContext)
 		if err != nil {
+			templateErrors = append(templateErrors, fmt.Errorf("template error in %q: %w", in, err))
 			return in
 		}
 		return out
 	}
 	unstructured = mapAtStrings(unstructured, templateFunc)
+	if len(templateErrors) > 0 {
+		return nil, fmt.Errorf("template errors: %v", templateErrors)
+	}

Note: This requires importing "fmt" and adjusting the closure to capture templateErrors.

Committable suggestion skipped: line range outside the PR's diff.

## What this PR does

This patch implements the Reconcile function for BackupJobs with a
Velero strategy ref.

### Release note

```release-note
[backups] Implement the Velero backup strategy controller.
```

Signed-off-by: Andrey Kolkov <androndo@gmail.com>
@lllamnyp lllamnyp force-pushed the velero-backup-controller-impl branch from 0cd5666 to 5f21c6c on January 4, 2026 at 09:57
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (6)
hack/e2e-apps/backup.bats (3)

213-215: Verify HTTPS/HTTP compatibility with SeaweedFS S3.

Using https://localhost:8333 assumes SeaweedFS S3 serves HTTPS. If it serves HTTP, the connection will fail even with MC_INSECURE=1 (which only skips certificate validation, not protocol mismatch).

#!/bin/bash
# Check what protocol SeaweedFS S3 service uses
kubectl get svc seaweedfs-s3 -n tenant-root -o yaml | grep -A10 'ports:'
# Check if there's TLS configuration
kubectl get svc seaweedfs-s3 -n tenant-root -o jsonpath='{.spec.ports[*].name}'

157-161: Race condition: waiting for transient Running phase may cause flaky failures.

If the BackupJob transitions quickly from creation to Succeeded (or skips Running entirely), the wait on line 158 will time out even though the backup succeeded. The Succeeded wait at line 161 is sufficient.

🔎 Proposed fix
-  print_log "Wait for BackupJob to start"
-  kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running
-
   print_log "Wait for BackupJob to complete"
   kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded

191-197: Redundant wait logic with potential infinite loop.

Lines 191-194 contain an unbounded until loop, but line 197 duplicates the wait with a proper timeout. Either remove the unbounded loop or add a timeout to it.

🔎 Proposed fix — remove redundant unbounded loop
   print_log "Verify Velero Backup was found"
   [ -n "$VELERO_BACKUP_NAME" ]
   
-  echo '# Wait for Velero Backup to complete' >&3
-  until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed\|Failed'; do
-    sleep 5
-  done
-
   print_log "Verify Velero Backup is Completed"
   timeout 90 sh -ec "until [ \"\$(kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' 2>/dev/null)\" = \"Completed\" ]; do sleep 30; done"
internal/backupcontroller/velerostrategy_controller.go (2)

145-166: Critical bug: veleroBackup is never assigned from the list when backups exist.

When len(veleroBackupList.Items) > 0, the code proceeds to use veleroBackup (lines 166, 180, 190, 211) but it remains an uninitialized empty struct from line 130. This causes incorrect phase checks and invalid data passed to createBackupResource.

🔎 Proposed fix
 	if len(veleroBackupList.Items) == 0 {
 		// Create Velero Backup
 		logger.Debug("Velero Backup not found, creating new one")
 		if err := r.createVeleroBackup(ctx, j, veleroStrategy); err != nil {
 			logger.Error(err, "failed to create Velero Backup")
 			return r.markBackupJobFailed(ctx, j, fmt.Sprintf("failed to create Velero Backup: %v", err))
 		}
 		// After successful Velero backup creation, set phase to Running
 		if j.Status.Phase != backupsv1alpha1.BackupJobPhaseRunning {
 			logger.Debug("setting BackupJob phase to Running after successful Velero backup creation")
 			j.Status.Phase = backupsv1alpha1.BackupJobPhaseRunning
 			if err := r.Status().Update(ctx, j); err != nil {
 				logger.Error(err, "failed to update BackupJob phase to Running")
 				return ctrl.Result{}, err
 			}
 		}
 		logger.Debug("created Velero Backup, requeuing", "veleroBackupName", veleroBackupName)
 		// Requeue to check status
 		return ctrl.Result{RequeueAfter: defaultRequeueAfter}, nil
 	}
+
+	// Assign the backup from the list
+	veleroBackup = &veleroBackupList.Items[0]
 
 	logger.Debug("found existing Velero Backup", "veleroBackupName", veleroBackupName, "phase", veleroBackup.Status.Phase)

519-524: Missing error check after fetching application resource.

The error from r.Resource(...).Get() at line 519 is not checked before using app at line 521. If the application doesn't exist, app will be nil and the template call will fail with an unclear error.

🔎 Proposed fix
 	app, err := r.Resource(mapping.Resource).Namespace(ns).Get(ctx, backupJob.Spec.ApplicationRef.Name, metav1.GetOptions{})
+	if err != nil {
+		return fmt.Errorf("failed to get application resource %s/%s: %w", backupJob.Spec.ApplicationRef.Kind, backupJob.Spec.ApplicationRef.Name, err)
+	}
 
 	veleroBackupSpec, err := template.Template(&strategy.Spec.Template.Spec, app.Object)
packages/system/backup-controller/templates/rbac.yaml (1)

24-26: Missing delete permission for secrets cleanup.

The controller creates secrets in the cozy-velero namespace (via createS3CredsForVelero), but without delete permission, orphaned secrets will accumulate when BackupJobs are deleted. Cross-namespace owner references cannot be used for automatic garbage collection.

🔎 Proposed fix
 - apiGroups: [""]
   resources: ["secrets"]
-  verbs: ["create", "get", "list", "watch", "update", "patch"]
+  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
🧹 Nitpick comments (8)
internal/template/template_test.go (1)

11-68: Consider adding edge case coverage and error handling tests.

The test validates basic templating functionality correctly, but consider adding:

  • Tests for malformed templates (e.g., {{ .NonExistent }})
  • Tests for empty strings and nil values
  • Tests for special characters in template values
  • Table-driven tests to cover multiple scenarios efficiently

The JSON string comparison approach is acceptable but be aware it relies on deterministic field ordering.

api/backups/v1alpha1/backupjob_types.go (1)

24-27: Consider reviewing the label naming convention.

The label keys use an unconventional format with CamelCase components:

  • backups.cozystack.io/owned-by.BackupJobName
  • backups.cozystack.io/owned-by.BackupJobNamespace

Kubernetes label keys typically use lowercase with hyphens. Consider whether this format:

  1. Follows existing conventions in your codebase
  2. Should use a format like backups.cozystack.io/owned-by-backupjob-name instead
  3. Might create confusion between the "owned-by" and "BackupJobName" portions

If this format is intentional and consistent with your project's conventions, this comment can be ignored.

internal/backupcontroller/velerostrategy_controller.go (4)

217-218: Use constant instead of hardcoded value.

Line 218 uses 5 * time.Second directly, but defaultActiveJobPollingInterval is defined for this purpose.

🔎 Proposed fix
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil

575-578: Simplify unnecessary fmt.Sprintf.

fmt.Sprintf("%s", backupJob.Name) is redundant—use backupJob.Name directly.

🔎 Proposed fix
 	backup := &backupsv1alpha1.Backup{
 		ObjectMeta: metav1.ObjectMeta{
-			Name:      fmt.Sprintf("%s", backupJob.Name),
+			Name:      backupJob.Name,
 			Namespace: backupJob.Namespace,

605-607: Remove unnecessary nil check.

artifact is always non-nil since it's unconditionally assigned at line 557. This check is dead code.

🔎 Proposed fix
-	if artifact != nil {
-		backup.Status.Artifact = artifact
-	}
+	backup.Status.Artifact = artifact

396-438: Good retry pattern for conflict handling.

The retry loops with conflict detection for BackupStorageLocation and VolumeSnapshotLocation updates handle concurrent status updates from Velero correctly. Consider extracting the common retry logic into a helper function to reduce duplication.

Also applies to: 440-481
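
One possible shape for that helper, built on client-go's retry.RetryOnConflict rather than a hand-rolled loop (a sketch; names are illustrative):

```go
import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateWithConflictRetry re-reads obj and reapplies mutate on every attempt,
// retrying only when the update fails with a conflict.
func updateWithConflictRetry(ctx context.Context, c client.Client, obj client.Object, mutate func() error) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
			return err
		}
		if err := mutate(); err != nil {
			return err
		}
		return c.Update(ctx, obj)
	})
}
```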

packages/system/backup-controller/templates/rbac.yaml (1)

30-32: Consider adding delete permission for Velero resources.

For cleanup consistency, you may want delete on Velero resources if the controller should clean up orphaned BackupStorageLocations, VolumeSnapshotLocations, or Velero Backups when BackupJobs are removed.

🔎 Proposed fix
 - apiGroups: ["velero.io"]
   resources: ["backups", "backupstoragelocations", "volumesnapshotlocations", "restores"]
-  verbs: ["create", "get", "list", "watch", "update", "patch"]
+  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
hack/e2e-apps/backup.bats (1)

127-131: Consider removing the arbitrary sleep 5.

The sleep 5 on line 127 before waiting for resources is likely unnecessary if the subsequent wait commands have proper timeouts. If needed for HelmRelease propagation, add a comment explaining the reason.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd5666 and 5f21c6c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
  • api/backups/strategy/v1alpha1/velero_types.go
  • api/backups/v1alpha1/backupjob_types.go
  • api/backups/v1alpha1/groupversion_info.go
  • cmd/backup-controller/main.go
  • examples/desired-backup.yaml
  • go.mod
  • hack/e2e-apps/backup.bats
  • hack/e2e-apps/bucket.bats
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
  • internal/template/template.go
  • internal/template/template_test.go
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
  • packages/system/backup-controller/templates/rbac.yaml
  • packages/system/backup-controller/templates/strategy.yaml
  • packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_jobs.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
  • internal/template/template.go
  • cmd/backup-controller/main.go
  • go.mod
  • packages/system/backup-controller/templates/strategy.yaml
  • internal/backupcontroller/backupjob_controller.go
  • hack/e2e-apps/bucket.bats
🧰 Additional context used
📓 Path-based instructions (5)
api/**

📄 CodeRabbit inference engine (AGENTS.md)

Define Kubernetes CRDs in the api directory following kubebuilder conventions

Files:

  • api/backups/v1alpha1/groupversion_info.go
  • api/backups/v1alpha1/backupjob_types.go
  • api/backups/strategy/v1alpha1/velero_types.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • api/backups/v1alpha1/groupversion_info.go
  • api/backups/v1alpha1/backupjob_types.go
  • internal/backupcontroller/velerostrategy_controller.go
  • internal/template/template_test.go
  • api/backups/strategy/v1alpha1/velero_types.go
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/backupcontroller/velerostrategy_controller.go
  • internal/template/template_test.go
**/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Files:

  • examples/desired-backup.yaml
  • packages/system/backup-controller/templates/rbac.yaml
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/backup-controller/templates/rbac.yaml
  • packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml
🧠 Learnings (3)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-10T12:14:27.197Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1161
File: packages/apps/virtual-machine/templates/dashboard-resourcemap.yaml:6-12
Timestamp: 2025-07-10T12:14:27.197Z
Learning: Kubernetes RBAC rules with resourceNames work correctly for list/watch verbs. When resourceNames is specified in an RBAC rule, it properly restricts access to only those named resources, even for list and watch operations. Examples: `kubectl get resource resourcename -w` watches for changes on a single resource, and `kubectl get resource --field-selector .metadata.name=resourcename` lists a specific resource. The Kubernetes API server correctly distinguishes such requests from their less specific counterparts.

Applied to files:

  • packages/system/backup-controller/templates/rbac.yaml
🧬 Code graph analysis (3)
internal/backupcontroller/velerostrategy_controller.go (4)
internal/backupcontroller/backupjob_controller.go (1)
  • BackupJobReconciler (25-31)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • Velero (33-39)
api/backups/v1alpha1/backup_types.go (6)
  • Backup (103-109)
  • BackupList (114-118)
  • BackupArtifact (35-48)
  • BackupSpec (51-77)
  • BackupStatus (80-93)
  • BackupPhaseReady (30-30)
internal/template/template.go (1)
  • Template (9-37)
internal/template/template_test.go (1)
internal/template/template.go (1)
  • Template (9-37)
api/backups/strategy/v1alpha1/velero_types.go (2)
internal/template/template.go (1)
  • Template (9-37)
api/backups/v1alpha1/backup_types.go (1)
  • BackupSpec (51-77)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (14)
api/backups/v1alpha1/groupversion_info.go (1)

28-34: LGTM! Good refactoring to use constants.

The introduction of thisGroup and thisVersion constants improves maintainability. Note that the kubebuilder annotation on line 19 still contains the hardcoded string "backups.cozystack.io", but this is unavoidable since annotations are comments and cannot reference constants.

examples/desired-backup.yaml (1)

1-20: LGTM! Well-structured example manifest.

The BackupJob example clearly demonstrates the CR structure with all three required references (applicationRef, storageRef, strategyRef). The backups.cozystack.io/triggered-by: manual label is helpful for distinguishing manual backups.

packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml (2)

17-20: LGTM! Phase column addition enhances kubectl output.

The additionalPrinterColumns for Phase aligns with the kubebuilder annotation in the Go type definition and will provide useful visibility into BackupJob status.


240-241: LGTM! Status subresource properly enabled.

The status subresource addition aligns with the +kubebuilder:subresource:status annotation in api/backups/v1alpha1/backupjob_types.go and follows Kubernetes best practices for status handling.

api/backups/v1alpha1/backupjob_types.go (1)

93-94: LGTM! Kubebuilder annotations properly configured.

The status subresource and Phase print column annotations align correctly with the CRD definition in packages/system/backup-controller/definitions/backups.cozystack.io_backupjobs.yaml.

api/backups/strategy/v1alpha1/velero_types.go (2)

9-9: LGTM on Velero v1 import.

The import alias velerov1 is clear and follows Go naming conventions for external API imports.


50-59: Clean template design for Velero backup specifications.

The VeleroSpec → VeleroTemplate → velerov1.BackupSpec hierarchy provides a clear structure for templating Velero backups. This aligns well with the template.Template generic function in internal/template/template.go that processes these specs.

internal/backupcontroller/velerostrategy_controller.go (4)

28-41: LGTM on logger wrapper.

The loggerWithDebug wrapper provides a convenient Debug() method that maps to V(1).Info(), improving code readability while maintaining proper log verbosity levels.
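
Roughly what such a wrapper looks like, inferred from the description (the PR's exact code may differ):

```go
import "github.com/go-logr/logr"

// loggerWithDebug adds a Debug convenience method on top of logr.Logger.
type loggerWithDebug struct {
	logr.Logger
}

// Debug logs at verbosity level 1, keeping call sites terse.
func (l loggerWithDebug) Debug(msg string, keysAndValues ...any) {
	l.V(1).Info(msg, keysAndValues...)
}
```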


43-63: LGTM on credential structs.

The S3Credentials and bucketInfo structs are well-defined for handling COSI bucket credentials. The nested structure in bucketInfo correctly mirrors the expected JSON format from BucketInfo secrets.


65-79: LGTM on constants and helpers.

Good practice defining defaultRequeueAfter and defaultActiveJobPollingInterval as constants. The storageS3SecretName function provides consistent naming for Velero credential secrets.


310-394: LGTM on S3 credentials management.

Good implementation of idempotent secret creation with proper change detection using reflect.DeepEqual. The event recording provides useful audit trail.

packages/system/backup-controller/templates/rbac.yaml (1)

6-17: LGTM on core backup permissions.

Proper separation of read-only access to plans and full lifecycle access to backupjobs and backups with status subresource.

hack/e2e-apps/backup.bats (2)

9-47: Good teardown implementation for reliable cleanup.

The teardown function properly cleans up all resources (Velero backups, BSL/VSL, secrets, BackupJob, VM, Bucket) even on test failure. Using pattern matching to find Velero backups is a good approach given the generated names.


70-83: LGTM on bucket setup and readiness checks.

Good use of multiple wait commands to ensure all bucket-related resources (HelmRelease, BucketClaim, BucketAccess) are ready before proceeding.

## What this PR does

This patch narrows the scope of the Velero backup strategy controller to
simply template Velero Backups according to the application being backed
up and the template in the strategy. Creating storage locations is now
out of scope.

### Release note

```release-note
[backups] Implement templating for Velero backups and remove creation of
backup storage locations and volume snapshot locations.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
@lllamnyp lllamnyp force-pushed the velero-backup-controller-impl branch from 5f21c6c to f4228ff on January 4, 2026 at 10:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
hack/e2e-apps/backup.bats.disabled (1)

157-161: Race condition: waiting for transient Running phase may cause flaky test failures.

The test waits for phase=Running (line 158) before waiting for phase=Succeeded (line 161). If the BackupJob transitions quickly or directly to Succeeded (faster than the polling interval), the Running wait will time out even though the backup succeeded.

🔎 Proposed fix
-  print_log "Wait for BackupJob to start"
-  kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=60s --for=jsonpath='{.status.phase}'=Running
-
   print_log "Wait for BackupJob to complete"
   kubectl -n ${namespace} wait backupjob ${backupjob_name} --timeout=300s --for=jsonpath='{.status.phase}'=Succeeded
internal/template/template.go (1)

19-25: Template errors are silently swallowed.

The templateFunc closure returns the original input string when template() fails (line 22), hiding template parsing or execution errors from the caller. Users will see unexpanded template syntax (e.g., {{ .Missing }}) without any indication that an error occurred, making debugging extremely difficult.

🔎 Proposed fix to propagate template errors

Collect errors during traversal and return them:

+	var templateErrors []error
 	templateFunc := func(in string) string {
 		out, err := template(in, templateContext)
 		if err != nil {
+			templateErrors = append(templateErrors, fmt.Errorf("template error in %q: %w", in, err))
 			return in
 		}
 		return out
 	}
 	unstructured = mapAtStrings(unstructured, templateFunc)
+	if len(templateErrors) > 0 {
+		return nil, fmt.Errorf("template errors: %v", templateErrors)
+	}
 	b, err = json.Marshal(unstructured)

Note: Requires importing "fmt" at the top of the file.

🧹 Nitpick comments (8)
hack/e2e-apps/backup.bats.disabled (2)

191-201: Redundant wait logic creates confusion.

Lines 191-194 implement an until loop without an explicit timeout, while line 197 adds a separate timeout wrapper that also waits for Completed. This creates redundant and confusing wait logic.

🔎 Proposed fix to consolidate wait logic
-  echo '# Wait for Velero Backup to complete' >&3
-  until kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' | grep -q 'Completed\|Failed'; do
-    sleep 5
-  done
-
   print_log "Verify Velero Backup is Completed"
   timeout 90 sh -ec "until [ \"\$(kubectl -n cozy-velero get backups.velero.io ${VELERO_BACKUP_NAME} -o jsonpath='{.status.phase}' 2>/dev/null)\" = \"Completed\" ]; do sleep 30; done"

204-208: Port-forward background process may leak.

The timeout 100s wrapper spawns the port-forward in the background but relies on teardown's pkill (line 17) to clean it up. If the test script exits abnormally, the process may leak.

🔎 Proposed improvement using a subshell trap
-  bash -c 'timeout 100s kubectl port-forward service/seaweedfs-s3 -n tenant-root 8333:8333 > /dev/null 2>&1 &'
+  (trap 'kill $PORT_FWD_PID 2>/dev/null || true' EXIT
+   kubectl port-forward service/seaweedfs-s3 -n tenant-root 8333:8333 > /dev/null 2>&1 &
+   PORT_FWD_PID=$!
+   sleep 100
+  ) &

Alternatively, store the PID and ensure cleanup in teardown.

internal/template/template.go (1)

9-37: Consider adding documentation for the Template function.

The generic Template function performs complex operations (JSON marshaling, recursive string transformation, unmarshaling). Adding a doc comment would help users understand its behavior, especially regarding error handling and the template context format.

Example documentation
// Template applies text/template transformations to all string fields in obj using templateContext.
// It marshals obj to JSON, recursively applies template substitutions to string values,
// and unmarshals back to type T. Template syntax errors in individual strings currently
// result in the original string being preserved (see known limitation).
//
// Parameters:
//   - obj: pointer to the object to template
//   - templateContext: variables available to templates (e.g., map[string]any{"name": "foo"})
//
// Returns a new templated object of type T, or an error if marshaling/unmarshaling fails.
func Template[T any](obj *T, templateContext map[string]any) (*T, error) {
internal/backupcontroller/backupjob_controller.go (1)

25-31: Consider naming the embedded interfaces for clarity.

Embedding dynamic.Interface and meta.RESTMapper directly works but can be confusing for readers. Named fields would improve clarity and make the struct self-documenting.

🔎 Proposed refactor
 type BackupJobReconciler struct {
 	client.Client
-	dynamic.Interface
-	meta.RESTMapper
+	DynamicClient dynamic.Interface
+	RESTMapper    meta.RESTMapper
 	Scheme   *runtime.Scheme
 	Recorder record.EventRecorder
 }
internal/backupcontroller/velerostrategy_controller.go (4)

226-226: Use the defined constant for requeue duration.

Line 226 uses a hardcoded 5 * time.Second instead of the defaultActiveJobPollingInterval constant defined at line 67. This was noted in past reviews.

🔎 Proposed fix
 	// Still in progress (InProgress, New, etc.)
-	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
+	return ctrl.Result{RequeueAfter: defaultActiveJobPollingInterval}, nil
 }

498-505: Consider using meta.SetStatusCondition for idempotent condition updates.

Appending to Conditions directly could create duplicates if this function is called multiple times due to update failures. Using meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta ensures idempotent updates.

🔎 Proposed fix
+	apimeta "k8s.io/apimachinery/pkg/api/meta"
 ...
 	// Add condition
-	backupJob.Status.Conditions = append(backupJob.Status.Conditions, metav1.Condition{
+	apimeta.SetStatusCondition(&backupJob.Status.Conditions, metav1.Condition{
 		Type:               "Ready",
 		Status:             metav1.ConditionFalse,
 		Reason:             "BackupFailed",
 		Message:            message,
 		LastTransitionTime: now,
 	})

586-589: Simplify redundant string formatting.

fmt.Sprintf("%s", backupJob.Name) is equivalent to just backupJob.Name.

🔎 Proposed fix
 	backup := &backupsv1alpha1.Backup{
 		ObjectMeta: metav1.ObjectMeta{
-			Name:      fmt.Sprintf("%s", backupJob.Name),
+			Name:      backupJob.Name,
 			Namespace: backupJob.Namespace,

616-618: Remove unnecessary nil check.

The artifact variable is always non-nil as it's assigned at line 568. This check is redundant.

🔎 Proposed fix
 	if backupJob.Spec.PlanRef != nil {
 		backup.Spec.PlanRef = backupJob.Spec.PlanRef
 	}
 
-	if artifact != nil {
-		backup.Status.Artifact = artifact
-	}
+	backup.Status.Artifact = artifact
 
 	if err := r.Create(ctx, backup); err != nil {
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f21c6c and f4228ff.

📒 Files selected for processing (9)
  • api/backups/strategy/v1alpha1/velero_types.go
  • api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
  • api/backups/v1alpha1/backupjob_types.go
  • api/backups/v1alpha1/groupversion_info.go
  • hack/e2e-apps/backup.bats.disabled
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/velerostrategy_controller.go
  • internal/template/template.go
  • internal/template/template_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/template/template_test.go
🧰 Additional context used
📓 Path-based instructions (3)
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/backupcontroller/backupjob_controller.go
  • internal/template/template.go
  • internal/backupcontroller/velerostrategy_controller.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/backupcontroller/backupjob_controller.go
  • api/backups/strategy/v1alpha1/velero_types.go
  • internal/template/template.go
  • api/backups/v1alpha1/backupjob_types.go
  • api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
  • internal/backupcontroller/velerostrategy_controller.go
  • api/backups/v1alpha1/groupversion_info.go
api/**

📄 CodeRabbit inference engine (AGENTS.md)

Define Kubernetes CRDs in the api directory following kubebuilder conventions

Files:

  • api/backups/strategy/v1alpha1/velero_types.go
  • api/backups/v1alpha1/backupjob_types.go
  • api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
  • api/backups/v1alpha1/groupversion_info.go
🧠 Learnings (2)
📚 Learning: 2025-12-25T15:24:47.730Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/velero.yaml:20-23
Timestamp: 2025-12-25T15:24:47.730Z
Learning: In Velero PackageSource configurations, the privileged: true setting is required for node-agents to function properly with hostPath access for backup operations.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
📚 Learning: 2025-07-24T13:21:17.388Z
Learnt from: lllamnyp
Repo: cozystack/cozystack PR: 1239
File: packages/extra/seaweedfs/templates/external/cosi-bucket-class.yaml:3-15
Timestamp: 2025-07-24T13:21:17.388Z
Learning: In COSI (Container Object Storage Interface), BucketAccessClass resources have fields like driverName and authenticationType at the top level, not under a spec block. Only BucketClass resources require a spec block for fields like driverName and deletionPolicy.

Applied to files:

  • internal/backupcontroller/velerostrategy_controller.go
🧬 Code graph analysis (3)
internal/backupcontroller/backupjob_controller.go (4)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (101-107)
api/backups/v1alpha1/groupversion_info.go (1)
  • GroupVersion (34-34)
api/backups/strategy/v1alpha1/job_types.go (1)
  • JobStrategyKind (25-25)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • VeleroStrategyKind (25-25)
api/backups/strategy/v1alpha1/zz_generated.deepcopy.go (2)
internal/template/template.go (1)
  • Template (9-37)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • VeleroTemplate (57-59)
internal/backupcontroller/velerostrategy_controller.go (4)
internal/backupcontroller/backupjob_controller.go (1)
  • BackupJobReconciler (25-31)
api/backups/v1alpha1/backupjob_types.go (1)
  • BackupJob (101-107)
api/backups/strategy/v1alpha1/velero_types.go (1)
  • Velero (33-39)
api/backups/v1alpha1/backup_types.go (6)
  • BackupList (114-118)
  • Backup (103-109)
  • BackupArtifact (35-48)
  • BackupSpec (51-77)
  • BackupStatus (80-93)
  • BackupPhaseReady (30-30)
🔇 Additional comments (13)
api/backups/strategy/v1alpha1/zz_generated.deepcopy.go (1)

1-236: LGTM! Auto-generated deepcopy methods are correct.

The controller-gen tool has properly generated deep-copy methods for the new VeleroTemplate type and updated existing methods to handle nested struct copying via DeepCopyInto calls.

As per coding guidelines, auto-generated files (zz_generated.*.go) should not be manually edited.

api/backups/strategy/v1alpha1/velero_types.go (2)

9-9: LGTM! Velero integration properly imported.

The import alias velerov1 follows Go conventions and enables integration with Velero's BackupSpec type.


51-59: LGTM! VeleroSpec and VeleroTemplate types are well-designed.

The VeleroTemplate wrapper around velerov1.BackupSpec enables templating of Velero backup specifications while maintaining type safety. The structure follows kubebuilder conventions with appropriate JSON tags.
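
For readers skimming the diff, the shape under review is roughly the following; a minimal sketch, assuming the json tag (the field name comes from the type summary above):

```go
package v1alpha1

import (
	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

// VeleroTemplate wraps velerov1.BackupSpec so that string fields inside the
// spec can be expanded against a template context before a Backup is created.
type VeleroTemplate struct {
	BackupSpec velerov1.BackupSpec `json:"backupSpec,omitempty"`
}
```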

hack/e2e-apps/backup.bats.disabled (1)

9-47: LGTM! Comprehensive teardown ensures proper cleanup.

The teardown function properly cleans up all resources including Velero backups, storage locations, secrets, and port-forwards. This ensures test isolation.

api/backups/v1alpha1/groupversion_info.go (1)

28-34: LGTM! Constants improve maintainability.

Extracting group and version as constants (thisGroup, thisVersion) improves maintainability and enables consistent reuse across the API package, particularly for label prefixes mentioned in the AI summary.
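
A minimal sketch of the pattern, assuming the constant values match the group and version used elsewhere in this PR:

```go
package v1alpha1

const (
	// Reused across the package, e.g. when composing label key prefixes.
	thisGroup   = "backups.cozystack.io"
	thisVersion = "v1alpha1"
)
```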

api/backups/v1alpha1/backupjob_types.go (2)

24-27: LGTM! Label constants are well-defined.

The label constants use the group prefix pattern correctly for namespacing. The owned-by.BackupJobName and owned-by.BackupJobNamespace suffixes clearly convey ownership semantics.
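
For illustration, such keys could be composed from the group constant; a sketch only, with thisGroup's value and the exact constant names assumed:

```go
package v1alpha1

const thisGroup = "backups.cozystack.io" // assumed; see groupversion_info.go

const (
	// Label keys recording which BackupJob owns a downstream Velero resource.
	LabelOwnedByBackupJobName      = thisGroup + "/owned-by.BackupJobName"
	LabelOwnedByBackupJobNamespace = thisGroup + "/owned-by.BackupJobNamespace"
)
```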


93-94: LGTM! Kubebuilder annotations are correctly placed.

The +kubebuilder:subresource:status enables the status subresource for proper status updates, and the +kubebuilder:printcolumn annotation adds useful Phase visibility when listing BackupJobs.
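
To make the placement concrete, here is how these markers sit on a type in kubebuilder style; the surrounding object:root marker, the exact JSONPath, and the spec/status stubs are assumptions added to keep the sketch compilable:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Stubs shown only so the sketch is self-contained; the real types are
// defined elsewhere in the package.
type BackupJobSpec struct{}
type BackupJobStatus struct {
	Phase string `json:"phase,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"

// BackupJob is the Schema for the backupjobs API.
type BackupJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   BackupJobSpec   `json:"spec,omitempty"`
	Status BackupJobStatus `json:"status,omitempty"`
}
```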

internal/backupcontroller/backupjob_controller.go (2)

77-92: LGTM! SetupWithManager properly initializes dependencies.

The dynamic client and REST mapper are correctly initialized from the manager's config. Error handling is appropriate, and the controller registration follows controller-runtime conventions.
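
A minimal sketch of that setup, using the named-field variant suggested earlier in this review; the module import path and recorder name are assumptions:

```go
package backupcontroller

import (
	"fmt"

	"k8s.io/client-go/dynamic"
	ctrl "sigs.k8s.io/controller-runtime"

	backupsv1alpha1 "github.com/cozystack/cozystack/api/backups/v1alpha1"
)

func (r *BackupJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
	// Build the dynamic client from the manager's REST config.
	dyn, err := dynamic.NewForConfig(mgr.GetConfig())
	if err != nil {
		return fmt.Errorf("failed to create dynamic client: %w", err)
	}
	r.DynamicClient = dyn
	r.RESTMapper = mgr.GetRESTMapper()
	r.Recorder = mgr.GetEventRecorderFor("backup-controller")

	return ctrl.NewControllerManagedBy(mgr).
		For(&backupsv1alpha1.BackupJob{}).
		Complete(r)
}
```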


48-59: LGTM! Defensive nil check and API group validation.

The nil check for StrategyRef.APIGroup prevents panics, and the API group matching ensures only supported strategies are processed. Logging at V(1) level is appropriate for skip scenarios.
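
A sketch of the defensive shape being praised, extracted into a helper for clarity; the helper name is hypothetical:

```go
package backupcontroller

import corev1 "k8s.io/api/core/v1"

// strategyMatches refuses to dereference a nil APIGroup pointer; the caller
// logs at V(1) and skips the object when this returns false.
func strategyMatches(ref *corev1.TypedLocalObjectReference, wantGroup string) bool {
	if ref == nil || ref.APIGroup == nil {
		return false
	}
	return *ref.APIGroup == wantGroup
}
```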

internal/backupcontroller/velerostrategy_controller.go (4)

28-41: LGTM! Debug logger wrapper is clean and useful.

The loggerWithDebug wrapper provides a convenient Debug() method that maps to V(1).Info(), reducing verbosity in the calling code while maintaining proper log levels.
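
For reference, such a wrapper can be as small as this; a sketch, not the PR's exact code:

```go
package backupcontroller

import "github.com/go-logr/logr"

// loggerWithDebug adds a Debug convenience method that forwards to V(1).Info(),
// keeping call sites terse while preserving logr verbosity levels.
type loggerWithDebug struct {
	logr.Logger
}

func (l loggerWithDebug) Debug(msg string, keysAndValues ...any) {
	l.V(1).Info(msg, keysAndValues...)
}
```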


318-402: LGTM! Idempotent secret management with proper change detection.

The function correctly handles create/update scenarios, compares data before updating to avoid unnecessary writes, and emits appropriate events.
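
The general create-or-update shape looks like this; a minimal sketch with a hypothetical helper name, not the PR's exact implementation:

```go
package backupcontroller

import (
	"context"
	"reflect"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureSecret creates the secret when absent and updates it only when the
// data actually differs, so repeated reconciles stay write-free.
func ensureSecret(ctx context.Context, c client.Client, desired *corev1.Secret) error {
	existing := &corev1.Secret{}
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), existing)
	if apierrors.IsNotFound(err) {
		return c.Create(ctx, desired)
	}
	if err != nil {
		return err
	}
	if reflect.DeepEqual(existing.Data, desired.Data) {
		return nil // no-op: avoids unnecessary writes and event noise
	}
	existing.Data = desired.Data
	return c.Update(ctx, existing)
}
```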


404-446: LGTM! Conflict retry pattern is well implemented.

The 3-attempt retry with 100ms backoff handles concurrent Velero status updates gracefully. The spec comparison avoids unnecessary updates.
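
A sketch of that retry shape with a hand-rolled loop; the helper name and signature are illustrative:

```go
package backupcontroller

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateWithRetry re-reads the object, re-applies the mutation, and retries
// up to 3 times with a 100ms pause, but only on conflict errors.
func updateWithRetry(ctx context.Context, c client.Client, obj client.Object, mutate func()) error {
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
			return err
		}
		mutate()
		lastErr = c.Update(ctx, obj)
		if lastErr == nil || !apierrors.IsConflict(lastErr) {
			return lastErr
		}
		time.Sleep(100 * time.Millisecond)
	}
	return lastErr
}
```

Where available, k8s.io/client-go/util/retry.RetryOnConflict expresses the same pattern with standard backoff and is worth considering.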


515-562: LGTM! Velero Backup creation is well-structured.

The function properly resolves the application resource, templates the backup spec, and creates the Velero Backup with appropriate labels for ownership tracking. Error handling is in place.
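
A sketch of what such an ownership-labelled Backup object might look like; the target namespace, the timestamped name format, and the exact label keys are assumptions built from the label constants discussed above:

```go
package backupcontroller

import (
	"fmt"
	"time"

	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newVeleroBackup is illustrative only: it shows the ownership labels and a
// plausible timestamped name, not the PR's exact construction.
func newVeleroBackup(jobName, jobNamespace string, spec velerov1.BackupSpec) *velerov1.Backup {
	return &velerov1.Backup{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("%s-%s-%s", jobNamespace, jobName, time.Now().UTC().Format("2006-01-02-15-04-05")),
			Namespace: "cozy-velero",
			Labels: map[string]string{
				"backups.cozystack.io/owned-by.BackupJobName":      jobName,
				"backups.cozystack.io/owned-by.BackupJobNamespace": jobNamespace,
			},
		},
		Spec: spec,
	}
}
```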

Comment on lines +229 to +316
// resolveBucketStorageRef discovers S3 credentials from a Bucket storageRef
// It follows this flow:
// 1. Get the Bucket resource (apps.cozystack.io/v1alpha1)
// 2. Find the BucketAccess that references this bucket
// 3. Get the secret from BucketAccess.spec.credentialsSecretName
// 4. Decode BucketInfo from secret.data.BucketInfo and extract S3 credentials
func (r *BackupJobReconciler) resolveBucketStorageRef(ctx context.Context, storageRef corev1.TypedLocalObjectReference, namespace string) (*S3Credentials, error) {
	logger := getLogger(ctx)

	// Step 1: Get the Bucket resource
	bucket := &unstructured.Unstructured{}
	bucket.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   *storageRef.APIGroup,
		Version: "v1alpha1",
		Kind:    storageRef.Kind,
	})

	if *storageRef.APIGroup != "apps.cozystack.io" {
		return nil, fmt.Errorf("Unsupported storage APIGroup: %v, expected apps.cozystack.io", storageRef.APIGroup)
	}
	bucketKey := client.ObjectKey{Namespace: namespace, Name: storageRef.Name}

	if err := r.Get(ctx, bucketKey, bucket); err != nil {
		return nil, fmt.Errorf("failed to get Bucket %s: %w", storageRef.Name, err)
	}

	// Step 2: Determine the bucket claim name
	// For apps.cozystack.io Bucket, the BucketClaim name is typically the same as the Bucket name
	// or follows a pattern. Based on the templates, it's usually the Release.Name which equals the Bucket name
	bucketName := storageRef.Name

	// Step 3: Get BucketAccess by name (assuming BucketAccess name matches bucketName)
	bucketAccess := &unstructured.Unstructured{}
	bucketAccess.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "objectstorage.k8s.io",
		Version: "v1alpha1",
		Kind:    "BucketAccess",
	})

	bucketAccessKey := client.ObjectKey{Name: "bucket-" + bucketName, Namespace: namespace}
	if err := r.Get(ctx, bucketAccessKey, bucketAccess); err != nil {
		return nil, fmt.Errorf("failed to get BucketAccess %s in namespace %s: %w", bucketName, namespace, err)
	}

	// Step 4: Get the secret name from BucketAccess
	secretName, found, err := unstructured.NestedString(bucketAccess.Object, "spec", "credentialsSecretName")
	if err != nil {
		return nil, fmt.Errorf("failed to get credentialsSecretName from BucketAccess: %w", err)
	}
	if !found || secretName == "" {
		return nil, fmt.Errorf("credentialsSecretName not found in BucketAccess %s", bucketAccessKey.Name)
	}

	// Step 5: Get the secret
	secret := &corev1.Secret{}
	secretKey := client.ObjectKey{Namespace: namespace, Name: secretName}
	if err := r.Get(ctx, secretKey, secret); err != nil {
		return nil, fmt.Errorf("failed to get secret %s: %w", secretName, err)
	}

	// Step 6: Decode BucketInfo from secret.data.BucketInfo
	bucketInfoData, found := secret.Data["BucketInfo"]
	if !found {
		return nil, fmt.Errorf("BucketInfo key not found in secret %s", secretName)
	}

	// Parse JSON value
	var info bucketInfo
	if err := json.Unmarshal(bucketInfoData, &info); err != nil {
		return nil, fmt.Errorf("failed to unmarshal BucketInfo from secret %s: %w", secretName, err)
	}

	// Step 7: Extract and return S3 credentials
	creds := &S3Credentials{
		BucketName:      info.Spec.BucketName,
		Endpoint:        info.Spec.SecretS3.Endpoint,
		Region:          info.Spec.SecretS3.Region,
		AccessKeyID:     info.Spec.SecretS3.AccessKeyID,
		AccessSecretKey: info.Spec.SecretS3.AccessSecretKey,
	}

	logger.Debug("resolved S3 credentials from Bucket storageRef",
		"bucket", storageRef.Name,
		"bucketName", creds.BucketName,
		"endpoint", creds.Endpoint)

	return creds, nil
}
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify if these functions are called anywhere in the codebase

echo "=== Checking for usages of resolveBucketStorageRef ==="
rg -n "resolveBucketStorageRef" --type go

echo ""
echo "=== Checking for usages of createS3CredsForVelero ==="
rg -n "createS3CredsForVelero" --type go

echo ""
echo "=== Checking for usages of createBackupStorageLocation ==="
rg -n "createBackupStorageLocation" --type go

echo ""
echo "=== Checking for usages of createVolumeSnapshotLocation ==="
rg -n "createVolumeSnapshotLocation" --type go

Repository: cozystack/cozystack

Length of output: 1658


Remove dead code functions or add TODO comments explaining future intent.

The functions resolveBucketStorageRef, createS3CredsForVelero, createBackupStorageLocation, and createVolumeSnapshotLocation are defined but never called anywhere in the codebase. Since the PR removed automatic creation of BackupStorageLocation and VolumeSnapshotLocation resources, these functions appear to be unused remnants and should be removed. If they're intended for future use, add TODO comments explaining the planned feature to preserve intent.

Comment on lines +239 to +248
	bucket := &unstructured.Unstructured{}
	bucket.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   *storageRef.APIGroup,
		Version: "v1alpha1",
		Kind:    storageRef.Kind,
	})

	if *storageRef.APIGroup != "apps.cozystack.io" {
		return nil, fmt.Errorf("Unsupported storage APIGroup: %v, expected apps.cozystack.io", storageRef.APIGroup)
	}
⚠️ Potential issue | 🟠 Major

Nil pointer dereference risk before validation.

Line 241 dereferences *storageRef.APIGroup before it is validated, and the APIGroup comparison on line 246 dereferences it again without any nil check. If APIGroup is nil, this will panic before an error can be returned.

🔎 Proposed fix
 func (r *BackupJobReconciler) resolveBucketStorageRef(ctx context.Context, storageRef corev1.TypedLocalObjectReference, namespace string) (*S3Credentials, error) {
 	logger := getLogger(ctx)
 
+	if storageRef.APIGroup == nil {
+		return nil, fmt.Errorf("storageRef.APIGroup is nil")
+	}
+	if *storageRef.APIGroup != "apps.cozystack.io" {
+		return nil, fmt.Errorf("unsupported storage APIGroup: %s, expected apps.cozystack.io", *storageRef.APIGroup)
+	}
+
 	// Step 1: Get the Bucket resource
 	bucket := &unstructured.Unstructured{}
 	bucket.SetGroupVersionKind(schema.GroupVersionKind{
 		Group:   *storageRef.APIGroup,
 		Version: "v1alpha1",
 		Kind:    storageRef.Kind,
 	})
-
-	if *storageRef.APIGroup != "apps.cozystack.io" {
-		return nil, fmt.Errorf("Unsupported storage APIGroup: %v, expected apps.cozystack.io", storageRef.APIGroup)
-	}
🤖 Prompt for AI Agents
In internal/backupcontroller/velerostrategy_controller.go around lines 239 to
248, the code dereferences storageRef.APIGroup without first checking it for
nil, which can panic. First check if storageRef.APIGroup == nil (and return a
clear error) before using *storageRef.APIGroup; only construct the
GroupVersionKind or compare values after that nil check. Reorder the logic so
the nil validation happens before any dereference, then use the validated
*storageRef.APIGroup where needed.

@lllamnyp lllamnyp enabled auto-merge January 5, 2026 06:52
@lllamnyp lllamnyp merged commit 069a3ca into main Jan 5, 2026
64 of 68 checks passed
@lllamnyp lllamnyp deleted the velero-backup-controller-impl branch January 5, 2026 12:29
@dosubot dosubot bot added the lgtm label (This PR has been approved by a maintainer) on Jan 5, 2026
lllamnyp added a commit that referenced this pull request Jan 8, 2026
## What this PR does

Implement a controller for BackupJobs referencing a Velero strategy. Creates a Backup.velero.io according to the template in the `Velero.strategy.backups.cozystack.io`.

### Release note

```release-note
[backups] Implement a backup strategy controller for Velero strategies.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
Co-authored-by: Andrey Kolkov <androndo@gmail.com>
Co-authored-by: Timofei Larkin <lllamnyp@gmail.com>
(cherry picked from commit 069a3ca)
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
kvaps added a commit that referenced this pull request Jan 16, 2026
…d backup system (#1867)

## What this PR does

Update changelog for v1.0.0-alpha.1 to include missing features:
- **Cozystack Operator**: New operator for Package and PackageSource
management (#1740, #1741, #1755, #1756, #1760, #1761)
- **Backup System**: Comprehensive backup functionality with Velero
integration (#1640, #1685, #1687, #1708, #1719, #1720, #1737, #1762)
- Add @androndo to contributors
- Update Full Changelog link to v0.38.0...v1.0.0-alpha.1

### Release note

```release-note
[docs] Update changelog for v1.0.0-alpha.1: add cozystack-operator and backup system
```

Labels

  • enhancement (New feature or request)
  • lgtm (This PR has been approved by a maintainer)
  • size:XXL (This PR changes 1000+ lines, ignoring generated files)
