Skip to content

Conversation

@kvaps
Copy link
Member

@kvaps kvaps commented Dec 22, 2025

Signed-off-by: Andrei Kvapil kvapss@gmail.com

What this PR does

Release note

[cozystack-operator] Introduce Cozystack-operator core logic

Summary by CodeRabbit

  • New Features

    • Operator entrypoint with flag-driven configuration (metrics, probes, leader election, Flux pre-install, version, platform-source options).
    • Optional pre-start Flux installation and automatic provisioning of platform sources from OCI or Git/HTTP(S)/SSH.
  • Chores

    • Embedded Flux manifests packaged for simplified installation; build tooling updated to produce new manifest outputs.
  • Behavior

    • Pre-start Flux and platform-source installation failures are treated as fatal.

✏️ Tip: You can customize this high-level summary in your review settings.

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Dec 22, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 22, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a new operator entrypoint and Flux integration: flag-driven startup that can pre-install Flux from embedded manifests, apply Flux resources, generate/apply a platform source (OCI or Git), then start a controller-runtime manager with probes and leader election.

Changes

Cohort / File(s) Summary
Operator Entrypoint
cmd/cozystack-operator/main.go
New main: controller-runtime manager setup (metrics, health/readiness probes, leader election), extensive CLI flags, optional pre-start Flux install via embedded manifests, JSON-based resource create-or-update, platform-source generation/install (OCIRepository or GitRepository), scheme registration, and startup lifecycle handling.
Flux Installation Logic
internal/fluxinstall/install.go
New exported Install(ctx, k8sClient, writeEmbeddedManifests) that materializes embedded manifests, parses multi-doc YAML into unstructured objects, optionally injects KUBERNETES_SERVICE_HOST/PORT into pod templates, extracts target namespace, and applies resources in two staged server-side apply passes (CRDs/Namespace then rest) with FieldManager "cozystack-operator" and Force.
Embedded Manifests Writer
internal/fluxinstall/manifests.embed.go
Embeds manifests/*.yaml via embed.FS and adds WriteEmbeddedManifests(dir string) error to write files to disk.
Platform Source Helpers
cmd/cozystack-operator/... (internal helpers)
New helpers to parse platform source URLs (OCI vs Git), parse ref specs, generate Flux OCIRepository or GitRepository objects, and apply them using a direct client with timeouts.
Flux AIO Build Targets
packages/core/flux-aio/Makefile
Adds update-new target and makes update depend on update-old and update-new; update-new emits .../internal/fluxinstall/manifests/fluxcd.yaml and omits hostNetwork/dnsPolicy patching.
Go Module Updates
go.mod
Adds github.com/fluxcd/source-controller/api v1.7.4, bumps sigs.k8s.io/controller-runtime and k8s.io/apimachinery, and updates related indirect Flux packages.

Sequence Diagram(s)

sequenceDiagram
    participant Operator as Operator main
    participant FS as Embedded Manifests FS
    participant K8sAPI as Kubernetes API
    participant FluxCR as Flux resources

    Note over Operator,FS: Optional pre-start Flux install
    Operator->>FS: WriteEmbeddedManifests(dir)
    FS-->>Operator: manifests files
    Operator->>K8sAPI: Install(ctx) — apply CRDs & Namespace (stage 1)
    alt stage1 success
        K8sAPI-->>Operator: ACK
        Operator->>K8sAPI: apply remaining manifests (stage 2)
        K8sAPI-->>Operator: ACK / errors (logged)
    else stage1 failure (logged, non-fatal)
        K8sAPI-->>Operator: Error (logged)
    end

    Note over Operator,FluxCR: Optional Flux resource provisioning
    Operator->>Operator: parse JSON resources
    Operator->>K8sAPI: create-or-update each Flux resource
    K8sAPI-->>Operator: applied / errors (logged)

    Note over Operator,K8sAPI: Optional platform source (fatal on failure)
    Operator->>Operator: parse platform URL (oci:// or git/...)
    Operator->>K8sAPI: apply GitRepository / OCIRepository in cozy-system
    alt success
        K8sAPI-->>Operator: created/updated
    else failure
        K8sAPI-->>Operator: error -> Operator exits
    end

    Note over Operator: Start manager (healthz/readyz, controllers)
    Operator->>Operator: manager.Start(signalCtx)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I tuck manifests beneath the log,
Flux hums a tune while I hop the bog.
I write the files and stage the flow,
Apply, update — watch sources grow.
Cozy systems bloom where pipelines go.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[cozystack-operator] Introduce Cozystack-operator core logic' clearly identifies the main component and its primary purpose of introducing core operator logic.
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cozystack-operator-core

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot bot added the enhancement New feature or request label Dec 22, 2025
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @kvaps, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the foundational framework for the cozystack-operator, introducing its core operational logic. It significantly enhances the operator's capabilities by integrating with Flux CD, allowing for the automated installation of Flux components and the dynamic creation of source repositories (OCI or Git) based on configuration. This initial setup is crucial for the operator to manage and deploy Cozystack's infrastructure in a declarative and automated manner.

Highlights

  • Core Operator Logic: Introduces the cozystack-operator's main execution flow, including flag parsing for configuration such as metrics, leader election, and Flux installation.
  • Flux CD Integration: Adds functionality to optionally install Flux CD components and apply arbitrary Flux resources from JSON strings at operator startup, enabling declarative management.
  • Dynamic Platform Source Management: Implements logic to generate and apply Flux OCIRepository or GitRepository resources based on a provided URL (supporting OCI and Git protocols), allowing dynamic source configuration.
  • Manifest Application Utility: A new fluxinstall package is added to handle the parsing and server-side application of Kubernetes manifests, prioritizing CRDs and Namespaces for correct dependency ordering.
  • Environment Variable Injection: Automatically injects KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT into Flux component pods (Deployments, StatefulSets, DaemonSets) if these are set in the operator's environment.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the core logic for the cozystack-operator, including Flux installation and resource management. The overall structure is good, but I have identified a few areas for improvement. My feedback focuses on making the resource application logic more robust by using server-side apply consistently, improving error handling, fixing a potential parsing bug, and enhancing the reliability of CRD installation. I've also pointed out a minor security improvement regarding file permissions.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (5)
internal/fluxinstall/manifests.embed.go (1)

31-50: Consider more restrictive file permissions.

The function uses 0666 permissions when writing manifests to disk. While these files are written to a temporary directory (reducing security impact), consider using 0644 (read-write for owner, read-only for group/others) as a more defensive default.

🔎 Proposed permission adjustment
-		if err := os.WriteFile(outputPath, data, 0666); err != nil {
+		if err := os.WriteFile(outputPath, data, 0644); err != nil {
packages/core/flux-aio/Makefile (1)

17-34: Dual update workflow supports migration.

The coexistence of update-old and update-new paths provides a clean migration strategy. The update-new path correctly outputs to manifests/fluxcd.yaml and removes the hostNetwork/dnsPolicy patches per the TODO comment on lines 33-34.

The TODO on line 33 mentions solving DNS issues with hostNetwork for tenant clusters. Would you like me to help track this or investigate alternative DNS configuration approaches?

internal/fluxinstall/install.go (1)

161-197: Consider replacing hard-coded sleep with readiness polling.

The two-stage apply strategy (CRDs/Namespaces first, then other resources) is correct. However, the hard-coded 2-second sleep on line 185 is brittle and may be insufficient under load or excessive when CRDs register quickly.

Consider replacing with a readiness check that polls for CRD availability using the API server's discovery client, with a timeout. This would make the installation more reliable across different cluster conditions.

Conceptual approach
// After applying CRDs, poll for their availability
if len(stageOne) > 0 {
    logger.Info("Applying cluster definitions", "count", len(stageOne))
    if err := applyObjects(ctx, k8sClient, decoder, stageOne); err != nil {
        return fmt.Errorf("failed to apply cluster definitions: %w", err)
    }
    
    // Poll for CRD readiness with timeout
    if err := waitForCRDsReady(ctx, k8sClient, stageOne, 30*time.Second); err != nil {
        logger.Info("CRDs may not be fully ready, continuing anyway", "error", err)
    }
}

The waitForCRDsReady helper would check the Established condition on CRD status.

cmd/cozystack-operator/main.go (2)

105-107: Development mode enabled by default.

Line 106 sets Development: true for the logger, which may not be appropriate for production deployments. Consider making this configurable via a CLI flag or environment variable.

🔎 Suggested flag-based configuration
+	var devMode bool
+	flag.BoolVar(&devMode, "dev-mode", false, "Enable development mode (more verbose logging)")
+
 	opts := zap.Options{
-		Development: true,
+		Development: devMode,
 	}

209-270: Consider extracting common create-or-update logic.

The installFluxResourcesFunc implements create-or-update logic that is duplicated in installPlatformSourceResource (lines 307-332). Both follow the same pattern:

  1. Get existing resource
  2. If not found, create
  3. If found, update with ResourceVersion

Consider extracting this pattern into a reusable helper function to reduce duplication and improve maintainability:

func createOrUpdateResource(ctx context.Context, k8sClient client.Client, obj *unstructured.Unstructured) error {
    existing := &unstructured.Unstructured{}
    existing.SetGroupVersionKind(obj.GroupVersionKind())
    key := client.ObjectKey{Name: obj.GetName(), Namespace: obj.GetNamespace()}
    
    err := k8sClient.Get(ctx, key, existing)
    if err != nil {
        if client.IgnoreNotFound(err) == nil {
            return k8sClient.Create(ctx, obj)
        }
        return fmt.Errorf("failed to check if resource exists: %w", err)
    }
    
    obj.SetResourceVersion(existing.GetResourceVersion())
    return k8sClient.Update(ctx, obj)
}

This would simplify both functions and ensure consistent behavior.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe7bdcf and dfde8f2.

📒 Files selected for processing (6)
  • cmd/cozystack-operator/main.go
  • internal/fluxinstall/install.go
  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/manifests/fluxcd.yaml
  • packages/core/flux-aio/Makefile
  • packages/core/flux-aio/manifests
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Use Controller-runtime patterns and kubebuilder style for Go code

Files:

  • internal/fluxinstall/manifests.embed.go
  • cmd/cozystack-operator/main.go
  • internal/fluxinstall/install.go
🧬 Code graph analysis (1)
cmd/cozystack-operator/main.go (2)
internal/fluxinstall/install.go (1)
  • Install (41-108)
internal/fluxinstall/manifests.embed.go (1)
  • WriteEmbeddedManifests (31-50)
🪛 checkmake (0.2.2)
packages/core/flux-aio/Makefile

[warning] 15-15: Target "update" should be declared PHONY.

(phonydeclared)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (9)
packages/core/flux-aio/manifests (1)

1-1: LGTM - Manifest anchoring mechanism.

This anchor correctly references the internal Flux installation manifests directory, supporting the cross-package manifest workflow described in the AI summary.

internal/fluxinstall/manifests.embed.go (1)

27-28: LGTM - Embedded manifests declaration.

The embed directive correctly targets YAML manifests in the manifests/ subdirectory.

internal/fluxinstall/install.go (3)

41-108: LGTM - Well-structured installation orchestration.

The Install function provides robust orchestration with:

  • Proper temp directory cleanup with defer
  • Flexible manifest discovery (tries fluxcd.yaml first, falls back to any .yaml)
  • Non-fatal env injection (line 88-90) allowing graceful degradation
  • Clear namespace extraction from manifests

The continue-on-error pattern for env injection is appropriate given the comment on line 89.


200-214: Verify Force flag usage is intentional.

The Force: true flag on line 206 causes server-side apply to take ownership of fields from other field managers. This is appropriate for initial installation but could override manual changes if re-run.

Confirm this force-ownership behavior aligns with the intended use case. If the operator may re-apply manifests during normal operation (not just initial install), consider:

  1. Documenting that forced ownership is intentional for the installation workflow
  2. Or using Force: false and handling conflicts explicitly

The current implementation appears correct for a one-time installation scenario as suggested by the function name and usage context.


240-286: LGTM - Robust environment variable injection.

The injectKubernetesServiceEnv function safely traverses unstructured Kubernetes objects (Deployment, StatefulSet, DaemonSet) and injects KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT into containers and initContainers. The implementation:

  • Handles missing fields gracefully with continue statements
  • Preserves existing environment variables via setEnvVar
  • Only injects when environment variables are set in the operator
cmd/cozystack-operator/main.go (4)

58-67: LGTM - Custom flag type for multiple values.

The stringSliceFlag implementation correctly supports the -install-flux-resource flag being specified multiple times, following Go flag conventions.


69-75: LGTM - Comprehensive scheme initialization.

The scheme correctly registers core Kubernetes types, CRDs (apiextensionsv1), Flux Helm controller types (helmv2), and Cozystack custom resources. This aligns with the operator's need to interact with Flux resources and manage Cozystack platform resources.


408-435: LGTM - OCI repository generation.

The generateOCIRepository function correctly:

  • Sets the Flux source toolkit v1 API version
  • Creates resources in the cozy-system namespace
  • Handles optional digest with proper sha256: prefix normalization (lines 422-424)
  • Uses 5-minute interval per Flux best practices

438-458: LGTM - Git repository generation.

The generateGitRepository function correctly generates Flux GitRepository resources with appropriate defaults (branch ref, 5-minute interval, cozy-system namespace).

@kvaps kvaps force-pushed the cozystack-operator-core branch 2 times, most recently from 0f8afc7 to 36adc63 Compare December 22, 2025 12:17
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (8)
cmd/cozystack-operator/main.go (4)

240-266: Consider using server-side apply for consistency and robustness.

The current Get-Create/Update pattern works but server-side apply (client.Patch with client.Apply) would be more robust and consistent with the approach used in fluxinstall.Install.

🔎 Proposed refactor using server-side apply
-		// Use server-side apply or create/update
-		existing := &unstructured.Unstructured{}
-		existing.SetGroupVersionKind(obj.GroupVersionKind())
-		key := client.ObjectKey{
-			Name:      obj.GetName(),
-			Namespace: obj.GetNamespace(),
-		}
-
-		err := k8sClient.Get(ctx, key, existing)
-		if err != nil {
-			if client.IgnoreNotFound(err) == nil {
-				// Resource doesn't exist, create it
-				if err := k8sClient.Create(ctx, &obj); err != nil {
-					return fmt.Errorf("failed to create resource %s/%s: %w", obj.GetKind(), obj.GetName(), err)
-				}
-				logger.Info("Created Flux resource", "kind", obj.GetKind(), "name", obj.GetName())
-			} else {
-				return fmt.Errorf("failed to check if resource exists: %w", err)
-			}
-		} else {
-			// Resource exists, update it
-			obj.SetResourceVersion(existing.GetResourceVersion())
-			if err := k8sClient.Update(ctx, &obj); err != nil {
-				return fmt.Errorf("failed to update resource %s/%s: %w", obj.GetKind(), obj.GetName(), err)
-			}
-			logger.Info("Updated Flux resource", "kind", obj.GetKind(), "name", obj.GetName())
-		}
+		// Use server-side apply
+		patchOpts := []client.PatchOption{
+			client.ForceOwnership,
+			client.FieldOwner("cozystack-operator"),
+		}
+		if err := k8sClient.Patch(ctx, &obj, client.Apply, patchOpts...); err != nil {
+			return fmt.Errorf("failed to apply resource %s/%s: %w", obj.GetKind(), obj.GetName(), err)
+		}
+		logger.Info("Applied Flux resource", "kind", obj.GetKind(), "name", obj.GetName())

307-332: Consider using server-side apply for consistency and robustness.

Similar to installFluxResourcesFunc, using server-side apply here would simplify the logic and make it more robust and consistent.

🔎 Proposed refactor using server-side apply
-	existing := &unstructured.Unstructured{}
-	existing.SetGroupVersionKind(obj.GroupVersionKind())
-	key := client.ObjectKey{
-		Name:      obj.GetName(),
-		Namespace: obj.GetNamespace(),
-	}
-
-	err = k8sClient.Get(ctx, key, existing)
-	if err != nil {
-		if client.IgnoreNotFound(err) == nil {
-			// Resource doesn't exist, create it
-			if err := k8sClient.Create(ctx, obj); err != nil {
-				return fmt.Errorf("failed to create resource %s/%s: %w", obj.GetKind(), obj.GetName(), err)
-			}
-			logger.Info("Created platform source resource", "kind", obj.GetKind(), "name", obj.GetName())
-		} else {
-			return fmt.Errorf("failed to check if resource exists: %w", err)
-		}
-	} else {
-		// Resource exists, update it
-		obj.SetResourceVersion(existing.GetResourceVersion())
-		if err := k8sClient.Update(ctx, obj); err != nil {
-			return fmt.Errorf("failed to update resource %s/%s: %w", obj.GetKind(), obj.GetName(), err)
-		}
-		logger.Info("Updated platform source resource", "kind", obj.GetKind(), "name", obj.GetName())
-	}
+	patchOpts := []client.PatchOption{
+		client.ForceOwnership,
+		client.FieldOwner("cozystack-operator"),
+	}
+	if err := k8sClient.Patch(ctx, obj, client.Apply, patchOpts...); err != nil {
+		return fmt.Errorf("failed to apply resource %s/%s: %w", obj.GetKind(), obj.GetName(), err)
+	}
+	logger.Info("Applied platform source resource", "kind", obj.GetKind(), "name", obj.GetName())

176-187: Inconsistent error handling: platform source failure is fatal while Flux failures are not.

The platform source installation calls os.Exit(1) on failure (line 183), causing the operator pod to crash-loop, while Flux and Flux resource installations are non-fatal and allow the operator to start. This inconsistency could cause unexpected behavior.

Consider either:

  1. Making platform source installation non-fatal to match the other steps (remove os.Exit(1)), allowing manual recovery or GitOps-based reconciliation, or
  2. If platform source failure truly must be fatal, document why with a clear comment and return the error to main for controlled shutdown instead of calling os.Exit(1) directly.

344-405: Add unit tests and clarify parsing logic for edge cases.

The parsing has two issues:

  1. Line 354 comment states "find the last @" but line 355 uses strings.Index() (first occurrence, not last)—though this works correctly for the @sha256: case, the comment is misleading.
  2. Line 361 fallback uses strings.LastIndex("@") which could misinterpret @ characters within repository paths (e.g., oci://registry.io/org@name) as ref separators.

Add unit tests covering:

  • OCI: oci://registry.io/org@name (no ref), oci://registry.io/org@name@sha256:abc (with digest)
  • OCI: registry paths containing special characters
  • Git: refs with various characters, multiple @ symbols
  • Both: empty/invalid input handling
💡 Consider clarifying the logic

Update the comment to accurately describe the search method, and consider more robust parsing that only treats @ as a ref separator when it introduces a digest pattern (e.g., @sha256:...) or explicit ref syntax.

internal/fluxinstall/install.go (2)

184-186: Replace fixed sleep with CRD status polling for reliability.

Using time.Sleep to wait for CRDs is flaky—it can cause race conditions if 2 seconds isn't enough or unnecessary delays if CRDs establish faster. Poll the Established condition instead using k8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout.

🔎 Proposed fix using polling
// For each CRD in stageOne, poll for Established condition
for _, obj := range stageOne {
	if obj.GetKind() != "CustomResourceDefinition" {
		continue
	}
	
	err := wait.PollUntilContextTimeout(ctx, 500*time.Millisecond, 1*time.Minute, true, func(ctx context.Context) (bool, error) {
		crd := &apiextensionsv1.CustomResourceDefinition{}
		if err := k8sClient.Get(ctx, client.ObjectKey{Name: obj.GetName()}, crd); err != nil {
			logger.Info("Waiting for CRD to be available", "name", obj.GetName(), "error", err)
			return false, nil
		}
		for _, cond := range crd.Status.Conditions {
			if cond.Type == apiextensionsv1.Established && cond.Status == apiextensionsv1.ConditionTrue {
				return true, nil
			}
		}
		return false, nil
	})
	if err != nil {
		return fmt.Errorf("failed to wait for CRD %s to be established: %w", obj.GetName(), err)
	}
}

256-282: Consider logging errors instead of silently continuing.

Errors from unstructured.NestedMap and unstructured.NestedSlice are currently ignored with continue statements (lines 258, 266, 275, 281). While environment injection is optional, silently skipping errors could hide real issues.

Consider logging errors at debug or info level before continuing:

if !found || err != nil {
	if err != nil {
		logger.V(1).Info("Failed to access spec path, skipping", "kind", kind, "name", obj.GetName(), "error", err)
	}
	continue
}

This pattern should be applied to similar error handling in updateContainersEnv (lines 297, 316).

packages/core/flux-aio/Makefile (1)

15-16: Declare update targets as PHONY.

The update, update-old, and update-new targets should be declared as .PHONY since they don't produce file outputs.

Based on static analysis hints (checkmake).

internal/fluxinstall/manifests.embed.go (1)

44-44: Use more restrictive file permissions (0644 instead of 0666).

The file permission 0666 is overly permissive as it grants write access to all users. Use 0644 instead, which is the standard permission for files (read/write for owner, read-only for group and others).

🔎 Proposed fix
-		if err := os.WriteFile(outputPath, data, 0666); err != nil {
+		if err := os.WriteFile(outputPath, data, 0644); err != nil {
🧹 Nitpick comments (1)
packages/core/flux-aio/Makefile (1)

33-34: Track the TODO for DNS resolution with hostNetwork.

The TODO comment indicates a known issue with DNS resolution when using hostNetwork: true for installing HelmReleases in tenant Kubernetes clusters. This affects the new manifests path while the old path (line 25) still applies the workaround.

Do you want me to open an issue to track this technical debt and ensure the DNS solution is implemented before the old manifest path is removed?

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfde8f2 and 0f8afc7.

📒 Files selected for processing (7)
  • cmd/cozystack-operator/main.go
  • internal/fluxinstall/install.go
  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/manifests/fluxcd.yaml
  • packages/core/flux-aio/.helmignore
  • packages/core/flux-aio/Makefile
  • packages/core/flux-aio/manifests
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/flux-aio/manifests
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Use Controller-runtime patterns and kubebuilder style for Go code

Files:

  • internal/fluxinstall/install.go
  • cmd/cozystack-operator/main.go
  • internal/fluxinstall/manifests.embed.go
🧬 Code graph analysis (1)
cmd/cozystack-operator/main.go (2)
internal/fluxinstall/install.go (1)
  • Install (41-108)
internal/fluxinstall/manifests.embed.go (1)
  • WriteEmbeddedManifests (31-50)
🪛 checkmake (0.2.2)
packages/core/flux-aio/Makefile

[warning] 15-15: Target "update" should be declared PHONY.

(phonydeclared)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (7)
packages/core/flux-aio/.helmignore (1)

1-1: LGTM! Appropriate ignore entry for the manifests directory.

This correctly excludes the manifests directory from Helm chart packaging, as it's intended for operator embedding rather than chart distribution.

cmd/cozystack-operator/main.go (3)

57-67: LGTM! Clean implementation of multi-value flag support.

The stringSliceFlag type correctly implements the flag.Value interface to support multiple --install-flux-resource flags.


77-143: LGTM! Standard controller-runtime manager setup.

The manager configuration follows kubebuilder best practices with appropriate options for metrics, webhooks, health probes, and leader election.


407-458: LGTM! Clean resource generation for Flux sources.

Both generateOCIRepository and generateGitRepository correctly create unstructured Flux source resources with appropriate fields. The hardcoded cozy-system namespace is consistent with the platform source pattern.

internal/fluxinstall/install.go (3)

38-108: LGTM! Well-structured Flux installation workflow.

The Install function follows a clear workflow: extract manifests, parse them, inject environment variables, determine the namespace, and apply resources. The callback pattern for writeEmbeddedManifests provides good flexibility.


110-158: LGTM! Robust YAML parsing with proper error handling.

The multi-document YAML parsing correctly handles empty documents and objects without kinds, with appropriate error handling for decode failures.


217-235: LGTM! Simple and effective helper functions.

The extractNamespace and isClusterDefinition helpers correctly identify and extract the target namespace from the manifests.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
internal/fluxinstall/install.go (2)

184-186: Replace time.Sleep with robust CRD readiness polling.

Using time.Sleep to wait for CRDs is flaky—2 seconds might be insufficient under load or cause unnecessary delays when CRDs register quickly. Poll the Established condition instead.

🔎 Proposed fix using wait.PollUntilContextTimeout
+import (
+	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
+	"k8s.io/apimachinery/pkg/util/wait"
+)
 	// Apply stage one (CRDs and Namespaces) first
 	if len(stageOne) > 0 {
 		logger.Info("Applying cluster definitions", "count", len(stageOne))
 		if err := applyObjects(ctx, k8sClient, stageOne); err != nil {
 			return fmt.Errorf("failed to apply cluster definitions: %w", err)
 		}
 
-		// Wait a bit for CRDs to be registered
-		time.Sleep(2 * time.Second)
+		// Wait for CRDs to be established
+		for _, obj := range stageOne {
+			if obj.GetKind() != "CustomResourceDefinition" {
+				continue
+			}
+			if err := wait.PollUntilContextTimeout(ctx, 500*time.Millisecond, 1*time.Minute, true, func(ctx context.Context) (bool, error) {
+				crd := &apiextensionsv1.CustomResourceDefinition{}
+				if err := k8sClient.Get(ctx, client.ObjectKey{Name: obj.GetName()}, crd); err != nil {
+					logger.V(1).Info("waiting for CRD to be available", "name", obj.GetName(), "error", err)
+					return false, nil
+				}
+				for _, cond := range crd.Status.Conditions {
+					if cond.Type == apiextensionsv1.Established && cond.Status == apiextensionsv1.ConditionTrue {
+						return true, nil
+					}
+				}
+				return false, nil
+			}); err != nil {
+				return fmt.Errorf("failed to wait for CRD %s to be established: %w", obj.GetName(), err)
+			}
+		}
 	}

Based on past review comments.


256-282: Propagate or log errors from unstructured operations.

Errors from unstructured.NestedMap, unstructured.NestedSlice, and unstructured.SetNestedSlice are silently ignored (lines 257, 265, 274, 281). This makes debugging difficult and may hide legitimate issues. At minimum, log these errors; ideally, collect and return them.

🔎 Proposed fix to log errors
+	logger := log.Log.WithName("fluxinstall")
+
 	for _, obj := range objects {
 		kind := obj.GetKind()
 		if kind != "Deployment" && kind != "StatefulSet" && kind != "DaemonSet" {
 			continue
 		}
 
 		// Navigate to spec.template.spec.containers
 		spec, found, err := unstructured.NestedMap(obj.Object, "spec", "template", "spec")
-		if !found || err != nil {
+		if err != nil {
+			logger.Error(err, "failed to get spec.template.spec", "kind", kind, "name", obj.GetName())
+			continue
+		}
+		if !found {
 			continue
 		}
 
 		// Update containers
 		containers, found, err := unstructured.NestedSlice(spec, "containers")
-		if found && err == nil {
+		if err != nil {
+			logger.Error(err, "failed to get containers", "kind", kind, "name", obj.GetName())
+			continue
+		}
+		if found {
 			containers = updateContainersEnv(containers, kubernetesHost, kubernetesPort)
 			if err := unstructured.SetNestedSlice(spec, containers, "containers"); err != nil {
+				logger.Error(err, "failed to set containers", "kind", kind, "name", obj.GetName())
 				continue
 			}
 		}
 
 		// Update initContainers
 		initContainers, found, err := unstructured.NestedSlice(spec, "initContainers")
-		if found && err == nil {
+		if err != nil {
+			logger.Error(err, "failed to get initContainers", "kind", kind, "name", obj.GetName())
+			continue
+		}
+		if found {
 			initContainers = updateContainersEnv(initContainers, kubernetesHost, kubernetesPort)
 			if err := unstructured.SetNestedSlice(spec, initContainers, "initContainers"); err != nil {
+				logger.Error(err, "failed to set initContainers", "kind", kind, "name", obj.GetName())
 				continue
 			}
 		}
 
 		// Update spec in the object
 		if err := unstructured.SetNestedMap(obj.Object, spec, "spec", "template", "spec"); err != nil {
+			logger.Error(err, "failed to set spec.template.spec", "kind", kind, "name", obj.GetName())
 			continue
 		}
 	}

Based on past review comments.

packages/core/flux-aio/Makefile (1)

15-15: Declare update targets as PHONY.

The update, update-old, and update-new targets should be declared as .PHONY since they don't produce file outputs.

🔎 Proposed fix

Add near the top of the Makefile (after line 4 or before line 6):

+.PHONY: show apply diff update update-old update-new
+
 show:

Based on static analysis hints (checkmake) and past review comments.

🧹 Nitpick comments (1)
internal/fluxinstall/install.go (1)

163-163: Remove unused decoder variable.

The decoder is created but never used in applyManifests or applyObjects. Consider removing it to clean up the code.

🔎 Proposed fix
 func applyManifests(ctx context.Context, k8sClient client.Client, objects []*unstructured.Unstructured) error {
 	logger := log.FromContext(ctx)
-	decoder := yaml.NewDecodingSerializer(unstructured.UnstructuredJSONScheme)
 
 	// Separate CRDs and namespaces from other resources

And update the function signature of applyObjects:

-func applyObjects(ctx context.Context, k8sClient client.Client, decoder runtime.Decoder, objects []*unstructured.Unstructured) error {
+func applyObjects(ctx context.Context, k8sClient client.Client, objects []*unstructured.Unstructured) error {

And update the two call sites (lines 180, 191):

-		if err := applyObjects(ctx, k8sClient, decoder, stageOne); err != nil {
+		if err := applyObjects(ctx, k8sClient, stageOne); err != nil {
-		if err := applyObjects(ctx, k8sClient, decoder, stageTwo); err != nil {
+		if err := applyObjects(ctx, k8sClient, stageTwo); err != nil {
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36adc63 and d99669e.

📒 Files selected for processing (5)
  • cmd/cozystack-operator/main.go
  • internal/fluxinstall/install.go
  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/manifests/fluxcd.yaml
  • packages/core/flux-aio/Makefile
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/cozystack-operator/main.go
  • internal/fluxinstall/manifests.embed.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Use Controller-runtime patterns and kubebuilder style for Go code

Files:

  • internal/fluxinstall/install.go
🪛 checkmake (0.2.2)
packages/core/flux-aio/Makefile

[warning] 15-15: Target "update" should be declared PHONY.

(phonydeclared)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (4)
internal/fluxinstall/install.go (3)

38-108: LGTM: Well-structured installation flow.

The function properly manages temporary directories, handles embedded manifest extraction, and applies resources with appropriate error handling. The decision to log and continue on environment variable injection failure (line 89) is reasonable since these variables are optional.


110-158: LGTM: Robust YAML parsing implementation.

The multi-document YAML parsing logic correctly handles empty documents, EOF conditions, and filters out objects without a kind. The error handling is appropriate for this use case.


217-235: LGTM: Clear helper functions.

Both extractNamespace and isClusterDefinition are simple, focused helper functions with appropriate error handling.

packages/core/flux-aio/Makefile (1)

28-32: Note: Kubernetes environment injection now handled dynamically.

The update-new target doesn't include the sed command for injecting {{- include "cozy.kubernetes_envs" . | nindent 12 }} (present in update-old at line 24). This is intentional—environment variable injection is now performed dynamically by injectKubernetesServiceEnv in internal/fluxinstall/install.go.

@kvaps kvaps force-pushed the cozystack-operator-core branch from d99669e to c3db248 Compare December 25, 2025 09:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
packages/core/flux-aio/Makefile (2)

15-15: Declare update targets as PHONY.

The update, update-old, and update-new targets don't produce file outputs and should be declared as .PHONY to avoid conflicts with files of the same name.

🔎 Proposed fix

Add near the top of the Makefile:

+.PHONY: update update-old update-new
+
 update: update-old update-new

Based on static analysis hints (checkmake).


27-34: DNS resolution will fail for Flux pods with hostNetwork: true.

The update-new target omits the dnsPolicy: ClusterFirstWithHostNet patch (line 34 is commented out), but the generated fluxcd.yaml contains pods with hostNetwork: true. Without the correct DNS policy, these pods will use the host's DNS configuration and fail to resolve cluster DNS names, breaking Flux functionality.

This must be fixed before the new installation path can be used. Either:

  1. Uncomment line 34 to apply the patch during manifest generation, or
  2. Implement the DNS policy correction in internal/fluxinstall/install.go before applying manifests
🔎 Option 1: Uncomment the dnsPolicy patch
 	sed -i ../../../internal/fluxinstall/manifests/fluxcd.yaml \
 		-e '/timoni/d' \
 		-e 's|\.cluster\.local\.,||g' -e 's|\.cluster\.local\,||g' -e 's|\.cluster\.local\.||g'
-	# TODO: solve dns issue with hostNetwork for installing helmreleases in tenant k8s clusters
-	#-e '/hostNetwork: true/i \      dnsPolicy: ClusterFirstWithHostNet' 
+		-e '/hostNetwork: true/i \      dnsPolicy: ClusterFirstWithHostNet'
🔎 Option 2: Add DNS policy correction in install.go

In internal/fluxinstall/install.go, add a function to patch DNS policy before applying manifests:

// patchDNSPolicy adds dnsPolicy: ClusterFirstWithHostNet to pods with hostNetwork: true
func patchDNSPolicy(objects []*unstructured.Unstructured) error {
	for _, obj := range objects {
		kind := obj.GetKind()
		if kind != "Deployment" && kind != "StatefulSet" && kind != "DaemonSet" {
			continue
		}
		
		spec, found, err := unstructured.NestedMap(obj.Object, "spec", "template", "spec")
		if !found || err != nil {
			continue
		}
		
		hostNetwork, found, _ := unstructured.NestedBool(spec, "hostNetwork")
		if found && hostNetwork {
			if err := unstructured.SetNestedField(spec, "ClusterFirstWithHostNet", "dnsPolicy"); err != nil {
				return fmt.Errorf("failed to set dnsPolicy for %s/%s: %w", kind, obj.GetName(), err)
			}
			if err := unstructured.SetNestedMap(obj.Object, spec, "spec", "template", "spec"); err != nil {
				return fmt.Errorf("failed to update spec for %s/%s: %w", kind, obj.GetName(), err)
			}
		}
	}
	return nil
}

Then call it in the Install function after parsing manifests:

 	if err := injectKubernetesServiceEnv(objects); err != nil {
 		logger.Info("Failed to inject KUBERNETES_SERVICE_* env vars, continuing anyway", "error", err)
 	}
+
+	// Patch DNS policy for pods with hostNetwork
+	if err := patchDNSPolicy(objects); err != nil {
+		return fmt.Errorf("failed to patch DNS policy: %w", err)
+	}
internal/fluxinstall/manifests.embed.go (1)

44-44: Use more restrictive file permissions.

The file permission 0666 grants write access to all users, which is overly permissive. Use 0644 instead (read/write for owner, read-only for group and others).

🔎 Proposed fix
-		if err := os.WriteFile(outputPath, data, 0666); err != nil {
+		if err := os.WriteFile(outputPath, data, 0644); err != nil {
internal/fluxinstall/install.go (1)

182-182: Replace fixed sleep with CRD status polling.

Using time.Sleep(2 * time.Second) to wait for CRDs can be unreliable—too short causes race conditions, too long wastes time. Poll the Established condition on each CRD instead.

🔎 Proposed fix using wait.PollUntilContextTimeout
+import (
+	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
+	"k8s.io/apimachinery/pkg/util/wait"
+)

 	// Apply stage one (CRDs and Namespaces) first
 	if len(stageOne) > 0 {
 		logger.Info("Applying cluster definitions", "count", len(stageOne))
 		if err := applyObjects(ctx, k8sClient, stageOne); err != nil {
 			return fmt.Errorf("failed to apply cluster definitions: %w", err)
 		}
 
-		// Wait a bit for CRDs to be registered
-		time.Sleep(2 * time.Second)
+		// Wait for CRDs to be established
+		for _, obj := range stageOne {
+			if obj.GetKind() == "CustomResourceDefinition" {
+				if err := wait.PollUntilContextTimeout(ctx, 500*time.Millisecond, 1*time.Minute, true, func(ctx context.Context) (bool, error) {
+					crd := &apiextensionsv1.CustomResourceDefinition{}
+					if err := k8sClient.Get(ctx, client.ObjectKey{Name: obj.GetName()}, crd); err != nil {
+						logger.V(1).Info("Waiting for CRD to be available", "name", obj.GetName(), "error", err)
+						return false, nil
+					}
+					for _, cond := range crd.Status.Conditions {
+						if cond.Type == apiextensionsv1.Established && cond.Status == apiextensionsv1.ConditionTrue {
+							return true, nil
+						}
+					}
+					return false, nil
+				}); err != nil {
+					return fmt.Errorf("failed to wait for CRD %s to be established: %w", obj.GetName(), err)
+				}
+			}
+		}
 	}
🧹 Nitpick comments (1)
internal/fluxinstall/install.go (1)

238-312: Improve error handling consistency in injectKubernetesServiceEnv.

The function collects only the first error via firstErr but silently ignores subsequent errors. While continuing on errors is reasonable (env injection is not critical), the current pattern may hide multiple failures.

Consider either:

  1. Collecting all errors and returning a multi-error, or
  2. Logging each error as it occurs (using the logger) instead of just tracking firstErr
🔎 Option: Log errors instead of just tracking firstErr
 func injectKubernetesServiceEnv(objects []*unstructured.Unstructured) error {
+	logger := log.Log.WithName("injectKubernetesServiceEnv")
 	kubernetesHost := os.Getenv("KUBERNETES_SERVICE_HOST")
 	kubernetesPort := os.Getenv("KUBERNETES_SERVICE_PORT")
 
 	// If neither variable is set, nothing to do
 	if kubernetesHost == "" && kubernetesPort == "" {
 		return nil
 	}
 
 	var firstErr error
 	for _, obj := range objects {
 		kind := obj.GetKind()
 		if kind != "Deployment" && kind != "StatefulSet" && kind != "DaemonSet" {
 			continue
 		}
 
 		// Navigate to spec.template.spec.containers
 		spec, found, err := unstructured.NestedMap(obj.Object, "spec", "template", "spec")
 		if !found {
 			continue
 		}
 		if err != nil {
+			logger.Error(err, "failed to get spec", "kind", kind, "name", obj.GetName())
 			if firstErr == nil {
 				firstErr = fmt.Errorf("failed to get spec for %s/%s: %w", kind, obj.GetName(), err)
 			}
 			continue
 		}
 
 		// Update containers
 		containers, found, err := unstructured.NestedSlice(spec, "containers")
 		if err != nil {
+			logger.Error(err, "failed to get containers", "kind", kind, "name", obj.GetName())
 			if firstErr == nil {
 				firstErr = fmt.Errorf("failed to get containers for %s/%s: %w", kind, obj.GetName(), err)
 			}
 			continue
 		}
 		// ... similar pattern for other error paths
 	}
 
 	return firstErr
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d99669e and c3db248.

📒 Files selected for processing (5)
  • cmd/cozystack-operator/main.go
  • internal/fluxinstall/install.go
  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/manifests/fluxcd.yaml
  • packages/core/flux-aio/Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/cozystack-operator/main.go
🧰 Additional context used
📓 Path-based instructions (3)
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/install.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/install.go
packages/core/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/core/flux-aio/Makefile
🪛 checkmake (0.2.2)
packages/core/flux-aio/Makefile

[warning] 15-15: Target "update" should be declared PHONY.

(phonydeclared)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (3)
internal/fluxinstall/install.go (3)

39-106: Installation workflow looks solid.

The Install function properly orchestrates the Flux installation: creates temp workspace, extracts manifests, parses YAML, injects environment variables, and applies resources in two stages. Error handling is comprehensive with context-rich messages.


108-156: YAML parsing logic is correct.

The multi-document YAML parsing properly handles empty documents, EOF conditions, and objects without kinds. Good defensive coding.


196-211: Server-side apply configuration is appropriate.

Using server-side apply with force ownership and a dedicated field manager ("cozystack-operator") is the right approach for operator-managed resources.

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps force-pushed the cozystack-operator-core branch from c3db248 to 2990f05 Compare December 25, 2025 09:59
@lllamnyp lllamnyp self-assigned this Dec 29, 2025
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this vendored, but patched? Maybe this can be placed somewhere in packages/core and get embedded at build time? Also you could just add the manifest to the container image, since you're already writing it to a path inside the container at runtime anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be placed in packages/core because of go embed limitation.
The packages/core will be updated later to add symlink on manifests directory, after switching to cozystack-operator and removing templates in flux-aio package.

Copy link
Member

@lllamnyp lllamnyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand the purpose of all of this. All this code does is run one or two one-off jobs (install flux, download cozy platform resource) and then waits without actually reconciling anything.

@kvaps
Copy link
Member Author

kvaps commented Dec 29, 2025

@lllamnyp reconciliation logic in separate PR #1755

@kvaps kvaps requested a review from lllamnyp December 29, 2025 12:58
@kvaps kvaps force-pushed the cozystack-operator-core branch from 73880e7 to d1dcc1f Compare December 29, 2025 12:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/core/flux-aio/Makefile (1)

15-34: LGTM for the dual-path manifest generation approach.

The split into update-old (Helm-based path with template includes) and update-new (operator-based path with embedded manifests) is a clean migration strategy. The TODO comments correctly document the temporary state and the known DNS issue.

The PHONY declaration and DNS/hostNetwork issues were already flagged in previous reviews.

🧹 Nitpick comments (2)
cmd/cozystack-operator/main.go (2)

221-242: Consider using server-side apply for atomic create/update.

The current Get-Create/Update pattern works but server-side apply provides a single atomic operation that handles both cases, is consistent with fluxinstall.Install, and avoids potential race conditions.

🔎 Proposed refactor using server-side apply
-	existing := obj.DeepCopyObject().(client.Object)
-	key := client.ObjectKeyFromObject(obj)
-
-	err = k8sClient.Get(ctx, key, existing)
-	if err != nil {
-		if client.IgnoreNotFound(err) == nil {
-			// Resource doesn't exist, create it
-			if err := k8sClient.Create(ctx, obj); err != nil {
-				return fmt.Errorf("failed to create resource %s/%s: %w", obj.GetObjectKind().GroupVersionKind().Kind, obj.GetName(), err)
-			}
-			logger.Info("Created platform source resource", "kind", obj.GetObjectKind().GroupVersionKind().Kind, "name", obj.GetName())
-		} else {
-			return fmt.Errorf("failed to check if resource exists: %w", err)
-		}
-	} else {
-		// Resource exists, update it
-		obj.SetResourceVersion(existing.GetResourceVersion())
-		if err := k8sClient.Update(ctx, obj); err != nil {
-			return fmt.Errorf("failed to update resource %s/%s: %w", obj.GetObjectKind().GroupVersionKind().Kind, obj.GetName(), err)
-		}
-		logger.Info("Updated platform source resource", "kind", obj.GetObjectKind().GroupVersionKind().Kind, "name", obj.GetName())
-	}
+	patchOpts := []client.PatchOption{
+		client.ForceOwnership,
+		client.FieldOwner("cozystack-operator"),
+	}
+	if err := k8sClient.Patch(ctx, obj, client.Apply, patchOpts...); err != nil {
+		return fmt.Errorf("failed to apply resource %s/%s: %w", obj.GetObjectKind().GroupVersionKind().Kind, obj.GetName(), err)
+	}
+	logger.Info("Applied platform source resource", "kind", obj.GetObjectKind().GroupVersionKind().Kind, "name", obj.GetName())

375-378: Consider extracting the hardcoded namespace to a constant.

The namespace "cozy-system" is hardcoded in both generateOCIRepository and generateGitRepository. While likely correct for this project, extracting it to a package-level constant or making it configurable improves maintainability.

🔎 Proposed constant extraction
const platformSourceNamespace = "cozy-system"

Then use platformSourceNamespace in both generator functions.

Also applies to: 409-412

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3db248 and d1dcc1f.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • cmd/cozystack-operator/main.go
  • go.mod
  • internal/fluxinstall/install.go
  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/manifests/fluxcd.yaml
  • packages/core/flux-aio/Makefile
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/fluxinstall/manifests.embed.go
  • internal/fluxinstall/install.go
🧰 Additional context used
📓 Path-based instructions (4)
cmd/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Files:

  • cmd/cozystack-operator/main.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • cmd/cozystack-operator/main.go
packages/core/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/core/flux-aio/Makefile
go.mod

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT modify go.mod manually; use go get command instead

Files:

  • go.mod
🧠 Learnings (1)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to pkg/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in pkg directory

Applied to files:

  • go.mod
🧬 Code graph analysis (1)
cmd/cozystack-operator/main.go (2)
internal/fluxinstall/install.go (1)
  • Install (39-106)
internal/fluxinstall/manifests.embed.go (1)
  • WriteEmbeddedManifests (31-50)
🪛 checkmake (0.2.2)
packages/core/flux-aio/Makefile

[warning] 15-15: Target "update" should be declared PHONY.

(phonydeclared)

🔇 Additional comments (5)
cmd/cozystack-operator/main.go (4)

51-63: Scheme initialization follows kubebuilder conventions.

The scheme registration for Cozystack, Helm, Source, and APIExtensions types is correctly implemented using utilruntime.Must() for panic-on-failure semantics during init.


133-159: Pre-start installation logic is well-structured.

The installation blocks for Flux and platform source resource have consistent error handling (both fatal on failure), appropriate timeouts, and proper context cancellation via defer. This addresses the inconsistency noted in previous reviews.


247-265: Clean URL scheme detection.

The function correctly delegates reference parsing to the separate --platform-source-ref flag, avoiding the complexities of parsing refs embedded in URLs (which was a concern in previous reviews). This is a cleaner approach.


267-362: Reference parsing and validation are thorough.

Good handling of edge cases: splitting on first = to allow equals in values (e.g., digest=sha256:...), clear error messages listing valid keys, and proper hex validation for commit SHAs.

go.mod (1)

9-9: Dependency updates align with Flux integration and follow proper go.mod management practices.

Changes to source-controller/api v1.7.4, k8s.io/apimachinery v0.34.2, and controller-runtime v0.22.4 are consistent with the Kubernetes 0.34.x ecosystem and Flux integration. File is auto-generated (compliant with go get workflow), confirming changes were made through proper dependency management rather than manual edits.

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps force-pushed the cozystack-operator-core branch from d1dcc1f to db8a006 Compare December 29, 2025 13:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
cmd/cozystack-operator/main.go (1)

72-72: Unused flag: cozystackVersion is defined but never used.

The cozystackVersion flag is declared and parsed but not referenced anywhere in the code (same issue as previously flagged). Consider either using it (e.g., in startup logs, metrics, or version endpoints) or removing it to avoid confusion.

🔎 Suggested usage or removal

If you intend to use it for observability:

 	setupLog.Info("Starting controller manager")
+	setupLog.Info("Cozystack operator version", "version", cozystackVersion)
 	mgr, err := ctrl.NewManager(config, ctrl.Options{

Or remove the unused flag:

-	var cozystackVersion string
 	var platformSourceURL string
-	flag.StringVar(&cozystackVersion, "cozystack-version", "unknown",
-		"Version of Cozystack")
 	flag.StringVar(&platformSourceURL, "platform-source-url", "", "Platform source URL (oci:// or https://). If specified, generates OCIRepository or GitRepository resource.")

Also applies to: 87-88

🧹 Nitpick comments (1)
cmd/cozystack-operator/main.go (1)

229-250: Consider using server-side apply for consistency and robustness.

The current Get-Create/Update pattern works but is less idiomatic than server-side apply (SSA) in controller-runtime. The fluxinstall.Install function already uses SSA (see internal/fluxinstall/install.go), so adopting it here would improve consistency.

Benefits of SSA:

  • More atomic (single operation vs. Get + Create/Update)
  • Simpler code (no need to handle Get errors and ResourceVersion)
  • Better handles concurrent modifications
  • Consistent with Flux installation flow
🔎 Refactor to use server-side apply
-	existing := obj.DeepCopyObject().(client.Object)
-	key := client.ObjectKeyFromObject(obj)
-
-	err = k8sClient.Get(ctx, key, existing)
-	if err != nil {
-		if client.IgnoreNotFound(err) == nil {
-			// Resource doesn't exist, create it
-			if err := k8sClient.Create(ctx, obj); err != nil {
-				return fmt.Errorf("failed to create resource %s/%s: %w", obj.GetObjectKind().GroupVersionKind().Kind, obj.GetName(), err)
-			}
-			logger.Info("Created platform source resource", "kind", obj.GetObjectKind().GroupVersionKind().Kind, "name", obj.GetName())
-		} else {
-			return fmt.Errorf("failed to check if resource exists: %w", err)
-		}
-	} else {
-		// Resource exists, update it
-		obj.SetResourceVersion(existing.GetResourceVersion())
-		if err := k8sClient.Update(ctx, obj); err != nil {
-			return fmt.Errorf("failed to update resource %s/%s: %w", obj.GetObjectKind().GroupVersionKind().Kind, obj.GetName(), err)
-		}
-		logger.Info("Updated platform source resource", "kind", obj.GetObjectKind().GroupVersionKind().Kind, "name", obj.GetName())
-	}
+	patchOpts := []client.PatchOption{
+		client.ForceOwnership,
+		client.FieldOwner("cozystack-operator"),
+	}
+	if err := k8sClient.Patch(ctx, obj, client.Apply, patchOpts...); err != nil {
+		return fmt.Errorf("failed to apply resource %s/%s: %w", obj.GetObjectKind().GroupVersionKind().Kind, obj.GetName(), err)
+	}
+	logger.Info("Applied platform source resource", "kind", obj.GetObjectKind().GroupVersionKind().Kind, "name", obj.GetName())
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1dcc1f and db8a006.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • cmd/cozystack-operator/main.go
  • go.mod
🧰 Additional context used
📓 Path-based instructions (3)
cmd/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Files:

  • cmd/cozystack-operator/main.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • cmd/cozystack-operator/main.go
go.mod

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT modify go.mod manually; use go get command instead

Files:

  • go.mod
🧠 Learnings (1)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to pkg/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in pkg directory

Applied to files:

  • go.mod
🧬 Code graph analysis (1)
cmd/cozystack-operator/main.go (5)
internal/sse/server.go (1)
  • Options (17-44)
pkg/apiserver/apiserver.go (1)
  • Scheme (53-53)
internal/controller/dashboard/manager.go (1)
  • NewManager (51-54)
internal/fluxinstall/install.go (1)
  • Install (39-106)
internal/fluxinstall/manifests.embed.go (1)
  • WriteEmbeddedManifests (31-50)
🔇 Additional comments (6)
go.mod (1)

9-9: Dependency updates align with new Flux integration.

The added and updated dependencies (Flux source-controller API, k8s.io/apimachinery, controller-runtime, and indirect Flux packages) correctly support the new operator entrypoint and Flux installation logic introduced in cmd/cozystack-operator/main.go.

Ensure these changes were made using proper Go commands (e.g., go get github.com/fluxcd/source-controller/api@v1.7.4 followed by go mod tidy) rather than manual edits, as noted in the coding guidelines.

Also applies to: 22-22, 29-29, 48-48, 50-50

cmd/cozystack-operator/main.go (5)

140-152: LGTM: Flux installation logic is correct.

The Flux installation uses a reasonable 5-minute timeout and properly handles errors by exiting on failure. The use of a direct client (non-cached) for pre-start operations is appropriate since the manager's cache is not ready yet.


154-167: LGTM: Platform source installation logic is correct.

The platform source installation uses a reasonable 2-minute timeout and properly handles errors. The error handling is now consistent with the Flux installation (both exit on failure), which is appropriate for pre-start setup operations.


255-312: LGTM: URL and reference parsing logic is clean and robust.

The parsing functions are well-designed:

  • parsePlatformSourceURL: Simple prefix checking avoids complex URL parsing edge cases
  • parseRefSpec: Correctly splits on the first = to allow = characters in values (e.g., digest=sha256:abc)
  • The explicit key=value format via --platform-source-ref is clearer and less ambiguous than embedding refs in URLs

This approach avoids the ambiguity issues that can arise with @ symbol parsing mentioned in earlier discussions.


314-370: LGTM: Validation logic is thorough and secure.

The validation functions are well-implemented:

  • Whitelist approach for valid keys prevents injection of unexpected fields
  • Digest format validation ensures proper sha256: prefix
  • Commit SHA validation checks for hexadecimal format and minimum length (7 chars, matching Git short SHA convention)
  • Clear, specific error messages aid debugging

372-439: LGTM: Resource generation functions are correct.

The generation functions properly construct Flux source resources:

  • Correct TypeMeta with APIVersion and Kind
  • Appropriate ObjectMeta (namespace: cozy-system, configurable name)
  • Reasonable default interval (5 minutes)
  • Reference fields only populated when ref options are provided
  • Validation called before generation ensures type safety

The hardcoded cozy-system namespace is appropriate for platform-level sources.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 30, 2025
@kvaps kvaps merged commit 2d4e568 into main Dec 30, 2025
23 of 24 checks passed
@kvaps kvaps deleted the cozystack-operator-core branch December 30, 2025 10:48
kvaps added a commit that referenced this pull request Dec 30, 2025
depends on #1741

## What this PR does


### Release note

<!--  Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->

```release-note
[cozystack-operator] Add Package and PackageSource reconcilers
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Launches CozyStack operator with CLI-configurable runtime
(metrics/health/leader election/HTTP2), optional pre-reconcile steps,
optional automatic Flux install, and platform-source URL support
(OCI/Git).
* Adds PackageSource and Package controllers for artifact generation,
variant resolution, dependency tracking, HelmRelease orchestration,
namespace reconciliation, orphan cleanup, and status propagation.
  * Package status now exposes per-dependency readiness.

* **Chores**
* Embeds Flux manifests and adds a local Flux install workflow; updates
manifest generation path and Makefile targets.
* **Other**
  * PackageSource CRD shortName simplified to "pks".

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kvaps added a commit that referenced this pull request Jan 8, 2026
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

<!-- Thank you for making a contribution! Here are some tips for you:
- Start the PR title with the [label] of Cozystack component:
- For system components: [platform], [system], [linstor], [cilium],
[kube-ovn], [dashboard], [cluster-api], etc.
- For managed apps: [apps], [tenant], [kubernetes], [postgres],
[virtual-machine] etc.
- For development and maintenance: [tests], [ci], [docs], [maintenance].
- If it's a work in progress, consider creating this PR as a draft.
- Don't hesistate to ask for opinion and review in the community chats,
even if it's still a draft.
- Add the label `backport` if it's a bugfix that needs to be backported
to a previous version.
-->

<!--  Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->

```release-note
[cozystack-operator] Introduce Cozystack-operator core logic
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

* **New Features**
* Operator entrypoint with flag-driven configuration (metrics, probes,
leader election, Flux pre-install, version, platform-source options).
* Optional pre-start Flux installation and automatic provisioning of
platform sources from OCI or Git/HTTP(S)/SSH.

* **Chores**
* Embedded Flux manifests packaged for simplified installation; build
tooling updated to produce new manifest outputs.

* **Behavior**
* Pre-start Flux and platform-source installation failures are treated
as fatal.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kvaps added a commit that referenced this pull request Jan 8, 2026
depends on #1741

## What this PR does


### Release note

<!--  Write a release note:
- Explain what has changed internally and for users.
- Start with the same [label] as in the PR title
- Follow the guidelines at
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md.
-->

```release-note
[cozystack-operator] Add Package and PackageSource reconcilers
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Launches CozyStack operator with CLI-configurable runtime
(metrics/health/leader election/HTTP2), optional pre-reconcile steps,
optional automatic Flux install, and platform-source URL support
(OCI/Git).
* Adds PackageSource and Package controllers for artifact generation,
variant resolution, dependency tracking, HelmRelease orchestration,
namespace reconciliation, orphan cleanup, and status propagation.
  * Package status now exposes per-dependency readiness.

* **Chores**
* Embeds Flux manifests and adds a local Flux install workflow; updates
manifest generation path and Makefile targets.
* **Other**
  * PackageSource CRD shortName simplified to "pks".

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kvaps added a commit that referenced this pull request Jan 16, 2026
…d backup system (#1867)

## What this PR does

Update changelog for v1.0.0-alpha.1 to include missing features:
- **Cozystack Operator**: New operator for Package and PackageSource
management (#1740, #1741, #1755, #1756, #1760, #1761)
- **Backup System**: Comprehensive backup functionality with Velero
integration (#1640, #1685, #1687, #1708, #1719, #1720, #1737, #1762)
- Add @androndo to contributors
- Update Full Changelog link to v0.38.0...v1.0.0-alpha.1

### Release note

```release-note
[docs] Update changelog for v1.0.0-alpha.1: add cozystack-operator and backup system
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants