Overview

Purpose and Scope

This document provides a high-level introduction to the k8s-gitops homelab repository, which implements a fully automated, declarative Kubernetes cluster using GitOps principles. This overview covers the repository structure, GitOps workflow, core technologies, and architectural design. For detailed information about specific subsystems, refer to:

Core infrastructure components: Core Infrastructure
Networking configuration: Networking Infrastructure
Application deployments: Application Deployments
CI/CD automation: GitOps and Automation
Operational procedures: Operations and Maintenance

Sources: README.md (lines 1-46)

System Architecture

The k8s-gitops repository manages a three-node Kubernetes cluster running on bare-metal hardware with Talos Linux. The architecture is organized into distinct layers that build upon each other, from physical infrastructure through platform services to applications.

Architecture Layers

The architecture consists of five layers:

Infrastructure Layer: Physical hardware running Talos Linux, including three MS-A2 workstations with AMD Ryzen 9 9955HX processors and 96GB RAM each, connected via 10G/25G networking, plus a dedicated TrueNAS server for NFS storage.
Platform Layer: Essential Kubernetes services including Cilium for networking, Rook-Ceph for storage, Flux for GitOps, CoreDNS for DNS, cert-manager for certificates, and external-secrets for secrets management via 1Password Connect.
Observability Layer: Monitoring and logging infrastructure with Prometheus for metrics, VictoriaLogs for logs, Grafana for dashboards, and Fluent Bit for log collection.
Application Layer: User-facing applications including the media management stack (Sonarr, Radarr, Plex), Home Assistant, and Cloudflare tunnels.
Automation Layer: CI/CD and maintenance automation with Renovate for dependency updates, self-hosted GitHub Actions runners, and tuppr for automated OS/Kubernetes upgrades.

Sources: README.md (lines 50-87), bootstrap/helmfile.d/01-apps.yaml (lines 1-85)

Repository Structure

The repository is organized into two primary directories: bootstrap/ for initial cluster setup and kubernetes/ for ongoing GitOps management.

Directory Layout

The bootstrap/ directory contains Helmfile configurations that deploy the foundational platform services during initial cluster setup. The kubernetes/apps/ directory is organized by namespace, with each namespace containing one or more applications managed by Flux.
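The per-namespace layout can be pictured with a Flux Kustomization that points at one application directory. This is a minimal sketch, not a manifest from the repository: the application name, namespace, source name, and interval are placeholders, and the path only follows the kubernetes/apps/<namespace>/<app>/app pattern visible in the source paths cited on this page.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app                 # hypothetical application name
  namespace: example-namespace      # hypothetical namespace
spec:
  interval: 1h                      # assumed reconciliation interval
  # One directory per application, grouped under its namespace.
  path: ./kubernetes/apps/example-namespace/example-app/app
  prune: true                       # delete cluster objects removed from Git
  sourceRef:
    kind: GitRepository
    name: k8s-gitops                # hypothetical source name
    namespace: flux-system
  targetNamespace: example-namespace
```

With prune enabled, deleting an application's directory from Git also removes its objects from the cluster, which keeps Git authoritative.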

Sources: README.md (lines 78-87)

GitOps Workflow

The repository implements a GitOps workflow where Git is the single source of truth for cluster state. Changes to the repository automatically propagate to the cluster through Flux CD.

Control Flow

The workflow operates as follows:

source-controller polls the GitHub repository (https://github.com/buroa/k8s-gitops) at 1-hour intervals and OCI repositories at 15-minute intervals. It reconciles the GitRepository and OCIRepository resources and publishes the fetched content as artifacts for the other controllers.
kustomize-controller watches GitRepository resources and reconciles Kustomization manifests, applying them to the cluster. It uses in-memory builds for performance.
helm-controller watches OCIRepository resources and reconciles HelmRelease manifests, installing or upgrading Helm charts. It is configured with OOM detection at a 95% memory threshold.

All three controllers are configured with --concurrent=20 for parallel processing and 2Gi memory limits for stability.
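As a rough illustration of the polling intervals described above, the two source types might be declared as follows. This is a sketch under assumptions rather than the repository's actual manifests: the resource names, namespaces, branch, OCI URL, and tag are placeholders, and only the repository URL and the 1h and 15m intervals come from the text.

```yaml
---
# Git source polled hourly.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-gitops                  # hypothetical name
  namespace: flux-system
spec:
  interval: 1h
  url: https://github.com/buroa/k8s-gitops
  ref:
    branch: main                    # assumed branch
---
# OCI source holding a Helm chart, polled every 15 minutes.
# v1 is served by recent Flux releases; older releases use v1beta2.
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: example-chart               # hypothetical name
  namespace: example-namespace      # hypothetical namespace
spec:
  interval: 15m
  url: oci://ghcr.io/example/charts/example-chart   # placeholder URL
  ref:
    tag: 1.0.0                      # placeholder tag
  layerSelector:
    # Select the Helm chart layer and keep it as a tarball for helm-controller.
    mediaType: application/vnd.cncf.helm.chart.content.v1.tar+gzip
    operation: copy
```

A HelmRelease can then point at the OCIRepository through its chartRef field, so a tag bump in Git is all that is needed to roll out a new chart version.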

Sources: kubernetes/apps/flux-system/flux-instance/app/helmrelease.yaml (lines 1-128), README.md (lines 70-77)

Bootstrap Process

Initial cluster deployment follows a specific sequence managed by Helmfile. The bootstrap process deploys core platform services in dependency order.

Bootstrap Sequence

The bootstrap sequence defined in bootstrap/helmfile.d/01-apps.yaml (lines 1-85) establishes dependencies using the needs field:

cilium deploys first as the CNI provider
coredns requires cilium and provides DNS at 10.245.0.10
spegel provides local OCI registry caching
cert-manager enables TLS certificate automation
external-secrets integrates with external secret stores
onepassword provides 1Password Connect access
flux-operator manages Flux lifecycle
flux-instance deploys the Flux controllers that then manage the rest of the cluster

After bootstrap completes, Flux takes over and manages all subsequent deployments from the kubernetes/ directory.
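The needs relationship can be pictured with a trimmed Helmfile fragment. This is a minimal sketch under assumptions: the namespaces, chart references, and versions are placeholders, and only the coredns-on-cilium dependency is stated in the list above.

```yaml
releases:
  - name: cilium
    namespace: kube-system                          # assumed namespace
    chart: oci://example.invalid/charts/cilium      # placeholder chart reference
    version: 0.0.0                                  # placeholder version

  - name: coredns
    namespace: kube-system                          # assumed namespace
    chart: oci://example.invalid/charts/coredns     # placeholder chart reference
    version: 0.0.0                                  # placeholder version
    # Helmfile installs this release only after the releases listed here
    # have been applied; entries use the namespace/name form.
    needs:
      - kube-system/cilium
```

Chaining each later release to its predecessor in the same way would give Helmfile a strict install order, ending with flux-instance.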

Sources: bootstrap/helmfile.d/01-apps.yaml (lines 1-85)

Dependency Management

Renovate continuously scans the repository for dependency updates across Helm charts, Docker images, and GitHub Actions. It creates pull requests automatically when newer versions are detected.

Automated Update Flow

Renovate runs hourly and scans for:

Helm chart versions in HelmRelease and OCIRepository resources
Docker image tags in deployment manifests
GitHub Actions versions in workflow files

When a pull request is created, GitHub Actions workflows run validation and pre-pull container images onto the cluster nodes before merge. After merge, Flux detects the change and reconciles the cluster state.

Sources: README.md (lines 70-77)

Cluster Upgrade Automation

The cluster uses tuppr to automate Kubernetes and Talos OS upgrades. The tuppr controller monitors custom resources and orchestrates rolling upgrades across the cluster.

Upgrade Management

The upgrade process is defined in kubernetes/apps/system-upgrade/tuppr/upgrades/kubernetes.yaml (lines 1-10):

A KubernetesUpgrade custom resource specifies the target version (currently v1.35.2)
The tuppr controller (2 replicas) watches for version changes
tuppr generates upgrade plans for system-upgrade-controller
system-upgrade-controller performs rolling upgrades of nodes
Renovate automatically updates the version field when new releases are available

This ensures the cluster stays current with minimal manual intervention.

Sources: kubernetes/apps/system-upgrade/tuppr/app/helmrelease.yaml (lines 1-26), kubernetes/apps/system-upgrade/tuppr/upgrades/kubernetes.yaml (lines 1-10)

CI/CD Integration

Self-hosted GitHub Actions runners execute CI/CD workflows with elevated permissions for cluster and OS-level operations.

Runner Configuration

The runner configuration in kubernetes/apps/actions-runner-system/actions-runner-controller/runners/k8s-gitops/rbac.yaml (lines 1-26) grants:

Kubernetes access: a ClusterRoleBinding to the cluster-admin role for full cluster control
Talos access: a Talos ServiceAccount with the os:admin role for node-level operations such as image pre-pulling

This enables workflows to perform cluster validation, manifest testing, and infrastructure operations.
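The two grants might look roughly like the following pair of resources. This is a sketch, not the contents of rbac.yaml: the resource and service account names are hypothetical, and the namespace is inferred from the file path.

```yaml
---
# Kubernetes side: bind the runner's service account to cluster-admin.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-gitops-runner                 # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: k8s-gitops-runner               # hypothetical service account
    namespace: actions-runner-system      # inferred from the file path
---
# Talos side: request Talos API credentials with the os:admin role.
apiVersion: talos.dev/v1alpha1
kind: ServiceAccount
metadata:
  name: k8s-gitops-runner                 # hypothetical name
  namespace: actions-runner-system
spec:
  roles:
    - os:admin
```

Talos only honors such a ServiceAccount when the machine configuration enables Kubernetes Talos API access and allows both the namespace and the requested role. Because both grants are broad, any workflow that runs on these runners effectively has administrative control of the cluster and its nodes.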

Sources: kubernetes/apps/actions-runner-system/actions-runner-controller/runners/k8s-gitops/rbac.yaml (lines 1-26)

Flux Performance Tuning

The Flux instance is heavily tuned for performance and reliability through multiple customizations applied via JSON patches.

Controller Optimizations

| Configuration | Value | Purpose |
| --- | --- | --- |
| Concurrent workers | `--concurrent=20` | Parallel reconciliation across all controllers |
| Memory limit | `2Gi` | Prevent OOM during large reconciliations |
| Kustomize builds | `emptyDir: {medium: Memory}` | In-memory builds for faster processing |
| OOM detection | `--oom-watch-memory-threshold=95` | Proactive detection of memory pressure |
| Config caching | `CacheSecretsAndConfigMaps=true` | Cache frequently accessed configs |
| Retry on failure | `DefaultToRetryOnFailure=true` | Auto-retry failed Helm releases |
| Health check cancellation | `CancelHealthCheckOnNewRevision=true` | Cancel stale health checks on updates |

The controller patches are defined in kubernetes/apps/flux-system/flux-instance/app/helmrelease.yaml (lines 42-127) and apply to source-controller, kustomize-controller, and helm-controller.
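Several of the settings in the table could be expressed as Kustomize-style JSON patches along these lines. This is a hedged sketch modeled on the tuning patterns in the Flux documentation, not a copy of the repository's helmrelease.yaml: the grouping of patches, the container and volume indexes, the volume name temp, the OOMWatch feature gate line, and the enclosing patches key are assumptions.

```yaml
# Fragment only: where this list is embedded depends on the chart's values layout.
patches:
  # Applied to all three controllers: more workers and a higher memory limit.
  - target:
      kind: Deployment
      name: (kustomize-controller|helm-controller|source-controller)
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=20
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 2Gi

  # kustomize-controller only: back the build directory with RAM.
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      - op: replace
        path: /spec/template/spec/volumes/0
        value:
          name: temp
          emptyDir:
            medium: Memory

  # helm-controller only: stop gracefully when memory use nears the limit.
  - target:
      kind: Deployment
      name: helm-controller
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --feature-gates=OOMWatch=true
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --oom-watch-memory-threshold=95
```

An in-memory emptyDir counts against the container's memory limit, which is one reason the higher limit and the in-memory build setting tend to be applied together.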

Sources: kubernetes/apps/flux-system/flux-instance/app/helmrelease.yaml (lines 1-128)

Physical Infrastructure

The cluster runs on three identical bare-metal workstations with enterprise-grade storage and networking.

Hardware Specifications

| Component | Specification |
| --- | --- |
| Nodes | 3x Minisforum MS-A2 |
| CPU | AMD Ryzen 9 9955HX per node |
| RAM | 96GB DDR5-5600 per node (Crucial 2x48GB) |
| OS Disk | 1.92TB Samsung PM9A3 M.2 NVMe per node |
| Data Disk | 3.84TB Samsung PM9A3 U.2 NVMe per node |
| Additional Storage | 1.92TB M.2 NVMe per node |
| NAS | 45 HomeLab HL15 2.0 with 12x22TB HDD, 512GB RAM |
| NAS OS | TrueNAS SCALE |
| Network Backbone | UniFi 10G/25G Aggregation Switch |
| Network Access | UniFi Switch Pro Max 24 PoE (2.5G) |
| Router | UniFi Dream Machine Pro Max |
| WAN | 5Gbps RCN |

Each node connects to the network via 10G LACP, while the NAS connects via dual 25G LACP for maximum throughput. The cluster consumes approximately 240W total during normal operation.

Sources: README.md (lines 186-204)

Key Design Principles

The cluster architecture follows several core principles:

Immutability: Talos Linux provides an immutable OS with no SSH access, managed entirely through APIs
Declarative Configuration: All cluster state is defined in Git and reconciled by Flux
Reproducibility: The entire cluster can be destroyed and rebuilt without data loss
Separation of Concerns: Storage (Rook-Ceph for configs, NFS for media) is isolated from compute
Defense in Depth: Network policies, security contexts, and least-privilege RBAC throughout
Observability: Comprehensive metrics and logs with 14-day retention
Automation: Renovate handles dependency updates, tuppr handles OS/K8s upgrades, KEDA handles autoscaling

These principles ensure the cluster remains maintainable, secure, and resilient while supporting a diverse application portfolio from media automation to home automation.

Sources: README.md (lines 44-54)
