This document provides a high-level introduction to the k8s-gitops homelab repository, which implements a fully automated, declarative Kubernetes cluster using GitOps principles. It covers the repository structure, GitOps workflow, core technologies, and architectural design.
Sources: README.md:1-46
The k8s-gitops repository manages a three-node Kubernetes cluster running on bare-metal hardware with Talos Linux. The architecture is organized into distinct layers that build upon each other, from physical infrastructure through platform services to applications.
The architecture consists of five layers:
- Infrastructure Layer: Physical hardware running Talos Linux, including three MS-A2 workstations with AMD Ryzen 9 9955HX processors and 96GB RAM each, connected via 10G/25G networking, plus a dedicated TrueNAS server for NFS storage.
- Platform Layer: Essential Kubernetes services including Cilium for networking, Rook-Ceph for storage, Flux for GitOps, CoreDNS for DNS, cert-manager for certificates, and external-secrets for secrets management via 1Password Connect.
- Observability Layer: Monitoring and logging infrastructure with Prometheus for metrics, VictoriaLogs for logs, Grafana for dashboards, and Fluent Bit for log collection.
- Application Layer: User-facing applications including the media management stack (Sonarr, Radarr, Plex), Home Assistant, and Cloudflare tunnels.
- Automation Layer: CI/CD and maintenance automation with Renovate for dependency updates, self-hosted GitHub Actions runners, and tuppr for automated OS/Kubernetes upgrades.
Sources: README.md:50-87, bootstrap/helmfile.d/01-apps.yaml:1-85
The repository is organized into two primary directories: bootstrap/ for initial cluster setup and kubernetes/ for ongoing GitOps management.
The bootstrap/ directory contains Helmfile configurations that deploy the foundational platform services during initial cluster setup. The kubernetes/apps/ directory is organized by namespace, with each namespace containing one or more applications managed by Flux.
Sources: README.md:78-87
The repository implements a GitOps workflow where Git is the single source of truth for cluster state. Changes to the repository automatically propagate to the cluster through Flux CD.
The workflow operates as follows:
1. source-controller polls the GitHub repository (https://github.com/buroa/k8s-gitops) at 1-hour intervals and OCI repositories at 15-minute intervals. It reconciles GitRepository and OCIRepository resources and publishes the fetched content as artifacts for the other controllers.
2. kustomize-controller watches GitRepository resources and reconciles Kustomization manifests, applying them to the cluster. It uses in-memory builds for performance.
3. helm-controller watches OCIRepository resources and reconciles HelmRelease manifests, installing or upgrading Helm charts. It includes OOM detection at a 95% memory threshold.
All three controllers are configured with --concurrent=20 for parallel processing and 2Gi memory limits for stability.
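The polling intervals described above correspond to the `interval` field on Flux source objects. The following is a minimal sketch, assuming the `source.toolkit.fluxcd.io/v1` API (older Flux releases serve OCIRepository under a beta version). The resource names, the branch, and the OCI URL and tag are hypothetical placeholders, not values taken from the repository.

```yaml
# GitRepository polled hourly (name and branch are assumptions)
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1h
  url: https://github.com/buroa/k8s-gitops
  ref:
    branch: main
---
# OCIRepository polled every 15 minutes (URL and tag are placeholders)
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: example-chart
  namespace: flux-system
spec:
  interval: 15m
  url: oci://ghcr.io/example/charts/example-chart
  ref:
    tag: 1.0.0
```

A Kustomization would then reference the GitRepository through `spec.sourceRef`, and a HelmRelease could reference the OCIRepository through `spec.chartRef`.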
Sources: kubernetes/apps/flux-system/flux-instance/app/helmrelease.yaml:1-128, README.md:70-77
Initial cluster deployment follows a specific sequence managed by Helmfile. The bootstrap process deploys core platform services in dependency order.
The bootstrap sequence defined in bootstrap/helmfile.d/01-apps.yaml:1-85 establishes dependencies using the needs field:
1. cilium deploys first as the CNI provider
2. coredns requires cilium and provides DNS at 10.245.0.10
3. spegel provides local OCI registry caching
4. cert-manager enables TLS certificate automation
5. external-secrets integrates with external secret stores
6. onepassword provides 1Password Connect access
7. flux-operator manages Flux lifecycle
8. flux-instance deploys the Flux controllers that then manage the rest of the cluster

After bootstrap completes, Flux takes over and manages all subsequent deployments from the kubernetes/ directory.
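The one dependency stated explicitly above, coredns requiring cilium, could be expressed with Helmfile's `needs` field roughly as follows. This is an illustrative sketch, not the contents of 01-apps.yaml: the namespaces, the `example` chart repository, and the chart versions are hypothetical placeholders.

```yaml
# Hypothetical excerpt in the style of a Helmfile release list
releases:
  - name: cilium
    namespace: kube-system        # assumed namespace
    chart: example/cilium         # placeholder chart reference
    version: 0.0.0                # placeholder version

  - name: coredns
    namespace: kube-system        # assumed namespace
    chart: example/coredns        # placeholder chart reference
    version: 0.0.0                # placeholder version
    needs:
      - kube-system/cilium        # format: <namespace>/<release name>
```

Helmfile installs a release only after every release listed under its `needs` has been applied, which is what lets a single file encode the whole bootstrap order.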
Sources: bootstrap/helmfile.d/01-apps.yaml:1-85
Renovate continuously scans the repository for dependency updates across Helm charts, Docker images, and GitHub Actions. It creates pull requests automatically when newer versions are detected.
Renovate runs hourly and scans for:
- HelmRelease and OCIRepository resources

When a pull request is created, GitHub Actions workflows automatically run validation and pre-pull container images to cluster nodes before merge. After merge, Flux detects the change and reconciles the cluster state.
Sources: README.md:70-77
The cluster uses tuppr to automate Kubernetes and Talos OS upgrades. The tuppr controller monitors custom resources and orchestrates rolling upgrades across the cluster.
The upgrade process is defined in kubernetes/apps/system-upgrade/tuppr/upgrades/kubernetes.yaml:1-10:
- A KubernetesUpgrade custom resource specifies the target version (currently v1.35.2)

This ensures the cluster stays current with minimal manual intervention.
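For orientation, a KubernetesUpgrade resource of this kind might look like the sketch below. Only the kind and the target version come from the text above; the API group and version, the resource name, and the exact field layout are assumptions and should be checked against the tuppr CRD installed in the cluster.

```yaml
# Sketch only: apiVersion, name, and field layout are assumptions
apiVersion: tuppr.home-operations.com/v1alpha1
kind: KubernetesUpgrade
metadata:
  name: kubernetes
spec:
  kubernetes:
    version: v1.35.2   # target version stated in the text
```

Because the target version lives in Git, bumping it through a pull request is what triggers the rolling upgrade.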
Sources: kubernetes/apps/system-upgrade/tuppr/app/helmrelease.yaml:1-26, kubernetes/apps/system-upgrade/tuppr/upgrades/kubernetes.yaml:1-10
Self-hosted GitHub Actions runners execute CI/CD workflows with elevated permissions for cluster and OS-level operations.
The runner configuration in kubernetes/apps/actions-runner-system/actions-runner-controller/runners/k8s-gitops/rbac.yaml:1-26 grants:
- ClusterRoleBinding to the cluster-admin role for full cluster control
- ServiceAccount with the os:admin role for node-level operations like image pre-pulling

This enables workflows to perform cluster validation, manifest testing, and infrastructure operations.
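The two grants above can be sketched as a standard Kubernetes ClusterRoleBinding plus a Talos `ServiceAccount` custom resource, which is how Talos exposes its API to in-cluster workloads. The object names are hypothetical, and the namespace is inferred from the file path rather than confirmed by the source.

```yaml
# Binds a runner service account to the built-in cluster-admin role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-gitops-runner              # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: k8s-gitops-runner            # hypothetical name
    namespace: actions-runner-system   # assumed from the directory path
---
# Talos custom resource requesting Talos API credentials with os:admin
apiVersion: talos.dev/v1alpha1
kind: ServiceAccount
metadata:
  name: k8s-gitops-runner              # hypothetical name
  namespace: actions-runner-system     # assumed from the directory path
spec:
  roles:
    - os:admin
```

Both grants are broad: cluster-admin covers every Kubernetes API object, and os:admin covers node-level Talos operations, so the runners should be treated as highly privileged.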
Sources: kubernetes/apps/actions-runner-system/actions-runner-controller/runners/k8s-gitops/rbac.yaml:1-26
The Flux instance is heavily tuned for performance and reliability through multiple customizations applied via JSON patches.
| Configuration | Value | Purpose |
|---|---|---|
| Concurrent workers | --concurrent=20 | Parallel reconciliation across all controllers |
| Memory limit | 2Gi | Prevent OOM during large reconciliations |
| Kustomize builds | emptyDir: medium: Memory | In-memory builds for faster processing |
| OOM detection | --oom-watch-memory-threshold=95 | Proactive detection of memory pressure |
| Config caching | CacheSecretsAndConfigMaps=true | Cache frequently accessed configs |
| Retry on failure | DefaultToRetryOnFailure=true | Auto-retry failed Helm releases |
| Health check cancellation | CancelHealthCheckOnNewRevision=true | Cancel stale health checks on updates |
The controller patches are defined in kubernetes/apps/flux-system/flux-instance/app/helmrelease.yaml:42-127 and apply to source-controller, kustomize-controller, and helm-controller.
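To illustrate how settings like these are typically applied, here is a sketch of Kustomize-style JSON patches in the form accepted under a FluxInstance's `kustomize.patches`. It is not a copy of the repository's helmrelease.yaml. It assumes the upstream controller Deployments, where the controller is the first container, memory limits are already set, and the first volume of kustomize-controller is the `temp` scratch volume.

```yaml
# Sketch of a patch list; structure follows Kustomize patch targets
patches:
  # Parallelism and memory limit for all three controllers
  - target:
      kind: Deployment
      name: (source-controller|kustomize-controller|helm-controller)
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=20
      - op: add
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 2Gi
  # In-memory kustomize builds: back the scratch volume with RAM
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      - op: replace
        path: /spec/template/spec/volumes/0
        value:
          name: temp
          emptyDir:
            medium: Memory
  # OOM watch threshold on helm-controller
  - target:
      kind: Deployment
      name: helm-controller
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --oom-watch-memory-threshold=95
```

The sketch leaves out the feature-gate flags from the table. Upstream Flux also documents an `OOMWatch` feature gate for helm-controller, and how the repository enables it is not shown here.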
Sources: kubernetes/apps/flux-system/flux-instance/app/helmrelease.yaml:1-128
The cluster runs on three identical bare-metal workstations with enterprise-grade storage and networking.
| Component | Specification |
|---|---|
| Nodes | 3x Minisforum MS-A2 |
| CPU | AMD Ryzen 9 9955HX per node |
| RAM | 96GB DDR5-5600 per node (Crucial 2x48GB) |
| OS Disk | 1.92TB Samsung PM9A3 M.2 NVMe per node |
| Data Disk | 3.84TB Samsung PM9A3 U.2 NVMe per node |
| Additional Storage | 1.92TB M.2 NVMe per node |
| NAS | 45 HomeLab HL15 2.0 with 12x22TB HDD, 512GB RAM |
| NAS OS | TrueNAS SCALE |
| Network Backbone | UniFi 10G/25G Aggregation Switch |
| Network Access | UniFi Switch Pro Max 24 PoE (2.5G) |
| Router | UniFi Dream Machine Pro Max |
| WAN | 5Gbps RCN |
Each node connects to the network via 10G LACP, while the NAS connects via dual 25G LACP for maximum throughput. The cluster consumes approximately 240W total during normal operation.
Sources: README.md:186-204
The cluster architecture follows several core principles:
These principles ensure the cluster remains maintainable, secure, and resilient while supporting a diverse application portfolio from media automation to home automation.
Sources: README.md:44-54