An opinionated Kubernetes distribution built for AI/ML workflows.
One config file. Production-ready platform. Any cloud.
Quick Start · CLI Reference · Architecture · Roadmap · Documentation
Status: Under heavy development and very unstable. APIs, configuration formats, and behavior will change without notice. Not yet suitable for production use.
Nebari Infrastructure Core (NIC) is an opinionated Kubernetes distribution that ships with sane defaults (that are fully configurable) and a suite of foundational software. A single YAML config file gives you a production-grade Kubernetes cluster with SSO, GitOps, API gateway, TLS certificates, and an OpenTelemetry exporter that plugs into whatever observability system you already run — all wired together and working out of the box.
NIC's composable architecture means you get exactly the platform you need — nothing more, nothing less. Our initial focus is AI/ML workflows (notebook environments, model serving, experiment tracking), but the foundation is general-purpose. Software Packs let you tailor the platform to your workload without carrying software you don't use.
NIC is the successor to Nebari, rebuilt from the ground up, based on seven years of lessons learned deploying data science platforms in production.
Getting from a managed Kubernetes cluster to a platform teams can actually use requires assembling and integrating dozens of components: identity providers, certificate management, ingress controllers, telemetry pipelines, GitOps tooling. This takes months of engineering time, and keeping it all working across environments takes even more.
NIC deploys a complete platform stack, not just a cluster. You declare what you want; NIC provisions the infrastructure and deploys foundational services that are pre-integrated and production-hardened.
On top of this foundation, Software Packs let you compose your platform. Software Packs are curated collections of
open-source tools packaged as ArgoCD applications with a NebariApp Custom Resource. When installed, they automatically
register with the platform — picking up SSO, routing, TLS, and telemetry with zero manual configuration.
Want JupyterHub and conda-store? Install the Data Science Pack. Need model serving? Add the ML Pack (MLflow, KServe, Envoy AI Gateway). Want dashboards and log aggregation? Add the Observability Pack (Grafana LGTM stack). Each pack is independent, so you deploy only what you need.
flowchart TD
subgraph SP["Software Packs"]
direction LR
ds["Data Science"] ~~~ ml["ML Serving"] ~~~ obs["Observability"] ~~~ custom["Your Pack"]
end
subgraph NO["Nebari Operator"]
op["Auto-configures SSO, routing, TLS, telemetry via NebariApp CRD"]
end
subgraph FS["Foundational Software"]
direction LR
kc["Keycloak"] ~~~ eg["Envoy GW"] ~~~ cm["cert-manager"] ~~~ ot["OTel"] ~~~ ac["ArgoCD"]
end
subgraph K8["Kubernetes Cluster"]
direction LR
vpc["VPC"] ~~~ np["Node Pools"] ~~~ st["Storage"] ~~~ iam["IAM"]
end
subgraph CP["Cloud Provider"]
direction LR
aws["AWS EKS"] ~~~ gcp["GCP GKE"] ~~~ az["Azure AKS"] ~~~ hz["Hetzner K3s"] ~~~ k3s["Local K3s"]
end
SP --> NO --> FS --> K8 --> CP
style SP fill:#f3e8fc,stroke:#c840e9,color:#6b21a8
style NO fill:#d4f5f2,stroke:#20aaa1,color:#0d5d57
style FS fill:#fef0db,stroke:#e8952c,color:#7c4a03
style K8 fill:#eeeef3,stroke:#4a4a6a,color:#1a1a2e
style CP fill:#e8faf8,stroke:#20aaa1,color:#0d5d57
nic deploy -f config.yaml
- Provisions infrastructure: VPC, managed Kubernetes, node pools, storage, IAM via OpenTofu
- Deploys foundational software: ArgoCD installs Keycloak, Envoy Gateway, cert-manager, OpenTelemetry Collector
- Activates the Nebari Operator: watches for `NebariApp` resources and auto-configures SSO, routing, TLS, and telemetry
- Configures DNS: optional Cloudflare integration for automatic record management
Every NIC deployment includes a landing page where users discover and access all deployed services.
| Feature | Description |
|---|---|
| Opinionated Defaults | Production-ready configuration out of the box — multi-AZ, autoscaling, security best practices |
| Composable Software Packs | Install only what you need. Each pack auto-integrates with SSO, telemetry, and routing |
| Multi-Cloud | AWS (EKS), GCP (GKE), Azure (AKS), Hetzner (K3s), and local (K3s) from the same config format |
| GitOps Native | ArgoCD manages all foundational software with dependency ordering and health checks |
| OpenTelemetry Native | Built-in OTel Collector exports metrics, logs, and traces — plugs into whatever observability system you run |
| SSO Everywhere | Keycloak provides centralized auth. The Nebari Operator creates OAuth clients automatically |
| Declarative | One YAML config file. NIC reconciles actual state to match using OpenTofu |
| DNS Automation | Optional Cloudflare provider for automatic DNS record management |
- Go 1.25+
- Cloud provider credentials (AWS, GCP, or Azure) configured via environment variables
NIC automatically downloads and manages its own OpenTofu binary — no manual installation required.
# From source
make build
# Or install to $GOPATH/bin
make install

# Copy and edit a sample config
cp examples/aws-config.yaml config.yaml
# Set your credentials
cp .env.example .env # Edit with your cloud provider credentials
# Validate your config
./nic validate
# Deploy everything
./nic deploy

See the CLI Reference for all commands and options.
Deploy infrastructure and foundational services based on a configuration file.
./nic deploy [flags]
./nic deploy -f <config-file> [flags]

The -f flag is optional. When omitted, NIC looks for config.yaml in the current directory. You can also supply the path through the NIC_CONFIG_PATH environment variable.
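The lookup described above can be sketched in Go. This is a minimal illustration, not NIC's actual code: `resolveConfigPath` is a hypothetical helper, and the precedence it applies (flag, then `NIC_CONFIG_PATH`, then `config.yaml` in the current directory) is an assumption, since the text names the three sources without stating their order.

```go
package main

import (
	"fmt"
	"os"
)

// defaultConfigName is the file NIC is described as looking for in the
// current directory when no path is given.
const defaultConfigName = "config.yaml"

// resolveConfigPath picks a config path from three sources.
// ASSUMPTION: precedence is flag > NIC_CONFIG_PATH > ./config.yaml.
// The function name and signature are hypothetical.
func resolveConfigPath(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	if fromEnv := getenv("NIC_CONFIG_PATH"); fromEnv != "" {
		return fromEnv
	}
	return defaultConfigName
}

func main() {
	// An empty flag value stands for "-f was not passed".
	fmt.Println(resolveConfigPath("", os.Getenv))
}
```

Passing the environment lookup as a function keeps the helper easy to exercise without touching the real process environment.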
Options:
- `-f, --file`: Path to config.yaml file (auto-discovered if omitted)
- `--dry-run`: Preview changes without applying them
- `--timeout`: Override default timeout (e.g., '45m', '1h')
- `--regen-apps`: Regenerate ArgoCD application manifests even if already bootstrapped
The deploy command:
- Provisions cloud infrastructure via the selected provider (OpenTofu)
- Bootstraps a GitOps repository with ArgoCD application manifests (if configured)
- Installs ArgoCD and foundational services (Keycloak, Envoy Gateway, cert-manager)
- Configures DNS records (if a DNS provider is configured)
Validate a configuration file without deploying any infrastructure.
./nic validate
./nic validate -f <config-file>

Options:
- `-f, --file`: Path to config.yaml file (auto-discovered if omitted)
Destroy all infrastructure resources.
./nic destroy [flags]
./nic destroy -f <config-file> [flags]

Options:
- `-f, --file`: Path to config.yaml file (auto-discovered if omitted)
- `--auto-approve`: Skip confirmation prompt and destroy immediately
- `--dry-run`: Show what would be destroyed without actually deleting
- `--force`: Continue destruction even if some resources fail to delete
- `--timeout`: Override default timeout (e.g., '45m', '1h')
WARNING: This operation is destructive and cannot be undone.
Generate a kubeconfig for the deployed Kubernetes cluster.
./nic kubeconfig [-o output-file]
./nic kubeconfig -f <config-file> [-o output-file]

Options:
- `-f, --file`: Path to config.yaml file (auto-discovered if omitted)
- `-o, --output`: Path to output kubeconfig file (defaults to stdout)
Show version information and registered providers.
./nic version

NIC uses a YAML configuration file. See the examples/ directory for sample configurations:
- `examples/aws-config.yaml` - AWS/EKS configuration
- `examples/aws-config-with-dns.yaml` - AWS with Cloudflare DNS automation
- `examples/aws-existing.yaml` - Deploy to an existing EKS cluster
- `examples/gcp-config.yaml` - GCP/GKE configuration
- `examples/azure-config.yaml` - Azure/AKS configuration
- `examples/hetzner-config.yaml` - Hetzner Cloud/K3s configuration
- `examples/local-config.yaml` - Local Kind/K3s configuration
Secrets are never stored in configuration files. Use environment variables or a .env file (see .env.example):
# Copy the example and fill in your values
cp .env.example .env

NIC supports OpenTelemetry tracing with configurable exporters:
- `OTEL_EXPORTER`: Exporter type, one of `none` (default), `console`, `otlp`, or `both`
- `OTEL_ENDPOINT`: OTLP endpoint (default: `localhost:4317`)
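The two variables above could be read along these lines. This is a hedged sketch rather than NIC's telemetry package: `telemetryConfig` and `loadTelemetryConfig` are hypothetical names, and two behaviours are assumptions not stated in the text, namely that `both` enables the console and OTLP exporters together and that unknown values are rejected.

```go
package main

import (
	"fmt"
	"os"
)

// telemetryConfig records which exporters are enabled. Hypothetical type.
type telemetryConfig struct {
	Console  bool
	OTLP     bool
	Endpoint string
}

// loadTelemetryConfig reads OTEL_EXPORTER and OTEL_ENDPOINT through getenv.
// ASSUMPTIONS: "both" means console plus otlp, and unknown exporter values
// are an error rather than silently ignored.
func loadTelemetryConfig(getenv func(string) string) (telemetryConfig, error) {
	cfg := telemetryConfig{Endpoint: "localhost:4317"} // documented default
	if ep := getenv("OTEL_ENDPOINT"); ep != "" {
		cfg.Endpoint = ep
	}
	switch exporter := getenv("OTEL_EXPORTER"); exporter {
	case "", "none":
		// Tracing disabled: the documented default.
	case "console":
		cfg.Console = true
	case "otlp":
		cfg.OTLP = true
	case "both":
		cfg.Console, cfg.OTLP = true, true
	default:
		return telemetryConfig{}, fmt.Errorf("unknown OTEL_EXPORTER %q", exporter)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadTelemetryConfig(os.Getenv)
	fmt.Printf("%+v err=%v\n", cfg, err)
}
```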
# Console traces (debugging) — config.yaml auto-discovered in current directory
OTEL_EXPORTER=console ./nic deploy
# OTLP traces
OTEL_EXPORTER=otlp OTEL_ENDPOINT=localhost:4317 ./nic deploy -f config.yaml

For local development, you can deploy a Kind cluster with foundational services:
make localkind-up # Create Kind cluster and deploy
make localkind-down # Tear down

When using a remote repo, a repo URL must be set in your local-config.yaml, and a valid private SSH key must be set as the GIT_SSH_PRIVATE_KEY environment variable.
Omitting git_repository or explicitly setting a local git path results in a local git directory being used for GitOps.
The local provider works against any cluster already present in your kubeconfig — it does not create the cluster. To use a tool other than Kind:
- Create the cluster with your tool of choice.
- Point NIC at it by setting `kube_context` in your `local-config.yaml` to the context name of that cluster (NIC reads the kubeconfig from `$KUBECONFIG`, falling back to `~/.kube/config`). `kube_context` is a context name, not a file path; list available names with `kubectl config get-contexts -o name`.
- Make the local GitOps directory visible to the cluster. When `git_repository` is omitted (or set to a `file://` path), NIC uses a local GitOps directory at `/tmp/nebari-gitops-<project_name>`, where `project_name` comes from your config. ArgoCD's repo-server mounts this path via a `hostPath` volume, so it must exist inside the cluster node, not just on your host. Cluster nodes run in containers/VMs that don't share your host filesystem, so the directory must be bind-mounted in when the cluster is created. The `make localkind-up` target does this for you by generating a kind config with `extraMounts`; for k3d and minikube you mount it manually as shown below.
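The rule for when a host directory has to be mounted can be written down as a small Go sketch. The helper names (`localGitOpsDir`, `k3dVolumeFlag`) are hypothetical and this is not NIC's code. It follows the description above literally: an empty or `file://` value for `git_repository` means the directory `/tmp/nebari-gitops-<project_name>` is used, and any other value is treated as a remote repository.

```go
package main

import (
	"fmt"
	"strings"
)

// localGitOpsDir returns the host directory that must be visible inside the
// cluster node, and false when a remote repository is configured and no
// mount is needed. Hypothetical helper based on the README's description.
func localGitOpsDir(projectName, gitRepository string) (string, bool) {
	if gitRepository != "" && !strings.HasPrefix(gitRepository, "file://") {
		return "", false // remote repo: ArgoCD clones it, nothing to mount
	}
	return "/tmp/nebari-gitops-" + projectName, true
}

// k3dVolumeFlag builds a k3d --volume argument that mounts dir into all
// nodes at the same path.
func k3dVolumeFlag(dir string) string {
	return fmt.Sprintf("--volume %s:%s@all", dir, dir)
}

func main() {
	// "my-nebari-local" stands in for the project_name in your config.
	if dir, needed := localGitOpsDir("my-nebari-local", ""); needed {
		fmt.Println("mkdir -p", dir)
		fmt.Println("k3d cluster create", k3dVolumeFlag(dir))
	}
}
```

Mounting at the same path on both sides matters because the `hostPath` volume refers to the path as seen from the node.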
k3d nodes run as Docker containers and don't see your host's /tmp by default. Create the directory first, then mount it into the nodes at the same path:
mkdir -p /tmp/nebari-gitops-my-nebari-local
k3d cluster create \
--volume /tmp/nebari-gitops-my-nebari-local:/tmp/nebari-gitops-my-nebari-local@all
k3d kubeconfig get --all > kubeconfig
export KUBECONFIG=$(pwd)/kubeconfig
./nic deploy --file local-config.yaml

Set kube_context: "k3d-<cluster-name>" in your config (k3d prefixes the context with k3d-). For k3s clusters, also set storage_class: local-path and disable MetalLB (k3s ships ServiceLB) as noted in examples/local-config.yaml.
minikube runs the node inside a VM/container. Mount the host directory before deploying:
mkdir -p /tmp/nebari-gitops-my-nebari-local
minikube start
minikube mount /tmp/nebari-gitops-my-nebari-local:/tmp/nebari-gitops-my-nebari-local &
export KUBECONFIG=$HOME/.kube/config # minikube updates this automatically
./nic deploy --file local-config.yaml

minikube mount runs in the foreground and must stay running for the duration of the deploy (and while ArgoCD is reconciling), so launch it in a separate terminal or background it as shown. Set kube_context: "minikube" in your config.
If you'd rather avoid the host-path mount entirely, set an explicit remote git_repository (see OPTION 3 in examples/local-config.yaml); ArgoCD then clones the repo over HTTPS/SSH and no local directory needs to be mounted into the node.
# Run all tests
go test ./... -v
# Run with coverage
go test ./... -cover -coverprofile=coverage.out
go tool cover -html=coverage.out

# Format, vet, lint, and test
make check
# Or individually:
make fmt
make vet
make lint
make test

# Install hooks (one-time setup)
pre-commit install
# Run all hooks manually
pre-commit run --all-files

cmd/nic/ CLI entry point and commands
pkg/
├── argocd/ ArgoCD installation, Helm charts, app manifests
├── config/ Configuration parsing and validation
├── dnsprovider/ DNS provider interface (Cloudflare)
├── git/ Git client for GitOps repository management
├── kubeconfig/ Kubeconfig generation
├── provider/ Cloud provider interface
│ ├── aws/ AWS provider (EKS, VPC, EFS, IAM)
│ ├── gcp/ GCP provider
│ ├── azure/ Azure provider
│ ├── hetzner/ Hetzner Cloud provider (K3s via hetzner-k3s)
│ └── local/ Local Kind/K3s provider
├── telemetry/ OpenTelemetry setup
└── tofu/ OpenTofu binary management and execution
terraform/ OpenTofu/Terraform modules per provider
examples/ Sample configuration files
docs/ Architecture docs, design decisions, ADRs
NIC is under very active development.
Our current roadmap can be found at 2026-02-04-roadmap.md. We welcome feedback and contributions to help shape the future of the project!
| Document | Description |
|---|---|
| CLI Reference | All commands, flags, and configuration options |
| Design Doc | The original design document behind NIC's architecture and implementation. It explains the core components and covers architecture, design decisions, configuration reference, the Nebari Operator, and testing strategy. |
| Architectural Decision Records | Design decisions recorded as we build |
Contributions are welcome! To get started:
# Clone the repo
git clone https://github.com/nebari-dev/nebari-infrastructure-core.git
cd nebari-infrastructure-core
# Install dependencies and build
make build
# Run tests
go test ./... -v
# Run all checks (fmt, vet, lint, test)
make check
# Install pre-commit hooks
pre-commit install

See our issue tracker for open issues.
Apache License 2.0 — see LICENSE for details.
If you change provider templates under pkg/provider/**/templates/, regenerate the provider lockfile(s) locally:
./scripts/pre-commit-tofu-lock.sh