Migrate build system from Earthly to Nix #6999
|
I asked some maintainers to kick the tires on this POC. The biggest pain point was how invasive installing Nix is. I did a little looking into this. The short version is Nix really wants its package store to be rooted at
Official Installer
The official installer at https://nixos.org/download/ has two ways to install Nix: single-user and multi-user. In single-user mode, one user owns
In multi-user mode, a daemon handles store operations and builds run under isolated build users. Multi-user is recommended because it offers better sharing, improved isolation, and more security on systems where multiple users might use Nix.
Sadly MacOS is multi-user only. This is because MacOS since 10.15 has a read only root, so the installer needs to create a new volume and mount it at
This POC uses two "experimental" Nix features: Flakes and the
Determinate Systems Installer
Determinate Systems offers a distro of Nix with a few (seemingly small) performance improvements compared to upstream. I recommended it for this POC because:
Rootless Nix
You can use a static Nix binary to run rootless, but it's Linux only and requires that your distro supports user namespaces. Download a static Nix binary here (click your platform to get a download link). If you run this as a non-root user on a system that doesn't already have
Docker
If you have Docker (or similar) and don't want to install Nix at all you can do this:
docker run --rm -it \
  -e 'NIX_CONFIG=experimental-features = nix-command flakes' \
  -v "$(pwd)":/workdir -w /workdir \
  nixos/nix
# Inside the container...
git config --global safe.directory /workdir
nix shell nixpkgs#cachix -c cachix use crossplane
nix develop
nix run .#test |
|
Unless I'm missing something, another TODO is adding back a target to spin up everything locally as we used to be able to do with Other than that, I do think it could scare newcomers a bit, but I think we could live with that. |
@phisco True - there are a few targets I'd need to migrate if we choose to go this route. I've updated the TODO list to just say we need all the targets Earthly has. |
|
I tried to install Nix on my Macbook Pro using both the pkg installer and the CLI and both failed. The pkg gave no useful information on why it failed, and the CLI said "warn: curl: (22) The requested URL returned error: 404". I'll do some more digging tomorrow. |
jbw976
left a comment
Thanks for driving this @negz! I took it for a test spin and things mostly worked for me, although the UX is definitely quite foreign to me and I don't really know what to expect when running, getting results, finding artifacts, etc.
A couple notes:
- i'm on MacOS
- i used the determinate systems installer from https://github.com/DeterminateSystems/nix-installer and just used the `curl | sh` 1-liner before noticing it had a GUI installer
- installation was successful, i had to type in a password for sudo once and it finished OK
- running commands like `nix run .#test` mostly work OK but I do see some unexpected error messages I'll note below
In general, while it seems functional, I don't really know what's going on - where is the build happening, if it fails where do I look to troubleshoot, where is the built binary, etc.?
It's just a very unfamiliar system that would require ramping up on for me and likely many others. I have a feeling there would be a lot of @negz... questions for some time while folks are getting used to it. I also worry about the new contributor experience and if this unfamiliar environment will impact their ability and motivation to see a contribution through.
I know make and the build submodule aren't perfect either, but I feel like the barrier to entry there from a "use the build system" perspective is lower. It's probably not any lower from a "troubleshoot or add new functionality to the build system" perspective though.
I'm still leaning towards the build submodule, but I'm not completely against this nix approach. Perhaps it will get easier after some more usage and there's of course more we can do with the contributing guide to help people understand it better.
Errors encountered:
- `nix develop -c $SHELL` --> warning: unhandled Platform key FamilyDisplayName
- `nix run .#test` --> go: no such tool "covdata"
|
I pushed a
./nix.sh run .#test
./nix.sh build
./nix.sh flake show
It runs Nix inside a Docker container, so there's no
The wrapper handles the annoying bits:
The first run takes a few minutes to populate the caches - same as running the flake natively for the first time. After that, subsequent runs are just as fast as native Nix.
Note that the container runs This means even regular maintainers might not need to install Nix locally - just use the wrapper. |
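As a rough illustration of what a wrapper like this has to do, here is a minimal sketch that only assembles and prints a `docker run` command. It reuses the image and `NIX_CONFIG` value from the manual Docker instructions earlier in this thread. The function name `nix_docker_cmd`, the `nix-sh-store` volume name, and the value `1` for `NIX_SH_CONTAINER` are assumptions for illustration; the real `nix.sh` may use different flags.

```shell
#!/usr/bin/env bash
# Sketch only: prints the docker command a Nix-in-Docker wrapper might run.
# nix_docker_cmd, the nix-sh-store volume, and NIX_SH_CONTAINER=1 are
# assumptions for illustration, not taken from the real nix.sh.
nix_docker_cmd() {
  local cmd=(
    docker run --rm -it
    -e 'NIX_CONFIG=experimental-features = nix-command flakes'
    -e NIX_SH_CONTAINER=1
    -v nix-sh-store:/nix
    -v "$(pwd):/workdir" -w /workdir
    nixos/nix
    nix "$@"
  )
  # %q shell-quotes each word so the printed line can be copy-pasted.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

nix_docker_cmd flake show
```

Keeping the store in a named volume is what would let later runs reuse what the first run downloaded; printing rather than executing keeps the sketch safe to try on a machine without Docker.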
|
I tried ./nix.sh and eventually got this:
Then I realized I didn't add the arguments for the shell script to run 🙈
THOUGHT: Make it print out a menu of options if no argument is passed
I proceeded to run the following commands and all worked without any issue.
./nix.sh run .#test
./nix.sh build
./nix.sh flake show
VERDICT: As someone who would consume the Nix stuff, not care about maintaining it, the experience was good. I really like the fact that it can all run in a docker container and not impact my local env.
PROPOSAL: Add the commands to manage all of this in crossplane CLI: |
|
Thanks Nic! I tested out just the newest commit with As you mention, the first runs take some time to get all the images/packages but later on it just works. Just have to get used to running the commands with |
|
I'm trying the nix.sh script now - it does make me wonder why it wants to pull down 7GB of data to create the build environment? That seems like a lot of overhead even if it only gets downloaded once. |
|
One of the biggest problems I have with Earthly is that it doesn't play nice with network proxies, and it's almost impossible for me to build successfully when using a corporate machine, so it will be interesting to see if/how nix handles it. |
|
The experience using nix.sh is far better than trying to install it myself - that never worked for me. I had no proxy issues when building. Concerns:
|
@bobh66 What kind of proxy do you use? Did you have to do something to get Nix (inside the container) to use it? I was just going to look into that.
I don't expect most contributors to use the development shell often. I'd expect them to have Go and an IDE installed on their machine for day-to-day development, but use Where I do think the dev shell could be useful is:
You can install packages inside the dev shell by running for example
The script is set up to make |
We won't be able to store it in /nix most of the time. It'd only work inside of nix.sh, since in native Nix /nix is owned by root. In native Nix shell history should survive devshell restarts anyway. Inside the container we can fix it if it becomes an issue. Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
When running `./nix.sh run .#hack`, the kind cluster is created inside the Docker container. Without staying in that container, the user can't interact with the cluster using kubectl, helm, etc. This commit adds a check for the NIX_SH_CONTAINER environment variable (set by nix.sh) and execs into `nix develop` after cluster setup. This gives the user an interactive shell with all the dev tools available while maintaining access to the cluster. When running natively (`nix run .#hack`), behavior is unchanged. Signed-off-by: Nic Cope <nicc@rk0n.org>
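The check described in this commit could look roughly like the sketch below. It prints the action instead of performing it, so it can be tried without Nix or a cluster. `after_cluster_setup` is a hypothetical name, and the real app may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch only: decide what to do once the kind cluster is up.
# after_cluster_setup is a hypothetical name, not taken from the PR.
after_cluster_setup() {
  if [ -n "${NIX_SH_CONTAINER:-}" ]; then
    # Inside the nix.sh container: the real script would `exec nix develop`
    # so the user keeps a shell with kubectl, helm, etc. next to the cluster.
    echo "exec nix develop"
  else
    # Native Nix: the cluster is reachable from the host, so just finish.
    echo "done"
  fi
}

after_cluster_setup
```

Using `exec` in the real script would replace the setup process with the dev shell, so leaving the shell also ends the container.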
jbw976
left a comment
We spent 30 mins on a demo, we walked through the changes together, multiple people have tried the changes out directly as well - I think this is good enough to go ahead and try it out.
Let's keep fingers crossed for main builds going well after this merge, but we can iterate quickly there if needed.
    'golangci/golangci-lint',
  ],
  matchBaseBranches: [
    '/^release-.*/',
will this need to get updated over time to only match the release branches that earthly is used in? e.g. the releases before v2.2?
similarly, matchBaseBranches will need to be updated over time in renovate-nix.json5 to include the releases that nix is used in?
Yeah that's the idea. Open to better ways to handle this. 😦
i don't think i have a better idea, just making sure I understand the expectations 😉
# Build OCI image arguments for dockerTools.
# This matches the distroless base image Crossplane previously used.
# https://github.com/GoogleContainerTools/distroless
mkImageArgs =
  {
    version,
    crossplaneBin,
    arch,
  }:
  let
    passwd = pkgs.writeText "passwd" ''
      root:x:0:0:root:/root:/sbin/nologin
      nobody:x:65534:65534:nobody:/nonexistent:/sbin/nologin
      nonroot:x:65532:65532:nonroot:/home/nonroot:/sbin/nologin
    '';
    group = pkgs.writeText "group" ''
      root:x:0:
      nobody:x:65534:
      nonroot:x:65532:
    '';
    nsswitch = pkgs.writeText "nsswitch.conf" ''
      hosts: files dns
    '';
  in
  {
    name = "crossplane/crossplane";
    tag = version;
    created = "now";
    architecture = arch;

    contents = [
      crossplaneBin
      pkgs.cacert
      pkgs.tzdata
      pkgs.iana-etc
    ];

    extraCommands = ''
      mkdir -p tmp home/nonroot etc crds webhookconfigurations
      chmod 1777 tmp
      cp ${passwd} etc/passwd
      cp ${group} etc/group
      cp ${nsswitch} etc/nsswitch.conf
      cp -r ${self}/cluster/crds/* crds/
      cp -r ${self}/cluster/webhookconfigurations/* webhookconfigurations/
    '';

    config = {
      Entrypoint = [ "/bin/crossplane" ];
      ExposedPorts = {
        "8080/tcp" = { };
      };
      User = "65532";
      Env = [
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-certificates.crt"
      ];
      Labels = {
        "org.opencontainers.image.source" = "https://github.com/crossplane/crossplane";
        "org.opencontainers.image.version" = version;
      };
    };
  };
We talked about this in person today - I was a bit hesitant to take on maintaining the equivalent config that distroless uses, but it appears to be quite a small surface area and is pretty static. Note this is just the config for the base layer, we get updates to packages/dependencies over time without changing any of this stuff ^
By default, writeShellApplication prepends runtimeInputs to PATH but still inherits the system PATH. This means apps can accidentally depend on tools like sed, grep, or coreutils that happen to exist on the build machine but aren't explicitly declared.
This commit sets inheritPath = false on all writeShellApplication calls, ensuring scripts can only use explicitly declared tools.
Several apps needed additional dependencies that were previously provided implicitly:
- generate: coreutils, gnused (used by go generate tooling)
- hack: coreutils, gnugrep, nix (for grep, rm, and nix develop)
- promoteArtifacts: coreutils (for mktemp, rm)
Signed-off-by: Nic Cope <nicc@rk0n.org>
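The effect this commit describes can be imitated in plain shell. This is a loose sketch of the idea, not how writeShellApplication is implemented: `declared_dir` is a hypothetical stand-in for a script's runtimeInputs, and `can_find` is a helper invented for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch only: emulate "inherit PATH" vs "declared tools only" lookups.
# declared_dir stands in for a script's runtimeInputs (hypothetical).
declared_dir="$(mktemp -d)"
ln -s "$(command -v cat)" "$declared_dir/cat"   # the only declared tool

# can_find SEARCH_PATH TOOL: succeeds if TOOL resolves on SEARCH_PATH.
# The subshell keeps the PATH change from leaking into the caller.
can_find() {
  ( PATH="$1"; hash -r; command -v "$2" >/dev/null 2>&1 )
}

inherited="$declared_dir:$PATH"   # system PATH still visible after declared tools
isolated="$declared_dir"          # analogous to inheritPath = false

can_find "$inherited" ls && echo "inherited PATH: ls resolves although it was never declared"
can_find "$isolated" ls  || echo "isolated PATH: ls does not resolve, so it must be declared"
can_find "$isolated" cat && echo "isolated PATH: cat resolves because it was declared"
```

With the isolated path, a missing dependency fails immediately on any machine instead of only on machines that happen to lack the tool.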
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
contributing/README.md (1)
106-210: Small consistency tweak: use `./nix.sh` in the prose.
The sentence about validating with `nix.sh` could trip folks if the script isn’t on PATH. Would you be open to aligning it with the other `./nix.sh` examples? Thanks for the clear migration guidance.
✏️ Suggested tweak
-Use your editor or IDE to write code, then validate with `nix.sh`:
+Use your editor or IDE to write code, then validate with `./nix.sh`:
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around line 113-185: The Run E2E Tests step is always passing an empty
-prior-crossplane-version= when CROSSPLANE_PRIOR_VERSION is unset; change the
command block in the Run E2E Tests step so the flag is only included when
CROSSPLANE_PRIOR_VERSION is non-empty (e.g. build a PRIOR_FLAG variable with: if
[[ -n "${CROSSPLANE_PRIOR_VERSION}" ]]; then PRIOR_FLAG="
-prior-crossplane-version=${CROSSPLANE_PRIOR_VERSION}"; else PRIOR_FLAG=""; fi
and then include ${PRIOR_FLAG} in the nix run invocation), updating the command
that currently references CROSSPLANE_PRIOR_VERSION.
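A minimal sketch of the conditional flag described above, written as a function so it can be exercised on its own. `prior_flag` is a hypothetical helper name and `v1.0.0` is a placeholder value; the exact `nix run` arguments in the workflow are not shown here and may differ.

```shell
#!/usr/bin/env bash
# Sketch only: emit the flag only when a prior version is configured.
# prior_flag is a hypothetical helper; v1.0.0 below is a placeholder.
prior_flag() {
  if [[ -n "${CROSSPLANE_PRIOR_VERSION:-}" ]]; then
    printf '%s' "-prior-crossplane-version=${CROSSPLANE_PRIOR_VERSION}"
  fi
}

# Hypothetical invocation; the real workflow's arguments may differ:
#   nix run .#e2e -- $(prior_flag)
CROSSPLANE_PRIOR_VERSION="" prior_flag        # prints nothing
CROSSPLANE_PRIOR_VERSION="v1.0.0" prior_flag  # prints the flag
echo
```

Leaving the command substitution unquoted in the invocation is deliberate in this sketch: an empty result then contributes no argument at all rather than an empty string.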
🧹 Nitpick comments (3)
design/one-pager-build-with-nix.md (2)
34-38: Add a decision checkpoint for the split‑brain build story.
Would you be open to a short section that defines when/how we decide to migrate providers versus keep dual build systems? It would make the long‑term maintenance path more explicit. Thanks! As per coding guidelines: Focus on architectural decisions, user experience, and long-term maintainability.
📌 Suggested addition
@@ ## Future Improvements
+
+### Decision Checkpoint
+Document when and how we'll decide to migrate providers versus keep dual build
+systems (e.g., after N releases or once success criteria are met).
40-49: Consider adding explicit success criteria.
If this is intended as a trial, could we add measurable success criteria (e.g., setup time, CI stability, cache hit rate, contributor feedback) so the evaluation is objective? Thanks! As per coding guidelines: Focus on architectural decisions, user experience, and long-term maintainability.
📌 Suggested addition
@@ ## Goals
@@
 * Provide a foundation that could eventually unify core and provider builds.
+
+### Success Criteria (Trial Exit)
+Define measurable outcomes (e.g., contributor setup time, CI stability, cache
+hit rate, and feedback) to decide whether to proceed.
nix.sh (1)
1-75: Optional: show a brief help message when no args are passed.
Given prior feedback about `./nix.sh` with no arguments, a small usage banner could reduce confusion. Would you be open to adding a guard before the container re‑exec? The rest of the container bootstrap reads solid.
💡 Possible guard
 set -e
+if [ "$#" -eq 0 ]; then
+  echo "Usage: ./nix.sh <command>"
+  echo "Try: ./nix.sh flake show"
+  exit 1
+fi
+
 # When NIX_SH_CONTAINER is set, we're running inside the Docker container.
Signed-off-by: Nic Cope <nicc@rk0n.org>
This way new users only need Docker for it to run. Signed-off-by: Nic Cope <nicc@rk0n.org>
This seems more useful than linking to Nix website itself. The one pager links to said website. Signed-off-by: Nic Cope <nicc@rk0n.org>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@contributing/README.md`:
- Around line 118-136: Summary: zsh can glob the flake selector .#test causing
surprises; update the examples to quote the selector and make all ./nix.sh
usages consistent. Fix: find occurrences of the unquoted flake selectors (e.g.,
./nix.sh run .#test, ./nix.sh run .#lint, ./nix.sh run .#generate, ./nix.sh run
.#e2e, ./nix.sh flake show) and wrap the selector in quotes so shells won’t
expand the # character, and ensure the inline usage of ./nix.sh is consistent
throughout the section.
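To make the quoting suggestion concrete, the sketch below uses a stand-in function instead of `./nix.sh`, so it runs without Docker or Nix. `show_args` is a hypothetical helper that prints each argument it receives. Whether an unquoted `#` is expanded depends on the shell and its options (for zsh, options such as extended globbing); quoting sidesteps the question.

```shell
#!/usr/bin/env bash
# Sketch only: show_args prints each argument it receives on its own line,
# standing in for ./nix.sh so the example runs without Docker or Nix.
show_args() { printf '[%s]\n' "$@"; }

# Single quotes keep the selector one literal word in any POSIX-like shell.
show_args run '.#test'
```

The same quoting applies to the other selectors named in the comment, for example `./nix.sh run '.#lint'`.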
🧹 Nitpick comments (1)
contributing/README.md (1)
121-123: Set clearer expectations for first-run time and cache size.The “~2–3 minutes” claim may understate real-world first-run time and disk usage (several GB). Could we soften this to a range (e.g., “several minutes” and “multiple GB”) or reference that it varies by network/cache state? This helps newcomers plan. Thanks!
Signed-off-by: Nic Cope <nicc@rk0n.org>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@design/one-pager-build-with-nix.md`:
- Around line 92-96: Polish the prose by adding missing periods and removing
overused intensifiers: replace "etc" with "etc." (period), remove or replace
instances of "really" and "exactly" where they appear (e.g., in the sentence
about downloads and any other occurrences), and tighten wording for crisper
style; search for the literal tokens "etc", "really", and "exactly" in the
document and update those occurrences accordingly (also apply the same edits at
the other mentioned occurrences of these tokens).
- Around line 105-106: Update the sentence that states "The container runs
`--privileged` for Docker-in-Docker support" to include a short rationale, an
explicit threat model, and concrete mitigations: explain why `--privileged` is
required for the current D-in-D approach, enumerate risks (host compromise,
kernel/device access escalation), and add safer alternatives such as using
rootless Docker / podman, dind with user namespaces, or bind-mounting the host
Docker socket; recommend limiting `--privileged` to E2E/test environments,
document required operational controls for production (minimal hosts, hardened
kernel, seccomp/apparmor profiles, CI runner isolation), and add guidance for
auditing/monitoring and rollback if `--privileged` must be used.
🧹 Nitpick comments (1)
design/one-pager-build-with-nix.md (1)
127-130: Reproducibility claim should mention version injection.
The doc says commit N builds are identical next year. Given Crossplane embeds a version string, can we explicitly document how CI sets `buildVersion` to avoid cache collisions while staying pure? That detail helps readers trust the claim.
Based on learnings: “In flake.nix for crossplane/crossplane, the buildVersion variable is set by CI via sed (.github/workflows/ci.yml) which directly modifies `buildVersion = null;` to `buildVersion = "$VERSION";` before building. This approach keeps builds pure (no --impure flag) while ensuring different versions produce different Nix store hashes, preventing cache hits from incorrectly returning artifacts built with old versions.”
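The substitution described in that note can be sketched as follows. This is an illustration of the sed approach, not the actual CI step: `inject_version` is a hypothetical helper, the version string is a placeholder, and the demo edits a throwaway file instead of a real flake.nix.

```shell
#!/usr/bin/env bash
# Sketch only: rewrite `buildVersion = null;` in a flake file before building.
# inject_version is a hypothetical helper; the actual CI step may differ.
inject_version() {
  local file="$1" version="$2"
  # -i.bak works with both GNU and BSD sed; remove the backup afterwards.
  sed -i.bak "s|buildVersion = null;|buildVersion = \"${version}\";|" "$file"
  rm -f "${file}.bak"
}

# Demo against a throwaway file rather than a real flake.nix.
demo="$(mktemp)"
printf '  buildVersion = null;\n' > "$demo"
inject_version "$demo" "v0.0.0-demo"
cat "$demo"
```

Because the version becomes part of the flake source, it is part of the build inputs, which is why different versions would hash to different store paths. The sketch assumes the version string contains no `|`, `&`, or backslash characters.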
Earthly's maintainers announced they would no longer actively develop the tool. This commit migrates crossplane-runtime to use Nix, matching the approach taken in crossplane/crossplane#6999.
The migration adds a Nix flake with:
- Apps for local development (test, lint, generate, tidy)
- Checks for CI (test, go-lint, generate, nix-lint)
- A development shell with all required tools
- A Docker wrapper (nix.sh) that requires no local Nix installation
This commit also updates apis/apis.go to invoke controller-gen and buf directly from $PATH rather than via `go run`. This matches crossplane's pattern and allows code generation to work in Nix's sandboxed builds.
The CI workflow now uses Nix checks, and CodeQL uses GitHub's native action rather than a custom Earthly target.
The Renovate configuration is split into modular files (base, nix, earthly) to support both Nix on main and Earthly on release branches. The Earthfile is retained for release branches that still use it.
Note: The go-lint check will fail until pre-existing lint issues (22 gocritic and godoclint violations) are addressed in a follow-up commit.
Signed-off-by: Nic Cope <nicc@rk0n.org>
I meant to remove them in crossplane#6999 but somehow missed them. Signed-off-by: Nic Cope <nicc@rk0n.org>
This is how we pinned our tools for code generation before we switched to Nix in crossplane#6999. I thought I removed it in that PR, but didn't. Signed-off-by: Nic Cope <nicc@rk0n.org>
Description of your changes
Fixes #6394
Earthly's maintainers announced last year that they would no longer actively develop it. This PR replaces Earthly with a Nix flake. See the included one-pager for the full rationale.
I propose we trial Nix on `crossplane/crossplane` and `crossplane/crossplane-runtime` for 3 months, same as we did when adopting Earthly. If it works well, we could establish a pattern that providers could optionally use to migrate from the build submodule.
Nix is a 21-year-old build system governed by the NixOS Foundation. nixpkgs is one of the most active repos on GitHub, packaging over 120,000 tools.
The PR adds a `./nix.sh` command that runs Nix in a Docker image, so the only build tooling dependency is Docker.
The first run downloads dependencies into a Docker volume (~2-3 min on my ~500Mbps connection). Subsequent runs reuse the cache and take seconds. Contributors who prefer native Nix can install it, but it's entirely optional. The `./nix.sh` UX is essentially identical to the `nix` UX. Performance is also identical unless you've limited Docker's compute resources.
Nix has excellent support for binary caching. It hashes package build inputs, and if the build inputs don't change it'll download the package from a binary cache instead of rebuilding it. This is true for all packages, whether they're part of Crossplane's build toolchain (e.g. Go, controller-runtime) or part of Crossplane itself (OCI images, crank binaries, etc). GitHub Actions is configured to push to https://crossplane.cachix.org, and `./nix.sh` is configured to pull from it. This means local builds on your laptop won't recompile things CI has recently built.
Notably Nix considers everything in `go.mod` to be a separate package from Crossplane. These dependencies change a lot less frequently and take a lot longer to build than Crossplane itself, so caching them can result in significant speedups:
build-artifacts Time
I have:
- Run `nix flake check` to ensure this PR is ready for review.
- Added or updated unit tests.
- Added or updated e2e tests.
- Linked a PR or a docs tracking issue to document this change.
- Added `backport release-x.y` labels to auto-backport this PR.
- Followed the API promotion workflow if this PR introduces, removes, or promotes an API.