Flake inputs fetched and unpacked despite outputs being a cache hit #9570

@pwaller

Description

Is your feature request related to a problem? Please describe.

Flake input sources are fetched and unpacked even when they are not needed. With many large flake inputs, this becomes a major bottleneck and resource consumer (wall time, CPU time, disk I/O, disk storage, network bandwidth and GitHub API calls), even when the outputs themselves are fetched from a cache.

Describe the solution you'd like

Building a flake output which is a cache hit should not require fetching the input sources.

Describe alternatives you've considered

  • Not using flakes might do the trick. But I like flakes.
  • It's also possible the lazy trees effort (Lazy trees #6530) may resolve this, since the contents of the input tree are unused in the (cache hit) scenario I describe, so I would hope they would not be unpacked.

Additional context

Consider the following flake:

{
  # Needed for runCommandNoCC.
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
  # Arbitrary old nixpkgs commit you're unlikely to have the sources for in your /nix/store directory.
  inputs.arbitrarySources.url = "github:nixos/nixpkgs/49e5e473182a44fd0cd9048e4a3a99ba1d47da37";
  outputs = { nixpkgs, arbitrarySources, ... }: {
    packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.runCommandNoCC "test" {} ''
      echo ${arbitrarySources}
      touch $out
    '';
  };
}

Note: arbitrarySources uses nixos/nixpkgs as an input, but if the default package is built or available via a substituter, the sources are no longer required (arbitrarySources is not a runtime dependency).
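The claim that arbitrarySources is not a runtime dependency can be checked against the runtime closure of the build result. The following is a minimal sketch, assuming the flake has been built in the current directory so that ./result exists; closure_contains is a hypothetical helper, not part of Nix, and the input path lookup is the same one the reproducer below uses (which itself fetches the input).

```shell
#!/bin/sh
# Hypothetical helper (not part of Nix): succeeds if the list of store paths
# read from stdin contains exactly the path given as $1.
closure_contains() {
  grep -Fxq -- "$1"
}

# Only runs where Nix and jq are installed and ./result exists
# (i.e. after `nix build path:.` in the flake directory).
if command -v nix-store >/dev/null 2>&1 && command -v jq >/dev/null 2>&1 && [ -e ./result ]; then
  # Store path of the unpacked input; note that this lookup fetches the input.
  src=$(nix flake archive --json path:. | jq -r .inputs.arbitrarySources.path)
  # --requisites lists the runtime closure of the build output.
  if nix-store --query --requisites ./result | closure_contains "$src"; then
    echo "arbitrarySources is a runtime dependency"
  else
    echo "arbitrarySources is not in the runtime closure"
  fi
fi
```

Since the build only echoes the source path and creates an empty $out, the output should not reference the source, so the second message is the one I would expect.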

Expectation:

  • If the default package has already been built, nix build should be a cheap no-op, even if the sources are not present.

Problem:

  • nix build of this flake fetches/unpacks the input source for arbitrarySources, even if the output is already built. These sources are not needed in that case, yet fetching them consumes network bandwidth, disk bandwidth, CPU time, wall clock time and disk space (to store the flake source).
  • If default is available via a substituter, fetching the sources (and putting them in the store) is unnecessary.

Reproduction (whole block can be pasted including parentheses, runs in subshell with tracing switched on):

(
set -x
mkdir -p issue-9570 && cd issue-9570
cat > flake.nix <<'EOF'
{
  # Needed for runCommandNoCC.
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
  # Arbitrary old nixpkgs commit you're unlikely to have the sources for in your /nix/store directory.
  inputs.arbitrarySources.url = "github:nixos/nixpkgs/49e5e473182a44fd0cd9048e4a3a99ba1d47da37";
  outputs = { nixpkgs, arbitrarySources, ... }: {
    packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.runCommandNoCC "test" {} ''
      echo ${arbitrarySources}
      touch $out
    '';
  };
}
EOF
# 1. First build fetches sources
time nix build path:.
# 2. Second build is a no-op (expected behaviour, also expected even if flake input source is missing)
time nix build path:.
# 3. Delete the sources (and the test.drv) from the store, working around known deletion issues by passing the paths via stdin with keep-derivations disabled.
nix-store -q --referrers-closure $(nix flake archive --json path:. | jq -r .inputs.arbitrarySources.path) | nix-store --option keep-derivations false --delete --stdin
# 4. Third build should also be a no-op (the output is still in the store), but it fetches and unpacks the sources again.
time nix build path:.
# 5. Put state back how it was (roughly), by deleting the sources again so the reproducer can be run repeatedly; note that this also shows that arbitrarySources has been fetched again.
nix-store -q --referrers-closure $(nix flake archive --json path:. | jq -r .inputs.arbitrarySources.path) | nix-store --option keep-derivations false --delete --stdin
)

Reproduction output

+ mkdir -p issue-9570
+ cd issue-9570
+ cat
+ nix build path:.

real	0m4.789s
user	0m0.702s
sys	0m2.973s
+ nix build path:.

real	0m0.182s
user	0m0.098s
sys	0m0.058s
+ nix-store --option keep-derivations false --delete --stdin
++ nix flake archive --json path:.
++ jq -r .inputs.arbitrarySources.path
+ nix-store -q --referrers-closure /nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source
finding garbage collector roots...
deleting '/nix/store/bza7cl6wfgbbr47cbj7g1wxaq86lyxzm-test.drv'
deleting '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source'
deleting unused links...
note: currently hard linking saves 12503.42 MiB
2 store paths deleted, 117.31 MiB freed
+ nix build path:.

real	0m4.769s
user	0m0.698s
sys	0m2.936s
+ nix-store --option keep-derivations false --delete --stdin
++ nix flake archive --json path:.
++ jq -r .inputs.arbitrarySources.path
+ nix-store -q --referrers-closure /nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source
finding garbage collector roots...
deleting '/nix/store/bza7cl6wfgbbr47cbj7g1wxaq86lyxzm-test.drv'
deleting '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source'
deleting unused links...
note: currently hard linking saves 12503.42 MiB
2 store paths deleted, 117.31 MiB freed

What I expect to see

The above shows the following times to run nix build:

1. real	0m4.789s # OK
2. real	0m0.182s # OK
3. real	0m4.769s # Bad

I expect (3) to take about as long as (2), not (1); the extra time in (3) is spent fetching and unpacking the sources.
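To put a number on the lost time, the two relevant wall-clock figures from the reproduction output can be subtracted. This is only a small arithmetic sketch; overhead is a hypothetical helper name, and the figures are from a single run on one machine.

```shell
#!/bin/sh
# Wall-clock timings copied from the reproduction output (seconds).
cached=0.182    # second build: sources present, output already built
refetch=4.769   # third build: sources deleted, output still in the store

# Hypothetical helper: difference between two timings, to millisecond precision.
overhead() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a - b }'
}

overhead "$refetch" "$cached"   # prints 4.587
```

In other words, roughly 4.6 of the 4.8 seconds in the third build go to handling sources that the build result does not need.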

nix build path:. --debug --verbose shows that the whole of nixpkgs is being unpacked in this scenario. It is not downloaded again, because the gzipped sources are also a cache hit (and are not deleted by nix-store --delete). In my real-world scenario, where the package can be fetched from a substituter, evaluation hangs while all the flake input sources are fetched and unpacked, which means waiting multiple minutes and consuming substantial resources.
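One rough way to confirm the unpacking from the debug log is to count the lines that mention it. This is a sketch under assumptions: count_unpack_lines is a hypothetical helper, and the search pattern is a guess, since the exact wording of Nix's debug messages may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: count log lines on stdin that mention unpacking.
# The pattern is an assumption about the log wording; adjust it as needed.
count_unpack_lines() {
  # grep -c exits non-zero when the count is 0, so ignore its status.
  grep -ic 'unpack' || true
}

# Only runs where Nix is installed and a flake.nix is present.
if command -v nix >/dev/null 2>&1 && [ -f flake.nix ]; then
  # Debug output goes to stderr, so merge it into the pipe.
  nix build path:. --debug --verbose 2>&1 | count_unpack_lines
fi
```

A count of zero on the second build and a non-zero count after deleting the sources would match the behaviour described here.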

I note that I've used nixpkgs as a stand-in here. I do not expect fixing this issue to improve typical uses of nixpkgs very much, because those actually involve eval'ing nixpkgs, whereas the scenario I describe only uses the flake inputs as a src attribute to mkDerivation; in that case, the sources are only necessary if the derivation needs to be built.

Priorities

Add 👍 to issues you find important.

Labels: feature (Feature request or proposal), flakes
