Cargo.lock considered harmful #327063
Introduction
I've been investigating the impact of Cargo.lock files because, if you run ncdu against a Nixpkgs checkout, they are usually the largest individual files you come across, and Rust packages are frequently at the top of any given subdirectory.
AFAICT the functionality to import Cargo.lock has existed since May 2021. Usage has exploded since:
$ for ver in 21.05 22.05 23.05 24.05 ; do echo -n "nixos-$ver " ; git checkout --quiet NixOS/nixos-$ver && fd '^Cargo.lock$' . | wc -l ; done
nixos-21.05 4
nixos-22.05 15
nixos-23.05 208
nixos-24.05 316
Measurements
Next I measured the total disk usage of all Cargo.lock files combined:
$ for ver in 21.05 22.05 23.05 24.05 ; do echo -n "nixos-$ver " ; git checkout --quiet NixOS/nixos-$ver && fd '^Cargo.lock$' . -X du -b | cut -f 1 | jq -s 'add' ; done
nixos-21.05 156292
nixos-22.05 113643
nixos-23.05 12505511
nixos-24.05 24485533
That's ~23 MiB (~24 MB)!
Realistically though, anyone who cares about space efficiency in any way will use compression, so I measured again with each Cargo.lock compressed individually:
$ for ver in 21.05 22.05 23.05 24.05 ; do echo -n "nixos-$ver " ; git checkout --quiet NixOS/nixos-$ver && fd '^Cargo.lock$' . -x sh -c 'gzip -9 < {} | wc -c' | jq -s 'add' ; done
nixos-21.05 38708
nixos-22.05 30104
nixos-23.05 3075375
nixos-24.05 5986458
Further, evidence in #320528 (comment) suggests that handling Cargo.lock adds significant eval overhead. Eval time for Nixpkgs via nix-env is ~28% lower if parsing/handling of Cargo.lock files is stubbed.
Analysis
Just ~300 of 116231 packages (~0.25%) account for ~6 MiB of our ~41 MiB compressed Nixpkgs tarball, which is about 15% of its size (~18.5 KiB per package).
For comparison, our hackage-packages.nix, which contains the entire Hackage package set (18191 packages), is ~2.3 MiB compressed (~133 bytes per package).
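The per-package figures above can be rechecked with plain integer shell arithmetic. This is only a sketch: the byte and file counts are copied from the nixos-24.05 measurements in this issue, the ~2.3 MiB Hackage figure is converted to bytes, and the variable names are made up for illustration. Integer division truncates, so the results land slightly below the rounded numbers quoted in the text.

```shell
# Figures copied from the measurements in this issue (nixos-24.05).
cargo_lock_bytes=5986458                    # sum of gzip -9 sizes of all Cargo.lock files
cargo_lock_files=316                        # number of Cargo.lock files found
hackage_bytes=$((23 * 1024 * 1024 / 10))    # ~2.3 MiB expressed in bytes
hackage_pkgs=18191                          # packages in hackage-packages.nix

# Average compressed bytes per package (integer division, truncated).
per_cargo=$((cargo_lock_bytes / cargo_lock_files))
per_hackage=$((hackage_bytes / hackage_pkgs))

echo "Cargo.lock: $per_cargo bytes per package"
echo "Hackage:    $per_hackage bytes per package"
echo "Ratio:      ~$((per_cargo / per_hackage))x"
```

By this rough measure, a Cargo.lock package costs about two orders of magnitude more compressed bytes than a Hackage package.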
Breaking down eval time by package reveals that each Cargo.lock takes on average about 76.67 ms to handle/parse.
Discussion
I do not believe that this trend is sustainable, especially given the likely increasing importance of Rust in the coming years. If we had an order of magnitude more Rust packages in Nixpkgs and assumed the same amount of data per package that we currently observe, the Rust packages alone would take up ~54 MiB compressed.
If nothing is done, I could very well see the compressed Nixpkgs tarball bloat beyond 100MiB in just a few years.
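The ~54 MiB estimate can be reproduced from the rounded figures in the analysis. A minimal sketch, assuming ~300 lock files today, ~18.5 KiB compressed per package, and linear growth with no change in lock file size; the variable names are illustrative only.

```shell
lock_files=300          # the "~300" packages from the analysis
per_pkg_bytes=18944     # ~18.5 KiB per package (18.5 * 1024)
growth=10               # one order of magnitude more packages

# Total compressed size if every new package costs as much as today's average.
total_bytes=$((lock_files * growth * per_pkg_bytes))
echo "$((total_bytes / 1024 / 1024)) MiB"
```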
Extrapolating eval time does not paint a bright picture either: If we assume one order of magnitude more Cargo.lock packages again, evaluating just those packages would take ~4x as long as evaluating the entire rest of Nixpkgs currently does.
This does not scale.
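The ~4x eval-time figure follows from the ~28% measurement quoted earlier. A rough sketch, assuming Cargo.lock handling is ~28% of today's eval time, the rest of Nixpkgs is the remaining ~72%, and the cost grows linearly with the number of lock files; the ratio is scaled by 100 because shell arithmetic is integer-only.

```shell
cargo_share=28                       # percent of current eval time spent on Cargo.lock
rest_share=$((100 - cargo_share))    # percent spent on everything else
growth=10                            # one order of magnitude more Cargo.lock packages

# Ratio of future Cargo.lock eval time to the current rest of Nixpkgs, times 100.
ratio_x100=$((cargo_share * growth * 100 / rest_share))
printf 'Cargo.lock eval vs. rest of Nixpkgs: %d.%02dx\n' \
  "$((ratio_x100 / 100))" "$((ratio_x100 % 100))"
```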
Solutions
I'm not deep into Rust packaging, but I remember the vendorHash being the predominant pattern a few years ago. It did not have any of these issues, as it's just one 32-byte string literal per package.
Would it be possible to go back to using vendorHashes?
(At least for packages in Nixpkgs, having Cargo.lock support available for external use is fine.)
What else could be done to mitigate this situation?
Limitations/Future work
Files were compressed individually, adding gzip overhead for each lockfile. You could create a tarball out of all Cargo.lock files and compress it as a whole to mitigate this effect.
I found some Cargo.lock files that have a different name or a prefix/suffix and were not considered.
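The first limitation could be addressed along these lines. This is an untested-against-Nixpkgs sketch, not the method used for the numbers above: it swaps fd for find so it has fewer dependencies, assumes a tar that supports --null and -T - (GNU tar does), and the function name and its optional second argument (a file name glob, for lock files with other names) are hypothetical. Note that tar adds its own per-file header overhead, so the result is still an upper bound.

```shell
# Print the gzip -9 size in bytes of one tarball containing every lock file
# under a directory, instead of compressing each file separately.
#   $1: directory to search, e.g. a Nixpkgs checkout
#   $2: optional file name glob (defaults to Cargo.lock)
combined_lock_size() {
  find "$1" -type f -name "${2:-Cargo.lock}" -print0 \
    | tar -cf - --null -T - \
    | gzip -9 \
    | wc -c
}
```

From the root of a checkout this would be called as `combined_lock_size .`, or with a glob such as `combined_lock_size . '*Cargo.lock'` to also pick up prefixed names.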
CC @NixOS/rust