stdenv: allow for jobservers across multiple nix builds #314888
base: staging
doc/stdenv/stdenv.chapter.md
Outdated
| ### `runInJobServer` \<command\> \-\-\-\- \<defArgs\> \-\-\-\- \<args\> {#fun-runInJobServer} | |
| ### `runInJobserver` \<command\> \-\-\-\- \<defArgs\> \-\-\-\- \<args\> {#fun-runInJobServer} |
(multiple places)
running this as root is most unwise in practice but running it as another user seems to be nearly impossible. when fuse doesn't fuck us over systemd does.
| .st_mode = S_IFREG | 0660, | |
| .st_mode = S_IFREG | 0666, |
won't work otherwise
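The comment does not say why `0660` fails. One plausible reading, offered here purely as an assumption, is that build processes run under a different user and group than the process serving the file, so only the "other" permission bits apply to them. A minimal Python sketch of how the two modes differ (the helper name is hypothetical):

```python
import stat

def other_can_read_write(mode):
    """True if users outside the owner and group may both read and write."""
    wanted = stat.S_IROTH | stat.S_IWOTH
    return (mode & wanted) == wanted

restricted = stat.S_IFREG | 0o660  # value before the suggestion
permissive = stat.S_IFREG | 0o666  # suggested value

# Show the symbolic form of each mode and whether "other" users get access.
print(stat.filemode(restricted), other_can_read_write(restricted))
print(stat.filemode(permissive), other_can_read_write(permissive))
```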
pkgs/stdenv/generic/setup.sh
Outdated
| -j${NIX_BUILD_CORES} -l${NIX_BUILD_CORES} ---- \ | |
| -j${NIX_BUILD_CORES} ---- \ |
multiple places
cargo apparently no longer accepts jobserver fds that aren't pipes. that's somewhat problematic.
initially only make and cargo support using the jobserver. other build systems may follow suit later.
Co-authored-by: Raito Bezarius <masterancpp@gmail.com>
Is this work dead? It looked very promising.
I have some patches lying around locally to make it work with ninja, so anyone interested in picking this up, feel free to write me about it.
FYI for anyone looking at this in the future: the state of this is that various people are talking about this in Lix and something better may appear eventually, but the approach of Make jobservers really does not work and a new protocol needs to exist. Here's a pad where it is being discussed: https://pad.lix.systems/lix-jobserver
Remove the requirement that the jobserver "fifo" is actually a named pipe. Named pipes are essentially stateless, and therefore carry a high risk of a killed process leaving the server with no tokens left, and no clear way to reclaim them. Therefore, multiple jobserver implementations use FUSE instead:

- [nixos-jobserver](NixOS/nixpkgs#314888) (WIP) uses simple file on FUSE
- [steve](https://gitweb.gentoo.org/proj/steve.git) uses a character device via CUSE
- [guildmaster](https://codeberg.org/amonakov/guildmaster) uses a character device via CUSE

This is compatible with GNU make and Ninja, since they do not check the file type, and seems to be the only solution that can achieve state tracking while preserving compatibility.

Signed-off-by: Michał Górny <mgorny@gentoo.org>
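To illustrate what "state tracking" buys over a plain named pipe, here is a toy Python model. It is not how nixos-jobserver, steve, or guildmaster are implemented. The class and method names are hypothetical, and the `client_gone` hook stands in for whatever signal a FUSE/CUSE server gets when a client's file handle is closed.

```python
class TrackingTokenPool:
    """Toy token pool that remembers how many tokens each client holds."""

    def __init__(self, tokens):
        self.free = tokens
        self.held = {}  # client id -> number of tokens currently held

    def acquire(self, client):
        """Hand one token to `client`; return False if none are free."""
        if self.free == 0:
            return False
        self.free -= 1
        self.held[client] = self.held.get(client, 0) + 1
        return True

    def release(self, client):
        """Return one token that `client` previously acquired."""
        if self.held.get(client, 0) == 0:
            raise ValueError("client holds no tokens")
        self.held[client] -= 1
        self.free += 1

    def client_gone(self, client):
        """Reclaim everything a vanished client still held."""
        reclaimed = self.held.pop(client, 0)
        self.free += reclaimed
        return reclaimed
```

With a plain FIFO there is no equivalent of `client_gone`: a token read by a process that is later killed is simply never written back.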
The jobserver specification [0] currently suggests that the FIFO must be a genuine FIFO. For some work we're doing [1][2], we're emulating a FIFO using CUSE/FUSE to allow tracking when consumers disappear to avoid lost tokens. nixos had a similar idea in the past too [3].

There doesn't seem to be a good reason to check that any FIFO passed by the user is actually identifiable as such by `stat()`, so drop the check.

make already does not perform such a check, just the specification isn't clear about it, so we've asked them to clarify it [4].

[0] https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
[1] https://codeberg.org/amonakov/guildmaster
[2] https://gitweb.gentoo.org/proj/steve.git/
[3] NixOS/nixpkgs#314888
[4] https://savannah.gnu.org/bugs/index.php?67726
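A rough Python sketch of the client side of this change, assuming the `--jobserver-auth=fifo:PATH` and `--jobserver-auth=R,W` forms described in the GNU make manual [0]. The function names and the `require_fifo` switch are hypothetical; the switch only contrasts the old behaviour (reject anything `stat()` does not report as a FIFO) with the new one (open whatever path was passed).

```python
import os
import re
import stat

_AUTH = re.compile(r"--jobserver-auth=(\S+)")

def parse_jobserver_auth(makeflags):
    """Return ("fifo", path), ("fds", (r, w)), or None if absent/unparseable."""
    matches = _AUTH.findall(makeflags)
    if not matches:
        return None
    value = matches[-1]  # the manual says only the last instance is relevant
    if value.startswith("fifo:"):
        return ("fifo", value[len("fifo:"):])
    read_fd, _, write_fd = value.partition(",")
    try:
        return ("fds", (int(read_fd), int(write_fd)))
    except ValueError:
        return None

def open_jobserver_path(path, require_fifo=False):
    """Open the jobserver path, optionally insisting on a real named pipe."""
    if require_fifo and not stat.S_ISFIFO(os.stat(path).st_mode):
        raise ValueError("jobserver path is not a FIFO")
    return os.open(path, os.O_RDWR)
```

With `require_fifo=False`, a regular file on FUSE or a CUSE character device is opened just like a named pipe would be.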
The jobserver specification [0] currently suggests that the FIFO must be a genuine FIFO. For some work we're doing [1][2], we're emulating a FIFO using CUSE/FUSE to allow tracking when consumers disappear to avoid lost tokens. nixos had a similar idea in the past too [3].

make already does not perform such a check, just the specification isn't clear about it, so we've asked them to clarify it [4].

[0] https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
[1] https://codeberg.org/amonakov/guildmaster
[2] https://gitweb.gentoo.org/proj/steve.git/
[3] NixOS/nixpkgs#314888
[4] https://savannah.gnu.org/bugs/index.php?67726

Signed-off-by: Sam James <sam@gentoo.org>
Do not require that the jobserver path corresponds to an actual named pipe. Token-accounting jobserver implementations such as steve [1] and guildmaster [2] use a character device instead, while the draft version of nixos-jobserver [3] uses a pseudo-file via FUSE. GNU make does not enforce a named pipe either.

[1] https://gitweb.gentoo.org/proj/steve.git
[2] https://codeberg.org/amonakov/guildmaster
[3] NixOS/nixpkgs#314888

Signed-off-by: Michał Górny <mgorny@gentoo.org>
Remove the requirement that the jobserver "fifo" is actually a named pipe. Named pipes are essentially stateless, and therefore carry a high risk of a killed process leaving the server with no tokens left, and no clear way to reclaim them. Therefore, multiple jobserver implementations use FUSE instead:

- [nixos-jobserver](NixOS/nixpkgs#314888) (WIP) uses simple file on FUSE
- [steve](https://gitweb.gentoo.org/proj/steve.git) uses a character device via CUSE
- [guildmaster](https://codeberg.org/amonakov/guildmaster) uses a character device via CUSE

This is compatible with GNU make and Ninja, since they do not check the file type, and seems to be the only solution that can achieve state tracking while preserving compatibility.

CC @amonakov

---------

Signed-off-by: Michał Górny <mgorny@gentoo.org>
…(#169154)

Remove the requirement that the jobserver "fifo" is actually a named pipe. Named pipes are essentially stateless, and therefore carry a high risk of a killed process leaving the server with no tokens left, and no clear way to reclaim them. Therefore, multiple jobserver implementations use FUSE instead:

- [nixos-jobserver](NixOS/nixpkgs#314888) (WIP) uses simple file on FUSE
- [steve](https://gitweb.gentoo.org/proj/steve.git) uses a character device via CUSE
- [guildmaster](https://codeberg.org/amonakov/guildmaster) uses a character device via CUSE

This is compatible with GNU make and Ninja, since they do not check the file type, and seems to be the only solution that can achieve state tracking while preserving compatibility.

CC @amonakov

---------

Signed-off-by: Michał Górny <mgorny@gentoo.org>
Description of changes
Retake of #143820 where I unfortunately fucked up trying to debug kernel issues and btrfs.
Original motivations by @pennae:
`make -jN -lN` in stdenv is a very blunt instrument. it works well when max-jobs=1, but as nix-level parallelism increases it becomes increasingly deficient. starting from a low-load situation we start max-jobs * N compilers, loadavg goes through the roof, the `-lN` load limit kicks in and inhibits new compilers starting until loadavg has fallen below N, at which point all make instances spawn a lot of new compilers and loadavg goes through the roof again. this oscillation leaves the system underutilized in low phases and overcommitted in high phases.
testing the current stdenv against a jobserver with 26 tokens on a 12C/24T machine shows that parallel builds of llvm_{8..11} run about 7% faster (35:52min for stdenv, 33:30min with jobserver), a larger build of llvm{5..13} is about 11% faster (1:27h for stdenv, 1:17h with jobserver). (removing the `-l` from stdenv also improves utilization but is less efficient. preliminary testing here shows that `-l${1.5 * N}` may be a good alternative to `-lN` as used currently, #141266 could be a good vector to go for that instead of this whole mess. [more testing says that `-l2N` would be a minimum to get better utilization, but so far every `-l` setting we've tried has produced some underutilization except excessively large numbers like 6N or higher])
nothing in here should be regarded as a final suggestion in any way, it's more of a "hey look, this might just work". as such it's extremely rough around the edges, eg to use the jobserver the experimenter currently has to bring a `/jobserver` fifo filled with tokens into the nix sandbox.
is this something worth pursuing? a 10% speedup for hydra does seem tempting
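For readers unfamiliar with the mechanism, a minimal Python sketch of what "a fifo filled with tokens" means. This is not the setup used in the experiment. The function names and the `+` token byte are arbitrary choices for illustration, and opening a FIFO with `O_RDWR` is relied on here as Linux behaviour rather than something POSIX guarantees.

```python
import os

def make_token_fifo(path, tokens):
    """Create a named pipe at `path` and preload it with `tokens` one-byte tokens."""
    os.mkfifo(path, 0o666)
    # O_RDWR keeps both ends open in this process so the open does not block
    # (Linux behaviour); O_NONBLOCK makes an empty pipe raise instead of hanging.
    fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    os.write(fd, b"+" * tokens)
    return fd

def try_acquire(fd):
    """Take one token, or return None if the pipe is currently empty."""
    try:
        return os.read(fd, 1) or None
    except BlockingIOError:
        return None

def release(fd, token):
    """Put a previously acquired token back into the pipe."""
    os.write(fd, token)
```

Every build sharing the same pipe draws from the same token pool, so the total number of concurrently running jobs is bounded by the number of preloaded tokens rather than by max-jobs * N.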
todos before this is more generally usable:
Things done
`sandbox = true` in `nix.conf`? (See Nix manual)