Describe the bug
Configuring any S3 store as a substituter causes the builtin:fetchurl builder to hang indefinitely, and silently switches the global OpenSSL engine to S2N (AWS's simplified and audited TLS implementation).
I encountered the bug when bootstrapping nixpkgs.glibc and had trouble reproducing it, because builtin:fetchurl is currently used only rarely, to bootstrap the actual fetchurl, which is itself not affected. As it turns out, the bug also modifies global OpenSSL state, so it could surface anywhere a cryptographic operation is involved.
Steps To Reproduce
The following self-contained shell command reproduces the problem (sudo is required to override substituters):
sudo nix build \
--expr 'import <nix/fetchurl.nix> { url = "https://example.com/file.tar.xz"; sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; }' \
--option sandbox false \
--option substituters 's3://REDACTED' \
--debug --verbose
After failed attempts to access the nonexistent REDACTED bucket, the build hangs indefinitely with the following output:
...
sandbox setup: closing leaked FD 24
sandbox setup: closing leaked FD 25
building '/nix/store/5psjg9667p5wm7shh3jpfxk5rsghr18r-file.tar.xz.drv'...
waiting for children
building of '/nix/store/5psjg9667p5wm7shh3jpfxk5rsghr18r-file.tar.xz.drv!out' from .drv file: read 457 bytes
download thread waiting for 10000 ms
downloading 'https://example.com/file.tar.xz'...
starting download of https://example.com/file.tar.xz
curl: Couldn't find host example.com in the /etc/nix/netrc file; using defaults
waiting for children
building of '/nix/store/5psjg9667p5wm7shh3jpfxk5rsghr18r-file.tar.xz.drv!out' from .drv file: read 152 bytes
curl: Connected to example.com (93.184.216.34) port 443 (#0)
waiting for children
Disabling substituters, or keeping only non-S3 ones, restores the correct behavior (a 404 error). The sandbox option, added explicitly for completeness, makes no difference.
Expected behavior
The execution of builtin:fetchurl should not be affected by the choice of substituters, and it should not hang.
nix-env --version output
nix-env (Nix) 2.6.0pre20220118_4af88a4
Additional context
I found that the forked Nix build process inside the namespaced sandbox is stuck in S2N, AWS's TLS implementation plugged into OpenSSL, in the loop at s2n-tls/main/utils/s2n_random.c:245, trying to read from the closed file descriptor for /dev/urandom. entropy_fd is assigned there before Nix forks the build process, so it still holds a positive value, but that number no longer refers to /dev/urandom. On my system entropy_fd happens to point to a socket opened by libcurl, which cannot produce enough output to seed S2N's random generator, although I believe in some cases it can.
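To illustrate the failure mode, here is a minimal C sketch, assuming a single-threaded process: a descriptor number for /dev/urandom is cached, the descriptor is closed behind the cache's back (as a "close leaked FDs" pass does), and the next allocation reuses the same number. The function name and the pipe standing in for libcurl's socket are hypothetical; this is not S2N's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical illustration, not S2N code.
   Returns 1 if a cached /dev/urandom descriptor, once closed behind the
   cache's back, is reused by a new pipe that has no data to read;
   0 otherwise; -1 on error. */
int stale_entropy_fd_demo(void) {
    /* Library init: open /dev/urandom and cache only the number. */
    int cached_fd = open("/dev/urandom", O_RDONLY);
    if (cached_fd < 0)
        return -1;

    /* Sandbox setup: a "close leaked FDs" pass closes it, but the cache
       still holds the old number. */
    close(cached_fd);

    /* Something else (here a pipe, in Nix a libcurl socket) is opened.
       New descriptors get the lowest free number, i.e. cached_fd. */
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;
    int reused = (pipefd[0] == cached_fd);

    /* The pipe's write end is open but nothing is written, so the stale
       number never becomes readable: a blocking read() would wait forever.
       poll() with a 100 ms timeout shows this without hanging. */
    struct pollfd p = { .fd = cached_fd, .events = POLLIN };
    int ready = poll(&p, 1, 100);

    close(pipefd[0]);
    close(pipefd[1]);
    if (ready < 0)
        return -1;
    return reused && ready == 0;
}
```

stale_entropy_fd_demo() returns 1 when the stale number has been reused by an object that never yields data, which has the same shape as the S2N entropy loop waiting on libcurl's socket.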
S2N replaces the default OpenSSL implementation (engine) once Aws::InitAPI is called and libs2n.so is loaded, and as far as I can tell there is no way to prevent it from modifying global state, since s2n_init() calls ENGINE_set_default.
Commenting out the closeMostFDs call lets Nix finish the download attempt (with a 404 error in the example), which confirms that the problem is caused by closing the "leaked" file descriptors.
I attempted different approaches to fix this.
Patch S2N
Using getrandom() instead of /dev/urandom does solve the issue, but I doubt it can be merged upstream, as this system call is relatively new. Nevertheless, I wrote an overlay to demonstrate it.
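A minimal sketch of the getrandom() variant, assuming Linux with glibc 2.25 or later (the system call itself appeared in Linux 3.17): entropy is requested from the kernel directly, so there is no cached descriptor for a close-leaked-FDs pass to invalidate. fill_random is a hypothetical helper name; this is not the code of the overlay.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill buf with len bytes from the kernel CSPRNG
   without holding any file descriptor.
   Returns 0 on success, -1 on error (errno is set). */
int fill_random(void *buf, size_t len) {
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* e.g. ENOSYS on kernels older than 3.17 */
        }
        p += n;             /* getrandom may return fewer bytes than asked */
        len -= (size_t)n;
    }
    return 0;
}
```

The ENOSYS case is exactly the portability concern above: older kernels would still need a /dev/urandom fallback.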
A more thorough approach is to manage /dev/urandom descriptors properly, the way OpenSSL does. That logic could be ported to S2N, but I'm not sure whether S2N contains other simplifications like this one, so OpenSSL, with its maturity, seems more appealing to me and a better fit for Nix's cross-platform goals. Perhaps S2N shouldn't be used in the first place.
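A minimal sketch of that descriptor management, following the idea as I understand OpenSSL's Unix entropy code: record the identity (device, inode, file type) of /dev/urandom when opening it, and re-check it with fstat() before every read, reopening if the cached number now refers to something else. The struct and function names are hypothetical, and the code is single-threaded for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cached entropy source: descriptor plus the identity of the
   file it was opened on. Not thread-safe; a sketch only. */
struct entropy_src {
    int fd;
    dev_t dev;
    ino_t ino;
    dev_t rdev;
    mode_t type; /* S_IFMT bits of st_mode */
};

static struct entropy_src src = { .fd = -1 };

/* True if src.fd is still open and still refers to the device we opened. */
static int entropy_fd_still_valid(void) {
    struct stat st;
    if (src.fd < 0 || fstat(src.fd, &st) != 0)
        return 0;
    return st.st_dev == src.dev && st.st_ino == src.ino &&
           st.st_rdev == src.rdev && (st.st_mode & S_IFMT) == src.type;
}

/* (Re)open /dev/urandom and record its identity. A stale number is only
   forgotten, never closed: it may now belong to someone else. */
static int entropy_open(void) {
    struct stat st;
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || !S_ISCHR(st.st_mode)) {
        close(fd);
        return -1;
    }
    src.fd = fd;
    src.dev = st.st_dev;
    src.ino = st.st_ino;
    src.rdev = st.st_rdev;
    src.type = st.st_mode & S_IFMT;
    return 0;
}

/* Fill buf with len random bytes, revalidating the cached descriptor first. */
int entropy_read(void *buf, size_t len) {
    if (!entropy_fd_still_valid() && entropy_open() != 0)
        return -1;
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(src.fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;          /* error, or unexpected EOF */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With this check, a cached number that was closed and reused by a socket or pipe fails the fstat() comparison, so entropy_read() reopens /dev/urandom instead of blocking on the unrelated descriptor.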
Get rid of S2N altogether in favor of OpenSSL
If this is acceptable, I think it would be best if the aws-sdk-cpp maintainers added a proper option to build an S2N-less version of the SDK.
I implemented a fix in my fork and am happy to open a PR if that's okay.