AWS SDK breaks builtin:fetchurl builder and overrides OpenSSL engine  #5947

@st8ed

Describe the bug

Setting up any S3 store as a substituter causes the builtin:fetchurl builder to freeze indefinitely and silently switches the global OpenSSL engine to S2N (AWS's simplified and audited implementation of TLS).

I encountered the bug while bootstrapping nixpkgs.glibc and struggled to reproduce it, because builtin:fetchurl is currently used only on rare occasions, to bootstrap the actual fetchurl, which itself is not affected. As it turns out, the bug also modifies OpenSSL's global state and could surface anywhere a cryptographic operation is involved.

Steps To Reproduce

The following self-contained shell command reproduces the problem (sudo is essential to override substituters):

sudo nix build \
    --expr 'import <nix/fetchurl.nix> { url = "https://example.com/file.tar.xz"; sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; }' \
    --option sandbox false \
    --option substituters 's3://REDACTED' \
    --debug --verbose

After the failed attempts to access the nonexistent REDACTED bucket, the build freezes indefinitely with the following output:

...
sandbox setup: closing leaked FD 24
sandbox setup: closing leaked FD 25
building '/nix/store/5psjg9667p5wm7shh3jpfxk5rsghr18r-file.tar.xz.drv'...
waiting for children
building of '/nix/store/5psjg9667p5wm7shh3jpfxk5rsghr18r-file.tar.xz.drv!out' from .drv file: read 457 bytes
download thread waiting for 10000 ms
downloading 'https://example.com/file.tar.xz'...
starting download of https://example.com/file.tar.xz
curl: Couldn't find host example.com in the /etc/nix/netrc file; using defaults
waiting for children
building of '/nix/store/5psjg9667p5wm7shh3jpfxk5rsghr18r-file.tar.xz.drv!out' from .drv file: read 152 bytes
curl: Connected to example.com (93.184.216.34) port 443 (#0)
waiting for children

Disabling substituters, or leaving only non-S3 substituters, restores the correct behavior (a 404 error). The sandbox option is included only for completeness; it makes no difference.

Expected behavior

The actual execution of builtin:fetchurl shouldn't be affected by the choice of substituters and shouldn't get stuck.

nix-env --version output

nix-env (Nix) 2.6.0pre20220118_4af88a4

Additional context

I found that the forked Nix build process inside the namespaced sandbox is stuck in S2N (AWS's TLS implementation), in this loop at s2n-tls/main/utils/s2n_random.c:245, trying to read from a file descriptor for /dev/urandom that has since been closed. entropy_fd is assigned before Nix forks, so it still holds a positive value, but that value no longer refers to /dev/urandom. On my system entropy_fd happens to point to libcurl's open socket, which can't produce enough output to seed S2N's random generator, although I believe in some cases it can.
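The failure mode described above can be imitated outside Nix. The following is a minimal Python sketch, not code from Nix or S2N: a descriptor number for /dev/urandom is cached, the descriptor is closed, and the same number is then made to refer to an unrelated pipe (standing in for libcurl's socket). The function names are hypothetical, and `os.dup2` is used to force the collision deterministically. A non-blocking read is used so the demo reports starvation rather than hanging.

```python
import os
import stat


def kind_of(fd):
    """Coarse label for whatever an open descriptor currently refers to."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISFIFO(mode):
        return "pipe"
    return "other"


def simulate_stale_entropy_fd():
    # Stand-in for an unrelated descriptor such as libcurl's socket.
    unrelated_r, unrelated_w = os.pipe()

    # The library opens /dev/urandom once and remembers only the number.
    entropy_fd = os.open("/dev/urandom", os.O_RDONLY)
    before = kind_of(entropy_fd)

    # The host process closes descriptors it considers leaked.
    os.close(entropy_fd)

    # Force the collision: the cached number now names the pipe. In a real
    # process, ordinary lowest-free-number allocation can cause the same reuse.
    os.dup2(unrelated_r, entropy_fd)
    after = kind_of(entropy_fd)

    # Non-blocking read, so an empty source raises instead of blocking forever.
    os.set_blocking(entropy_fd, False)
    try:
        os.read(entropy_fd, 32)
        starved = False
    except BlockingIOError:
        starved = True

    for fd in (entropy_fd, unrelated_r, unrelated_w):
        os.close(fd)
    return before, after, starved
```

Calling `simulate_stale_entropy_fd()` returns the descriptor's kind before and after the reuse, plus a flag saying whether the read found no data. With a blocking read, as in the reported hang, the caller would simply wait.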

S2N replaces the ordinary OpenSSL implementation (engine) once Aws::InitAPI is called and libs2n.so is loaded. As far as I can tell, there is no way to prevent it from modifying global state, because it calls ENGINE_set_default from s2n_init().

Commenting out the closeMostFDs call allows Nix to finish the download attempt (with a 404 error in the example), which confirms that the problem is caused by closing the 'leaked' file descriptors.

I attempted different approaches to fix this.

Patch S2N

Using getrandom() instead of /dev/urandom solves the issue, but I doubt it can be merged upstream, as this feature is relatively new. Nevertheless, I made an overlay to demonstrate.
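To illustrate why getrandom() sidesteps the problem, here is a small Python sketch (not the patch from the overlay). It obtains random bytes through the getrandom(2) system call, which needs no file descriptor, so closing descriptors cannot invalidate it. The function name `fd_free_random` is hypothetical, and the fallback branch is an assumption for platforms where Python does not expose `os.getrandom`.

```python
import os


def fd_free_random(n):
    """Return n random bytes, preferring getrandom(2) where available."""
    if hasattr(os, "getrandom"):
        # getrandom may return fewer bytes than requested, so loop until done.
        buf = b""
        while len(buf) < n:
            buf += os.getrandom(n - len(buf))
        return buf
    # Fallback: os.urandom uses the platform's own source, which may
    # involve a device file internally.
    return os.urandom(n)
```

For example, `fd_free_random(48)` yields 48 bytes regardless of which descriptors the process has closed in the meantime.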

A much more thorough approach is to ensure proper management of /dev/urandom descriptors, the way it is done in OpenSSL. This could be borrowed and implemented in S2N, but I'm not sure whether S2N contains other simplifications like this one. OpenSSL therefore seems more appealing to me: it is more mature and a better fit for Nix's cross-platform purposes. So perhaps S2N shouldn't be used in the first place.
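One general way to manage a cached /dev/urandom descriptor defensively is to record the device's identity when it is opened and to re-check that identity with fstat before every read. The Python sketch below shows that idea only. Whether it matches OpenSSL's exact checks is an assumption, and the class name `EntropySource` is hypothetical.

```python
import os


class EntropySource:
    """Cached /dev/urandom descriptor that is re-validated before each read."""

    PATH = "/dev/urandom"

    def __init__(self):
        self._fd = -1
        self._identity = None
        self._open()

    def _open(self):
        self._fd = os.open(self.PATH, os.O_RDONLY)
        st = os.fstat(self._fd)
        # Remember which file this descriptor referred to when it was opened.
        self._identity = (st.st_dev, st.st_ino, st.st_rdev)

    def _still_valid(self):
        try:
            st = os.fstat(self._fd)
        except OSError:
            return False  # the descriptor was closed behind our back
        return (st.st_dev, st.st_ino, st.st_rdev) == self._identity

    def read(self, n):
        if not self._still_valid():
            # Do not close the stale number: it may belong to other code now.
            self._open()
        buf = b""
        while len(buf) < n:
            chunk = os.read(self._fd, n - len(buf))
            if not chunk:
                raise OSError("entropy source returned no data")
            buf += chunk
        return buf
```

With this check in place, a read after the descriptor has been closed or reused reopens the device instead of waiting on an unrelated socket or pipe.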

Get rid of S2N altogether in favor of OpenSSL

If this is acceptable, I think it would be better if the maintainers of aws-sdk-cpp added an appropriate interface for building an S2N-less version of the SDK.

I implemented a fix in my fork. I'm happy to open a PR if that's okay.
