Skip to content

Conversation

@kolyshkin
Copy link
Contributor

Go CL 393354 (in Go 1.19beta1) introduced raising the RLIMIT_NOFILE soft limit to the current value of the hard limit.

Further CLs (in Go 1.21, but backported to Go 1.20.x and 1.19.x) introduced more code in between getrlimit and setrlimit syscalls (see Go's src/syscall/rlimit.go).

In runc exec, we use prlimit(2) to set rlimits for the child (runc init) some time after we start it. This results in the following race:

runc exec (parent)                 runc init (child)
------------------                 -----------------
(see (*setnsProcess).start         (see init in src/syscall/rlimit.go)

                                   getrlimit(RLIMIT_NOFILE, &lim)
prlimit                            ....
                                   setrlimit(RLIMIT_NOFILE, &nlim)

As a result, once in a while runc exec NOFILE limit is wrong. I reproduced this in Ubuntu 20.04, running a container with soft and hard NOFILE limit set to 65536 in config.json, and using the following script:

set -e

RUNC=./runc-1.1-go1.19.13
$RUNC --version

i=0
while [ $i -lt 100000 ]; do
  LIM=$($RUNC exec bionic sh -c "ulimit -n");
  if [ $LIM -ne 65536 ]; then
    echo "WHOOPSIE (iter $i) got numfile $LIM";
    break;
  fi;
  ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i";
  ((i++)) || true
done

Running it for a few minutes, you'll get:

WHOOPSIE (iter 3148) got numfile 1024

This seems like a go runtime bug (the changes did not account for the possibility of setprlimit from another process), but while it's being fixed, let's use this workaround of applying child's limits at a predefined time (after all init functions).

NOTE it does not make sense to add a test case because it takes many iterations to hit the issue, and we can not afford it in CI.

Go CL 393354 [1] (in Go 1.19beta1) introduced raising the RLIMIT_NOFILE
soft limit to the current value of the hard limit.

Further CLs (in Go 1.21, but backported to Go 1.20.x and 1.19.x)
introduced more code in between getrlimit and setrlimit syscalls
(see Go's src/syscall/rlimit.go).

In runc exec, we use prlimit to set rlimits for the child (runc init)
some time after we start it. This results in the following race:

```
runc exec (parent)                 runc init (child)
------------------                 -----------------
(see (*setnsProcess).start         (see init in src/syscall/rlimit.go)

                                   getrlimit(RLIMIT_NOFILE, &lim)
prlimit                            ....
                                   setrlimit(RLIMIT_NOFILE, &nlim)
```

As a result, once in a while runc exec NOFILE limit is wrong. I
reproduced this in Ubuntu 20.04, running a container with soft and hard
NOFILE limit set to 65536 in config.json, and using the following
script:

```

set -e

RUNC=./runc-1.1-go1.19.13
$RUNC --version

i=0
while [ $i -lt 100000 ]; do
  LIM=$($RUNC exec bionic sh -c "ulimit -n");
  if [ $LIM -ne 65536 ]; then
    echo "WHOOPSIE (iter $i) got numfile $LIM";
    break;
  fi;
  ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i";
  ((i++)) || true
done
```

Running it for a few minutes, you'll get:

> WHOOPSIE (iter 3148) got numfile 1024

This seems like a go runtime bug (the changes did not account for the
possibility of setprlimit from another process), but while it's being
fixed, let's use this workaround of applying child's limits at a
predefined time (after all init functions).

NOTE it does not make sense to add a test case because it takes many
iterations to hit the issue, and we can not afford it in CI.

[1] https://go-review.googlesource.com/c/go/+/393354

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

Closing in favor of #4265

@kolyshkin kolyshkin closed this May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant