Docker fails to start container if certain syscalls are restricted by seccomp #22252

@jpallen

Description of problem: The following syscalls must be provided in the seccomp profile, even if they are not used by the process that is run in the container:

capget
capset
chdir
fchown
futex
getdents64
getpid
getppid
lstat
openat
prctl
setgid
setgroups
setuid
stat

If any of these is not allowed, the container fails to run, with an error message that varies depending on the missing syscall. I'm not familiar with the internals, but I suspect the seccomp profile is applied before the container is set up, and that these syscalls are needed for the container setup. For a security model that allows limiting syscalls, it should also be possible to deny these calls if they are not needed by the process that is actually run.

docker version:

Client:
 Version:      1.11.0-rc5
 API version:  1.23
 Go version:   go1.5.3
 Git commit:   6178547
 Built:        Mon Apr 11 21:16:15 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0-rc5
 API version:  1.23
 Go version:   go1.5.3
 Git commit:   6178547
 Built:        Mon Apr 11 21:16:15 2016
 OS/Arch:      linux/amd64

docker info:

Containers: 6
 Running: 0
 Paused: 0
 Stopped: 6
Images: 3
Server Version: 1.11.0-rc5
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 74
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 4.4.0-18-generic
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.953 GiB
Name: sl-lin-stag-clsi-2
ID: 53XM:GFZU:I5QC:MGBM:HPGW:KENC:ZBK7:GN4C:TGXD:JJPC:AO3S:IVM4
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

uname -a: Linux sl-lin-stag-clsi-2 4.4.0-18-generic #34-Ubuntu SMP Wed Apr 6 14:01:02 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.): Test machine was a VPS from Linode

How reproducible: Very. Just run a container with any of the above syscalls removed from the default docker profile.

Steps to Reproduce:

  1. Create a seccomp profile for echo with only the syscalls it needs (check using strace):
$ strace echo hi 2>&1 | grep -v '+++ exited' | cut -d'(' -f1 | grep -v ')' | sort | uniq
access
arch_prctl
brk
close
execve
exit_group
fstat
mmap
mprotect
munmap
open
read
write

The seccomp profile which allows only these syscalls is attached:
echo-seccomp-profile.json.txt
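
For readers without the attachment, here is a minimal sketch of how such a whitelist profile could be generated from a list of syscall names. It is not the attached file. The helper name make_profile is hypothetical, and the JSON layout (a defaultAction of SCMP_ACT_ERRNO plus one name/action/args entry per allowed syscall) is assumed to match the format the Docker 1.11 default profile uses; the optional architectures field is left out.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): print a seccomp profile that
# denies every syscall except the names passed as arguments.
# Assumption: the one-entry-per-syscall layout used by Docker 1.11 profiles.
make_profile() {
    printf '{\n  "defaultAction": "SCMP_ACT_ERRNO",\n  "syscalls": [\n'
    first=1
    for name in "$@"; do
        # Separate entries with a comma, but emit none before the first.
        [ "$first" -eq 1 ] || printf ',\n'
        first=0
        printf '    {"name": "%s", "action": "SCMP_ACT_ALLOW", "args": []}' "$name"
    done
    printf '\n  ]\n}\n'
}
```

With the function loaded, the profile for the strace output above could be written with:

$ make_profile access arch_prctl brk close execve exit_group fstat mmap mprotect munmap open read write > echo-seccomp-profile.json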

  2. Try to run echo in a container with this profile:
$ sudo docker run -it --rm --security-opt seccomp=echo-seccomp-profile.json ubuntu echo hi
docker: Error response from daemon: rpc error: code = 2 desc = "oci runtime error: open /proc/self/fd: operation not permitted".

Expected Results: It should run the echo command.

Additional info: I found the list of syscalls that docker needs (in the first paragraph of this report) by removing each syscall in turn from the default profile and seeing which removals cause a run of echo hi to fail in a container (excluding the ones explicitly needed by echo). It is likely that some of access, arch_prctl, brk, close, execve, exit_group, fstat, mmap, mprotect, munmap, open, read, write are also needed for the setup of the container, rather than just by the echo command, since these are pretty fundamental calls.
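
The one-at-a-time removal described above can be sketched roughly as follows. This is a simplified reconstruction, not the exact procedure I ran: it assumes the allowed syscalls are kept as a flat list of names in a file, one per line (the real default profile also carries other fields), and the function names find_required and run_echo_with are hypothetical. The -it flags are dropped because the loop runs without a terminal.

```shell
#!/bin/sh
# find_required LIST CMD...
# For each syscall name in LIST (one per line), write the list without that
# name to a temp file, run CMD with the temp file as its last argument, and
# print the name if CMD fails.
find_required() {
    list=$1
    shift
    tmp=$(mktemp) || return 1
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        grep -vxF -e "$name" "$list" > "$tmp"
        # Redirect stdin so CMD cannot consume the rest of LIST.
        if ! "$@" "$tmp" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done < "$list"
    rm -f "$tmp"
}

# Hypothetical runner: turn a list of names into a whitelist profile
# (assumed Docker 1.11 layout) and try to run echo under it.
run_echo_with() {
    profile=$(mktemp) || return 1
    {
        printf '{"defaultAction": "SCMP_ACT_ERRNO", "syscalls": ['
        sed 's/.*/{"name": "&", "action": "SCMP_ACT_ALLOW", "args": []}/' "$1" | paste -sd, -
        printf ']}\n'
    } > "$profile"
    docker run --rm --security-opt seccomp="$profile" ubuntu echo hi
    status=$?
    rm -f "$profile"
    return "$status"
}
```

Given a file allowed-syscalls.txt (a hypothetical name) holding the names from the default profile, the candidates would be listed with:

$ find_required allowed-syscalls.txt run_echo_with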
