-
Notifications
You must be signed in to change notification settings - Fork 3k
Description
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
This is distilled down from a larger example, reproducing only with coreutils xargs / timeout + sleep
This is a pretty reliable example:
$ yes | head -10 | xargs --replace -P5 timeout 2 podman run --rm --init ubuntu:focal sleep 5
ERRO[0000] container not running
ERRO[0002] Error forwarding signal 18 to container d9ae2ff8a421a835771930f9743210c994e2a162df6d4099eded6f5609048c6c: error sending signal to container d9ae2ff8a421a835771930f9743210c994e2a162df6d4099eded6f5609048c6c: `/usr/lib/cri-o-runc/sbin/runc kill d9ae2ff8a421a835771930f9743210c994e2a162df6d4099eded6f5609048c6c 18` failed: exit status 1
ERRO[0000] container not running
ERRO[0002] Error forwarding signal 18 to container 69b4770e164abce77ecb0c0105414ed59a274296d13703eeaebc00b5f7873283: error sending signal to container 69b4770e164abce77ecb0c0105414ed59a274296d13703eeaebc00b5f7873283: `/usr/lib/cri-o-runc/sbin/runc kill 69b4770e164abce77ecb0c0105414ed59a274296d13703eeaebc00b5f7873283 18` failed: exit status 1
occasionally I'll get outputs like this:
+ timeout 2 podman run --rm runner-image:latest python3 -c 'import time; time.sleep(5)'
ERRO[0002] Error forwarding signal 18 to container 473036e98db1884e7a6baac667a135db7e5176b9c34db0c885b6bc10efd6fd11: can only kill running containers. 473036e98db1884e7a6baac667a135db7e5176b9c34db0c885b6bc10efd6fd11 is in state stopped: container state improper (runner-image is this one)
My actual use case ends like this however:
time="2020-10-19T15:07:21-07:00" level=error msg="container not running"
time="2020-10-19T15:07:21-07:00" level=error msg="Error forwarding signal 15 to container 31e767a031b7cd086dbda154b9df5adaff694fd111e7feedee748d91c7eedcdb: error sending signal to container 31e767a031b7cd086dbda154b9df5adaff694fd111e7feedee748d91c7eedcdb: `/usr/lib/cri-o-runc/sbin/runc kill 31e767a031b7cd086dbda154b9df5adaff694fd111e7feedee748d91c7eedcdb 15` failed: exit status 1"
With -i, it occasionally deadlocks, breaking most podman commands (such as podman ps for that user). the only way I've been able to recover the deadlock is to kill -9 that podman process
+ timeout 2 podman run --rm -i runner-image:latest python3 -c 'import time; time.sleep(5)'
(hung forever)
strace of podman ps shows what looks like a spinlock with a timeout:
$ strace podman ps
...
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x28849e8, FUTEX_WAIT_PRIVATE, 0, NULL
the signals in question are:
15: SIGTERM
18: SIGCONT
Steps to reproduce the issue:
- See commands above
Describe the results you received:
See output above
Describe the results you expected:
I expect things to exit gracefully, and not deadlock
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
$ podman version
Version: 2.1.1
API Version: 2.0.0
Go Version: go1.15.2
Built: Wed Dec 31 16:00:00 1969
OS/Arch: linux/amd64
Output of podman info --debug:
$ podman info --debug
host:
arch: amd64
buildahVersion: 1.16.1
cgroupManager: cgroupfs
cgroupVersion: v1
conmon:
package: 'conmon: /usr/libexec/podman/conmon'
path: /usr/libexec/podman/conmon
version: 'conmon version 2.0.20, commit: '
cpus: 5
distribution:
distribution: ubuntu
version: "20.04"
eventLogger: journald
hostname: babibox
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 5.4.0-51-generic
linkmode: dynamic
memFree: 5532704768
memTotal: 8348299264
ociRuntime:
name: runc
package: 'cri-o-runc: /usr/lib/cri-o-runc/sbin/runc'
path: /usr/lib/cri-o-runc/sbin/runc
version: 'runc version spec: 1.0.2-dev'
os: linux
remoteSocket:
path: /run/user/1000/podman/podman.sock
rootless: true
slirp4netns:
executable: /usr/bin/slirp4netns
package: 'slirp4netns: /usr/bin/slirp4netns'
version: |-
slirp4netns version 1.1.4
commit: unknown
libslirp: 4.3.1-git
SLIRP_CONFIG_VERSION_MAX: 3
swapFree: 1964396544
swapTotal: 1964396544
uptime: 5h 13m 0.85s (Approximately 0.21 days)
registries:
search:
- docker.io
- quay.io
store:
configFile: /home/asottile/.config/containers/storage.conf
containerStore:
number: 2
paused: 0
running: 2
stopped: 0
graphDriverName: overlay
graphOptions:
overlay.mount_program:
Executable: /usr/bin/fuse-overlayfs
Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
Version: |-
fusermount3 version: 3.9.0
fuse-overlayfs: version 1.1.0
FUSE library version 3.9.0
using FUSE kernel interface version 7.31
graphRoot: /home/asottile/.local/share/containers/storage
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "false"
imageStore:
number: 23
runRoot: /run/user/1000/containers
volumePath: /home/asottile/.local/share/containers/storage/volumes
version:
APIVersion: 2.0.0
Built: 0
BuiltTime: Wed Dec 31 16:00:00 1969
GitCommit: ""
GoVersion: go1.15.2
OsArch: linux/amd64
Version: 2.1.1
Package info (e.g. output of rpm -q podman or apt list podman):
$ apt list podman
Listing... Done
podman/unknown,now 2.1.1~2 amd64 [installed]
podman/unknown 2.1.1~2 arm64
podman/unknown 2.1.1~2 armhf
podman/unknown 2.1.1~2 s390x
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
VirtualBox + aws