Compat API suspected fd-related hang w/ aardvark-dns #13932

@cevich

/kind bug

Description

While introducing new Fedora 36 VM images (#13376), an APIv2 (REST API) test was found to hang (log).

Steps to reproduce the issue:

  1. Clone PR #13376 (Update to F36 CI VM Images + Testing netavark/aardvark-dns) at commit 58c9ecd3e09c60bc7d4bc29bc732653e6acdad9b

  2. Run hack/get_ci_vm.sh APIv2 test on fedora-36

  3. At the shell prompt in the VM, run ./contrib/cirrus/runner.sh
    -or- run env CONTAINERS_CONF=$GOSRC/test/apiv2/containers.conf PODMAN=$GOSRC/bin/podman pytest --verbose --disable-warnings ./test/apiv2/python
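
The direct pytest invocation in step 3 can be wrapped in a hard timeout so that a hang is reported locally instead of waiting for the CI timeout. The following is a minimal sketch, not part of the podman test tooling: the helper names run_with_timeout and reproduce, the 600-second limit, and the fallback to the current directory when GOSRC is unset are all assumptions made for illustration.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, extra_env=None, timeout_s=600.0, cwd=None):
    """Run cmd and return (timed_out, returncode); returncode is None on timeout."""
    env = dict(os.environ)
    env.update(extra_env or {})
    try:
        # Output is inherited rather than captured, so no extra pipe is created.
        done = subprocess.run(cmd, env=env, cwd=cwd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        return True, None
    return False, done.returncode


def reproduce(timeout_s=600.0):
    """Hypothetical wrapper around the pytest command from step 3."""
    gosrc = os.environ.get("GOSRC", os.getcwd())  # assumption: fall back to cwd
    timed_out, rc = run_with_timeout(
        ["pytest", "--verbose", "--disable-warnings", "./test/apiv2/python"],
        extra_env={
            "CONTAINERS_CONF": os.path.join(gosrc, "test/apiv2/containers.conf"),
            "PODMAN": os.path.join(gosrc, "bin/podman"),
        },
        timeout_s=timeout_s,
        cwd=gosrc,
    )
    print("Timed out!" if timed_out else f"pytest exited with {rc}")
    return timed_out, rc
```

Calling reproduce() inside the VM runs the same command as above. Only the direct pytest process is killed on timeout, so any containers or helper processes it started may need to be cleaned up by hand.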

Describe the results you received:

============================= test session starts ==============================
platform linux -- Python 3.10.4, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
python client -- requests library
rootdir: /var/tmp/go/src/github.com/containers/podman
plugins: requests-mock-1.8.0
collecting ... collected 42 items
test/apiv2/python/rest_api/test_v2_0_0_container.py::ContainerTestCase::test_attach SKIPPED [  2%]
...cut...
test/apiv2/python/rest_api/test_v2_0_0_manifest.py::ManifestTestCase::test_manifest_409 PASSED [ 64%]
test/apiv2/python/rest_api/test_v2_0_0_network.py::NetworkTestCase::test_connect 
Timed out!

Describe the results you expected:

All tests should pass.

Additional information you deem important (e.g. issue happens only occasionally):

Setting --log-level=debug does not appear to matter (see 58c9ecd3e09c60bc7d4bc29bc732653e6acdad9b), nor does whether the logformatter pipe is involved (see 09edb5345c9d54d8bc33962035ab71baee861aca). However, if the logformatter is in place and the tests are not run in verbose mode (see 99d4e401ec594010c13ed7e162c99a1fca32407b), buffered output is lost, which makes it incorrectly appear that the bats-based tests are failing.

Output of podman version:

version:
  APIVersion: 4.0.0-dev
  Built: 1650392165
  BuiltTime: Tue Apr 19 13:16:05 2022
  GitCommit: 58c9ecd3e09c60bc7d4bc29bc732653e6acdad9b
  GoVersion: go1.18
  Os: linux
  OsArch: linux/amd64
  Version: 4.0.0-dev

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.26.0-dev
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 2
  distribution:
    distribution: fedora
    variant: cloud
    version: "36"
  eventLogger: journald
  hostname: cirrus-task-4942077850025984
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.17.3-300.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 1454792704
  memTotal: 4109578240
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 4109365248
  swapTotal: 4109365248
  uptime: 3m 50.47s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.0-dev
  Built: 1650392165
  BuiltTime: Tue Apr 19 13:16:05 2022
  GitCommit: 58c9ecd3e09c60bc7d4bc29bc732653e6acdad9b
  GoVersion: go1.18
  Os: linux
  OsArch: linux/amd64
  Version: 4.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

Fedora release 36 (Thirty Six)
Kernel:  5.17.3-300.fc36.x86_64
Cgroups:  cgroup2fs
conmon-2.1.0-2.fc36-x86_64
containers-common-1-56.fc36-noarch
container-selinux-2.183.0-1.fc36-noarch
criu-3.16.1-12.fc36-x86_64
crun-1.4.4-1.fc36-x86_64
golang-1.18-1.fc36-x86_64
libseccomp-2.5.3-2.fc36-x86_64
netavark-1.0.2-1.fc36-x86_64
package aardvark is not installed
package containernetworking-plugins is not installed
podman-4.0.3-1.fc36-x86_64
runc-1.1.1-1.fc36-x86_64
skopeo-1.7.0-1.fc36-x86_64
slirp4netns-1.2.0-0.2.beta.0.fc36-x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

BIG FAT WARNING/HEADS-UP: At the time of creating this issue, this failure is believed to indicate that downstream users of the compat API in podman 4 with netavark/aardvark-dns will experience similar or related negative effects. Current speculation is that it is related to file-descriptor handling somewhere between pytest and aardvark-dns. If true, issues of this kind are notoriously difficult to debug (and sometimes to fix), so progress on this issue may be slow.
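
To make the speculation concrete, here is a generic illustration of one well-known class of file-descriptor problem: a long-lived background process that inherits the write end of a pipe keeps the reader from ever seeing EOF, even after the process that started it has exited. This is a sketch under stated assumptions only; it does not involve podman, pytest, or aardvark-dns, it is not a diagnosis of this issue, and the function name seconds_until_eof is made up for the example.

```python
import subprocess
import sys
import time


def seconds_until_eof(leak_fd, linger_s=2.0):
    """Return how long the parent waits for EOF on the child's stdout pipe.

    The child prints one line, starts a lingering background process
    (a stand-in for any daemon), and exits immediately.
    """
    # When leak_fd is False the background process gets /dev/null instead
    # of inheriting the child's stdout, which is the parent's pipe.
    redirect = "" if leak_fd else (
        ", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL"
    )
    child_src = (
        "import subprocess, sys\n"
        "print('child done', flush=True)\n"
        "subprocess.Popen([sys.executable, '-c', "
        f"'import time; time.sleep({linger_s})']{redirect})\n"
    )
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", child_src],
                            stdout=subprocess.PIPE)
    # read() returns only once every holder of the pipe's write end closes it.
    proc.stdout.read()
    proc.wait()
    return time.monotonic() - start
```

With leak_fd=True the read blocks for roughly linger_s seconds although the child exits almost at once; with a background process that never exits, the read would block indefinitely, which from the outside looks like a hung test.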

Labels

kind/bug, locked - please file new issue/PR
