[Bug]: Listing network from Docker fails during container removal #17341

@Agalin

Issue Description

If you start the Podman API server and inspect a network through Docker (or a Docker-compatible library, e.g. the one used by GitLab Runner), the response also includes a list of the containers in that network. This list exists for backward compatibility with Docker; Podman's own output doesn't show that data.

But if a container is currently being removed (or added, I'm not sure which), the request fails with:

Error response from daemon: container <container id> does not exist in database: no such container

The interesting part is that you get the same error even if you run docker network ls instead of docker network inspect <network>.

I believe it may be the cause of this GitLab Runner issue (or at least one of the causes), as well as a similar error that I have only observed with Podman 4.4 and that, as far as I know, has not been reported to GitLab yet.

Steps to reproduce the issue

  1. Start Podman API server (podman system service).
  2. Create a network (podman network create test).
  3. Configure native docker to use Podman's socket (export DOCKER_HOST=unix://<path to socket>).
  4. Create containers in a loop (see the example below the list).
  5. Watch the output of either docker network list or docker network inspect test (see the example below the list).

Example creation loop:

while true
do
    podman run -d --rm -ti --network test fedora sleep 1;
done

Example watch (you need to open the file to find those lines; the terminal control sequences used to clear the screen are stored in it, so a simple cat won't work):

watch -tn 0.1 --exec docker network ls | tee -a test.log
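As an alternative to watch, a plain polling loop keeps terminal control sequences out of the log and records only the failing calls. This is a minimal sketch, not something I have run against the setup above: poll_failures is a hypothetical helper name, and the 0.1 s interval simply mirrors the watch command.

```shell
# Run a command repeatedly and append the output of failing runs to a log file.
# Usage: poll_failures <log file> <number of attempts> <command> [args...]
poll_failures() {
    pf_log=$1
    pf_count=$2
    shift 2
    pf_i=0
    while [ "$pf_i" -lt "$pf_count" ]; do
        # Capture stdout and stderr together; log only when the command exits non-zero.
        if ! pf_out=$("$@" 2>&1); then
            printf '%s %s\n' "$(date -u +%FT%TZ)" "$pf_out" >> "$pf_log"
        fi
        pf_i=$((pf_i + 1))
        sleep 0.1
    done
}

# Example (assumes DOCKER_HOST already points at Podman's socket):
# poll_failures test.log 600 docker network ls
```

Each logged line starts with a UTC timestamp, which makes it easier to match failures against the server log.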

Describe the results you received

The Podman server sometimes fails with a "container not found" error. Log entry:

INFO[0052] Request Failed(Internal Server Error): container cf3b535ee60be15c9b5ed36240caa923119be4f06f9ff80bd98620d3c7e3ef3e does not exist in database: no such container 
@ - - [02/Feb/2023:17:15:09 +0000] "GET /v1.41/networks HTTP/1.1" 500 178 "" "Docker-Client/20.10.23 (linux)"
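The failing request in the log is a plain GET on the Docker-compatible /v1.41/networks endpoint, so it should also be possible to hit it directly, without the Docker CLI. This is a sketch under two assumptions: curl is installed, and the socket lives at the path reported by podman info. The function names networks_status and watch_networks are hypothetical.

```shell
# Print only the HTTP status code of GET /v1.41/networks on the given socket.
# Assumption: the default path is the rootless socket from `podman info`.
# The host part of the URL ("d") is a placeholder; it is not used for the connection.
networks_status() {
    ns_sock=${1:-/run/user/993/podman/podman.sock}
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --unix-socket "$ns_sock" http://d/v1.41/networks
}

# Query the endpoint a number of times and report every non-200 answer.
# Usage: watch_networks <socket path> <number of attempts>
watch_networks() {
    wn_sock=$1
    wn_tries=$2
    wn_i=0
    while [ "$wn_i" -lt "$wn_tries" ]; do
        wn_code=$(networks_status "$wn_sock")
        [ "$wn_code" = "200" ] || echo "attempt $wn_i: HTTP $wn_code"
        wn_i=$((wn_i + 1))
    done
}

# Example: watch_networks /run/user/993/podman/podman.sock 1000
```

Dropping --output /dev/null and --write-out prints the raw JSON, which would show whether the list response carries a containers field at all.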

Describe the results you expected

No errors for either request.

If Podman finds it cannot retrieve a container's details because the container no longer exists, it should just omit that container from the network inspect output.

In the case of network list, I'm not even sure there is a reason to build this container list in the first place: does the JSON response contain that field? I don't see an option in Docker's CLI to show containers in this view.

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpuset
  - cpu
  - memory
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /usr/local/libexec/podman/conmon
    version: 'conmon version 2.1.5, commit: 4cb1e4d73699ce0cef2c3d89b652b3d15be429b3'
  cpuUtilization:
    idlePercent: 98.16
    systemPercent: 0.57
    userPercent: 1.27
  cpus: 4
  distribution:
    distribution: fedora
    variant: cloud
    version: "37"
  eventLogger: file
  hostname: <cut>
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 993
      size: 1
    - container_id: 1
      host_id: 200000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 993
      size: 1
    - container_id: 1
      host_id: 200000
      size: 65536
  kernel: 6.1.8-200.fc37.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1381150720
  memTotal: 8329515008
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.7.2-3.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.7.2
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/993/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/993/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8328572928
  swapTotal: 8328835072
  uptime: 27h 58m 5.00s (Approximately 1.12 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.io
    MirrorByDigestOnly: false
    Mirrors: <cut>
    Prefix: docker.io
    PullFromMirror: ""
    ...: <cut>
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/gitlab-runner/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 1
    stopped: 9
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/gitlab-runner/.local/share/containers/storage
  graphRootAllocated: 52527345664
  graphRootUsed: 5485887488
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/user/993/containers
  transientStore: false
  volumePath: /home/gitlab-runner/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.0
  Built: 1675343283
  BuiltTime: Thu Feb  2 13:08:03 2023
  GitCommit: ""
  GoVersion: go1.19.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Running on Fedora 37 in a VM with self-compiled Podman and conmon. SELinux enabled.

The same error was observed earlier using the latest Fedora 37 packages (Podman 4.3.1, conmon 2.1.5).

The same GitLab Runner issue was observed even earlier (on Podman 4.3.0 with older conmon, runc, netavark, aardvark, etc.), although I don't have the exact versions or a way to confirm 100% that it is caused by the same problem. If it is, then the oldest report (from the author of that GitLab issue) comes from Podman 3.4.2.

Additional information

No response
