
nerdctl fails when running concurrently due to CNI errors: CHAIN_USER_ADD failed (File exists): chain CNI-ISOLATION-STAGE-2 #2908

@aojea

Description


See in kubernetes-sigs/kind#3533

Command Output: time="2024-04-01T08:34:37Z" level=fatal msg="failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time=\"2024-04-01T08:34:34Z\" level=fatal msg=\"failed to call cni.Setup: plugin type=\\\"firewall\\\" failed (add): running [/usr/sbin/iptables -t filter -N CNI-ISOLATION-STAGE-2 --wait]: exit status 4: iptables v1.8.7 (nf_tables):  CHAIN_USER_ADD failed (File exists): chain CNI-ISOLATION-STAGE-2\\n\"\nFailed to write to log, write /var/lib/nerdctl/1935db59/containers/default/18e88fcb538d49417539810b2567922886e120771be00f165b5d64cf41a381f5/oci-hook.createRuntime.log: file already closed: unknown"

Stack Trace: 
sigs.k8s.io/kind/pkg/errors.WithStack
	sigs.k8s.io/kind/pkg/errors/errors.go:59
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
	sigs.k8s.io/kind/pkg/exec/local.go:124
sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl.createContainerWithWaitUntilSystemdReachesMultiUserSystem
	sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl/provision.go:383
sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl.planCreation.func3
	sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl/provision.go:123
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
	sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
	runtime/asm_amd64.s:1598
Error: Process completed with exit code 1.

Steps to reproduce the issue

This seems to be reproducible by running multiple containers in parallel: at some point the CNI plugins race with each other and one of them fails.

Describe the results you received and expected

CNI is a nice and simple approach to container networking, but that same simplicity makes it fall short for more complex operations.
When implementing more advanced features, the chaining model executes separate binaries that each perform different operations, and some of those operations need to be synchronized across containers.
Docker and Podman moved to models other than CNI (libnetwork and netavark, respectively) because of this. I don't think that is strictly necessary, though: CNI can still handle these problems if nerdctl creates its own CNI plugin implementation instead of relying on a composition of multiple reference plugins.

I'm happy to collaborate on this if needed. I'll just need a bit of bootstrapping on the requirements, but it does not seem like a complicated problem.

What version of nerdctl are you using?

NERDCTL_VERSION: 1.7.4

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

Metadata

Labels: bug (Something isn't working)
