Conversation

@laurazard
Member

@laurazard laurazard commented Aug 18, 2024

This PR moves the init exit processing in s.handleStarted out of the if exited block – regardless of whether the exec was an early exit or not, if it was the last pending exec we should process its exit. We've already removed c from s.pendingExecs, and this being the last pending exec means that after this we'll never process the exit if it doesn't happen here.

This is important to ensure correctness in the shared-pid namespace scenario (together with the changes from #10634) – with these changes we:

  • prevent new execs from being started after the init process of the container has been reaped, and
  • ensure that even if the last exec succeeds (as it may in the shared-pid namespace scenario) we don't drop the init exit (see the sketch below).
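To make the intended control flow concrete, here is a minimal, self-contained sketch of the idea; the names (containerState, handleStartedExec, deferredInitExit) are hypothetical stand-ins rather than the shim's actual types. The point is only that a deferred init exit gets emitted once the last pending exec has been handled, whether or not that exec exited early.

```go
package main

import "fmt"

type processExit struct {
	pid    int
	status int
}

type containerState struct {
	pendingExecs     int          // execs started but not yet fully handled
	deferredInitExit *processExit // init exit held back while execs were pending
}

// handleStartedExec models the relevant part of handling a started exec.
func handleStartedExec(c *containerState, execExitedEarly bool) {
	c.pendingExecs--

	if execExitedEarly {
		// ... handle the exec's own early exit here ...
		fmt.Println("exec exited early; handling its exit")
	}

	// Previously this ran only in the early-exit branch above, so a successful
	// last exec (possible with a shared PID namespace) dropped the init exit.
	// Here it runs unconditionally once no execs remain pending.
	if c.pendingExecs == 0 && c.deferredInitExit != nil {
		fmt.Printf("emitting deferred init exit: %+v\n", *c.deferredInitExit)
		c.deferredInitExit = nil
	}
}

func main() {
	c := &containerState{
		pendingExecs:     1,
		deferredInitExit: &processExit{pid: 42, status: 137},
	}
	// Even though the last pending exec did not exit early,
	// the container's init exit still gets emitted.
	handleStartedExec(c, false)
}
```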

@k8s-ci-robot

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ningmingxiao
Contributor

ningmingxiao commented Aug 18, 2024

An exec process may have child processes. If the parent process exits, PID 1 becomes the parent of those children, and they may not be killed.
Also, with this change in behavior, we can't exec into a container once the init process has exited.

@laurazard
Member Author

Sorry, I don't quite understand what you mean. This PR doesn't change anything about which processes are killed, just when. We have a check – runc.ShouldKillAllOnExit, which checks whether the container has a private PID namespace. This is the same as before.

However, this is only called when we're about to send the exit event for the init process – which is now delayed if there are pending execs. We shouldn't delay this – we can run this logic as soon as we receive the exit for the init process, regardless of whether we're going to delay sending that exit or not.
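As a rough sketch of that ordering (again with hypothetical stand-ins, initProcess/handleInitExit, not the shim's real code): the kill-all decision runs as soon as the init exit is received, and only the exit event itself may be deferred while execs are pending.

```go
package main

import "fmt"

// initProcess and killAll are placeholders for the shim's init process handle
// and its KillAll call; privatePidNamespace stands in for the check that
// runc.ShouldKillAllOnExit performs.
type initProcess struct{ pid int }

func (p *initProcess) killAll() {
	fmt.Printf("killing remaining processes in pid %d's namespace\n", p.pid)
}

// handleInitExit runs the kill-all logic immediately on receiving the init
// exit; only the exit *event* is deferred while execs are still pending.
func handleInitExit(p *initProcess, privatePidNamespace bool, pendingExecs int) (deferEvent bool) {
	if privatePidNamespace {
		p.killAll()
	}
	return pendingExecs > 0
}

func main() {
	p := &initProcess{pid: 42}
	if handleInitExit(p, true, 2) {
		fmt.Println("init exit event deferred until pending execs are handled")
	}
}
```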

@laurazard laurazard changed the base branch from release/1.7 to main August 18, 2024 21:33
@laurazard laurazard force-pushed the runc-shim-missing-init-exit-fix branch from 8db9224 to 1c4fd15 Compare August 18, 2024 21:36
@laurazard laurazard force-pushed the runc-shim-missing-init-exit-fix branch from 1c4fd15 to 47f0829 Compare August 18, 2024 22:07
@ningmingxiao
Contributor

ningmingxiao commented Aug 19, 2024

> Sorry, I don't quite understand what you mean. This PR doesn't change anything about which processes are killed, just when. We have a check – runc.ShouldKillAllOnExit, which checks whether the container has a private PID namespace. This is the same as before.
>
> However, this is only called when we're about to send the exit event for the init process – which is now delayed if there are pending execs. We shouldn't delay this – we can run this logic as soon as we receive the exit for the init process, regardless of whether we're going to delay sending that exit or not.

crictl exec -it containerdA sh
sleep 1000&

sleep will not be killed by ip.KillAll

@laurazard
Member Author

laurazard commented Aug 19, 2024

That might be true, but that's not behavior introduced by this PR.

Also, wouldn't sleep get reparented in this case?

@laurazard laurazard force-pushed the runc-shim-missing-init-exit-fix branch from 47f0829 to 1c8283a Compare August 19, 2024 13:03
@ningmingxiao
Contributor

ningmingxiao commented Aug 19, 2024

I tried this PR, and the issue still exists.

[root@LIN-FB738BFD367 pods]# cat container.json 
{
  "metadata": {
    "name": "gpu"
  },
  "image": {
    "image": "docker.io/library/busybox:1.28"
  },
  "command": [
    "top"
  ],
  "tty": true,
  "linux": {
  }
}
[root@LIN-FB738BFD367 pods]# cat pod.json 
{
  "metadata": {
    "name": "sandbox-01027951909",
    "namespace": "default",
    "attempt": 1,
    "uid": "hdishd83djaidwnduwk28bcsb"
  },
  "log_directory": "/tmp",
  "linux": {
  }
}

main.go:

package main

import (
        "fmt"
        "os/exec"
        "sync"
        "time"
)

func main() {
        go func() {
                time.Sleep(time.Second*1)
                runCmd("kill -9 `pidof top`")
        }()
        test()
        fmt.Println("done")
}

func runCmd(command string) {
        cmd := exec.Command("bash", "-c", command)
        err := cmd.Run()
        if err != nil {
                fmt.Println(err)
        }
}

func test() {
        var wg sync.WaitGroup
        const numContainers = 100
        for i := 1; i <= numContainers; i++ {
                wg.Add(1)
                go func(i int) {
                        defer wg.Done()
                        runCmd("crictl exec 7d307  date")
                }(i)
        }
        wg.Wait()
}
[root@LIN-FB738BFD367 pods]# crictl  run --no-pull container.json  pod.json
7d307a9dea00e40dc726aaeac172436ab0aeb8652bd407f804948713df296e35
[root@LIN-FB738BFD367 pods]# go run main.go
[root@LIN-FB738BFD367 pods]# crictl  ps
CONTAINER           IMAGE                            CREATED             STATE               NAME                ATTEMPT             POD ID              POD
7d307a9dea00e       docker.io/library/busybox:1.28   3 minutes ago       Running             gpu                 0                   d4c701491262a

top has exited:
root     1322712  0.1  0.0 1241848 16184 pts/0   Sl   21:36   0:00 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id d4c701491262aa245be999b
root     1322744  0.0  0.0   1024     4 ?        Ss   21:36   0:00  \_ /pause

For k8s, just set shareProcessNamespace: true (share the pause pid ns) in the pod yaml (if we don't use crictl to create pods),
and running main.go can easily reproduce it.

@laurazard
Member Author

As I mentioned before, I unfortunately can't repro – could you share the containerd logs you get when you reproduce the issue? If possible, with some extra logging such as in 511ed30.

@laurazard
Member Author

laurazard commented Aug 19, 2024

> For k8s, just set shareProcessNamespace: true (share the pause pid ns) in the pod yaml (if we don't use crictl to create pods),
> and running main.go can easily reproduce it.

If this issue occurs when sharing the PID namespace, then we have no guarantee that by the time we receive the init's exit, we have received (or will soon receive) the execs' exits. In the meantime, I think (need to confirm) that other execs could "start" execution/increment s.pendingExecs, meaning we would continue delaying emitting the init exit up until a point where none of the current pending exec exit subscribers still contain the init exit to emit.

I don't think we guaranteed ordering when sharing a PID namespace before all the recent changes. Maybe we shouldn't skip processing the init exit in that case.
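A toy illustration of that concern, with hypothetical counters rather than the shim's real bookkeeping: if new execs keep being admitted after the init exit arrives, a rule that only emits the deferred init exit when the pending-exec count reaches zero may never fire.

```go
package main

import "fmt"

func main() {
	pendingExecs := 1      // stand-in for s.pendingExecs
	initExitPending := true // init exit received but not yet emitted

	for round := 0; round < 3; round++ {
		pendingExecs++ // a new exec starts before the previous one is handled
		pendingExecs-- // ...and an exec exit is handled
		if initExitPending && pendingExecs == 0 {
			fmt.Println("deferred init exit emitted")
			initExitPending = false
		}
	}
	if initExitPending {
		fmt.Println("init exit still deferred after all rounds")
	}
}
```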

@laurazard laurazard closed this Aug 23, 2024
@laurazard laurazard reopened this Aug 23, 2024
@laurazard laurazard force-pushed the runc-shim-missing-init-exit-fix branch from a1751c7 to 1c8283a Compare August 23, 2024 21:46
@laurazard laurazard marked this pull request as ready for review August 23, 2024 21:46
@dosubot dosubot bot added the area/runtime Runtime label Aug 23, 2024
If a container isn't running in its own pid namespace/its init process
is not the init process of the pid namespace, it's possible for an
exec started just before the init process exited/was reaped to start
successfully/not exit early.

If we previously skipped sending an init process's exit due to pending
execs, we need to emit it regardless of whether the pending exec was an
early exit or not, otherwise this exit won't get emitted at all.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
@laurazard laurazard force-pushed the runc-shim-missing-init-exit-fix branch from 1c8283a to 8543b69 Compare August 23, 2024 21:51
@laurazard laurazard self-assigned this Aug 23, 2024
@laurazard laurazard requested a review from dmcgowan August 23, 2024 21:53
@laurazard
Member Author

Superseded by #10647

@laurazard laurazard closed this Aug 27, 2024