pkg/cri: add timeout to drain exec io #7832
Conversation
Could there be processes, though, that genuinely take a long time to finish? I get that the given example might be unfortunate, but I'm wondering if there are valid cases where it's desired to wait for the process to finish and return a result.
I think that we should only care about the main exec process. If a child process is still working, the main exec process should wait for it instead of exiting. After the main exec process exits, the kernel re-parents the child to pid 1 in the pid namespace, where it is practically impossible to trace from the shim. And an exec process doesn't have a dedicated cgroup, so once child processes are re-parented it is impossible to trace their life-cycle and hard to clean them up. I don't think this is a valid use case we should support.
What command/scenario are you testing this with? Not sure if I'll have time to review tomorrow, but I'll drop a tidbit for some scenarios where that may happen; I don't think the child should exit if the parent just runs to completion, but my memory's hazy. I would have to fully refresh my mind on the behavior, but I'm fairly sure that by default on Windows child processes inherit their parent's console, so any event (signal) sent to the console gets propagated to the children as well (CTRL-C, CTRL-BREAK, and so on). In those scenarios, yes, the child will likely die as well. I think we tried to map everything but SIGKILL to a console event, and SIGKILL would be mapped to
Some processes created by the exec main process are still running after the exec main process exits, and they are holding the stdout/stderr file descriptors inherited from it. In the runc shim, this blocks forever until those processes exit. So we have to delete the exec-process record in the shim and forcibly release the IO. The first issue about this is described in #3286. cc @thaJeztah
|
Ping @thaJeztah @mikebrow @dmcgowan |
By default, the child processes spawned by the exec process inherit its standard io file descriptors. The shim server creates a pipe as the data channel. Both the exec process and its children write data into the write end of the pipe, and the shim server reads data from it. As long as the write end is still open, the shim server will continue to wait for data from the pipe. So if the exec command is something like `bash -c "sleep 365d &"`, the exec process is `bash`, which quits right after creating `sleep 365d`. But `sleep 365d` will hold the write end of the pipe for a year! It doesn't make sense for the CRI plugin to wait for it. For this case, we should use a timeout to drain the exec process's io instead of waiting for it. Fixes: containerd#7802 Signed-off-by: Wei Fu <fuweid89@gmail.com>
99f74d5 to 652111c
We should validate the drainExecSyncIO timeout at the beginning and raise an error for any invalid input. Signed-off-by: Wei Fu <fuweid89@gmail.com>
```go
)

func TestContainerDrainExecIOAfterExit(t *testing.T) {
	// FIXME(fuweid): support it for windows container.
```
Let's make an issue for this; I don't have enough time to look into how to support this at the moment :/ but it can be a follow-up.
I was trying to test it with `cmd /c powershell Start-Process -FilePath timeout.exe -ArgumentList -1 -NoNewWindow`, but it fails. It seems there's no `nohup` equivalent here. Yeah, it should be handled in the follow-up.
pkg/cri/config/config.go
Outdated
```go
	// Validation for drain_exec_sync_io_timeout
	if c.DrainExecSyncIOTimeout != "" {
		if _, err := time.ParseDuration(c.DrainExecSyncIOTimeout); err != nil {
			return fmt.Errorf("invalid drain exec sync io timeout: %w", err)
```
This can be a follow-up, since the other two timeouts didn't follow this pattern, but it looks like there was previously a pattern of using the toml name of the configuration in the error message reported to the user. So this would be:
```diff
-	return fmt.Errorf("invalid drain exec sync io timeout: %w", err)
+	return fmt.Errorf("invalid `drain_exec_sync_io_timeout`: %w", err)
```
pkg/cri/config/config_test.go
Outdated
```go
		},
		DrainExecSyncIOTimeout: "10",
	},
	expectedErr: "invalid drain exec sync io timeout: time: missing unit in duration \"10\"",
```
Matching against the entire error (meaning including the bit from the stdlib) seems brittle, but I don't know how much the Go team values not changing random error strings 🤷♂️. It's just an `errors.New` defined inline in the function, so if they ever changed the text a little this would fail: https://cs.opensource.google/go/go/+/refs/tags/go1.20.1:src/time/format.go;l=1652
Good point! I'll just verify that the error contains "invalid drain_exec_sync_io_timeout".
```go
	select {
	case <-timerCh:

		_, err := execProcess.Delete(ctx, containerd.WithProcessKill)
		if err != nil {
			if !errdefs.IsNotFound(err) {
				return fmt.Errorf("failed to release exec io by deleting exec process %q: %w",
					execProcess.ID(), err)
			}
		}
		return fmt.Errorf("failed to drain exec process %q io in %s because io is still held by other processes",
			execProcess.ID(), drainExecIOTimeout)
```
I'm confused: if we successfully deleted the process, wouldn't there be a success route here? From line 300 down there's no happy path, just different ways to fail?
Yes. Reaching line 300 means that there are still exec-child-processes running in the container. The delete action only removes the record of the exec-init-process; it doesn't mean that all the resources related to this exec context have been cleaned up. The error message warns the user: hey, you might have a leaky-process issue in your container, please check. It might be caused by an exec-child-process stuck in D (uninterruptible sleep) status, or by a wrong configuration. The error can be helpful.
Okay, the log above (to me) reads as if there's a way to get us unstuck from this state ("Trying to delete exec process to release io"), so it was surprising to see nothing but different failures below. This is fine though; we can reword it if needed.
1. It's easier to check for wrong input if the error message uses `drain_exec_sync_io_timeout`.
2. Avoid matching the full error message, as the part generated by the Go stdlib could change in the future.
3. Delete the extra empty line.

Signed-off-by: Wei Fu <fuweid89@gmail.com>