[Carry 38704] Implement exec kill API#41548
Conversation
36291fc to
edcc15e
Compare
b5e70eb to
abeca6b
Compare
|
Hey @AkihiroSuda I finally managed to fix all the CI errors. Could you take a look at this PR again please? Thank you a lot, and I'm sorry it took me so many tries. For sure squash the commits... |
|
@dtaniwaki PTAL? |
|
Cool! Thank you for updating my PR, @awesomebytes. It looks great to me 👍 |
|
@dtaniwaki @AkihiroSuda I just pushed the changes you asked for to my best of my understanding/abilities. It's on the hands of the CI now! (I did run the unit tests locally first. The integration tests take forever in my machine, it's way faster in your CI). |
|
Hey @AkihiroSuda all tests have passed (there is a flaky test unrelated to my changes that failed apparently?). Are we good to merge? Additionally, how does the timeline look for API 1.42 to be available? Is it a matter of a few months? And adding to that question, if I'd like to use it in my environment already, is there an 'easy' way? Something like generating .deb files from the build in my machine? Thanks again! |
|
v20.10 (API v1.41) is feature-freezed, so probably this PR will be merged after the release of v20.10, before v21.XX (v21.03?). |
1ae413d to
c76a227
Compare
be08b5c to
529491c
Compare
|
The PR is ready for review or hopefully merge again @AkihiroSuda @cpuguy83 The CI failed because of some DNS issue that I believe has nothing to do with this PR. |
api/server/router/container/exec.go
Outdated
There was a problem hiding this comment.
nit: s/postContainerExecKill/exec kill/
api/server/router/container/exec.go
Outdated
There was a problem hiding this comment.
Is this what we do in the container API? I think we should have this in ContainerExecKill rather than here.
There was a problem hiding this comment.
This block appears to be a copy-paste from the postContainersKill:
moby/api/server/router/container/container_routes.go
Lines 248 to 254 in 9a6ff68
Let me know how to proceed for this PR - leave both as-is, update the exec kill one only, or update both?
daemon/exec.go
Outdated
daemon/exec.go
Outdated
There was a problem hiding this comment.
Given that we already have the exec config, it seems better to call the contaunerd API directly than to grab this info again.
|
@awesomebytes are you planning to make these updates? If not I'd be happy to do so and create a new PR. I'd like to see this get merged. |
|
@adamjseitz I recently started a new job, and got a new computer (so I lost my workspace for this) so I don't foresee me having much time to push this forward unfortunately. I've invited you to the repository of my fork so you can directly do any changes and we keep this PR going (and the conversation on just one place, maintainers will probably appreciate it). I hope that works for you! Thanks for taking the baton! |
2a3d0b0 to
e088385
Compare
|
Could you rebase? |
157d306 to
de34cdb
Compare
|
@AkihiroSuda thanks for taking a look. I rebased. CI failures this time looks like these known flaky tests: |
cpuguy83
left a comment
There was a problem hiding this comment.
Overall looks great.
Can you squash commits?
api/types/types.go
Outdated
| } | ||
|
|
||
| // ExecKillOptions is a temp struct used by execKill | ||
| type ExecKillOptions struct { |
There was a problem hiding this comment.
This looks like an unused type?
de34cdb to
585151d
Compare
I removed that unused type squashed it down to three commits. I was not sure if it is important to retain the DCO sign-offs from all three contributors. I can squash it into just one if you prefer. Looks like all tests passed this time. |
It looks like the second and third commit are "touch ups" for the first one, so I think it's fine to squash the commits; w.r.t. the DCO sign-offs, it's ok to have multiple sign-offs in the commit, e.g. |
thaJeztah
left a comment
There was a problem hiding this comment.
Sorry for the slow response; this one kept dropping off my list. Overall, this looks good (thanks!) I left some comments/thoughts inline (feedback/input welcome!)
I did have some other thoughts though (perhaps worth a discussion); while looking at this, I started wondering if there would be situations where a graceful termination of exec processes would be needed.
I realize this endpoint os largely modelled after /container/{id}/kill, but for containers, we also have /container/{id}/stop, which allows for that (defaults to SIGTERM -> timeout -> SIGKILL). ContainerKill also looks to have extra handling;
Line 40 in 12f1b3c
e.g. for possibly dead processes:
Lines 141 to 146 in 12f1b3c
Not sure if all of those are needed for exec though! But I see that ContainerExecStart does have some logic for graceful termination;
Lines 280 to 282 in 12f1b3c
In case the /exec/{id}/kill endpoint is used with a custom signal, I think the only guarantee it would give is that the signal was sent, but if the intent in that case is to check if the process is terminated, would require manual checking (and re-trying with a different signal). Of course, other cases could be to only signal (non-terminal signals, e.g., to reload a process).
api/swagger.yaml
Outdated
| /exec/{id}/kill: | ||
| post: | ||
| summary: "Kill an exec instance" |
There was a problem hiding this comment.
Mostly a "random" thought; I know that we use kill for the equivalent container endpoint (and command, description), and the kill naming probably originates from it being the equivalent of kill on command line (on Linux).
That said; the naming has led to some confusion among users (as, depending on the signal, kill doesn't always kill)
So .. mostly a random thought (could use feedback on this); would it make sense to name this signal (as in: the endpoint would signal (send a signal to) an exec instance)? Wondering if that would be more descriptive.
There was a problem hiding this comment.
The naming and graceful shutdown issues you mentioned above seem to be related, so I'll respond to both here.
As you noted above, /container/{id}/kill has some extra handling to wait for the container to stop if a SIGKILL signal is used. It is somewhat surprising behavior to me that which signal is sent changes other behavior (and this doesn't seem to be noted in the endpoint's documentation). The endpoint does feel a bit muddled to me - is it intended to behave like the Linux kill command, or is it intended to kill the process? It can do either one, but some behaviors, such as "send SIGKILL and don't wait" don't seem to be possible through the API. All this seems to stem from the confusion regarding the name.
Given that, this implementation of exec kill might indeed be better described as /exec/{id}/signal in order to help differentiate it. A second endpoint, /exec/{id}/stop might also be useful, mirroring the behavior of the equivalent container API for graceful shutdown.
There was a problem hiding this comment.
Taking a further look, some more of the container APIs would probably make sense to have exec equivalents:
/container/{id}/execs(or some other way to list execs in a container? not sure if some way to do this exists that I missed)./exec/{id}/attach(looks like this has been previously proposed, see Attach to running exec #9527)/exec/{id}/wait
I don't think that we would want to hold up this PR to create all these, though. I expect the signal endpoint, combined with polling the existing inspect endpoint if the exit status is needed, will fulfill a lot of users' needs.
api/swagger.yaml
Outdated
| /exec/{id}/kill: | ||
| post: | ||
| summary: "Kill an exec instance" | ||
| description: "Kill an exec instance." |
There was a problem hiding this comment.
Perhaps this should use a similar descriptor as is used for container kill;
Lines 6569 to 6572 in 01ae952
| responses: | ||
| 204: | ||
| description: "No error" | ||
| 404: |
There was a problem hiding this comment.
Looks like we need to add 400 as well (invalid parameter), which can happen if an invalid signal is passed.
| ContainerExecInspect(id string) (*backend.ExecInspect, error) | ||
| ContainerExecResize(name string, height, width int) error | ||
| ContainerExecStart(ctx context.Context, name string, stdin io.Reader, stdout io.Writer, stderr io.Writer) error | ||
| ContainerExecKill(ctx context.Context, name string, signame string) error |
There was a problem hiding this comment.
Wondering if we should make this take a (to be created) types.ExecKillConfig, so that more options can be added without making a long list of arguments (I'll leave another comment with some thoughts why this could be needed; also see #43206).
daemon/exec.go
Outdated
| return err | ||
| } | ||
|
|
||
| return daemon.containerd.SignalProcess(ctx, e.ContainerID, name, int(sig)) |
There was a problem hiding this comment.
Do we know what (kind of) errors this can return?
There was a problem hiding this comment.
I tried to trace the possibilities here, and I think we either get a errdefs.NotFound, or the result of errdefs.FromGRPC. I'm not sure which errors we might end up with via the GRPC calls though.
585151d to
0fa0915
Compare
|
I pushed an update addressing your comments.
CI failure looks like a known flaky test #39352. Let me know if we need to do anything differently with errors from |
… to commands in some cases leaving processes still running If `--tty` is not passed to `docker exec` because stdout is not available (`[ ! -t 1 ]`), like due to redirection to file (`&> build.log`) or if stdin is not available (`< /dev/null`), then docker does not forward kill signals to the process started and they remain running. To fix the issue, the `DOCKER_EXEC_PID_FILE_PATH` env variable with the value `/tmp/docker-exec-pid-<timestamp>` is passed to the process called with `docke exec` and the process started stores its pid in the file path passed. Traps are set in `run-docker.sh` that runs the `docker exec` command to receive any kills signals, and if it does, it runs another `docker exec` command to read the pid of the process previously started from `DOCKER_EXEC_PID_FILE_PATH` and then kills it and all its children. See Also: docker/cli#2607 moby/moby#9098 moby/moby#41548 https://stackoverflow.com/questions/41097652/how-to-fix-ctrlc-inside-a-docker-container Also passing `--init` to `docker run` to "Run an init inside the container that forwards signals and reaps processes", although it does not work for above cases, but may helpful in others. The `--init` flag changes will only engage on new container creation. https://docs.docker.com/engine/reference/run/#specify-an-init-process https://docs.docker.com/engine/reference/commandline/run/ ``` ./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo &> build.log ^C $ ./scripts/run-docker.sh ps -efww Running container 'termux-package-builder' from image 'termux/package-builder'... UID PID PPID C STIME TTY TIME CMD builder 1 0 0 05:48 pts/0 00:00:00 bash builder 9243 0 0 06:01 pts/1 00:00:00 bash builder 28127 0 0 06:12 ? 00:00:00 /bin/bash ./build-package.sh -f libjpeg-turbo builder 28141 28127 0 06:12 ? 00:00:00 /bin/bash ./build-package.sh -f libjpeg-turbo builder 28449 28141 1 06:12 ? 00:00:00 ninja -w dupbuild=warn -j 8 builder 28656 28449 0 06:12 ? 00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28657 28656 79 06:12 ? 00:00:01 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28694 28449 0 06:12 ? 00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28695 28694 89 06:12 ? 00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28728 28449 0 06:12 ? 00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28729 28728 0 06:12 ? 00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28731 28449 0 06:12 ? 00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28734 28731 0 06:12 ? 00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28740 28449 0 06:12 ? 00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28741 28740 0 06:12 ? 00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28744 0 0 06:12 pts/2 00:00:00 ps -efww builder 28748 28449 0 06:12 ? 00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28752 28748 0 06:12 ? 00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28753 28449 0 06:12 ? 00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28754 28753 0 06:12 ? 00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang builder 28755 28449 0 06:12 ? 00:00:00 ninja -w dupbuild=warn -j 8 $ ./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo &> build.log $ ./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo Running container 'termux-package-builder' from image 'termux/package-builder'... ERROR: Another build is already running within same environment. ```
|
@thaJeztah I guess this PR will need to be updated again and reviewed again? |
I'd be happy to update it again if there's a chance it's actually going to get merged this time. |
|
I would love to see this feature available in Docker! I'm not terribly familiar with Moby's internals, but I can give updating it a shot. |
Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com> Signed-off-by: Sammy Pfeiffer <sammypfeiffer@gmail.com> Signed-off-by: Adam Seitz <adamjseitz@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
0fa0915 to
e6ccb20
Compare
Closes #35703 #9098
Hello this is just a rebase of #38704
I rebased #38704 (which has all the rest of the questions answered).
Note that I do not know go... I just really would like to have this feature implemented. I hope there is CI that can pickup any problem I generated while rebasing.