don't use canceled context to send KILL signal to healthcheck process#43739
don't use canceled context to send KILL signal to healthcheck process#43739thaJeztah merged 1 commit intomoby:masterfrom
Conversation
|
do we want a test for this? Or too complicated to set one up? |
|
I don't think the failure was related, so I kicked CI again, but posting here in case there's a flaky test at hand that we're not tracking yet; edit: already known as flaky; #23626 |
f529426 to
557b602
Compare
557b602 to
b4c051c
Compare
0100d64 to
25a6430
Compare
25a6430 to
e975155
Compare
|
Something failing; |
f937722 to
250d005
Compare
5d3979e to
291dcc4
Compare
|
This is weird; it's as if Windows is running an older commit before the quote was fixed; At least, I don't see anything wrong looking at this 🤔; Test: []string{"CMD-SHELL", `sh -c 'sleep 60'`}, |
|
I guess |
7a34314 to
a50a635
Compare
|
failure on Windows |
a50a635 to
cfb06cd
Compare
integration/container/health_test.go
Outdated
| err := apiClient.ContainerKill(ctx, cID, "SIGKILL") | ||
| assert.NilError(t, err) |
There was a problem hiding this comment.
Let's try if this helps with the cleanup step on Windows 🤞
Unfortunately; that didn't help, only shifting / repeating the problem; So it looks like on Windows, the health check causes an issue with stopping the container (?). Of course the question is; was that already the case (but not tested before?), or is there an issue in the implementation here? (Perhaps an issue with signal handling / incorrect signals for Windows, idk) |
cfb06cd to
5976cb3
Compare
integration/container/health_test.go
Outdated
|
|
||
| cID := container.Run(ctx, t, apiClient, func(c *container.TestContainerConfig) { | ||
| c.Config.Healthcheck = &containertypes.HealthConfig{ | ||
| Test: []string{"CMD", "sleep", "60"}, |
There was a problem hiding this comment.
Other attempt; trying CMD ["sleep", "60"] instead of starting sleep in a shell.
|
Getting more interesting now; running the command without shell looks to have fixed some instances of Windows. Possibly because Windows may be running a different version of BusyBox (ISTR older versions of BusyBox didn't propagate signals, and newer versions do). However, on Windows 2019 and Windows 2022 running with containerd, |
5976cb3 to
e419012
Compare
integration/container/health_test.go
Outdated
|
|
||
| cID := container.Run(ctx, t, apiClient, func(c *container.TestContainerConfig) { | ||
| c.Config.Healthcheck = &containertypes.HealthConfig{ | ||
| Test: []string{"CMD-SHELL", `sh -c "sleep 60"`}, |
There was a problem hiding this comment.
grmbl. back to the original issue on Windows on this one;
health_test.go:111: polling check failed: expected "Health check exceeded timeout (50ms)", got "60\": line 0: syntax error: unterminated quoted string\n"
There was a problem hiding this comment.
Relevant CI logs: builtin, containerd-shim-runhcs-v1
In both cases, the CommandLine string ultimately passed into the Win32API CreateProcess is
cmd /S /C "sh -c \"sleep 60\""
After lots of research and experimentation on a Windows box... my head hurts. Here's what I know so far:
- A Windows process' command and arguments are passed in as a single string, which is the string specified in the
CommandLinepassed intoCreateProcess. - Each process parses its own command line. Not all processes use the same algorithm.
- As a corollary,
cmd.exedoes not tokenize its subprocess' command lines for them.
- As a corollary,
cmd.exe /S /Cfollows some arcane rules, but the gist is that everything following the/Cis taken literally as a command line, with one exception: if the first non-whitespace character following the/Cis a double-quote character, the command line is modified by dropping that character and the last double-quote character in the remaining command-line.
Putting that all together:
cmd /S /C "sh -c \"sleep 60\"" is equivalent to interactively running
C:\> sh -c \"sleep 60\"
Executing that command results in CreateProcess() being called with CommandLine set to sh -c \"sleep 60\".
The Windows C Runtime parses the CommandLine sh -c \"sleep 60\" into the argv
argv[0] = sh
argv[1] = -c
argv[2] = \"sleep
argv[3] = 60\"
and busybox ash gets very confused.
The ultimate issue is that cmdProbe is incorrectly escaping the shell command line in a manner which the Windows C runtime would parse as a single argument, expecting cmd.exe /S /C to act like sh -c and consume the next argv as its argument. But cmd.exe does not unescape the argument.
containerd/containerd#5067 is very related to the above, but I don't think it has to do with the timeouts you saw. (My guess is that the timeouts have something to do with sleep.exe not existing.)
There was a problem hiding this comment.
I recall I tested various combinations; any of them would fail on at least one combination 😞
I tried running sleep as main process itself;
[]string{"CMD", "sleep", "60"},
Which would be the equivalent of the default command we set for "sleep" containers;
moby/integration-cli/test_vars_test.go
Lines 6 to 9 in 7b9275c
Tried running it as child process of sh;
[]string{"CMD", "sh", "-c", "sleep 60"},
Tried passing it as single string, but I think Linux wasn't a fan of that
[]string{"CMD-SHELL", `sh -c "sleep 60`},
And the original one (which (like above), and if I'm not mistaken, would run sh as child process of cmd, so that didn't feel right);
[]string{"CMD-SHELL", "sh", "-c", "sleep 60"},
There was a problem hiding this comment.
One thing I wonder is if the Busybox image should also be updated to set the SHELL to /bin/sh; currently it doesn't, which means that (in Dockerfiles), the default shell would still be cmd or powershell (depending on the base image)
There was a problem hiding this comment.
Also wondering if exec.Config needs to have a CommandLine property added; it just occurred to me that we have that for the container's main process, but don't have it for exec 🤔
moby/vendor/github.com/opencontainers/runtime-spec/specs-go/config.go
Lines 41 to 43 in 2646bea
There was a problem hiding this comment.
One thing I wonder is if the Busybox image should also be updated to set the
SHELLto/bin/sh
Also wondering if
exec.Configneeds to have aCommandLineproperty added
Both excellent ideas I wholeheartedly support!
64c7eab to
eb75fb8
Compare
|
windows 2022 / containerd; |
Terminating the exec process when the context is canceled has been broken since Docker v17.11 so nobody has been able to depend upon that behaviour in five years of releases. We are thus free from backwards- compatibility constraints. Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com> Co-authored-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com> Signed-off-by: Cory Snider <csnider@mirantis.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
eb75fb8 to
4b84a33
Compare
|
|
||
| // TestHealthCheckProcessKilled verifies that health-checks exec get killed on time-out. | ||
| func TestHealthCheckProcessKilled(t *testing.T) { | ||
| skip.If(t, testEnv.RuntimeIsWindowsContainerd(), "FIXME: Broken on Windows + containerd combination") |
|
Some unrelated failures; not sure I've seen this one before, but doesn't look related at least (kicked CI once more) |
|
Alright, let's get this one in |
Looks like this was addressed, so dismissing this one.
- What I did
send KILL signal to healthcheck process when the context is canceled (timeout)
- How I did it
Use a background context to send KILL signal
- How to verify it
see reproduction scenario on #43737
- Description for the changelog
fix healthcheck process termination on timeout
- A picture of a cute animal (not mandatory but encouraged)
