Conversation
|
ping @tiborvass @SamWhited @kolyshkin ptal |
testutil/daemon/daemon.go
Outdated
There was a problem hiding this comment.
Looks like currently, d.log is always the default noplogger, so nothing gets logged (I didn't find any occurrences of WithTestLogger() used anywhere)
I wonder if we should/could have these logs printed only in the case that the test fails. I seem to recall that was the default behaviour of Go tests (see golang/go#21461), but it looks like having -v (verbose) prints each test, including logs and things printed on stdout/stderr (and for the junit.xml files to be created, I think we need -v)
There was a problem hiding this comment.
Dropped this commit, because this would also unmount the daemon root if live-restore is enabled, probably causing this failure:
=== FAIL: amd64.integration-cli TestDockerDaemonSuite/TestExecWithUserAfterLiveRestore (2.90s)
--- FAIL: TestDockerDaemonSuite/TestExecWithUserAfterLiveRestore (2.90s)
daemon.go:26: Creating a new daemon at: "/go/src/github.com/docker/docker/bundles/test-integration/TestDockerDaemonSuite/TestExecWithUserAfterLiveRestore"
docker_cli_daemon_test.go:2723: assertion failed: expression is false: err == nil: Output: unable to find user test: no matching entries in passwd file
2010345 to
ab15be9
Compare
|
Whoops; made a typo; fixed (also found some more small changes) |
ab15be9 to
bfdeb6c
Compare
testutil/daemon/daemon_unix.go
Outdated
There was a problem hiding this comment.
While at it, we can switch to using mount.Unmount() which does all the same things.
Yet better, use RecursiveUnmount() to further simplify things here.
There was a problem hiding this comment.
I now recall I had a PR that made changes here; #36511, but for some reason it kept failing.
Let me try using the RecursiveUnmount() that sounds like a clean approach (if it works)
There was a problem hiding this comment.
I take that back. Neither filepath.Walk() nor RecursiveUnmount() should be used here, as per #36511 (comment)
There was a problem hiding this comment.
I.e. the approach in #36511 is the correct one; I would take that one here )
There was a problem hiding this comment.
Let me rebase the other one, to keep this one simpler (and in case it's still problematic). I can rebase it on top of this one
testutil/daemon/daemon.go
Outdated
There was a problem hiding this comment.
Would be good to note in the commit message that this fixes bogus error from removing docker.pid that was never created.
There was a problem hiding this comment.
If this is just to fix the error when the pid file doesn't exist, maybe check `os.IsNotExist so we don't ignore anything else? Something like:
if d.pidFile != "" {
if err := os.Remove(d.pidFile); !os.IsNotExist(err) {
return err
}
}
return nilThere was a problem hiding this comment.
@SamWhited I would
- not care about error returned from
os.Remove()at all - not return any error here as this will fail the test
I.e. current approach looks fine for me.
kolyshkin
left a comment
There was a problem hiding this comment.
left a few nitpicks and one important comment about double d.cmd.Wait()
6940646 to
6d82e2a
Compare
There was a problem hiding this comment.
I know this was already in there, but it seems rather fragile. Maybe we could file an issue to fix this later, or if it's already returning a specific error type we could check if it's the appropriate type?
There was a problem hiding this comment.
Perhaps we could have it check the daemon logs to check if it's the actual error we're expecting (although that one also changed between versions);
19.03
docker run -it --rm --privileged -v /var/lib/docker docker:19.03-dind dockerd --bridge=nosuchbridge --bip=1.1.1.1
# ....
# failed to start daemon: You specified -b & --bip, mutually exclusive options. Please specify only one17.06
docker run -it --rm --privileged -v /var/lib/docker docker:17.06-dind dockerd --bridge=nosuchbridge --bip=1.1.1.1
# ....
# Error starting daemon: You specified -b & --bip, mutually exclusive options. Please specify only oneThere was a problem hiding this comment.
oh, actually, that's the error we're looking for below 🤦♂
testutil/daemon/daemon.go
Outdated
There was a problem hiding this comment.
If this is just to fix the error when the pid file doesn't exist, maybe check `os.IsNotExist so we don't ignore anything else? Something like:
if d.pidFile != "" {
if err := os.Remove(d.pidFile); !os.IsNotExist(err) {
return err
}
}
return nil
testutil/daemon/daemon.go
Outdated
There was a problem hiding this comment.
Should we also log the error here in case a pid file still exists that needs to be cleaned up?
There was a problem hiding this comment.
@SamWhited I initially had that (see #40062 (comment)); question is though; is it important to fail (or log) the failure to remove the pidfile (if the test otherwise completed successfully)?
There was a problem hiding this comment.
It could potentially let us know that something is wrong with our tests. I'd say that it's certainly not worth failing tests over, but is worth logging personally, but I defer to your judgement.
6d82e2a to
083519d
Compare
|
Argh; Windows failed, but for some reason, Windows RS5 never prints the actual failure It could be a compile error, and if I remember correctly, Windows RS1 does print the actual failure (using the exact same script 🤷♂); I'm gonna trigger a rebuild with RS1 enabled to see if I can find what's wrong. Probably also open a ticket for it, so that someone with more Windows knowledge could help fixing that |
|
There you go; Windows RS1 correctly shows the compile error, whereas RS5 just hides it (and both report all unit tests to have "passed", but possibly the problem occurs after that); |
Before:
DONE 2 tests in 12.272s
---> Making bundle: .integration-daemon-stop (in bundles/test-integration)
umount: bundles/test-integration/root: mountpoint not found
After:
DONE 2 tests in 14.650s
---> Making bundle: .integration-daemon-stop (in bundles/test-integration)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
… cleanup
test-daemons remove their docker.pid when stopped, so the `.integration-daemon-stop`
script did not find the mounts for those daemons, and therefore was not unmounting
them.
As a result, cleaning up the bundles directory on consecutive runs of the tests would fail;
rm: cannot remove 'bundles/test-integration/TestDockerSwarmSuite/TestSwarmInit/d1f188f3f5472/root': Device or resource busy
This patch unmounts the root directory of the daemon as part of the cleanup step.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes it easier to debug issues with tests that start multiple daemons. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This patch stores the location of the pidfile, so that we can use the
same path that was set to create it. If no pidfile was created, we'll
not try to remove it.
We're now also ignoring errors when removing the pidfile, as they should
not fail the test (especialy if no pidfile was created in the first place,
as that could potentially hide the actual failure).
This may help with "failures" such as the one below:
```
FAIL: check_test.go:347: DockerSwarmSuite.TearDownTest
check_test.go:352:
d.Stop(c)
/go/src/github.com/docker/docker/internal/test/daemon/daemon.go:414:
t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
... Error: Error while stopping the daemon d1512c423813a : remove /go/src/github.com/docker/docker/bundles/test-integration/DockerSwarmSuite.TestServiceLogs/d1512c423813a/docker.pid: no such file or directory
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If the daemon was stopped successfully in one of the retry-loops,
the function would return early;
```go
for {
select {
case err := <-d.Wait:
---> the function returns here, both on "success" and on "fail"
return err
case <-time.After(20 * time.Second):
...
```
In that case, the pidfile would not be cleaned up. This patch changes
the function to clean-up the pidfile in a defer, so that it will
always be removed after succesfully stopping the daemon.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`daemon.StartWithLogFile()` already creates a goroutine that calls `d.cmd.Waits()` and sends its return to the channel, `d.Wait`. This code called `d.cmd.Wait()` one more time, and returns the error, which may produce an error _because_ it's called a second time, and potentially cause an incorrect test-result. (thanks to Kir Kolyshkin for spotting this) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
083519d to
293c1a2
Compare
|
all green now @cpuguy83 @tiborvass PTAL |
tiborvass
left a comment
There was a problem hiding this comment.
LGTM but we should probably also add more t.Helper() in testutil/daemon (when daemons are starting/stopping the file lines are completely useless)
agreed; I didn't go looking for other ones yes; this one just stood out while testing my changes |
See individual commits for a description of the changes