
integration-cli: Replace sleeps with polling in swarm lock/unlock tests #33541

Merged
thaJeztah merged 1 commit into moby:master from aaronlehmann:lock-unlock-tests-poll on Jul 7, 2017

Conversation

@aaronlehmann

This will hopefully make the tests more robust by replacing a fixed 3s
sleep with a polling loop that looks at whether the key PEM file is
encrypted or not.

cc @cyli

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann added the status/failing-ci label (indicates that the PR in its current state fails the test suite) on Jun 6, 2017
@aaronlehmann
Author

A test failed in experimental. @cyli, do you have any idea what I might have missed?

@cyli
Contributor

cyli commented Jun 6, 2017

@aaronlehmann I'm trying to track that down. In the logs it looks like there were 10 seconds between when the daemon was restarted and when leadership election finished and the node was caught up via snapshot by the leader, so it should have succeeded (defaultReconciliationTimeout is 30 seconds).

@cyli
Contributor

cyli commented Jun 7, 2017

Inconclusive update so far: I don't think it's an issue with the test. When I replicate this failure with a bunch of extra logging enabled, the daemon does get a snapshot update from one of the managers when it starts up. However, on the cluster watch started when the manager first comes up, the manager does not seem to spot the unlock key change in that cluster object for some reason. Still trying to track down why.

@cpuguy83
Member

Any news here?

@cyli
Contributor

cyli commented Jun 20, 2017

Apologies, I hadn't had a chance to replicate again. The VM I was running this on went a couple of days without failing the test, until my machine rebooted. :| Will try again.

@cyli
Contributor

cyli commented Jun 21, 2017

@aaronlehmann has pointed out that when restoring a snapshot, the raft store in swarm deletes all existing clusters and creates new clusters from the snapshot, so we may not get an update event. Am working on a fix.
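The snapshot-restore behaviour described above can be illustrated with a toy watch stream. This is not swarmkit's actual watch API, just a sketch of the failure mode: a restore replaces the cluster object via a delete followed by a create, so a watcher that refreshes its state only on update events never picks up the new unlock key. The event types and `lastSeenKey` helper here are invented for the illustration.

```go
package main

import "fmt"

// EventKind models the kinds of events a store watch can emit.
type EventKind int

const (
	Create EventKind = iota
	Update
	Delete
)

// Event is a toy watch event carrying the cluster's unlock key.
type Event struct {
	Kind      EventKind
	UnlockKey string
}

// lastSeenKey replays a watch stream and returns the unlock key the
// watcher ends up with. handleCreates controls whether Create events
// also refresh the key; handling them is analogous to the fix in
// moby/swarmkit#2281.
func lastSeenKey(stream []Event, handleCreates bool) string {
	key := ""
	for _, ev := range stream {
		switch {
		case ev.Kind == Update:
			key = ev.UnlockKey
		case ev.Kind == Create && handleCreates:
			key = ev.UnlockKey
		}
	}
	return key
}

func main() {
	// Snapshot restore: the old cluster object is deleted and a new
	// one carrying the rotated key is created. No Update is emitted.
	restore := []Event{{Delete, ""}, {Create, "rotated-key"}}
	fmt.Printf("update-only watcher:  %q\n", lastSeenKey(restore, false))
	fmt.Printf("create-aware watcher: %q\n", lastSeenKey(restore, true))
}
```

The update-only watcher ends the replay with an empty key, which mirrors why the restarted manager missed the unlock key change.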

@thaJeztah
Member

@cyli I see moby/swarmkit#2281 was merged; should we bump the swarmkit version and close this one, or are the changes in this PR still needed?

@cyli
Contributor

cyli commented Jun 23, 2017

@thaJeztah This is still a good change, I think; polling is a better idea than a straight 3-second sleep.

@LK4D4
Contributor

LK4D4 commented Jul 6, 2017

Code looks ok. @cyli is this okay to merge?

@cyli
Contributor

cyli commented Jul 6, 2017

Apologies, I thought I had already +1'ed it. LGTM; I wasn't sure whether we needed to rebase or rerun the tests before merging.

@aaronlehmann
Author

Added the rebuild/* label but it doesn't seem to have done anything.

@dnephin dnephin removed the status/failing-ci label (indicates that the PR in its current state fails the test suite) on Jul 6, 2017
@dnephin
Member

dnephin commented Jul 6, 2017

It used to be that rebuild/* didn't work when the failing-ci label was set. I fixed that in master, but I'm not sure we've redeployed since then.

It seems we haven't; removing failing-ci fixed it.

@thaJeztah
Member

Windows seems to be very flaky recently: https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS1/15467/console

19:44:34 ----------------------------------------------------------------------
19:44:34 FAIL: check_test.go:97: DockerSuite.TearDownTest
19:44:34 
19:44:34 check_test.go:98:
19:44:34     testEnv.Clean(c, dockerBinary)
19:44:34 environment/clean.go:67:
19:44:34     t.Fatalf("error removing containers %v : %v (%s)", containers, result.Error, result.Combined())
19:44:34 ... Error: error removing containers [b53ad5b22938] : exit status 1 (Error response from daemon: Could not kill running container b53ad5b22938e29bdcb5a58dfca5676a5ce0ab16f209d1dd73599b5fff1851f3, cannot remove - Cannot kill container b53ad5b22938e29bdcb5a58dfca5676a5ce0ab16f209d1dd73599b5fff1851f3: invalid container: b53ad5b22938e29bdcb5a58dfca5676a5ce0ab16f209d1dd73599b5fff1851f3
19:44:34 )
19:44:34 
19:44:34 
19:44:34 ----------------------------------------------------------------------
19:44:34 PANIC: docker_cli_events_test.go:679: DockerSuite.TestEventsFilterImageInContainerAction
19:44:34 
19:44:34 ... Panic: Fixture has panicked (see related PANIC)
19:44:34 

@aaronlehmann
Author

The Windows rebuild failed immediately:

21:07:02 re-exec error: exit status 1: output: BackupWrite \\?\D:\CI\CI-1700e4c94\daemon\windowsfilter\f45a55918c7e7e83e5a1b321efcabc7415774d9634c8efa0fb91385ba2b471e7\Files\Windows\Speech\Engines\TTS\en-US\MSTTSLocEnUS.dat: There is not enough space on the disk.

@thaJeztah
Member

Trying again, but it looks like some nodes are out of space 😞

ping @jhowardmsft @johnstep

@lowenna
Member

lowenna commented Jul 7, 2017

@thaJeztah yeah a few nodes have been cleared down. I'll do a purge of the others in the morning.

@thaJeztah
Member

All green now. @dnephin @vdemeester LGTY?


@vdemeester vdemeester left a comment


LGTM 🐸


@dnephin dnephin left a comment


LGTM

@thaJeztah thaJeztah merged commit ced0b6c into moby:master Jul 7, 2017

9 participants