integration-cli: Replace sleeps with polling in swarm lock/unlock tests#33541
integration-cli: Replace sleeps with polling in swarm lock/unlock tests#33541thaJeztah merged 1 commit intomoby:masterfrom
Conversation
This will hopefully make the tests more robust by replacing a fixed 3s sleep with a polling loop that looks at whether the key PEM file is encrypted or not. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
|
A test failed in experimental. @cyli do you have any ideas what I might have missed? |
|
@aaronlehmann I'm trying to track that down - in the logs it looks like there were 10 seconds between when the daemon was restarted and when leadership election finished and it was caught up via snapshot by the leader, so it should have succeeded ( |
|
Non-conclusive update so far: I don't think it's an issue with the test - when I replicate this test failure with a bunch of extra logging enabled, it looks like when the daemon starts up, it does get a snapshot update from one of the managers. However, on the cluster watch start (when the manager is first starting up), the manager does not seem to be spotting the unlock key change in that cluster object for some reason. Still trying to track down why. |
|
Any news here? |
|
Apologies, I hadn't had a chance to replicate again. The VM I was running this on ran without failing the test for a couple days until my machine rebooted. :| Will try again. |
|
@aaronlehmann has pointed out that when restoring a snapshot, the raft store in swarm deletes all existing clusters and creates new clusters from the snapshot, so we may not get an update event. Am working on a fix. |
|
@cyli I see moby/swarmkit#2281 was merged; should we bump the swarmkit version, and close this one, or are the change in this PR still needed? |
|
@thaJeztah This is still a good change I think - polling is a better idea than a straight 3 second sleep |
|
Code looks ok. @cyli is this okay to merge? |
|
Apologies I thought I had already +1'ed it. It LGTM - wasn't sure if we needed to rebase or rerun the tests before merging. |
|
Added the |
|
It used to be that rebuild/ didn't work when the failing-ci label was set. I fixed it in master, but I'm not sure if we've redeployed since then. It seems we haven't. Removed failing-ci fixed it |
|
Windows seems to be very flaky recently https://jenkins.dockerproject.org/job/Docker-PRs-WoW-RS1/15467/console |
|
The windows rebuild failed immediately: |
|
trying again, but looks like there's some nodes out of space 😞 ping @jhowardmsft @johnstep |
|
@thaJeztah yeah a few nodes have been cleared down. I'll do a purge of the others in the morning. |
|
all green now @dnephin @vdemeester LGTY? |
This will hopefully make the tests more robust by replacing a fixed 3s
sleep with a polling loop that looks at whether the key PEM file is
encrypted or not.
cc @cyli