Speed up hack/local-up-cluster.sh by manually checking if etcd is ready#963
Speed up hack/local-up-cluster.sh by manually checking if etcd is ready#963ghodss wants to merge 3 commits intokubernetes:masterfrom ghodss:speed_up_local_up_cluster
Conversation
|
I borrowed from the design of your reference to improve the code, and at this point I don't think there's enough there to justify a common.sh between the two files. Let me know if you feel differently, but for now I think it's good to go. Thanks! |
|
(I think the failed build is a spurious failure of the integration tests?) |
|
I agree that the build looks to be a flaky test but I'm not 100% sure as you are changing the timing on this start up. I started looking at historic build failures in travis and we do have a few flakey tests (I filed issues). But this failure mode seems to be new. Could it be that etcd starts responding to http before it is really ready? |
There was a problem hiding this comment.
maybe [ -z "$(which curl)" ] instead? I've always seen that more in shell.
|
I reran travis and it failed again. Pointer to log for posterity: https://travis-ci.org/GoogleCloudPlatform/kubernetes/jobs/33052367#L389 This leads me to suspect it is something in this PR that is causing this. |
|
I think I tracked this down. The etcd server will return 200 for the /version URL even though it isn't 100% up. If you get http://127.0.0.1:4001/v2/stats/leader right after that you'll get a 500. I'd have the loop check that URL instead. It seems the best proxy for "health" for etcd. |
|
|
|
@smarterclayton Sounds good... Any recommendation of where to put the wait_for_url function and how to make it accessible to both hack/test-cmd.sh and hack/local-up-cluster.sh? We could also make a wait_for_etcd function that builds on wait_for_url, since test-cmd.sh seems to have the same bug of checking for version instead of leader (as @jbeda mentioned above). |
|
Good question - it's kind of related to cluster/kube-util.sh and arguably hack/local-up-cluster.sh should go there (we already have |
|
The general ideas for this stuff is that cluster/... is the stuff for That is why "dev-build-and-up.sh" is in hack but "kube-up.sh" is in As we get to an end user distribution I think that stuff will become more Joe |
|
Hrm - is a local machine with Kube+Docker on it (Linux) valid? I can certainly see that as being the default for distros to ship with - so vagrant just happens to be the "i don't want to or can't run Docker" and local is "don't waste boxes"? |
|
@ghodss in smarterclayton@b037989 (part of #1009) I moved the function into hack/util.sh for now - you can invoke start_etcd and both ETCD_DIR and ETCD_PID are set so you can trap EXIT to kill / rm them. |
|
@ghodss Sam -- can you start using the function that @smarterclayton defined? (https://github.com/GoogleCloudPlatform/kubernetes/pull/1009/files#diff-b83ac5455d252fc0916d54924166ffa5R40). I'll be happy to review/merge after that. Thanks! |
|
Made a new PR #1046 since it's a much simpler commit. Closing this guy. |
Test refactoring to support "project" flag.
UPSTREAM: <carry>: openshift-hack/images/os/Dockerfile: Add io.openshift.build.versions, etc.
Use curl to check if etcd is ready, speeding up startup time of hack/local-up-cluster.sh from >5s to under 1s. If curl is unavailable, revert to the old behavior of waiting 5 seconds. Happy to refactor this in any way if there's a way to make it cleaner.