Skip to content

Speed up hack/local-up-cluster.sh by manually checking if etcd is ready#963

Closed
ghodss wants to merge 3 commits intokubernetes:masterfrom
ghodss:speed_up_local_up_cluster
Closed

Speed up hack/local-up-cluster.sh by manually checking if etcd is ready#963
ghodss wants to merge 3 commits intokubernetes:masterfrom
ghodss:speed_up_local_up_cluster

Conversation

@ghodss
Copy link
Copy Markdown
Contributor

@ghodss ghodss commented Aug 20, 2014

Use curl to check if etcd is ready, speeding up startup time of hack/local-up-cluster.sh from >5s to under 1s. If curl is unavailable, revert to the old behavior of waiting 5 seconds. Happy to refactor this in any way if there's a way to make it cleaner.

@ghodss
Copy link
Copy Markdown
Contributor Author

ghodss commented Aug 20, 2014

I borrowed from the design of your reference to improve the code, and at this point I don't think there's enough there to justify a common.sh between the two files. Let me know if you feel differently, but for now I think it's good to go. Thanks!

@ghodss
Copy link
Copy Markdown
Contributor Author

ghodss commented Aug 20, 2014

(I think the failed build is a spurious failure of the integration tests?)

@jbeda
Copy link
Copy Markdown
Contributor

jbeda commented Aug 20, 2014

I agree that the build looks to be a flaky test but I'm not 100% sure as you are changing the timing on this start up. I started looking at historic build failures in travis and we do have a few flakey tests (I filed issues). But this failure mode seems to be new.

Could it be that etcd starts responding to http before it is really ready?

Comment thread hack/local-up-cluster.sh Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe [ -z "$(which curl)" ] instead? I've always seen that more in shell.

@jbeda
Copy link
Copy Markdown
Contributor

jbeda commented Aug 20, 2014

I reran travis and it failed again. Pointer to log for posterity: https://travis-ci.org/GoogleCloudPlatform/kubernetes/jobs/33052367#L389

This leads me to suspect it is something in this PR that is causing this.

@jbeda
Copy link
Copy Markdown
Contributor

jbeda commented Aug 20, 2014

I think I tracked this down. The etcd server will return 200 for the /version URL even though it isn't 100% up. If you get http://127.0.0.1:4001/v2/stats/leader right after that you'll get a 500.

I'd have the loop check that URL instead. It seems the best proxy for "health" for etcd.

@smarterclayton
Copy link
Copy Markdown
Contributor

hack/test-cmd.sh already has a curl loop method to wait for etcd - let's refactor and share that vs creating a new one (and fixing it of course for any issues)

@ghodss
Copy link
Copy Markdown
Contributor Author

ghodss commented Aug 21, 2014

@smarterclayton Sounds good... Any recommendation of where to put the wait_for_url function and how to make it accessible to both hack/test-cmd.sh and hack/local-up-cluster.sh? We could also make a wait_for_etcd function that builds on wait_for_url, since test-cmd.sh seems to have the same bug of checking for version instead of leader (as @jbeda mentioned above).

@smarterclayton
Copy link
Copy Markdown
Contributor

Good question - it's kind of related to cluster/kube-util.sh and arguably hack/local-up-cluster.sh should go there (we already have cluster/local, it's just not flushed out much). Would want others to weigh in on that though.

@jbeda
Copy link
Copy Markdown
Contributor

jbeda commented Aug 21, 2014

The general ideas for this stuff is that ​cluster/... is the stuff for
launching and using clusters and would eventually end up in the user
downloadable SDK. The stuff in hack/... is for building and development
that end users shouldn't have to worry about.

That is why "dev-build-and-up.sh" is in hack but "kube-up.sh" is in
cluster.

As we get to an end user distribution I think that stuff will become more
clear.

Joe

@smarterclayton
Copy link
Copy Markdown
Contributor

Hrm - is a local machine with Kube+Docker on it (Linux) valid? I can certainly see that as being the default for distros to ship with - so vagrant just happens to be the "i don't want to or can't run Docker" and local is "don't waste boxes"?

@smarterclayton
Copy link
Copy Markdown
Contributor

@ghodss in smarterclayton@b037989 (part of #1009) I moved the function into hack/util.sh for now - you can invoke start_etcd and both ETCD_DIR and ETCD_PID are set so you can trap EXIT to kill / rm them.

@jbeda
Copy link
Copy Markdown
Contributor

jbeda commented Aug 26, 2014

@ghodss Sam -- can you start using the function that @smarterclayton defined? (https://github.com/GoogleCloudPlatform/kubernetes/pull/1009/files#diff-b83ac5455d252fc0916d54924166ffa5R40). I'll be happy to review/merge after that.

Thanks!

@ghodss
Copy link
Copy Markdown
Contributor Author

ghodss commented Aug 27, 2014

Made a new PR #1046 since it's a much simpler commit. Closing this guy.

@ghodss ghodss closed this Aug 27, 2014
@ghodss ghodss deleted the speed_up_local_up_cluster branch August 27, 2014 00:00
vishh pushed a commit to vishh/kubernetes that referenced this pull request Apr 6, 2016
Test refactoring to support "project" flag.
damemi pushed a commit to damemi/kubernetes that referenced this pull request Sep 21, 2021
UPSTREAM: <carry>: openshift-hack/images/os/Dockerfile: Add io.openshift.build.versions, etc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants