
WIP: Modify kube-push for GCE to bring down the existing master VM and completely replace it with a new one#3174

Closed
a-robinson wants to merge 1 commit into kubernetes:master from a-robinson:gce-push

Conversation

@a-robinson
Contributor

This makes upgrades less likely to break in weird ways and adds support for easily upgrading underlying components on the master like the guest OS or etcd.

To do this, we reserve the IP address of the master after it's created and store all dynamically created files on a persistent disk (PD). Then, kube-push consists of swapping the PD and reserved IP address over to a new VM with the desired components on it.

This has been tested to pass the /validate endpoint after upgrading between a few different recent commit versions.
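The swap itself isn't spelled out in this thread; as a rough dry-run sketch of the sequence (the resource names and zone are hypothetical, the boot-image flags are omitted, and the flags shown are the modern `gcloud` equivalents of the `gcutil` calls the scripts used at the time):

```shell
#!/bin/sh
# Dry-run sketch of the PD + reserved-IP swap; names and zone are hypothetical.
# Replace "echo gcloud" with "gcloud" to actually run the commands.
GCLOUD="echo gcloud"
MASTER="kubernetes-master"
ZONE="us-central1-b"
REGION="us-central1"

# One-time setup: reserve a static external IP for the master.
$GCLOUD compute addresses create "${MASTER}-ip" --region "$REGION"

# Push: tear down the old master VM but keep its data disk.
$GCLOUD compute instances delete "$MASTER" --zone "$ZONE" --keep-disks data

# Bring up a fresh master VM with the new components, reattaching the
# reserved address and the persistent disk (boot image flags omitted).
$GCLOUD compute instances create "$MASTER" --zone "$ZONE" \
  --address "${MASTER}-ip" \
  --disk "name=${MASTER}-pd,device-name=master-pd"
```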

@a-robinson
Contributor Author

Related to #2524

Contributor


How is it verified that everything important on the master is stored in a dir mounted from this PD?

Contributor Author


It's not, and frankly I don't have any good plan for getting that assumption under test, given that usage of the filesystem is spread all over the place in tons of different shell scripts and salt configs :/

Have any ideas?
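One way to sanity-check that assumption (a hypothetical sketch, not part of this PR; the mount point and directory list are illustrative) would be a script on the master that verifies each state directory actually resolves to the PD mount:

```shell
#!/bin/sh
# Hypothetical check (not from this PR): verify that the master's state
# directories live on the master-pd mount rather than the boot disk.
MOUNT_POINT="/mnt/master-pd"

# Print the mount point a path resolves to (last column of `df -P`).
mount_of() { df -P "$1" | awk 'NR==2 {print $6}'; }

# Illustrative list of dirs the shell scripts and salt configs write to.
for d in /var/etcd /srv/kubernetes; do
  if [ "$(mount_of "$d" 2>/dev/null)" = "$MOUNT_POINT" ]; then
    echo "on master-pd: $d"
  else
    echo "NOT on master-pd: $d"
  fi
done
```

This still only covers directories someone thought to list, so it narrows the problem rather than solving it.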

@lavalamp
Contributor

Can you run the following sequence of commands and report back?

hack/build-go.sh
go run hack/e2e.go -v -build -up -test
go run hack/e2e.go -v -build -push -test

I think the monitoring.sh test is broken at the moment but the other tests should all pass.

@lavalamp lavalamp self-assigned this Dec 30, 2014
@davidopp davidopp mentioned this pull request Dec 30, 2014
@a-robinson
Contributor Author

Hm, sorry for the delay, but I've been having a lot of trouble with the e2e tests. After finally getting a cluster up, it only passed 7/10 tests before the push, and only 2/10 after. I'm running them again and looking into why they failed, but do you know if there've been any issues with the e2e tests today?

@a-robinson
Contributor Author

Thanks for reminding me to run the e2e tests -- it turns out the /validate endpoint is woefully insufficient for validating the cluster, and that the cluster doesn't work properly (it can't even schedule pods).

The reason is that the kubelet on the minions can't connect to etcd or the apiserver: they currently talk to them over internal IPs, and GCE doesn't seem to offer a way to transfer internal IP addresses between VMs. This could be fixed by using a route for all minion-to-master traffic, or we could wait until the kubelet's direct dependency on etcd goes away (PR #846 / Issue #2483) and then have the kubelet speak to the apiserver over its external IP instead. CJ and I chatted and would lean toward the latter to avoid adding more network cruft, so this PR may have to wait a bit unless you feel differently.
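For reference, the route-based workaround would look roughly like this (a dry-run sketch; the route name, CIDR, and zone are hypothetical, and this is not something the PR implements). The idea is to give minions a stable "virtual" master IP that a GCE route points at whichever VM is currently the master:

```shell
#!/bin/sh
# Dry-run sketch of the route workaround; names and CIDR are hypothetical.
# Replace "echo gcloud" with "gcloud" to actually run the commands.
GCLOUD="echo gcloud"

# Pick a stable virtual master IP outside the instance range and route it
# at the current master VM; minions would be configured to use this IP.
$GCLOUD compute routes create kubernetes-master-route \
  --destination-range "10.250.0.1/32" \
  --next-hop-instance kubernetes-master \
  --next-hop-instance-zone us-central1-b

# On each push, repoint the route at the replacement master VM.
$GCLOUD compute routes delete kubernetes-master-route --quiet
$GCLOUD compute routes create kubernetes-master-route \
  --destination-range "10.250.0.1/32" \
  --next-hop-instance kubernetes-master-v2 \
  --next-hop-instance-zone us-central1-b
```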

@a-robinson
Contributor Author

Also closely related to #3168

@lavalamp
Contributor

lavalamp commented Jan 2, 2015

The reason is that the kubelet on the minions can't connect to etcd or the apiserver: they currently talk to them over internal IPs, and GCE doesn't seem to offer a way to transfer internal IP addresses between VMs.

That's unfortunate. Does the internal IP change if you reboot the master? (Is it possible to reboot & supply a new disk at the same time?)

@a-robinson
Contributor Author

The IP stays the same when you reboot, but boot disks can't be detached from their instance, so there's no way to swap in a fresh one.

@lavalamp
Contributor

lavalamp commented Jan 6, 2015

The status of this is that it's blocked on either making a special route or using the master's public IP address, correct?

@a-robinson
Contributor Author

Yup. I'll self-assign this until it's unblocked and ready again.

@a-robinson a-robinson assigned a-robinson and unassigned lavalamp Jan 6, 2015
@bgrant0607
Member

@a-robinson Are you still working on this?

@a-robinson
Contributor Author

I haven't touched it since my last comment. I should probably check out how it works now that the minion's dependency on etcd has been removed on GCP, but I expect that the change of internal IP will still break salt, at the very least.

I'll strip out the kube-push change from the PD mounting improvements and get those checked in.

@a-robinson
Contributor Author

I tried this out again after rebasing to head. It still doesn't work, with the current cause being the use of the master's internal IP address by minions rather than its hostname or external IP. I'll take a look into whether changing our salt configs would break anything. In the meantime, I've split out the directory and static IP changes from this into #4715.

@a-robinson a-robinson changed the title Modify kube-push for GCE to bring down the existing master VM and completely replace it with a new one WIP: Modify kube-push for GCE to bring down the existing master VM and completely replace it with a new one Feb 23, 2015
a-robinson added a commit to a-robinson/kubernetes that referenced this pull request Feb 23, 2015
…nd reserve

the master's IP upon creation to make it easier to replace the master later.

This pulls out the parts of PR kubernetes#3174 that don't break anything and will
make upgrading existing clusters in the future less painful.

Add /etc/salt to the master-pd
@a-robinson
Contributor Author

Closing; I'll open a new PR once I've played with the salt configs to avoid using the internal IP explicitly.

@a-robinson a-robinson closed this Mar 9, 2015
@mbforbes mbforbes mentioned this pull request Mar 27, 2015
@a-robinson a-robinson deleted the gce-push branch June 5, 2015 01:20