WIP: Modify kube-push for GCE to bring down the existing master VM and completely replace it with a new one #3174
a-robinson wants to merge 1 commit into kubernetes:master from
Conversation
Related to #2524
How is it tested that everything important on the master is stored in a dir mounted from this pd?
It's not, and frankly I don't have a good plan for getting that assumption under test, given that usage of the filesystem is spread all over the place across tons of different shell scripts and salt configs :/
Have any ideas?
Can you run the following sequence of commands and report back? I think the monitoring.sh test is broken at the moment, but the other tests should all pass.
Hm, sorry for the delay, but I've been having a lot of trouble with the e2e tests. After finally getting a cluster up, it only passed 7/10 tests before the push, and only 2/10 after. I'm running them again and looking into why they failed, but do you know if there've been any issues with the e2e tests today?
This modifies kube-push for GCE to bring down the existing master VM and completely replace it with a new one. This makes upgrades less likely to break in weird ways and adds support for easily upgrading underlying components on the master like the guest OS or etcd. To do this, I reserve the IP address of the master after it's created and store all dynamically created files on a persistent disk (PD). Then, kube-push consists of swapping the PD and reserved IP address over to a new VM with the desired components on it. This has been tested to pass the /validate endpoint after upgrading between a few different recent commit versions.
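The swap described above could be sketched roughly as follows with today's gcloud CLI (the PR itself predates it and uses the cluster scripts directly); all resource names, the region/zone, and the disk size here are illustrative assumptions, not values from the PR:

```shell
# Reserve the master's external IP once at cluster creation so it
# survives VM replacement.
gcloud compute addresses create kubernetes-master-ip --region us-central1

# Create the persistent disk that holds all dynamically created master state.
gcloud compute disks create kubernetes-master-pd --zone us-central1-b --size 20GB

# On kube-push: delete the old master VM but keep its non-boot data disk.
gcloud compute instances delete kubernetes-master --zone us-central1-b \
  --keep-disks data --quiet

# Bring up the replacement master with the new components, re-attaching the
# same PD and re-using the reserved address.
gcloud compute instances create kubernetes-master \
  --zone us-central1-b \
  --address kubernetes-master-ip \
  --disk name=kubernetes-master-pd,device-name=master-pd
```

This only preserves the master's *external* IP, which is exactly why the internal-IP problem discussed below bites.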
Thanks for reminding me to run the e2e tests -- it turns out the /validate endpoint is woefully insufficient for validating the cluster, and the cluster doesn't work properly (it can't even schedule pods). The reason is that the kubelet on the minions isn't able to connect to etcd or the apiserver, because they currently talk to them over internal IPs, and GCE doesn't seem to offer a way to transfer internal IP addresses between VMs. This could be fixed by using a route for all minion-to-master traffic, or we could wait until the kubelet's direct dependency on etcd goes away (PR #846 / Issue #2483) and then have the kubelet speak to the apiserver using its external IP instead. CJ and I chatted and would lean toward the latter to avoid adding more network cruft, so this PR may have to wait a bit unless you feel differently.
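For reference, the route-based option mentioned above would look something like this in today's gcloud CLI: pick a stable virtual IP for the master and route it through whichever instance is currently the master. The address 10.250.0.1 and the instance/route names are purely hypothetical:

```shell
# Direct all traffic for a fixed, well-known master address through the
# current master instance; recreate this route when the master is replaced.
gcloud compute routes create kubernetes-master-route \
  --destination-range 10.250.0.1/32 \
  --next-hop-instance kubernetes-master \
  --next-hop-instance-zone us-central1-b
```

Minions would then address the master by the stable 10.250.0.1 instead of its per-VM internal IP, at the cost of extra network plumbing — the "cruft" the comment above wants to avoid.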
Also closely related to #3168 |
That's unfortunate. Does the internal IP change if you reboot the master? (Is it possible to reboot & supply a new disk at the same time?)
The IP stays the same when you reboot, but boot disks can't be detached from their instance, so there's no way to swap in a fresh one. |
The status of this is that it's blocked on either making a special route or using the master's public IP address, correct?
Yup. I'll self-assign this until it's unblocked and ready again. |
@a-robinson Are you still working on this? |
I haven't touched it since my last comment. I should probably check out how it works now that the minion's dependency on etcd has been removed on GCP, but I expect that the change of internal IP will still break salt, at the very least. I'll strip out the kube-push change from the PD mounting improvements and get those checked in. |
I tried this out again after rebasing to head. It still doesn't work, the current cause being the minions' use of the master's internal IP address rather than its hostname or external IP. I'll look into whether changing our salt configs would break anything. In the meantime, I've split out the directory and static IP changes from this into #4715.
…nd reserve the master's IP upon creation to make it easier to replace the master later. This pulls out the parts of PR kubernetes#3174 that don't break anything and will make upgrading existing clusters in the future less painful. Add /etc/salt to the master-pd
Closing, will open a new PR once I've played with the salt configs to try not explicitly using internal IP. |