I'm looking for a way to make kubernetes do rolling system updates of node machines. I'm targeting the kernel and operating system levels, though kubelet would come along for the ride.
My thought is that to do an "update" of a node you simply turn up a new node, and then tear down the old one. This model is simple and easy to reason about, much easier than an in-place update. We want kubernetes to know that a node is gone immediately, though, so downtime of containers on that node is small. I think this is consistent with current plans? (e.g. #4855)
Here's my proposal:
- Add kubectl (and API) functionality to add and remove nodes from the cluster dynamically, also ensuring the state gets exported to the "spec" for that node (I think what's needed is already in the status).
- Within the kube shell scripts, refactor the startup and shutdown functions a little so they use lower-level functions that spin up a new machine and remove an old one using the kubectl calls. Add direct access to these new functions via something like kube-removenode and kube-addnode scripts. These scripts can poll the status and spec, and block until the change is complete (or error out if the spec changes).
- The "rolling" part of the update can then be done with another small shell script in the kubernetes codebase, or by a user script.
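The "rolling" driver could be as small as the sketch below. This is a hypothetical shape only: the `kube_addnode` and `kube_removenode` functions stand in for the proposed kube-addnode / kube-removenode scripts, which would block until the node's status matches its spec (or error out if the spec changes).

```shell
#!/usr/bin/env bash
# Sketch, not an implementation: kube_addnode / kube_removenode are
# placeholders for the proposed blocking kube-addnode / kube-removenode
# scripts described above.
set -euo pipefail

rolling_update() {
  # Replace each named node one at a time: bring up a fresh node first,
  # then tear down the old one, so total capacity never dips.
  for old in "$@"; do
    kube_addnode            # turn up a replacement; blocks until it is ready
    kube_removenode "$old"  # tear down the old node; blocks until it is gone
  done
}
```

Invoked as, say, `rolling_update node-1 node-2 node-3`, each iteration adds capacity before removing any, which keeps container downtime limited to the drain of a single node.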
I'm with Meteor, the folks running kubernetes on AWS, and my intention is to implement this myself within about 2 months if it looks like a good direction to go. I don't want to step on anyone's toes, and after I do this work I'd really like to upstream it, so I'm trying to ensure that this is compatible with the rest of the project.