Several inter-related goals:
- Kubelet continues to be usable in isolation, as well as in a Kubernetes cluster. (composability)
- Config defined in terms of Pods should work both with an apiserver and directly with Kubelets. (bootstrapping, debugging, composability)
- Kubelets can only see the pods that they need to see. (security)
- Don't create more apiserver API objects than necessary (ease of understanding)
- Node etcd need not be connected to master etcd. (security, isolation, scalability)
- Not have clients (be they kubectl, or kubelets or whatever) depend on the storage layer of the apiserver. (abstraction, allowing multiple implementations of the API)
Current state:
- Kubelet can read a container manifest from a local file or from a URL. Used by Google ContainerVM.
- Kubelets can read boundPods from their local etcd, which is connected to the master machine's etcd.
- Kubelets can talk to the apiserver, and do write to /events, but they don't read /pods or /boundPods.
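For reference, a manifest file of the kind Kubelet reads today might look roughly like this (an illustrative sketch; the exact containerManifest schema may differ):

```yaml
version: v1beta1
id: web-server
containers:
  - name: nginx
    image: nginx
    ports:
      - containerPort: 80
        hostPort: 8080
```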
Suggested changes:
- Kubelet learns the definition of type Pods.
- Kubelet can read json pod definitions from local files or URLs.
- For the foreseeable future, it also still supports containerManifests, with some way for it to determine which type to expect from a given source.
- In a "typical" Kubernetes cluster, a Kubelet watches /api/v1beta3/pods for what pods it should run.
- Get rid of BoundPod and BoundPods since nothing reads them anymore.
- Add a Host field to the PodStatus (but not PodSpec).
- When the scheduler writes a Binding, PodStatus.Host is set and the resourceVersion is updated.
- A cluster is bootstrapped by first starting one or more VMs with,
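The Host-in-PodStatus change above can be sketched as follows (hypothetical, simplified type names; the real v1beta3 types differ in detail):

```go
package main

import "fmt"

// Hypothetical, simplified versions of the v1beta3 types.
// Note that Host lives in PodStatus, not PodSpec.
type PodSpec struct {
	Containers []string
}

type PodStatus struct {
	Host string // set when the scheduler writes a Binding
}

type Pod struct {
	Name            string
	ResourceVersion int
	Spec            PodSpec
	Status          PodStatus
}

// bind mimics what the apiserver would do when the scheduler writes a
// Binding: set PodStatus.Host and bump the resourceVersion so watchers
// (such as kubelets) observe the change.
func bind(p *Pod, host string) {
	p.Status.Host = host
	p.ResourceVersion++
}

func main() {
	p := Pod{Name: "web", ResourceVersion: 1}
	bind(&p, "node-1")
	fmt.Println(p.Status.Host, p.ResourceVersion) // node-1 2
}
```

Because the spec never changes when a Binding is written, only the status, the kubelet's view of "what should I run" stays a pure function of PodSpec plus the Host it was bound to.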
Concerns that may be raised and responses:
Q1: Are the changes the scheduler makes to the set of bound pods atomic or eventually consistent?
A1: It could work either way. If we want atomic behavior, we could implement that in apiserver more readily than we could when we directly expose our storage via etcd.
Q2: Should the kubelet be allowed to see CurrentState (now PodStatus in v1beta3)? It generates (some of) the status, so why let it see that?
A2: We could implement this if it is important. Kubelet would watch pods with a selector that matches only pods with PodStatus.Host == the kubelet's hostname, and could use a field selector so that only PodSpec and not PodStatus is returned.
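The filtering described in A2 could be sketched like this (illustrative helper names; the actual selector machinery would live in the apiserver):

```go
package main

import "fmt"

// Hypothetical, simplified pod types for illustration.
type PodSpec struct {
	Containers []string
}

type PodStatus struct {
	Host string
}

type Pod struct {
	Name   string
	Spec   PodSpec
	Status PodStatus
}

// podsForHost sketches the server-side behavior from A2: return only pods
// whose PodStatus.Host matches the kubelet's hostname, and clear the status
// so the kubelet receives just the PodSpec.
func podsForHost(pods []Pod, host string) []Pod {
	var out []Pod
	for _, p := range pods {
		if p.Status.Host == host {
			p.Status = PodStatus{} // emulate a field selector hiding PodStatus
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "a", Status: PodStatus{Host: "node-1"}},
		{Name: "b", Status: PodStatus{Host: "node-2"}},
	}
	for _, p := range podsForHost(pods, "node-1") {
		fmt.Println(p.Name)
	}
}
```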
Q3: How do we prevent Kubelets from seeing other nodes' pods?
A3: There are a couple of ways I can think of to do this with small changes to our current authorization policy.
- One way is to have a distinguished "kubelet" user and special case its authorization.
- Another is to create a new policy as each "kubelet" is added which matches on the SourceIP of the request, and requires a selector to be part of the request which selects on "PodStatus.Host=$SourceIP". This may make some assumptions about the network security of the cluster, but seems like it could work.
- A variation on the previous is to only have one line of policy for all kubelets, but have a "condition" field in the policy that checks that PodStatus.Host matches SourceIP.
- Another variation is to have a different token for each kubelet and a separate policy for each.
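As a sketch, the per-kubelet and condition-based variations above might look like the following lines in a policy file (an illustrative format; the selector and condition fields are hypothetical and do not exist in the current policy schema):

```json
{"user": "kubelet-node-1", "readonly": true, "resource": "pods", "selector": "PodStatus.Host=node-1"}
{"user": "kubelet",        "readonly": true, "resource": "pods", "condition": "PodStatus.Host == SourceIP"}
```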