
Kubelet to understand pods, and to be able to pull from apiserver #2483

@erictune


Several inter-related goals:

  1. Kubelet continues to be usable in isolation, as well as in a Kubernetes cluster. (composability)
  2. Config defined in terms of Pods should work both with an apiserver and directly with Kubelets (bootstrapping, debugging, composability)
  3. Kubelets can see only the pods that they need to see (security)
  4. Don't create more apiserver API objects than necessary (ease of understanding)
  5. Node etcd need not be connected to master etcd. (security, isolation, scalability)
  6. Not have clients (be they kubectl, or kubelets or whatever) depend on the storage layer of the apiserver. (abstraction, allowing multiple implementations of the API)

Current state:

  1. Kubelet can read a container manifest from a local file or from a URL. Used by Google ContainerVM.
  2. Kubelets can read boundPods from their local etcd, which is connected to the master machine's etcd.
  3. Kubelets can talk to the apiserver, and do write to /events. They don't read /pods or /boundPods.

Suggested changes:

  1. Kubelet learns the definition of the Pod type.
  2. Kubelet can read JSON pod definitions from local files or URLs.
    • It also still supports containerManifests for the foreseeable future, with some way for it to determine which type to expect from a source.
  3. In a "typical" Kubernetes cluster, a Kubelet watches /api/v1beta3/pods for what pods it should run.
  4. Get rid of BoundPod and BoundPods since nothing reads them anymore.
  5. Add a Host field to the PodStatus (but not PodSpec).
  6. When the scheduler writes a Binding, PodStatus.Host is set and the resourceVersion is updated.
  7. A cluster is bootstrapped by first starting one or more VMs with,
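Change 3 above could look roughly like the sketch below: a long-poll watch on the pods resource, filtered to the node's own host. The URL shape and the "fields" selector syntax are assumptions about the eventual v1beta3 API, not a spec:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/url"
)

// watchPods opens a watch on pods bound to this host and handles each
// streamed event line. Endpoint and selector syntax are hypothetical.
func watchPods(apiserver, hostname string) error {
	q := url.Values{}
	q.Set("watch", "true")
	q.Set("fields", "Status.Host="+hostname) // assumed field-selector form
	resp, err := http.Get(apiserver + "/api/v1beta3/pods?" + q.Encode())
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		// Each line would be a JSON watch event; decode it and reconcile
		// local container state against the desired PodSpec here.
		fmt.Println("event:", scanner.Text())
	}
	return scanner.Err()
}

func main() {
	// Hypothetical apiserver address; this fails unless one is running.
	fmt.Println(watchPods("http://localhost:8080", "node-1"))
}
```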

Concerns that may be raised and responses:

Q1: Are the changes the scheduler makes to the set of bound pods atomic or eventually consistent?

A1: It could work either way. If we want atomic behavior, we could implement that in the apiserver more readily than we could when we directly expose our storage via etcd.

Q2: Should the kubelet be allowed to see CurrentState (now PodStatus in v1beta3)? It generates (some of) the status, so why let it see that?

A2: We could implement this if it is important. Kubelet would watch pods with a selector that matches only pods with PodStatus.Host equal to the kubelet's hostname, and could use a field selector so that only the PodSpec, and not the PodStatus, is returned.
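Server-side, the filtering A2 describes could look like the sketch below, with simplified stand-in types rather than the real API objects:

```go
package main

import "fmt"

// Simplified stand-ins for the real API types.
type PodSpec struct{ Containers []string }
type PodStatus struct{ Host string }
type Pod struct {
	Name   string
	Spec   PodSpec
	Status PodStatus
}

// podsForHost selects pods whose Status.Host matches the caller's host and
// clears Status, mimicking a field selector that projects away PodStatus.
func podsForHost(all []Pod, host string) []Pod {
	var out []Pod
	for _, p := range all {
		if p.Status.Host == host {
			p.Status = PodStatus{} // strip status before returning
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "a", Status: PodStatus{Host: "node-1"}},
		{Name: "b", Status: PodStatus{Host: "node-2"}},
	}
	fmt.Println(podsForHost(pods, "node-1"))
}
```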

Q3: How do we prevent Kubelets from seeing other nodes' pods?
A3: There are a couple of ways I can think to do this with small changes to our current authorization policy.

  1. One way is to have a distinguished "kubelet" user and special case its authorization.
  2. Another is to create a new policy as each "kubelet" is added which matches on the SourceIP of the request, and requires a selector to be part of the request which selects on "PodStatus.Host=$SourceIP". This may make some assumptions about the network security of the cluster, but seems like it could work.
  3. A variation on the previous is to only have one line of policy for all kubelets, but have a "condition" field in the policy that checks that PodStatus.Host matches SourceIP.
  4. Another variation is to have a different token for each kubelet and a separate policy for each.
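Option 3's "condition" check could be sketched as below. The function name and selector format are illustrative, not an existing authorizer API:

```go
package main

import (
	"fmt"
	"strings"
)

// allowKubeletRead permits a pod read/watch only when the request carries a
// selector pinning PodStatus.Host to the caller's own source IP.
func allowKubeletRead(selector, sourceIP string) bool {
	return strings.Contains(selector, "PodStatus.Host="+sourceIP)
}

func main() {
	fmt.Println(allowKubeletRead("PodStatus.Host=10.0.0.5", "10.0.0.5")) // allowed
	fmt.Println(allowKubeletRead("PodStatus.Host=10.0.0.9", "10.0.0.5")) // denied
}
```

As the text notes, tying authorization to SourceIP assumes the cluster network prevents IP spoofing; the per-kubelet token variation in option 4 avoids that assumption.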
