Skip to content

Kubelet responds to ConfigMap mutations for dynamic Kubelet config#63221

Merged
k8s-github-robot merged 1 commit intokubernetes:masterfrom
mtaufen:dkcfg-live-configmap
May 22, 2018
Merged

Kubelet responds to ConfigMap mutations for dynamic Kubelet config#63221
k8s-github-robot merged 1 commit intokubernetes:masterfrom
mtaufen:dkcfg-live-configmap

Conversation

@mtaufen
Copy link
Copy Markdown
Contributor

@mtaufen mtaufen commented Apr 27, 2018

This PR makes dynamic Kubelet config easier to reason about by leaving less room for silent skew scenarios. The new behavior is as follows:

  • ConfigMap does not exist: Kubelet reports error status due to missing source
  • ConfigMap is created: Kubelet starts using it
  • ConfigMap is updated: Kubelet respects the update (but we discourage this pattern, in favor of incrementally migrating to a new ConfigMap)
  • ConfigMap is deleted: Kubelet keeps using the config (non-disruptive), but reports error status due to missing source
  • ConfigMap is recreated: Kubelet respects any updates (but, again, we discourage this pattern)

This PR also makes a small change to the config checkpoint file tree structure, because ResourceVersion is now taken into account when saving checkpoints. The new structure is as follows:

- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta
  | - assigned (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the assigned config)
  | - last-known-good (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the last-known-good config)
| - checkpoints
  | - uid1 (dir for versions of object identified by uid1)
    | - resourceVersion1 (dir for unpacked files from resourceVersion1)
    | - ...
  | - ...

fixes: #61643

The dynamic Kubelet config feature will now update config in the event of a ConfigMap mutation, which reduces the chance for silent config skew. Only name, namespace, and kubeletConfigKey may now be set in Node.Spec.ConfigSource.ConfigMap. The least disruptive pattern for config management is still to create a new ConfigMap and incrementally roll out a new Node.Spec.ConfigSource.

@mtaufen mtaufen added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/kubelet area/kubelet-api sig/node Categorizes an issue or PR as relevant to SIG Node. kind/feature Categorizes issue or PR as related to a new feature. labels Apr 27, 2018
@mtaufen mtaufen self-assigned this Apr 27, 2018
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 27, 2018
@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Apr 27, 2018
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Apr 27, 2018
// NewRemoteConfigSource constructs a RemoteConfigSource from a v1/NodeConfigSource object.
// We assume the source has been validated beforehand - e.g. by an API server.
func NewRemoteConfigSource(source *apiv1.NodeConfigSource) RemoteConfigSource {
return &remoteConfigMap{source}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually still need to check not all known fields nil, wrt old clients against a new API server that allows a new config source type failing cleanly (i.e. not crashing due to nil pointer dereference).

@mtaufen mtaufen force-pushed the dkcfg-live-configmap branch 10 times, most recently from 849c93e to a7eb158 Compare May 18, 2018 22:47
Copy link
Copy Markdown
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with a couple of nits.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It seems like the interface for (r *remoteConfigMap) Informer() could be a bit cleaner, as addFunc, updateFunc, and deleteFunc arent really well-defined in the interface. Rather than defining them, I would suggest either replacing these inputs with a cache.ResourceEventHandlerFuncs parameter, or assuming the caller of Informer() will add their desired ResourceEventHanderFuncs to the informer returned by Informer().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll replace them with cache.ResourceEventHandlerFuncs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Prefer describing what is returned, rather than behavior: e.g. ResourceVersion returns the resource version of the most recently downloaded payload targeted by the source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: similar to above about describing internal behavior

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think Payload() may be a better interface. The fact that it downloads and caches it on disk can be an implementation detail?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that. Downloading the payload locally is an important part of the function.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly confusing to me. Why are we overwriting the UID and ResourceVersion of one ConfigMap with the UID and ResourceVersion of another ConfigMap? Do we use the rest of the Stored ConfigMap, or just these fields? If we only use those two fields, why store the whole thing, and if we use the whole configmap, it seems potentially confusing to insert a different UID and ResourceVersion...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r.source.ConfigMap is a ConfigMapNodeConfigSource, we are resolving the reference (namespace, name, kubeletConfigKey) by adding the additional details (uid, resourceVersion) that identify a version of that resource

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, right. I forgot it wasn't an actual configmap.

@liggitt
Copy link
Copy Markdown
Member

liggitt commented May 21, 2018

just looking at the description, for the current and last-known-good configs, where is the resolved uid/resourceVersion recorded (to be able to look up the correct dir in checkpoints)?

is there any expectation that multiple objects will need to be pulled in/checkpointed in the future for a node config (e.g. a configmap and some set of secrets)? should that inform our config/checkpoint disk structure?

@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented May 21, 2018

just looking at the description, for the current and last-known-good configs, where is the resolved uid/resourceVersion recorded (to be able to look up the correct dir in checkpoints)?

@liggitt in the SerializedNodeConfigSources written to --dynamic-config-dir/meta/assigned and --dynamic-config-dir/meta/last-known-good.

is there any expectation that multiple objects will need to be pulled in/checkpointed in the future for a node config (e.g. a configmap and some set of secrets)? should that inform our config/checkpoint disk structure?

Not yet. If we added this, I'd expect NodeConfigSource to be extended to represent a request for multiple objects (which would transitively ensure --dynamic-config-dir/meta can record the same), and since UID and ResourceVersion are consistent concepts across all objects, the structure of --dynamic-config-dir/checkpoints should work for additional object types.

@liggitt
Copy link
Copy Markdown
Member

liggitt commented May 21, 2018

@liggitt in the SerializedNodeConfigSources written to --dynamic-config-dir/meta/assigned and --dynamic-config-dir/meta/last-known-good.

ah, so those are enhanced versions of what is in the Node API object

Not yet. If we added this, I'd expect NodeConfigSource to be extended to represent a request for multiple objects (which would transitively ensure --dynamic-config-dir/meta can record the same), and since UID and ResourceVersion are consistent concepts across all objects, the structure of --dynamic-config-dir/checkpoints should work for additional object types.

That makes sense

@@ -142,6 +145,8 @@ func (s *nodeConfigStatus) Sync(client clientset.Interface, nodeName string) {
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this isn't part of this PR, but is there a reason you chose to use a periodic, jittered check rather than just a long-running loop?
for _ := range s.syncCh {
utillog.Infof("updating Node.Status.Config")
...
}

@dashpole
Copy link
Copy Markdown
Contributor

/lgtm
comment above doesn't need to be dealt with in this PR.
Great work, especially on the test coverage for this.

@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented May 21, 2018

/retest

@dchen1107
Copy link
Copy Markdown
Member

/approve

Approve the pr based on the offline discussions with @mtaufen on the high level design and @dashpole' detail review.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, dchen1107, mtaufen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link
Copy Markdown

Automatic merge from submit-queue (batch tested with PRs 63881, 64046, 63409, 63402, 63221). If you want to cherry-pick this change to another branch, please follow the instructions here.

@dashpole
Copy link
Copy Markdown
Contributor

I think we may need to update the node e2e framework. Tests using dynamic kubelet config are failing with the error: Forbidden: uid must not be set in spec

@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented May 23, 2018

That's surprising, bad merge?

@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented May 23, 2018

Ah, misread, you said tests using dynamic config, not the dynamic config tests.

@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented May 23, 2018

Yup, probably have to do that.

@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented May 23, 2018

#64173

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm area/kubelet area/kubelet-api cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Determine whether we should continue to require UID in node config source reference

7 participants