Skip to content

Enhancements to disable in-tree plugin code for migration#4093

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
ddebroy:disable1
Oct 15, 2019
Merged

Enhancements to disable in-tree plugin code for migration#4093
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
ddebroy:disable1

Conversation

@ddebroy
Copy link
Copy Markdown
Contributor

@ddebroy ddebroy commented Sep 13, 2019

Describe the plan to have a separate set of feature flags that will allow us to completely disable in-tree plugin code.

/cc @davidz627 @msau42 @leakingtapan @gnufied

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/design Categorizes issue or PR as related to design. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Sep 13, 2019
@ddebroy ddebroy force-pushed the disable1 branch 2 times, most recently from 1443d8c to 257dbce Compare September 16, 2019 17:54
A new file pkg/volume/util/csimigration/plugins.go will have the following functions:
```
func ProbePluginsWithCSIMigrationDisabled(plugins []volume.VolumePlugin) []volume.VolumePlugin
func ProbeAttachablePluginsWithCSIMigrationDisabled(plugins []volume.VolumePlugin) []volume.VolumePlugin
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the main reason we need a separate ProbeAttachablePluginsWithCSIMigrationDisabled is because OpenShift Cinder does not implement the Attachable interface.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddebroy which repo is OpenShift Cinder?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do existing probe functions special case for Cinder? Want to avoid this if possible

Copy link
Copy Markdown
Contributor

@davidz627 davidz627 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for writing this up Deep! Got a couple questions but it mostly looks good

plugin, as well as ease the transition to CSI when we are able to deprecate the
internal APIs.

The migration from in-tree plugins to CSI plugins will involve the following phases:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably have its own section instead of being in the background and motivations. I'm thinking something like "Roadmap" or "Phased Rollout"


The migration from in-tree plugins to CSI plugins will involve the following phases:

Phase 1: In-tree plugins are probed but the in-tree plugin code is largely skipped
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarify "probed"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarify "largely skipped" -> migrated to CSI. :)


Phase 1: In-tree plugins are probed but the in-tree plugin code is largely skipped
for operations on persistent volumes in favor of CSI plugins that supersede the
in-tree plugins. In case a node does not have a CSI plugin (that supersedes a
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the introduction of this new terminology "supersceding" the plugin. Lets continue to call it "translated/migrated" so we don't call the same thing by multiple names

will be executed for specific operations like attachment/detachment and attachment
verification of volumes.

Phase 2: In-tree plugins are no longer probed and thus in-tree code is effectively
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done on a plugin by plugin basis or will all in-tree plugins not be probed at the same time? please clarify in text. (I think it should have the option to be done piecemeal)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be doable in a plugin-by-plugin basis. Will clarify.


Phase 3: Files containing in-tree plugin code are no longer compiled as part of
Kubernetes components using golang build tags introduced in
https://github.com/kubernetes/kubernetes/pull/80353
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a phase 4. Code deleleted from k8s/k8s entirely

### Handling of nil Plugin Object returned by FindPluginBySpec/FindPluginByName

Once an in-tree plugin is no longer probed, all the VolumePluginMgr
functions of the form Find*PluginBySpec/Find*PluginByName will return a nil plugin
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't they also return an error? This should just be an error path saying "plugin not found" or whatever it is currently

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I will change the above sentence accordingly.

1. Code preceding the invocation of Find*Plugin routines performs the necessary
translation from in-tree to CSI so that when a Find*Plugin routine is invoked,
it is passed a translated CSI PV rather than an in-tree PV.
2. Upon receiving a nil plugin, the code checks migration status of a plugin based
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically it should have been translated/migrated already and we should be trying to get the CSI Plugin instead of the nil in-tree plugin. If we ever try to get the nil in-tree plugin that seems like a real error, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. If [1] is done completely and correctly, [2] should not be necessary for majority of the cases. I will clarify this.

2. Upon receiving a nil plugin, the code checks migration status of a plugin based
on spec or name rather than throwing an error.

### Enhancements in Controllerss handling Persistent Volumes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling

translation of volume specs for migrated PVs that are no longer probed.

#### Expansion Controller
syncHandler will be refactored to only invoke FindExpandablePluginBySpec for
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on what this paragraph means. Could you clarify?

Copy link
Copy Markdown
Contributor Author

@ddebroy ddebroy Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, in this section, I call out the main changes necessary in the different controllers to work correctly with CSIMigrationInTreeOff feature flag enabled.

For the Expansion Controller, I mean to say that it's syncHandler at https://github.com/kubernetes/kubernetes/blob/28e800245e910b65b56548f36172ce525a554dc8/pkg/controller/volume/expand/expand_controller.go#L238 will be refactored to determine csiResizerName based on migration feature flags. The current structure involving invocation of FindExpandablePluginBySpec followed by volumePlugin.IsMigratedToCSI() will no longer work with CSIMigrationInTreeOff feature flag enabled.

I will expand this out and try to explain this better.

### Enhancements in Controllerss handling Persistent Volumes

#### AttachDetach Controller
When CSIMigrationInTreeOff is enabled, translation of volume specs from in-tree to
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this behavior different from when CSIMigrationInTreeOff is off? We should always do translation at the same place.

IIUC the only behavior this new flag should actually "toggle" is "probing the in-tree plugin". All the other changes you are planning to make were to support that working; but the flag shouldn't be toggling any other behavior.

Copy link
Copy Markdown
Contributor Author

@ddebroy ddebroy Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have at least one additional set of areas where the behavior will be different when CSIMigrationInTreeOff is disabled but CSIMigration is enabled: all the functions that invoke nodeUsingCSIPlugin need to pass legacy/untranslated PVs in the generated functions.

So I was thinking of not performing the translations "early" as described for CSIMigrationInTreeOff when CSIMigrationInTreeOff is disabled. If we perform the translations early, we will not have enough context to translate back to in-tree PVs around the current invocations of nodeUsingCSIPlugin as we will not know if we are processing a "native" CSI PV (which can be left as is) or a "translated" in-tree => CSI PV (that should be translated back to in-tree).

We can construct a heuristic to translate CSI => in-tree around invocations of nodeUsingCSIPlugin: if [i] nodeUsingCSIPlugin returns false [ii] CSIMigration is enabled [iii] CSIMigrationInTreeOff is disabled and [iv] csi driver is a migrated driver, we can always translate CSI PV to in-tree. However if someone was expecting to see an error in this scenario when starting with a "native" CSI PV, they won't see the error which may be confusing and also a bit incompatible with current behavior.

Overall, both approach feels a bit convoluted. Thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeUsingCSIPlugin should trivially return true when CSIMigrationInTreeOff=true. Since this is basically a commit, the node should not have a different "migration status" than the controller

Copy link
Copy Markdown
Contributor Author

@ddebroy ddebroy Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A third/better alternative: we can mark the PVs that are translated from in-tree to CSI "early" with an annotation and use that to translate back to in-tree when nodeUsingCSIPlugin returns true during Phase 1 (CSIMigrationInTreeOff=false and CSIMigration=true).

The annotation will help differentiate PVs that are translated to CSI from PVs that are "native" CSI. This will allow us to have largely common code paths for both Phase 1 and Phase 2.

@msau42
Copy link
Copy Markdown
Member

msau42 commented Sep 20, 2019

/assign @misterikkit

@ddebroy
Copy link
Copy Markdown
Contributor Author

ddebroy commented Oct 14, 2019

@misterikkit @davidz627 I have addressed the comments around new sections outlining the changes for disabling plugins in Phase 2. PTAL.

@ddebroy ddebroy force-pushed the disable1 branch 2 times, most recently from e93be43 to 1110ed8 Compare October 14, 2019 21:22
Copy link
Copy Markdown
Contributor

@davidz627 davidz627 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this more deep. Some more comments. Lets file for an extension if we can't get the design in tomorrow

Phase 2: ProbeVolumePlugins function for specific migrated in-tree plugin packages
will no longer be invoked based on:
1. An overall feature flag: CSIMigrationInTreeOff.
2. A plugin-specific feature flag around migration. Example: CSIMigrationGCE, CSIMigrationAWS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the overall CSI Migration flag right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially thinking that CSIMigration can be implicit if CSIMigrationInTreeOff is enabled. My assumption was it did not make sense to have CSIMigrationInTreeOff enabled but CSIMigration disabled. However thinking about it a little more we can require that for CSIMigrationInTreeOff to take effect, CSIMigration needs to be enabled. I think that is what you are suggesting would be ideal?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think we should actually try to throw a validation error if CSIMigrationInTreeOff is set but CSIMigration is not as a sanity check

Kubelet that does not have CSI migration code), the in-tree plugin code will be
executed for operations like attachment/detachment and mount/dismount of volumes.

Phase 2: ProbeVolumePlugins function for specific migrated in-tree plugin packages
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding the benefit here is that if you use this feature flag the in-tree plugin code can never be executed i.e. we get the benefit of plugins not being able to take down the master components at this time

populated through CreateVolumeSpec when the following conditions are true:
[1] CSIMigration or CSIMigrationInTreeOff feature flag is enabled.
[2] A plugin-specific migration feature flag is enabled.
[3] A node has a migrated CSI plugin installed that corresponds to an
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto here - not installation - we care about "migration feature flag state on the node"

@ddebroy
Copy link
Copy Markdown
Contributor Author

ddebroy commented Oct 15, 2019

@davidz627 addressed the review comments so far. PTAL.

Copy link
Copy Markdown
Contributor

@davidz627 davidz627 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 small comments. Looks good to me besides that!

in-tree plugins (that have plugin-specific migration feature flags enabled).
As a result, all nodes in the cluster are expected to have CSI plugins (that in-tree
plugins have been migrated to) configured and installed and CSIMigration and
plugin specific feature flags enabled.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if this expectation is violated? Might be worth mentioning that the post-migration code will still run and we'll just get errors - no fallback

1. Check for migration status of a plugin based on Volume spec or Plugin name before
invoking the above functions so that an error is never returned due to missing plugins.
2. Handle an error from the above functions resulting from missing plugins appropriately.
This should be rare.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced this should ever happen. But might be an implementation detail - we can leave the design as is for now

Copy link
Copy Markdown
Contributor Author

@ddebroy ddebroy Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed an implementation detail and is only relevant for FindProvisionablePluginByName (which can be refactored to avoid it). Will mention this explicitly.

described above will be skipped during Desired State of World population if:
[1] CSIMigration or plugin specific migration feature flags are disabled in Kubernetes
Controller Manager.
[2] CSIMigration or plugin specific migration flags are disabled for Kubelet in a
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless CSIMigrationInTreeOff is set - then we will never skip migration

Signed-off-by: Deep Debroy <ddebroy@docker.com>
@ddebroy
Copy link
Copy Markdown
Contributor Author

ddebroy commented Oct 15, 2019

@davidz627 comments addressed.

@davidz627
Copy link
Copy Markdown
Contributor

Thanks for writing this up and the quick turnarounds!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2019
@davidz627
Copy link
Copy Markdown
Contributor

/assign @saad-ali

Copy link
Copy Markdown
Member

@saad-ali saad-ali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidz627, ddebroy, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2019
@k8s-ci-robot k8s-ci-robot merged commit e156317 into kubernetes:master Oct 15, 2019
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
Enhancements to disable in-tree plugin code for migration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants