External provisioning problems #72031

@cofyc

After further investigation in issue #71928 (comment), I think there are two problems:

  • The scheduler has no way to notify the external provisioner to retry provisioning when related objects are updated (e.g. an invalid storage class is fixed).
  • The external provisioner has no way to notify the scheduler to reschedule the PVC when it encounters an unrecoverable situation (e.g. the selected node is wrong).

Here are two scenarios:

Scenario 1) The selected node is right, but the external provisioner needs to be notified to provision again because non-PVC objects (e.g. the storage class) changed

Possible approaches:

1.1) Update the external provisioner lib to retry claims on storage class events.

This requires updating the external provisioner.
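
A minimal sketch of option 1.1), written as plain Python rather than against the real external provisioner library. `Claim`, `ProvisionQueue`, and `on_storage_class_event` are hypothetical names used only for illustration; the idea is that a storage class add/update event re-enqueues every unbound claim that references that class.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Claim:
    """Hypothetical, trimmed-down view of a PVC."""
    name: str
    storage_class: str
    bound: bool = False


@dataclass
class ProvisionQueue:
    """Hypothetical stand-in for the provisioner's claim work queue."""
    pending: deque = field(default_factory=deque)

    def add(self, claim_name: str) -> None:
        # Avoid queuing the same claim twice.
        if claim_name not in self.pending:
            self.pending.append(claim_name)


def on_storage_class_event(class_name, claims, queue):
    """Re-enqueue every unbound claim that references the changed class."""
    requeued = []
    for claim in claims:
        if not claim.bound and claim.storage_class == class_name:
            queue.add(claim.name)
            requeued.append(claim.name)
    return requeued
```

Bound claims and claims of other classes are left alone, so the event handler only wakes up work that the storage class change could actually unblock.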

1.2) Always schedule the PVC when necessary, e.g. when the storage class or PVC is updated, regardless of whether a node has already been selected. But the scheduler must skip all nodes except the currently selected one, because it is not safe for the scheduler to change the selected node: a volume may still be in the middle of creation on the old node, so we must rely on the external provisioner to reject the node for safety. With this method, we need to find a way (e.g. adding another annotation) to trigger a PVC update event without changing the current selected-node.

This requires updating the external provisioner to provision again immediately on a PVC update event.
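
The two pieces of option 1.2) can be sketched as follows. This is an illustration only, not scheduler code: it assumes the selected-node annotation key is `volume.kubernetes.io/selected-node`, and `example.com/provision-retry` is a made-up annotation standing in for "another annotation" whose only purpose is to generate a PVC update event.

```python
# Assumed key for the selected-node annotation.
SELECTED_NODE_ANNOTATION = "volume.kubernetes.io/selected-node"
# Hypothetical annotation used only to trigger a PVC update event.
RETRY_ANNOTATION = "example.com/provision-retry"


def candidate_nodes(pvc_annotations, feasible_nodes):
    """Restrict scheduling to the already selected node, if there is one."""
    selected = pvc_annotations.get(SELECTED_NODE_ANNOTATION)
    if selected is None:
        # No node chosen yet: every feasible node is a candidate.
        return list(feasible_nodes)
    # A node is already chosen: the scheduler must not move the claim.
    return [node for node in feasible_nodes if node == selected]


def touch_claim(pvc_annotations):
    """Return a copy with the retry counter bumped and selected-node untouched."""
    updated = dict(pvc_annotations)
    count = int(updated.get(RETRY_ANNOTATION, "0"))
    updated[RETRY_ANNOTATION] = str(count + 1)
    return updated
```

An empty result from `candidate_nodes` means the selected node is no longer feasible; under this option the scheduler still does not pick another node, and the provisioner is the one expected to reject it.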

1.3) Remove selected-node on bind timeout

We assume it is safe to remove selected-node after the bind timeout. It may take some time (depending on the bind timeout), but it will eventually notify the external provisioner to provision again, even if the external provisioner has exceeded its retry limit.

This requires updating the external provisioner to provision again immediately on a PVC update event.
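
As a rough sketch of option 1.3), the timeout check could look like the function below. It assumes the annotation key `volume.kubernetes.io/selected-node`; `expire_selected_node` and its parameters (times in seconds) are hypothetical, and how the scheduler would actually track when a node was selected is left open.

```python
# Assumed key for the selected-node annotation.
SELECTED_NODE_ANNOTATION = "volume.kubernetes.io/selected-node"


def expire_selected_node(annotations, selected_at, now, bind_timeout):
    """Drop selected-node once the bind timeout has elapsed.

    Returns (new_annotations, removed). The input dict is not modified.
    """
    timed_out = now - selected_at >= bind_timeout
    if SELECTED_NODE_ANNOTATION in annotations and timed_out:
        updated = {
            key: value
            for key, value in annotations.items()
            if key != SELECTED_NODE_ANNOTATION
        }
        return updated, True
    return dict(annotations), False
```

Removing the annotation produces a PVC update event, which is what lets the scheduler pick a node again and the provisioner start over.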

Scenario 2) The selected node is wrong, and we need to notify the scheduler to find a feasible node

Possible approaches:

2.1) Wait for the provisioner to reject the PVC from the given node by removing the selected-node annotation; the scheduler can then set it again to notify the provisioner to retry.

This works like pod scheduling: when a Pod is assigned to a node but the kubelet finds the assignment is no longer valid (e.g. the node selector no longer matches), the Pod is rejected.

We need to distinguish two kinds of failures in the external provisioner:

  • unrecoverable failures, e.g. hard limits on capacity or volume count on this node
    • the scheduler should retry on these failures, and it must select a new node for the given PVC (otherwise it may enter an infinite loop)
  • recoverable failures, e.g. network errors, provisioner bugs, etc.
    • provisioners should retry on these failures and wait for the network to recover or the bugs to be fixed
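
The split between the two failure kinds could be sketched like this. It is only an illustration of the decision, assuming the annotation key `volume.kubernetes.io/selected-node`; `FailureKind` and `handle_provision_failure` are hypothetical names, and how a real provisioner would classify an error is not specified here.

```python
from enum import Enum

# Assumed key for the selected-node annotation.
SELECTED_NODE_ANNOTATION = "volume.kubernetes.io/selected-node"


class FailureKind(Enum):
    RECOVERABLE = "recoverable"      # e.g. network error, provisioner bug
    UNRECOVERABLE = "unrecoverable"  # e.g. capacity or volume count hard limit


def handle_provision_failure(annotations, kind):
    """Decide who retries after a failed provisioning attempt.

    Returns (new_annotations, action), where action is "reschedule" when the
    claim is handed back to the scheduler and "retry" when the provisioner
    keeps the claim on the same node.
    """
    if kind is FailureKind.UNRECOVERABLE:
        # Reject the node: removing the annotation asks the scheduler
        # to choose a node again.
        updated = {
            key: value
            for key, value in annotations.items()
            if key != SELECTED_NODE_ANNOTATION
        }
        return updated, "reschedule"
    # Recoverable: keep the selected node and retry in the provisioner.
    return dict(annotations), "retry"
```

Note that this sketch only covers the provisioner side. Making sure the scheduler picks a different node after a rejection is a separate requirement on the scheduler.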

A cruder approach is to remove the selected-node annotation on all failures, as the internal provisioner does. Either way, it requires the external provisioner to cooperate with the scheduler.

2.2) Another way is to always schedule the PVC when necessary and assign it to a new node to notify the provisioner to provision again. But it is not safe for the scheduler to change the selected node; see method 1.2). Also, the current external provisioner does not retry provisioning immediately when the selected-node annotation changes, so the external provisioner would need to be updated too.

2.3) Same as 1.3)

My suggestion

For now, we can do 1.1). This method has no drawbacks IMO and fixes part of the problem.

/kind feature
/sig storage

Metadata

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.
