External provisioning problems #72031

I investigated further in https://github.com/kubernetes/kubernetes/issues/71928#issuecomment-447065491, and I think there are two problems:

- The scheduler has no way to notify the external provisioner to retry provisioning when related objects are updated (e.g. an invalid storage class is fixed).
- The external provisioner has no way to notify the scheduler to reschedule a PVC when it encounters an unrecoverable situation (e.g. the `selected-node` is wrong).

Here are two scenarios:

#### Scenario 1) The selected node is right, but the external provisioner needs to be notified to provision again because non-PVC objects (e.g. storage class) changed

Possible approaches:

1.1) Update the [external provisioner lib](https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner) to retry claims on storage class events.

This requires updating the external provisioner.
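To make 1.1) concrete, here is a rough sketch of the idea in Python. It is not the provisioner lib's actual code or API: `ClaimRetryQueue` and its methods are hypothetical names, and claims and storage classes are modelled as plain strings.

```python
from collections import deque


class ClaimRetryQueue:
    """Toy stand-in for the provisioner's claim work queue (hypothetical)."""

    def __init__(self):
        self.pending = {}     # claim key -> storage class name
        self.queue = deque()  # claim keys waiting to be provisioned again

    def add_pending_claim(self, claim_key, storage_class):
        """Remember a claim that has not been provisioned yet."""
        self.pending[claim_key] = storage_class

    def on_storage_class_event(self, storage_class):
        """Re-enqueue every pending claim that references the changed class.

        Returns the claim keys that were newly enqueued.
        """
        requeued = []
        for claim_key, sc in sorted(self.pending.items()):
            if sc == storage_class and claim_key not in self.queue:
                self.queue.append(claim_key)
                requeued.append(claim_key)
        return requeued
```

With this, fixing an invalid storage class would wake up only the claims that use it, without the scheduler being involved at all.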

1.2) Always schedule the PVC when necessary, e.g. when the storage class or the PVC is updated, whether or not a node has already been selected. However, the scheduler must skip every node except the current `selected-node`, because it is not safe for the scheduler to change the selected node: a volume may still be in the process of being created on the old node, so we must rely on the external provisioner to reject the node. With this method, we need to find a way (e.g. adding another annotation) to trigger a PVC update event without changing the current `selected-node`.

This requires updating the external provisioner to provision again immediately on a PVC update event.
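A rough sketch of what 1.2) means on the scheduler side, with PVC annotations modelled as a plain dict. The `selected-node` annotation is written with its full key `volume.kubernetes.io/selected-node`; the `example.com/provision-retry` annotation and both function names are made up for illustration.

```python
SELECTED_NODE = "volume.kubernetes.io/selected-node"
RETRY_TRIGGER = "example.com/provision-retry"  # hypothetical annotation key


def candidate_nodes(pvc_annotations, feasible_nodes):
    """Return the nodes the scheduler may consider for this PVC.

    If a node is already selected, every other node is skipped, because
    only the provisioner may safely reject the selected node.
    """
    selected = pvc_annotations.get(SELECTED_NODE)
    if selected is None:
        return list(feasible_nodes)
    return [node for node in feasible_nodes if node == selected]


def touch_for_retry(pvc_annotations, generation):
    """Return a copy of the annotations that triggers a PVC update event
    without changing the selected node."""
    updated = dict(pvc_annotations)
    updated[RETRY_TRIGGER] = str(generation)
    return updated
```

An empty result from `candidate_nodes` means the selected node is no longer feasible from the scheduler's point of view, but the scheduler still does not pick another one.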

1.3) Remove `selected-node` on bind timeout

We assume it is safe to remove `selected-node` after the bind timeout. It may take some time (depending on the bind timeout), but the external provisioner will eventually be notified to provision again, even if it has exceeded its retry limit.

This requires updating the external provisioner to provision again immediately on a PVC update event.
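The timeout check in 1.3) could look roughly like this. It is only a sketch: the function name and the way timestamps are passed in (plain seconds) are assumptions, not existing scheduler code.

```python
SELECTED_NODE = "volume.kubernetes.io/selected-node"


def clear_selected_node_on_timeout(pvc_annotations, selected_at, now, bind_timeout):
    """Drop the selected-node annotation once the bind timeout has elapsed.

    All times are in seconds. Returns (annotations, changed); a changed
    result is what would produce the PVC update event for the provisioner.
    """
    if SELECTED_NODE not in pvc_annotations:
        return dict(pvc_annotations), False
    if now - selected_at < bind_timeout:
        return dict(pvc_annotations), False
    updated = {k: v for k, v in pvc_annotations.items() if k != SELECTED_NODE}
    return updated, True
```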

#### Scenario 2) The selected node is wrong, and we need to notify the scheduler to find a feasible node

Possible approaches:

2.1) Wait for the provisioner to reject the PVC from the given node by removing the `selected-node` annotation; the scheduler can then set it again to notify the provisioner to retry.

This works like pod scheduling: when a Pod is assigned to a node but the kubelet finds that the assignment is no longer valid (e.g. the node selector does not match anymore), the Pod is rejected.

We need to distinguish two kinds of failures in the external provisioner:

- unrecoverable failures, e.g. capacity or volume count hard limits on this node
  - the scheduler should retry on these failures, and it must select a new node for the given PVC (otherwise it may enter an infinite loop)
- recoverable failures, e.g. network errors, provisioner bugs, etc.
  - the provisioner should retry on these failures and wait for the network to recover or the bug to be fixed

A more brute-force way is to remove the `selected-node` annotation on all failures, like the internal provisioner does. Either way, it requires the external provisioner to cooperate with the scheduler.
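The failure handling for 2.1) can be sketched as follows. The two exception classes and `handle_provision_failure` are hypothetical names, and how a real provisioner would classify its errors is left open.

```python
SELECTED_NODE = "volume.kubernetes.io/selected-node"


class UnrecoverableError(Exception):
    """E.g. a capacity or volume count hard limit on the selected node."""


class RecoverableError(Exception):
    """E.g. a network error or a provisioner bug."""


def handle_provision_failure(pvc_annotations, error):
    """Decide who retries after a failed provisioning attempt.

    Unrecoverable: remove selected-node so the scheduler picks a new node.
    Anything else: keep the annotation and let the provisioner retry.
    Returns (annotations, action).
    """
    if isinstance(error, UnrecoverableError):
        updated = {k: v for k, v in pvc_annotations.items() if k != SELECTED_NODE}
        return updated, "reschedule"
    return dict(pvc_annotations), "retry"
```

The brute-force variant would simply take the first branch for every error.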

2.2) Another way is to always schedule the PVC when necessary and assign it to a new node to notify the provisioner to provision again. But it is not safe for the scheduler to change the selected node, see method 1.2). Also, the current external provisioner does not retry provisioning immediately when `selected-node` changes, so we would need to update the external provisioner too.

2.3) Same as 1.3)

### My suggestion 

For now, we can do 1.1). This method has no drawbacks IMO and fixes part of the problem.


/kind feature
/sig storage
