CSI mounter second-guesses failed NodePublishVolume RPCs, inferring success wrongly

**What happened**:

A custom CSI driver implementing [the `NodePublishVolume` RPC](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodepublishvolume) returns a failure response, indicating that it did not complete the steps needed to prepare the volume for the mounting pod. Suppose these steps involve:
1. Mounting a filesystem at the target directory,
1. Fetching data from a service, and
1. Writing that data to a set of files in the mounted filesystem.

If the first step (mounting the filesystem) succeeds but either of the subsequent two steps fails, the volume is still not ready for use by the pod. However, after the CSI driver returns a failure response, the CSI mounter checks the target directory before invoking the `NodePublishVolume` RPC again. If it finds a filesystem already mounted there, [it considers that to be good enough](https://github.com/kubernetes/kubernetes/blob/0387ee4244ecceeddeb77783b8fedd74dcc1fb44/pkg/volume/csi/csi_mounter.go#L114-L117) and stops calling `NodePublishVolume`, so the CSI driver never gets a chance to try again.
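To make the sequence concrete, here is a minimal model of the behavior described above. It is not the kubelet source: `fakeNode`, `driver`, `setUpAt`, and the target path are hypothetical names invented for this illustration, and the "mount" is just a map entry.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeNode models only which target paths have a filesystem mounted.
// Hypothetical type for illustration; no real mount happens.
type fakeNode struct {
	mounted map[string]bool
}

// driver is a hypothetical stand-in for a CSI node plugin.
type driver struct {
	publishCalls int
	failPopulate bool
}

var errPopulate = errors.New("failed to populate volume")

// nodePublishVolume mounts first, then tries to populate the volume.
func (d *driver) nodePublishVolume(n *fakeNode, target string) error {
	d.publishCalls++
	n.mounted[target] = true // step 1: the mount succeeds
	if d.failPopulate {
		return errPopulate // step 2 or 3 fails; the mount is left behind
	}
	return nil
}

// setUpAt models the reported mounter behavior: if something is already
// mounted at the target, skip the RPC and report success.
func setUpAt(n *fakeNode, d *driver, target string) error {
	if n.mounted[target] {
		return nil
	}
	return d.nodePublishVolume(n, target)
}

func main() {
	n := &fakeNode{mounted: map[string]bool{}}
	d := &driver{failPopulate: true}
	target := "/hypothetical/target/path"

	first := setUpAt(n, d, target)  // RPC is invoked and fails
	second := setUpAt(n, d, target) // RPC is skipped; success is inferred
	fmt.Println(first, second, d.publishCalls)
	// prints: failed to populate volume <nil> 1
}
```

In this model the second attempt reports success even though the driver was called only once and never populated the volume.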

**What you expected to happen**:

If a CSI driver returns a failure response to a `NodePublishVolume` RPC, Kubernetes should honor that declaration of failure and invoke the RPC again, disallowing the mounting pod from starting until this RPC succeeds.
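For contrast, the requested behavior could be modeled roughly as follows. This is only a sketch under assumptions: `node`, `driver`, `setUpUntilPublished`, and `maxAttempts` are hypothetical, and a plain loop stands in for whatever retry and backoff scheduling Kubernetes would actually use. It relies on the driver's `NodePublishVolume` being idempotent, which the CSI specification already requires.

```go
package main

import (
	"errors"
	"fmt"
)

// node tracks which target paths have a filesystem mounted.
// Hypothetical type for illustration only.
type node struct {
	mounted map[string]bool
}

// driver is a hypothetical CSI node plugin whose populate step fails
// a fixed number of times before succeeding.
type driver struct {
	failuresLeft int
	calls        int
}

// nodePublishVolume is idempotent: it reuses an existing mount and
// retries only the populate step.
func (d *driver) nodePublishVolume(n *node, target string) error {
	d.calls++
	if !n.mounted[target] {
		n.mounted[target] = true
	}
	if d.failuresLeft > 0 {
		d.failuresLeft--
		return errors.New("failed to populate volume")
	}
	return nil
}

// setUpUntilPublished treats only a successful RPC response as success.
// The mount state of the target directory is never consulted.
func setUpUntilPublished(n *node, d *driver, target string, maxAttempts int) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = d.nodePublishVolume(n, target); err == nil {
			return nil
		}
	}
	return fmt.Errorf("volume not published after %d attempts: %w", maxAttempts, err)
}

func main() {
	n := &node{mounted: map[string]bool{}}
	d := &driver{failuresLeft: 2}
	err := setUpUntilPublished(n, d, "/hypothetical/target/path", 5)
	fmt.Println(err, d.calls)
	// prints: <nil> 3
}
```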

It is possible to partially work around the current behavior by having the driver mount the filesystem as late as possible and then attempt to unmount it if it fails to populate the volume with its intended data. Doing so makes drivers more complicated to write and forces them to do extra work.

**How to reproduce it (as minimally and precisely as possible)**:

In a CSI driver (even one that serves just ephemeral, inline volumes), implement `NodePublishVolume` to mount a filesystem at the target path, then return a failure response. Confirm that Kubernetes does not invoke `NodePublishVolume` again, because there's now a filesystem mounted at the target path.

**Anything else we need to know?**:

This topic came up for discussion [in the "csi" channel](https://kubernetes.slack.com/archives/C8EJ01Z46/p1577411066020300) in the "Kubernetes" Slack team. There, @timoreimann helped investigate this complaint, and @msau42 [suggested filing this report](https://kubernetes.slack.com/archives/C8EJ01Z46/p1577641479041800?thread_ts=1577411066.020300&cid=C8EJ01Z46) to discuss changing this behavior.

**Environment**:
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-13T11:51:44Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
```

- Cloud provider or hardware configuration:
  AWS EC2

- OS (e.g: `cat /etc/os-release`):
```
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2303.3.0
VERSION_ID=2303.3.0
BUILD_ID=2019-12-02-2049
PRETTY_NAME="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
```

- Kernel (e.g. `uname -a`):
```
Linux ip-10-130-113-181 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019 x86_64 Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz GenuineIntel GNU/Linux
```

- Install tools:
_kubeadm_

- Others:
_csi-node-driver-registrar_ container image tag: _v1.0.2_

CSI mounter second-guesses failed NodePublishVolume RPCs, inferring success wrongly #86784
