Add NodeStage error tests #89041
869f30a to
6e4686d
The newly introduced tests are flaky, see https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/89041/pull-kubernetes-e2e-gce-storage-slow/1237750118531207170/
/hold
/test pull-kubernetes-e2e-gce-storage-slow
/test pull-kubernetes-e2e-gce-storage-slow
d23a444 to
5b6ea42
Reworked to use the merged version of the JavaScript hooks. The only WIP item is the bump of the csi-mock driver version to
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jsafrane
/test pull-kubernetes-e2e-gce-storage-slow
/test pull-kubernetes-e2e-gce-storage-slow
/test pull-kubernetes-e2e-gce-storage-slow
/test pull-kubernetes-e2e-gce-storage-slow
5b6ea42 to
0cc6363
/hold cancel
Is the new test flaky? There's a "could not load CSI driver logs: the server rejected our request for an unknown reason (get pods csi-mockplugin-0)" in https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/89041/pull-kubernetes-e2e-gce-storage-slow/1245284273221537792/.
/retest
This failed because "the server rejected our request for an unknown reason (get pods csi-mockplugin-0)" (https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/89041/pull-kubernetes-e2e-gce-storage-slow/1245284273221537792/).
Perhaps a retry loop would help?
There is already a retry loop in the caller (via wait.Poll). Perhaps failures to retrieve log output should simply be treated here as "no output", i.e. return nil, nil?
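The idea of treating a failed log fetch as "no output" could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from this PR: `fetchLogs`, `getOutput`, `waitForLine`, and `errRejected` are hypothetical names, and `poll` is a small standard-library stand-in for `wait.Poll` so that the example runs without Kubernetes dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errRejected imitates the intermittent API server error seen in the test runs.
var errRejected = errors.New("the server rejected our request for an unknown reason")

// fetchLogs is a hypothetical stand-in for reading the mock driver's pod logs.
type fetchLogs func() (string, error)

// getOutput treats a failed log fetch as "no output yet" and returns nil, nil,
// so that the caller's retry loop simply tries again.
func getOutput(fetch fetchLogs) ([]string, error) {
	out, err := fetch()
	if err != nil || strings.TrimSpace(out) == "" {
		return nil, nil
	}
	return strings.Split(strings.TrimSpace(out), "\n"), nil
}

// poll is a minimal stand-in for wait.Poll: it calls cond every interval
// until cond reports done, returns an error, or the timeout expires.
func poll(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for the condition")
		}
		time.Sleep(interval)
	}
}

// waitForLine polls until the wanted line shows up in the fetched output.
func waitForLine(fetch fetchLogs, want string, interval, timeout time.Duration) error {
	return poll(interval, timeout, func() (bool, error) {
		lines, err := getOutput(fetch)
		if err != nil {
			return false, err
		}
		for _, l := range lines {
			if l == want {
				return true, nil
			}
		}
		return false, nil
	})
}

func main() {
	attempts := 0
	// The first two fetches fail, the third one returns log output.
	flaky := func() (string, error) {
		attempts++
		if attempts < 3 {
			return "", errRejected
		}
		return "NodeStageVolume\nNodeUnstageVolume\n", nil
	}
	err := waitForLine(flaky, "NodeUnstageVolume", 10*time.Millisecond, time.Second)
	fmt.Println("err:", err, "attempts:", attempts)
}
```

The trade-off is that a persistent failure to read logs is then reported only as a timeout of the outer loop, not as the underlying API error.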
Maybe the pod is not Running yet. The new mock test calls createPod() and reads the pod logs immediately after that. I added a wait for the PVC to get bound in between; the driver must be fully operational to provision a PV.
Sorry about the failed build noise. Stupid typo...
I don't get it. The mock driver provisioned a PV, yet it gets an error from the API server:
could not load CSI driver logs: the server rejected our request for an unknown reason (get pods csi-mockplugin-0)
> There is already a retry loop in the caller (via wait.Poll). Perhaps failures to retrieve log output should simply be treated here as "no output", i.e. return nil, nil?
I've implemented that idea in #88114 (3cf9ab1) and got all tests to pass. The log does indeed show that "the server rejected our request for an unknown reason" occurred, but tests succeeded after ignoring that error (https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/88114/pull-kubernetes-e2e-kind/1245706940810530817/build-log.txt).
I added your last commit from #88114, hoping it helps.
I can't reproduce "the server rejected our request for an unknown reason" on any of my clusters.
> I can't reproduce "the server rejected our request for an unknown reason" on any of my clusters.
Me neither. I think I saw it once while working on some other test, but as far as I remember, that turned out to be because I was asking for logs after the pod had just been deleted. Perhaps here we have the inverse, asking for the logs of a very recently started pod? Just wondering.
/assign
366b04d to
12fdc04
Especially related to "uncertain" global mounts. A large refactoring of the CSI mock tests was necessary:
- to be able to script the driver to return errors as required by the test
- to parse the CSI driver logs to check that kubelet made the right CSI calls
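The log-parsing part of such a refactoring might look roughly like the sketch below. It is not the code from this commit: the `gRPCCall: ` line prefix, the JSON field names, and the `csiCall` type are assumptions made for illustration, and the numeric codes in the sample are the standard gRPC values (4 is DeadlineExceeded).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// callPrefix is the assumed marker in front of each logged gRPC call.
const callPrefix = "gRPCCall: "

// csiCall is a hypothetical, reduced view of one logged CSI call.
type csiCall struct {
	Method    string `json:"Method"`
	FullError struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"FullError"`
}

// parseCalls extracts all CSI calls from raw driver log output.
// Lines without the prefix and lines with invalid JSON are skipped.
func parseCalls(log string) []csiCall {
	var calls []csiCall
	for _, line := range strings.Split(log, "\n") {
		if !strings.HasPrefix(line, callPrefix) {
			continue
		}
		var c csiCall
		payload := strings.TrimPrefix(line, callPrefix)
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			continue
		}
		// Keep only the short method name, e.g. "NodeStageVolume".
		if i := strings.LastIndex(c.Method, "/"); i >= 0 {
			c.Method = c.Method[i+1:]
		}
		calls = append(calls, c)
	}
	return calls
}

func main() {
	log := strings.Join([]string{
		"some unrelated driver output",
		`gRPCCall: {"Method":"/csi.v1.Node/NodeStageVolume","FullError":{"code":4,"message":"fake error"}}`,
		`gRPCCall: {"Method":"/csi.v1.Node/NodeUnstageVolume"}`,
	}, "\n")
	for _, c := range parseCalls(log) {
		fmt.Printf("%s code=%d\n", c.Method, c.FullError.Code)
	}
}
```

A test can then compare the resulting sequence of method names and error codes against what it expects kubelet to have called.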
As seen in some test runs (https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/89041), retrieving output can fail with "the server rejected our request for an unknown reason (get pods csi-mockplugin-0)". If this is truly an intermittent error, then the existing retry logic in the callers can deal with it.
12fdc04 to
981aae3
/retest
/retest
What this PR does / why we need it:
Add some tests for NodeStage error handling. The main purpose is to test that:
- NodeUnstage is called after a NodeStage transient error && the corresponding pod is deleted.
- NodeUnstage is not called after a NodeStage final error && the corresponding pod is deleted.

The test (and the whole CSI mock output handling) becomes quite complex.
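The two expectations above can be expressed as a small check over the observed call sequence. The sketch below is hypothetical and not the test from this PR: `checkCalls` and `transientCodes` are illustrative names, and the set of gRPC codes treated as transient is an assumption about how kubelet classifies errors, which should be verified against the kubelet source.

```go
package main

import "fmt"

// transientCodes is an ASSUMED set of gRPC codes after which the volume may
// still be (partially) staged. Any other code is treated as a final error.
var transientCodes = map[string]bool{
	"Canceled":          true,
	"DeadlineExceeded":  true,
	"Unavailable":       true,
	"ResourceExhausted": true,
	"Aborted":           true,
}

// checkCalls verifies that NodeUnstageVolume appears after NodeStageVolume
// if and only if the NodeStage error code is considered transient.
func checkCalls(methods []string, nodeStageCode string) error {
	sawStage, sawUnstage := false, false
	for _, m := range methods {
		switch m {
		case "NodeStageVolume":
			sawStage = true
		case "NodeUnstageVolume":
			if !sawStage {
				return fmt.Errorf("NodeUnstageVolume seen before NodeStageVolume")
			}
			sawUnstage = true
		}
	}
	if !sawStage {
		return fmt.Errorf("NodeStageVolume was never called")
	}
	wantUnstage := transientCodes[nodeStageCode]
	if wantUnstage && !sawUnstage {
		return fmt.Errorf("expected NodeUnstageVolume after transient error %s", nodeStageCode)
	}
	if !wantUnstage && sawUnstage {
		return fmt.Errorf("unexpected NodeUnstageVolume after final error %s", nodeStageCode)
	}
	return nil
}

func main() {
	// Transient error: unstage must follow once the pod is deleted.
	fmt.Println(checkCalls([]string{"NodeStageVolume", "NodeUnstageVolume"}, "DeadlineExceeded"))
	// Final error: unstage must not be called.
	fmt.Println(checkCalls([]string{"NodeStageVolume"}, "InvalidArgument"))
}
```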
This is just an exercise in how to use the JavaScript hooks. If we decide this is useful, it can be extended to also test NodePublish (+ block mode of both).
@gnufied @tsmetana @pohly @msau42 @jingxu97, is it useful? We already have unit tests for this behavior.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/kind cleanup
/sig storage