storage capacity tests#88114
Conversation
|
WIP because csi-test and external-provisioner need to be updated for all test cases to pass. |
a4d549e to
cf65bce
Compare
cf65bce to
3cf9ab1
Compare
|
I have rebased on top of #89041 to reuse some of the new utility functions in that PR (specifically, In addition, I am trying some enhancements for that PR. |
|
/retest Failures were unrelated. |
3cf9ab1 to
d6f0d0e
Compare
|
/retest |
|
@jsafrane what do you think about the log output retry approach (d6f0d0e)? Do you want to copy that into your PR or shall we merge your changes together with mine in this PR? |
parseMockLogs is called potentially multiple times while waiting for
output. Dumping all CSI calls each time is quite verbose and
repetitive. To verify what the driver has done already, the normal
capturing of the container log can be used instead:
csi-mockplugin-0/mock@127.0.0.1: gRPCCall: {"Method":"/csi.v1.Node/NodePublishVolume","Request"...
The code became obsolete with the introduction of parseMockLogs because that will retrieve the log itself. For debugging of a running test the normal pod output logging is sufficient.
The mock driver gets instructed to return a ResourceExhausted error for the first CreateVolume invocation via the storage class parameters. How this should be handled depends on the situation: for normal volumes, we just want external-scheduler to retry. For late binding, we want to reschedule the pod. It also depends on topology support.
The for loop that waited for the signal to delete pod had no timeout, so if something went wrong, it would wait for the entire test suite to time out.
The "error waiting for expected CSI calls" is redundant because it's immediately followed by checking that error with: framework.ExpectNoError(err, "while waiting for all CSI calls")
d6f0d0e to
48f8e39
Compare
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
@jsafrane: I've rebased on top of your merged PR and the tests passed now after a few flakes. Perhaps you can have a look, as you are familiar with code and I am touching a few things you just worked on? |
| } | ||
|
|
||
| var calls []mockCSICall | ||
| err = wait.Poll(time.Second, csiPodRunningTimeout, func() (done bool, err error) { |
There was a problem hiding this comment.
By this time, test pvc/pod has been created, run and deleted. So all calls must be in mock driver logs. Isn't 5 minute timeout too much to get the logs?
There was a problem hiding this comment.
I've replaced this with a per-test timeout. It would be even nicer if that deadline could also be passed into the other helper functions (like WaitForPodNameRunningInNamespace) but those have their own builtin timeouts.
@jsafrane: please have another look
The timeout for the two loops inside the test itself are now bounded by an upper limit for the duration of the entire test instead of having their own, rather arbitrary timeouts.
|
/retest |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jsafrane, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@pohly: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest |
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
This adds tests for #72031.
Does this PR introduce a user-facing change?: