storage capacity tests by pohly · Pull Request #88114 · kubernetes/kubernetes

pohly · 2020-02-13T14:22:31Z

What type of PR is this?
/kind failing-test

What this PR does / why we need it:

This adds tests for #72031.

Does this PR introduce a user-facing change?:

NONE

pohly · 2020-02-13T14:23:09Z

WIP because csi-test and external-provisioner need to be updated for all test cases to pass.

pohly · 2020-04-02T13:38:28Z

I have rebased on top of #89041 to reuse some of the new utility functions in that PR (specifically, compareCSICalls) and to avoid conflicts when that PR merges.

In addition, I am trying some enhancements for that PR.

pohly · 2020-04-02T16:20:57Z

/retest

Failures were unrelated.

pohly · 2020-04-03T11:26:52Z

/retest

pohly · 2020-04-03T17:20:42Z

@jsafrane what do you think about the log output retry approach (d6f0d0e)? Do you want to copy that into your PR or shall we merge your changes together with mine in this PR?

jsafrane · 2020-04-06T13:31:15Z

@pohly, thanks for the suggestion. I incorporated your last commit into #89041 (with you as author) to see if it helps.

parseMockLogs may be called multiple times while waiting for output. Dumping all CSI calls each time is verbose and repetitive. To verify what the driver has already done, the normal capture of the container log can be used instead: csi-mockplugin-0/mock@127.0.0.1: gRPCCall: {"Method":"/csi.v1.Node/NodePublishVolume","Request"...
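For illustration, here is a minimal sketch of how log lines of this shape could be turned into a list of calls. It is not the actual parseMockLogs code: the `mockCall` type, the `parseCalls` helper, and the sample log lines are assumptions made for the example. Only the `gRPCCall: ` marker and the `Method` field are taken from the log excerpt above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// mockCall is a hypothetical, reduced stand-in for the test's call record.
// Only the Method field from the log excerpt is modeled here.
type mockCall struct {
	Method string `json:"Method"`
}

// parseCalls returns one mockCall per log line that contains the
// "gRPCCall: " marker followed by valid JSON. Other lines are skipped.
func parseCalls(log string) []mockCall {
	const marker = "gRPCCall: "
	var calls []mockCall
	for _, line := range strings.Split(log, "\n") {
		idx := strings.Index(line, marker)
		if idx < 0 {
			continue
		}
		var call mockCall
		if err := json.Unmarshal([]byte(line[idx+len(marker):]), &call); err != nil {
			continue // incomplete or malformed entry, ignore it
		}
		calls = append(calls, call)
	}
	return calls
}

func main() {
	// Sample input invented for this sketch.
	log := "starting mock driver\n" +
		"gRPCCall: {\"Method\":\"/csi.v1.Node/NodeStageVolume\"}\n" +
		"gRPCCall: {\"Method\":\"/csi.v1.Node/NodePublishVolume\"}\n"
	for _, c := range parseCalls(log) {
		fmt.Println(c.Method)
	}
}
```

Because the parser can be re-run on the full log each time, a caller that polls for expected calls does not need to print every parsed call on every attempt.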

The code became obsolete with the introduction of parseMockLogs, which retrieves the log itself. For debugging a running test, the normal pod output logging is sufficient.

The mock driver gets instructed to return a ResourceExhausted error for the first CreateVolume invocation via the storage class parameters. How this should be handled depends on the situation: for normal volumes, we just want external-provisioner to retry. For late binding, we want to reschedule the pod. It also depends on topology support.

The for loop that waited for the signal to delete the pod had no timeout, so if something went wrong, it would keep waiting until the entire test suite timed out.

The "error waiting for expected CSI calls" wrapping is redundant because the error is checked immediately afterwards with: framework.ExpectNoError(err, "while waiting for all CSI calls")

pohly · 2020-04-07T13:43:08Z

/retest

pohly · 2020-04-07T16:05:16Z

/retest

pohly · 2020-04-07T18:43:45Z

/retest

pohly · 2020-04-07T20:31:49Z

@jsafrane: I've rebased on top of your merged PR, and the tests now pass after a few flakes. Perhaps you can have a look, as you are familiar with the code and I am touching a few things you just worked on?

jsafrane · 2020-04-08T07:45:55Z

+				}
+
+				var calls []mockCSICall
+				err = wait.Poll(time.Second, csiPodRunningTimeout, func() (done bool, err error) {


By this time, the test PVC/pod has been created, run, and deleted, so all calls must already be in the mock driver logs. Isn't a 5-minute timeout too much just to get the logs?

pohly · 2020-04-14

I've replaced this with a per-test timeout. It would be even nicer if that deadline could also be passed into the other helper functions (like WaitForPodNameRunningInNamespace), but those have their own built-in timeouts.

@jsafrane: please have another look

The timeouts for the two loops inside the test itself are now bounded by an upper limit for the duration of the entire test instead of each loop having its own, rather arbitrary timeout.

pohly · 2020-04-14T12:06:30Z

/retest

jsafrane · 2020-04-15T08:00:42Z

/lgtm
/approve

k8s-ci-robot · 2020-04-15T08:01:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, pohly

Needs approval from an approver in each of these files:

~~test/e2e/storage/OWNERS~~ [jsafrane]
~~test/e2e/testing-manifests/storage-csi/OWNERS~~ [jsafrane]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2020-04-15T10:00:17Z

@pohly: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-node-e2e-containerd | `48f8e39` | link | `/test pull-kubernetes-node-e2e-containerd` |

pohly · 2020-04-15T10:15:01Z

/retest

pohly mentioned this pull request Feb 13, 2020

controller: enable rescheduling of pods kubernetes-sigs/sig-storage-lib-external-provisioner#68

Merged

k8s-ci-robot requested review from gnufied and mkumatag February 13, 2020 14:23

k8s-ci-robot added the area/test, sig/storage, and sig/testing labels and removed the needs-sig label Feb 13, 2020

k8s-ci-robot added the needs-rebase label Feb 28, 2020

pohly force-pushed the storage-capacity-tests branch from a4d549e to cf65bce March 24, 2020 15:08

k8s-ci-robot removed the needs-rebase label Mar 24, 2020

This was referenced Mar 24, 2020

Mock topology testing kubernetes-csi/csi-test#249

Merged

unset selected node when storage is exhausted for topology segment kubernetes-csi/external-provisioner#405

Merged

k8s-ci-robot added the needs-rebase label Mar 25, 2020

pohly mentioned this pull request Mar 26, 2020

CHANGELOG-3.1.md: v3.1.0 release kubernetes-csi/csi-test#256

Merged

pohly force-pushed the storage-capacity-tests branch from cf65bce to 3cf9ab1 April 2, 2020 13:37

k8s-ci-robot added the size/XL label and removed the size/L and needs-rebase labels Apr 2, 2020

pohly mentioned this pull request Apr 2, 2020

Add NodeStage error tests #89041

Merged

pohly force-pushed the storage-capacity-tests branch from 3cf9ab1 to d6f0d0e April 3, 2020 08:48

pohly added 5 commits April 7, 2020 13:07

mock tests: remove redundant retrieval of log output

367a23e

mock tests: add timeout

2550051

mock tests: remove redundant wrapping of error

48f8e39

pohly force-pushed the storage-capacity-tests branch from d6f0d0e to 48f8e39 April 7, 2020 11:10

k8s-ci-robot added the size/L label and removed the size/XL label Apr 7, 2020

pohly changed the title ~~WIP: storage capacity tests~~ storage capacity tests Apr 7, 2020

k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 7, 2020

jsafrane reviewed Apr 8, 2020

mock tests: per-test timeout for ResourceExhausted

2ae6cf5

k8s-ci-robot assigned jsafrane Apr 15, 2020

k8s-ci-robot added the lgtm label Apr 15, 2020

k8s-ci-robot added the approved label Apr 15, 2020

k8s-ci-robot merged commit 4cf56b0 into kubernetes:master Apr 15, 2020

k8s-ci-robot added this to the v1.19 milestone Apr 15, 2020

gnufied mentioned this pull request Aug 3, 2020

test/e2e: fail test rather than flooding logs if PVC watch is closed prematurely #93658

Merged
