Skip to content

Bump up cluster nodes for pilot e2e tests on prow#3663

Merged
istio-merge-robot merged 1 commit intoistio:masterfrom
nmittler:test-timeout
Feb 21, 2018
Merged

Bump up cluster nodes for pilot e2e tests on prow#3663
istio-merge-robot merged 1 commit intoistio:masterfrom
nmittler:test-timeout

Conversation

@nmittler
Copy link
Copy Markdown
Contributor

The pilot e2e tests appear to be extremely flaky and take significantly
longer with a single node. Bumping up to 4 seems to vastly greatly
things.

@ldemailly
Copy link
Copy Markdown
Member

ldemailly commented Feb 21, 2018

Nice but Prow itself should just be scheduling not running tests ? Cc @sebastienvas

@chxchx
Copy link
Copy Markdown
Contributor

chxchx commented Feb 21, 2018

Yes @ldemailly is right. Prow is only scheduling and reporting. The actual tests run on GKE cluster istio-e2e-rbac-rotation-1 in istio-testing project. It might be due to the fact that this cluster is shared for presubmit so that multiple pending PRs running tests at the same time might slow down the tests. I know @sebastienvas has been working on resource pool prepromisioning and we are dogfooding it on postsubmit right now. It provides hermetic environments on project level and should help to speed up this test. Perhaps Seb could add more details too.

@nmittler
Copy link
Copy Markdown
Contributor Author

nmittler commented Feb 21, 2018

@chxchx I don't think that's true. In the log for a failed test, I see:

W0221 16:41:39.977] + echo 'Default cluster version: 1.9.2-gke.1'
W0221 16:41:39.977] + gcloud container clusters create e2e-pilot-4bcd7d39 --zone us-central1-f --project istio-testing --cluster-version 1.9.2-gke.1 --machine-type n1-standard-4 --num-nodes 1 --no-enable-legacy-authorization --enable-kubernetes-alpha --quiet

This implies that it's using a cluster from create_cluster(). Notice that it only has a single node.

@nmittler
Copy link
Copy Markdown
Contributor Author

cc @sebastienvas @hklai

@sebastienvas
Copy link
Copy Markdown
Contributor

You're right @nmittler, your change will improve run time. I'll be working on updating this to take advantage of the pre-provisioning shortly after I am done with postsubmit.

@chxchx
Copy link
Copy Markdown
Contributor

chxchx commented Feb 21, 2018

Now I see what you mean. Sorry I was confused by the PR title and thought this were to change the Prow cluster node numbers. The pilot e2e test creates e2e-pilot-4bcd7d39 cluster and runs on it. By adding more nodes like you did in this PR it should mitigate this problem. In fact this PR is the only one that passes pilot e2e test in the last 20 runs or so.

@nmittler
Copy link
Copy Markdown
Contributor Author

Also the error seems unrelated:

Makefile:412: recipe for target 'mixer-test' failed

@nmittler
Copy link
Copy Markdown
Contributor Author

@chxchx

In fact this PR is the only one that passes pilot e2e test in the last 20 runs or so.

ha! We've all been feeling the pain of these tests ... it's now completing in ~ 20 min, which includes a complete cluster spin-up. So, not too shabby .... and passing the tests is good too :P

@nmittler
Copy link
Copy Markdown
Contributor Author

Does anyone have any issues with forcing the merge? As I stated above, I don't think the error is related.

@ldemailly
Copy link
Copy Markdown
Member

I was confused too by "prow cluster" but you mean indeed the target clusters

I didn't know those were dynamically created, I thought it was either a pool or shared

but
/lgtm

@istio-merge-robot
Copy link
Copy Markdown

/lgtm cancel //PR changed after LGTM, removing LGTM. @ldemailly @nmittler

@nmittler nmittler changed the title Bump up prow cluster nodes from 1 to 4 Bump up cluster nodes for pilot e2e tests on prow Feb 21, 2018
The pilot e2e tests appear to be extremely flaky and take significantly
longer with a single node.  Bumping up to 4 seems to vastly greatly
things.
@nmittler
Copy link
Copy Markdown
Contributor Author

@ldemailly, @chxchx

I've updated the title to (hopefully) be more clear. Also, I've changed the scripts around so that only the pilot e2e tests will use 4 nodes. PTAL

@chxchx
Copy link
Copy Markdown
Contributor

chxchx commented Feb 21, 2018

/lgtm

@istio-merge-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chxchx, ldemailly

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@istio-merge-robot
Copy link
Copy Markdown

/test all [submit-queue is verifying that this PR is safe to merge]

@ldemailly
Copy link
Copy Markdown
Member

shouldn't it be multi nodes for everyone - which test will work well on 1 node ?

@nmittler
Copy link
Copy Markdown
Contributor Author

@ldemailly

shouldn't it be multi nodes for everyone - which test will work well on 1 node ?

I tried that, but the mixer tests were failing for some reason, so I changed it so that only pilot e2e is impacted. Didn't want to investigate - especially given that everything should be migrating to the pooled clusters eventually.

@nmittler
Copy link
Copy Markdown
Contributor Author

/test e2e-simple

@istio-merge-robot
Copy link
Copy Markdown

Automatic merge from submit-queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants