Skip to content

fix: set mtu as workaround to fix network timeouts in DinD test#23757

Merged
k8s-ci-robot merged 1 commit into
kubernetes:masterfrom
andyzhangx:mtu
Sep 27, 2021
Merged

fix: set mtu as workaround to fix network timeouts in DinD test#23757
k8s-ci-robot merged 1 commit into
kubernetes:masterfrom
andyzhangx:mtu

Conversation

@andyzhangx

@andyzhangx andyzhangx commented Sep 27, 2021

Copy link
Copy Markdown
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
fix: set mtu as workaround to fix network timeouts in DinD test, find detailed discussion here: https://kubernetes.slack.com/archives/C09QZ4DQB/p1632710433059900?thread_ts=1632414457.031600&cid=C09QZ4DQB

Which issue(s) this PR fixes:

Fixes #23741

remove empty line

fix comment
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 27, 2021
@k8s-ci-robot k8s-ci-robot added area/images sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 27, 2021
@andyzhangx andyzhangx changed the title fix: set mtu as workaround in test fix: set mtu as workaround to fix network timeouts in DinD test Sep 27, 2021
@lizhuqi

lizhuqi commented Sep 27, 2021

Copy link
Copy Markdown
Contributor

/lgtm

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

@lizhuqi: changing LGTM is restricted to collaborators

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@randomvariable

Copy link
Copy Markdown
Member

/lgtm
/assign @BenTheElder

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2021
@CecileRobertMichon

Copy link
Copy Markdown
Member

/lgtm

@spiffxp

spiffxp commented Sep 27, 2021

Copy link
Copy Markdown
Contributor

Depending on what image is actually used in your jobs, you will need to:

  • wait for a new bootstrap image to be built
  • PR kubekins to be built from that new image, bootstrap-based jobs to use new image (or wait for next autobump PR to do so)
  • verify bootstrap-based jobs are ok
  • wait for that new kubekins to roll out
  • PR jobs to update to new kubekins (or wait for next autobump PR to do so)
  • verify kubekins-based jobs are ok

@spiffxp

spiffxp commented Sep 27, 2021

Copy link
Copy Markdown
Contributor

Basically, I'm not opposed, but I'd like to understand more about the scope here:

  • What specific jobs are failing?
  • Are they all using bootstrap or kubekins?
  • Do we have linkable examples / proof of other jobs that have been fixed by applying this change?

@aramase

aramase commented Sep 27, 2021

Copy link
Copy Markdown
Member

For the Secrets Store CSI Driver:

What specific jobs are failing?

All the jobs in this config using docker-in-docker are failing. This was blocking our release last week.

Are they all using bootstrap or kubekins?

All the jobs are using kubekins

Do we have linkable examples / proof of other jobs that have been fixed by applying this change?

We merged PR to add the iptables rules in all the jobs after some validation in our project. The tests have been passing after the PR merge.

Testgrid dashboard: https://testgrid.k8s.io/sig-auth-secrets-store-csi-driver

All the failures are on 09/22, 09/23 and mitigated on 09/24 with the fix.

Example failure: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-secrets-store-csi-driver-image-scan/1441120709228105728

@aramase

aramase commented Sep 27, 2021

Copy link
Copy Markdown
Member

/cc

@jsturtevant

Copy link
Copy Markdown
Contributor

The windows capz job is failing https://testgrid.k8s.io/sig-windows-master-release#capz-windows-dockershim-master, which uses DinD and kubekins.

@CecileRobertMichon

CecileRobertMichon commented Sep 27, 2021

Copy link
Copy Markdown
Member

@CecileRobertMichon

Copy link
Copy Markdown
Member

Examples of jobs that were fixed with the workaround: all the jobs changed in #23744, such as https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-azure#capz-periodic-conformance-v1alpha4-main.

@spiffxp spiffxp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 27, 2021
@k8s-ci-robot k8s-ci-robot merged commit 2df492b into kubernetes:master Sep 27, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 27, 2021
@spiffxp

spiffxp commented Sep 28, 2021

Copy link
Copy Markdown
Contributor

wait for a new bootstrap image to be built
PR kubekins to be built from that new image, bootstrap-based jobs to use new image (or wait for next autobump PR to do so)

#23784 - autobump PR that auto-merged did this

wait for that new kubekins to roll out
PR jobs to update to new kubekins (or wait for next autobump PR to do so)

#23795 - I just approved this autobump PR

verify bootstrap-based jobs are ok
verify kubekins-based jobs are ok

I'm going to assume folks who have commented here can handle this part?

@CecileRobertMichon

Copy link
Copy Markdown
Member

We just got our first conformance test pass since 9/22 so looks like it's working 🎉

https://testgrid.k8s.io/provider-azure-master-signal#capz-conformance

@spiffxp

spiffxp commented Oct 4, 2021

Copy link
Copy Markdown
Contributor

I'm planning on gradually rolling this back using a mechanism that will allow us to do so with fewer PRs if we need to enable it again in the future, ref: #23741 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/images cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test failures starting 22 September due to network timeouts

9 participants