Test failures starting 22 September due to network timeouts #23741

@tam7t

What happened:

On or around 22 September we started seeing CI failures in prow:

Example log from error:

Step 4/8 : RUN apk add --no-cache curl &&     curl -LO https://storage.googleapis.com/kubernetes-release/release/${KUBE_VERSION}/bin/linux/${ARCH}/kubectl &&     chmod +x kubectl
 ---> Running in d7e388707d87
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/x86_64/APKINDEX.tar.gz
{"component":"entrypoint","file":"prow/entrypoint/run.go:165","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 25m0s timeout","severity":"error","time":"2021-09-23T21:50:51Z"}

Ref run: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_secrets-s[…]e-csi-driver-image-scan/1441072726373568512/build-log.txt

What you expected to happen:

Successful builds.

How to reproduce it (as minimally and precisely as possible):

This happens consistently on all of our builds. We've tried reverting commits in PRs and cannot find anything related to the test case that would cause this. The same tests running on the k8s-infra-prow-build cluster succeed.

We have also seen it fail on apt:

  Connection timed out [IP: 151.101.194.132 80]
E: Failed to fetch http://deb.debian.org/debian/pool/main/u/util-linux/libblkid1_2.36.1-8_amd64.deb  Connection timed out [IP: 199.232.126.132 80]
E: Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openssl/libssl1.1_1.1.1k-1%2bdeb11u1_amd64.deb  Connection timed out [IP: 151.101.2.132 80]
E: Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openssl/openssl_1.1.1k-1%2bdeb11u1_amd64.deb  Connection timed out [IP: 151.101.194.132 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
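
To see whether the timeouts cluster on particular hosts or IPs, the apt errors can be tallied mechanically. The following is only a rough sketch: the function name `summarize_timeouts` and the regular expression are hypothetical, and the pattern assumes the `Failed to fetch <url>  Connection timed out [IP: <ip> <port>]` shape seen in the log lines above.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed line shape, taken from the apt output quoted in this issue:
#   E: Failed to fetch <url>  Connection timed out [IP: <ip> <port>]
FAILED_FETCH = re.compile(
    r"Failed to fetch (?P<url>\S+)\s+Connection timed out "
    r"\[IP: (?P<ip>\S+) (?P<port>\d+)\]"
)


def summarize_timeouts(log_text):
    """Count timed-out fetches per (hostname, ip, port) in apt output."""
    counts = Counter()
    for match in FAILED_FETCH.finditer(log_text):
        host = urlsplit(match["url"]).hostname
        counts[(host, match["ip"], int(match["port"]))] += 1
    return counts
```

Feeding a saved build log to `summarize_timeouts(open("build-log.txt").read())` (the file name is a placeholder) gives a per-endpoint count that can be compared between clusters.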

We also tried setting and increasing CPU/memory requests in #23723 and #23725, but the builds still failed.
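
Since changing resource requests made no difference, a bare TCP connect test might help separate network egress problems from build load. The sketch below is an assumption-laden illustration, not something that was run as part of this report: `probe` is a hypothetical helper, and the host/port pairs are simply read off the URLs in the logs above (https on 443, http on 80).

```python
import socket
import time

# Hosts taken from the log excerpts in this issue; ports follow the URL schemes.
TARGETS = [
    ("dl-cdn.alpinelinux.org", 443),
    ("deb.debian.org", 80),
    ("security.debian.org", 80),
]


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects within `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        ok = False
    return ok, time.monotonic() - start
```

Calling `probe(host, port)` for each entry in `TARGETS` from a pod on each build cluster would show whether plain connections to these endpoints time out on one cluster but not the other.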

Please provide links to example occurrences, if any:

Anything else we need to know?:

Discussion thread in slack: https://kubernetes.slack.com/archives/C09QZ4DQB/p1632414457031600

Labels

kind/bug: Categorizes issue or PR as related to a bug.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/testing: Categorizes an issue or PR as relevant to SIG Testing.
