pytorch_linux_xenial_py3_6_gcc5_4_test failing due to cert issues #65931

@suo

Current status: mitigated. Rebase on master to pick up the fix (#65934).

Error looks like:

Fetched 4466 kB in 4s (1113 kB/s)
Reading package lists...
W: The repository 'https://deb.nodesource.com/node_12.x xenial Release' does not have a Release file.
E: Failed to fetch https://deb.nodesource.com/node_12.x/dists/xenial/main/source/Sources  server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
E: Some index files failed to download. They have been ignored, or old ones used instead.
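For triage, it can help to tell this kind of certificate failure apart from other apt errors in a job log. The following is a minimal sketch, not part of the PyTorch CI tooling: the function name `find_cert_failures` and the regular expression are hypothetical, and the pattern is only assumed to match the message format shown in the log above.

```python
import re
from typing import List

# Hypothetical pattern, modelled on the apt "E: Failed to fetch ..." line above.
CERT_FAILURE = re.compile(
    r"^E: Failed to fetch (?P<url>\S+)\s+.*certificate verification failed"
)


def find_cert_failures(apt_output: str) -> List[str]:
    """Return the URLs apt could not fetch because certificate verification failed."""
    urls = []
    for line in apt_output.splitlines():
        match = CERT_FAILURE.match(line.strip())
        if match:
            urls.append(match.group("url"))
    return urls


if __name__ == "__main__":
    sample = (
        "Reading package lists...\n"
        "E: Failed to fetch https://deb.nodesource.com/node_12.x/dists/xenial/main/source/Sources"
        "  server certificate verification failed. "
        "CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none\n"
    )
    for url in find_cert_failures(sample):
        print("certificate failure while fetching:", url)
```

A non-empty result points at an infrastructure problem (certificates or the remote repository) rather than at the change under test.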

Incident timeline (times Pacific)

User impact

Consistent failure of pytorch_linux_xenial_py3_6_gcc5_4_test, a job that runs on both master and PR CI.

Root cause

The root certificate behind Let's Encrypt's older certificate chain (DST Root CA X3) expired on 9/30 (helpful explanation). This broke our HTTPS calls to the nodesource package repository, which we were using to install some tools at test time.
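To illustrate the expiry itself, here is a minimal sketch that checks whether a certificate's "Not After" timestamp has passed. It assumes the expired certificate was the DST Root CA X3 root, whose expiry is taken here to be `Sep 30 14:01:15 2021 GMT`; the helper name `is_expired` is hypothetical. `ssl.cert_time_to_seconds` is a standard-library function that parses this timestamp format.

```python
import ssl
from datetime import datetime, timezone

# Assumed "Not After" value for DST Root CA X3 (not stated in this issue).
DST_ROOT_CA_X3_NOT_AFTER = "Sep 30 14:01:15 2021 GMT"


def is_expired(not_after: str, now: datetime) -> bool:
    """Return True if `now` is at or past the certificate's notAfter time."""
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return now.timestamp() >= expiry_epoch


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("expired:", is_expired(DST_ROOT_CA_X3_NOT_AFTER, now))
```

The same check can be pointed at any `notAfter` string, for example the one returned by `ssl.SSLSocket.getpeercert()`, to spot certificates that are about to lapse.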

Mitigation

The issue was mitigated on trunk by forcing an update of gnutls, which contains the new root certificates.

Prevention/followups

Pre-install dependencies in the Docker image. We should not be installing anything during the core build and test phase. Docker rebuilds would eventually fail for the same reason and need the same mitigation, but a failed image rebuild is much less urgent than a CI breakage.

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @seemethere @malfet @pytorch/pytorch-dev-infra

Labels

ci: sev (critical failure affecting PyTorch CI)
high priority
module: ci (Related to continuous integration)
module: third_party
triage review
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
