
misc(travis): fix travis hang by disabling yarn GPG verification#10075

Closed
jayaddison wants to merge 4 commits into GoogleChrome:master from jayaddison:test-build-disable-yarn-gpg

@jayaddison

Contributor

Summary
Travis pull request builds are running into 10-minute activity timeouts relatively frequently at the moment. At the time of writing, only 1 of the last 25 pull request builds succeeded.

Based on several reports online, this may be because the yarn install.sh script that Travis uses can spawn a GPG agent during install verification, which can prevent build steps from terminating cleanly.

This change attempts to disable the GPG verification check for PR builds only, via the Travis CI head_branch conditional.

I'm not yet certain this is the cause of the build failures: disabling GPG verification is undesirable, and every indication so far is that this was only a problem on the Windows platform, so this is an experimental change.
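A minimal Python sketch of the "PR builds only" idea, for illustration. It is not the Travis head_branch condition this PR uses; it is an equivalent check written against the TRAVIS_PULL_REQUEST environment variable, which Travis sets to the pull request number on PR jobs and to the string "false" otherwise. It assumes that yarn's install.sh skips its GPG check when YARN_GPG is set to no (the setting suggested later in this thread). The function name install_env is hypothetical.

```python
import os
from typing import Dict, Mapping


def install_env(base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of base_env with YARN_GPG=no added for pull-request builds only.

    Assumption: Travis sets TRAVIS_PULL_REQUEST to the PR number on PR jobs
    and to the string "false" on all other jobs.
    """
    env = dict(base_env)  # copy so the caller's environment is not mutated
    is_pr_build = env.get("TRAVIS_PULL_REQUEST", "false") != "false"
    if is_pr_build:
        # Assumed behaviour: yarn's install.sh skips GPG verification when YARN_GPG is "no".
        env["YARN_GPG"] = "no"
    return env


if __name__ == "__main__":
    env = install_env(os.environ)
    print("YARN_GPG =", env.get("YARN_GPG", "<unset>"))
```

Master and other non-PR builds keep GPG verification on, which matches the concern raised below about turning it off everywhere.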

@patrickhulce patrickhulce left a comment


Oh no linux is being hit by this now too!? :(

Can confirm this was a massive headache in lighthouse-ci Windows builds.

Personally, I say we just set YARN_GPG=no in all builds.

Thanks so much for hunting this down and for the PR, @jayaddison!

@jayaddison

Contributor Author

@patrickhulce Unfortunately I don't know for certain that it's the root cause, but it's my strong suspicion: basically, I think the Travis cache stage stalls because of the lingering GPG agent process. It fits the behavior pattern, and this PR's initial passing build also seems to support it. But it would probably take more builds and repeated evidence to confirm whether this makes a real difference.

@jayaddison

Contributor Author

@patrickhulce As an outsider I guess I don't have a strong opinion, but for some reason, master builds do seem relatively reliable (and actually, that's probably strong-ish evidence that maybe something else is going on?) -- and verifying the yarn install files seems like good practice if possible. I wouldn't love hearing that there'd been some kind of yarn security issue down the line and then be the person responsible for having turned off GPG verification in master builds :)

@jayaddison

Contributor Author

@patrickhulce Alright, yep - it looks like gpg --import starts a gpg-agent daemon process under linux platforms as well. Time to look into fixes there, and I may open a PR/issue against yarnpkg/website.
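For illustration only, a Python sketch of one way to stop the agent that gpg --import leaves running: import into a dedicated GNUPGHOME, then ask GnuPG to shut that home's agent down with gpgconf --kill gpg-agent (available in modern GnuPG 2.x). This is a hypothetical helper, not what yarn's install.sh or any upstream fix actually does; the key path is a placeholder, and the run parameter exists only so the commands can be checked without GnuPG installed.

```python
import os
import subprocess
from typing import Callable

# Anything with subprocess.run's calling convention; injectable for testing.
Runner = Callable[..., subprocess.CompletedProcess]


def import_key_then_stop_agent(key_path: str, gnupg_home: str,
                               run: Runner = subprocess.run) -> None:
    """Hypothetical helper: import a key, then stop the gpg-agent it may have started."""
    # Point both commands at the same GnuPG home so they talk to the same agent.
    env = dict(os.environ, GNUPGHOME=gnupg_home)
    try:
        # `gpg --import` may launch a background gpg-agent for this home directory.
        run(["gpg", "--import", key_path], env=env, check=True)
    finally:
        # Ask GnuPG to stop that agent so it cannot outlive the build step.
        run(["gpgconf", "--kill", "gpg-agent"], env=env, check=False)
```

Killing the agent in a finally clause means it is stopped even when the import itself fails.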

@paulirish

Member

but for some reason, master builds do seem relatively reliable

not anymore! :)


@paulirish paulirish changed the title test(build): Disable yarn GPG verification during PR builds test(build): disable yarn GPG verification Dec 6, 2019
@paulirish paulirish changed the title test(build): disable yarn GPG verification misc(travis): fix travis hang by disabling yarn GPG verification Dec 6, 2019
@jayaddison

Contributor Author

@paulirish an upstream fix might be ready from yarnpkg/website#1030 soon - maybe it's worth holding off temporarily? I'm a bit wary of disabling all GPG verification here (although yep, there's a tradeoff vs getting builds back in good shape for sure)

@paulirish

Member

WHELP.. that didn't work. :/


@patrickhulce

Collaborator

It's really interesting that it's not happening to lighthouse-ci and that it's even before yarn is installed, so I'm not sure the GPG is to blame here.

Based on the fact that it's hanging while adding the yarn directory to the cache, and that the master cache just ballooned to 6GB, maybe we try clearing the cache?

@patrickhulce

Collaborator

Well, that seemed to work, at least, and the cache is back down to ~900MB on master. Maybe we just need to periodically clear the caches and/or prune unused yarn cruft? 🤷‍♂

https://travis-ci.org/GoogleChrome/lighthouse/builds/621861205?utm_medium=notification&utm_source=github_status
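One hypothetical way to act on the "periodically clear them" idea is to measure the cached directory and flag it when it grows too large. The Python sketch below sums regular-file sizes under a directory with os.walk. The helper names, the ~/.yarn path, and the 2 GiB threshold are illustrative assumptions, not Travis settings, and the script only reports size; it does not clear anything.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Return the total size of regular files under root.

    Symlinks are skipped, and symlinked directories are not descended into
    (os.walk does not follow links by default).
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def cache_over_limit(root: str, limit_bytes: int) -> bool:
    """True when the directory holds more than limit_bytes of file data."""
    return directory_size_bytes(root) > limit_bytes


if __name__ == "__main__":
    # Hypothetical example: ~/.yarn is one of the cached paths mentioned in this thread.
    yarn_dir = os.path.expanduser("~/.yarn")
    limit = 2 * 1024 ** 3  # hypothetical 2 GiB threshold
    if os.path.isdir(yarn_dir):
        size = directory_size_bytes(yarn_dir)
        print(f"{yarn_dir}: {size / 1024 ** 2:.1f} MiB")
        if size > limit:
            print("cache is larger than the threshold; consider clearing it")
```

Run as a build step, a check like this would make the jump from ~900MB to 6GB visible in the log before it starts stalling the cache upload.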

@jayaddison

Contributor Author

@patrickhulce Great, ok - caching being the real root cause does sound like it fits the behaviour patterns and would make sense.

In that case I think I'll close this PR for now, and I'll continue discussing the yarnpkg PR to see whether it's worth landing that change (definitely seems less important if it has no impact here, but perhaps good to reduce their install script side-effects anyway).

@Daniel15

Daniel15 commented Dec 7, 2019


I haven't looked too far into this (and don't have much experience with TravisCI), but why do you need to install Yarn fresh for every single build? Usually the better and faster approach is to use a Docker container with all your build tools already installed, and just rebuild the Docker container every so often to upgrade the tools.

If you disable GPG I'd recommend at least checking a SHA256 checksum of the file.
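A minimal Python sketch of the checksum suggestion, assuming a trusted expected SHA-256 digest is available from somewhere other than the download itself (where yarn publishes one is not covered here). The function names and the command-line usage are hypothetical.

```python
import hashlib
import hmac
import sys


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex string (case-insensitive)."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


if __name__ == "__main__":
    # Hypothetical usage: python verify.py <file> <expected-sha256-hex>
    file_path, expected = sys.argv[1], sys.argv[2]
    if not verify_sha256(file_path, expected):
        sys.exit(f"checksum mismatch for {file_path}")
    print("checksum OK")
```

A checksum only proves the file matches the published digest; unlike a GPG signature, it does not prove who published it, which is why it is a fallback rather than a full replacement.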

@jayaddison

Contributor Author

@Daniel15 It looks like Travis-CI auto-detects and installs yarn by default under certain conditions -- there's no explicit configuration (or need) in the lighthouse project to reinstall it fresh, as far as I know.

@patrickhulce's theory (which I think matches the evidence) is that the GPG agent isn't really the cause of the recent lighthouse Travis build failures; instead, the size of the build cache (which contains the ~/.yarn directory and other filesystem contents) may be causing problems somehow.

It's possible Travis does create & re-use 'build-ready environments' the way you mention via container images, avoiding the need to run the install.sh script for all builds, but the Windows issue reports would suggest it does invoke the script 'live' during at least some builds.

@jayaddison jayaddison deleted the test-build-disable-yarn-gpg branch December 11, 2019 20:50

5 participants