Description
So this is baffling to me, and I know it has nothing to do with anything we did, but it's super frustrating: running curl https://codecov.io/bash has recently become super flaky, and it's making our CI runs fail all the time. We need to do something about it.
Observations:
The most common failure seems to be a timeout during connect. I've also seen this, which is super weird:
+curl --retry 5 -o codecov.sh https://codecov.io/bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:15 --:--:-- 0
curl: (35) gnutls_handshake() failed: Error in the pull function.
According to the curl docs, curl --retry 5 should retry on timeouts, but I can't tell whether it's actually doing that.
Codecov as a whole is kind of flaky, so it's tempting to blame this on them. But I'm not sure this is actually their fault... codecov.io appears to be hosted by Google (based on running whois on the IP address), so I'm guessing it's some Google CDN or something? (Or maybe it's just some random server running on Google Cloud with no CDN in front of it, who knows?)
The failures seem to all be from Travis, not Azure. And I'm pretty sure Travis is running on Google Cloud servers. Wild supposition: maybe Travis's traffic is considered "internal" to Google Cloud and doesn't hit the CDN, while Azure's traffic is "external" and does hit the CDN? I have no idea. I guess we could try running traceroute from both Travis and Azure, but that seems like it might be running off down an unproductive rabbit-hole.
...oh wait, actually it just failed on Azure too, so never mind: https://dev.azure.com/python-trio/trio/_build/results?buildId=1031&view=logs&jobId=872bf439-86bb-5ce5-edcd-c35619d700a0
So... what the heck can we do about this? The obvious answer is to retry, but curl's --retry isn't working. Is that because there's something wrong with --retry, or is it because when we hit one of these failures, it's somehow "sticky"? Would putting our own retry loop around curl help?
I guess we could also stash a copy of codecov-bash somewhere more reliable, but that seems like it would create all kinds of operational annoyances.
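For what it's worth, the "our own retry loop" idea could look something like the sketch below. As far as I can tell from the curl man page, --retry only kicks in for errors curl considers transient (timeouts and certain HTTP/FTP response codes), so a TLS handshake failure like the exit code 35 above may simply never get retried; a wrapper that retries on any non-zero exit sidesteps that question. The `retry` function name, the `RETRY_DELAY` variable, and the timeout values in the usage comment are all made up for illustration, and none of this has been tried against the actual failure, so it won't help if the failures really are "sticky".

```shell
#!/bin/sh
# Hypothetical helper (not part of curl or codecov): run a command up to
# $1 times, doubling the sleep between attempts. RETRY_DELAY is an assumed
# knob for the initial delay in seconds (default 5).
retry() {
    max_attempts=$1
    shift
    attempt=1
    delay=${RETRY_DELAY:-5}
    while true; do
        # Any zero exit status counts as success, whatever the command is.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, sleeping ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Possible usage in CI (the timeout values are guesses, not recommendations):
#   retry 5 curl --fail --connect-timeout 10 --max-time 60 \
#       -o codecov.sh https://codecov.io/bash
```

Each attempt here is a fresh curl process, so if the problem is per-connection this should at least give us a clean second try; --connect-timeout and --max-time keep a single hung attempt from eating the whole CI job.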