fix(ci): add retry logic for apt-get to prevent mirror desync failures #29236
Wrap apt-get commands in nick-fields/retry with 3 attempts and 30s wait to handle transient Ubuntu mirror desync errors that account for ~26% of setup environment CI failures. Add DPkg::Lock::Timeout=120 to handle dpkg lock contention on Cirrus self-hosted runners. Remove unnecessary apt install of gh CLI from create-release-draft workflow since gh is pre-installed on all ubuntu-latest runner images.
Add on_retry_command to run apt-get clean before each retry, clearing cached/corrupt package lists so apt-get update fetches fresh data from mirrors. Reduce timeout_minutes from 5 to 3 — apt takes 5-15s normally and even with a 120s dpkg lock wait the worst case is ~135s.
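The retry behaviour described in these two commits can be sketched in plain bash. This is an illustration only, not the implementation of `nick-fields/retry`: the function name `retry_with_cleanup` is hypothetical, and running the cleanup hook after the wait is a choice made for this sketch rather than something the source confirms about the action.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nick-fields/retry): run a command up to
# MAX_ATTEMPTS times, sleeping WAIT_SECONDS and running CLEANUP_CMD before
# each retry.
# Usage: retry_with_cleanup MAX_ATTEMPTS WAIT_SECONDS CLEANUP_CMD COMMAND [ARGS...]
retry_with_cleanup() {
  local max_attempts="$1" wait_seconds="$2" cleanup_cmd="$3"
  shift 3
  local attempt=1
  while true; do
    # Success on any attempt ends the loop immediately.
    if "$@"; then
      return 0
    fi
    # Give up once the attempt budget is spent.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "command failed after ${attempt} attempts" >&2
      return 1
    fi
    sleep "$wait_seconds"
    # Cleanup hook, analogous in spirit to on_retry_command; its own failure
    # does not abort the retry loop.
    eval "$cleanup_cmd" || true
    attempt=$((attempt + 1))
  done
}

# Example invocation mirroring the values in the commit message
# (commented out so the sketch has no side effects):
# retry_with_cleanup 3 30 "sudo apt-get clean" \
#   timeout 180 sudo apt-get -o DPkg::Lock::Timeout=120 update
```

The `timeout 180` prefix in the commented example stands in for the 3-minute per-attempt limit. One detail worth verifying: `apt-get clean` is documented as removing retrieved package files under `/var/cache/apt/archives`, so whether it also refreshes stale index files is best confirmed on the runner image.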
Cursor Bugbot has reviewed your changes and found 1 potential issue.
nick-fields/retry v3.0.2 does not set bash -e for multi-line commands. Without it, if apt-get update or apt-get install fails, bash continues to the trailing echo which exits 0, masking the failure and preventing retries from triggering. Credit: Cursor Bugbot review.
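The masking effect described in this commit message can be reproduced locally with a small bash sketch. Here `false` is a stand-in for a failing `apt-get update`, and the echoed message is a placeholder; neither comes from the actual workflow.

```shell
#!/usr/bin/env bash
# `false` stands in for a failing apt-get call (assumption for illustration).

# Without -e: the failure is ignored, the trailing echo runs, and the
# script's exit status is that of the echo, i.e. 0.
status_without_e=0
bash -c 'false; echo "dependencies installed"' >/dev/null || status_without_e=$?

# With -e: bash stops at the first failing command and exits with its status.
status_with_e=0
bash -e -c 'false; echo "dependencies installed"' >/dev/null || status_with_e=$?

echo "without -e: ${status_without_e}, with -e: ${status_with_e}"
# prints "without -e: 0, with -e: 1"
```

A retry wrapper that decides whether to retry from the exit status would see success in the first case, which is why a multi-line command needs `set -e` (or explicit `&&` chaining) for retries to trigger.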
🔍 Smart E2E Test Selection
E2E Test Selection: Neither change touches application source code, test scenarios, controllers, UI components, navigation, or any user-facing functionality. No E2E tests need to run to validate these CI infrastructure improvements.
Performance Test Selection:
AI PR Analysis: 🚫 Merge safe: false | 🟠 Risk: high
AI analysis did not complete. Manual review recommended.
Note on AI PR risk analysis failure: The |




Description
Problem: Ubuntu apt mirror desync failures account for 26% (~17 runs) of the 64 Setup Environment CI failures on `main` over 30 days (Mar 16 – Apr 16, 2026). The failure signature is `apt-get update` failing with `File has unexpected size ... Mirror sync in progress?` when Ubuntu mirrors are mid-sync. Additionally, dpkg lock contention on Cirrus self-hosted runners caused ~2% (~1 run) of failures.
All of these failures are transient network/infrastructure issues; none are caused by missing packages, version conflicts, or configuration errors. See INFRA-3580 for the full root cause analysis.
Solution:
- `setup-e2e-env/action.yml`: Wrap `apt-get update` + `apt-get install` in `nick-fields/retry` (3 attempts, 30s wait, 3 min timeout per attempt), matching the existing repo pattern already used 3 times in the same file. Add `-o DPkg::Lock::Timeout=120` to handle dpkg lock contention on Cirrus runners. Add `on_retry_command: sudo apt-get clean` to clear cached/corrupt package lists before each retry.
- `create-release-draft.yml`: Remove unnecessary `apt update && apt install gh`. GitHub CLI (`gh`) is pre-installed on all `ubuntu-latest` runner images (v2.89.0 on both Ubuntu 22.04 and 24.04 per actions/runner-images).
Data-backed design decisions:
- `timeout_minutes: 3` per attempt: apt takes 5-15s normally; even with a 120s dpkg lock wait the worst case is ~135s. 3 min is 12-36x the happy path while avoiding a 5-min wait on true hangs.
- `retry_wait_seconds: 30` gives mirrors time to finish syncing (typically resolves in <60s).
- `on_retry_command: sudo apt-get clean` clears cached/corrupt package lists before each retry so `apt-get update` fetches fresh data from mirrors.
- `DPkg::Lock::Timeout=120` has zero cost when no lock contention exists (the normal case). Needed because Android E2E runs on Cirrus self-hosted runners (`ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04-lg`) where background `unattended-upgrades` or `apt-daily` can hold the dpkg lock.
Changelog
CHANGELOG entry: null
Related issues
Fixes: INFRA-3580 (partial — addresses Ubuntu apt mirror desync and dpkg lock contention sub-causes)
Manual testing steps
Screenshots/Recordings
N/A — CI workflow changes only, no UI impact.
Before
N/A
After
N/A
Pre-merge author checklist
Pre-merge reviewer checklist
Note
Low Risk
Low risk CI-only change; main impact is altering how Android E2E Linux dependencies are installed and could affect runner setup if the retry wrapper is misconfigured.
Overview
Improves Android E2E setup reliability by wrapping the Linux `apt-get update`/`install` step in `nick-fields/retry`, adding dpkg lock timeouts and cleanup on retry to better handle transient mirror/lock issues.
Simplifies `create-release-draft` by removing `apt`-based `gh` installation and only performing `gh auth login` before running the release draft script.
Reviewed by Cursor Bugbot for commit e6ee232. Bugbot is set up for automated code reviews on this repo.
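Dropping the `apt`-based install relies on `gh` already being on the runner's PATH. A workflow that wants to make that assumption explicit could fail fast with a guard along these lines. This is a minimal sketch, and the helper name `require_tool` is hypothetical rather than something taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not from the PR): return non-zero with a clear message
# when a required tool is missing from PATH.
require_tool() {
  local tool="$1"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "error: '${tool}' not found on PATH" >&2
    return 1
  fi
}

# Example use before authenticating (commented out to keep the sketch inert):
# require_tool gh && gh --version
```

The guard adds no network calls, so unlike the removed install step it cannot fail on mirror desync.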