Summary
When XHarness cannot find a suitable Android emulator (exit code 81 / DEVICE_NOT_FOUND), it currently does a single-shot check and immediately gives up. It should attempt to restart the emulator and retry before failing.
Current Behavior
- AdbRunner.GetDevice() queries adb devices -l
- If no device matches the required architecture (e.g. x86_64), returns null
- WaitForDevice() may wait for boot completion, but only if a device was found
- Returns ExitCode.DEVICE_NOT_FOUND (81) with no recovery attempted
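The lookup described above can be sketched roughly as follows. This is an illustrative Python sketch, not XHarness's actual C# implementation: `parse_adb_devices` and `find_device` are hypothetical names, and the `get_abi` callback is assumed to wrap something like `adb -s <serial> shell getprop ro.product.cpu.abi`.

```python
def parse_adb_devices(output):
    """Parse the text printed by `adb devices -l` into one dict per device."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the "List of devices attached" header, and
        # daemon notices such as "* daemon started successfully".
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        serial, state = parts[0], parts[1]
        # Remaining tokens look like key:value (product:..., model:..., ...).
        props = dict(p.split(":", 1) for p in parts[2:] if ":" in p)
        devices.append({"serial": serial, "state": state, **props})
    return devices


def find_device(devices, get_abi, required_abi):
    """Return the serial of the first online device with the required ABI.

    get_abi is a hypothetical callback mapping a serial to its ABI string.
    Returns None when nothing matches, which is the single-shot failure case.
    """
    for dev in devices:
        if dev["state"] == "device" and get_abi(dev["serial"]) == required_abi:
            return dev["serial"]
    return None
```

With only an x86 emulator attached, `find_device(..., "x86_64")` returns `None` on the first and only attempt, which is the point where the DEVICE_NOT_FOUND exit happens today.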
Helix retries the work item up to 3 times, but retries run on the same machine with the same dead emulator. The machine reboot only fires after all attempts are exhausted.
Proposed Behavior
Before returning DEVICE_NOT_FOUND, XHarness should attempt recovery:
- Reset the ADB daemon (adb kill-server then adb start-server)
- Re-check adb devices -l; if the required emulator reappears, continue
- If still missing, attempt to restart the emulator process via systemctl restart android-emulator (for systemd-managed emulators on Helix machines) or adb emu restart as a fallback
- Wait for boot completion (sys.boot_completed == 1) with a reasonable timeout
- Re-check for the required device
- Only return DEVICE_NOT_FOUND if all recovery attempts fail
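A minimal sketch of the recovery sequence listed above, written in Python for illustration only (XHarness itself is C#). The function names and the `find_device`, `restart_emulator`, and `wait_for_boot` callbacks are hypothetical. The sketch assumes `adb` is on PATH, and the timeout values are placeholders rather than recommendations.

```python
import subprocess
import time


def run_adb(*args, timeout=60):
    """Run an adb command and return its stdout (assumes adb is on PATH)."""
    result = subprocess.run(
        ["adb", *args], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout


def wait_for_boot(serial, run=run_adb, timeout_s=300, poll_s=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll sys.boot_completed until it reads 1 or the timeout elapses."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        out = run("-s", serial, "shell", "getprop", "sys.boot_completed")
        if out.strip() == "1":
            return True
        sleep(poll_s)
    return False


def recover_device(find_device, restart_emulator, wait_for_boot, run=run_adb):
    """Try each recovery step in order; return a serial, or None if all fail.

    find_device():      hypothetical callback, returns a matching serial or None
    restart_emulator(): hypothetical callback, e.g. wraps the systemctl restart
    wait_for_boot():    hypothetical callback, returns True once boot completes
    """
    # Step 1: reset the ADB daemon, then re-check for the device.
    run("kill-server")
    run("start-server")
    serial = find_device()
    if serial:
        return serial

    # Step 2: restart the emulator, wait for boot, then re-check.
    restart_emulator()
    if not wait_for_boot():
        return None
    return find_device()
```

The caller would return DEVICE_NOT_FOUND only when `recover_device` returns `None`. Passing the steps in as callbacks keeps the ordering testable without a real emulator.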
Additionally, better diagnostics before failing would help:
- Log output of adb devices -l showing what IS available
- Log emulator process status (systemctl status)
- Log the specific architecture mismatch (needed x86_64, found only x86)
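As one possible shape for the architecture-mismatch diagnostic, the sketch below builds a single log line from the required ABI and the ABIs of the devices that were found. `describe_mismatch` is a hypothetical helper and the message wording is only a suggestion.

```python
def describe_mismatch(required_abi, found):
    """Build a log line; found maps each attached serial to its ABI."""
    if not found:
        return f"No devices attached; needed {required_abi}."
    listing = ", ".join(f"{serial} ({abi})" for serial, abi in sorted(found.items()))
    return f"No device with architecture {required_abi}; found only: {listing}."
```

For the failure pattern in this issue, the call `describe_mismatch("x86_64", {"emulator-5554": "x86"})` yields a message that names both the required architecture and the emulator that was actually present.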
Evidence
Over the last 2 days in the dotnet/runtime CI pipeline (definition 129), there were 29 DEVICE_NOT_FOUND failures across the android-x64 Release AllSubsets_CoreCLR_Smoke leg, spread across 17 different machines. The common pattern:
- Machine has two emulators: emulator-5554 (x86, API 29) and emulator-5556 (x86_64, API 29)
- The x86_64 emulator (emulator-5556) crashes or fails to start
- XHarness finds only the x86 emulator, which does not match the x86_64 requirement
- Exits immediately with code 81
- All 3 Helix retry attempts fail because the emulator stays dead
Example failing work item: System.Security.Cryptography.Tests in Helix job ced47868-669f-4428-b0e8-ea795af7b0c3 on machine a003BI4. Only emulator-5554 (x86) was found; emulator-5556 (x86_64) was missing.
Console log: https://helix.dot.net/api/2019-06-17/jobs/ced47868-669f-4428-b0e8-ea795af7b0c3/workitems/System.Security.Cryptography.Tests/console
Impact
This affects multiple test suites on the android-x64 CoreCLR Smoke leg: System.Security.Cryptography.Tests (~10x), Android.Device_Emulator.JIT tests (~4x), System.Diagnostics.Tracing.Tests (~6x), and others. They all share the same root cause: a crashed x86_64 emulator with no recovery path.
Related: #1548 (tvOS device log stream blocking issue)