Fix flaky boot issues by adding a retry parameter #563
Merged
dotdoom merged 8 commits into futureware-tech:main on Feb 5, 2026
Thank you, LGTM! I'll resolve the conflicts that resulted from my earlier PR and merge this.
29a0113 to 1833a2e
dotdoom approved these changes on Feb 5, 2026
github-merge-queue (bot) pushed a commit to SharezoneApp/sharezone-app that referenced this pull request on Feb 5, 2026
futureware-tech/simulator-action#563 got merged. No need to use the fork anymore.
olerass added a commit to rainbow-me/rainbow that referenced this pull request on Apr 22, 2026
The `Ensure Simulator is fully booted` step in `ios-e2e.yml` polls `xcrun simctl list | grep "Booted"` with a 120-second timeout. The `simulator-action` that creates the simulator already supports a `wait_for_boot: true` input that uses Apple's `simctl bootstatus` internally, which is the supported mechanism for detecting simulator readiness. The custom loop was added in #6713 (Aug 2025) during the GH Actions migration with no documented rationale; presumably the action's native wait was simply overlooked at the time.

We noticed the grep-based detection is fragile while working on the iOS 26 SDK migration. An iOS 26.4 simulator booted in ~36 seconds, yet our custom loop timed out at 120s because the `grep 'Booted'` pattern didn't match reliably during the boot transition. Swapping to the action's native wait fixed it cleanly in that context. This change backports the combined swap to develop's current iPhone 16 / iOS 18.5 setup. Behavior should be equivalent on green runs and more robust on slow boots.

@janicduplessis pointed out on this PR that the built-in wait has historically been flaky: `simctl bootstatus` sometimes returns while the simulator isn't actually ready, or hangs outright. We actually observed that hang pattern on one shard here. To combat this the change also bumps the simulator-action from v4 to v5, which addresses this exact class of flake. v5 ships `boot_timeout_seconds: 360` and `boot_retries: 2` as defaults (see [#563](futureware-tech/simulator-action#563)), so a hung `bootstatus` call now bails after 6 minutes and retries up to twice before failing the step. No explicit config needed on our side.

Ref FEPLAT-81.
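To make the "bails after 6 minutes and retries up to twice" budget concrete, here is a rough worst-case calculation. It is only a sketch: `worstCaseBootSeconds` is a hypothetical helper, and it assumes every attempt runs into the full timeout with no pause between attempts, which the PR does not specify.

```typescript
// Hypothetical helper, not part of simulator-action.
// Assumes each attempt hangs until the timeout and that there is
// no extra delay between attempts (both are assumptions).
function worstCaseBootSeconds(
  bootTimeoutSeconds: number,
  bootRetries: number,
): number {
  const attempts = bootRetries + 1; // one normal attempt plus the retries
  return attempts * bootTimeoutSeconds;
}

// v5 defaults quoted above: boot_timeout_seconds = 360, boot_retries = 2
const worstCase = worstCaseBootSeconds(360, 2);
console.log(`${worstCase}s (${worstCase / 60} min)`); // 1080s (18 min)
```

Under those assumptions, a simulator that hangs on every attempt would fail the step after roughly 18 minutes, which may be worth checking against the job-level timeout.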
As reported in #548, this GitHub Action is sometimes flaky: it doesn't always complete the boot process of the simulator. As a workaround, I added retries, which can be configured with the following parameters:
- `boot_timeout_seconds` (default: `360`): Maximum number of seconds to wait for the Simulator to finish booting (`0` disables the timeout).
- `boot_retries` (default: `2`): Number of times to retry booting when waiting for the Simulator to finish booting fails. Setting this to `2` results in 3 attempts: one normal attempt and two retries.

Before this PR, the action failed in 6 out of 20 runs (see logs). With the changes from this PR, the action failed in 0 out of 40 runs (see these logs and these logs).
In this log the retry was used: https://github.com/nilsreichardt/integration_test_problem/actions/runs/21669542187/job/62473614621
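The semantics of `boot_timeout_seconds` and `boot_retries` described above can be sketched roughly as follows. This is an illustrative sketch, not the action's actual source: `BootOptions`, `withTimeout`, `bootWithRetries`, and the `bootOnce` callback are hypothetical names, and the sketch does not cancel a hung attempt, which a real implementation would presumably need to do.

```typescript
// Illustrative sketch only; all names here are hypothetical.
type BootOptions = {
  bootTimeoutSeconds: number; // 0 disables the timeout
  bootRetries: number; // extra attempts after the first one
};

// Reject if `work` does not settle within `seconds`; 0 waits indefinitely.
// Note: the underlying work is not cancelled when the timer fires.
function withTimeout<T>(work: Promise<T>, seconds: number): Promise<T> {
  if (seconds === 0) return work;
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`boot timed out after ${seconds}s`)),
      seconds * 1000,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Try `bootOnce` up to bootRetries + 1 times.
// Resolves with the attempt number that succeeded, or rethrows the last error.
async function bootWithRetries(
  bootOnce: () => Promise<void>,
  options: BootOptions,
): Promise<number> {
  const maxAttempts = options.bootRetries + 1;
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await withTimeout(bootOnce(), options.bootTimeoutSeconds);
      return attempt;
    } catch (error) {
      lastError = error; // remember the failure and move on to the next attempt
    }
  }
  throw lastError;
}
```

With the defaults (`bootTimeoutSeconds: 360`, `bootRetries: 2`), a boot that fails twice and then succeeds would resolve on attempt 3, matching the "one normal attempt and two retries" wording.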
Closes #548