fix(ci): exit-4 forensics for vanishing test files in run_tests_parallel.py #43646
Merged
…sts exit-4 retries

A PR-added test file (tests/test_iron_proxy.py, PR #30179) repeatedly failed exactly one CI shard with 'ERROR: file or directory not found' across 4 runs (including a fresh merge SHA on fresh runners), while the identical slice passes locally against the same merge commit and a tree-integrity watcher confirms no sibling test mutates the repo. Three unrelated branches showed the same one-shard signature the same day. We currently cannot attribute these because the log only carries pytest's exit-4 line.

This adds a forensics block to the captured output when exit-4 survives the retry loop:
- does the file exist NOW (post-retries)
- parent dir entry count + similarly-named entries
- git status --porcelain dirty-entry count + first 10 entries

Zero behavior change: rc stays 4, retries unchanged, forensics wrapped in a broad try/except so they can never mask the failure. Two new tests cover the exhausted-retries and genuinely-missing paths.
ethernet8023
approved these changes
Jun 10, 2026
19 tasks
Summary
When a per-file pytest run exhausts the exit-4 retry loop, the runner now appends filesystem forensics to the captured output — so the recurring CI-only "file or directory not found" failures become attributable from the log instead of guessed at.
Why
PR #30179's new `tests/test_iron_proxy.py` failed exactly one shard (`test (5)`) with `ERROR: file or directory not found` across 4 consecutive CI runs, including a fresh merge SHA on fresh runners, while:

- a `--co` pass counted its 88 tests seconds earlier (file provably on disk),
- the identical slice passes locally against the same merge commit (`237 files, 5755 tests, 0 failed`),
- three unrelated branches showed the same one-shard signature the same day (`test (1)`, `test (3)`, `test (5)`; one failing on a file that exists on main).

One run also produced a stale-content mode: 8 tests failed importing `_egress_proxy_args_for_docker` from a `docker.py` that momentarily had main's content while the same process had already imported the PR's brand-new sibling module. That is impossible against a stable checkout.

The existing exit-4 backoff retry (3 spaced attempts) doesn't recover these, and the log carries nothing to diagnose them with. This PR adds the missing observability.
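For context, the exit-4 backoff retry mentioned above can be pictured roughly as follows. This is a minimal sketch, not the code in `scripts/run_tests_parallel.py`: the name `run_with_exit4_retry`, the `run_once` callable, the injected `sleep`, the linear delay schedule, and the reading of "3 spaced attempts" as three total attempts are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple

# pytest reports "file or directory not found" as a usage error, exit code 4.
EXIT_USAGE_ERROR = 4


def run_with_exit4_retry(
    run_once: Callable[[], Tuple[int, str]],
    attempts: int = 3,          # assumed: total attempts, not extra retries
    delay_s: float = 1.0,       # assumed base delay
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, int]:
    """Call run_once until it stops returning exit code 4 or attempts run out.

    Returns (rc, output, retries_used). The rc is passed through unchanged,
    so an exit-4 that survives every attempt is still reported as 4.
    """
    rc, output = run_once()
    retries_used = 0
    while rc == EXIT_USAGE_ERROR and retries_used < attempts - 1:
        # Space the attempts out: 1x, 2x, ... the base delay.
        sleep(delay_s * (retries_used + 1))
        retries_used += 1
        rc, output = run_once()
    return rc, output, retries_used
```

The point where `rc` is still 4 after the loop returns is where this PR hooks in its forensics; before the change, the only evidence at that point was pytest's own exit-4 line.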
Changes
- `scripts/run_tests_parallel.py`: when `rc == 4` survives the retry loop, append a forensics block to the output: file-exists-now, retries used, parent-dir entry count + similarly-named entries, `git status --porcelain` dirty count + first 10 entries. Wrapped in broad try/except so forensics can never mask the rc=4. Zero behavior change otherwise.
- `tests/test_run_tests_parallel.py`: 2 new tests (exhausted-retries path asserts forensics + `exists=True` + retry count; genuinely-missing path asserts fail-fast + `exists=False`).

Validation
`tests/test_run_tests_parallel.py` […] `_spawn_pytest_once` […]