fix: do not rerun when checkpoint job missing but downstream file exists#4124
Conversation
|
Caution Review failedFailed to post review comments 📝 WalkthroughWalkthroughRequire checkpoint outputs to both not need rerun and be present before enqueuing instances; add an integration test that deletes checkpoint outputs between runs to verify they are regenerated; remove an outdated checkpoint test. ChangesDAG checkpoint logic
Tests
Sequence Diagram(s)sequenceDiagram
participant Runner as Test/Runner
participant DAG as DAG.update_checkpoint_dependencies
participant Checkpoint as CheckpointJob
participant FS as Filesystem
Runner->>DAG: request candidate checkpoint jobs
DAG->>Checkpoint: for each candidate, evaluate needrun(checkpoint)
alt needrun == True
Checkpoint-->>DAG: flagged for rerun (skip enqueue)
else needrun == False
DAG->>FS: await is_output_present(checkpoint)
alt output exists
FS-->>DAG: present
DAG-->>Checkpoint: add to job_queue
else output missing
FS-->>DAG: missing
DAG-->>Checkpoint: do not add (will trigger rerun)
end
end
DAG-->>Runner: updated job_queue
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/tests.py (1)
1628-1639: Test validates the fix, but comments may be misleading.The test correctly verifies that the workflow doesn't crash when checkpoint outputs are deleted between runs, which addresses issue
#3879. However, the comments "should not fail (but nothing to do)" are potentially misleading — when checkpoint outputs are deleted, the checkpoint should be re-executed, so there is actually work to do.Consider verifying that deleted outputs are actually regenerated after rerun to strengthen the test:
💡 Optional: Verify outputs are regenerated
def test_checkpoint_missing_output(): """test for issue 3879, also covers 3009""" # normal run to create the checkpoint output and final output tmpdir = run(dpath("test_checkpoint_missing_output"), cleanup=False) assert tmpdir - # should not fail (but nothing to do) + # Delete checkpoint output; rerun should regenerate it (tmpdir / "output" / "test_1.txt").unlink() run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) - # should not fail (but nothing to do) + assert (tmpdir / "output" / "test_1.txt").exists(), "test_1.txt should be regenerated" + # Delete another checkpoint output; rerun should regenerate it (tmpdir / "output" / "test_0.txt").unlink() run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) + assert (tmpdir / "output" / "test_0.txt").exists(), "test_0.txt should be regenerated"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/tests.py` around lines 1628 - 1639, The test_comment is misleading and the test should assert that deleted checkpoint outputs are regenerated: in test_checkpoint_missing_output update the comments to reflect that deleting checkpoint outputs should trigger re-execution, and after each run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) add assertions that (tmpdir / "output" / "test_1.txt") and (tmpdir / "output" / "test_0.txt") respectively exist (or contain expected contents) to verify the checkpoint was rerun; reference the test function name test_checkpoint_missing_output, the run(...) helper, and the tmpdir / "output" / "<file>" paths when adding the checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/tests.py`:
- Around line 1628-1639: The test_comment is misleading and the test should
assert that deleted checkpoint outputs are regenerated: in
test_checkpoint_missing_output update the comments to reflect that deleting
checkpoint outputs should trigger re-execution, and after each
run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) add
assertions that (tmpdir / "output" / "test_1.txt") and (tmpdir / "output" /
"test_0.txt") respectively exist (or contain expected contents) to verify the
checkpoint was rerun; reference the test function name
test_checkpoint_missing_output, the run(...) helper, and the tmpdir / "output" /
"<file>" paths when adding the checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 651adaa0-e9b3-444e-b3a3-2039c86d9178
📒 Files selected for processing (7)
src/snakemake/dag.pytests/test_checkpoint_missing_output/Snakefiletests/test_checkpoint_missing_output/expected-results/output/selection.txttests/test_checkpoint_missout/Snakefiletests/test_checkpoint_missout/expected-results/final.txttests/test_checkpoint_missout/problem.txttests/tests.py
💤 Files with no reviewable changes (1)
- tests/test_checkpoint_missout/Snakefile
|
Please format your code with |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/tests.py (1)
1628-1639: Test lacks assertions verifying checkpoint outputs are regenerated.The test validates that the workflow doesn't crash when checkpoint outputs are missing, but doesn't assert that the deleted files (
test_1.txt,test_0.txt) were actually regenerated after each rerun. Given the PR objective is to ensure "checkpoint outputs are properly detected and regenerated when deleted," consider adding assertions to verify the files exist after each run.Additionally, the comments "target file exists so nothing to do" are misleading since the files are being deleted and the checkpoint should be re-run to regenerate them.
💡 Suggested improvement
def test_checkpoint_missing_output(): """test for issue 3879, also covers 3009""" # normal run to create the checkpoint output and final output tmpdir = run(dpath("test_checkpoint_missing_output"), cleanup=False) assert tmpdir - # should not fail (target file exists so nothing to do) + # delete checkpoint output and verify it gets regenerated (tmpdir / "output" / "test_1.txt").unlink() run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) - # should not fail (target file exists so nothing to do) + assert (tmpdir / "output" / "test_1.txt").exists(), "test_1.txt should be regenerated" + # delete another checkpoint output and verify regeneration (tmpdir / "output" / "test_0.txt").unlink() run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) + assert (tmpdir / "output" / "test_0.txt").exists(), "test_0.txt should be regenerated" + shutil.rmtree(tmpdir, ignore_errors=ON_WINDOWS)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/tests.py` around lines 1628 - 1639, The test function test_checkpoint_missing_output currently deletes checkpoint output files but doesn't assert they are regenerated; update test_checkpoint_missing_output to assert existence of (tmpdir / "output" / "test_1.txt") after the first rerun and assert existence of (tmpdir / "output" / "test_0.txt") after the second rerun (using the existing run(...) helper and dpath(...) to locate the files), and revise the misleading comments to say that files are deleted and should be regenerated rather than "target file exists so nothing to do".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/tests.py`:
- Around line 1628-1639: The test function test_checkpoint_missing_output
currently deletes checkpoint output files but doesn't assert they are
regenerated; update test_checkpoint_missing_output to assert existence of
(tmpdir / "output" / "test_1.txt") after the first rerun and assert existence of
(tmpdir / "output" / "test_0.txt") after the second rerun (using the existing
run(...) helper and dpath(...) to locate the files), and revise the misleading
comments to say that files are deleted and should be regenerated rather than
"target file exists so nothing to do".
themavik
left a comment
There was a problem hiding this comment.
Gating checkpoint enqueue on is_output_present fixes the stale-node case. nit: confirm dropping test_checkpoint_missout does not leave 3009/3879 uncovered.
🤖 I have created a release *beep* *boop* --- ## [9.21.1](v9.21.0...v9.21.1) (2026-05-29) ### Bug Fixes * add default json function to benchmarks ([#4128](#4128)) ([41fab22](41fab22)) * do not rerun when checkpoint job missing but downstream file exists ([#4124](#4124)) ([a060b93](a060b93)) * ensure that error logs contain all available details ([#4183](#4183)) ([74a86e9](74a86e9)) * handle missing pss attribute in benchmark on Windows ([#4160](#4160)) ([da52080](da52080)) * implement Resources.setdefault ([#3968](#3968)) ([2413e99](2413e99)) * reporting remote nodes number ([#3978](#3978)) ([8c534f0](8c534f0)) * resolve pathvars before constructing storage queries ([#3969](#3969)) ([bd15237](bd15237)) * storage temp() file cleanup with RemoteProvider ([#4189](#4189)) ([898bad1](898bad1)) * tolerate FileNotFoundError in drop_iocache ([#4153](#4153)) ([#4191](#4191)) ([ce26b28](ce26b28)) ### Documentation * Added guide on debugging workflows ([#4029](#4029)) ([3d052ae](3d052ae)) * **cli:** Remove broken ref bold markup ([#4204](#4204)) ([1200ebf](1200ebf)) * remove duplicated resources attribute in rules.rst ([#4190](#4190)) ([6c8ecdd](6c8ecdd)) * **rules:** Update script type hint advice ([#4193](#4193)) ([6108712](6108712)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
will fix #3879. See also the discussion in #4092
QC
AI-assistance disclosure
I used AI assistance for:
Summary by CodeRabbit
Bug Fixes
Tests