fix: tolerate FileNotFoundError in drop_iocache (#4153)#4191
Conversation
Replace check-then-remove with try/except so parallel SLURM workers racing on .snakemake/iocache/latest.pkl don't crash when a sibling worker has already removed the file. Signed-off-by: SAY-5 <say.apm35@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe PR fixes a TOCTOU race condition in ChangesIOCache Race Condition Fix
🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🤖 I have created a release *beep* *boop* --- ## [9.21.1](v9.21.0...v9.21.1) (2026-05-29) ### Bug Fixes * add default json function to benchmarks ([#4128](#4128)) ([41fab22](41fab22)) * do not rerun when checkpoint job missing but downstream file exists ([#4124](#4124)) ([a060b93](a060b93)) * ensure that error logs contain all available details ([#4183](#4183)) ([74a86e9](74a86e9)) * handle missing pss attribute in benchmark on Windows ([#4160](#4160)) ([da52080](da52080)) * implement Resources.setdefault ([#3968](#3968)) ([2413e99](2413e99)) * reporting remote nodes number ([#3978](#3978)) ([8c534f0](8c534f0)) * resolve pathvars before constructing storage queries ([#3969](#3969)) ([bd15237](bd15237)) * storage temp() file cleanup with RemoteProvider ([#4189](#4189)) ([898bad1](898bad1)) * tolerate FileNotFoundError in drop_iocache ([#4153](#4153)) ([#4191](#4191)) ([ce26b28](ce26b28)) ### Documentation * Added guide on debugging workflows ([#4029](#4029)) ([3d052ae](3d052ae)) * **cli:** Remove broken ref bold markup ([#4204](#4204)) ([1200ebf](1200ebf)) * remove duplicated resources attribute in rules.rst ([#4190](#4190)) ([6c8ecdd](6c8ecdd)) * **rules:** Update script type hint advice ([#4193](#4193)) ([6108712](6108712)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Fixes #4153.
QC
AI-assistance disclosure
I used AI assistance for:
The race in `Persistence.drop_iocache()` (`src/snakemake/persistence/init.py`) is a check-then-remove pattern. Multiple parallel SLURM workers can pass `os.path.exists()` for `.snakemake/iocache/latest.pkl` (especially on shared filesystems with stale metadata caching), then have `os.remove()` raise `FileNotFoundError` because another worker already deleted the file, crashing DAG init.
Replaced the check-then-remove with a try/except that swallows `FileNotFoundError`. Added two regression tests in `tests/test_persistence.py` covering both the missing-file case and the normal removal path.
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests