feat: Allow storing snakemake metadata in files or databases #4012
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Adds a pluggable, versioned persistence system: introduces PersistenceBase, FilePersistence, and DbPersistence; removes the old monolithic persistence module; wires new CLI options and WorkflowSettings to select the backend and DB URL; updates workflow initialization and tests to exercise both backends.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as CLI User
    participant CLI as CLI Parser
    participant Settings as WorkflowSettings
    participant Workflow as Workflow
    participant Backend as Persistence Backend
    User->>CLI: snakemake --persistence-backend db --persistence-backend-db-url <url>
    CLI->>CLI: parse arguments
    CLI->>Settings: construct with persistence_backend & persistence_backend_db_url
    Settings-->>Workflow: passed into Workflow constructor
    Workflow->>Workflow: inspect settings.persistence_backend
    alt backend == "db"
        Workflow->>Backend: instantiate DbPersistence(db_url=<url>)
    else backend == "file"
        Workflow->>Backend: instantiate FilePersistence()
    end
    Backend-->>Workflow: return PersistenceBase instance
    Workflow->>Workflow: store as _persistence
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Edited: "AbstractPersistence to allow separating persistence backends later on" → "PersistenceBase to allow separate persistence backends"
I wonder if we could abstract this even further, given that the methods usually do almost exactly the same thing, but the db backend just queries the db first and converts to/from representations…
Yup, so the "backends" basically just do their specific record reading/writing now, so mostly IO-related stuff; everything else is handled in the base class. Also added some autogenerated docstrings for some of the abstract methods to make it a bit clearer how to implement them.
Ah, and right now, I just enabled the db backend by default to see if the CI succeeds with that or not. |
Edited: "PersistenceBase to allow separate persistence backends" → "--persistence-backend [file|db] and --persistence-backend-db-url URL to allow storing snakemake metadata in files or databases"
Actionable comments posted: 7
🧹 Nitpick comments (2)
src/snakemake/persistence/file.py (2)
154-172: Redundant path computation.
`path` is computed twice (lines 155 and 158). The second computation is unnecessary.

♻️ Remove redundant line
```diff
 def _io_read(self, subject, id):
     path = self._record_path(subject, id)
     if not os.path.exists(path):
         return dict()
-    path = self._record_path(subject, id)
     with open(path, "r") as f:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/persistence/file.py` around lines 154 - 172, In _io_read remove the redundant recomputation of path: compute path once by calling self._record_path(subject, id) at the top of the function and reuse that variable for the os.path.exists check and the subsequent open(), i.e. delete the second call to self._record_path(subject, id) so the function uses the single path variable throughout.
178-185: `@lru_cache` on instance method can cause memory leaks.

The `lru_cache` decorator holds references to `self`, preventing garbage collection of the `FilePersistence` instance. Since `FilePersistence` is typically long-lived per workflow and `_clear_cache()` exists, this may be acceptable, but consider using a cache that doesn't reference `self` if this class might be instantiated multiple times.

♻️ Alternative using explicit cache dictionary
```diff
+    def __init__(self, ...):
+        ...
+        self._record_cache = {}
+
-    @lru_cache()
     def _read_record_cached(self, key: str) -> Optional[MetadataRecord]:
+        if key in self._record_cache:
+            return self._record_cache[key]
         rec = self._io_read(self._metadata_path, key)
-        return (
+        result = (
             MetadataRecord(**{k: v for k, v in rec.items() if k in VALID_METADATA_KEYS})
             if rec
             else None
         )
+        self._record_cache[key] = result
+        return result

     def _clear_cache(self) -> None:
-        self._read_record_cached.cache_clear()
+        self._record_cache.clear()
         self._incomplete_cache = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/persistence/file.py` around lines 178 - 185, The `@lru_cache` on the instance method _read_record_cached retains references to self and can leak FilePersistence instances; replace it with an instance-level cache (e.g. self._read_record_cache dict) used inside _read_record_cached to store/return results keyed by the method's key (and include self._metadata_path if records vary by path), and update/extend the existing _clear_cache() to clear that dict; locate usages in the FilePersistence class (methods _read_record_cached and _clear_cache) and ensure thread-safety if needed (e.g., simple locking) when accessing the cache.
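As a rough illustration of that suggestion (the class body, `_io_read` signature, and `_record_cache` name here are placeholders, not the actual snakemake API), an instance-level dictionary keeps the cache's lifetime tied to the object:

```python
from typing import Optional


class FilePersistence:
    """Sketch: per-instance cache replacing @lru_cache on a method."""

    def __init__(self):
        self._record_cache = {}

    def _io_read(self, key: str) -> Optional[dict]:
        # Placeholder for reading a metadata record from disk.
        return {"target": key}

    def _read_record_cached(self, key: str) -> Optional[dict]:
        # The cache lives on the instance, so it is garbage-collected
        # together with the FilePersistence object.
        if key not in self._record_cache:
            self._record_cache[key] = self._io_read(key)
        return self._record_cache[key]

    def _clear_cache(self) -> None:
        self._record_cache.clear()
```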
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/cli.py`:
- Around line 1579-1585: The CLI help for the --persistence-backend-db-url
argument currently includes a password-bearing example DSN; update the help text
in the group_behavior.add_argument call for "--persistence-backend-db-url" to
remove the password from the example (e.g., use "postgresql://user@host/db" or
"postgresql://host/db") and add a short note directing users to provide
credentials via profiles or environment-based configuration instead of passing
secrets on the command line.
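A minimal sketch of what the fixed help text could look like, using plain argparse (the real option wiring in snakemake's cli.py differs; the example DSN and wording here are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(prog="snakemake")
parser.add_argument(
    "--persistence-backend",
    choices=["file", "db"],
    default="file",
    help="Backend used to store workflow metadata.",
)
parser.add_argument(
    "--persistence-backend-db-url",
    metavar="URL",
    help=(
        "Database URL for the db backend, e.g. postgresql://user@host/db. "
        "Provide credentials via a profile or environment-based "
        "configuration, not on the command line."
    ),
)

args = parser.parse_args(
    ["--persistence-backend", "db", "--persistence-backend-db-url", "sqlite:///meta.db"]
)
```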
In `@src/snakemake/persistence/__init__.py`:
- Around line 173-185: The cleanup code that wraps shutil.rmtree and Path.unlink
currently raises WorkflowError without chaining, losing the original exception;
modify the exception handlers around the rmtree call (in the block handling
removal of conda env at variable address) and the yaml unlink block
(yaml_path.unlink()) to catch OSError (or Exception if broader) and re-raise
WorkflowError using exception chaining (raise WorkflowError(f"...: {e}") from e)
so the original filesystem exception and traceback are preserved.
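The exception-chaining pattern the comment asks for can be sketched like this (the `WorkflowError` stand-in and `remove_env_dir` helper are illustrative, not the actual snakemake code):

```python
import shutil
from pathlib import Path


class WorkflowError(Exception):
    """Stand-in for snakemake's WorkflowError."""


def remove_env_dir(env_dir: Path) -> None:
    try:
        shutil.rmtree(env_dir)
    except OSError as e:
        # `from e` chains the original OSError, so its type and
        # traceback survive in the raised WorkflowError.
        raise WorkflowError(f"Failed to remove conda env at {env_dir}: {e}") from e
```

With chaining, `err.__cause__` still holds the original filesystem exception, so nothing about the root cause is lost in the error report.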
In `@src/snakemake/persistence/db.py`:
- Around line 142-157: The code eagerly creates a stub MetadataRecordORM in
_mark_incomplete which leaves behind empty rows when metadata retention is
disabled; update _mark_incomplete to avoid creating a new MetadataRecordORM when
only marking incomplete (i.e., only session.get(MetadataRecordORM, key) and set
incomplete/external_jobid on existing record, do not instantiate a new
MetadataRecordORM), and also harden _unmark_incomplete to remove stub records
created solely for the incomplete marker (detect records with
record_format_version == 0 and no provenance/metadata fields and call
session.delete(record) instead of clearing incomplete) so
PersistenceBase.finished() / has_metadata() no longer observes dropped metadata.
In `@src/snakemake/persistence/file.py`:
- Around line 240-259: _write_locks currently overwrites
self._lockfile[lock_type] so repeated calls orphan earlier lock files; change
the storage to keep multiple files per type (e.g., make
self._lockfile[lock_type] a list) and append each new lockfile instead of
assigning, using something like self._lockfile.setdefault(lock_type,
[]).append(lockfile) in _write_locks; update _delete_locks to iterate all stored
paths for each lock_type (handle lists of paths, remove each file while keeping
the same OSError handling) and then clear the mapping.
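A sketch of the append-instead-of-assign bookkeeping (class and method names are placeholders; the real `_write_locks`/`_delete_locks` take different arguments):

```python
import os
from collections import defaultdict


class LockBook:
    """Sketch: track every lock file written per lock type."""

    def __init__(self):
        self._lockfiles = defaultdict(list)

    def write_lock(self, lock_type: str, path: str) -> None:
        with open(path, "w") as f:
            f.write(lock_type)
        # Append instead of assign, so earlier lock files are not orphaned
        # when the same lock type is written more than once.
        self._lockfiles[lock_type].append(path)

    def delete_locks(self) -> None:
        for paths in self._lockfiles.values():
            for path in paths:
                try:
                    os.remove(path)
                except OSError:
                    pass  # keep best-effort removal semantics
        self._lockfiles.clear()
```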
- Around line 138-152: In _io_delete, recdirs is computed as a path relative
to subject then passed to os.removedirs which treats paths relative to cwd;
change to call os.removedirs with the actual directory path of the removed
record (use os.path.dirname(recpath) or an absolute path derived from recpath)
instead of the relative recdirs, and keep the existing check so you only call
removedirs when the directory isn't "." to avoid removing unexpected
directories.
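The fix boils down to passing `os.removedirs` the record's actual directory rather than a subject-relative path; a standalone sketch (the `remove_record` helper is hypothetical):

```python
import os


def remove_record(recpath: str) -> None:
    """Delete a record file and prune now-empty parent directories."""
    os.remove(recpath)
    recdir = os.path.dirname(recpath)
    # os.removedirs resolves relative paths against the CWD, so pass the
    # record's real directory, not a path relative to the subject dir.
    if recdir and recdir != ".":
        try:
            os.removedirs(recdir)
        except OSError:
            pass  # directory not empty; leave it alone
```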
In `@src/snakemake/workflow.py`:
- Around line 917-924: The persistence selection currently passes a None db_url
and silently accepts unknown backends; update the code in the block that checks
self.workflow_settings.persistence_backend so that if the backend == "db" you
only include "db_url" in persistence_kwargs when
self.workflow_settings.persistence_backend_db_url is truthy (do not forward
None), and if the backend value is neither "db" nor "file" raise WorkflowError;
ensure you set persistence = DbPersistence (with persistence_kwargs possibly
empty) for "db" and persistence = FilePersistence for the explicit "file"
backend.
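The selection logic described above can be sketched as follows (returning class names as strings instead of the real `FilePersistence`/`DbPersistence` classes, purely for illustration):

```python
class WorkflowError(Exception):
    """Stand-in for snakemake's WorkflowError."""


def select_persistence(backend: str, db_url=None):
    """Pick a persistence backend; never forward None, reject unknowns."""
    if backend == "db":
        kwargs = {}
        if db_url:  # only include db_url when it is actually set
            kwargs["db_url"] = db_url
        return ("DbPersistence", kwargs)
    elif backend == "file":
        return ("FilePersistence", {})
    raise WorkflowError(f"Unknown persistence backend: {backend!r}")
```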
---
Nitpick comments:
In `@src/snakemake/persistence/file.py`:
- Around line 154-172: In _io_read remove the redundant recomputation of path:
compute path once by calling self._record_path(subject, id) at the top of the
function and reuse that variable for the os.path.exists check and the subsequent
open(), i.e. delete the second call to self._record_path(subject, id) so the
function uses the single path variable throughout.
- Around line 178-185: The `@lru_cache` on the instance method _read_record_cached
retains references to self and can leak FilePersistence instances; replace it
with an instance-level cache (e.g. self._read_record_cache dict) used inside
_read_record_cached to store/return results keyed by the method's key (and
include self._metadata_path if records vary by path), and update/extend the
existing _clear_cache() to clear that dict; locate usages in the FilePersistence
class (methods _read_record_cached and _clear_cache) and ensure thread-safety if
needed (e.g., simple locking) when accessing the cache.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 82593f8e-10d6-443e-a3ba-1edaad4bc1e1
⛔ Files ignored due to path filters (1)
pyproject.toml is excluded by !pyproject.toml
📒 Files selected for processing (10)
- src/snakemake/cli.py
- src/snakemake/persistence.py
- src/snakemake/persistence/__init__.py
- src/snakemake/persistence/db.py
- src/snakemake/persistence/file.py
- src/snakemake/settings/types.py
- src/snakemake/workflow.py
- tests/common.py
- tests/test_persistence.py
- tests/tests.py
💤 Files with no reviewable changes (1)
- src/snakemake/persistence.py
Please format your code with
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/snakemake/persistence/__init__.py (1)
483-517: Compute input checksums once per job.

These digests depend only on `job.input`, so the current placement repeats the same checksum I/O once for every output file.

Suggested refactor
```diff
+        checksums = {
+            str(infile): await infile.checksum(self.max_checksum_file_size)
+            for infile in job.input
+        }
+        checksums = {k: v for k, v in checksums.items() if v is not None}
+
         for f in job.output:
             key = self._get_key(f)
             endtime = (
                 (await f.mtime()).local_or_storage() if await f.exists() else fallback_time
             )
-
-            checksums = {
-                str(infile): await infile.checksum(self.max_checksum_file_size)
-                for infile in job.input
-            }
-            checksums = {k: v for k, v in checksums.items() if v is not None}
             record = self._read_record(key) or MetadataRecord()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/persistence/__init__.py` around lines 483 - 517, The loop over "for f in job.output" recomputes input file checksums for every output; move the checksum collection out of the loop and compute it once per job by evaluating the dict comprehension that builds "checksums" (based on job.input and self.max_checksum_file_size) before the loop, filter None values into "input_checksums", and then inside the loop assign record.input_checksums = input_checksums; adjust placement so symbols _get_key, _read_record, _write_record, MetadataRecord, RECORD_FORMAT_VERSION, and any uses of endtime/fallback_time remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/persistence/db.py`:
- Around line 112-164: _read_record_cached is currently unbounded and not
invalidated on mutations; add a bounded cache and ensure mutations clear it.
Change `@lru_cache`(maxsize=None) on _read_record_cached to a bounded size like
`@lru_cache`(maxsize=1024) (or another appropriate maxsize for your workload), and
call self._clear_cache() at the end of each mutating method (_write_record,
_delete_record, _mark_incomplete, _unmark_incomplete) so the cached results are
invalidated after inserts/updates/deletes; keep the existing _clear_cache
implementation that calls self._read_record_cached.cache_clear().
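The bounded-cache-plus-invalidation pattern can be sketched without SQLAlchemy (the `RecordStore` class, its dict-backed storage, and the maxsize value are all placeholders for the real ORM-backed methods):

```python
from functools import lru_cache


class RecordStore:
    """Sketch: bounded read cache, invalidated after every mutation."""

    def __init__(self):
        self._rows = {}
        # Bounded cache: large workflows no longer pin every touched
        # record in memory. Built per instance to avoid a shared cache.
        self._read_cached = lru_cache(maxsize=1024)(self._read)

    def _read(self, key):
        return self._rows.get(key)

    def read_record(self, key):
        return self._read_cached(key)

    def write_record(self, key, value):
        self._rows[key] = value
        self._clear_cache()  # invalidate so readers see fresh data

    def delete_record(self, key):
        self._rows.pop(key, None)
        self._clear_cache()

    def _clear_cache(self):
        self._read_cached.cache_clear()
```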
- Around line 76-98: The DbPersistence __init__ currently ignores the provided
path when db_url is omitted; change the logic so if db_url is None (or omitted)
you construct the sqlite URL from the instance path: compute db_root = self.path
or Path.cwd(), db_file = (Path(db_root) / ".snakemake" /
"metadata.db").resolve(), then set self.engine =
create_engine(f"sqlite:///{db_file}") so the metadata DB lives under the
provided persistence root; reference DbPersistence.__init__, the db_url
parameter, and the self.engine creation when making this change.
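The suggested fallback URL construction, as a standalone sketch (the helper name and the `.snakemake/metadata.db` location mirror the comment, but this is not the actual DbPersistence code):

```python
from pathlib import Path


def default_sqlite_url(persistence_root=None) -> str:
    """Build the sqlite URL under the persistence root when db_url is omitted."""
    db_root = Path(persistence_root) if persistence_root else Path.cwd()
    # Keep the metadata DB inside the provided persistence root.
    db_file = (db_root / ".snakemake" / "metadata.db").resolve()
    return f"sqlite:///{db_file}"
```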
- Around line 70-73: The current LockORM/table design is fine but _write_locks()
must not use session.merge() (an upsert) because that allows two concurrent
processes to overwrite each other's locks; change _write_locks() to attempt pure
inserts of new LockORM(...) rows (e.g., session.add_all or individual
session.add) and commit, catching sqlalchemy.exc.IntegrityError to detect
existing rows, then rollback and surface a lock-acquisition failure back to
PersistenceBase.lock(); remove/replace session.merge() calls so any duplicate
primary-key insert fails rather than upserting.
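The insert-only semantics can be shown with the stdlib sqlite3 module instead of SQLAlchemy (table layout and helper name are illustrative): a duplicate primary-key insert raises `IntegrityError`, surfacing contention instead of silently upserting.

```python
import sqlite3


def try_acquire_lock(conn: sqlite3.Connection, file_path: str) -> bool:
    """Insert-only lock write: duplicates fail rather than overwrite."""
    try:
        conn.execute("INSERT INTO locks (file_path) VALUES (?)", (file_path,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False  # another process already holds this lock
```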
---
Nitpick comments:
In `@src/snakemake/persistence/__init__.py`:
- Around line 483-517: The loop over "for f in job.output" recomputes input file
checksums for every output; move the checksum collection out of the loop and
compute it once per job by evaluating the dict comprehension that builds
"checksums" (based on job.input and self.max_checksum_file_size) before the
loop, filter None values into "input_checksums", and then inside the loop assign
record.input_checksums = input_checksums; adjust placement so symbols _get_key,
_read_record, _write_record, MetadataRecord, RECORD_FORMAT_VERSION, and any uses
of endtime/fallback_time remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e5300a9b-ac8a-4fe3-84a5-0100357dda73
📒 Files selected for processing (3)
- src/snakemake/persistence/__init__.py
- src/snakemake/persistence/db.py
- src/snakemake/workflow.py
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/snakemake/persistence/db.py (2)
195-199: ⚠️ Potential issue | 🔴 Critical: Use plain inserts for lock rows.

With `file_path` as the primary key, `session.merge()` turns duplicate lock acquisition into overwrite-or-insert. A concurrent process can therefore commit an already-held lock instead of surfacing contention. Use insert-only writes and fail on duplicates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/persistence/db.py` around lines 195 - 199, The _write_locks method currently uses session.merge(LockORM(...)) which performs upsert semantics and can overwrite an existing lock; change it to perform insert-only writes so a duplicate primary-key insertion raises an IntegrityError and surfaces contention. Replace session.merge(...) with session.add(...) / session.add_all(...) (creating LockORM instances for each key) and let session.commit() raise on duplicates, or explicitly catch sqlalchemy.exc.IntegrityError around the commit in _write_locks to translate it into a clear lock-contention error; refer to the _write_locks method, LockORM model and Session/engine when making the change.
112-164: ⚠️ Potential issue | 🟠 Major: Bound and invalidate the metadata read cache.

`_read_record_cached()` can retain `None` or stale rows after `_write_record()`, `_delete_record()`, `_mark_incomplete()`, or `_unmark_incomplete()` because none of those paths clear it. `maxsize=None` also lets large workflows keep every touched record in memory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/persistence/db.py` around lines 112 - 164, _read_record_cached uses lru_cache(maxsize=None) and is never invalidated, so writes/deletes/marks leave stale or None results cached and unbounded cache can grow unbounded; change the decorator to a bounded size (e.g., `@lru_cache`(maxsize=1024)) and call self._clear_cache() at the end of _write_record, _delete_record (before return), _mark_incomplete, and _unmark_incomplete to invalidate the cache after any mutation so subsequent _read_record calls see fresh data.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/persistence/db.py`:
- Around line 166-186: Both _filter_incomplete_keys and _get_external_jobids
build a single large IN(...) which can exceed SQLite bind limits; change each to
process keys in chunks (e.g., chunk_size = 1000) by iterating over chunks of
keys_list, issuing a separate select per chunk (same where predicates:
MetadataRecordORM.target.in_(chunk) + the existing incomplete/external_jobid
filters), collecting and unioning results into the returned set, and avoid early
returns so empty input still returns set(). Ensure you deduplicate across chunks
and use Session(self.engine) for the batched queries in the same way they are
currently opened.
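The chunking itself is independent of the database layer; a stdlib sketch of the batching described above (the in-memory membership test stands in for the per-chunk `SELECT ... WHERE target IN (...)`, and the chunk size of 1000 is just the value the comment suggests):

```python
def chunked(seq, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i : i + size]


def filter_incomplete_keys(incomplete_targets, keys, chunk_size=1000):
    """Union per-chunk results so no single IN(...) exceeds bind limits."""
    result = set()
    for chunk in chunked(list(keys), chunk_size):
        # Stand-in for one SELECT per chunk with the same predicates.
        result |= {k for k in chunk if k in incomplete_targets}
    return result
```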
---
Duplicate comments:
In `@src/snakemake/persistence/db.py`:
- Around line 195-199: The _write_locks method currently uses
session.merge(LockORM(...)) which performs upsert semantics and can overwrite an
existing lock; change it to perform insert-only writes so a duplicate
primary-key insertion raises an IntegrityError and surfaces contention. Replace
session.merge(...) with session.add(...) / session.add_all(...) (creating
LockORM instances for each key) and let session.commit() raise on duplicates, or
explicitly catch sqlalchemy.exc.IntegrityError around the commit in _write_locks
to translate it into a clear lock-contention error; refer to the _write_locks
method, LockORM model and Session/engine when making the change.
- Around line 112-164: _read_record_cached uses lru_cache(maxsize=None) and is
never invalidated, so writes/deletes/marks leave stale or None results cached
and unbounded cache can grow unbounded; change the decorator to a bounded size
(e.g., `@lru_cache`(maxsize=1024)) and call self._clear_cache() at the end of
_write_record, _delete_record (before return), _mark_incomplete, and
_unmark_incomplete to invalidate the cache after any mutation so subsequent
_read_record calls see fresh data.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8cc08e0b-c7a9-4e53-bcc8-280320cb9ea3
📒 Files selected for processing (1)
src/snakemake/persistence/db.py
To add, in my old testing I ran into issues with subprocesses writing to persistence. It shouldn't occur now, but any workflow with
I'll be adding one more test, sorry ^^

@cademirch added a test that uses
🤖 I have created a release *beep* *boop*

## [9.17.0](v9.16.3...v9.17.0) (2026-03-13)

### Features

* Allow storing snakemake metadata in files or databases ([#4012](#4012)) ([dd75f31](dd75f31))
* Allow to specify comparison command per-unit test ([#3956](#3956)) ([b88171c](b88171c))
* job table ordered topologically when run is started ([#4018](#4018)) ([75cf506](75cf506))
* lambda functions for priority in rules ([#3253](#3253)) ([d2aa226](d2aa226))
* Make on... directive of modules accessible ([#4050](#4050)) ([e9f2e1c](e9f2e1c))

### Bug Fixes

* adjust conda tests to not fail on apple silicon; fix [#4040](#4040) ([#4049](#4049)) ([f5b0142](f5b0142))
* allow "--containerize apptainer" to output apptainer format instead of dockerfile ([#4030](#4030)) ([f5cac30](f5cac30))
* apptainer command not recognized when singularity is absent ([#4010](#4010)) ([b8162e2](b8162e2))
* capture stderr when tests fail ([#3995](#3995)) ([97d74ba](97d74ba))
* **docs:** make Data-dependent conditional execution a complete example ([#4043](#4043)) ([3a1d7f2](3a1d7f2))
* don't build the DAG when running unlock. Fixes [#4000](#4000) and [#198](#198) ([#4007](#4007)) ([acf79fd](acf79fd))
* Ensure pixi tasks may be run as advertised ([#4046](#4046)) ([88253c2](88253c2))
* fix checkpoint handling corner cases ([#3870](#3870) and [#3559](#3559)) ([#4015](#4015)) ([63f4257](63f4257))
* issue 3642 ([#4054](#4054)) ([76e6fc2](76e6fc2))
* issue 3815 ([#4026](#4026)) ([b0eec96](b0eec96))
* logging None in shellcmd context causes error ([#4064](#4064)) ([d0652cd](d0652cd))
* lookup function returns default value for empty DataFrame queries ([#4056](#4056)) ([f71de97](f71de97))
* make `cache: omit-software` a rule specific property ([#4085](#4085)) ([034a9e7](034a9e7))
* reduce number of tests leaving temporary files behind ([#4033](#4033)) ([a3a1c97](a3a1c97))
* regression in dynamic resource handling ([#4038](#4038)) ([f2c554a](f2c554a))
* somewhat shorter announce message ([#4080](#4080)) ([57efc71](57efc71))

### Performance Improvements

* switch reretry with tenacity; decouple container classes (with Python 3.7 compat for old scripts) from rest of the codebase (enabling moving to newer python versions) ([#4032](#4032)) ([ffb19e7](ffb19e7))

### Documentation

* Add AI-assisted contributions policy to contributing guidelines ([#4051](#4051)) ([dd70526](dd70526))
* **codebase:** Update & simplify plugin architecture section ([#4052](#4052)) ([176cf63](176cf63))
* Correct workflow.source_path() description in documentation ([#4036](#4036)) ([45883c5](45883c5))
* fixed wrong code example for collect() function ([#4037](#4037)) ([5c85ed8](5c85ed8))
* Minor docs improvements ([#4089](#4089)) ([29ea226](29ea226))
* switch to sphinx_design for tabs ([#3976](#3976)) ([9674614](9674614))
* typo in the migration table breaking a pip install command ([#4024](#4024)) ([66f9dda](66f9dda))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
This PR allows choosing whether metadata stored in .snakemake/metadata is stored as files or in a database.
To this end, we expose `--persistence-backend [file|db]` and `--persistence-backend-db-url URL` CLI options to choose between the two.

There is NO conversion logic (between the file and db backends) implemented.
There is also NO db schema migration logic implemented, yet (will leave that as future work ;)).
Resolves #2969 (well, at least I hope so)