Skip to content

feat: job table orderd topological when run is started#4018

Merged
johanneskoester merged 12 commits intosnakemake:mainfrom
SrPeter128:new_job_table_order
Mar 11, 2026
Merged

feat: job table orderd topological when run is started#4018
johanneskoester merged 12 commits intosnakemake:mainfrom
SrPeter128:new_job_table_order

Conversation

@SrPeter128
Copy link
Copy Markdown
Contributor

@SrPeter128 SrPeter128 commented Mar 9, 2026

This PR changes the sorting of the job table that is created at the beginning of a run. The jobs displayed here are no longer sorted alphabetically, but in the order in which they are sorted in the DAG (i.e., in the order in which they should be calculated). If there are many jobs in a level of the DAG, they are sorted alphabetically in the table.

This could close:
Fixes #1767
Fixes #3922

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).
    --> I haven't found anywhere here that explicitly mentions this table. If necessary, I will add it.

Summary by CodeRabbit

  • Bug Fixes

    • Workflow statistics now deduplicate job entries, aggregate per-job counts correctly, present entries in a stable topological order, and include an accurate total row — fixing prior ordering and count inaccuracies.
  • Tests

    • Added an example workflow and a test that performs a dry-run, generates the stats table, and verifies the exact topological order and counts for each entry and the total.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaced alphabetical per-rule job counting with a topologically ordered aggregation for DAG stats, moved tabulate import to module level, and added a test Snakefile plus a unit test verifying ordered stats and counts.

Changes

Cohort / File(s) Summary
DAG stats logic
src/snakemake/dag.py
Moved tabulate import to module scope. Replaced alphabetical per-rule counting with a topological traversal: compute relevant_jobs = needrun ∪ finished, traverse topologically (sorting within levels) to build an ordered unique job list, count occurrences in that order, deduplicate rows, add a total row, and update message generation.
Tests — workflow & assertions
tests/test_issue3922/Snakefile, tests/tests.py
Added a Snakefile defining rules (all, a, d, b, c, e) producing multiple c outputs. Added test_stats_table_order_and_counts() which runs a dry-run, parses the stats table via regex, and asserts topological order and counts (a:1, b:1, c:3, d:1, e:1, total:8).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant DAG as DAG (src/snakemake/dag.py)
    participant Topo as Toposort traversal
    participant Counter as Counter/Rows builder
    participant Tab as Tabulate/Formatter
    CLI->>DAG: request stats (needrun, finished)
    DAG->>Topo: compute ordered job list from relevant_jobs
    Topo-->>DAG: ordered unique jobs
    DAG->>Counter: count occurrences per ordered job, add total
    Counter-->>Tab: rows and stats_dict
    Tab-->>CLI: formatted stats table output
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: reordering the job table to follow topological order rather than alphabetical, which aligns with the PR's primary objective.
Description check ✅ Passed The description clearly explains the change (job table now sorted topologically instead of alphabetically), references relevant issues, and confirms test coverage. One QC item is unchecked but acknowledged.
Linked Issues check ✅ Passed The PR successfully implements the core requirements from both issues #1767 and #3922: displaying the job summary table in topological (execution) order rather than alphabetical/lexicographical order.
Out of Scope Changes check ✅ Passed All changes are directly related to the PR objective: modifying DAG job ordering logic, updating the test Snakefile to validate correct ordering, and adding a test to verify the new topological ordering behavior.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@SrPeter128 SrPeter128 changed the title New job table order when job is started feat: New job table order when job is started Mar 9, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/snakemake/dag.py`:
- Around line 3197-3208: The current loop builds orderd_jobs by iterating
self.toposorted() (all jobs) which diverges from the filtered set used to
compute total_count and also uses repeated orderd_jobs.count(...) calls making
it quadratic; change the logic in the block that constructs
orderd_jobs/deduplicated_job_list so it only iterates the same filtered job set
used for total_count (e.g., the needrun/finished filtered list) instead of
self.toposorted(), and replace repeated .count(...) calls with a single pass
counter (e.g., collections.Counter) to compute counts once, then populate rows
and stats_dict from that counter; update references to orderd_jobs,
deduplicated_job_list, rows, and stats_dict accordingly so ordering and counts
match the filtered jobs.

In `@tests/tests.py`:
- Around line 2978-2981: The except block that currently does "assert False"
should instead raise an AssertionError with a clear message and chain the
original ValueError for context: catch the ValueError from names.index(expected,
idx) and raise AssertionError(f"Expected rule '{expected}' not found in stats
output") from the caught ValueError so the original traceback is preserved (look
for the try/except around names.index(expected, idx) using variables idx, names,
expected).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f536ffb4-8a0d-4b6c-b210-4ca141f24f0a

📥 Commits

Reviewing files that changed from the base of the PR and between b88171c and a47415d.

📒 Files selected for processing (3)
  • src/snakemake/dag.py
  • tests/test_issue3922/Snakefile
  • tests/tests.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/tests.py`:
- Line 2952: Replace the bare open call with a context manager to ensure the
file handle is closed: change the line that assigns txt from open(stats_path,
"r", encoding="utf-8").read() to using with open(stats_path, "r",
encoding="utf-8") as f: txt = f.read() so the file is properly closed after
reading.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a287f8ad-c58d-4b2e-80d9-25697e32827e

📥 Commits

Reviewing files that changed from the base of the PR and between a47415d and 40dde5f.

📒 Files selected for processing (2)
  • src/snakemake/dag.py
  • tests/tests.py

@SrPeter128 SrPeter128 changed the title feat: New job table order when job is started feat: job table orderd topological when job is started Mar 9, 2026
@SrPeter128 SrPeter128 changed the title feat: job table orderd topological when job is started feat: job table orderd topological when run is started Mar 9, 2026
@johanneskoester johanneskoester enabled auto-merge (squash) March 11, 2026 09:02
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/tests.py (1)

2957-3007: Well-implemented test for topological ordering and counts.

The past review issues have been addressed:

  • File handle now properly uses a context manager (line 2968-2969)
  • Exception chaining correctly preserves the original ValueError context (lines 2996-2999)

The test logic correctly verifies that the expected rules appear as a subsequence in topological order.

Optional improvement: Consider cleaning up the temp directory on successful test completion to avoid accumulating leftover directories in CI environments. Other tests in this file follow a pattern of manual cleanup when using cleanup=False:

♻️ Optional cleanup suggestion
 def test_stats_table_order_and_counts():
     # Run snakemake in the example dir and ask it to write stats.txt
     outdir = run(
         dpath("test_issue3922"),
         shellcmd="snakemake --dryrun > stats.txt 2>&1",
         check_results=False,
         cleanup=False,
     )
     stats_path = os.path.join(outdir, "stats.txt")
     assert os.path.exists(stats_path), "stats.txt was not created"
 
-    with open(stats_path, "r", encoding="utf-8") as f:
-        txt = f.read()
+    try:
+        with open(stats_path, "r", encoding="utf-8") as f:
+            txt = f.read()
     # ... rest of test logic ...
+    finally:
+        shutil.rmtree(outdir, ignore_errors=ON_WINDOWS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tests.py` around lines 2957 - 3007, The test
test_stats_table_order_and_counts leaves the temp output directory when run(...,
cleanup=False), which can accumulate artifacts in CI; update the test to remove
the created outdir on success (mirroring other tests) by calling the same
cleanup routine used elsewhere after the assertions (e.g., remove the directory
returned by run/dpath or call the project-specific cleanup helper), ensuring
cleanup runs only after all assertions pass and does not mask failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/tests.py`:
- Around line 2957-3007: The test test_stats_table_order_and_counts leaves the
temp output directory when run(..., cleanup=False), which can accumulate
artifacts in CI; update the test to remove the created outdir on success
(mirroring other tests) by calling the same cleanup routine used elsewhere after
the assertions (e.g., remove the directory returned by run/dpath or call the
project-specific cleanup helper), ensuring cleanup runs only after all
assertions pass and does not mask failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 31e07bf0-5d4d-4620-9f65-65e77548a9a6

📥 Commits

Reviewing files that changed from the base of the PR and between c77bed7 and 47bfdb6.

📒 Files selected for processing (2)
  • src/snakemake/dag.py
  • tests/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/snakemake/dag.py

@johanneskoester johanneskoester enabled auto-merge (squash) March 11, 2026 09:11
@johanneskoester johanneskoester merged commit 75cf506 into snakemake:main Mar 11, 2026
5 of 57 checks passed
johanneskoester pushed a commit that referenced this pull request Mar 13, 2026
🤖 I have created a release *beep* *boop*
---


##
[9.17.0](v9.16.3...v9.17.0)
(2026-03-13)


### Features

* Allow storing snakemake metadata in files or databases
([#4012](#4012))
([dd75f31](dd75f31))
* Allow to specify comparison command per-unit test
([#3956](#3956))
([b88171c](b88171c))
* job table orderd topological when run is started
([#4018](#4018))
([75cf506](75cf506))
* lambda functions for priority in rules
([#3253](#3253))
([d2aa226](d2aa226))
* Make on... directive of modules accessible
([#4050](#4050))
([e9f2e1c](e9f2e1c))


### Bug Fixes

* adjust conda tests to not fail on apple silicon; fix
[#4040](#4040)
([#4049](#4049))
([f5b0142](f5b0142))
* allow "--containerize apptainer" to output apptainer format instead of
dockerfile ([#4030](#4030))
([f5cac30](f5cac30))
* apptainer command not recognized when singularity is absent
([#4010](#4010))
([b8162e2](b8162e2))
* capture stderr when tests fail
([#3995](#3995))
([97d74ba](97d74ba))
* **docs:** make Data-dependent conditional execution a complete example
([#4043](#4043))
([3a1d7f2](3a1d7f2))
* don't build the DAG when running unlock. Fixes
[#4000](#4000) and
[#198](#198)
([#4007](#4007))
([acf79fd](acf79fd))
* Ensure pixi tasks may be run as advertised
([#4046](#4046))
([88253c2](88253c2))
* fix checkpoint handling corner cases
([#3870](#3870) and
[#3559](#3559))
([#4015](#4015))
([63f4257](63f4257))
* issue 3642
([#4054](#4054))
([76e6fc2](76e6fc2))
* issue 3815
([#4026](#4026))
([b0eec96](b0eec96))
* logging None in shellcmd context causes error
([#4064](#4064))
([d0652cd](d0652cd))
* lookup function returns default value for empty DataFrame queries
([#4056](#4056))
([f71de97](f71de97))
* make `cache: omit-software` a rule specific property
([#4085](#4085))
([034a9e7](034a9e7))
* reduce number of tests leaving temporary files behind
([#4033](#4033))
([a3a1c97](a3a1c97))
* regression in dynamic resource handling
([#4038](#4038))
([f2c554a](f2c554a))
* somewhat shorter announce message
([#4080](#4080))
([57efc71](57efc71))


### Performance Improvements

* switch reretry with tenacity; decouple container classes (with Python
3.7 compat for old scripts) from rest of the codebase (enabling moving
to newer python versions)
([#4032](#4032))
([ffb19e7](ffb19e7))


### Documentation

* Add AI-assisted contributions policy to contributing guidelines
([#4051](#4051))
([dd70526](dd70526))
* **codebase:** Update & simplify plugin architecture section
([#4052](#4052))
([176cf63](176cf63))
* Correct workflow.source_path() description in documentation
([#4036](#4036))
([45883c5](45883c5))
* fixed wrong code example for collect() function
([#4037](#4037))
([5c85ed8](5c85ed8))
* Minor docs improvements
([#4089](#4089))
([29ea226](29ea226))
* switch to sphinx_design for tabs
([#3976](#3976))
([9674614](9674614))
* typo in the migration table breaking a pip install command
([#4024](#4024))
([66f9dda](66f9dda))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Display jobs in job summary table in topological order Run-order -n summary (rather than alphabetical)

2 participants