fix: lookup function returns default value for empty DataFrame queries#4056
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe lookup function is fixed to return the default value when a DataFrame query yields no matching rows. The implementation adds a check to return the provided default instead of an empty list when results are empty. Documentation and tests are updated to reflect this corrected behavior. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_ioutils/Snakefile (1)
11-11: Consider coveringdefault=Nonefor query lookups too.This change depends on the
NODEFAULTsentinel, soNoneis the edge case most worth locking down. Right now the new assertion only exercises a truthy fallback value.🧪 Suggested test addition
assert lookup(query="sample == 999", within=samples, default="fallback") == "fallback" +assert lookup(query="sample == 999", within=samples, default=None) is None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_ioutils/Snakefile` at line 11, Add a test case to exercise the None fallback path for lookup by asserting lookup(query="sample == 999", within=samples, default=None) is None; also add a check for the sentinel behavior (e.g., call lookup(query="sample == 999", within=samples) or with default=NODEFAULT if your API exposes NODEFAULT) to ensure the function raises or returns the sentinel as expected. Locate the lookup call in the test and extend it with the two assertions referencing lookup and NODEFAULT to lock down the None edge case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/snakefiles/rules.rst`:
- Around line 461-462: Update the documentation sentence about fallback behavior
to exclude is_nrows lookups: clarify that the described
default/LookupError/empty-list behavior applies to dpath and query lookups, but
not to lookups using the is_nrows flag (i.e., lookup(query=..., is_nrows=...)),
which returns the boolean row-count check before any empty-result fallback is
applied; modify the line referencing dpath/query fallback to explicitly state
this exception so readers know is_nrows behaves differently.
---
Nitpick comments:
In `@tests/test_ioutils/Snakefile`:
- Line 11: Add a test case to exercise the None fallback path for lookup by
asserting lookup(query="sample == 999", within=samples, default=None) is None;
also add a check for the sentinel behavior (e.g., call lookup(query="sample ==
999", within=samples) or with default=NODEFAULT if your API exposes NODEFAULT)
to ensure the function raises or returns the sentinel as expected. Locate the
lookup call in the test and extend it with the two assertions referencing lookup
and NODEFAULT to lock down the None edge case.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 095a6ba2-b6b4-423d-90d6-ee79bd9ef7d8
📒 Files selected for processing (3)
docs/snakefiles/rules.rstsrc/snakemake/ioutils/lookup.pytests/test_ioutils/Snakefile
snakemake#4056) Closes snakemake#4048 ### Problem The `lookup()` function's `default` parameter was ignored when using `query` with a pandas DataFrame. If a query returned no matching rows, it returned an empty list `[]` instead of the provided default value. The `default` parameter only worked with `dpath` (dictionary lookups). ### Solution Added a check in the `do_query` function in `lookup.py` to return the `default` value when the query result is empty and a default is provided. ### Changes - `src/snakemake/ioutils/lookup.py` — Added 2 lines to check for empty query results and return the default value - `tests/test_ioutils/Snakefile` — Added an assert to test that the default value is returned for empty DataFrame queries - `docs/snakefiles/rules.rst` — Updated documentation to reflect that the `default` argument now works for both `dpath` and `query` ### QC * [x] The changes are already covered by an existing test case test_ioutils. Added a new assert in `tests/test_ioutils/Snakefile` to test DataFrame default value behaviour. * [x] The documentation (docs/) is updated: Updated docs/snakefiles/rules.rst to remove the note that `default` is ignored for queries. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Lookup function now returns the provided default value when queries yield no results. * None is now a valid default value. * **Documentation** * Updated lookup function behavior documentation. * **Tests** * Added test coverage for lookup function with default value fallback. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
🤖 I have created a release *beep* *boop* --- ## [9.17.0](v9.16.3...v9.17.0) (2026-03-13) ### Features * Allow storing snakemake metadata in files or databases ([#4012](#4012)) ([dd75f31](dd75f31)) * Allow to specify comparison command per-unit test ([#3956](#3956)) ([b88171c](b88171c)) * job table orderd topological when run is started ([#4018](#4018)) ([75cf506](75cf506)) * lambda functions for priority in rules ([#3253](#3253)) ([d2aa226](d2aa226)) * Make on... directive of modules accessible ([#4050](#4050)) ([e9f2e1c](e9f2e1c)) ### Bug Fixes * adjust conda tests to not fail on apple silicon; fix [#4040](#4040) ([#4049](#4049)) ([f5b0142](f5b0142)) * allow "--containerize apptainer" to output apptainer format instead of dockerfile ([#4030](#4030)) ([f5cac30](f5cac30)) * apptainer command not recognized when singularity is absent ([#4010](#4010)) ([b8162e2](b8162e2)) * capture stderr when tests fail ([#3995](#3995)) ([97d74ba](97d74ba)) * **docs:** make Data-dependent conditional execution a complete example ([#4043](#4043)) ([3a1d7f2](3a1d7f2)) * don't build the DAG when running unlock. Fixes [#4000](#4000) and [#198](#198) ([#4007](#4007)) ([acf79fd](acf79fd)) * Ensure pixi tasks may be run as advertised ([#4046](#4046)) ([88253c2](88253c2)) * fix checkpoint handling corner cases ([#3870](#3870) and [#3559](#3559)) ([#4015](#4015)) ([63f4257](63f4257)) * issue 3642 ([#4054](#4054)) ([76e6fc2](76e6fc2)) * issue 3815 ([#4026](#4026)) ([b0eec96](b0eec96)) * logging None in shellcmd context causes error ([#4064](#4064)) ([d0652cd](d0652cd)) * lookup function returns default value for empty DataFrame queries ([#4056](#4056)) ([f71de97](f71de97)) * make `cache: omit-software` a rule specific property ([#4085](#4085)) ([034a9e7](034a9e7)) * reduce number of tests leaving temporary files behind ([#4033](#4033)) ([a3a1c97](a3a1c97)) * regression in dynamic resource handling ([#4038](#4038)) ([f2c554a](f2c554a)) * somewhat shorter announce message ([#4080](#4080)) ([57efc71](57efc71)) ### Performance Improvements * switch reretry with tenacity; decouple container classes (with Python 3.7 compat for old scripts) from rest of the codebase (enabling moving to newer python versions) ([#4032](#4032)) ([ffb19e7](ffb19e7)) ### Documentation * Add AI-assisted contributions policy to contributing guidelines ([#4051](#4051)) ([dd70526](dd70526)) * **codebase:** Update & simplify plugin architecture section ([#4052](#4052)) ([176cf63](176cf63)) * Correct workflow.source_path() description in documentation ([#4036](#4036)) ([45883c5](45883c5)) * fixed wrong code example for collect() function ([#4037](#4037)) ([5c85ed8](5c85ed8)) * Minor docs improvements ([#4089](#4089)) ([29ea226](29ea226)) * switch to sphinx_design for tabs ([#3976](#3976)) ([9674614](9674614)) * typo in the migration table breaking a pip install command ([#4024](#4024)) ([66f9dda](66f9dda)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Closes #4048
Problem
The
lookup()function'sdefaultparameter was ignored when usingquerywith a pandas DataFrame. If a query returned no matching rows, it returned an empty list[]instead of the provided default value. Thedefaultparameter only worked withdpath(dictionary lookups).Solution
Added a check in the
do_queryfunction inlookup.pyto return thedefaultvalue when the query result is empty and a default is provided.Changes
src/snakemake/ioutils/lookup.py— Added 2 lines to check for empty query results and return the default valuetests/test_ioutils/Snakefile— Added an assert to test that the default value is returned for empty DataFrame queriesdocs/snakefiles/rules.rst— Updated documentation to reflect that thedefaultargument now works for bothdpathandqueryQC
tests/test_ioutils/Snakefileto test DataFrame default value behaviour.defaultis ignored for queries.Summary by CodeRabbit
New Features
Documentation
Tests