Skip to content

fix: lookup function returns default value for empty DataFrame queries#4056

Merged
johanneskoester merged 2 commits intosnakemake:mainfrom
ManarRashad:fix/issue-4048-default_values_dataframes
Mar 11, 2026
Merged

fix: lookup function returns default value for empty DataFrame queries#4056
johanneskoester merged 2 commits intosnakemake:mainfrom
ManarRashad:fix/issue-4048-default_values_dataframes

Conversation

@ManarRashad
Copy link
Copy Markdown
Contributor

@ManarRashad ManarRashad commented Mar 11, 2026

Closes #4048

Problem

The lookup() function's default parameter was ignored when using query with a pandas DataFrame. If a query returned no matching rows, it returned an empty list [] instead of the provided default value. The default parameter only worked with dpath (dictionary lookups).

Solution

Added a check in the do_query function in lookup.py to return the default value when the query result is empty and a default is provided.

Changes

  • src/snakemake/ioutils/lookup.py — Added 2 lines to check for empty query results and return the default value
  • tests/test_ioutils/Snakefile — Added an assert to test that the default value is returned for empty DataFrame queries
  • docs/snakefiles/rules.rst — Updated documentation to reflect that the default argument now works for both dpath and query

QC

  • The changes are already covered by an existing test case test_ioutils. Added a new assert in tests/test_ioutils/Snakefile to test DataFrame default value behaviour.
  • The documentation (docs/) is updated: Updated docs/snakefiles/rules.rst to remove the note that default is ignored for queries.

Summary by CodeRabbit

  • New Features

    • Lookup function now returns the provided default value when queries yield no results.
    • None is now a valid default value.
  • Documentation

    • Updated lookup function behavior documentation.
  • Tests

    • Added test coverage for lookup function with default value fallback.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

Walkthrough

The lookup function is fixed to return the default value when a DataFrame query yields no matching rows. The implementation adds a check to return the provided default instead of an empty list when results are empty. Documentation and tests are updated to reflect this corrected behavior.

Changes

Cohort / File(s) Summary
Documentation Update
docs/snakefiles/rules.rst
Updated documented behavior of lookup function to clarify that default values are returned when a query yields no matching rows, replacing prior phrasing that distinguished between dpath and query scenarios.
Implementation Fix
src/snakemake/ioutils/lookup.py
Added fallback logic to return the provided default value when a DataFrame query returns an empty result, inserting a check after query execution that bypasses further processing if result is empty and default is specified.
Test Addition
tests/test_ioutils/Snakefile
Added test assertion verifying that lookup returns the default value ("fallback") when a query on a DataFrame matches no rows.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing the lookup function to return default values for empty DataFrame queries.
Description check ✅ Passed The PR description includes all required sections: problem statement, solution, changes made, and completed QC checklist items confirming tests and documentation updates.
Linked Issues check ✅ Passed The PR implementation directly addresses issue #4048: it adds the check to return default values for empty DataFrame queries, updates documentation, and includes test coverage as required.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the lookup function's default parameter behavior for DataFrame queries as specified in issue #4048.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_ioutils/Snakefile (1)

11-11: Consider covering default=None for query lookups too.

This change depends on the NODEFAULT sentinel, so None is the edge case most worth locking down. Right now the new assertion only exercises a truthy fallback value.

🧪 Suggested test addition
 assert lookup(query="sample == 999", within=samples, default="fallback") == "fallback"
+assert lookup(query="sample == 999", within=samples, default=None) is None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_ioutils/Snakefile` at line 11, Add a test case to exercise the
None fallback path for lookup by asserting lookup(query="sample == 999",
within=samples, default=None) is None; also add a check for the sentinel
behavior (e.g., call lookup(query="sample == 999", within=samples) or with
default=NODEFAULT if your API exposes NODEFAULT) to ensure the function raises
or returns the sentinel as expected. Locate the lookup call in the test and
extend it with the two assertions referencing lookup and NODEFAULT to lock down
the None edge case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/snakefiles/rules.rst`:
- Around line 461-462: Update the documentation sentence about fallback behavior
to exclude is_nrows lookups: clarify that the described
default/LookupError/empty-list behavior applies to dpath and query lookups, but
not to lookups using the is_nrows flag (i.e., lookup(query=..., is_nrows=...)),
which returns the boolean row-count check before any empty-result fallback is
applied; modify the line referencing dpath/query fallback to explicitly state
this exception so readers know is_nrows behaves differently.

---

Nitpick comments:
In `@tests/test_ioutils/Snakefile`:
- Line 11: Add a test case to exercise the None fallback path for lookup by
asserting lookup(query="sample == 999", within=samples, default=None) is None;
also add a check for the sentinel behavior (e.g., call lookup(query="sample ==
999", within=samples) or with default=NODEFAULT if your API exposes NODEFAULT)
to ensure the function raises or returns the sentinel as expected. Locate the
lookup call in the test and extend it with the two assertions referencing lookup
and NODEFAULT to lock down the None edge case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 095a6ba2-b6b4-423d-90d6-ee79bd9ef7d8

📥 Commits

Reviewing files that changed from the base of the PR and between 97d74ba and 0bf1da7.

📒 Files selected for processing (3)
  • docs/snakefiles/rules.rst
  • src/snakemake/ioutils/lookup.py
  • tests/test_ioutils/Snakefile

Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@johanneskoester johanneskoester merged commit f71de97 into snakemake:main Mar 11, 2026
61 checks passed
LKress pushed a commit to LKress/snakemake that referenced this pull request Mar 11, 2026
snakemake#4056)

Closes snakemake#4048 

### Problem
The `lookup()` function's `default` parameter was ignored when using
`query` with a pandas DataFrame. If a query returned no matching rows,
it returned an empty list `[]` instead of the provided default value.
The `default` parameter only worked with `dpath` (dictionary lookups).

  ### Solution
Added a check in the `do_query` function in `lookup.py` to return the
`default` value when the query result is empty and a default is
provided.

  ### Changes
- `src/snakemake/ioutils/lookup.py` — Added 2 lines to check for empty
query results and return the default value
- `tests/test_ioutils/Snakefile` — Added an assert to test that the
default value is returned for empty DataFrame queries
- `docs/snakefiles/rules.rst` — Updated documentation to reflect that
the `default` argument now works for both `dpath` and `query`

### QC
* [x] The changes are already covered by an existing test case
test_ioutils. Added a new assert in `tests/test_ioutils/Snakefile` to
test DataFrame default value behaviour.
* [x] The documentation (docs/) is updated: Updated
docs/snakefiles/rules.rst to remove the note that `default` is ignored
for queries.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Lookup function now returns the provided default value when queries
yield no results.
  * None is now a valid default value.

* **Documentation**
  * Updated lookup function behavior documentation.

* **Tests**
  * Added test coverage for lookup function with default value fallback.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
johanneskoester pushed a commit that referenced this pull request Mar 13, 2026
🤖 I have created a release *beep* *boop*
---


##
[9.17.0](v9.16.3...v9.17.0)
(2026-03-13)


### Features

* Allow storing snakemake metadata in files or databases
([#4012](#4012))
([dd75f31](dd75f31))
* Allow to specify comparison command per-unit test
([#3956](#3956))
([b88171c](b88171c))
* job table orderd topological when run is started
([#4018](#4018))
([75cf506](75cf506))
* lambda functions for priority in rules
([#3253](#3253))
([d2aa226](d2aa226))
* Make on... directive of modules accessible
([#4050](#4050))
([e9f2e1c](e9f2e1c))


### Bug Fixes

* adjust conda tests to not fail on apple silicon; fix
[#4040](#4040)
([#4049](#4049))
([f5b0142](f5b0142))
* allow "--containerize apptainer" to output apptainer format instead of
dockerfile ([#4030](#4030))
([f5cac30](f5cac30))
* apptainer command not recognized when singularity is absent
([#4010](#4010))
([b8162e2](b8162e2))
* capture stderr when tests fail
([#3995](#3995))
([97d74ba](97d74ba))
* **docs:** make Data-dependent conditional execution a complete example
([#4043](#4043))
([3a1d7f2](3a1d7f2))
* don't build the DAG when running unlock. Fixes
[#4000](#4000) and
[#198](#198)
([#4007](#4007))
([acf79fd](acf79fd))
* Ensure pixi tasks may be run as advertised
([#4046](#4046))
([88253c2](88253c2))
* fix checkpoint handling corner cases
([#3870](#3870) and
[#3559](#3559))
([#4015](#4015))
([63f4257](63f4257))
* issue 3642
([#4054](#4054))
([76e6fc2](76e6fc2))
* issue 3815
([#4026](#4026))
([b0eec96](b0eec96))
* logging None in shellcmd context causes error
([#4064](#4064))
([d0652cd](d0652cd))
* lookup function returns default value for empty DataFrame queries
([#4056](#4056))
([f71de97](f71de97))
* make `cache: omit-software` a rule specific property
([#4085](#4085))
([034a9e7](034a9e7))
* reduce number of tests leaving temporary files behind
([#4033](#4033))
([a3a1c97](a3a1c97))
* regression in dynamic resource handling
([#4038](#4038))
([f2c554a](f2c554a))
* somewhat shorter announce message
([#4080](#4080))
([57efc71](57efc71))


### Performance Improvements

* switch reretry with tenacity; decouple container classes (with Python
3.7 compat for old scripts) from rest of the codebase (enabling moving
to newer python versions)
([#4032](#4032))
([ffb19e7](ffb19e7))


### Documentation

* Add AI-assisted contributions policy to contributing guidelines
([#4051](#4051))
([dd70526](dd70526))
* **codebase:** Update & simplify plugin architecture section
([#4052](#4052))
([176cf63](176cf63))
* Correct workflow.source_path() description in documentation
([#4036](#4036))
([45883c5](45883c5))
* fixed wrong code example for collect() function
([#4037](#4037))
([5c85ed8](5c85ed8))
* Minor docs improvements
([#4089](#4089))
([29ea226](29ea226))
* switch to sphinx_design for tabs
([#3976](#3976))
([9674614](9674614))
* typo in the migration table breaking a pip install command
([#4024](#4024))
([66f9dda](66f9dda))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lookup function does not return default value with dataframes

2 participants