feat: Add extra input size properties#2424
Conversation
|
Kudos, SonarCloud Quality Gate passed!
|
|
Kudos, SonarCloud Quality Gate passed!
|
|
|
|
📝 Walkthrough📝 WalkthroughWalkthroughThe changes made in the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant InputFiles
participant FileSystem
User->>InputFiles: Request size
InputFiles->>FileSystem: Calculate individual file sizes
FileSystem-->>InputFiles: Return file sizes
InputFiles-->>User: Return total size
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
snakemake/io.py (3)
1754-1758: Consider using asyncio.gather for concurrent size retrieval.The current implementation awaits each file size sequentially. For better performance, consider using
asyncio.gatherto retrieve file sizes concurrently.- def size_files(self): - async def sizes(): - return [await f.size() for f in self] + def size_files(self): + async def sizes(): + return await asyncio.gather(*(f.size() for f in self))
1760-1771: Enhance clarity and maintainability of size conversion properties.Consider the following improvements:
- Use constants for size conversion factors
- Use floating-point division for more precise results
- Add docstrings to document the units
+ # Constants for size conversions + _BYTES_IN_KB = 1024.0 + _BYTES_IN_MB = _BYTES_IN_KB * 1024.0 + _BYTES_IN_GB = _BYTES_IN_MB * 1024.0 @property def size_files_kb(self): + """Returns a list of file sizes in kilobytes.""" - return [f / 1024 for f in self.size_files] + return [f / self._BYTES_IN_KB for f in self.size_files] @property def size_files_mb(self): + """Returns a list of file sizes in megabytes.""" - return [f / 1024 for f in self.size_files_kb] + return [f / self._BYTES_IN_MB for f in self.size_files] @property def size_files_gb(self): + """Returns a list of file sizes in gigabytes.""" - return [f / 1024 for f in self.size_files_mb] + return [f / self._BYTES_IN_GB for f in self.size_files]
1772-1786: Enhance clarity and consistency of total size properties.Consider adding docstrings and using the same constants for direct conversion from bytes.
@property def size(self): + """Returns the total size in bytes.""" return sum(self.size_files) @property def size_kb(self): + """Returns the total size in kilobytes.""" - return sum(self.size_files_kb) + return self.size / self._BYTES_IN_KB @property def size_mb(self): + """Returns the total size in megabytes.""" - return sum(self.size_files_mb) + return self.size / self._BYTES_IN_MB @property def size_gb(self): + """Returns the total size in gigabytes.""" - return sum(self.size_files_gb) + return self.size / self._BYTES_IN_GB
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
snakemake/io.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/io.py (1)
Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.
| def size_files(self): | ||
| async def sizes(): | ||
| return [await f.size() for f in self] | ||
|
|
||
| return sum(async_run(sizes())) | ||
| return async_run(sizes()) | ||
|
|
||
| @property | ||
| def size_files_kb(self): | ||
| return [f / 1024 for f in self.size_files] | ||
|
|
||
| @property | ||
| def size_files_mb(self): | ||
| return [f / 1024 for f in self.size_files_kb] | ||
|
|
||
| @property | ||
| def size_files_gb(self): | ||
| return [f / 1024 for f in self.size_files_mb] | ||
|
|
||
| @property | ||
| def size(self): | ||
| return sum(self.size_files) | ||
|
|
||
| @property | ||
| def size_kb(self): | ||
| return sum(self.size_files_kb) | ||
|
|
||
| @property | ||
| def size_mb(self): | ||
| return self.size / 1024 / 1024 | ||
| return sum(self.size_files_mb) | ||
|
|
||
| @property | ||
| def size_gb(self): | ||
| return sum(self.size_files_gb) |
There was a problem hiding this comment.
💡 Codebase verification
Based on the search results, I can now provide a final response:
No test coverage found for new size properties, but they are used internally.
The size properties (size_kb, size_mb, size_gb) appear to be used internally by Snakemake for resource calculations, particularly for determining memory and disk requirements. However:
- No dedicated unit tests were found for these properties in test_io.py
- The properties are referenced in various resource calculations like
input.size_mbfor memory and disk allocation - The functionality is documented in the CLI help text
Recommendation:
- Add unit tests for these properties in tests/test_io.py to ensure correct size calculations
- Document these properties in the API documentation since they are used for resource calculations
🔗 Analysis chain
Verify test coverage and usage.
Please ensure that:
- Unit tests are added/updated to cover the new size properties
- The changes are documented in the user documentation
Let me check the test coverage specifically for these new size properties.
Let me check the actual implementation of these size properties to understand their usage.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for existing tests and documentation
# Check for test files
echo "Checking for test files..."
fd "test.*\.py$"
# Check for existing size property tests
echo "Checking for existing size property tests..."
rg -l "def test.*size"
# Check for documentation files
echo "Checking for documentation files..."
fd ".*\.(rst|md)$"
# Check for existing documentation of size properties
echo "Checking for existing documentation of size properties..."
rg -l "size.*_[kmg]b"
Length of output: 2538
Script:
#!/bin/bash
# Check test_io.py for size-related tests since our changes are in io.py
echo "Checking test_io.py for size tests..."
cat tests/test_io.py
# Check tests.py since it was flagged in the size property search
echo "Checking tests.py for size tests..."
cat tests/tests.py
# Check the documentation mentions in rules.rst and cli.rst
echo "Checking documentation for size properties..."
cat docs/snakefiles/rules.rst
cat docs/executing/cli.rst
Length of output: 198809
Script:
#!/bin/bash
# Check for actual usage of size properties in the codebase
echo "Checking for usage of size properties..."
rg -A 2 "size_[kmg]b" --no-filename
# Check for any tests of size properties
echo "Checking for size property tests..."
rg -A 2 "def test.*size" tests/test_io.py
Length of output: 2576
|
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
snakemake/io.py (7)
1775-1780: Consider concurrent file-size retrieval to improve scalability.Currently, the code awaits each file size in sequence within the
sizes()coroutine, which can become slow for a large set of files. Leveraging concurrent tasks (e.g.,asyncio.gather) insidesizes()would reduce total wait time.A possible approach (shown as a diff example):
def size_files(self): async def sizes(): - return [await f.size() for f in self] + tasks = [asyncio.create_task(f.size()) for f in self] + return await asyncio.gather(*tasks) return async_run(sizes())
1781-1784: Use a named constant for clarity when converting to kilobytes.Dividing by 1024 in multiple places works, but introducing a local constant (e.g.,
KB_FACTOR = 1024) can improve readability and maintainability.
1785-1788: Avoid repeated 1024-based divisions for MB calculations.Since MB is just KB/1024, repeated divisions might introduce minor floating-point precision drift. Using a single step (e.g.,
size_in_mb = size_in_bytes / (1024**2)) can be clearer.
1789-1792: Consider a single-step conversion for GB.As with MB, using a single-step approach for GB can make the conversion more explicit (e.g.,
(f / 1024**3)) and slightly reduce the risk of floating-point drift from multiple divisions.
1793-1796: Cache or store the total size to avoid repeated async calls.Each property call to
size_filestriggers a new asynchronous retrieval. Summing again insizeleads to repeated I/O. Caching the result (e.g., via a memoized approach) can improve performance ifsizeis repeatedly accessed.
1797-1799: Consider memoizing the total size in KB.Similar to
size, callingsize_files_kbreruns asynchronous lookups. A shared cached result or lazy evaluation would prevent recalculations on subsequent property accesses.
1805-1807: Add dedicated tests for GB calculations.As a continuation of the above point, including checks for edge cases in
size_gb(e.g., empty files or very large files) would foster confidence in the correctness of these conversions.Would you like help drafting these additional test cases?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
snakemake/io.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/io.py
⏰ Context from checks skipped due to timeout of 90000ms (40)
- GitHub Check: testing (10, 3.12, dash)
- GitHub Check: testing (10, 3.12, bash)
- GitHub Check: testing (10, 3.11, bash)
- GitHub Check: testing (9, 3.12, dash)
- GitHub Check: testing (9, 3.12, bash)
- GitHub Check: testing (9, 3.11, bash)
- GitHub Check: testing (8, 3.12, dash)
- GitHub Check: testing (8, 3.12, bash)
- GitHub Check: testing (8, 3.11, bash)
- GitHub Check: testing (7, 3.12, dash)
- GitHub Check: testing (7, 3.12, bash)
- GitHub Check: testing (7, 3.11, bash)
- GitHub Check: testing (6, 3.12, dash)
- GitHub Check: testing (6, 3.12, bash)
- GitHub Check: testing (6, 3.11, bash)
- GitHub Check: testing (5, 3.12, dash)
- GitHub Check: testing (5, 3.12, bash)
- GitHub Check: testing (5, 3.11, bash)
- GitHub Check: testing (4, 3.12, dash)
- GitHub Check: testing-windows (10)
- GitHub Check: testing (4, 3.12, bash)
- GitHub Check: testing-windows (9)
- GitHub Check: testing (4, 3.11, bash)
- GitHub Check: testing-windows (8)
- GitHub Check: testing (3, 3.12, dash)
- GitHub Check: testing (3, 3.12, bash)
- GitHub Check: testing (3, 3.11, bash)
- GitHub Check: testing-windows (7)
- GitHub Check: testing (2, 3.12, dash)
- GitHub Check: testing (2, 3.12, bash)
- GitHub Check: testing-windows (6)
- GitHub Check: testing (2, 3.11, bash)
- GitHub Check: testing-windows (5)
- GitHub Check: testing (1, 3.12, dash)
- GitHub Check: testing-windows (4)
- GitHub Check: testing (1, 3.12, bash)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.11, bash)
- GitHub Check: testing-windows (2)
- GitHub Check: testing-windows (1)
🔇 Additional comments (1)
snakemake/io.py (1)
1803-1804: Enhance test coverage for the new size properties.The
size_mbproperty (and others) lack dedicated unit tests. Ensure correctness by adding tests that validate both the numeric results and the asynchronous retrieval logic.
🤖 I have created a release *beep* *boop* --- ## [8.30.0](v8.29.3...v8.30.0) (2025-03-11) ### Features * Add extra input size properties ([#2424](#2424)) ([359ae2e](359ae2e)) * shell function calls inside of the 'run' directive now use conda, container, or envmodules specifications ([#2289](#2289)) ([0193e34](0193e34)) * xonsh support for script directive ([#3310](#3310)) ([d1c369b](d1c369b)) ### Bug Fixes * include unit test templates in `setup.py` ([#3362](#3362)) ([b47252c](b47252c)) ### Documentation * clearly explain report rendering to ZIP archive ([#3357](#3357)) ([948e8fb](948e8fb)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description <!--Add a description of your PR here--> Add properties to return size of `input` per file, both in bytes and megabytes. ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced file size calculations with new properties for size in kilobytes, megabytes, and gigabytes. - Improved clarity and usability of the `InputFiles` class for better file size management. - **Bug Fixes** - Refined the `size` property to accurately reflect the total size of all files. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [8.30.0](snakemake/snakemake@v8.29.3...v8.30.0) (2025-03-11) ### Features * Add extra input size properties ([snakemake#2424](snakemake#2424)) ([359ae2e](snakemake@359ae2e)) * shell function calls inside of the 'run' directive now use conda, container, or envmodules specifications ([snakemake#2289](snakemake#2289)) ([0193e34](snakemake@0193e34)) * xonsh support for script directive ([snakemake#3310](snakemake#3310)) ([d1c369b](snakemake@d1c369b)) ### Bug Fixes * include unit test templates in `setup.py` ([snakemake#3362](snakemake#3362)) ([b47252c](snakemake@b47252c)) ### Documentation * clearly explain report rendering to ZIP archive ([snakemake#3357](snakemake#3357)) ([948e8fb](snakemake@948e8fb)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>












Description
Add properties to return size of
inputper file, both in bytes and megabytes.QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
InputFilesclass for better file size management.sizeproperty to accurately reflect the total size of all files.