Skip to content

Conversation

@jqnatividad
Copy link
Collaborator

resolves #3160

This stdin handling bug was inadvertently introduced when the --limit option was added to allow partitioning using columns with very high cardinality which opened thousands of file handles that overran the OS limit.

This was "fixed" by partitioning in batches below the operating system limit. This however, required, reading the input file several times which doesn't work with stdin as its consumed once its read for the first time, causing the stdin regression.

Fixed by writing stdin to a tempfile.

fix stdin bug introduced while adding `--limit` option

#2959
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a regression in the partition command's stdin handling that was introduced when the --limit option was added. The --limit feature processes data in batches to avoid exceeding OS file handle limits, requiring multiple passes over the input data. Since stdin can only be read once, this caused failures when using stdin with partitioning. The fix writes stdin to a temporary file before processing, allowing the batch logic to work correctly.

Key Changes:

  • Modified partition command to detect stdin input and write it to a temporary file
  • Added comprehensive test to verify stdin handling with the --limit flag
  • Implemented cleanup logic for the temporary file after processing

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/cmd/partition.rs Added stdin detection and tempfile handling logic (lines 105-122) and cleanup logic (lines 132-136) to enable stdin input with batch processing
tests/test_partition.rs Added partition_stdin_pipeline test to verify stdin input works correctly with --limit flag, testing multiple unique partition values across batches

Copy link
Contributor

Copilot AI commented Dec 5, 2025

@jqnatividad I've opened a new pull request, #3162, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits December 5, 2025 16:27
Co-authored-by: jqnatividad <1980690+jqnatividad@users.noreply.github.com>
Align partition stdin handling with split/stats pattern
@jqnatividad jqnatividad merged commit 45bd9a9 into master Dec 5, 2025
17 checks passed
@jqnatividad jqnatividad deleted the 3160-partition-stdin-handling branch December 5, 2025 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: qsv partition no longer works in pipeline since 7.1.0

2 participants