Fix infinite loop in skipPrefixBeforeHeader when file has fewer lines than skip_first_lines#101111
Fix infinite loop in skipPrefixBeforeHeader when file has fewer lines than skip_first_lines#101111alexey-milovidov wants to merge 11 commits intomasterfrom
skipPrefixBeforeHeader when file has fewer lines than skip_first_lines#101111Conversation
…es than `skip_first_lines` When `input_format_csv_skip_first_lines` or `input_format_tsv_skip_first_lines` is set to a value larger than the number of lines in the file, `skipPrefixBeforeHeader` enters an infinite loop doing `pread` at EOF. Each call to `readRow` succeeds with an empty result even at EOF, so the loop never terminates. Add an EOF check before each `readRow` call to break out of the loop when the file is exhausted. Found via AST fuzzer in Stress test (amd_msan) on PR #100947. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Workflow [PR], commit [53dd622] Summary: ❌
AI ReviewSummaryThis PR fixes a real infinite-loop risk in ClickHouse Rules
Final Verdict
|
| @@ -0,0 +1,7 @@ | |||
| -- Verify that skip_first_lines doesn't cause an infinite loop when the file has fewer lines than requested. | |||
There was a problem hiding this comment.
Does not reproduce.
… loop The `format()` function with inline data uses `ReadBufferFromMemory`, which doesn't trigger the pread-based infinite loop. Use `file()` table function instead, which reads via `ReadBufferFromFileDescriptorPRead`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With `skip_first_lines = 100`, the loop over `readRow` at EOF completes
quickly even on unfixed builds. Bumping to 1 billion makes the loop take
long enough for the test-runner timeout to detect the hang.
This addresses the bugfix-validation failure ("does not reproduce").
…rst-lines-infinite-loop
The .sql test does not reproduce the bug in bugfix validation because the loop in `skipPrefixBeforeHeader` (1B iterations of a fast no-op at EOF) completes within the test timeout, producing the same empty output on both old and new code. Convert to a .sh test that wraps the query in `timeout 10`. On unfixed builds, the 1B-iteration loop takes much longer than 10 seconds, so `timeout` kills the process with a non-zero exit code. On fixed builds, the EOF check breaks the loop instantly and the query succeeds with exit code 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rst-lines-infinite-loop
Test number 04068 was already taken by `04068_constant_fold_union_intersect` on master. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| @@ -0,0 +1,20 @@ | |||
| #!/usr/bin/env bash | |||
| # Tags: no-parallel | |||
There was a problem hiding this comment.
# Tags: no-parallel looks unnecessary here and will reduce stateless test parallelism in CI. This test already uses ${CLICKHOUSE_DATABASE}-scoped file names, so it should be isolated across parallel runs.
Please remove the no-parallel tag unless there is a concrete cross-test interference that requires serialization.
…rst-lines-infinite-loop
The test uses `${CLICKHOUSE_DATABASE}`-scoped file names, so it is
already isolated across parallel runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rst-lines-infinite-loop
LLVM Coverage Report
Changed lines: 100.00% (16/16) · Uncovered code |
When
input_format_csv_skip_first_linesorinput_format_tsv_skip_first_linesis set to a value larger than the number of lines in the file,skipPrefixBeforeHeaderenters an infinite loop doingpreadat EOF. Each call toreadRowsucceeds with an empty result even at EOF, so the loop never terminates.Add an EOF check before each
readRowcall to break out of the loop when the file is exhausted.Found via AST fuzzer in Stress test (amd_msan) on PR #100947.
Changelog category (leave one):