Validate block_size_bytes parameter in estimateCompressionRatio #100298
alexey-milovidov merged 14 commits into master
Conversation
The AST fuzzer found that passing a huge value like 9223372036854775806 as the `block_size_bytes` parameter to `estimateCompressionRatio` causes a LOGICAL_ERROR exception when trying to allocate a buffer of that size. Add a maximum limit of 256 MiB for the parameter and reject larger values with a proper error message. https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=ccdc901a930a375ee446b7625ccf22c93732350e&name_0=MasterCI&name_1=AST%20fuzzer%20%28arm_asan_ubsan%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test was flaky under ASan because `clickhouse-local` emits the warning "ASan doesn't fully support makecontext/swapcontext functions" to stderr. Test cases test2 and test4 already filtered this warning with `grep -av`, but test1 and test3 did not, causing the `awk` to produce extra garbage output like "ASandoesn't" and fail the diff. Add the same ASan warning filter to test1 and test3. #100298 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `transform` function casts WHEN values to the expression type. When WHEN values are Nullable (e.g. `CAST(NULL AS Nullable(Int32))`) but the expression is non-Nullable, the cast fails with "Cannot convert NULL value to non-Nullable type". This became visible after #99839 made `aggregate_functions_null_for_empty` produce Nullable result types for more aggregates, causing SQLLogic tests to hit this path. Fix by wrapping the entire `transform` attempt (array construction, build, and execute) in a try-catch that falls through to the `multiIf` path on type-related errors. The `multiIf` path handles Nullable WHEN values correctly via `caseWhenEquals`. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100293&sha=f067c3bec9fe4a7a07e810a27c9587ea9f777bfb&name_0=PR&name_1=SQLLogic%20test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use `tryGetLeastSupertype` and a Nullable compatibility check to determine whether `transform` can handle the types, rather than catching multiple exception codes after the fact. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    throw Exception(ErrorCodes::BAD_QUERY_PARAMETER, "block_size_bytes should be greater than 0");

    /// Limit to 256 MiB to prevent absurd memory allocations from fuzzed queries
    static constexpr UInt64 max_block_size_bytes = 256 * 1024 * 1024;
`max_block_size_bytes = 256 * 1024 * 1024` is introduced as a hidden magic constant. This changes user-visible behavior for `estimateCompressionRatio` without making the limit discoverable/configurable, and it can drift from existing compression-related limits (`max_compress_block_size`).
Please either:
- reuse an existing setting/constant for compression block bounds, or
- surface this limit explicitly in function documentation and keep the bound in a shared named constant with rationale.
Force-pushed from bb6c905 to 49fd1a4
    @@ -0,0 +1,2 @@
    SELECT estimateCompressionRatio(9223372036854775806)(*) FROM numbers(1); -- { serverError BAD_QUERY_PARAMETER }
    SELECT estimateCompressionRatio(268435457)(*) FROM numbers(1); -- { serverError BAD_QUERY_PARAMETER }
Consider also adding a passing case with `block_size_bytes = 268435456` (exactly 256 MiB). This test currently only covers invalid values (> 256 MiB), so an off-by-one regression in the upper bound would not be detected.
No, this does not matter.
Force-pushed from d54605f to d1ac6aa
    @@ -0,0 +1,27 @@
    -- Test that CASE expression handles Nullable WHEN values gracefully
These new tests are unrelated to the stated scope of this PR (`estimateCompressionRatio` `block_size_bytes` validation). The PR description and code changes here do not explain the CASE/Nullable behavior coverage added in 04054.
Please either:
- split these tests into a separate PR with matching motivation, or
- update this PR description to explain why this behavior must be validated together with this fix.
"... The default value is 1 MiB (1048576 bytes). Maximum allowed value is 256 MiB (268435456 bytes)."
    SELECT CASE 10 WHEN NULL THEN 1 WHEN 10 THEN 2 ELSE 3 END;

    -- With cast_keep_nullable, CAST(NULL AS INTEGER) becomes Nullable
    SET cast_keep_nullable = 1;
The test sets `cast_keep_nullable` but never restores it. Since settings persist within the test file, all subsequent CASE checks run under a modified session state, which makes the coverage less isolated and can hide regressions under default settings.
Please add `SET cast_keep_nullable = 0;` (or move the SET immediately before/after the single query that needs it) so later checks validate behavior independently.
…ssion-ratio-huge-block-size
… documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…olation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ck_size_bytes` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLVM Coverage Report
Changed lines: 100.00% (18/18)
MergeTree settings like max_compress_block_size, min_compress_block_size, marks_compress_block_size, and primary_key_compress_block_size flow directly to buffer allocation constructors (CompressedWriteBuffer, WriteBufferFromFile) without size validation. When the AST fuzzer mutates these to values near INT64_MAX via CREATE TABLE SETTINGS, adding SIMD padding (~63 bytes) overflows past the allocator's checkSize() limit of 2^63, causing a LOGICAL_ERROR abort (server crash). Fix: clamp all compress block size settings to 256 MiB (matching the limit in PR ClickHouse#100298 for estimateCompressionRatio) in the MergeTreeWriterSettings constructor and the column-level override in MergeTreeDataPartWriterWide. Affected crash variants: STID 0883-4864, 0883-4856, 0883-5a5b, 0883-2066, 0883-22d2, 0883-45dc (24 hits in 30 days, 3 on master). The estimateCompressionRatio variant (0883-248c) was separately fixed by PR ClickHouse#100298. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pression-ratio-huge-block-size Validate block_size_bytes parameter in estimateCompressionRatio
The AST fuzzer found that passing a huge value like `9223372036854775806` as the `block_size_bytes` parameter to `estimateCompressionRatio` causes a LOGICAL_ERROR exception when the `CompressedWriteBuffer` constructor tries to allocate a buffer of that size.

Add a maximum limit of 256 MiB for the `block_size_bytes` parameter and reject larger values with a proper error message.

AST fuzzer report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=ccdc901a930a375ee446b7625ccf22c93732350e&name_0=MasterCI&name_1=AST%20fuzzer%20%28arm_asan_ubsan%29
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix LOGICAL_ERROR exception in `estimateCompressionRatio` when the `block_size_bytes` parameter is extremely large.

Documentation entry for user-facing changes