Add server-side AST fuzzer #97568

Merged: alexey-milovidov merged 9 commits into master from server-side-ast-fuzzer on Feb 23, 2026

alexey-milovidov (Member) commented Feb 21, 2026

Closes #28107.

Motivation: We can enable it in Stress tests.
Also, we can combine BuzzHouse with ASTFuzzer.

Summary

  • Bring the existing QueryFuzzer (from clickhouse-client) to the server side
  • Two new experimental settings: ast_fuzzer_runs (number/probability of fuzzed queries per normal query) and ast_fuzzer_any_query (whether to fuzz DDL/INSERT or only read-only queries)
  • A single global QueryFuzzer instance accumulates AST fragments across all sessions, producing increasingly interesting mutations over time
  • Fuzzed queries are executed internally with results discarded; failures are logged at TRACE level

Test plan

  • Build succeeds
  • SET ast_fuzzer_runs = 3; SELECT 1 produces fuzzed queries visible in server TRACE logs
  • DDL queries are not fuzzed by default, but are fuzzed with SET ast_fuzzer_any_query = 1
  • Stateless test 03833_server_ast_fuzzer passes

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add server-side AST fuzzer controlled by ast_fuzzer_runs and ast_fuzzer_any_query settings. When enabled, the server runs randomized mutations of each query after its normal execution, discarding the results.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

New experimental settings:

  • ast_fuzzer_runs (Float, default 0): Controls the server-side AST fuzzer. 0 = disabled, 0 < value < 1 = probability of one fuzzed run, >= 1 = number of fuzzed runs per query.
  • ast_fuzzer_any_query (Bool, default false): When false, only read-only queries (SELECT, EXPLAIN, SHOW, DESCRIBE, EXISTS) are fuzzed. When true, all query types are fuzzed.

Example: SET ast_fuzzer_runs = 3; SELECT number FROM numbers(10) WHERE number > 5;
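
The `ast_fuzzer_any_query` rule described above can be pictured as a small predicate over the query kind. This is a minimal sketch only: the `QueryKind` enum and the `isReadOnly` and `shouldFuzz` functions are hypothetical names introduced for illustration, not the types or functions used in the ClickHouse source.

```cpp
#include <cassert>

// Hypothetical classification of a parsed query; not the real ClickHouse AST types.
enum class QueryKind { Select, Explain, Show, Describe, Exists, Insert, Create, Alter, Drop, Other };

// Returns true for the query kinds that the PR description lists as read-only.
constexpr bool isReadOnly(QueryKind kind)
{
    switch (kind)
    {
        case QueryKind::Select:
        case QueryKind::Explain:
        case QueryKind::Show:
        case QueryKind::Describe:
        case QueryKind::Exists:
            return true;
        default:
            return false;
    }
}

// Mirrors the documented semantics of ast_fuzzer_any_query:
// false -> only read-only queries are fuzzed, true -> every query kind is fuzzed.
constexpr bool shouldFuzz(QueryKind kind, bool ast_fuzzer_any_query)
{
    return ast_fuzzer_any_query || isReadOnly(kind);
}
```

With the default `ast_fuzzer_any_query = false`, `shouldFuzz(QueryKind::Insert, false)` is false in this sketch, which matches the test-plan item that DDL queries are not fuzzed by default.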

🤖 Generated with Claude Code

Bring the existing `QueryFuzzer` (used in clickhouse-client) to the
server. Two new experimental settings control it:

- `ast_fuzzer_runs` (Float, default 0): 0 disables, 0<v<1 is
  probability, >=1 is the number of fuzzed queries per normal query.
- `ast_fuzzer_any_query` (Bool, default false): when false only
  read-only queries are fuzzed; when true all query types are fuzzed.

A single global `QueryFuzzer` instance accumulates AST fragments from
all queries across all sessions, producing increasingly interesting
mutations over time. Fuzzed queries are executed internally with
results discarded; failures are logged at TRACE level and fed back
via `notifyQueryFailed`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
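
The commit message above says that fuzzed queries run internally, their results are discarded, and failures are logged and fed back to the fuzzer. A rough sketch of that control flow follows. Everything here is a stand-in: `SketchFuzzer`, its `fuzz` method, and `runFuzzedQueries` are hypothetical, and only the name `notifyQueryFailed` is taken from the commit message (its real signature is not shown in this PR page, so the one below is an assumption).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the fuzzer; only the failure-feedback hook is modelled.
struct SketchFuzzer
{
    std::size_t failures = 0;

    // Stand-in mutation: the real fuzzer rewrites the AST, this only tags the text.
    std::string fuzz(const std::string & query, std::size_t run) const
    {
        return query + " /* mutation " + std::to_string(run) + " */";
    }

    // Named after the hook in the commit message; the signature is assumed.
    void notifyQueryFailed() { ++failures; }
};

// Runs `runs` fuzzed variants of `query` through `execute` and discards the results.
// Exceptions are logged and reported to the fuzzer; they never reach the caller.
// Returns the number of fuzzed queries attempted.
inline std::size_t runFuzzedQueries(
    SketchFuzzer & fuzzer,
    const std::string & query,
    std::size_t runs,
    const std::function<void(const std::string &)> & execute)
{
    std::size_t attempted = 0;
    for (std::size_t i = 0; i < runs; ++i)
    {
        const std::string fuzzed = fuzzer.fuzz(query, i);
        ++attempted;
        try
        {
            execute(fuzzed); // The result is intentionally ignored.
        }
        catch (const std::exception & e)
        {
            // The PR logs at TRACE level; std::clog stands in for the server logger.
            std::clog << "Fuzzed query failed: " << e.what() << '\n';
            fuzzer.notifyQueryFailed();
        }
    }
    return attempted;
}
```

The point of swallowing the exception is that a fuzzed query is expected to fail often, so its failure must not affect the outcome of the original query that triggered it.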
clickhouse-gh bot (Contributor) commented Feb 21, 2026

Workflow [PR], commit [ae4101b]

alexey-milovidov and others added 3 commits February 21, 2026 16:24
Add a non-deterministic SQL function `fuzzQuery` that parses a query string,
applies random AST mutations via the global `QueryFuzzer`, and returns the
fuzzed query as a string. Guarded by `allow_fuzz_query_functions` setting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add `.introduced_in = {26, 2}` to `fuzzQuery` function documentation
  to fix the `02415_all_new_functions_must_have_version_information` test.
- Add `allow_fuzz_query_functions` to `enableAllExperimentalSettings.cpp`
  to fix the style check.

#97568

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…zzed queries

Fuzzed queries are expected to fail with errors. Set `send_logs_level = 'fatal'`
to prevent those `<Error>` messages from appearing in stderr, which causes the
flaky check to report "having stderror".

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=97568&sha=f7b86ffd9e6d3cca1f8d3b1beba038f13aa119af&name_0=PR&name_1=Stateless%20tests%20%28amd_tsan%2C%20flaky%20check%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov self-assigned this Feb 21, 2026
alexey-milovidov (Member, Author) commented:

Add a ProfileEvent for the number of fuzzed queries attempted and a CurrentMetric for the current size of the fuzzer's accumulated state.

alexey-milovidov and others added 2 commits February 22, 2026 12:54
- Fix zero byte handling in `fuzzQuery`: exclude trailing zero bytes from
  `ColumnString` data when computing the `end` pointer for the parser, and
  remove unused `single_offset` variable.
- Remove `ast_fuzzer_any_query` test case that could replace `SELECT` with
  other statements and break other tests.
- Generalize Bernoulli distribution for fractional `ast_fuzzer_runs` values
  to work with both integer and fractional parts.
- Add `ASTFuzzerAccumulatedFragments` metric and `ASTFuzzerQueries` profile
  event for observability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
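
One plausible reading of the "generalized Bernoulli" item above is that the integer part of `ast_fuzzer_runs` gives a guaranteed number of runs and the fractional part is the probability of one extra run, so `2.5` means two runs plus a third with probability 0.5. That interpretation is an assumption, and the function names `fuzzRunsFor` and `drawFuzzRuns` below are hypothetical; the actual implementation is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Deterministic core: `uniform_sample` is expected to lie in [0, 1).
// Integer part of the setting = guaranteed runs; fractional part = chance of one more.
constexpr std::uint64_t fuzzRunsFor(double setting, double uniform_sample)
{
    if (!(setting > 0))
        return 0; // 0, negative values and NaN all disable fuzzing in this sketch.
    const auto whole = static_cast<std::uint64_t>(setting);
    const double fraction = setting - static_cast<double>(whole);
    return whole + (uniform_sample < fraction ? 1 : 0);
}

// Convenience wrapper that draws the uniform sample from a caller-supplied generator.
template <typename Rng>
std::uint64_t drawFuzzRuns(double setting, Rng & rng)
{
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return fuzzRunsFor(setting, uniform(rng));
}
```

Under this reading the originally documented cases fall out as special cases: a value in (0, 1) yields one run with that probability, and an integer value yields exactly that many runs.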
The review comment about zero bytes referred to the output buffer
management, not the input parsing. The `end` pointer should include the
full range from `ColumnString`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ClickHouse ClickHouse deleted a comment from CLAassistant Feb 22, 2026
…r::fuzzMain`

Instead of setting the metric at each call site after calling `fuzzMain`,
set it once inside the fuzzer itself.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Feb 23, 2026
Merged via the queue into master with commit fdd1745 Feb 23, 2026
148 checks passed
@alexey-milovidov alexey-milovidov deleted the server-side-ast-fuzzer branch February 23, 2026 11:53
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 23, 2026
azat (Member) commented Feb 24, 2026

Fix - #97835

Algunenano pushed a commit to Algunenano/ClickHouse that referenced this pull request Feb 24, 2026
Algunenano pushed a commit to Algunenano/ClickHouse that referenced this pull request Feb 24, 2026
Algunenano pushed a commit to Algunenano/ClickHouse that referenced this pull request Feb 24, 2026
std::pair<std::shared_ptr<QueryFuzzer>, std::unique_lock<std::mutex>> getGlobalASTFuzzer()
{
    static std::mutex mutex;
    static std::shared_ptr<QueryFuzzer> fuzzer = std::make_shared<QueryFuzzer>(randomSeed());

Review comment (Member):

Is the main reason to have a global `QueryFuzzer` (instead of one per `FunctionFuzzQuery`, for example) that it works better the more queries are fed into it? Maybe we could still store it in the global context?

Reply (Member, Author):

The original motivation was to make sure that queries from .sh tests are also used for fuzzing.
But now the main motivation is combining it with the Stress test and BuzzHouse. It appears to be super powerful: #98138

The goal is: if some bug can be found - increase the probability of finding it :)
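
The accessor quoted in this review thread returns the shared instance together with a held lock, so every caller works on the same accumulated state under mutual exclusion. The following self-contained sketch shows that pattern with a stand-in type. `FragmentPool` and `getGlobalPool` are hypothetical names; only the return shape (a `std::shared_ptr` paired with a `std::unique_lock<std::mutex>`) is taken from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the fuzzer's accumulated AST fragments.
struct FragmentPool
{
    std::vector<std::string> fragments;

    void collect(const std::string & fragment) { fragments.push_back(fragment); }
    std::size_t size() const { return fragments.size(); }
};

// Returns the process-wide pool together with a lock that is already held.
// The pool may be used safely for as long as the returned lock is alive.
inline std::pair<std::shared_ptr<FragmentPool>, std::unique_lock<std::mutex>> getGlobalPool()
{
    static std::mutex mutex;
    static std::shared_ptr<FragmentPool> pool = std::make_shared<FragmentPool>();
    return std::make_pair(pool, std::unique_lock<std::mutex>(mutex));
}
```

A caller would keep the returned pair in a narrow scope, for example `{ auto handle = getGlobalPool(); handle.first->collect("number > 5"); }`. Because `std::mutex` is not recursive, calling the accessor again on the same thread while an earlier handle is still alive would deadlock, which is one trade-off of handing the lock to the caller.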

@vdimir vdimir added the post-approved Approved, but after the PR is merged. label Mar 18, 2026
@vdimir vdimir self-assigned this Mar 18, 2026
Labels

  • post-approved: Approved, but after the PR is merged.
  • pr-experimental: Experimental Feature
  • pr-synced-to-cloud: The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Add query fuzzer on server side.

4 participants