ci/find_tests: use git diff in CI instead of gh pr diff for coverage-based test selection #101553
Conversation
`get_diff_text` was calling `gh pr diff` unconditionally, but flaky check containers do not have `enable_gh_auth=True` so `gh` is unauthenticated. This caused Step 3 (coverage-relevant tests) to always return 0 — the empty diff yielded no changed lines and the CIDB coverage query was never issued. Fix: in non-local runs use `git diff <merge_base_commit_sha>...HEAD` (the merge base is stored in `JOB_KV_DATA` by `store_data.py`). `gh pr diff` is kept for local runs only. Also fix the cache guard: `not self._diff_text` treated an empty string as uncached, causing `gh pr diff` to be called twice per job. Changed to `self._diff_text is not None`. Also fix `query_result.strip()` on a potentially-`None` CIDB result (`or ""` guard on line 123).
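The cache-guard half of the fix can be illustrated with a toy class; `DiffCache`, `fetch_count`, and the method names are illustrative stand-ins, not the real `find_tests.py` API. With `not self._diff_text`, an empty diff (the unauthenticated `gh` case) looks "uncached" and triggers a second fetch:

```python
class DiffCache:
    """Toy illustration of the _diff_text cache-guard bug and its fix."""

    def __init__(self):
        self._diff_text = None
        self.fetch_count = 0

    def _fetch(self) -> str:
        self.fetch_count += 1
        return ""  # empty diff, e.g. unauthenticated gh pr diff

    def get_diff_text_buggy(self) -> str:
        if not self._diff_text:  # bug: "" is falsy, so the cache never sticks
            self._diff_text = self._fetch()
        return self._diff_text

    def get_diff_text_fixed(self) -> str:
        if self._diff_text is None:  # only None means "not fetched yet"
            self._diff_text = self._fetch()
        return self._diff_text
```

With the buggy guard the fetch runs on every call; with the `is None` guard it runs exactly once per job.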
Workflow [PR], commit [34802b9] Summary: ❌
AI Review Summary
This PR improves CI test targeting and rewrites many stateless tests to avoid static database names, but it also introduces several correctness regressions.
Findings
❌ Blockers
ClickHouse Rules
Performance & Safety
Final Verdict
ci/jobs/scripts/find_tests.py
Outdated
    f"gh pr diff {self.info.pr_number} --repo ClickHouse/ClickHouse"
)
else:
    merge_base = (self.info.get_kv_data() or {}).get("merge_base_commit_sha", "")
Using merge_base_commit_sha computed against master can produce an incorrect diff for PRs targeting non-master base branches (for example backports to release branches). In those cases git diff <master_merge_base>...HEAD may include unrelated changes and skew coverage-based test selection.
Please derive the merge base from the PR actual base ref (or use the merge-commit parents in CI) before running git diff, so the diff always matches the PR scope.
The flaky check Docker container has no .git directory, so the previous fix using `git diff <merge_base>...HEAD` failed with 'Not a git repository'. `gh pr diff` also fails because there is no gh auth. Use the unauthenticated GitHub API endpoint instead: https://patch-diff.githubusercontent.com/raw/ClickHouse/ClickHouse/pull/{PR}.diff This works from any context (container, local, CI) without requiring git history or gh authentication. `gh pr diff` is kept for local runs.
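As a sketch, the unauthenticated endpoint can be fetched with nothing but the standard library (the fix itself uses curl; `pr_diff_url` and `fetch_pr_diff` are hypothetical helper names used here for illustration):

```python
import urllib.request

def pr_diff_url(pr_number: int, repo: str = "ClickHouse/ClickHouse") -> str:
    """Build the unauthenticated raw-diff URL for a public-repo PR."""
    return f"https://patch-diff.githubusercontent.com/raw/{repo}/pull/{pr_number}.diff"

def fetch_pr_diff(pr_number: int) -> str:
    """Fetch the PR diff without git history or gh auth (error handling elided)."""
    with urllib.request.urlopen(pr_diff_url(pr_number), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Because the endpoint needs neither a `.git` directory nor authentication, the same code path works in the flaky-check container, locally, and in CI.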
check(*sorting_key_type);
sorting_key_type->forEachChild(check);
if (sorting_key_type->haveSubtypes())
    sorting_key_type->forEachChild(check);
validateSortingKeyType now skips forEachChild when haveSubtypes is false, but DataTypeObject (JSON) is a real counterexample: it reports haveSubtypes() == false while still exposing nested typed paths via forEachChild.
That means nested Dynamic/Variant under JSON are no longer validated in ORDER BY when allow_suspicious_types_in_order_by = 0, so queries that previously threw ILLEGAL_COLUMN can silently pass. This is a behavior regression in type safety checks.
Please keep recursive validation unconditional (as before), or gate on a predicate that matches forEachChild semantics rather than haveSubtypes.
Two changes: 1. Remove force_ok_exit=True for flaky check. The flag was unconditional, so any failure (actual test fail, server died, unexpected termination) was silently reported as Success. Now flaky-check failures actually fail the CI job. 2. Move the time-limit enforcement from --global_time_limit inside clickhouse-test to Shell.run(timeout=...) on the Python side. With --global_time_limit, clickhouse-test raises GlobalTimeout inside the multiprocessing worker, which sends SIGTERM to every worker process. Workers die with 'Terminated: 15', cleanup is messy, and the result is marked 'terminated unexpectedly' (ERROR status). With Shell.run(timeout=...), SIGTERM is sent to the clickhouse-test process itself after the allotted time. clickhouse-test handles SIGTERM via its own signal_handler, finishes the currently running tests, and exits cleanly — allowing the result to be collected properly.
…allel-self collisions
Tests that CREATE DATABASE with static (hardcoded) names fail when the
flaky check runs them in parallel with themselves — the second worker
finds the database already exists. The existing mechanism renames only
the *primary* test database (to test_XXXXXXXX) but leaves explicitly
created extra databases unchanged.
This commit adds transparent name mangling for such tests:
1. Before executing a test, scan its content for CREATE DATABASE <name>
where <name> is a bare identifier (not a query parameter like
{VAR:Type}, not a shell variable like $VAR, not a keyword).
2. Derive a per-run suffix from the unique part of the primary database
name: database='test_a1b2c3d4' → suffix='_a1b2c3d4'.
3. Write the test content with all occurrences of each extra database
name replaced by name+suffix into a temp file in the test's tmp dir.
Use the temp file for execution instead of the original.
4. After execution, normalize name+suffix → name in stdout (matching
the existing replace_in_file(stdout, database, 'default') pattern).
Reference files need no changes.
Two helper functions are added near replace_in_file():
- find_extra_databases(content): regex-based scanner with filters for
SQL/shell keywords, shell-variable continuations (name${...}),
names shorter than 3 chars, and built-in databases.
- mangle_extra_databases(content, extra_dbs, suffix): word-boundary
replacement for all found names.
Validated on 114 tests with static extra databases (round-trip test:
mangle → normalize → compare with original reference: all pass) over
10165 total test files scanned.
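A minimal sketch of the two helpers and the suffix derivation described above, assuming the real filters (keyword lists, shell-variable continuations, built-in databases) are broader than shown here:

```python
import re

# Simplified version of the pattern described in the commit message;
# the in-tree _CREATE_DB_RE and its filters are more thorough.
_CREATE_DB_RE = re.compile(
    r"CREATE\s+DATABASE\s+(?:IF\s+NOT\s+EXISTS\s+)?([a-zA-Z_][a-zA-Z0-9_]*)",
    re.IGNORECASE,
)
_BUILTIN = {"default", "system", "information_schema", "INFORMATION_SCHEMA"}

def find_extra_databases(content: str) -> list:
    """Scan test content for static CREATE DATABASE names (sketch)."""
    names = []
    for name in _CREATE_DB_RE.findall(content):
        if name in _BUILTIN or len(name) < 3:
            continue
        if name not in names:
            names.append(name)
    return names

def derive_suffix(primary_db: str) -> str:
    """database='test_a1b2c3d4' -> suffix='_a1b2c3d4'."""
    return primary_db[primary_db.rindex("_"):]

def mangle_extra_databases(content: str, extra_dbs: list, suffix: str) -> str:
    """Word-boundary replacement of each found name with name+suffix."""
    for name in extra_dbs:
        content = re.sub(rf"\b{re.escape(name)}\b", name + suffix, content)
    return content
```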
# Flaky-check exits normally via --global_time_limit or --max-failures.
# Both are expected stopping conditions; do not block the pipeline.
force_ok_exit = True
if "parallel" in test_options and test_result:
The PR removes the dedicated flaky-check non-blocking behavior (if is_flaky_check: force_ok_exit = True).
This changes job semantics: flaky-check can now fail the pipeline whenever there are 4+ failed tests or any non-test-runner error path, even though this job is designed as a signal-only flaky detector (bounded by --max-failures / global timeout), not a merge blocker.
Please restore explicit non-blocking behavior for flaky-check (e.g. keep force_ok_exit = True when is_flaky_check) and leave stricter blocking policy to dedicated required jobs.
…tdout

Previous approach mangled the test SQL but then normalized stdout back to original names. The correct approach (per user feedback) is:
1. Mangle the test file (name → name+suffix in a temp copy)
2. Mangle the reference file identically (name → name+suffix in a temp copy)
3. Compare stdout (containing mangled names) against the mangled reference

This way stdout is never post-processed and the comparison is direct. self.reference_file is redirected to the temp mangled copy for the duration of the test run.

Validated locally on 97 tests (89 with static extra databases + 8 unaffected):
- 45 passed, 10 skipped
- 22 'failures' are all pre-existing local env issues (missing test.hits dataset, no cluster, privileges, version incompatibilities)
- Only 1 failure (01470_show_databases_like) is mangling-related: the test uses LIKE '%01470' which doesn't match the suffixed name. That test is tagged no-parallel and won't run in the flaky check.
tests/clickhouse-test
Outdated
)
with open(_mangled_ref_file, "w") as _f:
    _f.write(_mangled_ref)
self.reference_file = _mangled_ref_file
run_single_test mutates self.reference_file to a per-run temp path (args.test_tmp_dir/...) and never restores it.
That becomes a correctness issue on retries: after a failed run, _cleanup removes args.test_tmp_dir, but the same TestCase object is reused for retry (while True in run_tests). On the next run, self.reference_file still points to a deleted file, so diffing can fail for the wrong reason (missing reference instead of real output mismatch).
Please keep self.reference_file stable across runs (e.g. store the mangled reference in a local variable used only for this run, or reinitialize self.reference_file to get_reference_file at the start of each retry).
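The suggested fix amounts to keeping the mangled path in a per-run local instead of mutating the attribute. A toy illustration (class and method names are hypothetical, not the real clickhouse-test TestCase):

```python
class StatelessTestCase:
    """Toy sketch: reference_file stays stable across retries because the
    mangled copy is only a per-run local, never assigned to the attribute."""

    def __init__(self, reference_file: str):
        self.reference_file = reference_file

    def run(self, mangled_ref_file: str = None) -> str:
        # Use the mangled copy for this run only; a retry after _cleanup
        # removed the tmp dir still sees the original reference path.
        reference_file = mangled_ref_file or self.reference_file
        return reference_file
```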
    f"gh pr diff {self.info.pr_number} --repo ClickHouse/ClickHouse"
)
else:
    self._diff_text = Shell.get_output(
In CI path, get_diff_text still uses Shell.get_output with default strict=False.
If curl fails (transient network error, GitHub outage/rate-limit), Shell.get_output returns empty stdout and this method silently returns "". That recreates the original failure mode: no changed lines, no CIDB lookup, and zero coverage-based tests selected, but without a hard failure.
Please make diff fetch fail-fast here (e.g. Shell.get_output(..., strict=True)), or explicitly validate non-empty diff text and raise when fetching fails.
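A minimal fail-fast guard along the lines the reviewer suggests (the function name is hypothetical; the real fix could equally use `Shell.get_output(..., strict=True)`):

```python
def require_diff_text(diff: str) -> str:
    """Sketch: treat an empty fetched diff as a hard error instead of
    silently selecting zero coverage-based tests."""
    if not diff.strip():
        raise RuntimeError(
            "PR diff fetch returned empty output; "
            "refusing to fall back to zero coverage-based tests"
        )
    return diff
```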
Tests tagged no-parallel never run concurrently with themselves, so hardcoded database names cannot collide. Skipping the mangling for them avoids false failures in cases like 01470_show_databases_like where the test uses a partial name in a LIKE clause that wouldn't match the suffixed database name.
tests/clickhouse-test
Outdated
# Pattern to find hardcoded (static) database names in CREATE DATABASE statements.
# Captures the identifier after CREATE DATABASE [IF NOT EXISTS].
# Excludes query-parameter syntax like {VAR:Type} and function calls like currentDatabase().
_CREATE_DB_RE = re.compile(
find_extra_databases only matches unquoted identifiers ([a-zA-Z_][a-zA-Z0-9_]*), so tests that create quoted static DB names are skipped and can still collide in parallel flaky-check runs.
Concrete example already in tree: tests/queries/0_stateless/03566_system_completions_table.sql uses:
CREATE DATABASE IF NOT EXISTS `0003566aaadatabase`;
find_extra_databases returns [] for this file, so no mangling is applied even though the DB name is hardcoded and shared by all workers.
Please extend detection/replacement to support backtick-quoted identifiers too (including replacement in mangle_extra_databases) so quoted static names are isolated the same way as unquoted ones.
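A sketch of the extended pattern handling both bare and backtick-quoted identifiers (function name hypothetical; the in-tree regex may differ in detail):

```python
import re

# Alternation: a backtick-quoted name or a bare identifier. findall returns
# (quoted, bare) tuples with '' for the unmatched group.
_CREATE_DB_RE = re.compile(
    r"CREATE\s+DATABASE\s+(?:IF\s+NOT\s+EXISTS\s+)?"
    r"(?:`([^`]+)`|([a-zA-Z_][a-zA-Z0-9_]*))",
    re.IGNORECASE,
)

def find_static_db_names(content: str) -> list:
    """Return static DB names whether quoted or bare (sketch)."""
    return [quoted or bare for quoted, bare in _CREATE_DB_RE.findall(content)]
```

Quoted names may start with a digit (as in `0003566aaadatabase`), which the bare-identifier branch alone could never match.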
.sh and .expect tests are executed directly as scripts, so the temp copy must be executable. Copy the original file's mode bits after writing the mangled content.
tests/clickhouse-test
Outdated
)

def find_extra_databases(content: str) -> list:
mangle_extra_databases currently replaces every word-boundary match in the whole file, so any false-positive token extracted by find_extra_databases rewrites unrelated SQL.
Concrete path to failure:
- find_extra_databases scans raw text (including comments/strings) and can extract tokens that are not DB identifiers from real CREATE DATABASE statements. mangle_extra_databases then globally rewrites that token everywhere.
- If the token is cluster, statements like ... ON CLUSTER cluster ... become ... ON CLUSTER cluster_<suffix> ..., which changes execution semantics and can fail because that cluster does not exist.
Please restrict replacement to CREATE/USE/DROP/ATTACH/DETACH DATABASE identifier positions instead of global \bname\b substitution, or parse SQL AST and rewrite only database identifiers.
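A sketch of position-restricted replacement along the reviewer's first suggestion, rewriting the name only where it follows one of the listed database keywords (function name hypothetical):

```python
import re

def mangle_db_identifiers(content: str, name: str, suffix: str) -> str:
    """Rewrite <name> only in CREATE/DROP/ATTACH/DETACH/EXISTS DATABASE and
    USE positions, so identical tokens elsewhere (e.g. a cluster named the
    same) are left untouched (sketch)."""
    pattern = re.compile(
        r"\b((?:CREATE|DROP|ATTACH|DETACH|EXISTS)\s+DATABASE\s+"
        r"(?:IF\s+(?:NOT\s+)?EXISTS\s+)?|USE\s+)"
        + re.escape(name) + r"\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(1) + name + suffix, content)
```

With this restriction, the failure path above disappears: `ON CLUSTER cluster` is not a database-identifier position, so it is never rewritten.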
…ktick support

Two fixes:
1. Skip temp-file creation for non-.sql tests (.sh, .expect). .sh scripts source ../shell_config.sh relative to their own path. When the mangled copy is placed in test_tmp_dir (a subdirectory of the tmp area, not the suite dir), that relative path no longer resolves and the script immediately fails with 'No such file or directory'. Restrict temp-file mangling to .sql tests only, where the file is piped via stdin and no relative sourcing occurs.
2. Support backtick-quoted database identifiers. _CREATE_DB_RE now matches both bare identifiers and `quoted` names. find_extra_databases picks the non-None group. mangle_extra_databases replaces both `name` → `name_sfx` (quoted) and word-boundary name → name_sfx (bare). Example: 03566_system_completions_table.sql uses CREATE DATABASE IF NOT EXISTS `0003566aaadatabase`, which was previously undetected and now gets correctly mangled.
if Path(test_output_file).exists():
    Path(test_output_file).unlink()
Shell.run(command, verbose=True)
Shell.run(command, verbose=True, timeout=global_time_limit if global_time_limit > 0 else None)
Replacing clickhouse-test's internal --global_time_limit with an external Shell.run(..., timeout=...) changes timeout semantics: the parent sends SIGTERM/SIGKILL to the whole process group, so clickhouse-test may stop without executing its own global-timeout path (GlobalTimeout in workers), cleanup, and final result aggregation.
This can produce partial/empty result artifacts and make flaky-check output less reliable exactly at timeout boundaries. Please keep --global_time_limit in the runner command and use process-level timeout only as a higher-level watchdog (with some safety margin).
The approach of rewriting test SQL and reference files at runtime has too many edge cases and breaks tests in CI. Reverting completely.
When flaky check runs tests 50 times in parallel, tests that create
databases with static names (e.g. `test_02771`) conflict with each
other — all copies try to create the same database simultaneously.
Fix by running a script (`fix_static_db_names.py`) that:
- Rewrites 82 tests to use `{CLICKHOUSE_DATABASE_1:Identifier}` (SQL)
or `${CLICKHOUSE_DATABASE_1}` (shell) instead of static names
- Adds `no-flaky-check` tag to 73 tests that either check database
names in their reference output, need multiple extra databases
(only `CLICKHOUSE_DATABASE_1` is defined in `clickhouse-test`),
or use special-character database names (backtick-quoted)
assert len(db_names) == 1
name = db_names[0]
var = 'CLICKHOUSE_DATABASE_1'
This script rewrites .sh tests to ${CLICKHOUSE_DATABASE_1} (replace_db_name_sh), but clickhouse-test does not export CLICKHOUSE_DATABASE_1 to shell tests (only CLICKHOUSE_DATABASE is exported).
That makes rewritten tests expand ${CLICKHOUSE_DATABASE_1} to an empty string and produce invalid SQL like CREATE DATABASE ;.
Please either:
- export CLICKHOUSE_DATABASE_1 from clickhouse-test for shell tests, or
- avoid rewriting .sh tests to ${CLICKHOUSE_DATABASE_1} until that env var exists.
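The first option could be sketched on the Python side as assembling the extra variables next to the existing `CLICKHOUSE_DATABASE` export. `shell_test_env` is a hypothetical helper, and the `<database>_1` / `<database>_2` naming is an assumption:

```python
def shell_test_env(database: str) -> dict:
    """Sketch: env vars a test runner could export to shell tests so that
    ${CLICKHOUSE_DATABASE_1} expands to a real name (hypothetical helper)."""
    return {
        "CLICKHOUSE_DATABASE": database,
        "CLICKHOUSE_DATABASE_1": f"{database}_1",
        "CLICKHOUSE_DATABASE_2": f"{database}_2",
    }
```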
After migrating tests from static database names to {CLICKHOUSE_DATABASE_1:Identifier},
several issues remained:
1. Shell tests: add CLICKHOUSE_DATABASE_1/2 exports to shell_config.sh
so ${CLICKHOUSE_DATABASE_1} is available as a bash variable
2. Delete fix_static_db_names.py from test directory (clickhouse-test
picks up .py files as tests, causing "Permission denied" failure)
3. Dictionary source DB field: {CLICKHOUSE_DATABASE_1:Identifier} in
DB '...' string literal is not substituted. Fix: add USE statement
after CREATE DATABASE, then use DB currentDatabase() in SOURCE
4. View/materialized view body: {db:Identifier}.table in CREATE VIEW AS
SELECT body fails (ClickHouse bug - param substitution doesn't work in
view body when the view name also uses a parameter). Fix: after USE,
use unqualified table names in the view body
5. dictGet* function string arguments: '{CLICKHOUSE_DATABASE_1:Identifier}.dict'
in string literals is not substituted. Fix: use unqualified 'dict' name
since USE sets the current database context
6. EXISTS DATABASE, DETACH/ATTACH DATABASE, Buffer engine: add these
patterns to the replacement (missed in original script)
- Use currentDatabase() in system.* table queries instead of
{CLICKHOUSE_DATABASE_1:String} (required by style checker)
- Fix 01257: unquoted static DB name in dictionary SOURCE was not replaced
- Add no-parallel tag to 04036_system_commands_coverage (uses SYSTEM DROP)
… dictionary

The dictionary one_cell_cache_ints_overflow is defined in tests/config/ints_dictionary.xml with the hardcoded db test_01054_overflow. Changing the database name breaks the dictionary source lookup. Add the no-flaky-check tag instead, since the database name is constrained by the external config file.
DROP TABLE dict_db_02179.dict_data;
DROP DATABASE dict_db_02179;
DROP DATABASE empty_db_02179;
DROP DATABASE IF EXISTS {CLICKHOUSE_DATABASE_2:Identifier};
02179_dict_reload_on_cluster.sql now uses {CLICKHOUSE_DATABASE_2:Identifier} (lines 35/36/42), but clickhouse-test only injects --param_CLICKHOUSE_DATABASE and --param_CLICKHOUSE_DATABASE_1 for need-query-parameters tests.
For .sql tests this means parameter substitution for CLICKHOUSE_DATABASE_2 is missing, so the test can fail with an unknown query parameter error when parsing those statements.
Please also pass --param_CLICKHOUSE_DATABASE_2=<database>_2 in run_single_test next to CLICKHOUSE_DATABASE_1.
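The suggested change, sketched as a helper that builds the injected parameters (hypothetical function; the real code assembles the client command line differently):

```python
def query_parameter_args(database: str) -> list:
    """Sketch: query parameters injected for need-query-parameters tests,
    including the CLICKHOUSE_DATABASE_2 addition the reviewer requests."""
    return [
        f"--param_CLICKHOUSE_DATABASE={database}",
        f"--param_CLICKHOUSE_DATABASE_1={database}_1",
        f"--param_CLICKHOUSE_DATABASE_2={database}_2",
    ]
```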
…m substitution
When {CLICKHOUSE_DATABASE_1:Identifier} is used as a qualifier before
table names starting with digits (e.g. .03711_join_with), ClickHouse's
parser fails with a syntax error after the backtick-quoted identifier.
The test also has 'long' and 'no-parallel' tags, so it won't run in
parallel in flaky check. Revert to static db name and add no-flaky-check.
…ed input table
After USE {CLICKHOUSE_DATABASE_1:Identifier}, unqualified CREATE TABLE goes
to the extra database. The test had both a data table 'input' (for the
dictionary source) and a distributed table 'input' in the same database,
causing TABLE_ALREADY_EXISTS. Rename the data table to 'dict_source' to
avoid the conflict while keeping the test logic intact.
- clickhouse-test: also pass --param_CLICKHOUSE_DATABASE_2 for SQL tests
that need a second extra database variable
- 01103, 02115, 03031: fix static database name in Distributed engine
definition (engine=Distributed(cluster, db_name, ...) was not replaced
by the initial script since it's not a standard SQL identifier context)
- 01527: fix dictGet with unqualified 'dict' in Distributed sharding key -
use currentDatabase() || '.dict' since string literals aren't substituted
- 01656: remove USE statement (original had none) and update query_log
LIKE filter to match the dynamic CREATE DATABASE query text instead of
the old static name pattern '%database test_query_log_factories_info%'
- 03533: replace static --database test CLI argument with
--database ${CLICKHOUSE_DATABASE_1} in the two places that need it
- 01527: the Distributed sharding key uses dictGet with a constant string dict name. Making it dynamic via currentDatabase() || '.dict' fails because ClickHouse tries to evaluate it as a constant during the skipUnusedShards optimization. Revert to static db name + no-flaky-check.
- 01656: the test queries system.query_log filtering by the static database name in the query text. The filter is hard to generalize with dynamic db names while keeping the test semantically correct. The test has the no-parallel tag so it won't conflict with itself. Add no-flaky-check.
The test comment explicitly says: 'CNF optimization uses QueryNodeHash to
order conditions. We need fixed database.table.column identifier name to
stabilize result'. Changing the database name from the static db_memory_01625
to a dynamic {CLICKHOUSE_DATABASE_1:Identifier} changes the hash values
and causes non-deterministic AND argument ordering in EXPLAIN output.
Add no-flaky-check tag instead since the static database name is required
for stable test output.
and `gh pr diff` fail there. Use the public GitHub API via curl instead,
which requires no authentication for public repos.
"""
if hasattr(self, '_diff_text') and self._diff_text is not None:
_parse_diff_hunk_ranges mis-parses insertion-only hunks.
For a header like @@ -10,0 +11,2 @@, count is 0, so the current code computes end = start + count - 1 = 9 (i.e. end < start). Concrete trace: start=10, count=0 → hunk=(10, 9).
This creates an invalid old-file range that then feeds into range merging / SQL prefilters. Please normalize zero-count hunks to a valid insertion-point range (for example, treat count=0 as end=start) before storing them.
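The suggested normalization, sketched as a standalone hunk-header parser (names are illustrative, not the real `_parse_diff_hunk_ranges`):

```python
import re

_HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+\d+(?:,\d+)? @@")

def parse_old_range(header: str):
    """Return the (start, end) old-file range for a diff hunk header,
    normalizing insertion-only hunks (count == 0) to a one-line anchor."""
    m = _HUNK_RE.match(header)
    if not m:
        return None
    start = int(m.group(1))
    count = int(m.group(2)) if m.group(2) is not None else 1
    if count == 0:
        return (start, start)  # insertion point, not an inverted range
    return (start, start + count - 1)
```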
…text
After USE {CLICKHOUSE_DATABASE_1:Identifier}, DB currentDatabase() returns
the extra database. But table_for_dict was created before the USE statement
in the default database, so the dictionary source pointing to
DB currentDatabase() couldn't find it.
Fix: move CREATE TABLE and INSERT to after the USE statement so
table_for_dict lives in the same database as the dictionary source.
… creation
This test uses clickhouse-format --obfuscate to test SQL obfuscation.
The database name 'db' is just a test input string, not an actual database
being created. Obfuscation output is deterministic per input, so
changing 'db' to ${CLICKHOUSE_DATABASE_1} breaks the reference output.
There's no parallel conflict risk since no database is actually created.
- 01150_ddl_guard_rwr: DETACH/ATTACH DATABASE calls were still using
static 'test_01150' name instead of ${CLICKHOUSE_DATABASE_1}.
The race condition test needs all operations on the same database.
- 01802_test_postgresql_protocol_with_row_policy: psql connections
were using static 'db01802' database name. After migrating the
SQL setup to use ${CLICKHOUSE_DATABASE_1}, the psql connection
target also needs to use the dynamic name.
- 02404_memory_bound_merging: The cluster
test_cluster_two_shards_different_databases_with_local has
'shard_1' hardcoded as the default_database for the second shard
in tests/config/config.d/clusters.xml. The test MUST use this
static name. Revert to original and add no-flaky-check.
Fix test failures:
- 01257_dictionary_mismatch_types: use currentDatabase() || '.table1_dict' for dictGet/dictHas calls so the dictionary is found correctly in ParallelReplicas mode, where queries route to remote shards (127.0.0.2) that don't have the unqualified dict in their local context
- 02115_rewrite_local_join_right_distribute_table: update .oldanalyzer.reference to use the normalized 'default_1' instead of the static 'test_02115' database name (clickhouse-test normalizes the random test db name to 'default', so CLICKHOUSE_DATABASE_1 normalizes to 'default_1')

Flaky check improvements (clickhouse-test):
- Add --sequential-test-runs: controls the run count for no-parallel and no-flaky-check tests (defaults to half of --test-runs)
- Add --no-flaky-check-sequential: when set in flaky check mode, run no-flaky-check tests sequentially (like no-parallel) with --sequential-test-runs runs instead of skipping them entirely
- is_sequential_test now returns True for no-flaky-check tests when --no-flaky-check-sequential is active
- apply_test_runs now uses sequential_runs for sequential tests so no-parallel tests also run fewer times than parallel tests
…flag

- 01257_dictionary_mismatch_types: revert to static db name, add no-flaky-check tag. The dictGet calls with currentDatabase() || '...' are complex and the test is no-parallel anyway.
- clickhouse-test: remove the --no-flaky-check-sequential flag. When --flaky-check is passed, no-flaky-check tests now automatically run sequentially (like no-parallel) with --sequential-test-runs iterations. This is the intended default behavior in flaky check mode.
LLVM Coverage Report
Changed lines: 100.00% (5/5) · Uncovered code
The flaky check job containers do not have enable_gh_auth=True, so gh is unauthenticated inside them. get_diff_text was calling gh pr diff unconditionally, which returned an empty string silently. This caused Step 3 (coverage-relevant tests) to always produce 0 results — an empty diff yields no changed lines and the CIDB coverage query is never issued.

Fix: in non-local (CI) runs, use git diff <merge_base_commit_sha>...HEAD instead. The merge base SHA is already stored in JOB_KV_DATA by store_data.py. gh pr diff is kept only for local runs.

Also fix two secondary issues found during analysis:
- The _diff_text cache guard used not self._diff_text, which treated an empty string as uncached and caused gh pr diff to be called twice per job (producing two "WARNING: stderr: Welcome to GitHub CLI!" lines in the log). Changed to self._diff_text is not None.
- cidb.query() can return None; added an or "" guard before calling .strip() on the result.

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes