Commit d7f724b

Merge branch 'master' into fix-lm-unused_columns
2 parents c0853ab + 37db54d commit d7f724b

4,045 files changed

Lines changed: 136108 additions & 40535 deletions

.claude/CLAUDE.md

Lines changed: 69 additions & 7 deletions
@@ -1,17 +1,16 @@
 When working with a branch, do not use rebase or amend - add new commits instead.

+Do not commit to the master branch. Create a new branch for every task.
+
 When writing text such as documentation, comments, or commit messages, wrap literal names from ClickHouse SQL language, classes and functions, or literal excerpts from log messages inside inline code blocks, such as: `MergeTree`.

 When writing text such as documentation, comments, or commit messages, write names of functions and methods as `f` instead of `f()` - we prefer it for mathematical purity when it refers to a function itself rather than its application.

 When mentioning logical errors, say "exception" instead of "crash", because they don't crash the server in the release build.

-Links to ClickHouse CI, such as `https://s3.amazonaws.com/clickhouse-test-reports/json.html?...` should be interpreted with a headless browser, e.g., Playwright, because they contain JavaScript. Use the tool at `.claude/tools/fetch_ci_report.js`:
+Links to ClickHouse CI, such as `https://s3.amazonaws.com/clickhouse-test-reports/json.html?...` should be analyzed using the tool at `.claude/tools/fetch_ci_report.js`, which directly fetches the underlying JSON data without requiring a browser:

 ```bash
-# Install playwright if needed (one-time setup)
-cd /tmp && npm install playwright && npx playwright install chromium
-
 # Fetch and analyze CI report
 node /path/to/ClickHouse/.claude/tools/fetch_ci_report.js "<ci-url>" [options]

@@ -21,6 +20,7 @@ node /path/to/ClickHouse/.claude/tools/fetch_ci_report.js "<ci-url>" [options]
 # --all Show all test results
 # --links Show artifact links (logs.tar.gz, etc.)
 # --download-logs Download logs.tar.gz to /tmp/ci_logs.tar.gz
+# --credentials <user,password> HTTP Basic Auth for private repositories

 # Examples:
 node .claude/tools/fetch_ci_report.js "https://s3.amazonaws.com/..." --failed --links
@@ -33,6 +33,62 @@ tar -xzf /tmp/ci_logs.tar.gz ci/tmp/pytest_parallel.jsonl
 grep "test_name" ci/tmp/pytest_parallel.jsonl | python3 -c "import sys,json; [print(json.loads(l).get('longrepr','')) for l in sys.stdin if 'failed' in l]"
 ```

+To compile and run C++ code snippets against the ClickHouse codebase without modifying any source files, use the tool at `.claude/tools/cppexpr.sh`. This is a wrapper around `utils/c++expr` that auto-detects build directories and handles working directory setup. When asked about the size, layout, or alignment of ClickHouse data structures, or asked to compare performance of code snippets, use this tool to get a definitive answer instead of guessing.
+
+```bash
+# Query the size of a ClickHouse data structure
+.claude/tools/cppexpr.sh -i Core/Block.h 'OUT(sizeof(DB::Block))'
+
+# Query multiple expressions at once
+.claude/tools/cppexpr.sh -i Core/Field.h 'OUT(sizeof(DB::Field)) OUT(sizeof(DB::Array))'
+
+# Use global code for helper functions or custom types
+.claude/tools/cppexpr.sh -g 'struct Foo { int a; double b; };' 'OUT(sizeof(Foo)) OUT(alignof(Foo))'
+
+# Benchmark a code snippet (100000 iterations, 5 tests)
+.claude/tools/cppexpr.sh -i Common/Stopwatch.h -b 100000 'Stopwatch sw;'
+
+# Standalone mode (no ClickHouse headers, just standard C++)
+.claude/tools/cppexpr.sh --plain 'OUT(sizeof(std::string))'
+```
+
+Key options: `-i HEADER` to include headers, `-g 'CODE'` for global-scope code, `-b STEPS` for benchmarking, `-l LIB` to link extra libraries, `--plain` for standalone compilation without ClickHouse. The `OUT(expr)` macro prints `expr -> value`.
+
+When asked to analyze assembly, inspect generated code, find register spills, check branch density, compare codegen between builds, or investigate optimization opportunities in compiled functions, use the tool at `.claude/tools/analyze-assembly.py`. It disassembles functions from a compiled binary, builds a CFG, computes metrics (spill/branch/call density), and reports findings. Use it instead of manually running `llvm-objdump` or `llvm-nm`.
+
+```bash
+# Basic analysis of a function
+python3 .claude/tools/analyze-assembly.py <binary> "<function_name>"
+
+# Search for overloaded/templated functions by regex
+python3 .claude/tools/analyze-assembly.py <binary> "insertRangeFrom" --search
+
+# Pick a specific overload from ambiguous results
+python3 .claude/tools/analyze-assembly.py <binary> "insertRangeFrom" --search --select 3
+
+# JSON output for structured analysis
+python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --format json
+
+# Source-interleaved disassembly (needs debug info)
+python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --source
+
+# Microarchitectural analysis of loop bodies (--mcpu is required)
+python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --mca --mcpu=znver3
+
+# Profile-weighted analysis (re-ranks findings by runtime impact)
+python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" --perf-map tmp/perf.map.jsonl
+
+# Compare codegen between two builds
+python3 .claude/tools/analyze-assembly.py --before <old_binary> --after <new_binary> "<function_name>"
+
+# Verbose mode to see tool commands
+python3 .claude/tools/analyze-assembly.py <binary> "<function_name>" -v
+```
+
+Key options: `--search` for regex matching, `--fuzzy` for substring matching, `--select N` to pick from ambiguous results, `--all` to analyze all matches, `--context N` to show surrounding symbols, `--max-instructions N` to control output size, `--mca --mcpu=<model>` for llvm-mca throughput analysis, `--perf-map <file>` for runtime-weighted scoring, `--before`/`--after` for diff mode. The tool caches symbol tables by build-id for fast repeated queries.
+
+**IMPORTANT**: `--select N` does NOT imply `--search`. When using a regex pattern with `--select`, you MUST also pass `--search`, e.g. `--search --select 1`. Without `--search`, the pattern is treated as a literal exact match and will fail.
+
 You can build multiple versions of ClickHouse inside `build_*` directories, such as `build`, `build_debug`, `build_asan`, etc.

 You can run integration tests as in `tests/integration/README.md` using: `python -m ci.praktika run "integration" --test <selectors>` invoked from the repository root.
@@ -49,17 +105,23 @@ When writing messages, say ASan, not ASAN, and similar (because there are two wo

 When checking the CI status, pay attention to the comment from robot with the links first. Look at the Praktika reports first. The logs of GitHub actions usually contain less info.

-Do not use `-j` argument with ninja - let it decide automatically.
+Do not use `-j` argument with ninja; do not use `nproc` - let it decide automatically.

 If I provided a URL with the CI report, logs, or examples, include it in the commit message.

 When creating a pull request, append Changelog category and Changelog entry according to this template: `.github/PULL_REQUEST_TEMPLATE.md`. The "Bug Fix" category should be used only for real bug fixes, while for fixing CI reports you can use the "CI Fix or improvement" category. Include the URL to CI report I provided if any. If the PR is about a CI failure, search for the corresponding open issues and provide a link in the PR description.

-ARM machines in CI are not slow. They are similar to x86 in performance.
+ARM machines in CI are not slow. They are similar to x86 in performance.
+
+Use `tmp` subdirectory in the current directory for temporary files (logs, downloads, scripts, etc.), do not use `/tmp`. Create the directory if needed.
+
+When there are crucial findings (when I corrected your behavior, or when you found a crucial insight about the code), append them to `.claude/learnings.md`, but be concise. You can commit the changes in learnings.md along with the other changes. Read this file at start.

 Always load and apply the following skills:

-- .claude/skills/install-skills
 - .claude/skills/build
 - .claude/skills/test
 - .claude/skills/fix-sync
+- .claude/skills/alloc-profile
+- .claude/skills/bisect
+- .claude/skills/create-worktree
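
The `--select` note in the `.claude/CLAUDE.md` diff above says that, without `--search`, the pattern is treated as a literal exact match. As a rough analogy only (this is not how `analyze-assembly.py` is implemented, and the symbol names below are made up for illustration), the difference between exact and regex matching can be sketched with `grep`:

```shell
#!/bin/sh
# Hypothetical symbol names, for illustration only.
symbols='DB::ColumnVector<int>::insertRangeFrom
DB::ColumnString::insertRangeFrom
DB::IColumn::insert'

pattern='insertRangeFrom'

# Literal whole-line match, analogous to omitting --search:
# no symbol is exactly equal to the pattern, so nothing matches.
exact_count=$(printf '%s\n' "$symbols" | grep -cFx -- "$pattern" || true)

# Regex match anywhere in the name, analogous to passing --search.
regex_count=$(printf '%s\n' "$symbols" | grep -cE -- "$pattern" || true)

echo "exact=$exact_count regex=$regex_count"   # prints: exact=0 regex=2
```

Under this reading, a short name such as `insertRangeFrom` only finds qualified or templated symbols when regex matching is enabled, which is why `--select` on its own is expected to fail.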

.claude/settings.json

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(gh pr view:*)",
+      "Bash(gh issue view:*)",
+      "Bash(gh pr list:*)",
+      "Bash(gh issue list:*)",
+      "Bash(gh pr checks:*)",
+      "Bash(gh pr diff:*)",
+      "Bash(gh search:*)",
+      "WebFetch(domain:github.com)"
+    ]
+  }
+}
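
The `Bash(...:*)` entries in the allow list above look like command prefixes followed by a wildcard. A minimal sketch of that reading, assuming plain prefix matching (the real permission matcher may behave differently; `is_allowed` and the PR number are hypothetical):

```shell
#!/bin/sh
# Assumption: each "Bash(<prefix>:*)" entry allows any command line that
# starts with <prefix>. This is an illustration, not the actual matcher.
is_allowed() {
    for prefix in "gh pr view" "gh issue view" "gh pr list" "gh issue list" \
                  "gh pr checks" "gh pr diff" "gh search"; do
        case "$1" in
            "$prefix"*) echo "allowed"; return 0 ;;
        esac
    done
    echo "not in allow list"
}

is_allowed "gh pr view 12345"    # read-only command covered by the list
is_allowed "gh pr merge 12345"   # mutating command, not covered
```

Every prefix in the list is a read-only `gh` subcommand, so under this assumption commands that modify a pull request or issue would still fall outside the allow list.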
