
Add all aarch64 build variants and complete stress test matrix#98677

Merged
alexey-milovidov merged 33 commits into master from
add-arm-build-variants-and-stress-tests
Mar 30, 2026

Conversation

@alexey-milovidov
Member

@alexey-milovidov alexey-milovidov commented Mar 3, 2026

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Summary

  • Add missing aarch64 builds for arm_debug, arm_msan, and arm_ubsan variants with full .deb package support
  • Produce .deb packages for arm_tsan, which previously produced only a binary; the job moves from extra_validation_build_jobs to build_jobs
  • Add missing stress tests: amd_asan, arm_release, arm_debug, arm_tsan, arm_msan, arm_ubsan
  • Fix typo in DEB_AMD_MSAN artifact name string (DEB_AMD_MSAM -> DEB_AMD_MSAN)
  • All 5 variants (debug, ASan, TSan, MSan, UBSan) now build for both amd64 and aarch64, each with a corresponding stress test
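The resulting matrix can be pictured as a cross product of architectures and variants. The snippet below is only an illustrative sketch: `ARCHES`, `VARIANTS`, and `build_matrix` are hypothetical names, not the CI definitions touched by this PR. The `Stress test (<build>)` job-name format is taken from the CI report link quoted later in this conversation.

```python
from itertools import product

# Hypothetical lists for illustration; the real CI definitions live elsewhere.
ARCHES = ["amd", "arm"]  # amd64 and aarch64 prefixes used in build names
VARIANTS = ["debug", "asan", "tsan", "msan", "ubsan"]


def build_matrix(arches=ARCHES, variants=VARIANTS):
    """Return (build_name, stress_test_job_name) pairs for every arch/variant."""
    return [
        (f"{arch}_{variant}", f"Stress test ({arch}_{variant})")
        for arch, variant in product(arches, variants)
    ]


matrix = build_matrix()
for build, stress in matrix:
    print(f"{build:10s} -> {stress}")
```

Each of the ten builds is paired with exactly one stress test, which is the property the summary describes.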

Test plan

  • Verify all new ARM build jobs appear in CI and build successfully
  • Verify .deb packages are produced for all new ARM variants
  • Verify all new stress tests run against the correct .deb artifacts
  • Verify extra_validation_build_jobs (now empty) causes no issues in PR workflow

Note

Medium Risk
Touches multiple GitHub Actions workflows and CI job/artifact definitions, so miswired dependencies or artifact names could break or significantly slow CI despite no production code changes.

Overview
CI now builds the full aarch64 variant set by adding arm_debug, arm_msan, and arm_ubsan builds (and .deb artifacts) and ensuring arm_tsan also produces a .deb (moved out of extra_validation_build_jobs).

Stress-test coverage is expanded by adding missing stress tests (amd_asan, amd_msan, amd_ubsan, plus arm_release, arm_debug, arm_tsan, arm_msan, arm_ubsan) and wiring these jobs into master, pull_request, and release_branches workflows (including updated needs/finish-workflow aggregation).

Also fixes a typo in the DEB_AMD_MSAN artifact name and tightens backport_branches workflow job filtering to target amd_* variants explicitly.

Written by Cursor Bugbot for commit e596991.

…ss test matrix

Add missing aarch64 builds for `arm_debug`, `arm_msan`, and `arm_ubsan`
variants, and produce `.deb` packages for all ARM builds including `arm_tsan`
(which previously only produced a binary). Add the corresponding stress tests
for all new variants plus `amd_asan` and `arm_release` which were also missing.

This ensures all 5 sanitizer/debug variants build for both architectures and
each has a corresponding stress test.

Changelog category: CI Fix or Improvement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 3, 2026

Workflow [PR], commit [8d9cfbb]

Summary:


AI Review

Summary

This PR expands CI coverage for aarch64 build variants and stress tests, fixes one artifact-name typo (DEB_AMD_MSAN), and applies several typo-only fixes in comments/strings. I did not find correctness, safety, or rollout blockers in the non-workflow code changes. Overall verdict: approve.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

alexey-milovidov and others added 3 commits March 4, 2026 01:01
…EB_AMD_MSAN`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- "some some" -> "some" in `threadPoolCallbackRunner.h`
- "the the" -> "the" in `DatabaseReplicatedWorker.cpp`, `StorageSystemReplicas.cpp`, `LibArchiveWriter.h`
- "coordiation" -> "coordination" in `DatabaseReplicatedWorker.cpp`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh clickhouse-gh bot added the pr-ci label Mar 4, 2026
`backport_branches.py` used bare substring filters such as `"asan"` and
`"tsan"`, which now also match the new ARM variants (`arm_asan`, `arm_tsan`),
but the backport build filter includes only AMD builds. Changed the filters
to `"amd_asan"`, `"amd_tsan"`, and `"amd_debug"` to avoid referencing
artifacts that no build job provides.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
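
The substring problem described in this commit can be reproduced in a few lines. This is a minimal sketch, not the code in `backport_branches.py`: the `builds` list and the `select` helper are hypothetical and only assume that filtering is done by substring match on the build name.

```python
# Hypothetical build names; only the amd_* ones are built on backport branches.
builds = ["amd_debug", "amd_asan", "amd_tsan", "arm_debug", "arm_asan", "arm_tsan"]


def select(builds, filters):
    """Keep builds whose name contains any of the filter substrings."""
    return [b for b in builds if any(f in b for f in filters)]


# Bare filters also pick up the new ARM variants.
bare = select(builds, ["asan", "tsan"])
# Explicit amd_* filters match only the AMD builds.
explicit = select(builds, ["amd_asan", "amd_tsan", "amd_debug"])

print(bare)
print(explicit)
```

With the bare filters, `arm_asan` and `arm_tsan` are selected even though no backport build job produces their artifacts; the explicit filters avoid that.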
@clickhouse-gh clickhouse-gh bot added the submodule changed label Mar 4, 2026
DEB_AMD_RELEASE = "DEB_AMD_RELEASE"
DEB_AMD_ASAN = "DEB_AMD_ASAN"
DEB_AMD_TSAN = "DEB_AMD_TSAN"
DEB_AMD_MSAN = "DEB_AMD_MSAM"
Member Author

Funny
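
A typo like the one in the quoted line is easy to catch mechanically when every constant is meant to equal its own name. The following is a hedged sketch under that assumption: the `ArtifactNames` class and `mismatched_names` helper are hypothetical, and only the four constants are copied from the diff context above.

```python
class ArtifactNames:
    """Hypothetical container mirroring the constants shown in the diff."""
    DEB_AMD_RELEASE = "DEB_AMD_RELEASE"
    DEB_AMD_ASAN = "DEB_AMD_ASAN"
    DEB_AMD_TSAN = "DEB_AMD_TSAN"
    DEB_AMD_MSAN = "DEB_AMD_MSAM"  # the typo this PR fixes


def mismatched_names(cls):
    """Return constants whose string value differs from the attribute name."""
    return {
        name: value
        for name, value in vars(cls).items()
        if name.isupper() and name != value
    }


mismatches = mismatched_names(ArtifactNames)
print(mismatches)
```

Such a check only makes sense if the name-equals-value convention is intended for all artifact constants, which is an assumption here.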

@alexey-milovidov alexey-milovidov requested a review from maxknv March 4, 2026 08:56
alexey-milovidov and others added 4 commits March 4, 2026 18:34
`simsimd_capabilities` probes SIMD functions with `n=0` using a tiny
8-byte dummy buffer. SVE functions use `do { } while (i < n)` loops
that execute once even with n=0, and MSan instruments predicated loads
as full-width vector reads. Enlarged the buffer to 256 bytes to cover
the widest SVE vector (2048 bits).

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98677&sha=a1b9d7f6170c510431fce962a869aa617d88d888&name_0=PR&name_1=Stress%20test%20%28arm_msan%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
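
The buffer sizing in this commit follows from two facts stated above: a `do { } while (i < n)` loop runs its body once even when `n` is 0, and MSan counts the predicated load as a full-width read. The Python model below is only an illustration of that arithmetic; `do_while_reads` is a hypothetical helper, not SimSIMD code.

```python
def do_while_reads(n, vector_bytes):
    """Model `do { load; i += step; } while (i < n)`.

    Returns the number of bytes MSan would consider read, assuming every
    load is instrumented as a full-width vector read.
    """
    i = 0
    bytes_read = 0
    while True:
        bytes_read += vector_bytes  # body executes before the condition is checked
        i += vector_bytes
        if not (i < n):
            break
    return bytes_read


MAX_SVE_BITS = 2048                    # widest SVE vector length
max_vector_bytes = MAX_SVE_BITS // 8   # 256 bytes

reads = do_while_reads(n=0, vector_bytes=max_vector_bytes)
print(reads, reads > 8, reads <= 256)
```

Under this model a single probe with `n=0` touches 256 bytes on the widest hardware, which overruns an 8-byte dummy buffer and fits the enlarged one.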
alexey-milovidov and others added 10 commits March 6, 2026 23:51
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous submodule pointer (2ccd366) was on a divergent branch
from master's version (b8f4527). Update to 1ab7d5e which includes
master's version plus MSan fixes on top.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nts-and-stress-tests

# Conflicts:
#	.github/workflows/master.yml
#	.github/workflows/pull_request.yml
…nts-and-stress-tests

# Conflicts:
#	.github/workflows/pull_request.yml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflict in `finish_workflow` needs list by keeping `libfuzzer_tests` from the branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alexey-milovidov and others added 2 commits March 18, 2026 02:24
The merge conflict resolution in 8c4d316 incorrectly kept `libfuzzer_tests`
in the `finish_workflow` needs list, but this job does not exist in
`pull_request.yml`. This caused GitHub Actions to reject the entire workflow
file, preventing CI from running.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
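
A dangling `needs` entry of this kind can be detected before pushing. The sketch below is an assumption-laden illustration: it works on an already-parsed `jobs` mapping rather than reading the YAML file, and the job ids other than `finish_workflow` and `libfuzzer_tests` are hypothetical.

```python
def undefined_needs(jobs):
    """Map each job id to the `needs` entries that are not defined jobs."""
    problems = {}
    for name, spec in jobs.items():
        needs = spec.get("needs", [])
        if isinstance(needs, str):  # `needs` may be a single job id
            needs = [needs]
        missing = [n for n in needs if n not in jobs]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical, heavily reduced stand-in for the `jobs:` section of a workflow.
workflow_jobs = {
    "build_arm_msan": {},
    "stress_test_arm_msan": {"needs": ["build_arm_msan"]},
    "finish_workflow": {"needs": ["stress_test_arm_msan", "libfuzzer_tests"]},
}

problems = undefined_needs(workflow_jobs)
print(problems)
```

Here `finish_workflow` is reported because `libfuzzer_tests` is not defined in the same workflow, mirroring the situation this commit repairs.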
@alexey-milovidov
Member Author

Depends on #100138.

alexey-milovidov and others added 6 commits March 22, 2026 19:40
Reconcile the PR's ARM build additions with master's `ASAN` -> `ASAN_UBSAN`
rename (combined Address + Undefined Behavior sanitizer builds).
Regenerated workflow YAML files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test fails with TSan due to floating-point precision differences
when random settings change the evaluation order of sum(constant - column).
Wrap Float64 arithmetic queries with `round(..., 2)` to avoid flaky results.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98677&sha=c1d06924bbc951a62ad4a320efdcb22722d72d4b&name_0=PR&name_1=Stateless%20tests%20%28amd_tsan%2C%20parallel%2C%202%2F2%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
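
The underlying effect is that floating-point addition is not associative, so a different evaluation order can change the last bits of a sum. A minimal Python illustration, unrelated to the actual test data, shows both the discrepancy and why rounding hides it:

```python
# Same three Float64 values, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # the raw sums differ in the last bit
print(round(left_to_right, 2) == round(right_to_left, 2))  # rounding aligns them
```

Rounding is a pragmatic fix for a reference-file test; it can still flip if a value lands near a rounding boundary, so it assumes the expected results are not that close to one.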
Round Float64 results in test 02931_rewrite_sum_column_and_constant
…nts-and-stress-tests

# Conflicts:
#	.github/workflows/pull_request.yml
#	tests/queries/0_stateless/02931_rewrite_sum_column_and_constant.reference
#	tests/queries/0_stateless/02931_rewrite_sum_column_and_constant.sql
@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 29, 2026

LLVM Coverage Report

| Metric    | Baseline | Current | Δ      |
|-----------|----------|---------|--------|
| Lines     | 84.10%   | 84.10%  | +0.00% |
| Functions | 24.50%   | 24.60%  | +0.10% |
| Branches  | 76.70%   | 76.70%  | +0.00% |

Changed lines: 63.64% (7/11) · Uncovered code


@alexey-milovidov alexey-milovidov merged commit 696456d into master Mar 30, 2026
163 of 164 checks passed
@alexey-milovidov alexey-milovidov deleted the add-arm-build-variants-and-stress-tests branch March 30, 2026 01:55
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud label Mar 30, 2026
Desel72 pushed a commit to Desel72/ClickHouse that referenced this pull request Mar 30, 2026
…iants-and-stress-tests

Add all aarch64 build variants and complete stress test matrix
groeneai added a commit to groeneai/ClickHouse that referenced this pull request Apr 1, 2026
The `TryResult` in `ConnectionEstablisherAsync` is populated inside a
fiber whose stack is allocated via `aligned_alloc`. MSan treats such
memory as uninitialized and cannot track writes through
`boost::context`'s uninstrumented assembly for fiber switches.

This restores the `__msan_unpoison` call in `getResult()` that was
previously added in 262fbda and later removed in 31e4980 when
the blanket fiber stack unpoisoning was introduced. The blanket
approach (unpoisoning at allocation time) proved insufficient — STID
4179-5154 spiked from 4 hits to 19 hits in 30 days after PR ClickHouse#98677
expanded ARM64 CI coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Labels

  • pr-ci
  • pr-synced-to-cloud (The PR is synced to the cloud repo)
  • submodule changed (At least one submodule changed in this PR.)


3 participants