
Add startup warning when mdraid is being resynchronized #100941

Open
alexey-milovidov wants to merge 9 commits into master from mdraid-resync-warning


@alexey-milovidov
Member

Add a startup sanity check that warns when any Linux mdraid array is currently being checked, repaired, or resynchronized. This can significantly degrade disk I/O performance and is useful to know about when diagnosing slow queries or merges.

The check iterates over /sys/block/md*/md/sync_action and warns if the action is anything other than idle.
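
The scan described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `findBusyMdArrays` and the `sys_block` parameter (which exists only so the scan can be pointed at a test directory) are assumptions made for illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

/// Hypothetical helper, not the PR's actual code: returns (device name, sync_action)
/// for every md* device under `sys_block` whose sync_action is not "idle".
std::vector<std::pair<std::string, std::string>> findBusyMdArrays(const fs::path & sys_block = "/sys/block")
{
    std::vector<std::pair<std::string, std::string>> busy;

    std::error_code ec;
    fs::directory_iterator it(sys_block, ec);
    if (ec)
        return busy; /// Directory is missing or unreadable: nothing to report.

    for (const auto & entry : it)
    {
        const std::string name = entry.path().filename().string();
        if (name.rfind("md", 0) != 0) /// Only consider devices named md*.
            continue;

        std::ifstream in(entry.path() / "md" / "sync_action");
        std::string action;
        if (!(in >> action)) /// Missing or unreadable file: skip this device.
            continue;

        if (action != "idle")
            busy.emplace_back(name, action);
    }
    return busy;
}
```

A caller would emit one warning per returned pair, including the device name and the reported action in the message.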

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add a startup warning (visible in system.warnings) when a Linux mdraid array is being resynchronized, as this can degrade disk I/O performance.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 27, 2026

Workflow [PR], commit [ae900cb]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stress test (amd_msan) | | failure | | |
| | Logical error: Shard number is greater than shard count: shard_num=A shard_count=B cluster=C (STID: 5066-457d) | FAIL | cidb | |
| Stress test (arm_msan) | | failure | | |
| | MemorySanitizer: use-of-uninitialized-value (STID: 1003-358c) | FAIL | cidb, issue | |

AI Review

Summary

This PR adds Linux mdraid startup warnings in sanityChecks: one for non-idle sync_action (resync/check/repair activity) and one for non-normal array_state, with warning lifecycle support via Context::addOrUpdateWarningMessage overload for std::optional<PreformattedMessage>. The implementation is straightforward, bounded to startup checks, and consistent with existing system.warnings behavior. I did not find correctness, safety, concurrency, or performance issues that require code changes.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-improvement (Pull request with some product improvements) label Mar 27, 2026
@azat azat self-assigned this Mar 27, 2026
alexey-milovidov and others added 2 commits March 28, 2026 02:14
…load for `addOrUpdateWarningMessage`

- Check `/sys/block/mdX/md/array_state` in addition to `sync_action` to warn
  when an mdraid array is in a non-normal state (degraded, inactive, etc.)
- Add `addOrUpdateWarningMessage` overload accepting `std::optional<PreformattedMessage>`:
  passing `nullopt` removes the warning, enabling future periodic checks to
  automatically clear stale warnings when conditions resolve.
- New warning type `LINUX_MDRAID_IS_DEGRADED`.
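
The set-or-clear semantics of the optional overload can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `WarningRegistry` is a hypothetical stand-in for the relevant part of `Context`, `std::string` stands in for `PreformattedMessage`, and the `get` and `size` accessors exist only for the example.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

/// Only LINUX_MDRAID_IS_DEGRADED is taken from the commit message above.
enum class WarningType { LINUX_MDRAID_IS_DEGRADED };

/// Hypothetical stand-in for the warning storage inside Context.
class WarningRegistry
{
public:
    /// Set or replace the warning of the given type.
    void addOrUpdateWarningMessage(WarningType type, const std::string & message)
    {
        warnings[type] = message;
    }

    /// Optional overload: a value sets or replaces the warning, std::nullopt removes it.
    void addOrUpdateWarningMessage(WarningType type, const std::optional<std::string> & message)
    {
        if (message)
            addOrUpdateWarningMessage(type, *message);
        else
            warnings.erase(type);
    }

    /// Illustration-only accessor: returns the stored message, if any.
    std::optional<std::string> get(WarningType type) const
    {
        auto it = warnings.find(type);
        if (it == warnings.end())
            return std::nullopt;
        return it->second;
    }

    std::size_t size() const { return warnings.size(); }

private:
    std::map<WarningType, std::string> warnings;
};
```

With this shape, a periodic check can compute an optional message on each run and pass it straight through, so a warning disappears as soon as the condition that produced it resolves. Note that in this simplified sketch a bare string literal would be ambiguous between the two overloads, so callers pass a `std::string`, a `std::optional<std::string>`, or `std::nullopt`.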

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.10% | 84.00% | -0.10% |
| Functions | 90.90% | 90.90% | +0.00% |
| Branches | 76.70% | 76.60% | -0.10% |

Changed lines: 50.00% (31/62) · Uncovered code

Full report · Diff report

