Skip to content

Reduce size and copy cost of Settings objects#101228

Open
Algunenano wants to merge 5 commits intoClickHouse:masterfrom
Algunenano:remove-settings-vtable
Open

Reduce size and copy cost of Settings objects#101228
Algunenano wants to merge 5 commits intoClickHouse:masterfrom
Algunenano:remove-settings-vtable

Conversation

@Algunenano
Copy link
Copy Markdown
Member

@Algunenano Algunenano commented Mar 30, 2026

Reduce the size and copy cost of Settings and all other settings objects.

1. Remove vtable from setting field types

Replace virtual dispatch through SettingFieldBase with a manual vtable (SettingFieldOps) shared per type. This eliminates the 8-byte vptr from every setting field instance. SettingFieldBool goes from 16 to 2 bytes, SettingFieldUInt64 from 24 to 16 bytes. Total savings: ~10 KB per Settings object.

2. Generated typed-array Data layout for Settings

A CMake code generator groups settings into typed arrays (Bool_[795], UInt64_[428], etc.) instead of one member per setting. This eliminates inter-type padding waste and lets the compiler use memcpy per array for copies. The copy constructor shrinks from 50 KB / 7654 instructions to 7 KB / 1135 instructions (7x). Settings definitions are extracted to SettingsList.inc so the generator can include them via the preprocessor, ensuring it always sees exactly what the compiler sees.

Closes #58797

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Reduce memory footprint and copy cost of Settings objects by removing vtable overhead from setting fields, and using a generated typed-array layout.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 30, 2026

Workflow [PR], commit [5bafb82]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan_ubsan, targeted) failure
03758_fix_isssue_87414 FAIL cidb
Stress test (amd_tsan) failure
Logical error: Shard number is greater than shard count: shard_num=A shard_count=B cluster=C (STID: 5066-457d) FAIL cidb
Stress test (arm_msan) failure
MemorySanitizer: use-of-uninitialized-value (STID: 1003-326e) FAIL cidb, issue

AI Review

Summary

This PR refactors Settings storage and access to reduce memory footprint and copy cost by introducing generated typed-array SettingsData, offset-based SettingIndex, and shared type-erased ops. High-level direction is good, but there is a blocker in the code-generation pipeline: gen_settings_data.sh can fail to preprocess settings on real builds (missing generated include paths like config.h) while suppressing diagnostics and compiler failure signals.

PR Metadata
  • Changelog category: ✅ Correct (Performance Improvement).
  • Changelog entry requirement: ✅ Required for this category and present.
  • Changelog entry quality: ✅ Specific and user-readable.
Findings
  • ❌ Blockers
    • [src/Core/gen_settings_data.sh:25-34] The settings extractor preprocess step hides compiler errors (2>/dev/null) and masks failure (|| true) while using only source include paths. In this PR head, extraction fails due to missing generated config.h include path, causing ERROR: No settings extracted and build breakage.
    • Suggested fix: pass required build include directories (at least binary root with generated headers), and fail immediately on preprocessor errors instead of swallowing stderr/exit status.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout ⚠️ Code-generation step currently masks preprocessing failures and can fail in build environments.
Compilation time
Final Verdict
  • Status: ❌ Block
  • Minimum required actions:
    • Fix gen_settings_data.sh preprocessing invocation to include generated-header paths needed by gen_settings_extract.cpp (including config.h).
    • Remove failure masking (2>/dev/null + || true) so extraction errors fail loudly and deterministically.

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Mar 30, 2026
@Algunenano
Copy link
Copy Markdown
Member Author

@azat @alexey-milovidov Maybe you would like to have a look at this PR. It's going to conflict all the time, so I'll prepare the private changes too and once it's reviewed fix new conflicts with master (it will conflict on anything changing session settings)

Algunenano and others added 2 commits March 31, 2026 10:53
Replace virtual dispatch through `SettingFieldBase` with a manual vtable
(`SettingFieldOps`) that stores typed function pointers shared per setting
field type. This eliminates the 8-byte vptr from every setting field
instance in the `Data` struct.

Key changes:
- `SettingFieldBase` is now an empty base class (no virtual methods)
- `SettingFieldOps` provides type-erased operations via named template
  functions (one static instance per concrete type, not per setting)
- `FieldInfo` uses `offsetof` for data access instead of per-setting
  lambdas, and a single lazy lambda for default values
- Per-setting lambda count reduced from 2 to 1 (was 2200, now 1100)

Size reductions per field type (before → after):
- `SettingFieldBool`: 16 → 2 bytes (556 settings → ~7.8 KB saved)
- `SettingFieldUInt64`: 24 → 16 bytes (352 settings → ~2.8 KB saved)
- `SettingFieldFloat`: 16 → 8 bytes
- `SettingFieldString`: 40 → 32 bytes
- All other types: 8 bytes saved each

Total estimated savings: ~10 KB per `Settings` object.
Settings.cpp compile time unchanged (~16s).

References ClickHouse#58797

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the macro-generated Data struct (one member per setting) with a
code-generated layout that groups settings into typed arrays:

    SettingFieldBool Bool_[795];
    SettingFieldUInt64 UInt64_[428];
    ...

This eliminates inter-type padding waste and enables the compiler to use
`memcpy` per array for copy/move, reducing the copy constructor from
50 KB / 7654 instructions to 7 KB / 1135 instructions (7x smaller).

Key pieces:
- `SettingIndex<T>`: typed offset replacing pointer-to-member, defined
  in SettingIndex.h
- `SettingsData.generated.h`: generated Data struct with typed arrays,
  `SettingOffset::` constants, and default-value constructor
- `DECLARE_SETTINGS_TRAITS_GENERATED`: new macro variant that accepts
  an externally-defined Data type
- Settings.cpp overrides: `IMPLEMENT_SETTING_SUBSCRIPT_OPERATOR` and
  `INITIALIZE_SETTING_EXTERN` use generated offsets with pre-baked
  base class adjustment for optimal `operator[]` (3 instructions)

The generator pipeline (not yet wired into CMake):
1. Preprocessor extracts TYPE/NAME/DEFAULT via marker macros
2. Shell script groups by type and emits the generated header

Other settings classes are unchanged — they continue using
pointer-to-member access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Algunenano Algunenano force-pushed the remove-settings-vtable branch from cd8a072 to 1f27ade Compare March 31, 2026 08:54
@Algunenano
Copy link
Copy Markdown
Member Author

I've removed COW changes since it overcomplicated many things and required a lot of changes. We use std::unique_ptr<*Settings> and (*settings)[...] would always use the non-const version, which would call cow_if_shared(). Maybe we can introduce it in the future

Algunenano and others added 3 commits March 31, 2026 11:00
Settings definitions moved from Settings.cpp to SettingsList.inc.
Update CI scripts that parse settings definitions:
- check-settings-style: replace Settings.cpp with SettingsList.inc,
  strip "List" suffix from filename to get correct class name "Settings"
- check_cpp.sh: grep SettingsList.inc for allow_* settings
- feature_docs.py: recognize SettingsList.inc as embedded docs
- pr_info.py: include SettingsList.inc in docs-related files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The generator emitted `SettingFieldBool(0)` / `SettingFieldBool(1)` for
obsolete settings defined with `MAKE_OBSOLETE(M, Bool, name, 0)`.
clang-tidy requires `true`/`false` literals instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.00% -0.10%
Functions 90.90% 90.80% -0.10%
Branches 76.70% 76.60% -0.10%

Changed lines: 95.21% (278/292) · Uncovered code

Full report · Diff report

@Algunenano
Copy link
Copy Markdown
Member Author

This looks good. It should help with anything using settings, as we'll have smaller functions and less iterative binary code.

I expect this will conflict all the time with new settings being added, so once it's reviewed and accepted I'll do the final conflict resolution and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Settings objects became too big

1 participant