Skip to content

Add jemalloc profiling web UI for ClickHouse Keeper#100606

Open
murphy-4o wants to merge 13 commits intoClickHouse:masterfrom
murphy-4o:keeper-jemalloc-webui
Open

Add jemalloc profiling web UI for ClickHouse Keeper#100606
murphy-4o wants to merge 13 commits intoClickHouse:masterfrom
murphy-4o:keeper-jemalloc-webui

Conversation

@murphy-4o
Copy link
Copy Markdown
Member

@murphy-4o murphy-4o commented Mar 24, 2026

Expose jemalloc heap profiling and statistics through Keeper's HTTP control interface, matching the functionality already available in the ClickHouse Server web UI.

Closes #100544

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added a jemalloc profiling web UI for ClickHouse Keeper, available at /jemalloc on the HTTP control port.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Expose jemalloc heap profiling and statistics through Keeper's HTTP
control interface, matching the functionality already available in the
ClickHouse Server web UI.

New REST API endpoints on the Keeper HTTP control port:
- GET /jemalloc — serves the interactive web UI (shared with server via
  `isKeeper` mode flag injected at serve time)
- GET /jemalloc/stats — returns raw `malloc_stats_print` output
- GET /jemalloc/status — returns profiling state as JSON
- GET /jemalloc/profile?format={collapsed|raw} — flushes a heap profile,
  symbolizes it, and returns collapsed stacks for flame graph rendering

The web UI reuses the server's `jemalloc.html` with a thin `isKeeper`
abstraction that swaps SQL-based data fetching for REST API calls and
hides server-specific features (Query Profiler, connection params, output
format options). This avoids duplicating ~3000 lines of HTML/CSS/JS.

New shared helpers in `Jemalloc::heapProfileToCollapsedStacks` (parses
raw jemalloc heap dump, symbolizes addresses via `StackTrace::forEachFrame`,
aggregates into FlameGraph-compatible collapsed stacks) and
`Jemalloc::getStats` (captures `malloc_stats_print` output as a string).

Closes ClickHouse#100544

Made-with: Cursor
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 24, 2026

Workflow [PR], commit [60f939e]

Summary:

job_name test_name status info comment
Stateless tests (amd_tsan, s3 storage, parallel, 1/2) failure
03668_shard_join_in_reverse_order FAIL cidb, issue
Stress test (arm_debug) failure
Logical error: No available columns (STID: 3938-33a6) FAIL cidb, issue
Stress test (arm_tsan) failure
Logical error: '(isConst() || isSparse() || isReplicated()) ? getDataType() == rhs.getDataType() : typeid(*this) == typeid(rhs)' (STID: 2508-324f) FAIL cidb
AST fuzzer (arm_asan_ubsan) failure
Logical error: Bad cast from type A to B (STID: 1635-2ab2) FAIL cidb

AI Review

Summary

This PR adds a Keeper-facing jemalloc web UI and REST endpoints (/jemalloc, /jemalloc/stats, /jemalloc/status, /jemalloc/profile) by reusing server-side jemalloc rendering and wiring Keeper handlers. At current PR head, I did not find additional high-confidence correctness/safety/performance issues beyond already discussed inline review threads.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Mar 24, 2026
Add Keeper tab to the jemalloc web UI section showing the HTTP control
port URL and the REST API endpoints for programmatic access.

Made-with: Cursor
…oc UI

Add config example showing that `keeper_server.http_control.port` must be
explicitly enabled, and a security warning noting the lack of
application-level auth on the Keeper HTTP control port with recommended
network-level mitigations.

Made-with: Cursor
… duplicate comment

- `handleStatus` no longer silently swallows `mallctl` exceptions.
  Failures are logged via `tryLogCurrentException` and surfaced in the
  JSON response as an `errors` array so operators can distinguish
  "profiling disabled" from "status unknown".
- Remove triple-duplicated "Substitute user name" comment in
  `jemalloc.html`.

Made-with: Cursor
…d integration tests

- Inject the Keeper config `<script>` after `<head>` instead of before
  `<!DOCTYPE html>` to avoid triggering quirks mode in browsers.
- Redirect `/jemalloc/` (trailing slash) to `/jemalloc` with HTTP 301 so
  the UI is reachable from both paths.
- Add integration tests for all Keeper jemalloc HTTP endpoints: web UI,
  trailing-slash redirect, /stats, /status, bad profile format (400),
  and unknown API path (404).

Made-with: Cursor
- Avoid `bugprone-suspicious-stringview-data-usage` by building the full
  HTML string with the injected config script before writing.
- Use const reference for `path` to avoid unnecessary copy
  (`performance-unnecessary-copy-initialization`).

Made-with: Cursor
- Call `setResponseDefaultHeaders` on 400/404/301 responses for
  consistent CORS/cache headers across all paths.
- Use `data()` instead of `c_str()` for consistency with the rest of the
  file and the codebase.

Made-with: Cursor
- Register `KeeperJemallocAPIHandler` unconditionally so it works in
  sanitizer builds where `USE_JEMALLOC` is off.  API routes return
  HTTP 501 when jemalloc is not linked; the `/jemalloc/` trailing-slash
  redirect works in all builds.
- Short-circuit HEAD on `/jemalloc/profile` before `flushProfile` to
  avoid expensive side effects from lightweight probes.
- Fix wrong error key `thread.prof.active` → `prof.thread_active_init`
  in `handleStatus` JSON errors array.
- Move `#include <Server/KeeperJemallocHandler.h>` from the header to
  `KeeperHTTPHandlerFactory.cpp` to reduce transitive compile cost.
- Rewrite integration tests: always validate correct behavior for both
  jemalloc and non-jemalloc builds (no skips in sanitizer CI), add
  HEAD-method coverage, add positive-path profile tests, parametrize
  to eliminate duplication.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100606&sha=6ef7cf2a5d0ff347554cf731107fc69d631dc38b&name_0=PR

Made-with: Cursor
…h routing

Replace the monolithic `KeeperJemallocAPIHandler` with dedicated handler
classes (`KeeperJemallocProfileHandler`, `KeeperJemallocStatsHandler`,
`KeeperJemallocStatusHandler`, `KeeperJemallocRedirectHandler`,
`KeeperJemallocNotAvailableHandler`) each registered on a strict path via
the factory.  This eliminates manual URL dispatching inside `handleRequest`
and follows the same pattern used by other Keeper HTTP handlers.

Also fixes `#include` indentation inside `#if USE_JEMALLOC`.

Made-with: Cursor
- Fix `attachStrictPath` not matching URIs with query parameters: replace
  with a custom filter that matches the path portion, ignoring `?...`
  suffixes.  This caused `/jemalloc/profile?format=bad` to 404 in
  non-jemalloc builds instead of returning 501.

- Fix HEAD test assertions: HEAD responses carry no body, so checking
  `response.text` for "not available" fails on 501.  Only assert the
  status code for HEAD.

- Add HEAD short-circuit to `KeeperJemallocStatsHandler` to avoid the
  expensive `je_malloc_stats_print` call when no body is needed.

- Handle `status.errors` in the web UI's `checkProfilingEnabled`: when
  the `/jemalloc/status` response includes errors, treat profiling state
  as unknown rather than silently defaulting to disabled.

Made-with: Cursor
@murphy-4o murphy-4o force-pushed the keeper-jemalloc-webui branch from 69cb391 to ede64f4 Compare March 27, 2026 06:44
When `std::filesystem::remove` fails in the `SCOPE_EXIT` of
`KeeperJemallocProfileHandler`, log a `LOG_WARNING` with the file path
and error message instead of silently dropping the `std::error_code`.

https: //github.com/ClickHouse/pull/100606#discussion_r2999333464
Made-with: Cursor
`KeeperJemallocStatusHandler` previously substituted default values
(false/0) when `mallctl` reads failed, making it impossible for consumers
to distinguish "profiling is disabled" from "status unknown".

Replace bare values with `std::optional<T>` and serialize failed reads as
JSON `null` instead of silent defaults.  When `opt.prof` itself fails,
all dependent fields (`prof_active`, `thread_active_init`, `lg_sample`)
are also `null`.  When `opt.prof` is definitively `false`, dependent
fields carry authoritative false/0 values.

The `errors` array is still present for failed keys, providing
machine-readable diagnostics alongside the per-field tri-state.

https: //github.com/ClickHouse/pull/100606#discussion_r2999333464
Made-with: Cursor
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.00% -0.10%
Functions 90.90% 90.90% +0.00%
Branches 76.70% 76.60% -0.10%

Changed lines: 0.00% (0/355) · Uncovered code

Full report · Diff report

@murphy-4o murphy-4o marked this pull request as ready for review April 1, 2026 03:27
@murphy-4o murphy-4o requested a review from antonio2368 April 1, 2026 03:27
@antonio2368 antonio2368 self-assigned this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add jemalloc profiling web UI for ClickHouse Keeper

2 participants