Skip to content

[clickhouse] Enable system.metric_log and system.asynchronous_metric_log tables#7100

Merged
karencfv merged 2 commits into
oxidecomputer:mainfrom
karencfv:enable-monitoring
Nov 19, 2024
Merged

[clickhouse] Enable system.metric_log and system.asynchronous_metric_log tables#7100
karencfv merged 2 commits into
oxidecomputer:mainfrom
karencfv:enable-monitoring

Conversation

@karencfv

@karencfv karencfv commented Nov 19, 2024

Copy link
Copy Markdown
Contributor

Overview

In order to reliably roll out a replicated ClickHouse cluster, we'll be running a set of long running testing (stage 1 of RFD #468).

ClickHouse provides a system database with several tables with information about the system. For monitoring purposes, the system.asynchronous_metric_log and system.metric_log tables are particularly useful. With them we can retrieve information about queries per second, CPU usage, memory usage etc. Full lists of available metrics here and here

During our long running testing I'd like to give these tables a TTL of 30 days. Once we are confident the system is stable and we roll out the cluster to all racks, we can reduce TTL to 7 or 14 days.

This PR will only enable the tables themselves. There will be follow up PRs to actually retrieve the data we'll be monitoring

Manual testing

Queries per second

oxz_clickhouse_eecd32cc-ebf2-4196-912f-5bb440b104a0.local :) SELECT toStartOfInterval(event_time, INTERVAL 60 SECOND) AS t, avg(ProfileEvent_Query)
FROM system.metric_log
WHERE event_date >= toDate(now() - 86400) AND event_time >= now() - 86400
GROUP BY t
ORDER BY t WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

SELECT
    toStartOfInterval(event_time, toIntervalSecond(60)) AS t,
    avg(ProfileEvent_Query)
FROM system.metric_log
WHERE (event_date >= toDate(now() - 86400)) AND (event_time >= (now() - 86400))
GROUP BY t
ORDER BY t ASC WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

Query id: 1b91946b-fe8b-4074-bc94-f071f72f55f5

┌────────────────────t─┬─avg(ProfileEvent_Query)─┐
│ 2024-11-18T06:40:00Z │      1.3571428571428572 │
│ 2024-11-18T06:41:00Z │      1.3666666666666667 │
│ 2024-11-18T06:42:00Z │      1.3666666666666667 │
│ 2024-11-18T06:43:00Z │      1.3666666666666667 │
│ 2024-11-18T06:44:00Z │      1.3666666666666667 │

Disk usage

oxz_clickhouse_eecd32cc-ebf2-4196-912f-5bb440b104a0.local :) SELECT toStartOfInterval(event_time, INTERVAL 60 SECOND) AS t, avg(value)
FROM system.asynchronous_metric_log
WHERE event_date >= toDate(now() - 86400) AND event_time >= now() - 86400
AND metric = 'DiskUsed_default'
GROUP BY t
ORDER BY t WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

SELECT
    toStartOfInterval(event_time, toIntervalSecond(60)) AS t,
    avg(value)
FROM system.asynchronous_metric_log
WHERE (event_date >= toDate(now() - 86400)) AND (event_time >= (now() - 86400)) AND (metric = 'DiskUsed_default')
GROUP BY t
ORDER BY t ASC WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

Query id: bcf9cd9b-7fea-4aea-866b-d69e60a7c0b6

┌────────────────────t─┬─────────avg(value)─┐
│ 2024-11-18T06:42:00Z │  860941425.7777778 │
│ 2024-11-18T06:43:00Z │  865134523.7333333 │
│ 2024-11-18T06:44:00Z │          871888896 │
│ 2024-11-18T06:45:00Z │  874408891.7333333 │
│ 2024-11-18T06:46:00Z │          878761984 │
│ 2024-11-18T06:47:00Z │  881646933.3333334 │
│ 2024-11-18T06:48:00Z │  883998788.2666667 │

Related #6953

@andrewjstone andrewjstone left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome :)

@karencfv karencfv merged commit 7947f75 into oxidecomputer:main Nov 19, 2024
@karencfv karencfv deleted the enable-monitoring branch November 19, 2024 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants