
Reduce memory usage of performance tests and reduce variance #80880

Merged
Algunenano merged 8 commits into ClickHouse:master from Algunenano:perf_memory
May 30, 2025

Conversation

@Algunenano
Member

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Reduce memory usage of performance tests and reduce variance

Tests sometimes fail because the servers reach their allotted memory limit. The causes are unbounded inserts (benchmarks that insert into a Memory table) and merges of system.query_log.

Closes #80854
References #78555

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@Algunenano Algunenano added the ci-performance performance only label May 27, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented May 27, 2025

Workflow [PR], commit [a3dfa48]

@clickhouse-gh clickhouse-gh bot added the pr-ci label May 27, 2025
@Algunenano Algunenano added the 🍃 green ci 🌿 Fixing flaky tests in CI label May 27, 2025
@azat azat self-assigned this May 27, 2025
@azat
Member

azat commented May 27, 2025

Performance Comparison (arm_release,master_head,2/3) — ERROR: Job killed, exit code [1] - set status to [error].

Something else also failed:

  File "/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/./ci/jobs/performance_tests.py", line 494, in main
    results.append(Result.from_commands_run(name="Tests", command=commands))
  File "/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/ci/praktika/result.py", line 442, in from_commands_run
    result = command_(*command_args, **command_kwargs)
  File "/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/./ci/jobs/performance_tests.py", line 483, in run_tests
    CHServer.run_test(
  File "/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/./ci/jobs/performance_tests.py", line 161, in run_test
    res, out, err = Shell.get_res_stdout_stderr(
  File "/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/ci/praktika/utils.py", line 188, in get_res_stdout_stderr
    res = subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 505, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.10/subprocess.py", line 1154, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.10/subprocess.py", line 2059, in _communicate
    stdout = self._translate_newlines(stdout,
  File "/usr/lib/python3.10/subprocess.py", line 1031, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 19392: invalid start byte

@Algunenano
Member Author

Something else also failed:

It's impossible to know what. The output of the command ./tests/performance/scripts/perf.py ... contained a non-UTF-8 byte, and Python's default strict decoding raises an error on that. In d0b94df I changed praktika to ignore such characters. cc @maxknv
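For readers who hit the same traceback, here is a minimal sketch of the general technique: capture a child process's output while dropping bytes that are not valid UTF-8. It is not the actual change from d0b94df, which is not shown in this thread. The helper name `run_capture_lenient` is hypothetical, and the choice of `errors="ignore"` (rather than `"replace"`) is an assumption based on the wording "ignore those characters".

```python
import subprocess
import sys


def run_capture_lenient(command):
    """Run a command and decode its output as UTF-8, dropping undecodable bytes.

    Hypothetical helper for illustration; not the praktika implementation.
    """
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        errors="ignore",  # assumption: the default "strict" is what raised UnicodeDecodeError
    )
    return proc.returncode, proc.stdout, proc.stderr


# A child process that emits the stray byte 0xb1 seen in the traceback above.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'ok \\xb1 done\\n')",
]
res, out, err = run_capture_lenient(child)
print(res, repr(out))
```

Using `errors="replace"` instead would keep a visible U+FFFD marker where the bad byte was, which can make it easier to locate the offending output later.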

@Algunenano Algunenano enabled auto-merge May 29, 2025 10:00
@Algunenano Algunenano removed the ci-performance performance only label May 29, 2025
@Algunenano Algunenano disabled auto-merge May 29, 2025 10:04
@Algunenano
Member Author

It is still creating logs:

3001:2025.05.28 14:28:42.914373 [ 2388 ] {} <Trace> SystemLog (system.metric_log): Flushing system log, 8 entries to flush up to offset 54
3022:2025.05.28 14:28:42.980139 [ 2388 ] {} <Trace> SystemLog (system.metric_log): Flushed system log up to offset 54
3124:2025.05.28 14:28:43.311090 [ 2385 ] {} <Trace> SystemLog (system.trace_log): Flushing system log, 368 entries to flush up to offset 6409
3166:2025.05.28 14:28:43.409237 [ 2385 ] {} <Trace> SystemLog (system.trace_log): Flushed system log up to offset 6409
3823:2025.05.28 14:28:45.966976 [ 2393 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushing system log, 1316 entries to flush up to offset 10708
3828:2025.05.28 14:28:45.970041 [ 2393 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushed system log up to offset 10708
4131:2025.05.28 14:28:49.618503 [ 2402 ] {} <Trace> SystemLog (system.query_metric_log): Flushing system log, 2 entries to flush up to offset 2
4132:2025.05.28 14:28:49.618547 [ 2402 ] {} <Debug> SystemLog (system.query_metric_log): Creating new table system.query_metric_log for QueryMetricLog
4144:2025.05.28 14:28:49.690589 [ 2402 ] {} <Trace> SystemLog (system.query_metric_log): Flushed system log up to offset 2
4145:2025.05.28 14:28:50.033304 [ 2397 ] {} <Trace> SystemLog (system.transactions_info_log): Flushing system log, 9 entries to flush up to offset 115
4150:2025.05.28 14:28:50.035563 [ 2397 ] {} <Trace> SystemLog (system.transactions_info_log): Flushed system log up to offset 115
4151:2025.05.28 14:28:50.038289 [ 2389 ] {} <Trace> SystemLog (system.latency_log): Flushing system log, 8 entries to flush up to offset 61
4156:2025.05.28 14:28:50.040962 [ 2389 ] {} <Trace> SystemLog (system.latency_log): Flushed system log up to offset 61
4157:2025.05.28 14:28:50.043028 [ 2387 ] {} <Trace> SystemLog (system.text_log): Flushing system log, 1312 entries to flush up to offset 4122
4158:2025.05.28 14:28:50.044920 [ 2398 ] {} <Trace> SystemLog (system.processors_profile_log): Flushing system log, 1408 entries to flush up to offset 7440
4166:2025.05.28 14:28:50.048614 [ 2387 ] {} <Trace> SystemLog (system.text_log): Flushed system log up to offset 4122
4168:2025.05.28 14:28:50.049037 [ 2398 ] {} <Trace> SystemLog (system.processors_profile_log): Flushed system log up to offset 7440
4169:2025.05.28 14:28:50.063752 [ 2383 ] {} <Trace> SystemLog (system.query_log): Flushing system log, 36 entries to flush up to offset 284
4174:2025.05.28 14:28:50.070260 [ 2383 ] {} <Trace> SystemLog (system.query_log): Flushed system log up to offset 284
4200:2025.05.28 14:28:50.480248 [ 2388 ] {} <Trace> SystemLog (system.metric_log): Flushing system log, 7 entries to flush up to offset 61
4205:2025.05.28 14:28:50.543467 [ 2388 ] {} <Trace> SystemLog (system.metric_log): Flushed system log up to offset 61
4243:2025.05.28 14:28:50.909358 [ 2385 ] {} <Trace> SystemLog (system.trace_log): Flushing system log, 2886 entries to flush up to offset 9295
4248:2025.05.28 14:28:51.173621 [ 2385 ] {} <Trace> SystemLog (system.trace_log): Flushed system log up to offset 9295
4290:2025.05.28 14:28:52.970153 [ 2393 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushing system log, 1316 entries to flush up to offset 12024
4295:2025.05.28 14:28:52.972972 [ 2393 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushed system log up to offset 12024
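A quick way to see which system logs are still being written is to total the "Flushing system log" lines per table. The sketch below assumes the log line format shown above; the function name `summarize_flushes` and the regular expression are illustrative and not part of the ClickHouse tooling.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ... SystemLog (system.metric_log): Flushing system log, 8 entries to flush up to offset 54
FLUSH_RE = re.compile(
    r"SystemLog \((system\.\w+)\): Flushing system log, (\d+) entries to flush"
)


def summarize_flushes(lines):
    """Return {table: (flush_count, total_entries)} for 'Flushing' lines only."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = FLUSH_RE.search(line)
        if match:
            table, entries = match.group(1), int(match.group(2))
            totals[table][0] += 1
            totals[table][1] += entries
    return {table: tuple(counts) for table, counts in totals.items()}


# Four lines copied from the excerpt above.
sample = [
    "3001:2025.05.28 14:28:42.914373 [ 2388 ] {} <Trace> SystemLog (system.metric_log): Flushing system log, 8 entries to flush up to offset 54",
    "3022:2025.05.28 14:28:42.980139 [ 2388 ] {} <Trace> SystemLog (system.metric_log): Flushed system log up to offset 54",
    "4200:2025.05.28 14:28:50.480248 [ 2388 ] {} <Trace> SystemLog (system.metric_log): Flushing system log, 7 entries to flush up to offset 61",
    "4243:2025.05.28 14:28:50.909358 [ 2385 ] {} <Trace> SystemLog (system.trace_log): Flushing system log, 2886 entries to flush up to offset 9295",
]
summary = summarize_flushes(sample)
for table, (flushes, entries) in sorted(summary.items()):
    print(f"{table}: {flushes} flushes, {entries} entries")
```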

@Algunenano Algunenano added the ci-performance performance only label May 29, 2025
@Algunenano
Member Author

Algunenano commented May 30, 2025

Things still look unstable, but at least memory usage should be better and we are now actually using the intended configuration. I'm overwriting the status to avoid waiting N hours for another rerun.

@Algunenano Algunenano enabled auto-merge May 30, 2025 09:46
@Algunenano Algunenano removed the ci-performance performance only label May 30, 2025
@Algunenano Algunenano added this pull request to the merge queue May 30, 2025
Merged via the queue into ClickHouse:master with commit fa6123d May 30, 2025
119 of 121 checks passed
@Algunenano Algunenano deleted the perf_memory branch May 30, 2025 10:06
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label May 30, 2025

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Broken perf test: SIZES_OF_ARRAYS_DONT_MATCH

3 participants