Skip to content

Move logging to a separate thread#80125

Merged
Algunenano merged 28 commits intoClickHouse:masterfrom
Algunenano:poco_async_logger
Jun 15, 2025
Merged

Move logging to a separate thread#80125
Algunenano merged 28 commits intoClickHouse:masterfrom
Algunenano:poco_async_logger

Conversation

@Algunenano
Copy link
Copy Markdown
Member

@Algunenano Algunenano commented May 12, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Make logging asynchronous by default. You can disable this by setting <async>false</async> under <logger>

This is a simpler alternative to #77107. It uses a queue and a thread per channel, removing the load from query execution threads.

Tested config:

    <logger>
        <console>1</console>
        <level>trace</level>
        <log>/mnt/ch/ch_data/log/prod-1/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/mnt/ch/ch_data/log/prod-1/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
        <use_syslog>1</use_syslog>
        <syslog>
            <format>syslog</format>
        </syslog>
       <async>true</async>
    </logger>

clickhouse benchmark --port 49000 --query "SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0;" -i 4000

  • Before: localhost:49000, queries: 4000, QPS: 225.307, RPS: 14476060039.031, MiB/s: 1356.911, result RPS: 225.307, result MiB/s: 0.002.
  • After: localhost:49000, queries: 4000, QPS: 254.954, RPS: 16383865299.908, MiB/s: 1535.463, result RPS: 254.954, result MiB/s: 0.002.

clickhouse benchmark --port 49000 --query "SELECT COUNT(*) FROM hits" -i 10000

  • Before: localhost:49000, queries: 10000, QPS: 1001.945, RPS: 1001.945, MiB/s: 0.015, result RPS: 1001.945, result MiB/s: 0.008.
  • After: localhost:49000, queries: 10000, QPS: 1265.246, RPS: 1265.246, MiB/s: 0.019, result RPS: 1265.246, result MiB/s: 0.010.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented May 12, 2025

Workflow [PR], commit [36b9b5b]

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label May 12, 2025
@Algunenano Algunenano marked this pull request as draft May 13, 2025 12:48
@Algunenano Algunenano force-pushed the poco_async_logger branch from 1950100 to dd0af56 Compare May 13, 2025 18:00
@Algunenano Algunenano force-pushed the poco_async_logger branch from 43d6606 to 3aae854 Compare May 30, 2025 10:48
@Algunenano Algunenano mentioned this pull request May 30, 2025
12 tasks
@Algunenano Algunenano marked this pull request as ready for review June 9, 2025 19:46
@pamarcos pamarcos self-requested a review June 10, 2025 09:56
@pamarcos pamarcos self-assigned this Jun 10, 2025
@Algunenano
Copy link
Copy Markdown
Member Author

Some notes about the PR after talking with @antonio2368:

  • The change has some risks as it's changing the behaviour. Most notably since now running threads do not wait for flush and can continue working and logging more things, it's possible that the queues grow quite a bit and take more and more memory.

    • We'll do more testing and add a queue limit if necessary to drop messages if they grow too much.
    • Another possible improvement to speed up the flushing threads would be to replace Poco::NotificationQueue with a queue that allows to retrieve the whole queue at once. This allows simpler flushing (obviously) but also speeds up some channels since we don't need to flush as many times: writing 20 messages + 1 flush to disk is much faster than 20 times * 1 message + 1 flush.
  • The default behaviour changes to async because I expect it to be net positive in all cases. If we have to go back changing it is trivial (it's a config change, or a one line change for the default) but I'd rather have more testing asap rather than wait

@Algunenano Algunenano enabled auto-merge June 15, 2025 21:16
@Algunenano Algunenano added this pull request to the merge queue Jun 15, 2025
Merged via the queue into ClickHouse:master with commit 97bd60d Jun 15, 2025
116 of 122 checks passed
@Algunenano Algunenano deleted the poco_async_logger branch June 15, 2025 21:41
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 15, 2025
@azat
Copy link
Copy Markdown
Member

azat commented Jun 17, 2025

Here are some more numbers for clickhouse benchmark --port 49000 --query "SELECT COUNT(*) FROM datasets.hits_v1" -i 10000

  • logger.async=false >& /tmp/server.log (tmpfs) - localhost:9000, queries: 10000, QPS: 793.061, RPS: 793.061, MiB/s: 0.012, result RPS: 793.061, result MiB/s: 0.006.
  • logger.async=false >& /dev/null - localhost:9000, queries: 10000, QPS: 791.032, RPS: 791.032, MiB/s: 0.012, result RPS: 791.032, result MiB/s: 0.006.
  • logger.async=true (logs to terminal, in tmux) - localhost:9000, queries: 10000, QPS: 803.503, RPS: 803.503, MiB/s: 0.012, result RPS: 803.503, result MiB/s: 0.006.
  • logger.async=false (logs to terminal, in tmux) - localhost:9000, queries: 10000, QPS: 680.163, RPS: 680.163, MiB/s: 0.010, result RPS: 680.163, result MiB/s: 0.005.

@azat
Copy link
Copy Markdown
Member

azat commented Jun 17, 2025

Also need to add ThreadStatus into Poco::Thread (we already have this in PooledThread, but not here)

@azat
Copy link
Copy Markdown
Member

azat commented Jun 17, 2025

Also, it may cause deadlock on shutdown

I will disable the async for now - #82034

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants