Skip to content

Reintroduce async logging#82516

Merged
Algunenano merged 10 commits intoClickHouse:masterfrom
Algunenano:reintroduce_async_logging
Jul 1, 2025
Merged

Reintroduce async logging#82516
Algunenano merged 10 commits intoClickHouse:masterfrom
Algunenano:reintroduce_async_logging

Conversation

@Algunenano
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Introduce async logging

Original PR: #80125
Addressed the issues found by #82068 and randomizes which logger is used in the CI. I expect we'll remove sync logging once async logger is battle-tested but we should keep it for now

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

…125-poco_async_logger"

This reverts commit 8bd9858, reversing
changes made to 40e0202.
…dledSignals

It might happen if the DateLUT global instance is destroyed before the HandledSignals global instance
@Algunenano Algunenano requested a review from azat June 24, 2025 17:41
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jun 24, 2025

Workflow [PR], commit [617e418]

Summary:

job_name test_name status info comment
Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, 2/2) failure
Logical error thrown (see clickhouse-server.log or logical_errors.txt) FAIL
Stateless tests (amd_binary, ParallelReplicas, s3 storage) failure
01169_alter_partition_isolation_stress FAIL
Stateless tests (amd_ubsan) failure
02280_add_query_level_settings FAIL
Integration tests (asan, old analyzer, 3/6) failure
test_refreshable_mat_view/test.py::test_simple_append[False-True-SELECT now() as a, b as b FROM src1] FAIL
Stress test (amd_ubsan) failure
Server died FAIL
Hung check failed, possible deadlock found (see hung_check.log) FAIL
Killed by signal (in clickhouse-server.log) FAIL
Fatal message in clickhouse-server.log (see fatal_messages.txt) FAIL
Killed by signal (output files) FAIL
Found signal in gdb.log FAIL

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Jun 24, 2025
@Algunenano Algunenano requested a review from pamarcos June 24, 2025 17:45
@pamarcos pamarcos self-assigned this Jun 25, 2025
@Algunenano
Copy link
Copy Markdown
Member Author

@Algunenano
Copy link
Copy Markdown
Member Author

Integration tests (release, 4/4): Failure in test_config_decryption/test_wrong_settings_zk.py::test_no_encryption_key_in_zk . Seems related as the test didn't really fail before and it's expecting some logs. Need to investigate

Probably related to the fact that now we don't flush the logs queue when close() is called. I'll do some checks and change it

Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not looked deeply, only the patch that fixes possible crashes, and this part LGTM

throw SystemException("cannot start thread");
FastMutex::ScopedLock l(_pData->mutex);
_pData->pRunnableTarget = pTarget;
if (pthread_create(&_pData->thread, &attributes, runnableEntry, this))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I guess we don't need ThreadStatus here? (like we have in

DB::ThreadStatus thread_status;
) If we are using this type of thread only for system things, like logging flush, since memory will be take into account anyway since we have total_memory_tracker.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do, but we can add it for consistency

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it locally but it made things way more complicated in terms of initialization and handling. I think I'm going to avoid it for now, since these threads are not meant to do anything related to queries and so on.

@Algunenano Algunenano force-pushed the reintroduce_async_logging branch from 7b93e84 to f713e97 Compare July 1, 2025 13:25
@Algunenano
Copy link
Copy Markdown
Member Author

Algunenano commented Jul 1, 2025

The CI is extremely slow and painful as of late. Failures so far:

@Algunenano Algunenano enabled auto-merge July 1, 2025 22:23
@Algunenano Algunenano added this pull request to the merge queue Jul 1, 2025
Merged via the queue into ClickHouse:master with commit 98fde27 Jul 1, 2025
115 of 126 checks passed
@Algunenano Algunenano deleted the reintroduce_async_logging branch July 1, 2025 22:54
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants