Skip to content

Avoid leaking of tracked memory for async logging and text_log#87584

Merged
azat merged 13 commits intoClickHouse:masterfrom
azat:async-logging-memory-tracking
Sep 27, 2025
Merged

Avoid leaking of tracked memory for async logging and text_log#87584
azat merged 13 commits intoClickHouse:masterfrom
azat:async-logging-memory-tracking

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Sep 24, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Avoid leaking of tracked memory for async logging (can have a significant drift, for 10 hours, ~100GiB) and text_log (almost same drift is possible).

Fixes: #82036

@azat azat requested a review from Algunenano September 24, 2025 17:23
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Sep 24, 2025

Workflow [PR], commit [97e95fa]

Summary:

job_name test_name status info comment
Integration tests (amd_binary, 4/5) failure
test_postgresql_replica_database_engine/test_0.py::test_table_schema_changes FAIL
Integration tests (arm_binary, distributed plan, 1/4) failure
test_storage_rabbitmq/test.py::test_rabbitmq_random_detach FAIL
Stress test (amd_ubsan) failure
Server died FAIL
Hung check failed, possible deadlock found (see hung_check.log) FAIL
Killed by signal (in clickhouse-server.log) FAIL
Fatal message in clickhouse-server.log (see fatal_messages.txt) FAIL
Killed by signal (output files) FAIL
Found signal in gdb.log FAIL

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Sep 24, 2025
@azat azat added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Sep 24, 2025
@Algunenano Algunenano self-assigned this Sep 24, 2025
Copy link
Copy Markdown
Member

@Algunenano Algunenano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few of comments:

  • Could you please leave a comment explaining why it's done. It's obvious to me now, but probably not to everyone and not me in a few months.
  • I think we can remove
    [[maybe_unused]] MemoryTrackerDebugBlockerInThread blocker;
    since a) it's already covered here and b) it's also covered by SystemLogQueue<LogElement>::push.
  • Could this have been an issue inside sync logging when calling pushExtendedMessageToInternalTCPTextLogQueue? It seems it might grow the queue, allocating extra memory that's not freed but it's tied to CurrentThread so probably accounting was kept correctly.
  • This makes me wonder what happens in logging threads if an exception is thrown when logging, inside while (is_open). It seems the thread might stop and never recover. I'll investigate further

@azat azat changed the title Avoid leaking of tracked memory with async logging Avoid leaking of tracked memory with async logging and sync logging to text_log and send_logs_level Sep 25, 2025
@azat azat changed the title Avoid leaking of tracked memory with async logging and sync logging to text_log and send_logs_level Avoid leaking of tracked memory for async logging, text_log and send_logs_level Sep 25, 2025
@azat
Copy link
Copy Markdown
Member Author

azat commented Sep 25, 2025

Thanks @Algunenano, decent comments!

Could you please leave a comment explaining why it's done. It's obvious to me now, but probably not to everyone and not me in a few months.

Sure

I think we can remove MemoryTrackerDebugBlockerInThread blocker;
since a) it's already covered here and b) it's also covered by SystemLogQueue::push.

The reason for it is very different, it is to avoid polluting trace_log with MemoryAllocatedWithoutCheck (see 4117df8 and this is the case of resize only actually, not the log element itself, which allocations should not be accounted in query context as well). Let me put a comment as well.

Could this have been an issue inside sync logging when calling pushExtendedMessageToInternalTCPTextLogQueue? It seems it might grow the queue, allocating extra memory that's not freed but it's tied to CurrentThread so probably accounting was kept correctly.

Exactly.

This makes me wonder what happens in logging threads if an exception is thrown when logging, inside while (is_open). It seems the thread might stop and never recover. I'll investigate further

Likely, but the memory exceptions are blocked in that function.

@azat
Copy link
Copy Markdown
Member Author

azat commented Sep 25, 2025

Could this have been an issue inside sync logging when calling pushExtendedMessageToInternalTCPTextLogQueue? It seems it might grow the queue, allocating extra memory that's not freed but it's tied to CurrentThread so probably accounting was kept correctly.

Actually I think for this case it should be accounted properly, since TCPHandler (and HTTPHandler) has CurrentThread::QueryScope which creates initial thread group for a query

@azat azat changed the title Avoid leaking of tracked memory for async logging, text_log and send_logs_level Avoid leaking of tracked memory for async logging and text_log Sep 25, 2025
@azat azat force-pushed the async-logging-memory-tracking branch from 64e5696 to e1464e8 Compare September 25, 2025 15:41
@azat azat requested a review from Algunenano September 25, 2025 15:42
@azat azat marked this pull request as ready for review September 25, 2025 15:45
@Algunenano
Copy link
Copy Markdown
Member

Likely, but the memory exceptions are blocked in that function.

Yes, but I'm guessing there might be other functions, like a temporal disk failure or something similar (thinking about a remote syslog for example). I'll do some more digging and probably cover it with a try...catch(...) just in case

@azat
Copy link
Copy Markdown
Member Author

azat commented Sep 25, 2025

@azat azat force-pushed the async-logging-memory-tracking branch from e1464e8 to c90ae43 Compare September 26, 2025 10:37
@azat
Copy link
Copy Markdown
Member Author

azat commented Sep 27, 2025

There can be some other issues (since we seen that memory usage was very low, so likely we have some places where we allocate with blocked memory tracker, or from another context, but free w/o blocking tracker), but, those will be done separately, this was the major one, others should be less significant (if any). Merging.

@azat azat added this pull request to the merge queue Sep 27, 2025
Merged via the queue into ClickHouse:master with commit 5fb4d67 Sep 27, 2025
119 of 123 checks passed
@azat azat deleted the async-logging-memory-tracking branch September 27, 2025 11:26
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 27, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Sep 27, 2025
robot-clickhouse-ci-1 added a commit that referenced this pull request Sep 27, 2025
Cherry pick #87584 to 25.9: Avoid leaking of tracked memory for async logging and text_log
robot-clickhouse added a commit that referenced this pull request Sep 27, 2025
robot-ch-test-poll3 added a commit that referenced this pull request Sep 27, 2025
Cherry pick #87584 to 25.8: Avoid leaking of tracked memory for async logging and text_log
robot-clickhouse added a commit that referenced this pull request Sep 27, 2025
robot-ch-test-poll3 added a commit that referenced this pull request Sep 29, 2025
Cherry pick #87584 to 25.7: Avoid leaking of tracked memory for async logging and text_log
robot-clickhouse added a commit that referenced this pull request Sep 29, 2025
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Sep 29, 2025
clickhouse-gh bot added a commit that referenced this pull request Sep 29, 2025
Backport #87584 to 25.7: Avoid leaking of tracked memory for async logging and text_log
azat added a commit that referenced this pull request Oct 2, 2025
Backport #87584 to 25.8: Avoid leaking of tracked memory for async logging and text_log
azat added a commit that referenced this pull request Oct 2, 2025
Backport #87584 to 25.9: Avoid leaking of tracked memory for async logging and text_log
@Algunenano
Copy link
Copy Markdown
Member

Likely, but the memory exceptions are blocked in that function.
Yes, but I'm guessing there might be other functions, like a temporal disk failure or something similar (thinking about a remote syslog for example). I'll do some more digging and probably cover it with a try...catch(...) just in case

I should have done this before: #88814

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-backports-created-cloud deprecated label, NOOP pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory Tracking Issues

5 participants