Skip to content

Fix deadlock in ~Consumer() during rd_kafka_consumer_close()#6

Closed
azat wants to merge 1 commit intoremove-recursive-submodulesfrom
fix-consumer-close-deadlock
Closed

Fix deadlock in ~Consumer() during rd_kafka_consumer_close()#6
azat wants to merge 1 commit intoremove-recursive-submodulesfrom
fix-consumer-close-deadlock

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Mar 24, 2026

~Consumer() called close() which invoked rd_kafka_consumer_close() directly, bypassing rd_kafka_destroy_flags(). This meant rk_terminate was never set, so consumer_close() took the blocking path, dispatched a rebalance callback, and handle_rebalance() called rd_kafka_assign() — a synchronous cross-thread RPC to the cgrp ops queue. The reply could get purged by rd_kafka_cgrp_terminated() racing on the internal main thread, deadlocking the caller forever.

Replace close() with destroy_handle() which calls rd_kafka_destroy_flags() directly — the same path librdkafka uses internally. This sets rk_terminate before calling consumer_close(), enabling the fast non-blocking path when RD_KAFKA_DESTROY_F_NO_CONSUMER_CLOSE is set. The handle stays valid in the unique_ptr during the destroy (so get_handle() works if rebalance callbacks fire), then is released to prevent double-destroy in ~KafkaHandleBase.

Refs: ClickHouse/ClickHouse#100511

~Consumer() called close() which invoked rd_kafka_consumer_close() directly,
bypassing rd_kafka_destroy_flags(). This meant rk_terminate was never set, so
consumer_close() took the blocking path, dispatched a rebalance callback, and
handle_rebalance() called rd_kafka_assign() — a synchronous cross-thread RPC
to the cgrp ops queue. The reply could get purged by
rd_kafka_cgrp_terminated() racing on the internal main thread, deadlocking the
caller forever.

Replace close() with destroy_handle() which calls rd_kafka_destroy_flags()
directly — the same path librdkafka uses internally. This sets rk_terminate
before calling consumer_close(), enabling the fast non-blocking path when
RD_KAFKA_DESTROY_F_NO_CONSUMER_CLOSE is set. The handle stays valid in the
unique_ptr during the destroy (so get_handle() works if rebalance callbacks
fire), then is released to prevent double-destroy in ~KafkaHandleBase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@azat
Copy link
Copy Markdown
Member Author

azat commented Mar 24, 2026

I will first merge ClickHouse/ClickHouse#100612

Then, I will properly merge everything into ClickHouse/release-0.4.1 branch, this will include #5, and update submodules again

@azat
Copy link
Copy Markdown
Member Author

azat commented Mar 24, 2026

This has been merged into proper branch manually - origin/ClickHouse/release-0.4.1

Including all missing changes

$ git co origin/ClickHouse/release-0.4.1
$ git munn origin/remove-recursive-submodules
$ git munn origin/fix-consumer-close-deadlock
$ git oneline -5
8cc2f31 (HEAD -> ClickHouse/release-0.4.1) Merge remote-tracking branch 'origin/fix-consumer-close-deadlock' into ClickHouse/release-0.4.1
5523710 Merge remote-tracking branch 'origin/remove-recursive-submodules' into ClickHouse/release-0.4.1
99ee856 (origin/fix-consumer-close-deadlock, fix-consumer-close-deadlock) Fix deadlock in ~Consumer() during rd_kafka_consumer_close()
0403eb3 (origin/remove-recursive-submodules) Remove recursive submodules
ee62c51 (origin/HEAD, origin/ClickHouse/release-0.4.1) Merge pull request #5 from kalavt/ClickHouse/release-0.4.1
$ git push github:ClickHouse/cppkafka.git ClickHouse/release-0.4.1
warning: not sending a push certificate since the receiving end does not support --signed push
Enumerating objects: 19, done.
Counting objects: 100% (19/19), done.
Delta compression using up to 64 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 947 bytes | 947.00 KiB/s, done.
Total 7 (delta 4), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (4/4), completed with 3 local objects.
To github.com:ClickHouse/cppkafka.git
   ee62c51..8cc2f31  ClickHouse/release-0.4.1 -> ClickHouse/release-0.4.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants