system.kafka_consumers table to monitor kafka consumers #50999
kssenii merged 36 commits into ClickHouse:master from
Conversation
Implements #5128 (comment)
Hello @kssenii, thank you for assigning this PR.
src/Storages/Kafka/StorageKafka.cpp
Not sure how expensive that is, but I doubt it comes for free. I think every few seconds should be enough (though it would be nice to measure the impact). You can try this one: https://github.com/filimonov/ch-kafka-consume-perftest
I've tweaked the 0025-timers.c test: https://github.com/ilejn/librdkafka/blob/prof_stat_cb/tests/0025-timers.c
No visible delays even with a 1 millisecond interval.
Although:
- the JSON is abridged for the first several calls (IOW for 2-4 milliseconds)
- I do not have consumers, so the test is not really meaningful (creating many topics did not change anything).
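To illustrate why a frequent statistics callback can be cheap, note that on the receiving side the dominant cost is usually just handling the JSON payload. This is a hypothetical Python sketch, not part of the PR (librdkafka's actual stats_cb is a C callback); the payload shape is an abridged stand-in:

```python
import json
import time

def stats_cb_cost(stats_json: str, iterations: int = 1000) -> float:
    """Average seconds per call for a callback that just parses its payload."""
    start = time.perf_counter()
    for _ in range(iterations):
        json.loads(stats_json)  # roughly what a stats callback does before storing
    return (time.perf_counter() - start) / iterations

# Heavily abridged stand-in for an rdkafka statistics payload (assumed shape).
sample = json.dumps(
    {"name": "rdkafka#consumer-1", "ts": 0, "topics": {"another": {"partitions": {}}}}
)
per_call = stats_cb_cost(sample)
print(f"{per_call * 1e6:.1f} microseconds per callback")
```

Even at a 1 ms statistics interval this per-call cost stays far below the interval itself, which is consistent with the "no visible delays" observation above.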
abs(dateDiff('second', d, now())) < 20
looks risky; CI/CD sometimes runs very slowly. Maybe we can push a fixed timestamp (the beginning of the test) there and compare to that?
Actually, comparing to now() seems more load-agnostic than comparing to a time obtained earlier.
20 seconds should be enough, but OK, I'll change it to 30.
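The tolerance predicate being debated can be modeled outside SQL. This hypothetical Python helper (the names are mine, not from the PR) expresses the same "within N seconds of now" check as abs(dateDiff('second', d, now())) < N, and shows why widening the window matters on a slow CI:

```python
from datetime import datetime, timedelta

def within_seconds(d, tolerance_s, now=None):
    """True if d is within tolerance_s seconds of now (in either direction),
    mirroring abs(dateDiff('second', d, now())) < tolerance_s in SQL."""
    now = now if now is not None else datetime.now()
    return abs((now - d).total_seconds()) < tolerance_s

# On an overloaded CI a recorded timestamp may lag: 25 s of skew fails a
# 20 s window but passes the widened 30 s one.
now = datetime(2023, 9, 1, 12, 0, 30)
d = now - timedelta(seconds=25)
print(within_seconds(d, 20, now), within_seconds(d, 30, now))  # → False True
```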
I believe that test failures are not caused by my changes.
src/Storages/Kafka/KafkaConsumer.cpp
src/Storages/Kafka/StorageKafka.h
Shouldn't that be per consumer (instead of global per table)?
The statistics are global (complete for every consumer); I do not see why it would be better to have an individual copy of the same data in every consumer.
@filimonov, what are the concerns behind this suggestion?
I have to admit that Mikhail is right: the statistics are not global [enough].
ilejn@hetzner:~$ echo "select rdkafka_stat from system.kafka_consumers where table='another_kafka';" | PATH=$PATH:$HOME/projects/ClickHouse_master/build/programs/ clickhouse-client | jq | grep topic
"topic": "another",
"topic": "another",
"topics": {
"topic": "another",
"topic": "another",
"topic": "another",
"topics": {
"topic": "another",
ilejn@hetzner:~$ echo "select rdkafka_stat from system.kafka_consumers where table='kafka';" | PATH=$PATH:$HOME/projects/ClickHouse_master/build/programs/ clickhouse-client | jq | grep topic
"topic": "extop",
"topic": "extop",
"topics": {
"topic": "extop",
"topic": "extop",
"topic": "extop",
"topics": {
"topic": "extop",
So I am moving it to consumers.
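The jq inspection above can be mimicked in a few lines. This hypothetical Python sketch (the abridged payloads are assumed, modeled on librdkafka's statistics JSON) shows why per-consumer storage matters: each consumer's payload carries only its own "topics" map, so one copy per table would misrepresent the other consumers:

```python
import json

def topics_of(rdkafka_stat: str) -> set:
    """Extract topic names from an rdkafka statistics JSON payload."""
    return set(json.loads(rdkafka_stat).get("topics", {}))

# Consumers from two different tables report disjoint topic sets
# (payloads abridged to the relevant key).
stat_another = json.dumps({"topics": {"another": {}}})
stat_kafka = json.dumps({"topics": {"extop": {}}})
print(topics_of(stat_another), topics_of(stat_kafka))  # → {'another'} {'extop'}
```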
Hello Mikhail @filimonov, could you have a look, please?
filimonov
left a comment
LGTM.
One case which is not well covered here is when the exception happens in the materialized view. At that stage there is no good way to attribute the problem to a particular consumer / topic / offset, so it is hard to push that information to the kafka_consumers table (well, maybe push the issue to all the consumers iteratively, but that is also imperfect).
Maybe we can leave it like that for now. Materialized view errors should be visible in other places (system.errors, system.query_view_log, etc.), so it is less important.
Hello @kssenii,
Hello @ilejn,
Hello @Yerkon,
@Yerkon
Hello @Slach, the problem was already solved. It was some missing grants.
/*
 * Needed until
 * https://github.com/mfontanini/cppkafka/pull/309
Looks like this got merged: mfontanini/cppkafka#309
It is worth updating the submodule and removing this function.
@ilejn, you can see that the Kafka library has data races and memory leaks.
Hello @alexey-milovidov, librdkafka is indeed not the best library in the world, but I am not sure I got your point. system.kafka_consumers is supposed to help customers find out about and mitigate [librd]kafka issues. If librdkafka does have memory leaks, extending system.kafka_consumers to cover these types of issues may be a good starting point for fixing them.
Your criticism regarding librdkafka might not be unfounded, yet it is essential to recognize that Kafka's architecture adheres to the principle of "simple broker, complex consumer." This design philosophy necessitates a multitude of sophisticated features that must be supported on the client side. The complexity inherent in these features, especially given their multi-threaded implementation in pure C, and the attempt to overlay a simple external API atop them, naturally gives rise to various issues. While librdkafka may not be flawless, it is the official client, supporting all (or nearly all) Kafka features. To my knowledge, there are no significantly better alternatives in the C++ world right now. The temptation to "write your own Kafka client with blackjack etc." does exist, yet it usually dissipates upon delving into the details and grasping the scale of the challenge at hand. Creating something from scratch, like a client for the Kafka 0.8 protocol without complex authentication methods, is feasible within a reasonable timeframe. However, developing a universal client that supports Kafka protocol 1.0 features and beyond is a task that spans months or years and requires ongoing maintenance thereafter. To give you an idea of the scale:
(LOC stats from https://ghloc.vercel.app/) The client implementations with fewer lines of code usually lack support for some important or modern features of Apache Kafka.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added the system.kafka_consumers table to monitor Kafka consumers.
Documentation entry for user-facing changes