Conversation
Just some feedback as somebody who tried tsdb-gw via rpm after this similar update. I think FPM should be set to depend on librdkafka; there is no librdkafka in any of the el6/amzn1 repos, however there is a nice spec file out there already for users to build it themselves.
How metrictank will probably behave after merge:
Users can build the dependent rpm here:
Suggested update file: https://github.com/grafana/metrictank/blob/master/scripts/build_packages.sh
I'm not sure how to handle it, but this is a breaking change for anybody upgrading the rpm on el6/amzn1. el6 is EOL in 2020.
defer c.wg.Done()

var ok bool
var offsetPtr *int64
offsetPtr is being updated on every msg, but then nothing is done with it.
this value is read by the monitorLag() method: https://github.com/grafana/metrictank/pull/879/files#diff-57960cc1ee87fab707306aab51440a91R251
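For context, a minimal self-contained sketch of that pattern (not the PR's actual code; the partition count, tick interval, and everything except the currentOffsets/monitorLag idea is made up): the consume loop publishes the latest offset per partition through a *int64 with atomic.StoreInt64, and the monitor goroutine reads it with atomic.LoadInt64.

package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

func main() {
    // one *int64 per partition, shared between the consume loop and the monitor
    currentOffsets := map[int32]*int64{0: new(int64), 1: new(int64)}

    // stand-in for monitorLag(): periodically read what the consume loop last stored
    go func() {
        for range time.Tick(time.Second) {
            for partition, ptr := range currentOffsets {
                fmt.Printf("partition %d is at offset %d\n", partition, atomic.LoadInt64(ptr))
            }
        }
    }()

    // stand-in for the consume loop: store the offset of each "message" as it arrives
    for offset := int64(0); offset < 50; offset++ {
        partition := int32(offset % 2)
        if ptr, ok := currentOffsets[partition]; ok {
            atomic.StoreInt64(ptr, offset)
        }
        time.Sleep(100 * time.Millisecond)
    }
}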
}

c.conf.MessageHandler(e.Value, tp.Partition)
atomic.StoreInt64(offsetPtr, int64(tp.Offset))
looks like this should be
atomic.StoreInt64(c.currentOffsets[tp.Partition], int64(tp.Offset))
i was worried about a message arriving from a partition that we did not expect. in such a case we would then call atomic.StoreInt64(nil, int64(tp.Offset)) if we didn't verify first that this partition id exists in c.currentOffsets
oh, i missed that you are setting offsetPtr to c.currentOffsets[tp.Partition] on L#221
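For reference, a hedged sketch of that guard as discussed (currentOffsets, MessageHandler, and the event fields come from the diff above; the surrounding loop is not shown):

// skip messages from partitions we never assigned, so offsetPtr can't be nil
if offsetPtr, ok = c.currentOffsets[tp.Partition]; !ok {
    log.Warn("kafka-consumer: received message for unexpected partition %d, ignoring", tp.Partition)
    continue
}
c.conf.MessageHandler(e.Value, tp.Partition)
atomic.StoreInt64(offsetPtr, int64(tp.Offset))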
kafka/consumer.go (Outdated)
c.consumer.Unassign()
log.Info("kafka-consumer: Revoked partitions: %+v", e)
case confluent.PartitionEOF:
fmt.Printf("%% Reached %v\n", e)
this should either be removed, or use log.Debug. But i think we should just set enable.partition.eof to false in the confluent.ConfigMap to prevent these events from being emitted.
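For illustration, disabling that when building the consumer config could look roughly like this (only the enable.partition.eof key is the point here; the other keys and the c.conf field names are placeholders):

conf := confluent.ConfigMap{
    "bootstrap.servers":    c.conf.Broker, // placeholder field name
    "group.id":             c.conf.Group,  // placeholder field name
    "enable.partition.eof": false,         // no more confluent.PartitionEOF events
}
consumer, err := confluent.NewConsumer(&conf)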
kafka/consumer.go (Outdated)
c.partitionLogSize[partition].Set(int(newest))
}

c.partitionOffset[partition].Set(int(offset))
this is already set on L#252
kafka/consumer.go (Outdated)
currentOffset = time.Now().Add(-1*offsetDuration).UnixNano() / int64(time.Millisecond)
currentOffset, _, err = c.tryGetOffset(topic, partition, currentOffset, 3, time.Second)
if err != nil {
return err
If the offset is outside of the range that kafka has, it will return an error. If that happens we just want to use oldest, not return an error.
actually this comment sounds like in this case it would just use oldest:
https://github.com/confluentinc/confluent-kafka-go/blob/master/kafka/consumer.go#L488-L490
but i'll fall back to oldest in case of any error now
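So something along these lines (a sketch of the agreed fallback, not the final code; tryGetOffset is assumed here to already return a single offset):

offset, err := c.tryGetOffset(topic, partition, currentOffset, 3, time.Second)
if err != nil {
    // whatever went wrong (offset out of range, timeout, ...), start at the oldest message
    log.Warn("kafka-consumer: failed to get offset for %s:%d (%s), falling back to oldest", topic, partition, err)
    offset = int64(confluent.OffsetBeginning)
}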
kafka/consumer.go (Outdated)
return 0, 0, err
}

var val1, val2 int64
let's rename these to something meaningful.
kafka/consumer.go (Outdated)
times, err = c.consumer.OffsetsForTimes(times, c.conf.MetadataTimeout)
if err == nil {
if len(times) == 0 {
err = fmt.Errorf("Got 0 topics returned from broker")
we need to fall back to offsetEnd
why should this fall back to offsetEnd, while in the above case it should fall back to offsetBeginning? shouldn't they both fall back to offsetBeginning?
how is that? (Lines 360 to 383 in 985f69a)
yep, i meant offsetBeginning.
kafka/consumer.go (Outdated)
return c.consumer.Assign(topicPartitions)
}

func (c *Consumer) tryGetOffset(topic string, partition int32, offsetI int64, attempts int, sleep time.Duration) (int64, int64, error) {
why does this return 2 values?
changing that. it made sense in an older version of the code, but not anymore
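A rough sketch of the simplified signature with a single offset return value (the retry loop body and the getOffset helper are illustrative, not the PR's code):

func (c *Consumer) tryGetOffset(topic string, partition int32, offsetI int64, attempts int, sleep time.Duration) (int64, error) {
    var offset int64
    var err error
    for attempt := 1; attempt <= attempts; attempt++ {
        // getOffset is a hypothetical helper that does the actual broker query
        if offset, err = c.getOffset(topic, partition, offsetI); err == nil {
            return offset, nil
        }
        log.Warn("kafka-consumer: getting offset for %s:%d failed (attempt %d/%d): %s", topic, partition, attempt, attempts, err)
        time.Sleep(sleep)
    }
    return 0, err
}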
kafka/partitions.go (Outdated)
continue
}

if tm, ok = metadata.Topics[topic]; !ok || len(tm.Partitions) == 0 {
no need to check if topic is in metadata.Topics, it has already been checked.
Maybe change these 2 checks into
tm, ok := metadata.Topics[topic]
if !ok || tm.Error.Code() == confluent.ErrUnknownTopic {
    log.Warn("kafka: unknown topic %s, %d retries", topic, retry)
    time.Sleep()
    continue
}
if len(tm.Partitions) == 0 {
    log.Warn("kafka: 0 partitions returned for %s, %d retries left, %d backoffMs", topic, retry, backoff)
    sleep()
    continue
}
mdata/notifierKafka/cfg.go (Outdated)
fs.StringVar(&offsetStr, "offset", "last", "Set the offset to start consuming from. Can be one of newest, oldest,last or a time duration")
fs.StringVar(&dataDir, "data-dir", "", "Directory to store partition offsets index")
fs.DurationVar(&offsetCommitInterval, "offset-commit-interval", time.Second*5, "Interval at which offsets should be saved.")
fs.IntVar(&batchNumMessages, "batch-num-messages", 10000, "Maximum number of messages batched in one MessageSet")
as all of these flag vars are just going to be put into a kafka.ConsumerConf{}, why not just use a package global consumerConfig?
e.g.
var consumerConfig kafka.ConsumerConf
func init() {
    consumerConfig = kafka.NewConfig()
    fs := flag.NewFlagSet("kafka-cluster", flag.ExitOnError)
    fs.IntVar(&consumerConfig.MetadataTimeout, "consumer-metadata-timeout-ms", 10000, "Maximum time to wait for the broker to send its metadata in ms")
    ...
mdata/notifierKafka/notifierKafka.go (Outdated)
break EVENTS
}
default:
fmt.Printf("Ignored unexpected event: %s\n", ev)
this should be a log message
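e.g. something like this, assuming the same log package as the rest of the PR:

default:
    log.Warn("kafka-notifier: ignored unexpected event: %s", ev)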
Force-pushed fdd5285 to 7f010ed.
where is it being disabled? i don't see that in our code and according to https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md it defaults to true
Force-pushed 0884f9e to 718029b.
here: https://github.com/grafana/metrictank/pull/879/files#diff-57960cc1ee87fab707306aab51440a91R102
I ran two types of benchmarks now, comparing the current confluent branch with master. When I fed them with a steady stream of data that was below their maximum, the cpu & memory usage metrics look pretty similar. When they have to replay the backlog, which maxes them out, the current confluent branch seems to be quite a lot faster, but it also uses much more CPU/memory. I think I should probably check if there is a way to optimize this memory usage.
[dashboard screenshots: replaying backlog (master, confluent); steady consuming (master, confluent)]
we decided to go forward with sarama instead. see #906
Replaces the sarama consumers with confluent ones.
Also gets rid of the duplication between the kafka notifier and kafka input by moving all kafka consumer related stuff into a new struct that's used by both of them.
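Roughly, the shared piece looks like this (a sketch based on the fields visible in the diffs above; exact names and types may differ in the final code):

// ConsumerConf holds everything the kafka input and the kafka notifier configure differently.
type ConsumerConf struct {
    Topics          []string
    MetadataTimeout int
    // MessageHandler is the per-message callback each user of the consumer plugs in.
    MessageHandler func(data []byte, partition int32)
}

// Consumer wraps a confluent consumer plus the offset/lag bookkeeping shared by both users.
type Consumer struct {
    conf           ConsumerConf
    consumer       *confluent.Consumer
    currentOffsets map[int32]*int64
    wg             sync.WaitGroup
}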