kafka-cluster partitions don't necessarily match kafka-mdm-in partitions #950

Data comes into Metrictank (MT) on the data topic (let's call that `mdm`). When chunks are persisted to Cassandra, summaries of those chunks are sent to the persist topic (let's call that `persist`). On startup, MT reads `persist` to avoid overwriting chunks in Cassandra that were already persisted. The summaries in `persist` therefore have to line up with the data in `mdm` for the instance handling a given partition.

1. MT doesn't enforce *how* the data in `mdm` is partitioned, only that the partitioning is consistent.
2. Partitioning in `persist` is either "ByOrg" or "BySeries" (see [here](https://github.com/grafana/metrictank/blob/85a628a3771b2961d324e1eed2ef01df3bf2e3bf/mdata/notifierKafka/notifierKafka.go#L257) and [here](https://github.com/grafana/metrictank/blob/d12b42fc10dec9486b8184bd5ff7efd39aa48aab/metrictank-sample.ini#L296)). If the data in `mdm` is partitioned any other way, the summary for a chunk can land in a `persist` partition that doesn't correspond to the `mdm` partition its data arrived on, so the instance handling that data may not see that the chunk was already persisted.

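It seems to me that the simpler solution is to use `def.Partition` to put the summaries into `persist`, so each summary goes to the same partition number its data came in on, regardless of how `mdm` was partitioned.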

I am rolling out this change on our side and can submit a cleanup PR. However, I'm not sure whether removing the parameter entirely would be backwards compatible.


