
Cache labels.Labels to Identify Series in Remote Write #5849

Merged
csmarchbanks merged 2 commits into prometheus:master from csmarchbanks:rw-use-labels
Sep 4, 2019

Conversation

@csmarchbanks (Member) commented Aug 9, 2019

This is an experiment to have the remote write component use labels.Labels instead of []prompb.Label when passing samples around and identifying series. A labels.Label uses less space, so steady-state memory usage should be lower, with the tradeoff of converting labels.Labels to []prompb.Label for each remote request.
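The conversion described above can be sketched roughly like this. This is a minimal, self-contained Go sketch; `Label`, `Labels`, and `ProtoLabel` are simplified stand-ins for the real `labels.Label` and `prompb.Label` types, not the actual Prometheus definitions:

```go
package main

import "fmt"

// Stand-ins for the real types (assumptions): labels.Label holds only
// Name/Value, while prompb.Label carries protobuf bookkeeping, so
// caching series as Labels is cheaper at steady state.
type Label struct{ Name, Value string }
type Labels []Label

type ProtoLabel struct{ Name, Value string }

// labelsToProto converts once per remote request; the series cache
// itself keeps only the lighter Labels representation.
func labelsToProto(ls Labels) []ProtoLabel {
	out := make([]ProtoLabel, 0, len(ls))
	for _, l := range ls {
		out = append(out, ProtoLabel{Name: l.Name, Value: l.Value})
	}
	return out
}

func main() {
	series := Labels{{"__name__", "up"}, {"job", "node"}}
	fmt.Println(labelsToProto(series)) // [{__name__ up} {job node}]
}
```

The cost of the conversion is paid once per send rather than held in memory for every cached series, which is the tradeoff the benchmarks below measure.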

Results after removing some extra sources of allocations look very good. This is for a server with three remote-write targets, and ~8k samples per second ingestion.

Benchmarks

benchmark                     old ns/op     new ns/op     delta
BenchmarkSampleDelivery-8     998914        1007751       +0.88%

benchmark                     old allocs     new allocs     delta
BenchmarkSampleDelivery-8     7117           6117           -14.05%

benchmark                     old bytes     new bytes     delta
BenchmarkSampleDelivery-8     688391        640400        -6.97%

Heap profiles

Heap profiles show a 50% reduction in memory used by StoreSeries as expected.

v2.11.1: (heap profile screenshot)

rw-use-labels branch: (heap profile screenshot)

Other metrics

CPU usage is very similar between the servers. However, there is a reduction in allocations:
(screenshot: reduced allocations graph)

Contributor

Does this really help? Changing the number of shards doesn't happen that often.

Member Author

Benchmarks improved somewhat when I first added it. The pool is used for every HTTP request, not just when the number of shards changes, though it could probably be done by keeping a single []prompb.TimeSeries per shard instead of using a pool, similar to the compression buffer.
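The per-shard buffer idea mentioned here could look roughly like the following. This is an illustrative Go sketch with made-up types (`TimeSeries`, `shard`, `buildBatch` are hypothetical names, not the actual queue manager code):

```go
package main

import "fmt"

// TimeSeries is a stand-in for prompb.TimeSeries.
type TimeSeries struct {
	Value float64
}

// shard keeps one reusable slice instead of drawing from a shared
// sync.Pool, mirroring how the compression buffer is handled.
type shard struct {
	pendingBuf []TimeSeries // reused across sends; grows to peak batch size
}

func (s *shard) buildBatch(n int) []TimeSeries {
	// Reslice to zero length, keeping capacity from previous sends,
	// so steady-state sends allocate nothing new.
	s.pendingBuf = s.pendingBuf[:0]
	for i := 0; i < n; i++ {
		s.pendingBuf = append(s.pendingBuf, TimeSeries{Value: float64(i)})
	}
	return s.pendingBuf
}

func main() {
	s := &shard{}
	s.buildBatch(100) // first send grows the buffer
	b := s.buildBatch(50)
	fmt.Println(len(b), cap(b) >= 100) // 50 true -- capacity is retained
}
```

Because each shard owns its buffer, there is no pool contention and no risk of the GC discarding warm buffers between requests.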

Contributor

You can be smarter here, I think. I presume this is where the allocations are.

Member Author

👍 Many were here, and some still come from making []prompb.Sample in buildWriteRequest. The latest update helps both of these.

@csmarchbanks csmarchbanks marked this pull request as ready for review August 10, 2019 04:05
Contributor

defer causes an alloc. I'd avoid sync.Pool entirely; it has interesting interactions with the GC, and in this case we know exactly how much memory we need.

Member Author

👍 Removed the pool.

@brian-brazil (Contributor)

That's looking okay. What do the benchmarks say now?

@csmarchbanks (Member Author)

I resolved the merge conflict and updated the benchmarks: slightly fewer allocations than last time, and CPU jumps around between runs but is about the same before and after.

I don't expect the graphs to change too much, but can repost them in a few hours if interested.

@csmarchbanks (Member Author)

@brian-brazil would you mind taking another look when you have a moment?

@brian-brazil (Contributor)

👍

Member Author

@bwplotka I just updated this PR to make it build after the streaming remote read work was merged. Is it ok that labelsToLabelsProto no longer interns all the labels?

I don't see any calls to interner.release, so it is probably better not to intern during the remote read path anyway.
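For context, a refcounted interner of the kind being discussed pairs each intern with a release; if release is never called, entries can never be evicted, which is the leak concern above. A toy Go sketch (illustrative only, not Prometheus's actual interner):

```go
package main

import "fmt"

// entry tracks how many holders still reference an interned string.
type entry struct {
	s    string
	refs int
}

type interner struct {
	pool map[string]*entry
}

func newInterner() *interner { return &interner{pool: map[string]*entry{}} }

// intern returns the canonical copy of s, bumping its refcount.
func (in *interner) intern(s string) string {
	if e, ok := in.pool[s]; ok {
		e.refs++
		return e.s
	}
	in.pool[s] = &entry{s: s, refs: 1}
	return s
}

// release drops one reference; the entry is only evicted at zero.
// A path that interns without ever releasing leaks entries forever.
func (in *interner) release(s string) {
	e, ok := in.pool[s]
	if !ok {
		return
	}
	if e.refs--; e.refs == 0 {
		delete(in.pool, s)
	}
}

func main() {
	in := newInterner()
	a := in.intern("job")
	in.intern("job")
	in.release(a)
	fmt.Println(len(in.pool)) // 1: one reference remains
	in.release(a)
	fmt.Println(len(in.pool)) // 0: fully released
}
```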

Member

looks like remote read only uses labelsToLabelsProto in tests so it should be okay

@cstyan (Member) left a comment

👍

I saw similar results to the benchmark: about the same CPU usage, and anywhere from 15-30% less memory usage.


Member

Hmmm, it would be nice if we could use a different name. To me "sample" implies just a timestamp and value, but I can't think of anything better right now.

Member Author

I could call it something like queueSample to differentiate it a bit, but I don't know if that gains anything.

This will use half the steady-state memory required by prompb.Label.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>

5 participants