tsdb:Optimize LabelValues API performance (#14551)#18069
tsdb:Optimize LabelValues API performance (#14551)#18069bboreham merged 1 commit intoprometheus:mainfrom
Conversation
bboreham
left a comment
There was a problem hiding this comment.
Thanks for this.
If it's not in the first postings, I would expect seek() to examine further postings until it finds one with the target or reaches the end.
I expect it still works in your implementation because you're calling it from a loop that will call it again. But this is not the standard behaviour of seek(). Maybe calling it something else would be fine.
I'll rename it to seekHead or updateHead to make it clear it's only operating on the top element. ig seekHead sounds better? |
Yes, ok with |
|
BTW there is a tool |
Why not use If you have a good reason to need a new benchmark, please include this in your PR. |
the existing one covers the high-overlap case (like 90% match) where |
|
I have now studied the actual algorithmic change you made; it's a great find. Thanks again. |
Happy to know u liked the changes, i would like to modify the stuff on my own this time although it might take 3-4hrs as am currently engaged with one thing but surely i will do it |
613665b to
27983fa
Compare
|
hey @bboreham i renamed seek to seekHead so it's less confusing, and I nuked that unused next() method entirely (plus updated the tests to use seekHead instead). |
27983fa to
42a86d6
Compare
…#14551) Signed-off-by: Divyansh Mishra <divyanshmishra@Divyanshs-MacBook-Air-3.local>
42a86d6 to
fcb6806
Compare
Hey @bboreham, just a reminder to check out this PR |
I get a pretty good speedup on the existing benchmark: But yours is a bigger percentage: |
bboreham
left a comment
There was a problem hiding this comment.
Great, code change is very clean now.
I didn't study the new benchmark closely.
tsdb: Optimize LabelValues for sparse intersections (Fixes #14551)
Description
This PR optimizes the
FindIntersectingPostingsfunction to improve the performance ofLabelValuesqueries when matchers are involved.The Problem:
In scenarios where a matcher selects a range of IDs that is disjoint from or far ahead of the current candidate postings (e.g.,
matcheris at ID 1,000,000 but thecandidateis at ID 0), the previous implementation would callNext()on the candidate roughly 1,000,000 times to catch up. This resulted in O(N) complexity where N is the number of non-matching IDs, causing significant performance degradation and even timeouts for large datasets.The Solution:
I've updated
FindIntersectingPostingsto useSeek()on the candidate instead ofNext(). This allows the iterator to skip non-matching ranges efficiently (O(log N)), aligning the performance ofLabelValueswith the much fasterSeriesAPI for equivalent queries.Benchmarks
I created a reproduction benchmark
BenchmarkLabelValues_SlowPathto simulate the issue (dense candidates, sparse matcher).Results:
Verification
go test ./tsdb/index/...to ensure no regressions in the index package.LabelValuesreturns correct results with the optimization applied.Which issue(s) does the PR fix:
Fixes #14551
Does this PR introduce a user-facing change?