tsdb: Early compaction of stale series #16929
Conversation
Force-pushed from 7f92b48 to f6d7ac4
/prombench main

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

/prombench cancel

Benchmark cancel is in progress.
Looks like stale series tracking is not working. Stale samples are not being appended for the series, I guess.
Force-pushed from 4486bdb to e693e22
/prombench main

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

/prombench stop

Incorrect command.

/prombench main

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

/prombench cancel

Benchmark cancel is in progress.

/prombench cancel

Benchmark cancel is in progress.
Took a quick look at the profiles and confirmed that instant queries are taking the extra CPU. In the profile, the red box is the additional CPU that stale series compaction introduces, since every instant query now has to look at the block on disk. There is no way around it.
Here are the profiles that I downloaded from prombench:

The memory results look really good. For sure something we will want behind a feature flag for now. If we can improve on the CPU overhead, this may be something to enable by default in the future.
I'm surprised instant queries (or any queries) against TSDB blocks are so CPU intensive on Prometheus. Or... the prombench results are not realistic, e.g. it spams queries far more often than users realistically would. One important case is of course alerting/recording rules: if they hit a TSDB block, that block should be partially cached then (see below). It would be useful to understand the CPU overhead of a single common query against the in-mem head vs. a TSDB block. We could also cache the index a bit, at least for stale near-real-time blocks, to mitigate some CPU with a bit more memory (hopefully this won't diminish the memory results; this cache should only be short-lived, for similar queries or heavy instant query load 🤔). Maybe we learn about some need for optimizing the TSDB read path with this work (: I still think this would be an interesting mode, e.g. for us (Google), where we keep local query capability for debugging in some cases but use cloud as a first order. Thanks for the extensive research!
Force-pushed from 2cc719a to 56f761d
It queries frequently, which might be taken to simulate a large user population or a lot of recording rules, but perhaps more importantly it never queries more than 1 hour back. That is what PRs like prometheus/test-infra#782 are seeking to change. So if you make every query hit every block, that will make quite a difference.
Force-pushed from 2bc6610 to 5b1e6fe
Hello from the bug-scrub! @jesusvazquez I see you were assigned - do you think you will get a chance to look at it? |
Force-pushed from 393bd08 to b209783
I'll have a look at this next week, starting my PTO today for a few days 🙏
Force-pushed from 5265f19 to 3f51be0
@jesusvazquez I have synced this PR with the main branch and fixed the lint; it is ready for review.
jesusvazquez left a comment:
Left a few comments; overall in good shape.
Force-pushed from 519a2d2 to 72590c4
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
Force-pushed from 72590c4 to 3e4a094
SuperQ left a comment:
Nice. Based on our production testing, we found that ~50% was a good threshold.
Should we document any recommendations or wait for more user feedback?
We should wait for some user feedback. IMO it's more to do with the pattern in which the stale series ratio goes up and down and the available memory headroom, and less about the actual value of the ratio. As part of my upcoming talk, I plan to do some more testing of the config options.
Closes #13616
Based on prometheus/proposals#55
Stale series tracking was added in #16925. This PR compacts the stale series into its own block before the normal compaction hits. Here is how the config works:
stale_series_compaction_threshold: As soon as the ratio of stale series in the head block crosses StaleSeriesImmediateCompactionThreshold, TSDB performs a stale series compaction: it puts all the stale series into their own block and removes them from the head, but it does not remove them from the WAL. (Technically this condition is checked every minute, so it is not exactly immediate.)
Additional details
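The threshold check described above can be sketched as a tiny Go function. This is a minimal illustration of the ratio comparison only; the name `shouldCompactStaleSeries` and its signature are hypothetical, not Prometheus' actual internal API.

```go
package main

import "fmt"

// shouldCompactStaleSeries is a hypothetical sketch of the condition the
// PR describes: once the fraction of stale series in the head crosses the
// configured threshold, a stale series compaction is triggered. In the PR
// this check runs roughly once a minute, so compaction is near-immediate
// rather than exactly immediate.
func shouldCompactStaleSeries(numStale, numSeries int, threshold float64) bool {
	if numSeries == 0 || threshold <= 0 {
		// No series in the head, or the feature is effectively disabled.
		return false
	}
	return float64(numStale)/float64(numSeries) >= threshold
}

func main() {
	// With the ~50% threshold suggested by production testing above:
	fmt.Println(shouldCompactStaleSeries(600, 1000, 0.5)) // true: 60% stale
	fmt.Println(shouldCompactStaleSeries(100, 1000, 0.5)) // false: 10% stale
}
```

Note that the compacted series stay in the WAL, so a crash-restart still replays them; only the in-memory head shrinks.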