Optimise WAL loading by removing extra map and caching min-time #9160
codesome merged 10 commits into prometheus:main from
Conversation
codesome
left a comment
LGTM, but you need a rebase for a couple of things :)

- `head.go` was split into multiple files, so `processWALSamples` and `loadWAL` are now in `head_wal.go`.
- The place where you are caching `mmMaxTime` in `loadWAL` recently had an update, so you will need to cache it at 2 different places now.
force-pushed from 0c509e6 to ce1229e
codesome
left a comment
LGTM! Since you have a CI failure to fix, I have left a few more nits for the comments to go along with it :)
force-pushed from 25392e3 to 3858117
It's failing a WAL corruption unit test now, which I shall attempt to debug, but if the answer is instantly apparent to someone with more familiarity with tsdb, please shout.

If you don't get to it by next week, I will investigate the failure on Tuesday/Wednesday next week.
force-pushed from ba9d074 to b6cdf1f
The failure was due to skipping […]. Incidentally I question why […]
There will always be a head chunk with at least 1 sample (we m-map a chunk when we are going to add a sample to the head chunk that will be newly created). But that is only when things are running normally; if there was WAL corruption and no head chunk exists, then we should actually look at m-mapped chunks too :) So, something to fix here.
Not sure, need to check the usage.
codesome
left a comment
There was a problem hiding this comment.
LGTM. But there is a conflict again :(
BenchmarkLoadWAL: close WAL after use. So that goroutines are stopped and resources released. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
BenchmarkLoadWAL: make series IDs co-prime with #workers. Series are distributed across workers by taking the modulus of the ID with the number of workers, so multiples of 100 are a poor choice. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
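The distribution problem is easy to see in miniature. A worker count of 10 is illustrative here, not the loader's actual default:

```go
package main

import "fmt"

// Series are assigned to WAL-replay workers by ref modulo the worker count.
func workerFor(ref uint64, numWorkers int) int {
	return int(ref % uint64(numWorkers))
}

func main() {
	const numWorkers = 10
	counts := make([]int, numWorkers)
	// Benchmark series IDs that are multiples of 100 all collapse onto
	// worker 0, leaving the other workers idle.
	for ref := uint64(100); ref <= 2000; ref += 100 {
		counts[workerFor(ref, numWorkers)]++
	}
	fmt.Println(counts) // [20 0 0 0 0 0 0 0 0 0]
}
```

Choosing IDs co-prime with the worker count spreads the series evenly instead.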
BenchmarkLoadWAL: simulate mmapped chunks. Real Prometheus cuts chunks every 120 samples, then skips those samples when re-reading the WAL. Simulate this by creating a single mapped chunk for each series, since the max time is all the reader looks at. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
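A hypothetical sketch of that benchmark trick: rather than writing real 120-sample chunks, give every series one fake m-mapped chunk whose `maxTime` covers the samples the loader would skip. Names and sizes here are illustrative, not the benchmark's actual code.

```go
package main

import "fmt"

// A fake m-mapped chunk; only maxTime matters to the WAL reader.
type mmappedChunk struct {
	minTime, maxTime int64
}

func main() {
	const (
		numSeries       = 4
		samplesPerChunk = 120
	)
	mmapped := map[uint64][]mmappedChunk{}
	for ref := uint64(1); ref <= numSeries; ref++ {
		// One chunk stands in for everything up to the last cut point.
		mmapped[ref] = []mmappedChunk{{minTime: 0, maxTime: samplesPerChunk - 1}}
	}
	fmt.Println(len(mmapped), mmapped[1][0].maxTime) // 4 119
}
```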
Fix comment. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Remove series map from processWALSamples(). The lock that the map was commented as reducing contention on is now sharded 32,000 ways, so it won't be contended. Removing the map saves memory and is just as fast. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
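A minimal sketch of a stripe-locked series map, in the spirit of tsdb's `stripeSeries`. The stripe count below is illustrative (the comment above says 32,000); the point is that with thousands of stripes and a handful of replay workers, contention on any single lock is negligible, so a per-worker cache in front of `getByID` buys nothing.

```go
package main

import (
	"fmt"
	"sync"
)

const stripeCount = 1 << 14 // illustrative; must be a power of two

type series struct{ ref uint64 }

type stripeSeries struct {
	locks [stripeCount]sync.RWMutex
	byID  [stripeCount]map[uint64]*series
}

func newStripeSeries() *stripeSeries {
	s := &stripeSeries{}
	for i := range s.byID {
		s.byID[i] = map[uint64]*series{}
	}
	return s
}

// getByID takes only the one lock covering this ref's stripe.
func (s *stripeSeries) getByID(ref uint64) *series {
	i := ref & (stripeCount - 1)
	s.locks[i].RLock()
	defer s.locks[i].RUnlock()
	return s.byID[i][ref]
}

func (s *stripeSeries) set(ref uint64, ser *series) {
	i := ref & (stripeCount - 1)
	s.locks[i].Lock()
	s.byID[i][ref] = ser
	s.locks[i].Unlock()
}

func main() {
	ss := newStripeSeries()
	ss.set(42, &series{ref: 42})
	fmt.Println(ss.getByID(42).ref) // 42
}
```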
loadWAL: cache the last mmapped chunk time. So we can skip calling append() for samples it will reject. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
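The caching idea can be sketched as below. Field and function names are illustrative, not the exact tsdb ones: the series keeps the max time of its newest m-mapped chunk, so replay can drop samples the append would reject anyway, without the call overhead.

```go
package main

import "fmt"

// Illustrative stand-in for the series struct with the cached time.
type memSeries struct {
	mmMaxTime int64 // max time of the last m-mapped chunk
}

type sample struct {
	t int64
	v float64
}

// replaySamples counts how many samples would actually be appended.
func replaySamples(s *memSeries, samples []sample) (appended int) {
	for _, sa := range samples {
		if sa.t <= s.mmMaxTime {
			continue // already covered by an m-mapped chunk; skip append entirely
		}
		// ... the real append(sa.t, sa.v) would go here ...
		appended++
	}
	return appended
}

func main() {
	s := &memSeries{mmMaxTime: 1000}
	n := replaySamples(s, []sample{{t: 500}, {t: 1000}, {t: 1500}})
	fmt.Println(n) // 1: only the sample newer than mmMaxTime survives
}
```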
Improvements from code review. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Full stops and capitals on comments. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Cache max time in both places mmappedChunks is updated. Including a refactor to extract function `setMMappedChunks`, to reduce code duplication. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Update head min/max time when mmapped chunks are added. This ensures we have the correct values if no WAL samples are added for that series. Note that `mSeries.maxTime()` was always `math.MinInt64` before, since that function doesn't consider mmapped chunks. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
force-pushed from b6cdf1f to fdd24c0
Rebased.
Commits brought in from main by the rebase (this PR's own commit messages are listed above):

* Fix `kuma_sd` targetgroup reporting (#9157)
* Snapshot in-memory chunks on shutdown for faster restarts (#7229)
* Remove Individual Data Type Caps in Per-shard Buffering for Remote Write (#8921)
* Avoid deadlock when processing duplicate series record (#9170)
* Split Go and React Tests (#8897)
* Remove search keymap from new expression editor (#9184)
This PR replaces #8645; it includes the first four commits from there, which address the benchmark issues raised in #9101.
Fixes #8638.
Two code changes:

A. Remove the `refSeries` cache built in `processWALSamples()`. "Mitigate lock contention in getByID", it said, but the locks are sharded, so with any likely number of workers there is no contention.
B. Cache the max time of the last m-mapped chunk in the `memSeries` struct.

I spent a lot of time exploring different options, and concluded this is a good combination.
Benchmarking them both separately:
- Just A: memory is reduced when we have a lot of series, but it runs a little slower.
- Just B: a bit faster, slightly higher memory use.
- A+B: faster and lower memory.
However the benefit is much clearer when we load a real 5GB WAL - it needs 19% less CPU and 24% less RAM:
All benchmarks run on:
(It is disappointing that we end up carrying the max timestamp around for the life of the program even though we only use it at startup, but I couldn't see an easy way around that.)