Skip to content

tsdb: turn off transaction isolation for head compaction#11317

Merged
codesome merged 4 commits intoprometheus:mainfrom
bboreham:disisolate-compaction
Sep 27, 2022
Merged

tsdb: turn off transaction isolation for head compaction#11317
codesome merged 4 commits intoprometheus:mainfrom
bboreham:disisolate-compaction

Conversation

@bboreham
Copy link
Member

This saves a lot of work tracking appends done while compaction is ongoing.

head.compactable() ensures that the end of the compaction range is more than chunkRange/2 back from now, and head.appendableMinValidTime() ensures that no new appends can start within the compaction range.

We do need to wait for any overlapping appenders that started previously to finish.

Fixes #9253.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
So that we can see when appending has moved past a certain point in time.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This will be used when for head compaction.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham bboreham requested a review from codesome as a code owner September 16, 2022 13:54
@bboreham bboreham force-pushed the disisolate-compaction branch from 68ededf to 341a51d Compare September 16, 2022 16:03
@bboreham
Copy link
Member Author

CI failing on Windows only:

=== RUN   TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint
    head_test.go:1028: 
        	Error Trace:	D:\a\prometheus\prometheus\tsdb\head_test.go:1028
        	Error:      	Not equal: 
        	            	expected: 9999
        	            	actual  : 13851
        	Test:       	TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint
--- FAIL: TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint (0.31s)

It seems to have read more data than was written?

This saves a lot of work tracking appends done while compaction is ongoing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham bboreham force-pushed the disisolate-compaction branch from 341a51d to 7b62131 Compare September 17, 2022 12:59
@bboreham
Copy link
Member Author

I did a git commit --amend with no change and the test passes now, so I'll call that one a flake.

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will have a very big impact on some of our instances that fall over into a memory death spiral.

@beorn7
Copy link
Member

beorn7 commented Sep 20, 2022

🎉

I think this should be looked at by @codesome as the most qualified person. I'll have a detailed look ASAP, too, but if there is anything wrong with this, @codesome is much more likely to spot it.

@beorn7 beorn7 mentioned this pull request Sep 27, 2022
8 tasks
@codesome codesome merged commit ff00dee into prometheus:main Sep 27, 2022
@bboreham
Copy link
Member Author

bboreham commented Nov 1, 2022

This was the result on one of the production systems at my work: memory usage dropped by half.
image

https://twitter.com/_codesome/status/1577264183208398848

roidelapluie pushed a commit to roidelapluie/prometheus that referenced this pull request Nov 23, 2022
…11317)

* tsdb: add a basic test for read/write isolation

* tsdb: store the min time with isolationAppender
So that we can see when appending has moved past a certain point in time.

* tsdb: allow RangeHead to have isolation disabled
This will be used when for head compaction.

* tsdb: do head compaction with isolation disabled
This saves a lot of work tracking appends done while compaction is ongoing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham bboreham deleted the disisolate-compaction branch August 30, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory Leak in v2.27.1: tsdb.txRing consumes much heap memory

4 participants