feat: Logging and fsync delay for retention deletion (#27114)
devanbenz merged 30 commits into master-1.x
Conversation
I suspect this area of the code is where we "hang" during retention policy deletion. It only occurs in very high cardinality dbs (~10 million+ series). `DeleteSeriesID` holds a mutex lock and issues an fsync call. Running this millions of times, with lock contention and disk I/O from various other writers/readers, could potentially go on for days. This PR adds a WARN log that triggers every 24 hours while we're looping through and deleting series.
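To make the bottleneck concrete, here is a minimal sketch of the per-delete fsync shape described above. The `seriesFile` type, its fields, and method names are illustrative stand-ins, not InfluxDB's actual API; the counter stands in for the fsync syscall.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesFile is a toy stand-in for the series-file structure described
// above; the field and method names are illustrative, not InfluxDB's
// actual API.
type seriesFile struct {
	mu        sync.Mutex
	syncCalls int
}

// deleteSeriesID mimics the pre-fix shape: every call takes the lock,
// writes its tombstone, and fsyncs before returning.
func (f *seriesFile) deleteSeriesID(id uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	_ = id        // ... write tombstone entry for id ...
	f.syncCalls++ // stands in for the per-call fsync
}

func main() {
	f := &seriesFile{}
	for id := uint64(1); id <= 1000; id++ {
		f.deleteSeriesID(id)
	}
	fmt.Println("fsyncs issued:", f.syncCalls) // one per series
}
```

With one fsync per series, deleting millions of series means millions of synchronous disk flushes, all taken under the same lock.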
davidby-influx left a comment:
I think extracting the call to Flush and making DeleteSeriesID take an iterable construct would be worth testing. The only place it is called outside of a loop is in a test, which could easily be changed to a single element array.
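The suggestion above can be sketched as follows: a batch-taking `deleteSeriesIDs` that does all the tombstone writes under one lock and flushes once at the end. Names here are hypothetical, not the real InfluxDB signatures.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesFile is a toy stand-in; names are illustrative, not the real API.
type seriesFile struct {
	mu         sync.Mutex
	flushCalls int
}

func (f *seriesFile) flush() { f.flushCalls++ } // stands in for the fsync

// deleteSeriesIDs deletes the whole batch under one lock and flushes
// once at the end, instead of fsyncing inside every per-series call.
func (f *seriesFile) deleteSeriesIDs(ids []uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for _, id := range ids {
		_ = id // ... write tombstone entry ...
	}
	f.flush()
}

func main() {
	f := &seriesFile{}
	f.deleteSeriesIDs(make([]uint64, 10000))
	fmt.Println("flushes:", f.flushCalls) // 1 for the whole batch
}
```

The single-element-array case the comment mentions falls out naturally: a caller that previously deleted one series just passes a one-element slice.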
gwossum left a comment:
A few minor questions, but overall nice job on finding this performance bottleneck!
@devanbenz - Please create port issues for this in main-2.x and RR. Also add it to the epic issue: OSS 2.9.0 cherry-picks.
My thought was to have
I kind of think we should keep the code as is, since it's been using ForEach for many years. I'm tempted not to change the code too much and to just move the flush semantics around. I think having DeleteSeriesID as an atomic operation is fine; if I wanted to move the iterable inside the deletion I would probably implement a
davidby-influx left a comment:
There's a big opportunity for another optimization here....
Please add to the 2.9.0 epic if you haven't yet.
- pre-allocate slices
- add a constant for SeriesN in segment flush tests
- check error returns in tests
davidby-influx left a comment:
LGTM.
Nice work; this should give us a big speed-up for high-cardinality databases.
This area of the code is where we "hang" during retention policy deletion. It only occurs in very high cardinality dbs (~10 million+ series). DeleteSeriesID holds a mutex lock and does an fsync call. Running this millions of times, plus contention and disk I/O between various other writers/readers, could potentially go on for days. This PR batches sync operations instead of running a sync during every series deletion op. It also adds additional logging to retention series deletion. (cherry picked from commit c836ac2)
Notes on changes:
After running influxdb with inch running in the background, I see the following:
That's ~7.5ms per DeleteSeriesID call.
So for ~35 million series cardinality it should take approximately ~3 days to run. I would expect it to take longer in practice due to lock contention and higher resource utilization; my test also had far less cardinality.
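The ~3 day figure follows directly from the measured per-call latency; a quick back-of-the-envelope check (the helper name is mine, not from the PR):

```go
package main

import "fmt"

// estimateDays multiplies per-call latency by cardinality and converts
// milliseconds to days: a sanity check of the estimate above.
func estimateDays(series int, perCallMs float64) float64 {
	totalMs := float64(series) * perCallMs
	return totalMs / 1000 / 60 / 60 / 24
}

func main() {
	// ~7.5ms per DeleteSeriesID call times ~35M series.
	fmt.Printf("~%.1f days\n", estimateDays(35_000_000, 7.5)) // ~3.0 days
}
```

35,000,000 × 7.5ms ≈ 262,500 seconds ≈ 73 hours, i.e. roughly 3 days before accounting for contention.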
It does appear that the fsync is killing performance.
The time is now so small that I need to adjust my calculations:
~2ms for 10k series, versus ~7ms for a single series previously.
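Taking those two measurements at face value, the end-to-end difference for the ~35M-series case works out as follows (helper names are mine, for illustration):

```go
package main

import "fmt"

// oldHours: total time at ~perSeriesMs per single-series delete.
func oldHours(series int, perSeriesMs float64) float64 {
	return perSeriesMs * float64(series) / 1000 / 3600
}

// newSeconds: total time at ~perBatchMs per batch of batchSize deletes.
func newSeconds(series, batchSize int, perBatchMs float64) float64 {
	return perBatchMs * float64(series/batchSize) / 1000
}

func main() {
	// 35M series: ~7ms per series before vs ~2ms per 10k-series batch after.
	fmt.Printf("old: ~%.0f hours, new: ~%.0f seconds\n",
		oldHours(35_000_000, 7.0), newSeconds(35_000_000, 10000, 2.0))
}
```

Per these rough numbers, the fsync-bound portion drops from tens of hours to single-digit seconds; real runs would still pay for the tombstone writes, lock contention, and other I/O.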
This PR also adds a log that triggers every 10k series while looping through and deleting series, for better debugging.