[influxdb1.x/2.x]: purger code may prevent compactions from finishing #26110

@philjb

Description

Copied from this comment:

Requires investigation to confirm the issues noted below.

  • purger uses a Go map, which never releases its backing memory, so the map stays as large as the largest set of compaction input files it has ever held (which can be substantial for a full compaction). This is a minor issue, but it could be remedied by allocating a fresh map whenever len(p.files) == 0.
  • The purger appears to suffer from the same class of issue that @gwossum identified in the retention service: it calls tsmfile.InUse() and later tsmfile.Close() without blocking new readers in between, a time-of-check/time-of-use flaw. Because tsmfile.Close() waits for readers to finish (Unref()), it has unbounded runtime. The purger's lock is held across that Close() call, so the lock can be held for an unbounded time. That might seem acceptable because it happens in the purger's "purge" goroutine, but purger.Add(files []TSMFile) also needs that lock, and Add() is called synchronously from FileStore.replace(), which gives replace() unbounded runtime as well. If this reading is correct, purging TSM files could freeze up compaction.
    • One mitigation is to call go f.purger.add(inuse) to decouple the purger from compaction, which is nice, but it would be better to use a sync.Map, and/or reduce how long the purger's lock is held, and/or move TSM file closing into its own goroutine, and/or call SetNewReadersBlocked before the InUse() check, as the retention service does.
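The map-reallocation remedy from the first bullet can be sketched as follows; the purger struct and its field names here are assumptions based on the issue text (p.files), not the real tsm1 implementation:

```go
package main

// purger is a minimal stand-in for the purger described above; field and
// method names (files, add, remove) are hypothetical.
type purger struct {
	files map[string]struct{}
}

func (p *purger) add(paths []string) {
	for _, path := range paths {
		p.files[path] = struct{}{}
	}
}

// remove deletes a purged file and, once the map drains, allocates a fresh
// map. Deleting keys from a Go map never shrinks its backing storage, so
// without the reallocation the map stays sized for the largest batch of
// compaction input files ever added.
func (p *purger) remove(path string) {
	delete(p.files, path)
	if len(p.files) == 0 {
		p.files = make(map[string]struct{})
	}
}
```

Reallocating only when the map is empty keeps the fix cheap: no copying of live entries, and steady-state behavior is unchanged.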
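The SetNewReadersBlocked ordering suggested in the second bullet can be sketched as below. The tsmFile interface mirrors the method names mentioned in the issue (SetNewReadersBlocked, InUse, Close), but the interface and the fakeFile type are hypothetical stand-ins, not the real tsm1 API:

```go
package main

// tsmFile models only the methods referenced above; this is a sketch of the
// intended call ordering, not the real tsm1 TSMFile interface.
type tsmFile interface {
	SetNewReadersBlocked(bool) error
	InUse() bool
	Close() error
}

// tryPurge blocks new readers *before* checking InUse, so the check cannot
// be invalidated between InUse() and Close() (the time-of-check/time-of-use
// flaw described above). It returns true only if the file was closed.
func tryPurge(f tsmFile) (bool, error) {
	if err := f.SetNewReadersBlocked(true); err != nil {
		return false, err
	}
	if f.InUse() {
		// Still referenced: lift the block and let a later pass retry,
		// rather than waiting (and holding locks) for readers to finish.
		return false, f.SetNewReadersBlocked(false)
	}
	return true, f.Close()
}

// fakeFile is an in-memory stand-in used to exercise tryPurge.
type fakeFile struct {
	refs    int
	blocked bool
	closed  bool
}

func (f *fakeFile) SetNewReadersBlocked(b bool) error { f.blocked = b; return nil }
func (f *fakeFile) InUse() bool                       { return f.refs > 0 }
func (f *fakeFile) Close() error                      { f.closed = true; return nil }
```

Because a file that is still in use is skipped instead of waited on, Close() is only ever called on a file with no readers, which bounds how long any lock held around the purge step can be held.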
