
Reclaim ColumnFileBig that cannot be contained by any segment #5950

@breezewish


Enhancement

Suppose we write a ColumnFileBig into the memtable of a segment without any delta:

-Inf                            +Inf
  |<------------------------------>| Segment
                     |<-CFBig->|

Then physical split happens:

-Inf                            +Inf
  |<--------------->|<------------>| Segments
                     |<-CFBig->|

The ColumnFileBig is now referenced by both segments, and:

  1. The right segment may trigger a delta merge because the delta layer is big.

  2. The left segment will not trigger a delta merge because its delta is still effectively empty: the CFBig it references is not contained in the segment's range.

As a result, the ColumnFileBig stays referenced and is not recycled until the user manually triggers a DeltaMerge for all segments.
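
To make the scenario concrete, here is a minimal toy model in Python. It is only a sketch of the situation described above, not TiFlash code: the `Segment` and `ColumnFileBig` classes, the integer key ranges, the split point, and the `dangling_refs` helper are all hypothetical names and values chosen for illustration. It shows that after a split, the left segment still holds a reference to the file even though none of the file's range falls inside the segment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnFileBig:
    # Hypothetical stand-in for a big column file covering [start, end).
    name: str
    start: float
    end: float


@dataclass
class Segment:
    # Hypothetical stand-in for a segment covering [start, end).
    start: float
    end: float
    delta: list = field(default_factory=list)  # files referenced by the delta layer

    def overlaps(self, f: ColumnFileBig) -> bool:
        # Half-open range intersection test.
        return f.start < self.end and self.start < f.end

    def effective_delta(self) -> list:
        # Files that actually contribute data inside this segment's range.
        return [f for f in self.delta if self.overlaps(f)]


def physical_split(seg: Segment, split_point: float):
    # Assumption of this sketch: both halves keep a reference to every
    # file in the original delta, regardless of the file's range.
    left = Segment(seg.start, split_point, list(seg.delta))
    right = Segment(split_point, seg.end, list(seg.delta))
    return left, right


def dangling_refs(segments):
    # (segment, file) pairs where the file has no data in the segment's range.
    return [(s, f) for s in segments for f in s.delta if not s.overlaps(f)]


INF = float("inf")
cf_big = ColumnFileBig("cf_big", 60, 90)
segment = Segment(-INF, INF, [cf_big])
left, right = physical_split(segment, 60)
```

In this model `left.effective_delta()` is empty while `left.delta` still lists `cf_big`, so a size-based merge trigger on the left segment never fires and the reference is never dropped. A reclaim pass could, under these assumptions, look for exactly the pairs returned by `dangling_refs([left, right])`.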

This happens when we ingest SSTs quickly (using a higher ingest concurrency), and it resulted in 25% space amplification in my experiment.
