Add background gc thread for DeltaTree storage #1507

@JaySon-Huang

Description

Currently, DeltaTree (DT) storage filters out outdated data only while running "MergeDelta"/"SegmentSplit"/"SegmentMerge". These tasks are triggered by the number of written rows or the number of written delete_range entries.

https://github.com/pingcap/tics/blob/2e8dd5486f2e72546327b5f12c602afd8bcc0d2a/dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp#L830-L837

If users issue lots of "delete" statements at the SQL level, rows with a "delete mark" are written into DT. Right after those delete rows are written, they may not yet be eligible for filtering when "MergeDelta"/"SegmentSplit"/"SegmentMerge" runs, because the gc-safepoint has not advanced past them.
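To make the safepoint problem concrete, here is a minimal sketch of a simplified MVCC GC rule. It is not the actual DeltaTree code: `RowVersion` and `filterOutdated` are hypothetical names, and the rule (keep everything newer than the safepoint; at or below the safepoint keep only the newest version per key, and drop it if it is a delete mark) is an assumption about how such a filter typically behaves.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified row: one version of one key.
struct RowVersion
{
    int64_t key;
    uint64_t version; // commit timestamp
    bool delete_mark;
};

// Assumes `rows` is sorted by (key ascending, version ascending).
// Versions newer than the safepoint are always kept. Among versions at or
// below the safepoint, only the newest one per key survives, and it is
// dropped as well if it is a delete mark.
std::vector<RowVersion> filterOutdated(const std::vector<RowVersion> & rows, uint64_t gc_safepoint)
{
    std::vector<RowVersion> out;
    std::size_t i = 0;
    while (i < rows.size())
    {
        // [i, j) is the run of versions belonging to one key.
        std::size_t j = i;
        while (j < rows.size() && rows[j].key == rows[i].key)
            ++j;

        // Index of the newest version at or below the safepoint (j means "none").
        std::size_t newest_old = j;
        for (std::size_t k = i; k < j; ++k)
            if (rows[k].version <= gc_safepoint)
                newest_old = k;

        if (newest_old != j && !rows[newest_old].delete_mark)
            out.push_back(rows[newest_old]);

        // Everything newer than the safepoint may still be read, so keep it.
        for (std::size_t k = i; k < j; ++k)
            if (rows[k].version > gc_safepoint)
                out.push_back(rows[k]);

        i = j;
    }
    return out;
}
```

With a put at version 10 and a delete at version 20, a safepoint of 15 keeps both rows, while a safepoint of 25 removes both. If no task runs again after the safepoint advances, the two rows stay on disk.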

After that, TiKV garbage-collects the outdated data and merges regions that it considers "too small".
TiFlash also applies the merge-region command, but the rows are left on disk: the conditions that trigger "MergeDelta"/"SegmentSplit"/"SegmentMerge" are not met, so no compaction runs on those key ranges.


To solve this problem, we may:

  • After some regions are merged, add a hint for DT
  • If DT scans a large amount of data from disk but the result after the MVCC filter is quite small, add a hint

The hint should trigger data compaction (a background merge-delta) in DT for the affected key ranges.
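The second kind of hint could be sketched roughly as below. This is only an illustration under assumptions: `ScanStats`, `GcHintConfig`, `shouldHintCompaction`, and both threshold values are hypothetical, not existing DeltaTree names or settings.

```cpp
#include <cstdint>

// Hypothetical per-scan counters collected while reading a segment.
struct ScanStats
{
    uint64_t rows_read_from_disk;
    uint64_t rows_after_mvcc_filter;
};

// Hypothetical thresholds; real values would need tuning.
struct GcHintConfig
{
    uint64_t min_scanned_rows = 8192; // ignore small scans
    double max_valid_ratio = 0.1;     // hint when <= 10% of rows survive
};

// Returns true when a scan read many rows but few survived the MVCC filter,
// which suggests the key range holds mostly outdated data worth compacting.
bool shouldHintCompaction(const ScanStats & stats, const GcHintConfig & cfg = GcHintConfig())
{
    if (stats.rows_read_from_disk < cfg.min_scanned_rows)
        return false;
    const double valid_ratio
        = static_cast<double>(stats.rows_after_mvcc_filter) / static_cast<double>(stats.rows_read_from_disk);
    return valid_ratio <= cfg.max_valid_ratio;
}
```

A caller would check this after each scan and, when it returns true, enqueue a background merge-delta for the scanned segment. The minimum-row threshold keeps small scans from triggering needless compaction.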

Labels: type/enhancement (The issue or PR belongs to an enhancement.)
