Description
Currently, DeltaTree (DT) storage filters out outdated data while running "MergeDelta"/"SegmentSplit"/"SegmentMerge". These tasks are triggered by the number of written rows or the number of written delete_range entries.
If users issue many "delete" statements at the SQL level, rows with a "delete mark" are written into DT. Right after those delete rows are written, they may not yet be eligible for filtering when "MergeDelta"/"SegmentSplit"/"SegmentMerge" runs, because the gc-safepoint has not advanced past them.
After that, TiKV GCs the outdated data and merges regions that it considers "too small".
TiFlash also applies the merge-region command. However, the rows are left on disk, because the conditions that trigger a "MergeDelta"/"SegmentSplit"/"SegmentMerge" task are not met, so no compaction runs on those key ranges.
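The interaction with the gc-safepoint described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not TiFlash code: `Row`, `compact`, and the rule "keep every version newer than the safepoint, plus the latest older version unless it is a delete mark" are hypothetical stand-ins for the real MVCC filter.

```python
from typing import NamedTuple


class Row(NamedTuple):
    key: int
    version: int   # commit timestamp of this version
    deleted: bool  # True if the row carries a delete mark


def compact(rows, gc_safepoint):
    """Return the rows that survive compaction under a simplified GC rule."""
    by_key = {}
    for row in rows:
        by_key.setdefault(row.key, []).append(row)

    kept = []
    for key in sorted(by_key):
        versions = sorted(by_key[key], key=lambda r: r.version)
        older = [r for r in versions if r.version <= gc_safepoint]
        newer = [r for r in versions if r.version > gc_safepoint]
        # Of the versions at or below the safepoint, only the latest can
        # still be visible, and only if it is not a delete mark.
        if older and not older[-1].deleted:
            kept.append(older[-1])
        # Versions above the safepoint may still be read, so keep them all.
        kept.extend(newer)
    return kept


# One key: written at ts=10, deleted at ts=20.
rows = [Row(1, 10, False), Row(1, 20, True)]

early = compact(rows, gc_safepoint=15)  # safepoint has not passed the delete
late = compact(rows, gc_safepoint=25)   # safepoint has passed the delete

print(len(early), len(late))
```

In this model, a compaction that runs before the safepoint passes the delete keeps both rows. They disappear only if another compaction runs later, which is exactly what does not happen when no further writes arrive to trigger one.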
To solve this problem, we may:
- After some regions are merged, add a hint for DT.
- If DT scans a large amount of data from disk but the result after the MVCC filter is quite small, add a hint.
The hint should trigger data compaction (a background merge-delta) for DT.
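The two hint sources proposed above could be tracked roughly as follows. This is a minimal sketch, not the DT implementation: `CompactionHints`, its method names, the per-segment bookkeeping, and the thresholds `min_scanned_rows` and `max_useful_ratio` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    rows_scanned: int = 0    # rows read from disk
    rows_returned: int = 0   # rows left after the MVCC filter
    region_merged: bool = False


class CompactionHints:
    """Hypothetical tracker that decides when to hint a background merge-delta."""

    def __init__(self, min_scanned_rows=10_000, max_useful_ratio=0.1):
        self.min_scanned_rows = min_scanned_rows
        self.max_useful_ratio = max_useful_ratio
        self.stats = {}

    def _get(self, segment_id):
        return self.stats.setdefault(segment_id, SegmentStats())

    def on_region_merged(self, segment_id):
        # Hint source 1: a region merge touched this segment's key range.
        self._get(segment_id).region_merged = True

    def on_scan(self, segment_id, rows_scanned, rows_returned):
        # Hint source 2: accumulate how much of the scanned data was useful.
        stats = self._get(segment_id)
        stats.rows_scanned += rows_scanned
        stats.rows_returned += rows_returned

    def should_compact(self, segment_id):
        stats = self._get(segment_id)
        if stats.region_merged:
            return True
        if stats.rows_scanned == 0 or stats.rows_scanned < self.min_scanned_rows:
            return False  # too little evidence to justify a compaction
        return stats.rows_returned / stats.rows_scanned <= self.max_useful_ratio

    def on_compacted(self, segment_id):
        # Reset the counters once the background merge-delta has run.
        self.stats.pop(segment_id, None)


hints = CompactionHints()
hints.on_scan("seg-1", rows_scanned=50_000, rows_returned=120)     # mostly garbage
hints.on_scan("seg-2", rows_scanned=50_000, rows_returned=40_000)  # mostly live
hints.on_region_merged("seg-3")

to_compact = [s for s in ("seg-1", "seg-2", "seg-3") if hints.should_compact(s)]
print(to_compact)
```

Requiring a minimum number of scanned rows before applying the ratio check keeps small scans from triggering compaction on noise. Suitable threshold values would have to come from measurement.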