Description
We have met some extreme cases that block PageStorage GC from running normally, for example:
- A moderate GC policy leaves lots of legacy PageFiles: "TiFlash OOM because of too many legacy PageFile are not GC" #1550
- Duplicated PageFiles throw an exception during GC: "Duplicate PageFile make TiFlash can not GC delta data" #2169
- MPP tasks may hold streams (and the related segment snapshots) after being canceled: "MPPTask::runImpl is not exception safe" #2322
These extreme cases/bugs leave lots of PageFiles on disk. When running GC, we open lots of (several hundred to thousands of) PageFiles and read all of their meta parts from disk in PageFile::MetaMergingReader::initialize.
https://github.com/pingcap/tics/blob/ec5f976a8fb85db497d3f9f67cd0717885d8075a/dbms/src/Storages/Page/gc/LegacyCompactor.cpp#L110-L119
https://github.com/pingcap/tics/blob/ec5f976a8fb85db497d3f9f67cd0717885d8075a/dbms/src/Storages/Page/PageFile.cpp#L249-L255
Assume that each meta part of a PageFile is 500 KiB. If there are 2000 PageFiles left on disk, then each round of GC needs to read about 1 GiB of meta data into memory.
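The arithmetic above can be written down as a small cost model (the 500 KiB and 2000 figures are the assumed numbers from this issue, not measured values):

```cpp
#include <cstdint>

constexpr uint64_t KiB = 1024;

// Bytes buffered in one GC round when every PageFile's meta part is
// read eagerly in initialize(): numPageFiles * metaPartBytes.
uint64_t eagerMetaBytes(uint64_t numPageFiles, uint64_t metaPartBytes)
{
    return numPageFiles * metaPartBytes;
}
```

With 2000 PageFiles at 500 KiB each, this comes to roughly 976 MiB, i.e. about 1 GiB per GC round.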
Instead of reading all meta parts of (thousands of) PageFiles at once, we can allocate a smaller buffer in PageFile::MetaMergingReader::initialize and read the remaining data from disk while running PageFile::MetaMergingReader::moveNext.
This change keeps some file descriptors open for a while and calls ::read several times, but it reduces the memory cost when there are lots of PageFiles.
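The idea can be sketched with a toy reader (hypothetical class and member names, not the actual PageFile::MetaMergingReader interface; a byte vector stands in for the on-disk meta part): initialize() fills only a small window, and moveNext() refills it from disk when the window is exhausted, so peak memory is bounded by the window size instead of the meta part size.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of lazy meta reading. `meta` stands in for the meta part
// stored on disk; `window` is the smaller up-front buffer size.
class LazyMetaReader
{
public:
    LazyMetaReader(std::vector<uint8_t> meta, size_t window)
        : meta_(std::move(meta)), window_(window) {}

    // Read only the first window instead of the whole meta part.
    void initialize() { fillBuffer(0); }

    // Consume one byte-sized "record", refilling the buffer from
    // "disk" (meta_) when the current window is exhausted. Each
    // refill models one extra ::read call on the kept-open fd.
    bool moveNext(uint8_t & record)
    {
        if (pos_ >= meta_.size())
            return false;
        if (pos_ - buf_offset_ >= buf_.size())
            fillBuffer(pos_);
        record = buf_[pos_ - buf_offset_];
        ++pos_;
        return true;
    }

    // Peak buffered bytes are bounded by the window, not the part size.
    size_t peakBufferBytes() const { return window_; }

private:
    void fillBuffer(size_t offset)
    {
        buf_offset_ = offset;
        const size_t n = std::min(window_, meta_.size() - offset);
        buf_.assign(meta_.begin() + offset, meta_.begin() + offset + n);
    }

    std::vector<uint8_t> meta_;
    size_t window_;
    std::vector<uint8_t> buf_;
    size_t buf_offset_ = 0;
    size_t pos_ = 0;
};
```

In the real change, the trade-off is the same as described above: the file descriptor stays open across moveNext calls and ::read runs once per refill, in exchange for a bounded buffer per PageFile.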