Skip to content

PageStorage can not GC legacy files due to non-continuous sequence number #2317

@JaySon-Huang

Description

@JaySon-Huang

After "PageStorage multi-threads write #464", we treat multiple-meta files as a WAL for the PageStorage.
There would be a case for letting holes in the WAL (combined from multiple meta files) and stop PageStorage from compacting those legacy files.
https://github.com/pingcap/tics/blob/03e10febb96f0861ef0cca3b5b07d73983bc1fe8/dbms/src/Storages/Page/gc/LegacyCompactor.cpp#L143-L152

Thread 1 tries to write a WriteBatch with seq=999 (called wb1), thread 2 try to write a WriteBatch with seq=1000 (called wb2). However, wb2 is committed to disk first. And the process crashes in the middle of writing wb1. After recovering from disk, the wb1 is throw away while wb2 is left.
Then there would be a hole in the WAL.
We need to automatically recover from crashes in the middle from writing, so just skip the hole and continue to compact.

Related logging:
From "PageStorage: db_45.t_154.log collectPageFilesToCompact stop on PageFile_29674_0, type: Legacy, sequence: 29391230 last sequence: 29391228", we can see that the GC stop on the same PageFile_29674_0, type: Legacy for a long time, and the "sequence" and "last sequence" is not consecutive.

[2021/06/30 04:31:18.934 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log restore 0 puts and 11868 refs and 0 deletes and 11867 upserts from checkpoint PageFile_29670_0 sequence: 29391228"] [thread_id=5147]
[2021/06/30 04:31:18.936 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log collectPageFilesToCompact stop on PageFile_29674_0, type: Legacy, sequence: 29391230 last sequence: 29391228"] [thread_id=5147]
[2021/06/30 04:31:26.260 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log LegacyCompactor::tryCompact exit without compaction, candidates size: 0, compact_legacy_min_num: 3"] [thread_id=5147]
[2021/06/30 04:31:26.291 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log PageFile_49627_0 is full, create new PageFile_49631_0 for write [path=/data1/tidb-data/tiflash-9000/data/t_154/log]"] [thread_id=6565]
[2021/06/30 04:31:26.663 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log GC decide to migrate 7 files, containing 14479 pages to PageFile_49619_1, path /data1/tidb-data/tiflash-9000/data/t_154/log"] [thread_id=5147]
[2021/06/30 04:31:32.237 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log Migrate pages to PageFile_49619_1, migrate: [((49602,1),13353),((49614,1),840),((49615,0),51),((49616,0),61),((49617,0),51),((49618,0),53),((49619,0),70),], remove: [], Config{ PageStorage::Config {gc_min_files:3, gc_min_bytes:67108864, gc_max_valid_rate:0.950, gc_min_legacy_num:3, gc_max_expect_legacy: 100, gc_max_valid_rate_bound: 0.950, prob_do_gc_when_write_is_low:100, open_file_max_idle_time:15} }"] [thread_id=5147]
[2021/06/30 04:31:32.237 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log GC have migrated 14479 Pages to PageFile_49619_1"] [thread_id=5147]
[2021/06/30 04:31:32.283 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log gcApply remove 20 invalid snapshots, 2 snapshots left, longest lifetime 0.061 seconds, created from thread_id 6548"] [thread_id=5147]
[2021/06/30 04:31:32.353 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log GC exit within 113.44 sec. PageFiles from [29670,0,Checkpoint] to [49619,0,Formal], num files: 22845, num legacy:22838, compact legacy archive files: 0, remove data files: 0, gc apply: 0 puts and 0 refs and 0 deletes and 14479 upserts"] [thread_id=5147]

[2021/06/30 04:33:26.014 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log restore 0 puts and 11868 refs and 0 deletes and 11867 upserts from checkpoint PageFile_29670_0 sequence: 29391228"] [thread_id=6370]
[2021/06/30 04:33:26.017 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log collectPageFilesToCompact stop on PageFile_29674_0, type: Legacy, sequence: 29391230 last sequence: 29391228"] [thread_id=6370]
[2021/06/30 04:33:33.326 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log LegacyCompactor::tryCompact exit without compaction, candidates size: 0, compact_legacy_min_num: 3"] [thread_id=6370]
[2021/06/30 04:33:33.963 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log GC decide to migrate 16 files, containing 14182 pages to PageFile_49627_1, path /data1/tidb-data/tiflash-9000/data/t_154/log"] [thread_id=6370]
[2021/06/30 04:33:39.122 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log Migrate pages to PageFile_49627_1, migrate: [((49619,1),13690),((49620,0),50),((49621,0),47),((49622,0),55),((49623,0),61),((49624,0),68),((49625,0),68),((49626,0),68),((49627,0),75),], remove: [(49602,1),(49614,1),(49615,0),(49616,0),(49617,0),(49618,0),(49619,0),], Config{ PageStorage::Config {gc_min_files:3, gc_min_bytes:67108864, gc_max_valid_rate:0.950, gc_min_legacy_num:3, gc_max_expect_legacy: 100, gc_max_valid_rate_bound: 0.950, prob_do_gc_when_write_is_low:100, open_file_max_idle_time:15} }"] [thread_id=6370]
[2021/06/30 04:33:39.122 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log GC have migrated 14182 Pages to PageFile_49627_1"] [thread_id=6370]
[2021/06/30 04:33:39.156 +08:00] [DEBUG] [<unknown>] ["PageStorage: db_45.t_154.log gcApply remove 218 invalid snapshots, 2 snapshots left, longest lifetime 6.971 seconds, created from thread_id 6677"] [thread_id=6370]
[2021/06/30 04:33:39.580 +08:00] [INFO] [<unknown>] ["PageStorage: db_45.t_154.log GC exit within 116.03 sec. PageFiles from [29670,0,Checkpoint] to [49627,0,Formal], num files: 22854, num legacy:22838, compact legacy archive files: 0, remove data files: 7, gc apply: 0 puts and 0 refs and 0 deletes and 14182 upserts"] [thread_id=6370]

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions