TiFlash OOM because too many legacy PageFiles are not GC'd #1550

@JaySon-Huang

In a user scenario, TiFlash memory usage kept rising, and the process was killed due to OOM roughly every 30 minutes.
After troubleshooting, we found that some PageFiles in the Legacy state were not being compacted because the GC policy for TiFlash's Delta data is moderate. As a result, nearly 10,000 files, ranging from hundreds of KB to several MB each, were read from disk every time GC ran.
The OOM was resolved after we provided tools that let users stop the TiFlash process and compact the delta data.
Presumably, frequently allocating and freeing small chunks of memory triggered an unknown bug that kept memory usage growing.

For now, we cannot reproduce the steadily growing memory usage, but we can mitigate it by adjusting the GC threshold and performing more aggressive compaction.
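To make the mitigation concrete, here is a minimal sketch of how a tighter GC threshold changes the decision to compact. It is not TiFlash's implementation: `GcThresholds`, `should_compact_legacy`, and all numeric values are hypothetical names and settings chosen for illustration, and the file sizes only approximate the shape reported above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GcThresholds:
    # Hypothetical knobs for illustration; not TiFlash's real settings.
    max_legacy_files: int
    max_legacy_bytes: int


def should_compact_legacy(file_sizes: List[int], thresholds: GcThresholds) -> bool:
    """Return True when the legacy files exceed either the count or the size limit."""
    return (
        len(file_sizes) > thresholds.max_legacy_files
        or sum(file_sizes) > thresholds.max_legacy_bytes
    )


# A "moderate" policy tolerates many small legacy files; an "aggressive" one does not.
moderate = GcThresholds(max_legacy_files=50_000, max_legacy_bytes=64 * 1024**3)
aggressive = GcThresholds(max_legacy_files=100, max_legacy_bytes=1 * 1024**3)

# Roughly the reported shape: about 10,000 files of a few hundred KB each.
legacy_sizes = [512 * 1024] * 10_000

print(should_compact_legacy(legacy_sizes, moderate))    # False
print(should_compact_legacy(legacy_sizes, aggressive))  # True
```

Under the moderate thresholds the 10,000 small files stay below both limits and are re-read on every GC round; the aggressive thresholds trigger compaction as soon as the file count passes the lower limit.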
