Description
Motivation
At present, Doris queries often hit the mem limit, which causes many of them to fail before they can complete.
Raising the query's mem_limit can help, but in some memory-constrained scenarios it is futile.
Disk capacity is usually about 100 times that of memory. If we can spill data that exceeds the memory limit to disk, the problem above is largely solved. The trade-off is that disk is much slower than memory, so queries that spill will take longer to execute.
Spilling to disk brings the following benefits:
- In memory-constrained scenarios, memory is freed up at the expense of query execution time. This trade-off is necessary in some scenarios.
- Doris can process larger queries without being bound by memory constraints.
Implementation
- `BufferedBlockMgr2` and `DiskIOMgr` already support spilling in-memory data to disk. We need to use them to write data to a temporary work area on disk. The default location of this work area is `doris-scratch`. When an operation completes, its data is removed from disk.
- There are three versions of `BufferedTupleStream`, which is confusing. We need to unify this abstraction, since it is central to spilling to disk.
- Implement spill-to-disk support for the following execution nodes, in order:
  - Sort
  - Aggregation
  - Analytic function
  - Join
- Remove redundant code, such as `BufferedTupleStream`, `HashTable`, and so on.
- Optimize spilling to disk:
  - Limit the size of temporary files
  - Limit the I/O rate of spilling to disk
  - Take advantage of SSD I/O capability
  - Compress and decompress spilled data
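
To make the plan above concrete, here is a minimal Python sketch of a sort that spills to disk. It is not Doris code and does not use `BufferedBlockMgr2` or `DiskIOMgr`; the function `spilling_sort`, its parameters, and the `ScratchLimitExceeded` exception are all hypothetical names chosen for illustration. It only shows the general shape of the idea: sorted runs are written to a temporary scratch directory once an in-memory row budget is reached, the spilled data is compressed, the total scratch size is capped, and the scratch data is removed when the operation completes.

```python
import gzip
import heapq
import os
import pickle
import shutil
import tempfile


class ScratchLimitExceeded(Exception):
    """Hypothetical error: spilled data exceeded the scratch size limit."""


def spilling_sort(rows, max_rows_in_memory, scratch_limit_bytes, scratch_parent=None):
    """Sort `rows`, spilling sorted runs to disk when the buffer is full.

    Returns (sorted_rows, bytes_spilled). The result is materialized as a
    list only to keep the sketch short; a real engine would stream it.
    """
    # Temporary work area; the "doris-scratch-" prefix is illustrative only.
    scratch_dir = tempfile.mkdtemp(prefix="doris-scratch-", dir=scratch_parent)
    run_paths = []
    bytes_spilled = 0
    buffer = []

    def spill():
        nonlocal bytes_spilled
        buffer.sort()
        path = os.path.join(scratch_dir, f"run-{len(run_paths)}.gz")
        # Compress the spilled run to reduce disk usage and I/O volume.
        with gzip.open(path, "wb") as f:
            for row in buffer:
                pickle.dump(row, f)
        run_paths.append(path)
        bytes_spilled += os.path.getsize(path)
        # Enforce the size limit on temporary files.
        if bytes_spilled > scratch_limit_bytes:
            raise ScratchLimitExceeded(f"spilled {bytes_spilled} bytes")
        buffer.clear()

    def read_run(path):
        # Stream rows back one at a time so a run is never fully in memory.
        with gzip.open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    try:
        for row in rows:
            buffer.append(row)
            if len(buffer) >= max_rows_in_memory:
                spill()
        buffer.sort()
        # K-way merge of the on-disk runs and the remaining in-memory rows.
        runs = [read_run(p) for p in run_paths]
        merged = list(heapq.merge(*runs, buffer))
        return merged, bytes_spilled
    finally:
        # When the operation completes (or fails), remove the spilled data.
        shutil.rmtree(scratch_dir, ignore_errors=True)
```

For example, `spilling_sort(data, max_rows_in_memory=100, scratch_limit_bytes=1 << 20)` sorts `data` while holding at most 100 rows in the buffer at a time. The sketch counts rows instead of bytes and omits I/O rate limiting, both of which a real implementation would need to handle.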