[Enhancement] Limit the amount of error_log written during load to save disk space #27481

@freemandealer

Search before asking

  • I have searched the issues and found no similar issues.

Description

Description in English:
During the load process, if there are problems with the source data, we store the error data in an error_log file on disk for later debugging. However, if there is a lot of error data, this file can occupy a large amount of disk space. We now want to limit the amount of error data that is saved to disk.

  1. Become familiar with how Doris's load (import) feature is used and with its internal implementation
  2. Add a new BE configuration item load_error_log_limit_bytes, with a default value of 200MB
  3. Use the new threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk
  4. Write regression cases for testing and verification
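
One way to picture steps 2 and 3 is a small byte budget wrapped around the error-log sink. The sketch below is only an illustration under stated assumptions: `LimitedErrorLog`, `kLoadErrorLogLimitBytes`, and the choice to stop logging permanently once the budget is hit are all hypothetical and are not taken from the Doris code base. The real change would live inside RuntimeState::append_error_msg_to_file and read the limit from the BE config.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the proposed BE config item
// load_error_log_limit_bytes (default 200MB). Not real Doris code.
constexpr std::size_t kLoadErrorLogLimitBytes = 200ULL * 1024 * 1024;

// Hypothetical helper: forwards error messages to an output stream
// until a byte budget is used up, then drops the rest.
class LimitedErrorLog {
public:
    LimitedErrorLog(std::ostream& out, std::size_t limit_bytes)
            : _out(out), _limit_bytes(limit_bytes) {}

    // Writes `msg` plus a trailing newline if it fits in the remaining
    // budget. Returns true if the message was written, false if dropped.
    bool append(const std::string& msg) {
        assert(_written_bytes <= _limit_bytes);
        const std::size_t needed = msg.size() + 1; // +1 for '\n'
        if (_limit_reached || needed > _limit_bytes - _written_bytes) {
            // Assumption: once one message does not fit, stop logging
            // entirely so the file stays a clean prefix of the errors.
            _limit_reached = true;
            ++_dropped;
            return false;
        }
        _out << msg << '\n';
        _written_bytes += needed;
        return true;
    }

    std::size_t written_bytes() const { return _written_bytes; }
    std::size_t dropped() const { return _dropped; }
    bool limit_reached() const { return _limit_reached; }

private:
    std::ostream& _out;
    std::size_t _limit_bytes;
    std::size_t _written_bytes = 0;
    std::size_t _dropped = 0;
    bool _limit_reached = false;
};
```

In this sketch the sink is any `std::ostream`, so a regression-style check (step 4) can use a `std::ostringstream` with a tiny limit instead of a real file, while production code would pass an `std::ofstream` and `kLoadErrorLogLimitBytes`. Counting dropped messages is optional, but it would let the load report how many error rows were not logged.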

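Description in Chinese (translated):
During the load process, if the source data has problems, we store the error data in an error_log file on disk to make later debugging easier. However, if there is a lot of error data, it takes up a large amount of disk space, so the amount of error data written to disk needs to be limited.

  1. Become familiar with how Doris's load feature is used and with its internal implementation flow
  2. Add a new BE configuration item load_error_log_limit_bytes, with a default value of 200MB
  3. Use the new threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk
  4. Write regression cases for testing and verification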

Solution

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
