
Refine error handling/cancel in TiFlash MPP system #5095

@windtalker

Description


Enhancement

Currently, error handling and cancellation in TiFlash are error-prone and have caused many issues, such as #4441, #4219, and #4202.
We want to refine the error handling and cancellation logic in the TiFlash MPP system to make it less error-prone.

Some basic ideas:

  • Refine MPPTunnel
    • MPPTunnel has three modes: local, sync, and async. Currently, the MPPTunnel implementation branches on the is_local and is_async flags, which makes the code complex and error-prone.
  • MPPTunnel, BlockIO, and ExchangeReceiver should be treated as the top-level components of an MPPTask
  • Each top-level component of an MPPTask should implement its own cancel and handleError methods
  • Like MPPTask::cancel, there should be an MPPTask::handleError method that handles errors based on the task status
  • Like cancel, there should be a query-level error handling method, so that once an MPPTask meets an error, all related tasks of the same query on the same TiFlash node can see the error and stop running
  • A local tunnel should not introduce a direct dependency from the send task to the receive task
  • Avoid printing too many meaningless logs when an error or cancellation happens, as "Do not print write to tunnel which is already closed log if query is canceled expectedly before" (#4208) did
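The MPPTunnel refactoring idea can be sketched in C++ as one sender class per mode, chosen once when the tunnel is built, so that no method has to branch on is_local or is_async. This is a minimal sketch under stated assumptions: TunnelSender, LocalSender, SyncSender, AsyncSender, TunnelMode, and MPPTunnelSketch are hypothetical names, not TiFlash's actual classes, and the three senders share a trivial in-memory buffer where real ones would differ in how they deliver packets.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: each tunnel mode implements its own write/cancel,
// replacing is_local / is_async flag checks scattered through the tunnel.
class TunnelSender {
public:
    virtual ~TunnelSender() = default;
    virtual std::string mode() const = 0;
    // Returns false once the sender has been cancelled.
    virtual bool write(const std::string & packet) = 0;
    virtual void cancel(const std::string & reason) = 0;
};

// Shared bookkeeping for the illustrative senders below (an assumption:
// real senders would push to a local queue or a gRPC stream instead).
class BufferedSender : public TunnelSender {
public:
    bool write(const std::string & packet) override {
        if (cancelled) return false;
        sent.push_back(packet);
        return true;
    }
    void cancel(const std::string & reason) override {
        cancelled = true;
        cancel_reason = reason;
    }
    std::vector<std::string> sent;
    bool cancelled = false;
    std::string cancel_reason;
};

class LocalSender : public BufferedSender {
public:
    std::string mode() const override { return "local"; }
};
class SyncSender : public BufferedSender {
public:
    std::string mode() const override { return "sync"; }
};
class AsyncSender : public BufferedSender {
public:
    std::string mode() const override { return "async"; }
};

enum class TunnelMode { Local, Sync, Async };

// The mode is resolved once, at construction, instead of on every call.
std::unique_ptr<TunnelSender> makeSender(TunnelMode mode) {
    switch (mode) {
    case TunnelMode::Local: return std::make_unique<LocalSender>();
    case TunnelMode::Sync: return std::make_unique<SyncSender>();
    case TunnelMode::Async: return std::make_unique<AsyncSender>();
    }
    return nullptr;
}

// Hypothetical tunnel that only delegates; it contains no mode-specific logic.
class MPPTunnelSketch {
public:
    explicit MPPTunnelSketch(TunnelMode mode) : sender(makeSender(mode)) {}
    bool write(const std::string & packet) { return sender->write(packet); }
    void cancel(const std::string & reason) { sender->cancel(reason); }
    std::string mode() const { return sender->mode(); }

private:
    std::unique_ptr<TunnelSender> sender;
};
```

Adding or changing a mode then touches one class rather than every flag check in the tunnel.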
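A minimal sketch of how per-component cancel/handleError, a status-based MPPTask::handleError, and a query-level error could fit together. All names here (TaskComponent, TaskStatus, QueryErrorState, MPPTaskSketch, RecordingComponent) are hypothetical, and the transition rule (only a running task may move to Failed or Cancelled, and only once) is an assumption chosen to illustrate "based on the task status".

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface for a task's top-level components
// (tunnel, BlockIO, ExchangeReceiver in the issue's terms).
class TaskComponent {
public:
    virtual ~TaskComponent() = default;
    virtual void cancel(const std::string & reason) = 0;
    virtual void handleError(const std::string & error) = 0;
};

enum class TaskStatus { Running, Finished, Cancelled, Failed };

// Hypothetical query-level error slot shared by all tasks of one query on a node.
class QueryErrorState {
public:
    // Records only the first error; returns true if this call set it.
    bool setError(const std::string & error) {
        std::lock_guard<std::mutex> lock(mu);
        if (has_error) return false;
        has_error = true;
        first_error = error;
        return true;
    }
    bool hasError() const {
        std::lock_guard<std::mutex> lock(mu);
        return has_error;
    }
    std::string error() const {
        std::lock_guard<std::mutex> lock(mu);
        return first_error;
    }

private:
    mutable std::mutex mu;
    bool has_error = false;
    std::string first_error;
};

class MPPTaskSketch {
public:
    explicit MPPTaskSketch(std::shared_ptr<QueryErrorState> query_state_)
        : query_state(std::move(query_state_)) {}
    void addComponent(std::shared_ptr<TaskComponent> c) { components.push_back(std::move(c)); }

    // Only a running task moves to Failed; a second error, or an error after
    // cancel/finish, is a no-op. The error is also published query-wide.
    void handleError(const std::string & error) {
        TaskStatus expected = TaskStatus::Running;
        if (!status.compare_exchange_strong(expected, TaskStatus::Failed)) return;
        query_state->setError(error);
        for (auto & c : components) c->handleError(error);
    }
    // Same rule for cancel: only a running task is cancelled, and only once.
    void cancel(const std::string & reason) {
        TaskStatus expected = TaskStatus::Running;
        if (!status.compare_exchange_strong(expected, TaskStatus::Cancelled)) return;
        for (auto & c : components) c->cancel(reason);
    }
    // Polled by a running task so it stops once any task of the query has failed.
    bool shouldStop() const { return status.load() != TaskStatus::Running || query_state->hasError(); }
    TaskStatus getStatus() const { return status.load(); }

private:
    std::atomic<TaskStatus> status{TaskStatus::Running};
    std::shared_ptr<QueryErrorState> query_state;
    std::vector<std::shared_ptr<TaskComponent>> components;
};

// Test double that records which hook was invoked.
class RecordingComponent : public TaskComponent {
public:
    void cancel(const std::string & reason) override { events.push_back("cancel:" + reason); }
    void handleError(const std::string & error) override { events.push_back("error:" + error); }
    std::vector<std::string> events;
};
```

With the compare-and-swap on the status, an error that races with a cancel is handled exactly once, whichever arrives first; the later call becomes a no-op instead of tearing down the components a second time.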
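One way to keep a local tunnel from coupling the send task to the receive task is to have both sides hold only a shared, bounded channel object rather than a pointer to each other's task. The sketch below assumes that design; LocalChannel and its methods are hypothetical, not TiFlash's API.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical channel shared by a local sender and receiver via
// std::shared_ptr; neither side references the other's task.
class LocalChannel {
public:
    explicit LocalChannel(std::size_t capacity_) : capacity(capacity_) {}

    // Blocks while full; returns false if the channel has been closed.
    bool push(std::string packet) {
        std::unique_lock<std::mutex> lock(mu);
        not_full.wait(lock, [&] { return closed || queue.size() < capacity; });
        if (closed) return false;
        queue.push_back(std::move(packet));
        not_empty.notify_one();
        return true;
    }

    // Blocks while empty; returns std::nullopt once closed and drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mu);
        not_empty.wait(lock, [&] { return closed || !queue.empty(); });
        if (queue.empty()) return std::nullopt;
        std::string packet = std::move(queue.front());
        queue.pop_front();
        not_full.notify_one();
        return packet;
    }

    // Either side may close; finish, cancel, and error all end up here,
    // and every blocked push/pop is woken.
    void close() {
        std::lock_guard<std::mutex> lock(mu);
        closed = true;
        not_full.notify_all();
        not_empty.notify_all();
    }

private:
    const std::size_t capacity;
    std::mutex mu;
    std::condition_variable not_full;
    std::condition_variable not_empty;
    std::deque<std::string> queue;
    bool closed = false;
};
```

Because finishing, cancelling, and failing on either side all reduce to close(), the sender never needs to reach into the receiver task to wake it up, and either task can be destroyed first.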
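The logging idea can be illustrated by recording why a tunnel was closed and consulting that reason on a late write. This is a hypothetical sketch: QuietTunnel and CloseReason are illustrative names, and the vector stands in for a real logger. It only mirrors the idea in #4208's title of not logging a write to a closed tunnel when the query was cancelled as expected.

```cpp
#include <string>
#include <vector>

// Hypothetical: why the tunnel was closed, recorded at close time.
enum class CloseReason { Open, Finished, Cancelled, Failed };

class QuietTunnel {
public:
    // The first close wins, so a later close cannot overwrite the reason.
    void close(CloseReason reason) {
        if (reason_ == CloseReason::Open) reason_ = reason;
    }

    // Returns true if the packet was accepted. A write after close is
    // rejected; it is logged only if the close was not an expected cancel.
    bool write(const std::string & packet) {
        if (reason_ == CloseReason::Open) {
            sent.push_back(packet);
            return true;
        }
        if (reason_ != CloseReason::Cancelled)
            log.push_back("write to tunnel which is already closed");
        return false;
    }

    std::vector<std::string> sent;
    std::vector<std::string> log; // stands in for a real logger

private:
    CloseReason reason_ = CloseReason::Open;
};
```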
