Improvements to the retry logic for mutation. #57089

@MikhailBurdukov

Use case

If a mutation query is incorrect or cannot be executed right now, it is retried in an infinite loop with high resource usage.
Example (from #55946):

CREATE TABLE main (id Int8) ENGINE=MergeTree() ORDER BY id;
INSERT INTO main SELECT * FROM system.numbers LIMIT 2;

ALTER TABLE main DELETE WHERE id IN (SELECT id FROM nonexistent);

Meanwhile, the number of log records grows rapidly, with entries like:

2023.11.22 07:38:54.030832 [ 1198979 ] {17605073-9555-4773-b45c-438dbcccfcca::all_1_1_0_2} <Error> virtual bool DB::MutatePlainMergeTreeTask::executeStep(): Code: 60. DB::Exception: Table default.nonexistent does not exist: While processing id IN ((SELECT id FROM default.nonexistent) AS _subquery642). (UNKNOWN_TABLE), Stack trace (when copying this message, always include the lines below):

Describe the solution you'd like

A good solution, as I see it:

  1. Add backoff logic (e.g. exponential). This would reduce CPU usage, memory usage, and log file size.
  2. Add logic for automatic mutation cancellation.

There are several difficulties to consider:

  • Backoff
    To schedule the next mutation retry, we need the time of the last failure and the number of attempts already made. For example, system.mutations could be extended to keep these values.
    But if the table is replicated, this state needs to be synchronized somehow. What should we do if some replicas lag behind? Should we wait until all replicas complete the mutation?
    Also, in that case we will have a list of last failure times, one per replica. Which one should be used as the basis? What if the replicas report different error codes?

  • Canceling
    See the full discussion in #55777 (Add limitation for retrying mutations). In summary, we would need transactions to cancel a mutation.
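
To make the backoff idea concrete, here is a minimal single-node sketch in Python (chosen only for brevity; it is not ClickHouse code). It keeps the two values mentioned in the Backoff item, the last failure time and the number of failed attempts, and derives the earliest allowed retry time from them. All names (MutationBackoffState, BASE_DELAY_SECONDS, MAX_DELAY_SECONDS, next_retry_time, and so on) and the chosen constants are hypothetical, and the replication questions raised above are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed tuning knobs; names and values are illustrative only.
BASE_DELAY_SECONDS = 1.0
MAX_DELAY_SECONDS = 300.0
MAX_EXPONENT = 30  # keeps 2 ** exponent small enough to stay a safe float


@dataclass
class MutationBackoffState:
    """Hypothetical per-mutation retry state (not an existing ClickHouse structure)."""
    failed_attempts: int = 0
    last_failure_time: Optional[float] = None  # seconds since the epoch


def record_failure(state: MutationBackoffState, now: float) -> None:
    """Remember one more failed attempt and when it happened."""
    state.failed_attempts += 1
    state.last_failure_time = now


def record_success(state: MutationBackoffState) -> None:
    """Reset the state once the mutation step succeeds."""
    state.failed_attempts = 0
    state.last_failure_time = None


def next_retry_time(state: MutationBackoffState) -> float:
    """Earliest time at which the mutation may be retried."""
    if state.failed_attempts == 0 or state.last_failure_time is None:
        return 0.0  # never failed: no delay
    exponent = min(state.failed_attempts - 1, MAX_EXPONENT)
    delay = min(BASE_DELAY_SECONDS * (2 ** exponent), MAX_DELAY_SECONDS)
    return state.last_failure_time + delay


def can_retry(state: MutationBackoffState, now: float) -> bool:
    """True if enough time has passed since the last failure."""
    return now >= next_retry_time(state)
```

With these assumed constants the delay doubles after each failure (1 s, 2 s, 4 s, ...) and is capped at 300 s, so a permanently broken mutation such as the one in the example would be retried at most once every five minutes instead of continuously.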

I think exponential backoff is a good starting point.
I believe this is a debatable topic and requires more ideas than I have.

Additional context
Related issues:
#36987
#55777
#55946
