Improvements to the retry logic for mutation.

**Use case**

If a mutation query is incorrect or cannot be executed at the moment, the server retries it in an infinite loop with high resource usage.
Example (from #55946):
```
CREATE TABLE main (id Int8) ENGINE=MergeTree() ORDER BY id;
INSERT INTO main SELECT * FROM system.numbers LIMIT 2;

ALTER TABLE main DELETE WHERE id IN (SELECT id FROM nonexistent);
``` 
In addition, the server log grows rapidly with records like:

```
2023.11.22 07:38:54.030832 [ 1198979 ] {17605073-9555-4773-b45c-438dbcccfcca::all_1_1_0_2} <Error> virtual bool DB::MutatePlainMergeTreeTask::executeStep(): Code: 60. DB::Exception: Table default.nonexistent does not exist: While processing id IN ((SELECT id FROM default.nonexistent) AS _subquery642). (UNKNOWN_TABLE), Stack trace (when copying this message, always include the lines below):
```

**Describe the solution you'd like**

Possible solutions that I can see:
 1. Add backoff logic (e.g. exponential backoff) between retries. This would reduce CPU usage, memory usage, and log file size.
 2. Add logic to cancel such mutations automatically.
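A minimal sketch of both ideas, assuming a hypothetical `MutationRetryPolicy` struct (not an existing ClickHouse class) with illustrative default values. The delay doubles after each failure up to a cap, and the mutation is flagged for cancellation after too many attempts or on an error code assumed to be permanent, such as `UNKNOWN_TABLE` (code 60) from the log above. The set of permanent codes is an assumption for illustration only.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <unordered_set>

// Hypothetical retry policy for a failing mutation; not an existing ClickHouse class.
struct MutationRetryPolicy
{
    std::chrono::milliseconds base_delay{1000};   // delay after the first failure (assumed)
    std::chrono::milliseconds max_delay{600000};  // upper bound on the delay: 10 minutes (assumed)
    uint64_t max_attempts = 20;                   // give up after this many failures (assumed)

    // Error codes assumed to be permanent, i.e. retrying cannot fix them.
    // 60 = UNKNOWN_TABLE, as in the log above.
    std::unordered_set<int> permanent_error_codes{60};

    // Exponential backoff: base_delay * 2^(attempts - 1), capped at max_delay.
    std::chrono::milliseconds delayAfter(uint64_t attempts) const
    {
        if (attempts == 0)
            return std::chrono::milliseconds{0};
        std::chrono::milliseconds delay = base_delay;
        // Stop doubling once the cap is reached, so the value cannot overflow.
        for (uint64_t i = 1; i < attempts && delay < max_delay; ++i)
            delay *= 2;
        return std::min(delay, max_delay);
    }

    // Automatic cancellation: stop retrying on a permanent error or after too many attempts.
    bool shouldCancel(int error_code, uint64_t attempts) const
    {
        return permanent_error_codes.count(error_code) > 0 || attempts >= max_attempts;
    }
};
```

With these defaults the delays are 1 s, 2 s, 4 s and so on, up to 10 minutes. How the cancellation itself would be carried out is a separate question (see #55777).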
 
There are several difficulties to consider:

- Backoff
  To schedule the next mutation retry, we need the time of the last failure and the number of attempts already made. For example, `system.mutations` could be extended to store these values.
  But if the table is replicated, these values need to be synchronized somehow. What should happen if some replicas lag behind? Should we wait until all replicas complete the mutation?
  In that case we would also have a list of last failure times, one per replica. Which one should be the basis? What if the replicas report different error codes?

- Canceling
  See the full discussion in #55777. In short, canceling a mutation would require transactions.
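For the replicated case, one hypothetical answer to "which failure time should be the basis" is to merge the per-replica states conservatively: take the most recent failure time and the largest attempt count, and flag the case where replicas disagree on the error code. All names below (`ReplicaRetryState`, `MergedRetryState`, `mergeReplicaStates`, `isRetryDue`) are illustrative assumptions, not ClickHouse APIs or existing `system.mutations` columns.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

using Clock = std::chrono::system_clock;

// Hypothetical per-replica retry state for one mutation.
struct ReplicaRetryState
{
    std::string replica;
    Clock::time_point last_fail_time;
    uint64_t attempts = 0;
    int last_error_code = 0;
};

// Combined state used to schedule the next retry on all replicas.
struct MergedRetryState
{
    Clock::time_point last_fail_time;  // most recent failure on any replica
    uint64_t attempts = 0;             // highest attempt count on any replica
    bool errors_differ = false;        // replicas reported different error codes
};

// Use the latest failure and the largest attempt count, so that a lagging
// replica never makes the shared backoff shorter.
MergedRetryState mergeReplicaStates(const std::vector<ReplicaRetryState> & states)
{
    MergedRetryState merged;
    if (states.empty())
        return merged;
    merged.last_fail_time = states.front().last_fail_time;
    for (const auto & s : states)
    {
        merged.last_fail_time = std::max(merged.last_fail_time, s.last_fail_time);
        merged.attempts = std::max(merged.attempts, s.attempts);
        if (s.last_error_code != states.front().last_error_code)
            merged.errors_differ = true;
    }
    return merged;
}

// The next retry is due once the backoff delay has elapsed since the latest failure.
bool isRetryDue(const MergedRetryState & merged, Clock::time_point now, std::chrono::milliseconds delay)
{
    return now >= merged.last_fail_time + delay;
}
```

Taking the maximum is only one option; it favors fewer retries over faster recovery, and it does not answer whether the mutation should wait for lagging replicas.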

I think exponential backoff is a good starting point.
I believe this is a debatable topic that requires more ideas than I have.

**Additional context**
Related issues:
#36987
#55777
#55946

Improvements to the retry logic for mutation. #57089

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Improvements to the retry logic for mutation. #57089

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions