Use case
If a mutation query is incorrect or cannot be performed right now, it hangs in an infinite retry loop with high resource usage:
Example (from #55946):
CREATE TABLE main (id Int8) ENGINE=MergeTree() ORDER BY id;
INSERT INTO main SELECT * FROM system.numbers LIMIT 2;
ALTER TABLE main DELETE WHERE id IN (SELECT id FROM nonexistent);
BTW: the number of log records grows rapidly, with entries like:
2023.11.22 07:38:54.030832 [ 1198979 ] {17605073-9555-4773-b45c-438dbcccfcca::all_1_1_0_2} <Error> virtual bool DB::MutatePlainMergeTreeTask::executeStep(): Code: 60. DB::Exception: Table default.nonexistent does not exist: While processing id IN ((SELECT id FROM default.nonexistent) AS _subquery642). (UNKNOWN_TABLE), Stack trace (when copying this message, always include the lines below):
Describe the solution you'd like
A good solution, as I see it:
Have backoff logic (e.g. exponential). This would reduce CPU usage, memory usage, and log file sizes.
Have logic for automatic mutation canceling.
There are several difficulties to consider:
Backoff
To manage the time until the next mutation retry, we need the time of the last failure and the number of attempts already made. system.mutations could be extended to keep these values, for example.
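To make the idea concrete, here is a minimal sketch of how the next retry time could be derived from those two values. It is written in Python only for illustration; `BASE_DELAY`, `MAX_DELAY`, and `next_retry_time` are hypothetical names, not existing ClickHouse settings or functions.

```python
from datetime import datetime, timedelta

# Hypothetical tuning knobs; these are not existing ClickHouse settings.
BASE_DELAY = timedelta(seconds=1)
MAX_DELAY = timedelta(minutes=10)


def next_retry_time(last_failure: datetime, failed_attempts: int) -> datetime:
    """Earliest time the mutation may be retried after `failed_attempts` failures."""
    if failed_attempts <= 0:
        return last_failure  # nothing has failed yet, so no delay is needed
    # Double the delay with every failure; cap the exponent to keep the number small.
    exponent = min(failed_attempts - 1, 32)
    delay = min(BASE_DELAY * (2 ** exponent), MAX_DELAY)
    return last_failure + delay
```

With these assumed values the delay grows as 1 s, 2 s, 4 s, and so on, and is capped at 10 minutes, so a permanently broken mutation would be retried (and logged) at most once every 10 minutes.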
But if the table is replicated, this state needs to be synchronized somehow. What should we do if some replicas lag behind? Should we wait until all replicas complete the mutation?
Also, in that case we will have a list of last failure times. Which one should be used as the basis? What if the replicas report different error codes?
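One possible answer, sketched below purely as an assumption, is to use the most recent failure across replicas as the basis and to check separately whether the error codes agree. `ReplicaFailure`, `backoff_basis`, and `error_codes_agree` are hypothetical names; nothing like them exists in ClickHouse today, and other policies (earliest failure, per-replica backoff) are equally conceivable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReplicaFailure:
    # Hypothetical per-replica record; no such structure exists in ClickHouse.
    replica: str
    last_failure: datetime
    error_code: int


def backoff_basis(failures: Iterable[ReplicaFailure]) -> Optional[ReplicaFailure]:
    """Pick the most recent failure across replicas, or None if nothing failed."""
    return max(failures, key=lambda f: f.last_failure, default=None)


def error_codes_agree(failures: Iterable[ReplicaFailure]) -> bool:
    """True when every failed replica reports the same error code."""
    return len({f.error_code for f in failures}) <= 1
```

Choosing the latest failure is the most conservative option: no replica retries earlier than the slowest one would. Disagreeing error codes could be a hint that the failure is environmental rather than caused by the query itself.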
Canceling
For the full discussion, see "Add limitation for retrying mutations" (#55777). In summary, canceling a mutation would require transactions.
I think exponential backoff is a good starting point.
I believe this is a debatable topic and requires more ideas than I have.
Additional context
Related issues:
#36987
#55777
#55946