Description
Background
There are three TiKV nodes:
- tikv-2: has the least remaining disk space (20G, less than 5%) and reports 'AlmostFull' errors. It refuses writes, and the other nodes do not send append-log messages to it. (Monitoring is lacking for this case; see "handle proposals correctly if majority peers are disk full" by hicqu · Pull Request #10671 · tikv/tik.)
- tikv-0: because tikv-2 does not advance its Raft log, tikv-0 cannot compact its log, so its memory usage grows. Once memory usage is high, tikv-0 also refuses writes (append).
- tikv-1: also cannot serve writes, because the other two nodes have stopped accepting appends.
What do you want to see
Improve this case: we need to keep the failure radius from expanding. The root cause is a full disk on one node while the others are not full, yet it drives up memory usage on a second node and makes a third node's server busy, which is unreasonable. If tikv-0 could evict its entry cache or force log compaction, it would not propagate the fault, which would reduce the blast radius.
Improve the observability: the TiDB side and the client side see backoff errors such as regionMiss or regionUnavailable. The logs report "server is busy" from tikv-2, but the root cause is actually the full disk on tikv-1.
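To make the first request more concrete, here is a minimal sketch of the kind of forced eviction I have in mind. It is not TiKV's actual code: `EntryCache`, `limit_bytes`, and `evict_to_limit` are hypothetical names, and the memory budget is an assumed parameter. The idea is only that a cache of Raft entries drops its oldest entries once it exceeds a budget, instead of growing while a disk-full follower lags behind.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory cache of Raft log entries (not TiKV's real type).
struct EntryCache {
    entries: VecDeque<(u64, Vec<u8>)>, // (log index, payload)
    bytes: usize,                      // total payload bytes currently cached
    limit_bytes: usize,                // assumed memory budget for the cache
}

impl EntryCache {
    fn new(limit_bytes: usize) -> Self {
        EntryCache { entries: VecDeque::new(), bytes: 0, limit_bytes }
    }

    /// Append an entry, then evict old entries if the budget is exceeded.
    /// Returns the number of entries evicted by this call.
    fn append(&mut self, index: u64, data: Vec<u8>) -> usize {
        self.bytes += data.len();
        self.entries.push_back((index, data));
        self.evict_to_limit()
    }

    /// Drop the oldest entries until the cache fits the budget,
    /// always keeping at least the newest entry.
    fn evict_to_limit(&mut self) -> usize {
        let mut evicted = 0;
        while self.bytes > self.limit_bytes && self.entries.len() > 1 {
            if let Some((_, data)) = self.entries.pop_front() {
                self.bytes -= data.len();
                evicted += 1;
            }
        }
        evicted
    }

    /// Index of the oldest entry still held in memory, if any.
    fn first_index(&self) -> Option<u64> {
        self.entries.front().map(|(index, _)| *index)
    }
}

fn main() {
    // A lagging follower never acknowledges, so nothing is compacted normally;
    // the budget alone keeps the cache bounded.
    let mut cache = EntryCache::new(250);
    for index in 1..=5u64 {
        let evicted = cache.append(index, vec![0u8; 100]);
        println!("appended {}, evicted {}, cached bytes {}", index, evicted, cache.bytes);
    }
    println!("oldest cached index: {:?}", cache.first_index());
}
```

In this sketch, evicted entries would have to be re-read from disk if the lagging follower later catches up, which trades some read cost for bounded memory on the healthy node.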
Version
v6.1.3

