Closed
Labels
kind/fix: Categorizes issue or PR as related to a bug.
Description
Describe the bug
While the cluster is rebalancing, load jobs are more likely to fail.
The error message is "xxxx tablet has few replicas" or "tablet writer write failed, res = -215".
To Reproduce
Add a new BE to the cluster and start a long-running load job.
Expected behavior
Cluster balancing should not cause load jobs to fail.
Reason
The balance logic first adds a new replica and then deletes an old one. When it deletes the old replica, it does not check whether a load job is still writing to that replica, so the load job may fail to find the replica because it has already been deleted.
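
The race described above can be illustrated with a small toy model. This is only a sketch, not the actual Doris code: `Tablet`, `Replica`, `active_loads`, `delete_replica_unsafe`, and `delete_replica_safe` are all hypothetical names invented for illustration. It contrasts deleting the old replica unconditionally with one possible mitigation, which is to defer the deletion while a load job is still writing to the replica.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    backend_id: int
    # Hypothetical bookkeeping: IDs of load jobs currently writing here.
    active_loads: set = field(default_factory=set)


@dataclass
class Tablet:
    replicas: dict = field(default_factory=dict)  # backend_id -> Replica

    def add_replica(self, backend_id):
        self.replicas[backend_id] = Replica(backend_id)

    def delete_replica_unsafe(self, backend_id):
        # Behavior described in this issue: delete without checking load jobs.
        del self.replicas[backend_id]

    def delete_replica_safe(self, backend_id):
        # Assumed mitigation: refuse to delete while a load job is writing.
        if self.replicas[backend_id].active_loads:
            return False  # the caller is expected to retry later
        del self.replicas[backend_id]
        return True


def write(tablet, backend_id, job_id):
    """Simulate one write of a load job; fails if the replica is gone."""
    if backend_id not in tablet.replicas:
        raise RuntimeError(f"job {job_id}: replica on backend {backend_id} not found")
    tablet.replicas[backend_id].active_loads.add(job_id)


def demo_unsafe():
    tablet = Tablet()
    tablet.add_replica(1)
    write(tablet, 1, job_id=100)     # long-running load starts on the old replica
    tablet.add_replica(2)            # balance step 1: add a new replica
    tablet.delete_replica_unsafe(1)  # balance step 2: drop the old replica
    try:
        write(tablet, 1, job_id=100)  # the load job's next write
        return "ok"
    except RuntimeError:
        return "load failed"


def demo_safe():
    tablet = Tablet()
    tablet.add_replica(1)
    write(tablet, 1, job_id=100)
    tablet.add_replica(2)
    deleted_during_load = tablet.delete_replica_safe(1)  # deferred
    write(tablet, 1, job_id=100)                         # still succeeds
    tablet.replicas[1].active_loads.discard(100)         # load job finishes
    deleted_after_load = tablet.delete_replica_safe(1)   # now allowed
    return deleted_during_load, deleted_after_load
```

In this model `demo_unsafe()` reproduces the failure, while `demo_safe()` lets the load job finish before the old replica is removed. Whether the real fix should defer deletion, or handle the in-flight load some other way, is a design decision this sketch does not settle.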