Load failed when cluster balancing #1580

@morningman

Describe the bug
While the cluster is rebalancing, load jobs are more likely to fail.
The error message is either "xxxx tablet has few replicas" or "tablet writer write failed, res = -215".

To Reproduce
Add a new BE to the cluster, then start a long-running load job.

Expected behavior
Cluster balancing should not cause load jobs to fail.

Reason
The balance logic first adds a new replica and then deletes an old one. When it deletes the old replica, it does not check whether a load job is still writing to that replica, so the load job may fail because the replica it expects has already been deleted.
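
One possible way to avoid this race is to have the balancer skip deleting a replica while any load transaction is still writing to it. The Java sketch below only illustrates that idea. The class `ReplicaDeletionGuard` and its methods are hypothetical names made up for this example. They are not taken from the Doris code base and do not describe the actual fix.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Doris code): tracks which load transactions
 * are still writing to each replica, so a balancer can postpone deleting
 * a replica that is in use.
 */
class ReplicaDeletionGuard {
    // replicaId -> ids of load transactions still writing to that replica
    private final Map<Long, Set<Long>> runningLoads = new HashMap<>();

    /** Record that a load transaction has started writing to a replica. */
    public synchronized void loadStarted(long replicaId, long txnId) {
        runningLoads.computeIfAbsent(replicaId, k -> new HashSet<>()).add(txnId);
    }

    /** Record that a load transaction has finished (or aborted) on a replica. */
    public synchronized void loadFinished(long replicaId, long txnId) {
        Set<Long> txns = runningLoads.get(replicaId);
        if (txns == null) {
            return; // nothing was tracked for this replica
        }
        txns.remove(txnId);
        if (txns.isEmpty()) {
            runningLoads.remove(replicaId);
        }
    }

    /** The old replica may be deleted only when no load is writing to it. */
    public synchronized boolean canDelete(long replicaId) {
        return !runningLoads.containsKey(replicaId);
    }
}
```

In this sketch the balancer would call `canDelete` before removing the old replica and retry later if it returns `false`, so a long-running load job delays the deletion instead of failing.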

Labels

kind/fix: Categorizes issue or PR as related to a bug.
