Fix bug that cluster balance may cause load job failed #1581

morningman · 2019-08-04T15:23:00Z

The bug is described in issue #1580. This patch fixes two cases in cluster balance.

First, after a new replica has been added, its version may not yet have caught up with the visible version. The new replica may then be treated as a stale, redundant replica and deleted in the next tablet checking round.

To fix this, I add a mark named needFurtherRepair to the newly added replica. The mark is set only when that replica's version has not caught up with the visible version. A marked replica receives further repair in the next tablet checking round instead of being deleted.
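
The decision described above can be sketched as follows. This is a minimal, simplified model, not the actual Doris FE code: the classes SketchReplica, CheckResult and CloneFinishSketch and the methods onCloneFinished and check are hypothetical names, and the checker logic is reduced to the single decision this patch changes (repair a freshly cloned, lagging replica rather than treat it as stale and redundant).

```java
// Hypothetical, simplified model of a tablet replica. Names are illustrative
// assumptions, not the real Doris FE classes.
class SketchReplica {
    final long id;
    long version;                      // highest version this replica has applied
    boolean needFurtherRepair = false; // set when a clone finished but the version still lags

    SketchReplica(long id, long version) {
        this.id = id;
        this.version = version;
    }
}

// Hypothetical outcome of checking one replica in a tablet checking round.
enum CheckResult {
    HEALTHY,             // replica has caught up with the visible version
    NEED_FURTHER_REPAIR, // freshly cloned replica that still lags: repair it
    REDUNDANT_CANDIDATE  // stale replica: may be deleted if the tablet has extra replicas
}

class CloneFinishSketch {
    // Called when a clone task for `replica` completes.
    static void onCloneFinished(SketchReplica replica, long visibleVersion) {
        // Mark the replica only if it has not caught up with the visible version.
        replica.needFurtherRepair = replica.version < visibleVersion;
    }

    // Called for each replica in the next tablet checking round.
    static CheckResult check(SketchReplica replica, long visibleVersion) {
        if (replica.version >= visibleVersion) {
            replica.needFurtherRepair = false; // caught up: clear the mark
            return CheckResult.HEALTHY;
        }
        // Without the mark, a lagging replica looks stale and may be deleted.
        return replica.needFurtherRepair
                ? CheckResult.NEED_FURTHER_REPAIR
                : CheckResult.REDUNDANT_CANDIDATE;
    }
}

class NeedFurtherRepairDemo {
    public static void main(String[] args) {
        // Clone finished at version 10 while the visible version is already 12.
        SketchReplica cloned = new SketchReplica(1L, 10L);
        CloneFinishSketch.onCloneFinished(cloned, 12L);
        System.out.println(CloneFinishSketch.check(cloned, 12L)); // prints NEED_FURTHER_REPAIR
    }
}
```

Once the further repair brings the replica up to the visible version, the next check returns HEALTHY and clears the mark, so the flag only protects replicas during the short window right after cloning.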

Second, when a redundant replica is deleted, load jobs may still be running on it, and deleting the replica may cause those load jobs to fail.

To fix this, before deleting a redundant replica, I first mark the next txn id on that replica and set the replica's state to CLONE. The CLONE state ensures that no new load jobs are assigned to that replica, and we wait for all load jobs whose txn ids are smaller than the marked txn id to finish. After that, the replica can be deleted safely.
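
The watermark procedure above can be sketched like this. It is a single-threaded illustration under stated assumptions, not the Doris implementation: WatermarkReplica, SketchReplicaState, SketchTxnManager and SafeDropSketch are hypothetical names, and transaction ids are assumed to be allocated from one increasing counter. Following the description, a replica being removed is moved to the CLONE state.

```java
import java.util.TreeSet;

// Hypothetical replica states for this sketch only.
enum SketchReplicaState { NORMAL, CLONE }

class WatermarkReplica {
    final long id;
    SketchReplicaState state = SketchReplicaState.NORMAL;
    long watermarkTxnId = -1; // -1 means "not marked for deletion"

    WatermarkReplica(long id) { this.id = id; }
}

// Hypothetical transaction manager: ids come from one increasing counter.
class SketchTxnManager {
    private long nextTxnId = 1;
    private final TreeSet<Long> runningTxns = new TreeSet<>();

    long peekNextTxnId() { return nextTxnId; }

    long begin() {
        long id = nextTxnId++;
        runningTxns.add(id);
        return id;
    }

    void finish(long txnId) { runningTxns.remove(txnId); }

    // True if no transaction with id < watermark is still running.
    boolean allFinishedBefore(long watermark) {
        return runningTxns.isEmpty() || runningTxns.first() >= watermark;
    }
}

class SafeDropSketch {
    // Assumed to run atomically with respect to txn begin (e.g. under one lock
    // in a real system). The state is flipped first so that any txn that begins
    // afterwards cannot choose this replica.
    static void markForDeletion(WatermarkReplica r, SketchTxnManager txns) {
        r.state = SketchReplicaState.CLONE;
        r.watermarkTxnId = txns.peekNextTxnId();
    }

    // New load jobs only pick replicas in NORMAL state.
    static boolean acceptsNewLoad(WatermarkReplica r) {
        return r.state == SketchReplicaState.NORMAL;
    }

    // Safe to drop once every txn older than the watermark has finished.
    static boolean canDrop(WatermarkReplica r, SketchTxnManager txns) {
        return r.watermarkTxnId >= 0 && txns.allFinishedBefore(r.watermarkTxnId);
    }
}

class WatermarkDemo {
    public static void main(String[] args) {
        SketchTxnManager txns = new SketchTxnManager();
        WatermarkReplica replica = new WatermarkReplica(7L);
        long load = txns.begin();                  // a load job already running
        SafeDropSketch.markForDeletion(replica, txns);
        System.out.println(SafeDropSketch.canDrop(replica, txns)); // false: load still running
        txns.finish(load);
        System.out.println(SafeDropSketch.canDrop(replica, txns)); // true: nothing older is left
    }
}
```

Transactions that begin after the mark get ids at or above the watermark, so they never delay the deletion; they also cannot write to the replica because it is no longer in NORMAL state.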

ISSUE: #1580


morningman added 5 commits August 2, 2019 19:52

* first commit (42c54d0)
* add watermark txn id (5de4129)
* add replica watermark txn id (4b7615f)
* add comment (88db4d0)
* add Decommission replica state (c7ee5e8)

chaoyli approved these changes Aug 8, 2019


morningman merged commit 69de5df into apache:master Aug 8, 2019

imay mentioned this pull request Sep 26, 2019 in Release Notes 0.11.0 #1891 (Closed)

swjtu-zhanglei added a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023

[feature](selectdb-cloud) Fix rollup/mv case incorrect (apache#1581)

527361b

* http://jira.selectdb.com:8090/browse/CORE-1734
