@morningman morningman commented Aug 4, 2019

The bug is described in issue #1580. This patch fixes two cases in cluster balance:

  1. After a new replica has been added, its version may not have caught up with
    the visible version, so the new replica may be treated as a stale, redundant replica and
    deleted at the next tablet checking round.

    I add a mark named needFurtherRepair to the newly added replica and set it only when that replica's version has not caught up with the visible version. A marked replica receives a further repair at the next tablet checking round instead of being deleted.

  2. A redundant replica may still have load jobs running on it when it is deleted. Deleting such a replica may cause those load jobs to fail.

    Before deleting a redundant replica, I first record the next txn id on that replica and set the replica's
    state to CLONE. The CLONE state ensures that no new load jobs are assigned to that replica. We then
    wait for all load jobs with a txn id before the recorded one to finish. After that, the replica can be deleted safely.
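
The two fixes above can be illustrated with a minimal sketch. It is not the code in this patch: apart from `needFurtherRepair` and the `CLONE` state, every class, method, and field name here (`ReplicaBalanceSketch`, `TxnManager`, `watermarkTxnId`, and so on) is a hypothetical stand-in. It also assumes that txn ids increase monotonically and that a lagging replica without the mark is a deletion candidate.

```java
import java.util.TreeSet;

// Hypothetical, simplified model of the two fixes; not the actual classes in this patch.
class ReplicaBalanceSketch {
    enum State { NORMAL, CLONE }
    enum Action { KEEP, REPAIR, DELETE }

    static final class Replica {
        long version;
        State state = State.NORMAL;
        boolean needFurtherRepair;  // case 1: set when the new replica lags behind
        long watermarkTxnId = -1;   // case 2: -1 means no deletion is pending

        Replica(long version) { this.version = version; }
    }

    // Assumed stand-in for a transaction manager that hands out increasing txn ids.
    static final class TxnManager {
        private long nextTxnId = 1;
        private final TreeSet<Long> running = new TreeSet<>();

        long begin() { long id = nextTxnId++; running.add(id); return id; }
        void finish(long txnId) { running.remove(txnId); }
        long peekNextTxnId() { return nextTxnId; }
        // True when no running txn has an id strictly below txnId.
        boolean allFinishedBefore(long txnId) { return running.headSet(txnId).isEmpty(); }
    }

    // Case 1: once the replica has been added, mark it only if it has not caught up.
    static void onReplicaAdded(Replica r, long visibleVersion) {
        r.needFurtherRepair = r.version < visibleVersion;
    }

    // Case 1: at a checking round, a marked lagging replica is repaired, not deleted.
    static Action checkVersion(Replica r, long visibleVersion) {
        if (r.version >= visibleVersion) {
            r.needFurtherRepair = false; // caught up, so the mark is no longer needed
            return Action.KEEP;
        }
        return r.needFurtherRepair ? Action.REPAIR : Action.DELETE;
    }

    // Case 2: only NORMAL replicas are offered to new load jobs.
    static boolean acceptsNewLoad(Replica r) {
        return r.state == State.NORMAL;
    }

    // Case 2: record the next txn id and switch to CLONE on the first call, then
    // report whether every load job started before that id has finished.
    static boolean readyToDelete(Replica r, TxnManager txns) {
        if (r.watermarkTxnId == -1) {
            r.watermarkTxnId = txns.peekNextTxnId();
            r.state = State.CLONE;
        }
        return txns.allFinishedBefore(r.watermarkTxnId);
    }

    public static void main(String[] args) {
        Replica added = new Replica(5L);
        onReplicaAdded(added, 8L);
        System.out.println("lagging new replica: " + checkVersion(added, 8L));

        TxnManager txns = new TxnManager();
        Replica redundant = new Replica(8L);
        long loadTxn = txns.begin();
        System.out.println("deletable while load runs: " + readyToDelete(redundant, txns));
        txns.finish(loadTxn);
        System.out.println("deletable after load finishes: " + readyToDelete(redundant, txns));
    }
}
```

In this sketch a checking round would call `readyToDelete` repeatedly and delete the replica only once it returns true. Load jobs that begin after the recorded txn id never block the deletion, because the `CLONE` state keeps them off the replica.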

ISSUE: #1580

@morningman morningman merged commit 69de5df into apache:master Aug 8, 2019
@imay imay mentioned this pull request Sep 26, 2019