Skip to content

OnCommit timer extensions related to recovery lead to large delays before ChangeView #16

@roman-khimov

Description

@roman-khimov

Node number four changes view:

2020-01-30T15:34:29.557Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:104     initialize      {"height": 6011, "view": 1, "index": 3, "role": "Backup"}
2020-01-30T15:34:29.557Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:549     reset timer     {"h": 6011, "v": 1, "delay": "20s"}

Then processes commit message:

2020-01-30T15:34:29.557Z        DEBUG   consensus/consensus.go:177      received message        {"from": 2}
2020-01-30T15:34:29.557Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:203     received message        {"type": "Commit", "from": 2, "height": 6011, "view": 0, "my_height": 6011, "my_view": 1}
{"from": 0}

Then processes lots of recovery requests (as it's the chosen one for this role) from node two, but the timer only fires after 12m (20s * 36):

2020-01-30T15:46:29.557Z        DEBUG   consensus/consensus.go:172      timer fired     {"height": 6011, "view": 1}
2020-01-30T15:46:29.557Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:174     timeout {"height": 6011, "view": 1}
2020-01-30T15:46:29.557Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:549     reset timer     {"h": 6011, "v": 1, "delay": "40s"}
2020-01-30T15:46:29.557Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/send.go:73      request change view     {"view": 1, "height": 6011, "new_view": 2, "nc": 1, "nf": 0}
2020-01-30T15:46:29.557Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/send.go:9       broadcasting message    {"type": "ChangeView", "height": 6011, "view": 1}
2020-01-30T15:46:29.892Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:104     initialize      {"height": 6011, "view": 2, "index": 3, "role": "Backup"}
2020-01-30T15:46:29.893Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:549     reset timer     {"h": 6011, "v": 2, "delay": "40s"}

After which nodes quickly change the view and accept new block:

2020-01-30T15:46:31.443Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/check.go:64     approving block {"height": 6011, "hash": "5d08efb5b013bf4348b53d8f20d074321d102b229da9198851143d483ef1ef94", "tx_count": 8422, "merkle": "d2c2cd6ab14fcfcb59613fde2c1784b802abffefcc39ace51bd9f0f408b6bcad", "prev": "8
115f28d30b7e48ad4e0a19e790a55cc42328a644c5405c516862c408dbe6585"}
2020-01-30T15:46:33.296Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:104     initialize      {"height": 6012, "view": 0, "index": 3, "role": "Backup"}
2020-01-30T15:46:33.296Z        DEBUG   dbft@v0.0.0-20200130105505-02c208d154bf/dbft.go:549     reset timer     {"h": 6012, "v": 0, "delay": "10s"}

I don't think it's correct because the node number two is stuck in view 0 of block 6011 and these timer extensions are very suspicious for the case when we have another view already.

They were added in neo-project/neo#704 in C# implementation, BTW.

four.log
one.log
three.log
two.log

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions