Skip to content

pool/store: Fix inverted logic on inactive deadline checks#74

Merged
shazow merged 3 commits intovipnode:masterfrom
ryanschneider:inactive-fix
Sep 25, 2019
Merged

pool/store: Fix inverted logic on inactive deadline checks#74
shazow merged 3 commits intovipnode:masterfrom
ryanschneider:inactive-fix

Conversation

@ryanschneider
Copy link
Copy Markdown
Collaborator

I've tested this on our local cluster and it seems to resolve the surprise InactivePeers. I think I see now what was happening:

  • usually the numUpdated == len(node.peers) check was short-circuiting the inactive check.
  • but occasionally if the number of peers changed (e.g. due to a peering disconnect because of a devp2p timeout), it would cause the short-circuit to not activate on all the nodes in the pool, leading to the cascading disconnects and the flapping between inactive and not.

I first tested this by leaving it as .Before but removing the short-circuit, and the peering looked as unstable immediately as it did during the peering issues we'd seen:

image

In the above graph the pool w/o the short-circuit was running from ~19:07 - 19:12 UTC, and the code in this PR was running on the pool server at 19:13 onwards, the peering was remaining quite stable since, but has only been running for ~1h now.

I haven't checked the unit tests yet as to why they were passing before, but my guess is they weren't fully exercising the expiration timing.

@ryanschneider
Copy link
Copy Markdown
Collaborator Author

I added another fix in d2f02e2 for the memory pool, as stopping an agent didn't seem to have any effect on trigger inactive peers, because the pool was using the wrong timestamp for when a peer was last seen. I'm not familiar enough w/ Badger to make the same change on the badger store, so figured I'd leave that for you.

With this change I now correctly see other agents considering a peer invalid when the peers vipnode-agent has been stopped for a couple minutes, and then everything recovers when the agent is restarted.

@ryanschneider
Copy link
Copy Markdown
Collaborator Author

Oops, and 9cbe3ce fixes the unit test failing since .LastSeen was the 0 value.

inactiveDeadline := now.Add(-store.ExpireInterval)
for nodeID, timestamp := range nodePeers {
if timestamp.Before(inactiveDeadline) {
if timestamp.After(inactiveDeadline) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, thanks for catching this!

@shazow shazow merged commit 77b84ae into vipnode:master Sep 25, 2019
@shazow shazow mentioned this pull request Sep 25, 2019
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants