Skip to content

JAVA-3005 Refresh entire node list when new node added#1604

Merged
absurdfarce merged 1 commit into
apache:4.xfrom
akhaku:refreshNodeList
Jul 13, 2022
Merged

JAVA-3005 Refresh entire node list when new node added#1604
absurdfarce merged 1 commit into
apache:4.xfrom
akhaku:refreshNodeList

Conversation

@akhaku

@akhaku akhaku commented Jul 4, 2022

Copy link
Copy Markdown
Contributor

Empirically, Cassandra does not appear to send
TopologyChangeType.REMOVED_NODE when a node in the ring
crashes hard and is eventually replaced, even though the
bad node is removed from the system tables. This results in
the driver holding on to that node until a restart.
The 3.x driver refreshes the entire node list when a node is
added, so when a new node comes up to replace the crashed
node, the crashed node is removed from the driver's state.
With this change we port that behavior to the 4.x driver and
refresh the entire node list when a new node is added.
This also fixes JAVA-3002.

logPrefix,
event.broadcastRpcAddress);
metadataManager.addNode(event.broadcastRpcAddress);
metadataManager.refreshNodes();

@akhaku akhaku Jul 4, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that refreshNodes is an async operation and we aren't waiting for the result. That's probably fine though, we're in an async code path anyway (debounced handling of topology events).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but addNode is also async. That's fine.

Comment thread changelog/README.md

### 4.14.2 (in progress)

- [bug] JAVA-3002 JAVA-3005: Refresh entire node list when a new node is added

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also just reference one of these tickets and close the other one as a duplicate, let me know

Empirically, Cassandra does not appear to send
TopologyChangeType.REMOVED_NODE when a node in the ring
crashes hard and is eventually replaced, even though the
bad node is removed from the system tables. This results in
the driver holding on to that node until a restart.
The 3.x driver refreshes the entire node list when a node is
added, so when a new node comes up to replace the crashed
node, the crashed node is removed from the driver's state.
With this change we port that behavior to the 4.x driver and
refresh the entire node list when a new node is added.
This also fixes JAVA-3002.
@akhaku akhaku force-pushed the refreshNodeList branch from 814e8bf to 424cfa7 Compare July 7, 2022 17:45
@akhaku

akhaku commented Jul 12, 2022

Copy link
Copy Markdown
Contributor Author

cc @absurdfarce

@absurdfarce

Copy link
Copy Markdown
Contributor

No worries @akhaku , it is in the process of being reviewed right now :)

@absurdfarce absurdfarce left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes make perfect sense given the underlying discussion of the issue on JAVA-3002 and JAVA-3005. Re-ran test scenario described in JAVA-3002 and confirmed that behaviour with this change matches up to exactly what I would expect.

@absurdfarce absurdfarce merged commit ffcf0d5 into apache:4.x Jul 13, 2022
@akhaku akhaku deleted the refreshNodeList branch July 13, 2022 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants