replication: explore eliminating lazy internalRaftGroup initialization in favor of quiescence #73715

@nvb

From a discussion in #73362.

@tbg: This is pretty annoying. We morally say that a replica unquiesces when it needs to use raft, but then we also say this isn't quite true, because some callers need to bypass this check and create the group first. This is pretty complex, and I'm not at all clear on why it works out in practice (I assume we have the right calls in the actual critical path).

I think we should initialize the raft group right away. Is now perhaps the right time to make that change? Doing this lazily was a half-baked way to avoid election storms back when this was still a problem and possibly before we even had quiescence. Now that quiescence is the default, what's the benefit of the lazy raft group?

@nvanbenschoten: I think I see what you're saying. The idea would be that if quiescence is the default and we only unquiesce in all the same places that would otherwise initialize the raft group, then this lazy raft group creation isn't doing anything anymore, and so the dependency from internalRaftGroup != nil -> !quiescent is unnecessary complexity. I think I agree with that in theory, as it simplifies the states that a replica can be in significantly.
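
To make the state-space argument concrete, here is a minimal Go sketch contrasting the two designs. It is not CockroachDB code: `raftGroup`, `lazyReplica`, `eagerReplica`, `withGroup`, and the `mayUnquiesce` flag are hypothetical names invented for illustration, and ticking stands in for any Raft activity. In the lazy variant a caller can create the group without unquiescing, so the replica has three reachable states; in the eager variant the group always exists and quiescence is the only gate.

```go
package main

import "fmt"

// raftGroup is a hypothetical stand-in for the real Raft state machine.
type raftGroup struct{ ticks int }

// lazyReplica models lazy initialization: the group is nil until first use,
// and quiescence is tracked separately.
type lazyReplica struct {
	group     *raftGroup
	quiescent bool
}

func newLazyReplica() *lazyReplica { return &lazyReplica{quiescent: true} }

// withGroup creates the group on demand. Callers passing mayUnquiesce=false
// bypass the unquiesce step, which is the extra path discussed above.
func (r *lazyReplica) withGroup(mayUnquiesce bool, f func(*raftGroup)) {
	if r.group == nil {
		r.group = &raftGroup{}
	}
	if mayUnquiesce {
		r.quiescent = false
	}
	f(r.group)
}

// tick does work only if the group exists and the replica is not quiescent.
func (r *lazyReplica) tick() {
	if r.group == nil || r.quiescent {
		return
	}
	r.group.ticks++
}

func (r *lazyReplica) state() string {
	return fmt.Sprintf("initialized=%t quiescent=%t", r.group != nil, r.quiescent)
}

// eagerReplica models the proposal: the group always exists, so the only
// state that varies is whether the replica is quiescent.
type eagerReplica struct {
	group     raftGroup
	quiescent bool
}

func newEagerReplica() *eagerReplica { return &eagerReplica{quiescent: true} }

func (r *eagerReplica) unquiesce() { r.quiescent = false }

// tick does work only if the replica is not quiescent.
func (r *eagerReplica) tick() {
	if r.quiescent {
		return
	}
	r.group.ticks++
}

func (r *eagerReplica) state() string {
	return fmt.Sprintf("initialized=true quiescent=%t", r.quiescent)
}

func main() {
	lazy := newLazyReplica()
	fmt.Println("lazy: ", lazy.state())
	lazy.withGroup(false, func(*raftGroup) {}) // group created, still quiescent
	fmt.Println("lazy: ", lazy.state())
	lazy.withGroup(true, func(*raftGroup) {}) // now unquiesced
	fmt.Println("lazy: ", lazy.state())

	eager := newEagerReplica()
	fmt.Println("eager:", eager.state())
	eager.unquiesce()
	fmt.Println("eager:", eager.state())
}
```

In this sketch the eager variant has no nil check and no bypass flag, which is the simplification the comment above argues for. Whether the real code can drop lazy initialization depends on every existing initialization site also being an unquiesce site, which the sketch assumes rather than demonstrates.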

Jira issue: CRDB-11704

Metadata

Labels

A-kv-replication: Relating to Raft, consensus, and coordination.
C-cleanup: Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
