Skip to content

[LI-HOTFIX] Transfer leadership for UnderMinISR partitions#406

Merged
gitlw merged 2 commits into
linkedin:3.0-lifrom
gitlw:broker_initiated_leadership_transfer3
Oct 25, 2022
Merged

[LI-HOTFIX] Transfer leadership for UnderMinISR partitions#406
gitlw merged 2 commits into
linkedin:3.0-lifrom
gitlw:broker_initiated_leadership_transfer3

Conversation

@gitlw

@gitlw gitlw commented Oct 21, 2022

Copy link
Copy Markdown

TICKET = LIKAFKA-47596
LI_DESCRIPTION =
This PR adds the following change:

When a partition's ISR size is about to fall under the MinISR setting,
instead of letting it go under MinISR, the leader host would try to initiate a leadership transfer.
Currently, we chose the first replica in the ISR to be the new leader.

EXIT_CRITERIA = If and when this change is accepted in upstream Kafka.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@gitlw gitlw requested a review from groelofs October 21, 2022 23:49
@gitlw gitlw marked this pull request as ready for review October 21, 2022 23:49

@groelofs groelofs left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though admittedly I was getting a little tired at the end, so I hope I didn't miss anything. :-/ Sorry for the delay!

leaderLogIfLocal.flatMap { log =>
val outOfSyncReplicaIds = getOutOfSyncReplicas(replicaLagTimeMaxMs)
if ((inSyncReplicaIds -- outOfSyncReplicaIds).size < log.config.minInSyncReplicas) {
inSyncReplicaIds.find(_ != localBrokerId)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO to consider a smarter handoff algorithm in the future? (For example, based on some metric.)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about this. Suppose we have the following condition with 3 replicas in the ISR:
leader A has 5 messages (offsets 1-5);
follower B has 4 messages (offsets 1-4);
follower C has 3 messages (offsets 1-3).

One might think that transferring the new leader to B is the preferable option. However, because the high watermark is the lowest end offset of the ISR, i.e. determined by follower C in this case,
regardless of whether we transfer the leader to B or C, the produce request for msg at offset 4 and 5 will receive a "NotLeaderForPartition" error and retry the produce requests.
Thus transferring the leadership to B does not actually help.

Besides this line of thinking, I'm not sure how else the handoff algorithm can be smarter.

Comment thread core/src/test/scala/unit/kafka/integration/DegradedLeaderTest.scala Outdated
@gitlw gitlw merged commit d58aa46 into linkedin:3.0-li Oct 25, 2022
@gitlw gitlw deleted the broker_initiated_leadership_transfer3 branch October 25, 2022 17:06
earlcoder added a commit that referenced this pull request Apr 28, 2026
The async/event-based replica fetcher series (TransferLeaderManager,
AbstractAsyncFetcher, AsyncReplicaFetcher, FetcherEventBus,
FetcherEventManager — PRs #121/#123/#124/#143/#144/#403/#406) was
removed in the PR #538 squash. The li.async.fetcher.enable config key
was left behind in KafkaConfig as dead surface area: it parsed and
validated, but had no consumers.

Going with audit P1 #6 option (a): retire the feature, remove the dead
config key. Operators with li.async.fetcher.enable=<anything> in their
server.properties may see an 'Unknown configuration' warning at broker
startup; this is harmless — the broker still starts and the value would
have been a no-op anyway.

Removes:
- Defaults.LiAsyncFetcherEnabled
- KafkaConfig.LiAsyncFetcherEnableProp
- The brokerConfigDef.define(...) registration
- KafkaConfig.liAsyncFetcherEnable accessor

Audit option (b) — porting the entire async fetcher series forward
into 3.6-li — was deferred. If the optimization is needed in the
future, it would need to be reimplemented from scratch on top of
upstream's current ReplicaFetcherThread / ReplicaFetcherManager.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants