[LI-HOTFIX] Event based fetcher part 3/3: Adopting the event based fetcher model#124
Conversation
|
would someone take down the broker by sending "ALTER_REPLICA_LOG_DIRS" requests to the clusters? |
There was a problem hiding this comment.
this function is called on every leaderAndIsr request as part of "becomeLeaderOrFollower". So how much delay can be introduced by sending "getPartitionsCount" request to event manager and waiting for result?
There was a problem hiding this comment.
In upstream Kafka, the request handler thread needs to compete for the partitionMapLock in order to get the partitions count. And the fetcher.idle variable is a hotfix made internally by us.
When switching to the event-based model, I feel it's cleaner to request the partition count using an event + future.
I'm not sure about the performance difference yet, but I have plans to eventually merge the remove partition, add partition and get partition count into one event if necessary. So ultimately the extra delay introduced by the GetPartitionCount event may not matter.
a04854e to
324a3d2
Compare
I've added a change in the latest commit to disallow the AlterReplicaLogDirs request in the authorization step, |
xiowu0
left a comment
There was a problem hiding this comment.
LGTM.
Do you plan to re-enable JBOD in the future?
|
@xiowu0 Thanks for the review. I'll revisit JBOD only if it turns out to be needed within LI. |
…tcher model (linkedin#124) TICKET = KAFKA-10734 LI_DESCRIPTION = Part 3 of 3 PRs to change the fetcher into an event-based model EXIT_CRITERIA = When KAFKA-10734 is closed and the changes are pulled in as a part of a release
…tcher model (linkedin#124) TICKET = KAFKA-10734 LI_DESCRIPTION = Part 3 of 3 PRs to change the fetcher into an event-based model EXIT_CRITERIA = When KAFKA-10734 is closed and the changes are pulled in as a part of a release
…tcher model (linkedin#124) TICKET = KAFKA-10734 LI_DESCRIPTION = Part 3 of 3 PRs to change the fetcher into an event-based model EXIT_CRITERIA = When KAFKA-10734 is closed and the changes are pulled in as a part of a release
The async/event-based replica fetcher series (TransferLeaderManager, AbstractAsyncFetcher, AsyncReplicaFetcher, FetcherEventBus, FetcherEventManager — PRs #121/#123/#124/#143/#144/#403/#406) was removed in the PR #538 squash. The li.async.fetcher.enable config key was left behind in KafkaConfig as dead surface area: it parsed and validated, but had no consumers. Going with audit P1 #6 option (a): retire the feature, remove the dead config key. Operators with li.async.fetcher.enable=<anything> in their server.properties may see an 'Unknown configuration' warning at broker startup; this is harmless — the broker still starts and the value would have been a no-op anyway. Removes: - Defaults.LiAsyncFetcherEnabled - KafkaConfig.LiAsyncFetcherEnableProp - The brokerConfigDef.define(...) registration - KafkaConfig.liAsyncFetcherEnable accessor Audit option (b) — porting the entire async fetcher series forward into 3.6-li — was deferred. If the optimization is needed in the future, it would need to be reimplemented from scratch on top of upstream's current ReplicaFetcherThread / ReplicaFetcherManager.
This PR starts using the event based fetcher model, which replaces the existing lock based classes AbstractFetcherThread
and ReplicaFetcherThread.
To make implementation easier, I've decided to disable the following features for now
the JBOD logic implemented via ReplicaAlterLogDirsManager for now.
JBOD is not used within LinkedIn, so this change should be safe.
support for dynamically changing the fetcher thread pool size,
I checked the clusters in a given production fabric, including krc, queuing, metrics, venice, tracking, kac, data-deployment
and don't see a ZooKeeper override for the "num.replica.fetchers" config.
Many tests within the ConsumerBounceTest class are flaky, which is echod by the open source tickets
https://issues.apache.org/jira/browse/KAFKA-7893?jql=text%20~%20%22ConsumerBounceTest%22
Thus I'm disabling the entire test class for now.
Committer Checklist (excluded from commit message)