
Fixing the CPU Intensive RemoveAll with Lists in Sticky & Load Based Partition Assignment Strategies #965

Merged
shrinandthakkar merged 2 commits into linkedin:master from shrinandthakkar:fixRemoveAllCallWithList
Oct 27, 2023


@shrinandthakkar
Collaborator

Summary

  • The coordinator thread should be able to finish any event within the configured heartbeat period (default 1 minute). Lately, on large clusters with roughly 500K partitions per datastream, every partition assignment event has been taking more than about 1.5 minutes to complete for every request.

  • The issue appears to be related to this code, where the thread is stuck in a removeAll call in which one of the collections is a list. This can result in high CPU usage.

  • This has been confirmed with thread dumps and logs from a partition-heavy cluster.
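
For context, here is a minimal sketch of why a removeAll call with a list argument can be expensive, and one common way around it. It is not the code from this PR: the class `RemoveAllSketch`, the helper `removeAllViaSet`, and the partition names are hypothetical and only for illustration. In the JDK, `AbstractSet.removeAll(c)` iterates over the set and calls `c.contains(...)` whenever the set is not larger than `c`, and `ArrayList.removeAll(c)` always calls `c.contains(...)` per element. When `c` is a list, each `contains` is a linear scan, so the whole call is roughly quadratic. Copying the list into a `HashSet` first makes each lookup constant time on average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RemoveAllSketch {
    // Hypothetical helper (not from the PR): removes every element of
    // toRemove from target without relying on List.contains().
    static <T> void removeAllViaSet(Set<T> target, List<T> toRemove) {
        // Building the HashSet costs O(m); afterwards each contains()
        // lookup made by removeAll is O(1) on average instead of O(m).
        target.removeAll(new HashSet<>(toRemove));
    }

    public static void main(String[] args) {
        // Illustrative data only; real clusters hold far more partitions.
        Set<String> unassigned =
            new HashSet<>(Arrays.asList("p0", "p1", "p2", "p3", "p4"));
        List<String> alreadyAssigned = new ArrayList<>(Arrays.asList("p1", "p3"));

        removeAllViaSet(unassigned, alreadyAssigned);

        System.out.println(unassigned.size()); // prints 3
    }
}
```

The extra copy costs memory proportional to the list size, which is usually an acceptable trade for avoiding a linear scan per element at the partition counts described above.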


Important: DO NOT REPORT SECURITY ISSUES DIRECTLY ON GITHUB.
For reporting security issues and contributing security fixes,
please, email security@linkedin.com instead, as described in
the contribution guidelines.

Please, take a minute to review the contribution guidelines at:
https://github.com/linkedin/Brooklin/blob/master/CONTRIBUTING.md

@shrinandthakkar shrinandthakkar merged commit 0faec8e into linkedin:master Oct 27, 2023
