[improve] [broker] filter system topics while shedding by thetumbled · Pull Request #18936 · apache/pulsar

thetumbled · 2022-12-15T07:56:24Z

Motivation

Topics/bundles are unloaded during load shedding, but some special topics should not be unloaded. For example, if transaction_coordinator_assign is unloaded, the corresponding transaction coordinator (TC) needs to be recovered, which is time-consuming.
So we had better avoid unloading these topics. I found that this behavior is already implemented in the latest branch, but not in branch-2.9.

Modifications

Filter system topics while shedding in branch-2.9.
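
As an illustration of the idea only (not the actual patch), here is a minimal sketch of excluding system-namespace bundles from the shedding candidates. The class and method names are hypothetical, and it assumes that bundle names have the form `tenant/namespace/range` and that system topics such as transaction_coordinator_assign live in the `pulsar/system` namespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Pulsar code base.
final class SystemBundleFilter {
    // Assumption: system topics live in this namespace.
    static final String SYSTEM_NAMESPACE = "pulsar/system";

    // Assumption: a bundle name looks like "tenant/namespace/0x00000000_0x40000000".
    static String namespaceOf(String bundle) {
        int idx = bundle.lastIndexOf('/');
        return idx < 0 ? bundle : bundle.substring(0, idx);
    }

    // Returns the bundles that may be considered for unloading,
    // skipping every bundle that belongs to the system namespace.
    static Map<String, Double> sheddingCandidates(Map<String, Double> bundleThroughput) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : bundleThroughput.entrySet()) {
            if (!SYSTEM_NAMESPACE.equals(namespaceOf(e.getKey()))) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

A shedding strategy would then pick bundles to unload from the returned map only, so a bundle holding transaction_coordinator_assign is never selected.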

Verifying this change

Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is already covered by existing tests, such as (please describe tests).

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

PR in forked repository: thetumbled#9

congbobo184

A system topic can be unloaded, allowing it to be load balanced. If you filter the system topic, it may lead to uneven resource allocation.

thetumbled · 2022-12-15T08:32:20Z

> A system topic can be unloaded, allowing it to be load balanced. If you filter the system topic, it may lead to uneven resource allocation.

In the master branch, topics/bundles in pulsar/system are filtered out; see
org.apache.pulsar.broker.loadbalance.LoadData#getBundleDataForLoadShedding

Also, we can distribute bundles containing transaction_coordinator_assign evenly and still avoid unloading them during shedding.

In our production clusters, we use AvgShedder described in #18186.

When the cluster is initializing or a broker is restarted, bundles are distributed randomly or by a hashing algorithm, which approximates a uniform distribution. So bundles containing transaction_coordinator_assign end up distributed evenly across brokers.
When shedding is needed, we filter out bundles containing transaction_coordinator_assign to avoid TC recovery.
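
To make the first point concrete, here is a rough sketch of hash-based placement and of counting how many bundles land on each broker. It is not Pulsar's actual placement logic: the class name, the method names and the use of `String.hashCode` are assumptions chosen for brevity, and a production system would use a stronger hash.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Pulsar's placement code.
final class HashPlacementSketch {
    // Picks a broker by hashing the bundle name; floorMod keeps the index non-negative.
    static String assignBroker(String bundle, List<String> brokers) {
        return brokers.get(Math.floorMod(bundle.hashCode(), brokers.size()));
    }

    // Counts how many bundles each broker receives under the hash-based assignment.
    static Map<String, Integer> countPerBroker(List<String> bundles, List<String> brokers) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String broker : brokers) {
            counts.put(broker, 0);
        }
        for (String bundle : bundles) {
            counts.merge(assignBroker(bundle, brokers), 1, Integer::sum);
        }
        return counts;
    }
}
```

With enough bundles and a well-mixing hash, the per-broker counts should be close to each other, which is the "roughly uniform" starting point that the shedding filter then leaves untouched.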

congbobo184 · 2022-12-15T09:56:32Z

> In the master branch, topics/bundles in pulsar/system are filtered out.

I think this logic was introduced by mistake in PR #15252.

> Also, we can distribute bundles containing transaction_coordinator_assign evenly and still avoid unloading them during shedding.

> In our production clusters, we use AvgShedder described in #18186.

I will look at the PIP later.

Also, I think transaction_coordinator_assign can be shed at any time. We could use a smoother strategy, but that should not prevent it from being shed.

thetumbled · 2022-12-15T10:12:10Z

> In the master branch, topics/bundles in pulsar/system are filtered out.

> I think this logic was introduced by mistake in PR #15252.

Should I raise a PR to revert the mistake in the master branch?

> Also, I think transaction_coordinator_assign can be shed at any time. We could use a smoother strategy, but that should not prevent it from being shed.

I think load balancing strategies such as ThresholdShedder do not work well with transaction_coordinator_assign: they trigger many meaningless bundle unloads. The cost of TC recovery is also very high; in our test it caused more than 20 minutes of unavailability.

congbobo184 · 2022-12-15T12:46:00Z

> Should I raise a PR to revert the mistake in the master branch?

Yes, I think we need a PR to revert this change.

> I think load balancing strategies such as ThresholdShedder do not work well with transaction_coordinator_assign: they trigger many meaningless bundle unloads. The cost of TC recovery is also very high; in our test it caused more than 20 minutes of unavailability.

I think we need to find out why the TC recovers so slowly. Is it a logic error, or do we need to expand the TC?

If it is a logic error, we need to fix it.

I will think later about how to optimize the TC recovery time.

github-actions · 2023-01-15T02:01:47Z

The pr had no activity for 30 days, mark with Stale label.

filter system topics.

ad6306d

thetumbled mentioned this pull request Dec 15, 2022

[improve] [broker] filter system topics while shedding thetumbled/pulsar#9

Closed

congbobo184 requested changes Dec 15, 2022

View reviewed changes

thetumbled mentioned this pull request Dec 16, 2022

[fix] [broker] do not filter system topic while shedding. #18949

Merged

5 tasks

github-actions Bot added the Stale label Jan 15, 2023

thetumbled closed this May 31, 2023

Technoboy- mentioned this pull request Jun 5, 2023

[Bug] TC recovery takes too much time #20489

Open

2 tasks

thetumbled wants to merge 1 commit into apache:branch-2.9 from thetumbled:improve_filter_systemTopicsBundle
