Skip to content

Conversation

@heesung-sohn
Copy link
Contributor

@heesung-sohn heesung-sohn commented Sep 12, 2022

Fixes #

Master Issue: #

Motivation

Currently, the load report frequency is hard-coded at 5secs, and the JVM memory usage is re-computed in the ZK report thread every 5 sec. In the high traffic environment, when GC is frequent, the memory usage could fluctuate in 5 secs, and this could result in writing load reports to ZK too frequently(every 5 sec).

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • Anything that affects deployment

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later). It needs to update the broker config doc. https://pulsar.apache.org/docs/reference-configuration/#broker

  • [] doc-not-needed
    (Please explain why) loadBalancerReportUpdateMinIntervalMilliSeconds

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Sep 12, 2022
}

public double getMaxResourceUsage(boolean checkMemoryUsage) {
if (checkMemoryUsage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to remove this config option and instead only check the direct memory but not the heap memory usage here.

Copy link
Contributor Author

@heesung-sohn heesung-sohn Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

I initially thought this checkMemoryUsage config is safer since it does keep the old behavior(checking memory).

I see that memory usage is always noisy, so I agree that we could remove it from getMaxResourceUsage.

Note that we keep getMaxResourceUsageWithWeightWithinLimit() as-is -- we are not removing the memory usage signal in this func, as its memory usage can be already disabled by loadBalancerMemoryResourceWeight=0.0.

@heesung-sohn heesung-sohn force-pushed the update-lb-update-frequency-config branch from e0d963d to 94ad0f9 Compare September 13, 2022 01:07
@heesung-sohn heesung-sohn changed the title [improve][loadbalance] added loadBalancerReportUpdateMemoryUsageChecked and loadBalancerReportUpdateMinIntervalMilliSeconds configs [improve][loadbalance] added loadBalancerReportUpdateMemoryUsageChecked and ignores memory usage in getMaxResourceUsage() Sep 13, 2022
@merlimat merlimat added this to the 2.12.0 milestone Sep 13, 2022
@merlimat merlimat added release/2.9.4 release/2.11.1 release/2.10.3 type/bug The PR fixed a bug or issue reported a bug labels Sep 13, 2022
@heesung-sohn
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And could you please also add a test for the new configuration?

conf/broker.conf Outdated
loadBalancerReportUpdateThresholdPercentage=10

# minimum interval to update load report
loadBalancerReportUpdateMinIntervalMilliSeconds=5000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
loadBalancerReportUpdateMinIntervalMilliSeconds=5000
loadBalancerReportUpdateMinIntervalMillis=5000

Just keep consistent with other configuration names in the broker.conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Comment on lines +2092 to +2095
@FieldContext(
category = CATEGORY_LOAD_BALANCER,
dynamic = true,
doc = "Min delay of load report to collect, in milli-seconds"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move to line 2090?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

public double getMaxResourceUsage() {
return max(cpu.percentUsage(), memory.percentUsage(), directMemory.percentUsage(), bandwidthIn.percentUsage(),
// does not consider memory because it is noisy by gc.
return max(cpu.percentUsage(), directMemory.percentUsage(), bandwidthIn.percentUsage(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please help add a unit test?
Looks like we changed the behavior, but all the tests got passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

… config and ignores memory usage in getMaxResourceUsage$()
@heesung-sohn heesung-sohn force-pushed the update-lb-update-frequency-config branch from 94ad0f9 to 5844002 Compare September 13, 2022 23:02
@heesung-sohn heesung-sohn changed the title [improve][loadbalance] added loadBalancerReportUpdateMemoryUsageChecked and ignores memory usage in getMaxResourceUsage() [improve][loadbalance] added loadBalancerReportUpdateMinIntervalMillis and ignores memory usage in getMaxResourceUsage() Sep 14, 2022
@heesung-sohn
Copy link
Contributor Author

/pulsarbot run-failure-checks

1 similar comment
@heesung-sohn
Copy link
Contributor Author

/pulsarbot run-failure-checks

@codelipenghui codelipenghui merged commit de7c586 into apache:master Sep 14, 2022
@codelipenghui codelipenghui added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/broker labels Sep 14, 2022
codelipenghui pushed a commit that referenced this pull request Sep 21, 2022
…s and ignores memory usage in getMaxResourceUsage() (#17598)

(cherry picked from commit de7c586)
codelipenghui pushed a commit that referenced this pull request Sep 21, 2022
…s and ignores memory usage in getMaxResourceUsage() (#17598)

(cherry picked from commit de7c586)
@codelipenghui codelipenghui removed this from the 2.12.0 milestone Sep 21, 2022
@codelipenghui codelipenghui added this to the 2.11.0 milestone Sep 21, 2022
codelipenghui pushed a commit that referenced this pull request Sep 21, 2022
…s and ignores memory usage in getMaxResourceUsage() (#17598)

(cherry picked from commit de7c586)
@momo-jun
Copy link
Contributor

@Anonymitaet the configuration doc update can be automated in the next version, which means we don't need to update https://pulsar.apache.org/docs/reference-configuration/#broker manually, correct?

@Anonymitaet
Copy link
Member

@Anonymitaet the configuration doc update can be automated in the next version, which means we don't need to update https://pulsar.apache.org/docs/reference-configuration/#broker manually, correct?

yes

@momo-jun momo-jun added doc-complete Your PR changes impact docs and the related docs have been already added. and removed doc-required Your PR changes impact docs and you will update later. labels Sep 23, 2022
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Sep 28, 2022
…s and ignores memory usage in getMaxResourceUsage() (apache#17598)

(cherry picked from commit de7c586)
(cherry picked from commit a6cb786)
@heesung-sohn heesung-sohn deleted the update-lb-update-frequency-config branch April 2, 2024 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/broker cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 cherry-picked/branch-2.11 doc-complete Your PR changes impact docs and the related docs have been already added. release/2.9.4 release/2.10.2 type/bug The PR fixed a bug or issue reported a bug type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants