Skip to content

Conversation

@vinkal-chudgar
Copy link
Contributor

@vinkal-chudgar vinkal-chudgar commented Oct 16, 2025

Motivation

Fixes #24693.

ResourceGroupService unconditionally schedules two periodic tasks (aggregating local usage and calculating quotas) during broker initialization, executing them every resourceUsageTransportPublishIntervalInSecs (default 60 seconds) regardless of whether any tenants or namespaces are registered to resource groups. This wastes resources on brokers where the resource group feature is unused.

This PR implements explicit lifecycle management: periodic tasks now start only when the first tenant or namespace is registered to any resource group, and stop when the last registration is removed. This eliminates unnecessary periodic task execution and resource consumption on brokers where the resource group feature is idle.

Modifications

  • Lazy scheduling: Periodic tasks (aggregateResourceGroupLocalUsages, calculateQuotaForAllResourceGroups) are no longer unconditionally started in initialize(). They start only when tenant or namespace registrations are added (via registerTenant/registerNameSpace calls) and stop when the last registration is removed.
  • Automatic cleanup: Tasks stop automatically when the last attachment is removed, eliminating background work when resource groups become unused.
  • CAS-guarded concurrency: Task lifecycle is controlled via an AtomicBoolean with compare-and-set to avoid double start/stop and blocking.
  • Dynamic config: Existing dynamic rescheduling based on resourceUsageTransportPublishIntervalInSecs is preserved; guards prevent reschedule attempts when schedulers are stopped.
  • Docs: Updated the ServiceConfiguration field doc to reflect “run only when in use” and to clarify behavior when a ResourceUsageTransportManager is configured.

Modified files

  • ResourceGroupService.java: add maybeStartSchedulers(), maybeStopSchedulersIfIdle(), hasActiveAttachments(), plus small guard/logging improvements.
  • ServiceConfiguration.java: clarify resourceUsageTransportPublishIntervalInSecs doc.
  • ResourceGroupServiceTest.java: add new tests and adapt existing tests for lazy lifecycle.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is verified by unit tests. Tests are sleep free, run with per test isolation, and use 60s timeouts.

New unit tests (ResourceGroupServiceTest.java)

  • testLazyStartStopAndReschedule() verifies lazy start, deterministic reschedule on interval change, and stop on last detach.
  • testNoStartOnRGCreateOnly() ensures no schedulers start when a resource group has no attachments and direct calls do not auto start.
  • testStartOnTenantAttachment() confirms that a tenant only attachment starts both schedulers and detaching the last tenant stops them.
  • testStopOnLastDetachWithMixedRefs() checks that schedulers keep running while any attachment exists and stop only after the last is removed.
  • testNoRescheduleWhenStopped() ensures interval changes do not schedule tasks when the service is idle.

Updated unit tests (ResourceGroupServiceTest.java)

  • testClose(): start schedulers via an attachment, call close(), then assert scheduled task references are set to null and isSchedulersRunning() is false.

  • testResourceGroupOps(): The test now calls calculateQuotaForAllResourceGroups() before unregistering all attachments, rather than after.

Test utilities

  • isSchedulersRunning() accessor annotated with com.google.common.annotations.VisibleForTesting.
  • createResourceGroupService() helper that returns a fresh ResourceGroupService(pulsar) per test.

Personal CI Results

Tested in Personal CI fork: vinkal-chudgar#1

Status:

  • 43 checks passed (build, tests, validation)
  • 3 coverage upload jobs failed due to missing Codecov token (expected in forks)
  • All core test suites passed successfully

Evidence: https://github.com/vinkal-chudgar/pulsar/pull/1/checks

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Configuration note: Only the documentation string for resourceUsageTransportPublishIntervalInSecs was updated. No default values were changed.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Note: Inline configuration documentation was updated in ServiceConfiguration.java. No website docs need changes

Matching PR in forked repository

PR in forked repository: vinkal-chudgar#1

…idle

ResourceGroupService previously executed periodic tasks unconditionally
every resourceUsageTransportPublishIntervalInSecs. This change starts both
tasks only when at least one tenant or namespace is attached to any
resource group, and stops them when no attachments remain. Each task
reschedules itself if the interval changes at runtime. close() now cancels
and clears both futures.

Fixes apache#24693

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
@github-actions
Copy link

@vinkal-chudgar Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Oct 16, 2025
@vinkal-chudgar vinkal-chudgar marked this pull request as ready for review October 16, 2025 14:30
@lhotari
Copy link
Member

lhotari commented Oct 16, 2025

Thanks for the great contribution! I didn't do a full review yet. A few minor comments.

@vinkal-chudgar
Copy link
Contributor Author

Thanks for the great contribution! I didn't do a full review yet. A few minor comments.

Thanks for the quick look! I'll address the comments when available.

@vinkal-chudgar
Copy link
Contributor Author

@lhotari - When convenient, could you please review this PR?

…ks; adopt 'registered' terminology

- Moved the duplicated early-return checks from both periodic methods aggregateResourceGroupLocalUsages and calculateQuotaForAllResourceGroups into a single helper: shouldRunPeriodicTasks
- Rename hasActiveAttachments() to hasActiveResourceGroups()
- Update comments/logs to use 'registered' consistently.

No functional change.

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
@vinkal-chudgar
Copy link
Contributor Author

@lhotari - Thanks for the review. I've addressed both comments. Could you please take a look when you have a moment?

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good work @vinkal-chudgar

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces lazy initialization and lifecycle management for ResourceGroupService periodic schedulers. The schedulers now start only when the first tenant or namespace is attached to a resource group and stop automatically when all attachments are removed, reducing unnecessary overhead when resource groups exist but are not actively used.

  • Adds lazy scheduler startup/shutdown triggered by tenant/namespace registrations
  • Implements deterministic rescheduling when publish intervals change dynamically
  • Ensures proper cleanup of scheduler state in close() method

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
ResourceGroupService.java Implements lazy scheduler lifecycle with maybeStartSchedulers() and maybeStopSchedulersIfIdle() methods, adds guards to periodic tasks, and ensures proper cleanup with null-setting of futures
ResourceGroupServiceTest.java Adds comprehensive tests for lazy scheduler lifecycle, rescheduling behavior, and close() cleanup; fixes test ordering issue in testResourceGroupOps()
ServiceConfiguration.java Updates documentation for resourceUsageTransportPublishIntervalInSecs to explain the new lazy scheduler behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lhotari
Copy link
Member

lhotari commented Oct 31, 2025

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 80.43478% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.28%. Comparing base (a091ea7) to head (aeb3547).
⚠️ Report is 28 commits behind head on master.

Files with missing lines Patch % Lines
...sar/broker/resourcegroup/ResourceGroupService.java 80.43% 0 Missing and 9 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24859      +/-   ##
============================================
- Coverage     74.29%   74.28%   -0.01%     
- Complexity    33825    33855      +30     
============================================
  Files          1913     1913              
  Lines        149281   149365      +84     
  Branches      17325    17342      +17     
============================================
+ Hits         110902   110961      +59     
- Misses        29540    29546       +6     
- Partials       8839     8858      +19     
Flag Coverage Δ
inttests 26.31% <10.86%> (-0.10%) ⬇️
systests 22.77% <10.86%> (-0.04%) ⬇️
unittests 73.81% <80.43%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...org/apache/pulsar/broker/ServiceConfiguration.java 98.20% <ø> (ø)
...sar/broker/resourcegroup/ResourceGroupService.java 78.27% <80.43%> (+5.17%) ⬆️

... and 88 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lhotari lhotari merged commit 06f7142 into apache:master Oct 31, 2025
103 of 107 checks passed
lhotari pushed a commit that referenced this pull request Oct 31, 2025
…gistered (#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
lhotari pushed a commit that referenced this pull request Oct 31, 2025
…gistered (#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
lhotari pushed a commit that referenced this pull request Oct 31, 2025
…gistered (#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
lhotari pushed a commit that referenced this pull request Oct 31, 2025
…gistered (#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
ganesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 3, 2025
…gistered (apache#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
(cherry picked from commit f535106)
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 3, 2025
…gistered (apache#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
(cherry picked from commit aaabe70)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 4, 2025
…gistered (apache#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
(cherry picked from commit aaabe70)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 6, 2025
…gistered (apache#24859)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 06f7142)
(cherry picked from commit f535106)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] ResourceGroupService requests all topic stats every 60 seconds by default even when it shouldn't be enabled

3 participants