Skip to content

[improve][meta] Support set metadata size threshold for compression.#19561

Merged
Technoboy- merged 9 commits into
apache:masterfrom
lifepuzzlefun:conditional_compress_metadata_by_size
May 31, 2023
Merged

[improve][meta] Support set metadata size threshold for compression.#19561
Technoboy- merged 9 commits into
apache:masterfrom
lifepuzzlefun:conditional_compress_metadata_by_size

Conversation

@lifepuzzlefun

@lifepuzzlefun lifepuzzlefun commented Feb 18, 2023

Copy link
Copy Markdown
Contributor

PIP-270

Motivation

Current we support to set compression on ManagedLedger and ManagedCursor metadata
by managedLedgerInfoCompressionType and managedCursorInfoCompressionType

the metadata of ManagedLedger is meanly composed by a ledger id list and some other field
to describe pulsar topic storage state. you can use the metadata to find which the message is stored.
when pulsar topic message is deleled by the retention policy or ledger rollover. the metadata is remove or append the metadata list.

this metadata can be small when retention time is short.

the metadata of ManagedCursor is the subscription state of a pulsar subscription. if user ack an message this will trigger the subscription state save in the metadata of ManagedCursor

if user always use accumlateAcknowledge api this state is also small enough.

all those metadata is saved in metadataStore.

sometimes the metadata is too small to compress. so a size based config is needed.

Modifications

check size before metadata compression.

Verifying this change

This change added tests and can be verified as follows:

add unit test case for metadata big or small than threshold.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

if metadata size before serialization is smaller then threshold. compression will not apply.
@github-actions github-actions Bot added the doc-not-needed Your PR changes do not impact docs label Feb 18, 2023
@lifepuzzlefun lifepuzzlefun changed the title Support set metadata size threshold for compression. [improve][metadata] Support set metadata size threshold for compression. Feb 18, 2023
@lifepuzzlefun lifepuzzlefun changed the title [improve][metadata] Support set metadata size threshold for compression. [improve][meta] Support set metadata size threshold for compression. Feb 18, 2023
@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

@codelipenghui @Technoboy- @AnonHxy plesase take a look. thank you.

* ManagedLedgerInfo compression threshold. If the origin metadata size below configuration.
* compression will not apply.
*/
private int managedLedgerInfoCompressionSizeThresholdBytes = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we set a suitable default value?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current value is the same behavior before this feature. maybe we can set to 128 or more. In our production env most cursor metadata is less than this value.

@315157973

Copy link
Copy Markdown
Contributor

/pulsarbot run-failure-checks

@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

@Technoboy- @congbobo184 @poorbarcode can you take a look ?

@poorbarcode poorbarcode left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some comments

@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

@poorbarcode @Technoboy- @315157973 can this pr be merged. if need change please let me know : - )

@315157973

Copy link
Copy Markdown
Contributor

Some unit tests fail, please rebase master

@Technoboy- Technoboy- closed this Mar 9, 2023
@Technoboy- Technoboy- reopened this Mar 9, 2023
@Technoboy- Technoboy- added this to the 3.0.0 milestone Mar 9, 2023
@Technoboy-

Copy link
Copy Markdown
Contributor

@Technoboy- @congbobo184 @poorbarcode can you take a look ?

Sorry for the late response.

@315157973 315157973 closed this Mar 10, 2023
@315157973 315157973 reopened this Mar 10, 2023
@315157973 315157973 closed this Mar 10, 2023
@315157973 315157973 reopened this Mar 10, 2023
@codecov-commenter

codecov-commenter commented Mar 10, 2023

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 66.58711% with 140 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.96%. Comparing base (b07f2a9) to head (1248bd4).
⚠️ Report is 2006 commits behind head on master.

Files with missing lines Patch % Lines
...xtensions/channel/ServiceUnitStateChannelImpl.java 66.07% 15 Missing and 4 partials ⚠️
...r/impl/SnapshotSegmentAbortedTxnProcessorImpl.java 68.18% 10 Missing and 4 partials ⚠️
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 27.77% 8 Missing and 5 partials ⚠️
...a/org/apache/pulsar/websocket/ConsumerHandler.java 0.00% 12 Missing ⚠️
...ava/org/apache/pulsar/websocket/ReaderHandler.java 0.00% 12 Missing ⚠️
...va/org/apache/pulsar/client/impl/ConsumerImpl.java 66.66% 11 Missing ⚠️
.../pulsar/functions/runtime/JavaInstanceStarter.java 0.00% 11 Missing ⚠️
...a/org/apache/pulsar/websocket/ProducerHandler.java 0.00% 11 Missing ⚠️
...ache/pulsar/broker/namespace/NamespaceService.java 20.00% 6 Missing and 2 partials ⚠️
...functions/runtime/thread/ThreadRuntimeFactory.java 50.00% 4 Missing and 1 partial ⚠️
... and 11 more
Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #19561       +/-   ##
=============================================
+ Coverage     30.23%   72.96%   +42.72%     
- Complexity      295    31913    +31618     
=============================================
  Files          1689     1865      +176     
  Lines        129020   138445     +9425     
  Branches      14070    15192     +1122     
=============================================
+ Hits          39013   101015    +62002     
+ Misses        84148    29436    -54712     
- Partials       5859     7994     +2135     
Flag Coverage Δ
inttests 24.30% <15.51%> (+0.13%) ⬆️
systests 24.94% <19.33%> (-0.03%) ⬇️
unittests 72.25% <66.58%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...bookkeeper/mledger/ManagedLedgerFactoryConfig.java 90.00% <100.00%> (+2.50%) ⬆️
...kkeeper/mledger/impl/ManagedLedgerFactoryImpl.java 80.79% <100.00%> (+40.03%) ⬆️
.../apache/bookkeeper/mledger/impl/MetaStoreImpl.java 85.91% <100.00%> (+44.71%) ⬆️
...hentication/oidc/AuthenticationProviderOpenID.java 75.86% <ø> (+75.86%) ⬆️
...org/apache/pulsar/broker/ServiceConfiguration.java 99.37% <100.00%> (+1.35%) ⬆️
...ache/pulsar/broker/ManagedLedgerClientFactory.java 82.05% <100.00%> (+38.80%) ⬆️
...g/apache/pulsar/broker/admin/impl/BrokersBase.java 70.45% <100.00%> (+40.45%) ⬆️
.../org/apache/pulsar/broker/admin/v2/Namespaces.java 68.19% <ø> (+54.63%) ⬆️
...ache/pulsar/broker/loadbalance/LinuxInfoUtils.java 43.65% <ø> (+7.14%) ⬆️
...lance/extensions/ExtensibleLoadManagerWrapper.java 58.62% <100.00%> (+58.62%) ⬆️
... and 49 more

... and 1431 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@315157973

Copy link
Copy Markdown
Contributor

@lifepuzzlefun Please fix the unit test

@lifepuzzlefun lifepuzzlefun force-pushed the conditional_compress_metadata_by_size branch from 4282e6d to 2d52942 Compare March 11, 2023 13:05
@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

I found one flaky test in previous CI and try to fix by #19793.

the current fail in https://github.com/apache/pulsar/actions/runs/4382425219/jobs/7688447199

org.apache.pulsar.broker.service.ManagedLedgerCompressionTest.testRestartBrokerEnableManagedLedgerInfoCompression(org.apache.pulsar.broker.service.ManagedLedgerCompressionTest) pass in my local env.

trigger another CI run to see if it fail again.

@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

@315157973 @poorbarcode @Technoboy- can this pr be merged?

@315157973

Copy link
Copy Markdown
Contributor

@315157973 @poorbarcode @Technoboy- can this pr be merged?

The button is unavailable due to unit tests not all passing

@congbobo184 congbobo184 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where can we change these config values? I haven't seen any opreation that we can change these config values. Maybe we should create a PIP to add these two configurations?

@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

@congbobo184 aha, i forget add those config in the pulsar configuration. i have add them in the new commit : - )
and do we need a pip to merge this patch ? this is a small feature introduce to pulsar. may be i can update the doc to tell people the support settings.

@congbobo184

Copy link
Copy Markdown
Contributor

@lifepuzzlefun hi, add any config, we also should create a PIP to discuss and vote

When read from schemaCache, check if the schema version means empty.

when use Schema.BYTES consumer and enable retry and dlq policy.

need cache the empty schema version to avoid send a lot of GetOrCreateSchema request to retrieve the schema version.
@poorbarcode poorbarcode modified the milestones: 3.0.0, 3.1.0 Apr 10, 2023
@github-actions

Copy link
Copy Markdown

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions Bot added the Stale label May 11, 2023
@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

@congbobo184 hi, sorry for the late reply, i have just create a pip to discuss.

@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

#20307 pip-270 created. hopes for your comments and vote, thank you~

the discuss mail list address is write at the bottom, feel free to join the discuss.

@github-actions github-actions Bot removed the Stale label May 13, 2023
@Technoboy-

Copy link
Copy Markdown
Contributor

@congbobo184 Could you take a review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/metadata doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants