Skip to content

Conversation

@Jason918
Copy link
Contributor

@Jason918 Jason918 commented Nov 12, 2021

Motivation

Add Metadata Store implemented with rocksdb.

This is part of the work to remove zk from standalone server. All metadata can be stored in rocksdb.

Modifications

Add Metadata Store implemented with rocksdb.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as each test class extends BaseMetadataStoreTest

This change added tests and can be verified as follows:

  • org.apache.pulsar.metadata.impl.RocksdbMetadataStoreTest

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • no-need-doc

Not ready for user yet.

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Nov 12, 2021
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just left few comments

@merlimat merlimat added this to the 2.10.0 milestone Nov 12, 2021
@merlimat merlimat added the type/feature The PR added a new feature or issue requested a new feature label Nov 12, 2021
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I am understand the value for this implementation.

RocksDB is only local, so you could use this mostly only for Pulsar Standalone.

Why should I use RocksDB and not ZooKeeper for Pulsar Standalone ?

Starting RocksDB vs Starting ZooKeeper is not a big difference.

Also we should state clearly that using this implementation is not possible for production usage because each broker will have its own copy of the data, as RocksDB is not a distributed database.

@Jason918
Copy link
Contributor Author

@eolivelli

RocksDB is only local, so you could use this mostly only for Pulsar Standalone.

Yes, that's plan. This is one of multi-step work for Pulsar Standalone.

Why should I use RocksDB and not ZooKeeper for Pulsar Standalone ?

IMHO, in Pulsar Standalone, we already have RocksDb in dependency, so by removing zk from standalone mode, we can make it more lightweight, with less running memory consumption and less time of starting up. Although pulsar standalone is not for production use, it's quite useful and important for offline developing, testing and CI/CD.

Also we should state clearly that using this implementation is not possible for production usage because each broker will have its own copy of the data, as RocksDB is not a distributed database.

Yes, more detailed doc will be added later, since there is some more work to achieve this, like "Bridge BookKeeper metadata into MetadataStore"...

@merlimat @codelipenghui @hangc0276 Do we have more detailed doc for this (Removing zk from Pulsar standalone)?

@hezhangjian
Copy link
Member

@eolivelli

RocksDB is only local, so you could use this mostly only for Pulsar Standalone.

Yes, that's plan. This is one of multi-step work for Pulsar Standalone.

Why should I use RocksDB and not ZooKeeper for Pulsar Standalone ?

IMHO, in Pulsar Standalone, we already have RocksDb in dependency, so by removing zk from standalone mode, we can make it more lightweight, with less running memory consumption and less time of starting up. Although pulsar standalone is not for production use, it's quite useful and important for offline developing, testing and CI/CD.

Also we should state clearly that using this implementation is not possible for production usage because each broker will have its own copy of the data, as RocksDB is not a distributed database.

Yes, more detailed doc will be added later, since there is some more work to achieve this, like "Bridge BookKeeper metadata into MetadataStore"...

@merlimat @codelipenghui @hangc0276 Do we have more detailed doc for this (Removing zk from Pulsar standalone)?

Why we need to remove zk from pulsar standalone. It can be an option.
I can still use pulsar standalone to test some metadata storing on zk.

@Jason918
Copy link
Contributor Author

Why we need to remove zk from pulsar standalone. It can be an option.
I can still use pulsar standalone to test some metadata storing on zk.

Yes, it's optional.

@merlimat
Copy link
Contributor

RocksDB is only local, so you could use this mostly only for Pulsar Standalone.

Yes, that's the only point of it.

Why should I use RocksDB and not ZooKeeper for Pulsar Standalone ?
Starting RocksDB vs Starting ZooKeeper is not a big difference.

Not really, there is a huge difference in performance and, much more important, in resources usage.

Pulsar in Standalone mode is not a distributed system, so it doesn't need any of the feature that ZK provides.

@merlimat
Copy link
Contributor

Why we need to remove zk from pulsar standalone. It can be an option.
I can still use pulsar standalone to test some metadata storing on zk.

For general users, it's much better to not use ZK for standalone.

The primary role of standalone is not being a (Pulsar) developer tool.

@merlimat
Copy link
Contributor

Also we should state clearly that using this implementation is not possible for production usage because each broker will have its own copy of the data, as RocksDB is not a distributed database.

One can also use Standalone for production, in determined environments, so the distinction here is not very meaningful.

The distinction is : distributed vs standalone

@hezhangjian
Copy link
Member

@merlimat We already have a metadata interface, so I think add rocksdb as an option is not much difference with remove zookeeper. If people who use zookeeper standalone for production, they can still use zookeeper standalone. Or they have to migrate the metadata from zookeeper to rocksdb. What do you think?

@merlimat
Copy link
Contributor

@merlimat We already have a metadata interface, so I think add rocksdb as an option is not much difference with remove zookeeper. If people who use zookeeper standalone for production, they can still use zookeeper standalone. Or they have to migrate the metadata from zookeeper to rocksdb. What do you think?

That's outside the scope of this PR and a full proposal would have to be submitted for this.

My short form opinion is:

  • If a standalone Pulsar is already there, keep using the existing format with ZooKeeper
  • If starting a fresh Standalone, default to RocksDb for metadata and Functions state store. Also, use local FS for storing functions jars
  • If user configure to use ZK, keep using ZK.

@Jason918
Copy link
Contributor Author

@eolivelli Any suggestions about this work?

category = CATEGORY_SERVER,
doc = "Configuration file path for local metadata store. It's supported by RocksdbMetadataStore for now."
)
private String metadataStoreConfigPath = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private String metadataStoreConfigPath = null;
private String rocksDbMetadataStorePath = null;

WDYT @merlimat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codelipenghui Do you mean rocksDbMetadataStoreConfigPath? This is used for configs. Data store path is defined by metadataURL.

@merlimat
Copy link
Contributor

@eolivelli PTAL again

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

But I am against moving to rocksdb as default option to Pulsar Standalone.
This is because with RocksDb it is not possible to connect to the metadatastore and inspect the contents or use other tools (like BK tools).

Standalone is very useful while developing Pulsar and to reproduce bugs and do troubleshooting.
BTW we will have a separate discussion om the ML for sure

@eolivelli eolivelli merged commit 7058d77 into apache:master Nov 29, 2021
zeo1995 pushed a commit to zeo1995/pulsar that referenced this pull request Dec 1, 2021
* up/master: (75 commits)
  [website][upgrade]feat: website upgrade / docs migration - 2.5.1 Get Started/Concepts and Architecture/Pulsar Schema (apache#13030)
  Fix environment variable assignment in startup scripts (apache#13025)
  update 2.8.x (apache#13029)
  [Doc] add tips for Pulsar tools (apache#13044)
  Suggest to use tlsPort instead of deprecated TlsEnable (apache#13039)
  Integration tests for function-worker rebalance and drain operations. (apache#13058)
  fix(functions): missing runtime set in GoInstanceConfig (apache#13031)
  [pulsar-admin] Add get-replicated-subscription-status command for topic (apache#12891)
  [Broker] Consider topics in pulsar/system namespace as system topics (apache#13050)
  Fix typo: correct sizeUint to sizeUnit (apache#13040)
  fix-12894 (apache#12896)
  Don't attempt to delete pending ack store unless transactions are enabled (apache#13041)
  [Perf] Evaluate the current protocol version once (apache#13045)
  Fix Issue apache#12885, Unordered consuming case in Key_Shared subscription (apache#12890)
  [broker]Optimize topicMaxMessageSize with topic local cache. (apache#12830)
  [PIP-105] Part-2 Support pluggable entry filter in Dispatcher (apache#12970)
  [website] Modify admin-api-topic.md document (apache#12996)
  add missed import (apache#13037)
  [metadata] Add RocksdbMetadataStore (apache#12776)
  [C] Add pulsar_client_subscribe_multi_topics and pulsar_client_subscribe_pattern (apache#12965)
  ...

# Conflicts:
#	site2/website-next/docusaurus.config.js
#	site2/website-next/versioned_sidebars/version-2.6.1-sidebars.json
#	site2/website-next/versions.json
fxbing pushed a commit to fxbing/pulsar that referenced this pull request Dec 19, 2021
@BewareMyPower
Copy link
Contributor

I just have a question. When we use RocksDB as Pulsar standalone's metadata store, how do we check the metadata like we did via bin/pulsar zookeeper-shell?

@Jason918
Copy link
Contributor Author

I just have a question. When we use RocksDB as Pulsar standalone's metadata store, how do we check the metadata like we did via bin/pulsar zookeeper-shell?

There is a PIP for this. See #13346

@Anonymitaet Anonymitaet added the doc-complete Your PR changes impact docs and the related docs have been already added. label Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-complete Your PR changes impact docs and the related docs have been already added. doc-not-needed Your PR changes do not impact docs type/feature The PR added a new feature or issue requested a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants