
Store metadata in ZK with binary protobuf format #281

@merlimat


In Pulsar, we store a lot of metadata in ZooKeeper using different formats:

  • BookKeeper ledgers: Protobuf Text
  • Managed Ledgers and cursors: Protobuf Text
  • Broker and namespace bundles load reports: JSON

Using text formats has been convenient for quick debugging sessions without special tools, but it has drawbacks:

  • Size: the data stored in ZK can be significant when many topics (>1M) are active in a cluster. Like JSON, the protobuf text format repeats every field name in every record.
  • Speed of serialization/deserialization: binary formats are generally faster to parse.
  • Garbage generated: with the binary format, we could switch to the custom protobuf code generator to generate reusable objects.
  • Backward compatibility: unlike the binary parser, the text protobuf parser fails on unknown fields, and there's no way to change that. This makes it very difficult to change the format (typically we would ship one release that can read the new format but still writes the old one, and then a following release that writes the new format). Backward compatibility is key to ensuring we can roll back a release if an issue is detected during deployment.
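To make the size and unknown-field points concrete, here is a minimal, hand-rolled sketch of the protobuf binary wire format (varint fields only) next to a text-format-like encoding. The record, its field names (`ledgerId`, `entryId`) and field numbers are hypothetical, and `decode_text_strict` only mimics the strict text-parser behaviour described above; real code would use the generated protobuf classes instead.

```python
# Sketch: protobuf binary wire format (varint fields only) vs a text-like format.
SCHEMA = {"ledgerId": 1, "entryId": 2}  # hypothetical message: name -> field number


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at `pos`; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_binary(fields: dict[int, int]) -> bytes:
    """Each field is a key (field_number << 3 | wire type) followed by a varint value."""
    out = bytearray()
    for number, value in fields.items():
        out += encode_varint(number << 3)  # wire type 0 (varint) in the low 3 bits
        out += encode_varint(value)
    return bytes(out)


def decode_binary(data: bytes, known: set[int]) -> dict[int, int]:
    """Decode varint fields, skipping field numbers this reader doesn't know."""
    fields, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        if (key & 0x7) != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        number = key >> 3
        if number in known:
            fields[number] = value  # unknown fields are skipped, not rejected
    return fields


def encode_text(fields: dict[str, int]) -> bytes:
    """Text-like encoding: the field name is repeated in every record."""
    return "".join(f"{name}: {value}\n" for name, value in fields.items()).encode()


def decode_text_strict(data: bytes, known: set[str]) -> dict[str, int]:
    """Mimics a text parser that rejects unknown field names."""
    fields = {}
    for line in data.decode().splitlines():
        name, value = line.split(": ")
        if name not in known:
            raise ValueError(f"unknown field: {name}")
        fields[name] = int(value)
    return fields


record = {"ledgerId": 1234567, "entryId": 89}
binary = encode_binary({SCHEMA[name]: value for name, value in record.items()})
text = encode_text(record)
print(len(binary), len(text))  # 6 30

# A newer writer adds a hypothetical field 3; an older binary reader skips it.
newer = encode_binary({1: 1234567, 2: 89, 3: 7})
print(decode_binary(newer, known={1, 2}))  # {1: 1234567, 2: 89}

# The strict text reader rejects the same kind of addition.
try:
    decode_text_strict(encode_text({**record, "newField": 7}), known=set(SCHEMA))
except ValueError as err:
    print(err)  # unknown field: newField
```

For this toy record, the binary form is 6 bytes against 30 for the text form, because the binary key is a single byte instead of the repeated field name.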

Of the three categories listed above, I don't think we need to bother with load reports, because they're not where the bulk of the metadata is.

My proposal would be:

  • 1.17 release:

    1. Add the code to read both formats
    2. A config switch to enable writing the binary format for ML and cursor data in ZK, defaulting to the text format.
    3. Add tools to dump the content of an ML in human-readable form
  • 1.18 release:

    1. Make binary the default
    2. Remove the text/binary config switch
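The reason for splitting the change across two releases can be checked with a small model of which formats each release reads and writes. This is a sketch under assumptions: `pre-1.17` is a hypothetical stand-in for any earlier, text-only release, and the `write_binary` flag stands in for the proposed config switch, whose actual name is not decided here.

```python
from dataclasses import dataclass

TEXT, BINARY = "text", "binary"


@dataclass(frozen=True)
class Release:
    name: str
    reads: frozenset[str]  # metadata formats this release can parse
    writes: str            # format it uses when writing ML/cursor metadata


def pre_117() -> Release:
    # Hypothetical stand-in for any release before 1.17: text only.
    return Release("pre-1.17", frozenset({TEXT}), TEXT)


def release_117(write_binary: bool = False) -> Release:
    # Reads both formats; `write_binary` models the proposed switch (default: text).
    return Release("1.17", frozenset({TEXT, BINARY}), BINARY if write_binary else TEXT)


def release_118() -> Release:
    # Reads both formats and always writes binary; the switch is removed.
    return Release("1.18", frozenset({TEXT, BINARY}), BINARY)


def can_coexist(old: Release, new: Release) -> bool:
    """Each release must read what the other writes: old and new brokers run
    side by side during a rolling upgrade, and after a rollback the old
    release has to read metadata the new one already wrote."""
    return new.writes in old.reads and old.writes in new.reads


print(can_coexist(pre_117(), release_117()))                   # True
print(can_coexist(pre_117(), release_117(write_binary=True)))  # False
print(can_coexist(release_117(), release_118()))               # True
print(can_coexist(pre_117(), release_118()))                   # False
```

The second and fourth checks show why 1.17 keeps text as the default write format, and why going straight from a text-only release to one that writes binary would leave no safe rollback path.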

Once the change has been implemented, it would be easy to measure the size difference and eventually consider storing BK ledger metadata in binary format as well.

cc: @saandrews @rdhabalia @msb-at-yahoo @sschepens

Labels

type/enhancement
