Skip to content

feat(core-clp)!: Change archive format version from a counter to a SemVer value.#698

Merged
davemarco merged 14 commits into
y-scope:mainfrom
davemarco:semver
Feb 3, 2025
Merged

feat(core-clp)!: Change archive format version from a counter to a SemVer value.#698
davemarco merged 14 commits into
y-scope:mainfrom
davemarco:semver

Conversation

@davemarco

@davemarco davemarco commented Jan 28, 2025

Copy link
Copy Markdown
Contributor

Description

This PR changes the archive version format from a uint_16 counter to a uint_32 semver. It is breaking

The major version is allocated 8 bits, and the minor version is allocated 8 bits, the patch version is allocated 16 bits.

The main purpose of this PR is to bring the OSS main branch into alignment with a private branch, to make the reading each others archives possible.

I removed the development version flag since it isn't really compatible with semver. Moreover, it seemed unused as all released versions were "development".

Version

The new starting version will be 0.1.0, and will be independent from the existing private branch version 0.1.1. The following describes the new versioning system for public and private branches.

We can now support the following cases:

Private archive format Public archive format
Private code can read Yes No
Public code can read Yes Yes

Achieved using two version numbers as follows.

Public archive version Private archive version Meaning
0.0.0 x.y.z A private archive version that's incompatible with any public version.
a.b.c x.y.z A private archive version that's compatible with a specific public version.
a.b.c 0.0.0 A public archive version. The private archive version is ignored.

Note how we treat 0.0.0 as an invalid version number and instead use it to indicate incompatibility.

The private version number will be stored as new field in the archive metadata(To be added in PR #700); however, the public code can ignore it entirely, while the private code can use it to determine the private archive format
version, without interfering with the public archive format version.

Validation performed

Compressed and decompressed an archive successfully. Verified, as expected, that a failure occurs when reading older archives.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Enhanced archive format versioning with explicit major, minor, and patch version support.
    • Expanded archive format version type to support a larger range of version numbers.
  • Refactor

    • Updated archive format version constant structure for clarity.
    • Modified type definition for archive format version to use a 32-bit integer.
    • Changed initialization of archive format version in metadata to the default version.
  • Technical Improvement

    • Improved version tracking mechanism with a more granular versioning approach.
    • Updated compatibility checks for archive format version.

@coderabbitai

coderabbitai Bot commented Jan 28, 2025

Copy link
Copy Markdown
Contributor

Warning

Rate limit exceeded

@davemarco has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 25 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between ace79e3 and 3063f6b.

📒 Files selected for processing (4)
  • components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1 hunks)
  • components/core/src/clp/streaming_archive/Constants.hpp (1 hunks)
  • components/core/src/clp/streaming_archive/reader/Archive.cpp (1 hunks)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (1 hunks)

Walkthrough

The pull request introduces modifications to archive format versioning in the CLP (Compressed Log Processing) library. The changes involve updating the data type for archive_format_version_t from a 16-bit to a 32-bit unsigned integer and restructuring how version information is represented. The modifications include removing the previous version constant and defining new constants for major, minor, and patch versions, while maintaining a single combined version constant.

Changes

File Change Summary
components/core/src/clp/Defs.h Changed archive_format_version_t from uint16_t to uint32_t and removed related comments.
components/core/src/clp/streaming_archive/Constants.hpp Removed cArchiveFormatVersion constant and added new version constants:
- cDefaultVersionMajor (0)
- cDefaultVersionMinor (1)
- cDefaultVersionPatch (0)
Introduced cDefaultVersion and cCustomVersion constants.
components/core/src/clp/streaming_archive/reader/Archive.cpp Updated open method to compare against cArchiveFormatVersion::cDefaultVersion instead of cArchiveFormatVersion.
components/core/src/clp/streaming_archive/writer/Archive.cpp Updated open method to initialize m_local_metadata with cArchiveFormatVersion::cDefaultVersion instead of cArchiveFormatVersion.
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp Updated initialization of m_archive_format_version from cArchiveFormatVersion to cArchiveFormatVersion::cDefaultVersion.

Suggested reviewers

  • wraymo

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
components/core/src/clp/Defs.h (1)

22-22: Add max constant for consistency with other type definitions.

The type change from uint16_t to uint32_t looks good and aligns with the semver format requirements. However, for consistency with other type definitions in this file, consider adding a max constant.

using archive_format_version_t = uint32_t;
+constexpr archive_format_version_t cArchiveFormatVersionMax
+        = std::numeric_limits<archive_format_version_t>::max();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 029b50b and bf3de6f.

📒 Files selected for processing (2)
  • components/core/src/clp/Defs.h (1 hunks)
  • components/core/src/clp/streaming_archive/Constants.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 GitHub Actions: clp-lint
components/core/src/clp/streaming_archive/Constants.hpp

[error] 11-12: Code formatting violations detected. The code should be formatted using clang-format.

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)

Comment thread components/core/src/clp/streaming_archive/Constants.hpp Outdated
Comment thread components/core/src/clp/streaming_archive/Constants.hpp Outdated
@davemarco davemarco changed the title feat!(clp): Upgrade archive version format to semVer. feat(clp)!: Upgrade archive version format to semVer. Jan 28, 2025
@davemarco davemarco changed the title feat(clp)!: Upgrade archive version format to semVer. feat(clp)!: Upgrade archive format version from counter to semVer. Jan 28, 2025
@davemarco davemarco requested a review from haiqi96 January 28, 2025 03:04

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a761bf6 and c562142.

📒 Files selected for processing (1)
  • components/core/src/clp/streaming_archive/Constants.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)
Learnt from: davemarco
PR: y-scope/clp#698
File: components/core/src/clp/streaming_archive/Constants.hpp:9-9
Timestamp: 2025-01-28T03:02:30.542Z
Learning: In the CLP project's archive version format (semver), the patch version uses 16 bits (uint16_t), while major version uses 8 bits and minor version uses 16 bits.
⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)

10-13: LGTM! Remember to address formatting.

The bit allocation for combining version components is correct. However, please note that there was a previous pipeline failure regarding formatting.

Comment thread components/core/src/clp/streaming_archive/Constants.hpp Outdated
@haiqi96

haiqi96 commented Jan 28, 2025

Copy link
Copy Markdown
Contributor

Looks reasonable. Does @LinZhihao-723 agree with the version number 0.1.2?

@LinZhihao-723

Copy link
Copy Markdown
Member

Looks reasonable. Does @LinZhihao-723 agree with the version number 0.1.2?

The private branch is using 0.1.1. Considering we don't port all archive format changes from the private branch and also have new changes in the OSS archive format, I can see there will be problems if we use 0.1.2 here. Can we pick a different minor version? Either 0.2.x or 0.0.x?
@kirkrodrigues what do you think?

@davemarco

davemarco commented Jan 29, 2025

Copy link
Copy Markdown
Contributor Author

Can we pick a different minor version? Either 0.2.x or 0.0.x?

Note that current behaviour is to throw an exception when the version does not match exactly. If versions aren't the same, we will need some other behaviour. Maybe checking only if major/minor versions match? @LinZhihao-723

It potentially possible to add an extra field at the end in open source metadata like "branch": "OSS" (default to "branch":"private"). To see if the version is for open source or private. However, the behaviour is not well defined in msgpack docs so would prefer to avoid. It also only seems to work for last serialized field.

@LinZhihao-723

LinZhihao-723 commented Jan 29, 2025

Copy link
Copy Markdown
Member

Can we pick a different minor version? Either 0.2.x or 0.0.x?

Note that current behaviour is to throw an exception when the version does not match exactly. If versions aren't the same, we will need some other behaviour. Maybe checking only if major/minor versions match? @LinZhihao-723

It potentially possible to add an extra field at the end in open source metadata like "branch": "OSS" (default to "branch":"private"). To see if the version is for open source or private. However, the behaviour is not well defined in msgpack docs so would prefer to avoid. It also only seems to work for last serialized field.

Yeah we might some special handling to check if a version number is supported. We have the same problem in IR version. I think the solution should be integrated a third-party SemVer checker to handle it in future PRs.
For the extra field, I prefer to not include it and rely on the given version number.

@davemarco

davemarco commented Jan 29, 2025

Copy link
Copy Markdown
Contributor Author

@LinZhihao-723 Okay so potentially does it work to use 0.2.x, or 0.0.x and only throw an exception if the major version differs? Or we should provide the code with specific ranges of valid versions in a future PR?

@kirkrodrigues

Copy link
Copy Markdown
Member

It's somewhat inevitable that as long as we deploy private, customized versions of CLP, then at some point, the private and public formats will diverge (although, we should try to avoid that for as long as possible). In these cases, using a single set of version numbers for both private and public formats will be too cumbersome and confusing, so I think we should try to support some way of using namespaces to separate private and public versions.

I think the conventional solution namespacing versions is to use the prerelease part of a semantic version, which in the case of the archive format, would be stored as a field in the archive metadata. Since we're making a breaking change to the public archive format, it's probably a good time to add this field to the archive metadata and then figure out some way to support compatibility for the private archive format. In the worst case, we would probably need to maintain a fork of the code for private and public formats, but since our current goal is to unify the formats, we can hopefully avoid that soon.

If we go with the above proposal, then perhaps we can start our version numbers at 0.2.0? Ideally, we'd pick 0.1.0, but 0.2.0 is probably a good compromise to avoid overlapping with the private format version and to indicate that the first public format using semver isn't really the first public format in existence.

@davemarco

davemarco commented Jan 30, 2025

Copy link
Copy Markdown
Contributor Author

I am okay with kirks idea. I am also proposing another for feedback.

We can keep two versions on each archive:

  • A branch version
  • An OSS compatible version

When the open source reads a private branch archive, it only checks the OSS compatible version, and ignores the branch version.

For example, the private branch would look like this.
Branch version: 0.1.1
OSS compatible version: 0.1.0

The OSS would look like this
Branch version: 0.1.0
OSS compatible version: 0.1.0

The private branch could make changes and their own version as they wish. If the version is no longer compatible maybe they can just set the OSS compatible version to 0.0.0.

@kirkrodrigues

Copy link
Copy Markdown
Member

Based on our offline discussion, we want to support the following cases:

Private archive format Public archive format
Private code can read Yes No
Public code can read Yes Yes

The way to support this is to have two separate version numbers like @davemarco suggested:

Public archive version Private archive version Meaning
0.0.0 x.y.z A private archive version that's incompatible with any public version.
a.b.c x.y.z A private archive version that's compatible with a specific public version.
a.b.c 0.0.0 A public archive version. The private archive version is ignored.

Note how we treat 0.0.0 as an invalid version number and instead use it to indicate incompatibility.

We can store the private archive format version in a reserved field (i.e., a field that's unused and
meaningless to the public code but can be used by the private code). That way the public code can
ignore it entirely, while the private code can use it to determine the private archive format
version, without interfering with the public archive format version.

For the archive format version, we can store this reserved field in the archive metadata. For the
single-file encoding version, we can store this reserved field in the header.

@davemarco

davemarco commented Jan 30, 2025

Copy link
Copy Markdown
Contributor Author

I like what @kirkrodrigues posted. @LinZhihao-723, any other suggestions or feedback

@LinZhihao-723

Copy link
Copy Markdown
Member

I agree with @kirkrodrigues' proposal. Can we add this into the single file archive design doc?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c562142 and 7891f88.

📒 Files selected for processing (3)
  • components/core/src/clp/streaming_archive/Constants.hpp (1 hunks)
  • components/core/src/clp/streaming_archive/reader/Archive.cpp (1 hunks)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_archive/reader/Archive.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_archive/Constants.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)
Learnt from: davemarco
PR: y-scope/clp#698
File: components/core/src/clp/streaming_archive/Constants.hpp:9-9
Timestamp: 2025-01-28T03:02:30.542Z
Learning: In the CLP project's archive version format (semver), the patch version uses 16 bits (uint16_t), while major version uses 8 bits and minor version uses 16 bits.
🪛 GitHub Actions: clp-lint
components/core/src/clp/streaming_archive/Constants.hpp

[error] 18-20: Code is not properly formatted according to clang-format standards. Multiple formatting violations in comment block.


[error] 44-44: Code is not properly formatted according to clang-format standards. Missing space before '|' operator.


[error] 48-48: Code is not properly formatted according to clang-format standards. Incorrect indentation of closing brace.

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (4)
components/core/src/clp/streaming_archive/reader/Archive.cpp (1)

68-71: LGTM! Version check updated correctly.

The version check has been properly updated to use the new semver format through version::cPublicVersion. The error handling remains appropriate for incompatible versions.

components/core/src/clp/streaming_archive/writer/Archive.cpp (1)

72-72: LGTM! Version initialization updated correctly.

The archive metadata initialization has been properly updated to use the new semver format through version::cPublicVersion.

components/core/src/clp/streaming_archive/Constants.hpp (2)

17-39: Well-documented versioning scheme!

The documentation clearly explains the versioning scheme and compatibility between public and private branches. The table format makes it easy to understand the different scenarios.

🧰 Tools
🪛 GitHub Actions: clp-lint

[error] 18-20: Code is not properly formatted according to clang-format standards. Multiple formatting violations in comment block.


40-47: Version constants align with PR objectives.

The version constants are correctly defined:

  • Major (8 bits): uint8_t
  • Minor (8 bits): uint8_t
  • Patch (16 bits): uint16_t
  • Combined into a 32-bit version using appropriate bit shifts

The initial version is set to 0.1.0, which aligns with the PR objectives for the public branch.

🧰 Tools
🪛 GitHub Actions: clp-lint

[error] 44-44: Code is not properly formatted according to clang-format standards. Missing space before '|' operator.

Comment thread components/core/src/clp/streaming_archive/Constants.hpp Outdated
@davemarco

davemarco commented Jan 31, 2025

Copy link
Copy Markdown
Contributor Author

I agree with @kirkrodrigues' proposal. Can we add this into the single file archive design doc?

Sure I can add something. For now I put some better docs in the code.

I also added private version. Will add to metadata in metadataPR after this is merged.

@haiqi96, can review my new changes?

I tested compression\decompression. Clg. And unit tests.


namespace version {
// Version(s) for the disk format of the archive. The public branch has only one version; however,
// private branches have two. The private branch has its own independent disk format version, and a

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally feel the The private branch has its is kind of redundant. can we combine the two sentences into one?

@davemarco davemarco Jan 31, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't combine, I kind of like how it specifies that the public has one version, and the private has two. However, I did get rid of The private branch has its, and shortened the second sentence. lmk

// Version(s) for the disk format of the archive. The public branch has only one version; however,
// private branches have two. The private branch has its own independent disk format version, and a
// second version to specify which public release can read its private disk format. The following
// provides instructions to set the versions correctly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:
Compared to instruction I feel it's more like an example, but not a big deal anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changes to say example instead of intructions

@davemarco

Copy link
Copy Markdown
Contributor Author

Made @haiqi96 changes

@davemarco davemarco requested a review from haiqi96 January 31, 2025 17:10
haiqi96
haiqi96 previously approved these changes Jan 31, 2025

@haiqi96 haiqi96 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Just need to figure out the title.

@davemarco

davemarco commented Jan 31, 2025

Copy link
Copy Markdown
Contributor Author

Proposing
feat(core-clp)!: Upgrade archive format version from counter to semVer; Add new private format version.
@kirkrodrigues, let us know if there is anything else before I merge

@davemarco davemarco changed the title feat(clp)!: Upgrade archive format version from counter to semVer. feat(clp)!: Upgrade archive format version from counter to semVer; Add new private format version. Jan 31, 2025
@davemarco davemarco changed the title feat(clp)!: Upgrade archive format version from counter to semVer; Add new private format version. feat(core-clp)!: Upgrade archive format version from counter to semVer; Add new private format version. Jan 31, 2025
cPublicVersionMajor << 24 | cPublicVersionMinor << 16 | cPublicVersionPatch
};
constexpr archive_format_version_t cNullVersion{0};
constexpr archive_format_version_t cPrivateVersion{cNullVersion};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my original comment:

a field that's unused and meaningless to the public code but can be used by the private code

I was expecting we'd implement the private version as a reserved field in the archive metadata. But I guess we can't do that once we switch to MessagePack since each field in the archive metadata needs a key; is that why you're doing this here?

If so, how about this instead:

  1. Let's call the field something like "ArchiveFormatCustomVersion" instead of "PrivateVersion".
  2. Let's remove all of the discussion about public/private versions. As my original comment mentions, the public code should not need to care about private versions and so I don't even think it needs to document it. Strictly speaking, the private code or some other fork of the code can use the custom version field (or the combination of the public version and custom version) however it wants. It should be the job of the downstream code to document it.
  3. Let's call the "public" version simply "ArchiveFormatVersion".

@davemarco davemarco Jan 31, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we switch to MessagePack since each field in the archive metadata needs a key; is that why you're doing this here?

Yes, this is why I documented since it will show up as a key in MsgPack. Note while we can just not have a key in public, we can still read the MessagePack type with a missing key, I would prefer the private and public MessagePack type to be the same. However, we can remove documentation which also makes sense. See latest review

constexpr char cMetadataDBFileName[] = "metadata.db";
constexpr char cSchemaFileName[] = "schema.txt";

namespace version {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current convention for constants in this file is to name the namespace as cNamespace and then the constants inside as ConstantXxx. I know that disagrees with our clang-tidy rules, but in this case, I would stick with the existing conventions to avoid confusion until we refactor this file to address the clang-tidy warnings (especially the refactoring will propagate into a lot of files).

If you really want to namespace the version fields (which is fair), how about cArchiveFormatVersion with internal constants "Default" and "Custom"? "Default" isn't the greatest name, but I can't think of something better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your suggestion. Also in general it seems like people are recommending to me to move to new clang-tidy even in old files. But maybe this is not our recommendation?

@davemarco davemarco Jan 31, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I can just default initialize the custom version to 0 in the metadata class. Then we don't need to specify default or custom here. And can just use cVersion. Let me know

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in general it seems like people are recommending to me to move to new clang-tidy even in old files. But maybe this is not our recommendation?

It's our recommendation in other cases. In this case, I'm suggesting an exception until we refactor all other constants / make a more long-term decision about how to refactor them.

Also, I can just default initialize the custom version to 0 in the metadata class. Then we don't need to specify default or custom here. And can just use cVersion. Let me know

True. I like that idea.

Comment on lines +18 to +40
// Version(s) for the disk format of the archive. The public branch uses a single version.
// Private branches have two versions: an independent disk format version, and a second version to
// specify which public release can read its private disk format. Below are examples to help set
// the versions correctly.
//
// Public branch with version a.b.c
//
// | Public version | Private version |
// | --------------- | ---------------- |
// | a.b.c | 0.0.0 |
//
// Private branch with version x.y.z compatible with public version a.b.c
//
// | Public version | Private version |
// | --------------- | ---------------- |
// | a.b.c | x.y.z |
//
// Private branch with version x.y.z incompatible with any public version
//
// | Public version | Private version |
// | --------------- | ---------------- |
// | 0.0.0 | x.y.z |
//

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we consider to use /**/ style docstrings instead of // which is for inline comments?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this documentation per kirk, so ignoring

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)

93-93: LGTM! Consider adding private version field.

The initialization with cArchiveFormatVersion::cDefaultVersion aligns with the new semver format. As discussed in the PR, consider adding a reserved field in ArchiveMetadata to store the private archive format version, allowing OSS code to read private branch archives by checking only the OSS compatible version.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7891f88 and a2e6c27.

📒 Files selected for processing (4)
  • components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1 hunks)
  • components/core/src/clp/streaming_archive/Constants.hpp (1 hunks)
  • components/core/src/clp/streaming_archive/reader/Archive.cpp (1 hunks)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp
  • components/core/src/clp/streaming_archive/reader/Archive.cpp
  • components/core/src/clp/streaming_archive/Constants.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:122-132
Timestamp: 2025-01-29T21:09:06.878Z
Learning: The `creation_idx` field in `ArchiveMetadata` is used for tracking archive order in split archives. When archives aren't split, it can be safely initialized to zero.
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build (macos-latest)
  • GitHub Check: build-macos (macos-13, true)

Comment on lines +18 to +21
constexpr uint8_t cDefaultVersionMajor{0};
constexpr uint8_t cDefaultVersionMinor{1};
constexpr uint16_t cDefaultVersionPatch{0};
constexpr archive_format_version_t cDefaultVersion{

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep these inside the namespace, we should remove the prefix c to follow the convention of the other constants in this file (again, this is an exception to our clang-tidy rules).

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the PR description to reflect the final decisions/code?

For the PR title, how about:

feat(core-clp)!: Change archive format version from a counter to a SemVer value.

@davemarco davemarco changed the title feat(core-clp)!: Upgrade archive format version from counter to semVer; Add new private format version. feat(core-clp)!: Change archive format version from a counter to a SemVer value. Feb 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants