feat(core-clp)!: Change archive format version from a counter to a SemVer value.#698
Conversation
|
Warning Rate limit exceeded@davemarco has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 25 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (4)
WalkthroughThe pull request introduces modifications to archive format versioning in the CLP (Compressed Log Processing) library. The changes involve updating the data type for Changes
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
components/core/src/clp/Defs.h (1)
22-22: Add max constant for consistency with other type definitions.The type change from
uint16_ttouint32_tlooks good and aligns with the semver format requirements. However, for consistency with other type definitions in this file, consider adding a max constant.using archive_format_version_t = uint32_t; +constexpr archive_format_version_t cArchiveFormatVersionMax + = std::numeric_limits<archive_format_version_t>::max();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/Defs.h(1 hunks)components/core/src/clp/streaming_archive/Constants.hpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
🪛 GitHub Actions: clp-lint
components/core/src/clp/streaming_archive/Constants.hpp
[error] 11-12: Code formatting violations detected. The code should be formatted using clang-format.
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/streaming_archive/Constants.hpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
📓 Learnings (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)
Learnt from: davemarco
PR: y-scope/clp#698
File: components/core/src/clp/streaming_archive/Constants.hpp:9-9
Timestamp: 2025-01-28T03:02:30.542Z
Learning: In the CLP project's archive version format (semver), the patch version uses 16 bits (uint16_t), while major version uses 8 bits and minor version uses 16 bits.
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build (macos-latest)
🔇 Additional comments (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)
10-13: LGTM! Remember to address formatting.The bit allocation for combining version components is correct. However, please note that there was a previous pipeline failure regarding formatting.
|
Looks reasonable. Does @LinZhihao-723 agree with the version number 0.1.2? |
The private branch is using |
Note that current behaviour is to throw an exception when the version does not match exactly. If versions aren't the same, we will need some other behaviour. Maybe checking only if major/minor versions match? @LinZhihao-723 It potentially possible to add an extra field at the end in open source metadata like "branch": "OSS" (default to "branch":"private"). To see if the version is for open source or private. However, the behaviour is not well defined in msgpack docs so would prefer to avoid. It also only seems to work for last serialized field. |
Yeah we might some special handling to check if a version number is supported. We have the same problem in IR version. I think the solution should be integrated a third-party SemVer checker to handle it in future PRs. |
|
@LinZhihao-723 Okay so potentially does it work to use 0.2.x, or 0.0.x and only throw an exception if the major version differs? Or we should provide the code with specific ranges of valid versions in a future PR? |
|
It's somewhat inevitable that as long as we deploy private, customized versions of CLP, then at some point, the private and public formats will diverge (although, we should try to avoid that for as long as possible). In these cases, using a single set of version numbers for both private and public formats will be too cumbersome and confusing, so I think we should try to support some way of using namespaces to separate private and public versions. I think the conventional solution namespacing versions is to use the prerelease part of a semantic version, which in the case of the archive format, would be stored as a field in the archive metadata. Since we're making a breaking change to the public archive format, it's probably a good time to add this field to the archive metadata and then figure out some way to support compatibility for the private archive format. In the worst case, we would probably need to maintain a fork of the code for private and public formats, but since our current goal is to unify the formats, we can hopefully avoid that soon. If we go with the above proposal, then perhaps we can start our version numbers at 0.2.0? Ideally, we'd pick 0.1.0, but 0.2.0 is probably a good compromise to avoid overlapping with the private format version and to indicate that the first public format using semver isn't really the first public format in existence. |
|
I am okay with kirks idea. I am also proposing another for feedback. We can keep two versions on each archive:
When the open source reads a private branch archive, it only checks the OSS compatible version, and ignores the branch version. For example, the private branch would look like this. The OSS would look like this The private branch could make changes and their own version as they wish. If the version is no longer compatible maybe they can just set the OSS compatible version to 0.0.0. |
|
Based on our offline discussion, we want to support the following cases:
The way to support this is to have two separate version numbers like @davemarco suggested:
Note how we treat We can store the private archive format version in a reserved field (i.e., a field that's unused and For the archive format version, we can store this reserved field in the archive metadata. For the |
|
I like what @kirkrodrigues posted. @LinZhihao-723, any other suggestions or feedback |
|
I agree with @kirkrodrigues' proposal. Can we add this into the single file archive design doc? |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp/streaming_archive/Constants.hpp(1 hunks)components/core/src/clp/streaming_archive/reader/Archive.cpp(1 hunks)components/core/src/clp/streaming_archive/writer/Archive.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
components/core/src/clp/streaming_archive/reader/Archive.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
components/core/src/clp/streaming_archive/Constants.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
📓 Learnings (1)
components/core/src/clp/streaming_archive/Constants.hpp (1)
Learnt from: davemarco
PR: y-scope/clp#698
File: components/core/src/clp/streaming_archive/Constants.hpp:9-9
Timestamp: 2025-01-28T03:02:30.542Z
Learning: In the CLP project's archive version format (semver), the patch version uses 16 bits (uint16_t), while major version uses 8 bits and minor version uses 16 bits.
🪛 GitHub Actions: clp-lint
components/core/src/clp/streaming_archive/Constants.hpp
[error] 18-20: Code is not properly formatted according to clang-format standards. Multiple formatting violations in comment block.
[error] 44-44: Code is not properly formatted according to clang-format standards. Missing space before '|' operator.
[error] 48-48: Code is not properly formatted according to clang-format standards. Incorrect indentation of closing brace.
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build (macos-latest)
🔇 Additional comments (4)
components/core/src/clp/streaming_archive/reader/Archive.cpp (1)
68-71: LGTM! Version check updated correctly.The version check has been properly updated to use the new semver format through
version::cPublicVersion. The error handling remains appropriate for incompatible versions.components/core/src/clp/streaming_archive/writer/Archive.cpp (1)
72-72: LGTM! Version initialization updated correctly.The archive metadata initialization has been properly updated to use the new semver format through
version::cPublicVersion.components/core/src/clp/streaming_archive/Constants.hpp (2)
17-39: Well-documented versioning scheme!The documentation clearly explains the versioning scheme and compatibility between public and private branches. The table format makes it easy to understand the different scenarios.
🧰 Tools
🪛 GitHub Actions: clp-lint
[error] 18-20: Code is not properly formatted according to clang-format standards. Multiple formatting violations in comment block.
40-47: Version constants align with PR objectives.The version constants are correctly defined:
- Major (8 bits): uint8_t
- Minor (8 bits): uint8_t
- Patch (16 bits): uint16_t
- Combined into a 32-bit version using appropriate bit shifts
The initial version is set to 0.1.0, which aligns with the PR objectives for the public branch.
🧰 Tools
🪛 GitHub Actions: clp-lint
[error] 44-44: Code is not properly formatted according to clang-format standards. Missing space before '|' operator.
Sure I can add something. For now I put some better docs in the code. I also added private version. Will add to metadata in metadataPR after this is merged. @haiqi96, can review my new changes? I tested compression\decompression. Clg. And unit tests. |
|
|
||
| namespace version { | ||
| // Version(s) for the disk format of the archive. The public branch has only one version; however, | ||
| // private branches have two. The private branch has its own independent disk format version, and a |
There was a problem hiding this comment.
I personally feel the The private branch has its is kind of redundant. can we combine the two sentences into one?
There was a problem hiding this comment.
I didn't combine, I kind of like how it specifies that the public has one version, and the private has two. However, I did get rid of The private branch has its, and shortened the second sentence. lmk
| // Version(s) for the disk format of the archive. The public branch has only one version; however, | ||
| // private branches have two. The private branch has its own independent disk format version, and a | ||
| // second version to specify which public release can read its private disk format. The following | ||
| // provides instructions to set the versions correctly. |
There was a problem hiding this comment.
Nit:
Compared to instruction I feel it's more like an example, but not a big deal anyway.
There was a problem hiding this comment.
I changes to say example instead of intructions
|
Made @haiqi96 changes |
haiqi96
left a comment
There was a problem hiding this comment.
Looks good to me.
Just need to figure out the title.
|
Proposing |
| cPublicVersionMajor << 24 | cPublicVersionMinor << 16 | cPublicVersionPatch | ||
| }; | ||
| constexpr archive_format_version_t cNullVersion{0}; | ||
| constexpr archive_format_version_t cPrivateVersion{cNullVersion}; |
There was a problem hiding this comment.
Based on my original comment:
a field that's unused and meaningless to the public code but can be used by the private code
I was expecting we'd implement the private version as a reserved field in the archive metadata. But I guess we can't do that once we switch to MessagePack since each field in the archive metadata needs a key; is that why you're doing this here?
If so, how about this instead:
- Let's call the field something like "ArchiveFormatCustomVersion" instead of "PrivateVersion".
- Let's remove all of the discussion about public/private versions. As my original comment mentions, the public code should not need to care about private versions and so I don't even think it needs to document it. Strictly speaking, the private code or some other fork of the code can use the custom version field (or the combination of the public version and custom version) however it wants. It should be the job of the downstream code to document it.
- Let's call the "public" version simply "ArchiveFormatVersion".
There was a problem hiding this comment.
once we switch to MessagePack since each field in the archive metadata needs a key; is that why you're doing this here?
Yes, this is why I documented since it will show up as a key in MsgPack. Note while we can just not have a key in public, we can still read the MessagePack type with a missing key, I would prefer the private and public MessagePack type to be the same. However, we can remove documentation which also makes sense. See latest review
| constexpr char cMetadataDBFileName[] = "metadata.db"; | ||
| constexpr char cSchemaFileName[] = "schema.txt"; | ||
|
|
||
| namespace version { |
There was a problem hiding this comment.
The current convention for constants in this file is to name the namespace as cNamespace and then the constants inside as ConstantXxx. I know that disagrees with our clang-tidy rules, but in this case, I would stick with the existing conventions to avoid confusion until we refactor this file to address the clang-tidy warnings (especially the refactoring will propagate into a lot of files).
If you really want to namespace the version fields (which is fair), how about cArchiveFormatVersion with internal constants "Default" and "Custom"? "Default" isn't the greatest name, but I can't think of something better.
There was a problem hiding this comment.
I took your suggestion. Also in general it seems like people are recommending to me to move to new clang-tidy even in old files. But maybe this is not our recommendation?
There was a problem hiding this comment.
Also, I can just default initialize the custom version to 0 in the metadata class. Then we don't need to specify default or custom here. And can just use cVersion. Let me know
There was a problem hiding this comment.
Also in general it seems like people are recommending to me to move to new clang-tidy even in old files. But maybe this is not our recommendation?
It's our recommendation in other cases. In this case, I'm suggesting an exception until we refactor all other constants / make a more long-term decision about how to refactor them.
Also, I can just default initialize the custom version to 0 in the metadata class. Then we don't need to specify default or custom here. And can just use cVersion. Let me know
True. I like that idea.
| // Version(s) for the disk format of the archive. The public branch uses a single version. | ||
| // Private branches have two versions: an independent disk format version, and a second version to | ||
| // specify which public release can read its private disk format. Below are examples to help set | ||
| // the versions correctly. | ||
| // | ||
| // Public branch with version a.b.c | ||
| // | ||
| // | Public version | Private version | | ||
| // | --------------- | ---------------- | | ||
| // | a.b.c | 0.0.0 | | ||
| // | ||
| // Private branch with version x.y.z compatible with public version a.b.c | ||
| // | ||
| // | Public version | Private version | | ||
| // | --------------- | ---------------- | | ||
| // | a.b.c | x.y.z | | ||
| // | ||
| // Private branch with version x.y.z incompatible with any public version | ||
| // | ||
| // | Public version | Private version | | ||
| // | --------------- | ---------------- | | ||
| // | 0.0.0 | x.y.z | | ||
| // |
There was a problem hiding this comment.
Shall we consider to use /**/ style docstrings instead of // which is for inline comments?
There was a problem hiding this comment.
I removed this documentation per kirk, so ignoring
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)
93-93: LGTM! Consider adding private version field.The initialization with
cArchiveFormatVersion::cDefaultVersionaligns with the new semver format. As discussed in the PR, consider adding a reserved field inArchiveMetadatato store the private archive format version, allowing OSS code to read private branch archives by checking only the OSS compatible version.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp(1 hunks)components/core/src/clp/streaming_archive/Constants.hpp(1 hunks)components/core/src/clp/streaming_archive/reader/Archive.cpp(1 hunks)components/core/src/clp/streaming_archive/writer/Archive.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- components/core/src/clp/streaming_archive/writer/Archive.cpp
- components/core/src/clp/streaming_archive/reader/Archive.cpp
- components/core/src/clp/streaming_archive/Constants.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
📓 Learnings (1)
components/core/src/clp/streaming_archive/ArchiveMetadata.hpp (1)
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:122-132
Timestamp: 2025-01-29T21:09:06.878Z
Learning: The `creation_idx` field in `ArchiveMetadata` is used for tracking archive order in split archives. When archives aren't split, it can be safely initialized to zero.
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build (macos-latest)
- GitHub Check: build-macos (macos-13, true)
| constexpr uint8_t cDefaultVersionMajor{0}; | ||
| constexpr uint8_t cDefaultVersionMinor{1}; | ||
| constexpr uint16_t cDefaultVersionPatch{0}; | ||
| constexpr archive_format_version_t cDefaultVersion{ |
There was a problem hiding this comment.
If we keep these inside the namespace, we should remove the prefix c to follow the convention of the other constants in this file (again, this is an exception to our clang-tidy rules).
kirkrodrigues
left a comment
There was a problem hiding this comment.
Can you update the PR description to reflect the final decisions/code?
For the PR title, how about:
feat(core-clp)!: Change archive format version from a counter to a SemVer value.
Description
This PR changes the archive version format from a uint_16 counter to a uint_32 semver. It is breaking
The major version is allocated 8 bits, and the minor version is allocated 8 bits, the patch version is allocated 16 bits.
The main purpose of this PR is to bring the OSS main branch into alignment with a private branch, to make the reading each others archives possible.
I removed the development version flag since it isn't really compatible with semver. Moreover, it seemed unused as all released versions were "development".
Version
The new starting version will be 0.1.0, and will be independent from the existing private branch version 0.1.1. The following describes the new versioning system for public and private branches.
We can now support the following cases:
Achieved using two version numbers as follows.
Note how we treat
0.0.0as an invalid version number and instead use it to indicate incompatibility.The private version number will be stored as new field in the archive metadata(To be added in PR #700); however, the public code can ignore it entirely, while the private code can use it to determine the private archive format
version, without interfering with the public archive format version.
Validation performed
Compressed and decompressed an archive successfully. Verified, as expected, that a failure occurs when reading older archives.
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Refactor
Technical Improvement