docs(clp-s): Add skeleton for clp-s design documentation; Write first half of clp-s single-file archive design document.#961
Conversation
WalkthroughNew documentation has been introduced for the "compressed-log-processor-structured" (clp-s) archive format. This includes an overview, background, and a detailed specification of the single-file archive format. The developer guide's table of contents is updated to reference this new section. The documentation is marked as under construction. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DocsIndex
participant CLPSIndex
participant Background
participant ArchiveFormat
User->>DocsIndex: Access developer guide index
DocsIndex->>CLPSIndex: Navigate to clp-s design section
CLPSIndex->>Background: Optionally view background
CLPSIndex->>ArchiveFormat: Optionally view archive format specification
Possibly related PRs
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
docs/src/dev-guide/design-clp-structured/background.md(1 hunks)docs/src/dev-guide/design-clp-structured/index.md(1 hunks)docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)docs/src/dev-guide/index.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/index.md
[duplication] ~15-~15: Possible typo: you repeated a word.
Context: ... :gutter: 2 :::{grid-item-card} 🔗 background Background ^^^ Any necessary information to unders...
(ENGLISH_WORD_REPEAT_RULE)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to offer high decompression and search per...
(IN_ORDER_TO_PREMIUM)
[typographical] ~10-~10: Consider adding a comma here.
Context: ... about the design decisions behind clp-s please refer to [our paper on clp-s][μSlope] o...
(PLEASE_COMMA)
[duplication] ~56-~56: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~66-~66: Possible missing comma found.
Context: ...important metadata information about an archive as shown in Figure 2. The ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~69-~69: Possible missing comma found.
Context: ... archives every minor version change is breaking and readers designed for a given minor ...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~70-~70: Consider a more concise word here.
Context: ...are only sometimes backwards compatible in order to reduce maintenance while the archive fo...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~83-~83: Possible missing preposition found.
Context: ...r size to determine the size and offset both the "metadata" and "files" sections. T...
(AI_HYDRA_LEO_MISSING_OF)
[uncategorized] ~154-~154: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~155-~155: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[uncategorized] ~166-~166: Possible missing comma found.
Context: ...ly records the number of segments in an archive as shown in Figure 4. Each...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~168-~168: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~191-~191: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...he order in which the components of the files section should be read to avoid backwar...
(AI_HYDRA_LEO_APOSTROPHE_S_XS)
[uncategorized] ~211-~211: Possible missing comma found.
Context: ...ains entries for each file in the files section ordered by their offset. Each file is d...
(AI_HYDRA_LEO_MISSING_COMMA)
[duplication] ~215-~215: Possible typo: you repeated a word.
Context: ...". :::: The ArchiveFileInfo packet has entries entries with the following names in order: * `"...
(ENGLISH_WORD_REPEAT_RULE)
[style] ~276-~276: Consider a more concise word here.
Context: ...integer epoch time or double epoch time in order to handle timestamp columns with polymorph...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~286-~286: Possible missing preposition found.
Context: ...t The archive range index allows users associate arbitrary properties with each file (or...
(AI_HYDRA_LEO_MISSING_TO)
[uncategorized] ~287-~287: Possible missing comma found.
Context: ...ther unit of data) ingested into clp-s. Collectively these units of data are referred to as ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~288-~288: Possible missing comma found.
Context: ...otentially aggregate data from multiple files the archive range index associates thes...
(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~289-~289: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...iles the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~290-~290: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size and ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~291-~291: Possible missing comma found.
Context: ... order to maintain a configured archive size and that in these cases the properties ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~293-~293: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~298-~298: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will store these default properti...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~325-~325: Possible missing comma found.
Context: ...he range index have overlapping logical ranges and all entries are ordered by logical ...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
16-16: Link fragments should be valid
null
(MD051, link-fragments)
66-66: Link fragments should be valid
null
(MD051, link-fragments)
68-68: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
87-87: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
130-130: Link fragments should be valid
null
(MD051, link-fragments)
135-135: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
135-135: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
146-146: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
159-159: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
167-167: Link fragments should be valid
null
(MD051, link-fragments)
177-177: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
181-181: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
189-189: Link fragments should be valid
null
(MD051, link-fragments)
198-198: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
208-208: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
216-216: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
231-231: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
234-234: Link fragments should be valid
null
(MD051, link-fragments)
238-238: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
238-238: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
267-267: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
295-295: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
303-303: Link fragments should be valid
null
(MD051, link-fragments)
307-307: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
321-321: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🔇 Additional comments (15)
docs/src/dev-guide/design-clp-structured/background.md (1)
1-8: Background stub is clear. The section correctly references the existing KV-IR background and includes a warning about ongoing work.docs/src/dev-guide/index.md (1)
84-84: clp-s design docs integrated. The new toctree entrydesign-clp-structured/indexcleanly adds clp-s to the Design section.docs/src/dev-guide/design-clp-structured/index.md (1)
1-9: clp-s overview added. The introduction contrasts clp-s with KV-IR, links to background and format specs, and includes a construction warning.docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (12)
1-12: Intro and context are solid. The opening section succinctly describes clp-s’s goals and links to the paper and blog.🧰 Tools
🪛 LanguageTool
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to offer high decompression and search per...(IN_ORDER_TO_PREMIUM)
[typographical] ~10-~10: Consider adding a comma here.
Context: ... about the design decisions behind clp-s please refer to [our paper on clp-s][μSlope] o...(PLEASE_COMMA)
13-23: Format overview section is well-structured. Splitting the archive into header, metadata, and files sections is clear and supported by Figure 1.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
16-16: Link fragments should be valid
null(MD051, link-fragments)
24-29: Endianness note is clear. The admonition on little-endian and padding rules is precise.
31-62: Figure 1 diagram looks correct. The Mermaid configuration and layout accurately illustrate the high-level archive structure.🧰 Tools
🪛 LanguageTool
[duplication] ~56-~56: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...(ENGLISH_WORD_REPEAT_RULE)
63-88: Header specification detail. The field descriptions are precise; please confirm the intended byte order for the version fields.
Can you verify that the 4-byte version is encoded as[patch (2 bytes), minor (1 byte), major (1 byte)]?🧰 Tools
🪛 LanguageTool
[uncategorized] ~66-~66: Possible missing comma found.
Context: ...important metadata information about an archive as shown in Figure 2. The ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~69-~69: Possible missing comma found.
Context: ... archives every minor version change is breaking and readers designed for a given minor ...(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~70-~70: Consider a more concise word here.
Context: ...are only sometimes backwards compatible in order to reduce maintenance while the archive fo...(IN_ORDER_TO_PREMIUM)
[uncategorized] ~83-~83: Possible missing preposition found.
Context: ...r size to determine the size and offset both the "metadata" and "files" sections. T...(AI_HYDRA_LEO_MISSING_OF)
🪛 markdownlint-cli2 (0.17.2)
66-66: Link fragments should be valid
null(MD051, link-fragments)
68-68: Trailing spaces
Expected: 0 or 2; Actual: 1(MD009, no-trailing-spaces)
87-87: Lists should be surrounded by blank lines
null(MD032, blanks-around-lists)
127-157: Metadata section design is comprehensive. The packet/stream definitions and forward-compatibility rationale are well documented.🧰 Tools
🪛 LanguageTool
[uncategorized] ~154-~154: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~155-~155: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...(RB_LY_COMMA)
🪛 markdownlint-cli2 (0.17.2)
130-130: Link fragments should be valid
null(MD051, link-fragments)
135-135: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
135-135: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
146-146: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
158-163: Packet type list is complete. Ensure new mandatory packet types are added here in future revisions.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
159-159: Lists should be surrounded by blank lines
null(MD032, blanks-around-lists)
164-175: ArchiveInfo packet explained clearly. Figure 4 and the segment-count rationale support forward compatibility.🧰 Tools
🪛 LanguageTool
[uncategorized] ~166-~166: Possible missing comma found.
Context: ...ly records the number of segments in an archive as shown in Figure 4. Each...(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~168-~168: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...(IN_ORDER_TO_PREMIUM)
🪛 markdownlint-cli2 (0.17.2)
167-167: Link fragments should be valid
null(MD051, link-fragments)
188-194: ArchiveFileInfo packet is well-presented. The JSON example and ordering guidance are precise.🧰 Tools
🪛 LanguageTool
[uncategorized] ~191-~191: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...he order in which the components of the files section should be read to avoid backwar...(AI_HYDRA_LEO_APOSTROPHE_S_XS)
🪛 markdownlint-cli2 (0.17.2)
189-189: Link fragments should be valid
null(MD051, link-fragments)
228-235: TimestampDictionary packet overview. The description of ranges, patterns, and escaping rules is very clear.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
231-231: Lists should be surrounded by blank lines
null(MD032, blanks-around-lists)
234-234: Link fragments should be valid
null(MD051, link-fragments)
286-293: RangeIndex packet is precise. Figure 7’s JSON example and property-prefix guidance are on point.🧰 Tools
🪛 LanguageTool
[uncategorized] ~286-~286: Possible missing preposition found.
Context: ...t The archive range index allows users associate arbitrary properties with each file (or...(AI_HYDRA_LEO_MISSING_TO)
[uncategorized] ~287-~287: Possible missing comma found.
Context: ...ther unit of data) ingested into clp-s. Collectively these units of data are referred to as ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~288-~288: Possible missing comma found.
Context: ...otentially aggregate data from multiple files the archive range index associates thes...(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~289-~289: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...iles the archive range index associates these ingestion unit properties with a logical range of...(THIS_NNS)
[style] ~290-~290: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size and ...(IN_ORDER_TO_PREMIUM)
[uncategorized] ~291-~291: Possible missing comma found.
Context: ... order to maintain a configured archive size and that in these cases the properties ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~293-~293: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...(BY_DEFAULT_COMMA)
333-351: Files section placeholder. The headings and construction warning match the style of other stubs.
| The ArchiveFileInfo packet has entries entries with the following names in order: | ||
| * `"/schema_tree"` - the Merged Parse Tree | ||
| * `"/schema_ids"` - the Schema Map | ||
| * `"/table_metadata"` - metadata describing the contents of the tables segments | ||
| * `"/var.dict"` - the Variable Dictionary | ||
| * `"/log.dict"` - the Log-type Dictionary | ||
| * `"/array.dict"` - the Array Log-type Dictionary (optional) | ||
| * `"/0"` - the first tables segment | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
File entry enumeration is exhaustive. Consider marking /array.dict explicitly as optional in the list.
Add a note or badge to indicate optional entries (e.g., /array.dict).
🧰 Tools
🪛 LanguageTool
[duplication] ~215-~215: Possible typo: you repeated a word.
Context: ...". :::: The ArchiveFileInfo packet has entries entries with the following names in order: * `"...
(ENGLISH_WORD_REPEAT_RULE)
🪛 markdownlint-cli2 (0.17.2)
216-216: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 215 to 223, the list of ArchiveFileInfo packet entries should explicitly
mark the `/array.dict` entry as optional. Update the list to include a note or
badge next to `/array.dict` indicating it is optional, clarifying its status for
readers.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
237-267: Specify language for the fenced code block.The TimestampDictionary binary payload snippet currently uses plain backticks and lacks a language identifier, which prevents proper syntax highlighting. Please add an appropriate language tag (e.g.
cpp) to the opening ``` line.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
238-238: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
238-238: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
267-267: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to offer high decompression and search per...
(IN_ORDER_TO_PREMIUM)
[typographical] ~10-~10: Consider adding a comma here.
Context: ... about the design decisions behind clp-s please refer to [our paper on clp-s][μSlope] o...
(PLEASE_COMMA)
[duplication] ~56-~56: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~66-~66: Possible missing comma found.
Context: ...important metadata information about an archive as shown in Figure 2. The ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~69-~69: Possible missing comma found.
Context: ... archives every minor version change is breaking and readers designed for a given minor ...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~70-~70: Consider a more concise word here.
Context: ...are only sometimes backwards compatible in order to reduce maintenance while the archive fo...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~83-~83: Possible missing preposition found.
Context: ...r size to determine the size and offset both the "metadata" and "files" sections. T...
(AI_EN_LECTOR_MISSING_PREPOSITION)
[uncategorized] ~154-~154: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~155-~155: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[uncategorized] ~166-~166: Possible missing comma found.
Context: ...ly records the number of segments in an archive as shown in Figure 4. Each...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~168-~168: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ... single segment (i.e. there is only one tables segment file), but we still record the ...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[style] ~168-~168: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~191-~191: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...he order in which the components of the files section should be read to avoid backwar...
(AI_HYDRA_LEO_APOSTROPHE_S_XS)
[uncategorized] ~211-~211: Possible missing comma found.
Context: ...ains entries for each file in the files section ordered by their offset. Each file is d...
(AI_HYDRA_LEO_MISSING_COMMA)
[duplication] ~215-~215: Possible typo: you repeated a word.
Context: ...". :::: The ArchiveFileInfo packet has entries entries with the following names in order: * `"...
(ENGLISH_WORD_REPEAT_RULE)
[style] ~276-~276: Consider a more concise word here.
Context: ...integer epoch time or double epoch time in order to handle timestamp columns with polymorph...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~286-~286: Possible missing preposition found.
Context: ...t The archive range index allows users associate arbitrary properties with each file (or...
(AI_HYDRA_LEO_MISSING_TO)
[uncategorized] ~287-~287: Possible missing comma found.
Context: ...ther unit of data) ingested into clp-s. Collectively these units of data are referred to as ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~288-~288: A comma might be missing here.
Context: ...otentially aggregate data from multiple files the archive range index associates thes...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[grammar] ~289-~289: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...iles the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~290-~290: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size and ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~291-~291: Possible missing comma found.
Context: ... order to maintain a configured archive size and that in these cases the properties ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~293-~293: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~298-~298: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will store these default properti...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~305-~305: A punctuation mark might be missing here.
Context: ...n in Figure 7. (figure-7)= ::::{card} ```json [ { "s": 0, ...
(AI_EN_LECTOR_MISSING_PUNCTUATION)
[uncategorized] ~325-~325: Possible missing comma found.
Context: ...he range index have overlapping logical ranges and all entries are ordered by logical ...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
16-16: Link fragments should be valid
null
(MD051, link-fragments)
66-66: Link fragments should be valid
null
(MD051, link-fragments)
68-68: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
87-87: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
130-130: Link fragments should be valid
null
(MD051, link-fragments)
135-135: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
135-135: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
146-146: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
159-159: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
167-167: Link fragments should be valid
null
(MD051, link-fragments)
177-177: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
181-181: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
189-189: Link fragments should be valid
null
(MD051, link-fragments)
198-198: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
208-208: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
216-216: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
231-231: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
234-234: Link fragments should be valid
null
(MD051, link-fragments)
238-238: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
238-238: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
267-267: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
295-295: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
303-303: Link fragments should be valid
null
(MD051, link-fragments)
307-307: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
321-321: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (2)
236-267: Specify language for the TimestampDictionary code block.
As noted in previous feedback, the fenced block for Figure 6 lacks a language identifier to enable proper syntax highlighting.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
238-238: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
238-238: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
267-267: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
215-223: 🧹 Nitpick (assertive)Fix duplicated word typo.
Remove the extra “entries” in the sentence:-The ArchiveFileInfo packet has entries entries with the following names in order: +The ArchiveFileInfo packet has entries with the following names in order:🧰 Tools
🪛 LanguageTool
[duplication] ~215-~215: Possible typo: you repeated a word.
Context: ...". :::: The ArchiveFileInfo packet has entries entries with the following names in order: * `"...(ENGLISH_WORD_REPEAT_RULE)
🪛 markdownlint-cli2 (0.17.2)
216-216: Lists should be surrounded by blank lines
null(MD032, blanks-around-lists)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to offer high decompression and search per...
(IN_ORDER_TO_PREMIUM)
[typographical] ~10-~10: Consider adding a comma here.
Context: ... about the design decisions behind clp-s please refer to [our paper on clp-s][μSlope] o...
(PLEASE_COMMA)
[duplication] ~56-~56: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~66-~66: Possible missing comma found.
Context: ...important metadata information about an archive as shown in Figure 2. The ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~69-~69: Possible missing comma found.
Context: ... archives every minor version change is breaking and readers designed for a given minor ...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~70-~70: Consider a more concise word here.
Context: ...are only sometimes backwards compatible in order to reduce maintenance while the archive fo...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~119-~119: Possible missing preposition found.
Context: ...r size to determine the size and offset both the "metadata" and "files" sections. T...
(AI_EN_LECTOR_MISSING_PREPOSITION)
[uncategorized] ~154-~154: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~155-~155: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[uncategorized] ~166-~166: Possible missing comma found.
Context: ...ly records the number of segments in an archive as shown in Figure 4. Each...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~168-~168: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ... single segment (i.e. there is only one tables segment file), but we still record the ...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[style] ~168-~168: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~191-~191: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...he order in which the components of the files section should be read to avoid backwar...
(AI_HYDRA_LEO_APOSTROPHE_S_XS)
[uncategorized] ~211-~211: Possible missing comma found.
Context: ...ains entries for each file in the files section ordered by their offset. Each file is d...
(AI_HYDRA_LEO_MISSING_COMMA)
[duplication] ~215-~215: Possible typo: you repeated a word.
Context: ...". :::: The ArchiveFileInfo packet has entries entries with the following names in order: * `"...
(ENGLISH_WORD_REPEAT_RULE)
[style] ~276-~276: Consider a more concise word here.
Context: ...integer epoch time or double epoch time in order to handle timestamp columns with polymorph...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~286-~286: Possible missing preposition found.
Context: ...t The archive range index allows users associate arbitrary properties with each file (or...
(AI_HYDRA_LEO_MISSING_TO)
[uncategorized] ~287-~287: Possible missing comma found.
Context: ...ther unit of data) ingested into clp-s. Collectively these units of data are referred to as ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~288-~288: A comma might be missing here.
Context: ...otentially aggregate data from multiple files the archive range index associates thes...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[grammar] ~289-~289: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...iles the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~290-~290: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size and ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~291-~291: Possible missing comma found.
Context: ... order to maintain a configured archive size and that in these cases the properties ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~293-~293: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~298-~298: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will store these default properti...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~305-~305: A punctuation mark might be missing here.
Context: ...n in Figure 7. (figure-7)= ::::{card} ```json [ { "s": 0, ...
(AI_EN_LECTOR_MISSING_PUNCTUATION)
[uncategorized] ~325-~325: Possible missing comma found.
Context: ...he range index have overlapping logical ranges and all entries are ordered by logical ...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
16-16: Link fragments should be valid
null
(MD051, link-fragments)
66-66: Link fragments should be valid
null
(MD051, link-fragments)
68-68: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
123-123: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
130-130: Link fragments should be valid
null
(MD051, link-fragments)
135-135: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
135-135: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
146-146: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
159-159: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
167-167: Link fragments should be valid
null
(MD051, link-fragments)
177-177: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
181-181: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
189-189: Link fragments should be valid
null
(MD051, link-fragments)
198-198: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
208-208: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
216-216: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
231-231: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
234-234: Link fragments should be valid
null
(MD051, link-fragments)
238-238: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
238-238: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
267-267: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
295-295: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
303-303: Link fragments should be valid
null
(MD051, link-fragments)
307-307: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
321-321: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (6)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (6)
8-10: 🧹 Nitpick (assertive)Add missing comma after introductory clause
A comma is needed after “behind clp-s” to improve readability.-for more information about the design decisions behind clp-s please refer to [our paper on clp-s][μSlope], or our [blog][s3-blog] on optimizing +for more information about the design decisions behind clp-s, please refer to [our paper on clp-s][μSlope], or our [blog][s3-blog] on optimizing
133-145: 🧹 Nitpick (assertive)Specify language and surround fenced block for MetadataPacket
Include a language identifier (e.g.cpp) and blank lines around the fences.::: {card} -``` +```cpp MetadataPacket { packet_type: uint8_t packet_size: uint32_t packet_content: uint8_t[packet_size] } MetadataPacketStream { num_packets: uint8_t packets: MetadataPacket[num_packets] } -``` +``` +++🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
134-134: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
134-134: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
145-145: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
175-182: 🧹 Nitpick (assertive)Add language specifier for JSON snippet in Figure 4
Enclose the code block in blank lines and specifyjsonfor proper highlighting.(figure-4)= :::::{card} -``` +```json { "num_segments": 1 } +``` +++🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
176-176: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
180-180: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
236-266: 🧹 Nitpick (assertive)Specify language for TimestampDictionary snippet
Add a language identifier (e.g.cpp) and blank lines around the fenced block.(figure-6)= :::::{card} -``` +```cpp enum EncodingType : uint64_t { Epoch = 0x1, DoubleEpoch = 0x2 } TimestampRange { key_len: uint64_t key: char[key_len] num_column_ids: uint64_t column_ids: int32_t[num_column_ids] encoding_type: EncodingType epoch_start: union {epochtime_t, double} epoch_end: union {epochtime_t, double} } TimestampPattern { pattern_id: uint64_t pattern_len: uint64_t pattern: char[pattern_len] } TimestampDictionary { num_ranges: uint64_t ranges: TimestampRange[num_entries] num_patterns: uint64_t patterns: TimestampPattern[num_patterns] } -``` +``` +++🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
237-237: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
237-237: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
266-266: Fenced code blocks should be surrounded by blank lines
null(MD031, blanks-around-fences)
214-215: 🧹 Nitpick (assertive)Fix duplicated word in description
Remove the repeated “entries”.-The ArchiveFileInfo packet has entries entries with the following names in order: +The ArchiveFileInfo packet has the following entries in order:🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
215-215: Lists should be surrounded by blank lines
null(MD032, blanks-around-lists)
3-6: 🧹 Nitpick (assertive)Streamline introduction for brevity and clarity
The two sentences can be merged to avoid redundancy and remove “in order to”.Proposed diff:
-The clp-s single-file archive format is designed to offer high compression and fast search on -dynamically structured log data such as JSON logs. This format is optimized for streaming reads in -order to enable high performance for archives stored on object storage systems such as S3. +The clp-s single-file archive format is optimized for high compression, fast search on dynamically +structured logs (e.g., JSON), and efficient streaming reads on object storage systems such as S3.🧰 Tools
🪛 LanguageTool
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...(IN_ORDER_TO_PREMIUM)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...
(IN_ORDER_TO_PREMIUM)
[duplication] ~55-~55: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~65-~65: Possible missing comma found.
Context: ...important metadata information about an archive as shown in Figure 2. The ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~68-~68: Possible missing comma found.
Context: ... archives every minor version change is breaking and readers designed for a given minor ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~118-~118: Possible missing preposition found.
Context: ...r size to determine the size and offset both the "metadata" and "files" sections. T...
(AI_HYDRA_LEO_MISSING_OF)
[uncategorized] ~153-~153: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~154-~154: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[uncategorized] ~165-~165: Possible missing comma found.
Context: ...ly records the number of segments in an archive as shown in Figure 4. Each...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~167-~167: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~190-~190: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...he order in which the components of the files section should be read to avoid backwar...
(AI_HYDRA_LEO_APOSTROPHE_S_XS)
[uncategorized] ~210-~210: Possible missing comma found.
Context: ...ains entries for each file in the files section ordered by their offset. Each file is d...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~284-~284: Possible missing preposition found.
Context: ...t The archive range index allows users associate arbitrary properties with each file (or...
(AI_HYDRA_LEO_MISSING_TO)
[uncategorized] ~285-~285: Possible missing comma found.
Context: ...ther unit of data) ingested into clp-s. Collectively these units of data are referred to as ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~286-~286: Possible missing comma found.
Context: ...otentially aggregate data from multiple files the archive range index associates thes...
(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~287-~287: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...iles the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~288-~288: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size and ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~289-~289: Possible missing comma found.
Context: ... order to maintain a configured archive size and that in these cases the properties ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~291-~291: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~296-~296: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will store these default properti...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~323-~323: Possible missing comma found.
Context: ...he range index have overlapping logical ranges and all entries are ordered by logical ...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
15-15: Link fragments should be valid
null
(MD051, link-fragments)
65-65: Link fragments should be valid
null
(MD051, link-fragments)
67-67: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
122-122: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
129-129: Link fragments should be valid
null
(MD051, link-fragments)
134-134: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
134-134: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
145-145: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
158-158: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
166-166: Link fragments should be valid
null
(MD051, link-fragments)
176-176: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
180-180: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
188-188: Link fragments should be valid
null
(MD051, link-fragments)
197-197: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
207-207: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
215-215: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
230-230: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
233-233: Link fragments should be valid
null
(MD051, link-fragments)
237-237: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
237-237: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
266-266: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
293-293: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
301-301: Link fragments should be valid
null
(MD051, link-fragments)
305-305: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
319-319: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
quinntaylormitchell
left a comment
There was a problem hiding this comment.
great job Devin, content-wise this is clear and well-laid out (and it helped me learn too, so win-win).
I left a fair amount of comments but don’t be intimidated, at least two-thirds of them are related either to the addition of monospace text for readability, or related to the addition of commas/other sentence-refactoring stuff. commas are so controversial lol so no worries if you disagree. I think the really important stuff is the instances of re-wording. let me know if you have any questions!
and also if you can't figure out how to change the font size on the mermaid diagrams, I'd be happy to whip up a couple images for you that refactor the information and present it in a more visible way. just let me know!
| * `"_file_split_number"` - incremented each time this ingestion-unit is split across antoher archive | ||
| * `"_archive_creator_id"` - UUID associated with a particular _invocation_ of compression | ||
|
|
||
| Currently clp-s will store these default properties as well as any properties present in the |
There was a problem hiding this comment.
| Currently clp-s will store these default properties as well as any properties present in the | |
| Currently `clp-s` will store these default properties (as well as any properties present) in the |
There was a problem hiding this comment.
I don't think this bracketing makes sense since I'm trying to convey that we store these default properties and also store any properties that were present in the KV-IR metadata whereas this bracketing implies that all of these fields get stored in the KV-IR metadata.
I'll try to find a different wording to make the intent more clear.
Co-authored-by: quinntaylormitchell <q.mitchell@mail.utoronto.ca>
There was a problem hiding this comment.
Actionable comments posted: 10
♻️ Duplicate comments (2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
3-5: Suggest refining the introduction for clarity and conciseness.The opening lines overuse hyphens and the phrase “in order to”. Consider removing the hyphen in “dynamically-structured” and rewriting to reduce verbosity:
-The `clp-s` single-file archive format is designed to offer high compression and fast search on -dynamically-structured log data such as JSON logs. This format is optimized for streaming reads in -order to enable high performance for archives stored on object storage systems such as [S3][amazon-s3]. +The `clp-s` single-file archive format is designed to offer high compression and fast search on dynamically structured log data such as JSON. +It is optimised for streaming reads and high performance on object storage systems like [S3][amazon-s3].docs/src/dev-guide/design-clp-structured/index.md (1)
5-9: Break up the long comparative sentence for readability.The single sentence is quite dense. Splitting improves clarity:
-Compared to the [KV-IR](../design-kv-ir-streams/index.md) streaming compression format, `clp-s` achieves higher compression ratios and significantly faster search speeds at the cost of needing to aggregate log data before compressing the data into archives and requiring memory proportional to archive size for both compression and search. +Compared to the [KV-IR](../design-kv-ir-streams/index.md) streaming compression format, `clp-s` achieves higher compression ratios and significantly faster search speeds. +However, it requires aggregating log data before compression and memory proportional to archive size for both compression and search.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
docs/src/dev-guide/design-clp-structured/background.md(1 hunks)docs/src/dev-guide/design-clp-structured/index.md(1 hunks)docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/index.md
[uncategorized] ~4-~4: The hyphen in dynamically-structured is redundant.
Context: ...is a storage format for large chunks of dynamically-structured (e.g., JSON) logs. Compared to the [KV-...
(ADVERB_LY_HYPHEN_FIX)
[duplication] ~15-~15: Possible typo: you repeated a word.
Context: ... :gutter: 2 :::{grid-item-card} 🔗 background Background ^^^ All necessary information needed to...
(ENGLISH_WORD_REPEAT_RULE)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[uncategorized] ~3-~3: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...fer high compression and fast search on dynamically-structured log data such as JSON logs. This format...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...
(IN_ORDER_TO_PREMIUM)
[duplication] ~55-~55: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~118-~118: Possible missing preposition found.
Context: ...r size to determine the size and offset both the "metadata" and "files" sections. T...
(AI_HYDRA_LEO_MISSING_OF)
[uncategorized] ~153-~153: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~154-~154: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[style] ~167-~167: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~286-~286: Possible missing comma found.
Context: ...otentially aggregate data from multiple files the archive range index associates thes...
(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~287-~287: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...iles the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~288-~288: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size, and...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~291-~291: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~296-~296: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will store these default properti...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
15-15: Link fragments should be valid
null
(MD051, link-fragments)
65-65: Link fragments should be valid
null
(MD051, link-fragments)
67-67: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
122-122: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
129-129: Link fragments should be valid
null
(MD051, link-fragments)
134-134: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
134-134: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
145-145: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
158-158: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
166-166: Link fragments should be valid
null
(MD051, link-fragments)
176-176: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
180-180: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
188-188: Link fragments should be valid
null
(MD051, link-fragments)
197-197: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
207-207: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
215-215: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
230-230: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
233-233: Link fragments should be valid
null
(MD051, link-fragments)
237-237: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
237-237: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
266-266: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
293-293: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
301-301: Link fragments should be valid
null
(MD051, link-fragments)
305-305: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
319-319: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build (macos-latest)
| (figure-6)= | ||
| ::::{card} | ||
| ``` | ||
| enum EncodingType : uint64_t { | ||
| Epoch = 0x1, | ||
| DoubleEpoch = 0x2 | ||
| } | ||
|
|
||
| TimestampRange { | ||
| key_len: uint64_t | ||
| key: char[key_len] | ||
| num_column_ids: uint64_t | ||
| column_ids: int32_t[num_column_ids] | ||
| encoding_type: EncodingType | ||
| epoch_start: union {epochtime_t, double} | ||
| epoch_end: union {epochtime_t, double} | ||
| } | ||
|
|
||
| TimestampPattern { | ||
| pattern_id: uint64_t | ||
| pattern_len: uint64_t | ||
| pattern: char[pattern_len] | ||
| } | ||
|
|
||
| TimestampDictionary { | ||
| num_ranges: uint64_t | ||
| ranges: TimestampRange[num_entries] | ||
| num_patterns: uint64_t | ||
| patterns: TimestampPattern[num_patterns] | ||
| } | ||
|
|
||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Add language specifier and blank lines to the TimestampDictionary block.
The TimestampDictionary binary payload snippet should include a language identifier and blank lines around the fences:
(figure-6)=
:::::{card}
-```
+
+```cpp
enum EncodingType : uint64_t {
Epoch = 0x1,
DoubleEpoch = 0x2
}
...
TimestampDictionary {
num_ranges: uint64_t
ranges: TimestampRange[num_entries]
num_patterns: uint64_t
patterns: TimestampPattern[num_patterns]
}
-```
+```
+++
**Figure 6**: TimestampDictionary binary payload format...🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
237-237: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
237-237: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
266-266: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 235 to 266, add the language specifier "cpp" immediately after the opening
triple backticks of the TimestampDictionary code block and ensure there are
blank lines before and after the fenced code block to improve formatting and
readability.
| # clp-s | ||
|
|
||
| The compressed-log-processor-structured (abbreviated `clp-s`, also known as [μSlope][μSlope]) | ||
| archive format is a storage format for large chunks of dynamically-structured (e.g., JSON) logs. |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Remove unnecessary hyphen in “dynamically structured”.
Compound modifiers with an adverb ending in “-ly” don’t need a hyphen:
-... dynamically-structured (e.g., JSON) logs.
+... dynamically structured (e.g., JSON) logs.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| archive format is a storage format for large chunks of dynamically-structured (e.g., JSON) logs. | |
| archive format is a storage format for large chunks of dynamically structured (e.g., JSON) logs. |
🧰 Tools
🪛 LanguageTool
[uncategorized] ~4-~4: The hyphen in dynamically-structured is redundant.
Context: ...is a storage format for large chunks of dynamically-structured (e.g., JSON) logs. Compared to the [KV-...
(ADVERB_LY_HYPHEN_FIX)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/index.md at line 4, remove the
hyphen between "dynamically" and "structured" in the phrase
"dynamically-structured" to correctly reflect that adverbs ending in -ly do not
require hyphenation with the following adjective.
Co-authored-by: quinntaylormitchell <q.mitchell@mail.utoronto.ca>
There was a problem hiding this comment.
Actionable comments posted: 11
♻️ Duplicate comments (3)
docs/src/dev-guide/design-clp-structured/index.md (2)
4-4: Duplicate hyphen removal in “dynamically-structured”
This adverb-adjective pair doesn’t need a hyphen.
18-18: Simplify redundant phrasing and fix possessive
ReplaceAll necessary information needed to understand `clp-s`' format.with
All information needed to understand the `clp-s` format.docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
67-67: Remove trailing whitespace
Line 67 ends with an extraneous space. Delete it to satisfy markdownlint MD009.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/dev-guide/design-clp-structured/index.md(1 hunks)docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/index.md
[uncategorized] ~4-~4: The hyphen in dynamically-structured is redundant.
Context: ...is a storage format for large chunks of dynamically-structured (e.g., JSON) logs. Compared to the [KV-...
(ADVERB_LY_HYPHEN_FIX)
[duplication] ~15-~15: Possible typo: you repeated a word.
Context: ... :gutter: 2 :::{grid-item-card} 🔗 background Background ^^^ All necessary information needed to...
(ENGLISH_WORD_REPEAT_RULE)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[uncategorized] ~3-~3: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...fer high compression and fast search on dynamically-structured log data such as JSON logs. This format...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...
(IN_ORDER_TO_PREMIUM)
[duplication] ~55-~55: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~118-~118: Possible missing preposition found.
Context: ...r size to determine the size and offset both the "metadata" and "files" sections. T...
(AI_HYDRA_LEO_MISSING_OF)
[uncategorized] ~153-~153: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~154-~154: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[uncategorized] ~167-~167: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ... single segment (i.e. there is only one tables segment file), but we still record the ...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[style] ~167-~167: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[grammar] ~286-~286: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...les, the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~287-~287: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size, and...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~290-~290: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~295-~295: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will store these default properti...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~302-~302: A punctuation mark might be missing here.
Context: ...n in Figure 7. (figure-7)= ::::{card} ```json [ { "s": 0, ...
(AI_EN_LECTOR_MISSING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
15-15: Link fragments should be valid
null
(MD051, link-fragments)
65-65: Link fragments should be valid
null
(MD051, link-fragments)
67-67: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
122-122: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
129-129: Link fragments should be valid
null
(MD051, link-fragments)
134-134: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
134-134: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
145-145: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
158-158: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
166-166: Link fragments should be valid
null
(MD051, link-fragments)
176-176: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
180-180: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
188-188: Link fragments should be valid
null
(MD051, link-fragments)
197-197: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
207-207: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
215-215: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
229-229: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
232-232: Link fragments should be valid
null
(MD051, link-fragments)
236-236: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
236-236: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
265-265: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
292-292: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
300-300: Link fragments should be valid
null
(MD051, link-fragments)
304-304: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
318-318: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
| The ArchiveFileInfo packet has entries with the following names in order: | ||
| * `"/schema_tree"` - the Merged Parse Tree | ||
| * `"/schema_ids"` - the Schema Map | ||
| * `"/table_metadata"` - metadata describing the contents of the tables segments | ||
| * `"/var.dict"` - the Variable Dictionary | ||
| * `"/log.dict"` - the Log-type Dictionary | ||
| * `"/array.dict"` - the Array Log-type Dictionary (optional) | ||
| * `"/0"` - the first tables segment | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround ArchiveFileInfo entries list with blank lines
Ensure there’s a blank line before line 215 and after line 222 to satisfy MD032.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
215-215: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 214 to 222, add a blank line before line 215 and another blank line after
line 222 to properly separate the ArchiveFileInfo entries list from surrounding
content, resolving the MD032 markdown linting issue.
| ```json | ||
| { | ||
| "num_segments": 1 | ||
| } | ||
| ``` | ||
| +++ | ||
| **Figure 4**: Layout of the ArchiveInfo msgpack payload. | ||
| :::: |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround JSON example with blank lines
Figure 4’s code block should have a blank line before the opening json and after the closing .
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
176-176: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
180-180: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 176 to 183, add a blank line before the opening ```json and another blank
line after the closing ``` of the JSON code block to improve readability and
formatting consistency.
| The `clp-s` single-file archive format is designed to offer high compression and fast search on | ||
| dynamically-structured log data such as JSON logs. This format is optimized for streaming reads in | ||
| order to enable high performance for archives stored on object storage systems such as [S3][amazon-s3]. |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Refine introduction wording
- Remove the hyphen in “dynamically-structured” → “dynamically structured”
- Simplify “in order to enable” to “for” or “to” for conciseness
🧰 Tools
🪛 LanguageTool
[uncategorized] ~3-~3: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...fer high compression and fast search on dynamically-structured log data such as JSON logs. This format...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...
(IN_ORDER_TO_PREMIUM)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 3 to 5, remove the hyphen in "dynamically-structured" to make it
"dynamically structured" and simplify the phrase "in order to enable" to just
"for" or "to" to improve conciseness in the introduction.
Co-authored-by: quinntaylormitchell <q.mitchell@mail.utoronto.ca>
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (5)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (5)
157-162: Surround metadata packet types list with blank lines.
Insert a blank line before the list of metadata packet types:Archives currently support the following metadata packet types, some of which are optional: -* `0x00` - ArchiveInfo -* `0x01` - ArchiveFileInfo -* `0x02` - TimestampDictionary (optional) -* `0x03` - RangeIndex (optional) +* `0x00` - ArchiveInfo +* `0x01` - ArchiveFileInfo +* `0x02` - TimestampDictionary (optional) +* `0x03` - RangeIndex (optional)
214-222: Surround ArchiveFileInfo entries list with blank lines.
Ensure the file entries list is separated from surrounding text by blank lines:The ArchiveFileInfo packet has entries with the following names in order: -* `"/schema_tree"` - the Merged Parse Tree -* `"/schema_ids"` - the Schema Map -* `"/table_metadata"` - metadata describing the contents of the tables segments -* `"/var.dict"` - the Variable Dictionary -* `"/log.dict"` - the Log-type Dictionary -* `"/array.dict"` - the Array Log-type Dictionary (optional) -* `"/0"` - the first tables segment The Array Log-type Dictionary is nominally optional...
7-11: Add terminal punctuation to sentences.
Ensure the introductory sentences end with periods:-This documentation records the details of the single-file archive v0.3.1 format and what it enables +This documentation records the details of the single-file archive v0.3.1 format and what it enables. -With minimal discussion of design rationale. For more information about the design decisions +With minimal discussion of design rationale. For more information about the design decisions.
3-5: Refine introduction phrasing for conciseness.
Consider removing the hyphen in "dynamically-structured" and rephrasing "in order to enable" to simplify the sentence flow:-The `clp-s` single-file archive format is designed to offer high compression and fast search on -dynamically-structured log data such as JSON logs. This format is optimized for streaming reads in -order to enable high performance for archives stored on object storage systems such as [S3][amazon-s3]. +The `clp-s` single-file archive format is designed for high compression and fast search on +dynamically structured log data, such as JSON. It is optimised for streaming reads on +object stores like [S3][amazon-s3].
291-295: Add comma and blank lines around ingestion-unit properties list.
Insert a comma after "By default" and blank lines before/after the list:-By default we automatically store the following properties about each ingestion unit: +By default, we automatically store the following properties about each ingestion unit: -* `"_filename"` - the original filename of the ingestion unit as it was passed to `clp-s` during compression -* `"_file_split_number"` - incremented each time this ingestion-unit is split across another archive -* `"_archive_creator_id"` - UUID associated with a particular _invocation_ of compression + Currently clp-s will store...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[uncategorized] ~3-~3: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...fer high compression and fast search on dynamically-structured log data such as JSON logs. This format...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...
(IN_ORDER_TO_PREMIUM)
[duplication] ~55-~55: Possible typo: you repeated a word.
Context: ...og.dict"] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~153-~153: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~154-~154: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[style] ~167-~167: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[grammar] ~286-~286: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...les, the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~287-~287: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size, and...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~290-~290: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~295-~295: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will store these default properti...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~302-~302: A punctuation mark might be missing here.
Context: ...n in Figure 7. (figure-7)= ::::{card} ```json [ { "s": 0, ...
(AI_EN_LECTOR_MISSING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
15-15: Link fragments should be valid
null
(MD051, link-fragments)
65-65: Link fragments should be valid
null
(MD051, link-fragments)
67-67: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
122-122: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
129-129: Link fragments should be valid
null
(MD051, link-fragments)
134-134: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
134-134: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
145-145: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
158-158: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
166-166: Link fragments should be valid
null
(MD051, link-fragments)
176-176: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
180-180: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
188-188: Link fragments should be valid
null
(MD051, link-fragments)
197-197: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
207-207: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
215-215: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
229-229: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
232-232: Link fragments should be valid
null
(MD051, link-fragments)
236-236: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
236-236: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
265-265: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
292-292: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
300-300: Link fragments should be valid
null
(MD051, link-fragments)
304-304: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
318-318: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (6)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (6)
1-2: Title is clear and descriptive.
12-22: Format overview section is well-organised.
The breakdown ofheader,metadata, andfilessections is clear and easy to follow.
23-28: Note block is informative and correctly formatted.
The little-endian guidance is clear and appropriately emphasised.
30-61: Mermaid diagram appears correctly embedded.
Skipping detailed review of the diagram markup.
108-115: Compressed vs original size explanation is clear.
The examples of ndjson and KV-IR contexts provide helpful clarity.
350-353: Link references are defined correctly.
The[s3-blog],[μSlope], and[amazon-s3]links are valid and formatted consistently.
| (figure-4)= | ||
| ::::{card} | ||
| ```json | ||
| { | ||
| "num_segments": 1 | ||
| } | ||
| ``` | ||
| +++ | ||
| **Figure 4**: Layout of the ArchiveInfo msgpack payload. | ||
| :::: |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround JSON code block with blank lines.
Add blank lines before the opening and after the closing fences for Figure 4:
(figure-4)=
:::::{card}
-```json
+```json
{
"num_segments": 1
}
-```
+```
+++
**Figure 4**: Layout of the ArchiveInfo msgpack payload.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
176-176: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
180-180: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 174 to 183, the JSON code block for Figure 4 is not surrounded by blank
lines. Add a blank line before the opening ```json fence and another blank line
after the closing ``` fence to improve markdown formatting and readability.
| The compression type for an archive indicates the general-purpose compressor used to compress each | ||
| section of the archive and is currently one of: | ||
| * `0x0000` - ZStandard | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround the compression types list with blank lines.
Add blank lines before and after the bullet list for correct markdown rendering:
The compression type for an archive indicates the general-purpose compressor used to compress each
section of the archive and is currently one of:
-* `0x0000` - ZStandard
+* `0x0000` - ZStandard
All "reserved padding" is reserved for use in future versions of the single-file archive format.Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
122-122: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 120 to 123, the bullet list of compression types is not surrounded by
blank lines, which can cause markdown rendering issues. Add a blank line before
the list starts and another blank line after the list ends to ensure proper
markdown formatting.
There was a problem hiding this comment.
Actionable comments posted: 16
♻️ Duplicate comments (2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (2)
107-110: Surround bullet list with blank lines.Insert a blank line before and after the compression types list to satisfy MD032:
The compression type for an archive indicates the general-purpose compressor used to compress each section of the archive and is currently one of: -* `0x0000` - ZStandard +* `0x0000` - ZStandard All `reserved_padding` fields are reserved for use in future versions of the single-file archive format.
3-6: Refine introduction for conciseness and style.Consider tightening the opening and removing unnecessary hyphens and phrases. For example:
-The `clp-s` single-file archive format is designed to offer high compression and fast search on -dynamically-structured log data such as JSON logs. This format is optimized for streaming reads in -order to enable high performance for archives stored on object storage systems such as -[S3][amazon-s3]. +The clp-s single-file archive format is optimised for high compression and fast search on dynamically structured log data (e.g., JSON), and for efficient streaming reads on object storage systems like S3.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[uncategorized] ~3-~3: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...fer high compression and fast search on dynamically-structured log data such as JSON logs. This format...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...
(IN_ORDER_TO_PREMIUM)
[duplication] ~58-~58: Possible typo: you repeated a word.
Context: ...] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~141-~141: Possible missing comma found.
Context: .... Since the size of each packet is well-defined reader implementations can attempt to r...
(AI_HYDRA_LEO_MISSING_COMMA)
[typographical] ~142-~142: Consider adding a comma after ‘Generally’ for more clarity.
Context: ...ply skipping the corresponding content. Generally this metadata section design is intende...
(RB_LY_COMMA)
[style] ~155-~155: Consider a more concise word here.
Context: ... we still record the number of segments in order to offer backwards compatibility if we do ...
(IN_ORDER_TO_PREMIUM)
[grammar] ~275-~275: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...les, the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~276-~276: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size, and...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~279-~279: Did you mean: “By default,”?
Context: ...of the ingestion unit in each archive. By default we automatically store the following pr...
(BY_DEFAULT_COMMA)
[uncategorized] ~285-~285: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will associate these default pr...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~292-~292: A punctuation mark might be missing here.
Context: ...n in Figure 7. (figure-7)= ::::{card} ```json [ { "s": 0, ...
(AI_EN_LECTOR_MISSING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
9-9: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
16-16: Link fragments should be valid
null
(MD051, link-fragments)
20-20: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
68-68: Link fragments should be valid
null
(MD051, link-fragments)
77-77: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
77-77: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
90-90: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
109-109: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
117-117: Link fragments should be valid
null
(MD051, link-fragments)
122-122: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
122-122: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
133-133: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
146-146: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
154-154: Link fragments should be valid
null
(MD051, link-fragments)
164-164: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
168-168: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
176-176: Link fragments should be valid
null
(MD051, link-fragments)
185-185: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
195-195: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
203-203: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
217-217: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
220-220: Link fragments should be valid
null
(MD051, link-fragments)
224-224: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
224-224: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
253-253: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
281-281: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
290-290: Link fragments should be valid
null
(MD051, link-fragments)
294-294: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
308-308: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
313-313: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
317-317: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
| By default we automatically store the following properties about each ingestion unit: | ||
| * `"_filename"` - the original filename of the ingestion unit as it was passed to `clp-s` during |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround ingestion-unit properties list with blank lines.
Insert blank lines before and after the bullet list (lines 281–284) to satisfy MD032 and improve readability.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
281-281: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 280 to 281, add a blank line before and after the bullet list of
ingestion-unit properties (lines 281–284) to comply with MD032 markdown linting
rules and improve readability.
| The TimestampDictionary packet is a binary format that records: | ||
| 1. The key name, corresponding MPT nodes, and range of values for zero or more timestamp columns | ||
| 2. The format and ID of zero or more timestamp patterns used to encode timestamps in the archive | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Format TimestampDictionary description.
- Add a blank line before the numbered list (lines 217–218) to satisfy MD032.
- Define or reference the meaning of “MPT” at first use for reader clarity.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
217-217: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 216 to 219, add a blank line before the numbered list starting at line 217
to comply with markdown rule MD032, and also define or provide a reference for
the abbreviation "MPT" at its first occurrence to improve reader clarity.
There was a problem hiding this comment.
might be good to define MPT, I agree!
There was a problem hiding this comment.
@quinntaylormitchell Thanks for agreeing! Yes, defining "MPT" (Merged Parse Tree) at first use would definitely improve reader clarity, especially since it's a core concept that appears throughout the document.
🐰
(◕‿◕)
/ づ
"Happy to help with docs!"
✏️ Learnings added
Learnt from: quinntaylormitchell
PR: y-scope/clp#961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.465Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| The ArchiveFileInfo packet has entries with the following names in order: | ||
| * `"/schema_tree"` - the Merged Parse Tree |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround ArchiveFileInfo entries list with blank lines.
Insert a blank line between the introductory sentence on line 202 and the list on line 203 to satisfy MD032.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
203-203: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 202 to 203, add a blank line between the sentence "The ArchiveFileInfo
packet has entries with the following names in order:" and the list starting
with "* `/schema_tree` - the Merged Parse Tree" to comply with markdown rule
MD032 for proper spacing.
| Archives currently support the following metadata packet types, some of which are optional: | ||
| * `0x00` - ArchiveInfo | ||
| * `0x01` - ArchiveFileInfo | ||
| * `0x02` - TimestampDictionary (optional) | ||
| * `0x03` - RangeIndex (optional) | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround metadata packet list with blank lines.
Add blank lines before and after the list of packet types to satisfy MD032:
Archives currently support the following metadata packet types, some of which are optional:
-* `0x00` - ArchiveInfo
-* `0x01` - ArchiveFileInfo
-* `0x02` - TimestampDictionary (optional)
-* `0x03` - RangeIndex (optional)
+* `0x00` - ArchiveInfo
+* `0x01` - ArchiveFileInfo
+* `0x02` - TimestampDictionary (optional)
+* `0x03` - RangeIndex (optional)
### ArchiveInfo packet📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Archives currently support the following metadata packet types, some of which are optional: | |
| * `0x00` - ArchiveInfo | |
| * `0x01` - ArchiveFileInfo | |
| * `0x02` - TimestampDictionary (optional) | |
| * `0x03` - RangeIndex (optional) | |
| Archives currently support the following metadata packet types, some of which are optional: | |
| * `0x00` - ArchiveInfo | |
| * `0x01` - ArchiveFileInfo | |
| * `0x02` - TimestampDictionary (optional) | |
| * `0x03` - RangeIndex (optional) | |
| ### ArchiveInfo packet |
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
146-146: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 145 to 150, add a blank line before and after the list of metadata packet
types to comply with markdown rule MD032. This means inserting an empty line
above the line starting with "Archives currently support" and another empty line
after the last list item "* `0x03` - RangeIndex (optional)".
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (7)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (5)
216-219: Add a blank line before the numbered list in TimestampDictionary.Separate the list from its preceding paragraph for proper markdown rendering:
-The TimestampDictionary packet is a binary format that records: +The TimestampDictionary packet is a binary format that records: + 1. The key name, corresponding MPT nodes, and range of values for zero or more timestamp columns 2. The format and ID of zero or more timestamp patterns used to encode timestamps in the archive
107-110: Surround the compression type list with blank lines.Markdown lists should have an empty line before and after for correct rendering:
The compression type for an archive indicates the general-purpose compressor used to compress each section of the archive, and is currently one of: + * `0x0000` - ZStandard + All `reserved_padding` fields are reserved for use in future versions of the single-file archive format.
202-210: Insert a blank line before the ArchiveFileInfo entries list.Markdown lists should be preceded by an empty line for correct formatting:
-The ArchiveFileInfo packet has entries with the following names in order: +The ArchiveFileInfo packet has entries with the following names in order: + * `"/schema_tree"` - the Merged Parse Tree * `"/schema_ids"` - the Schema Map * `"/table_metadata"` - metadata describing the contents... * `"/var.dict"` - the Variable Dictionary * `"/log.dict"` - the Log-type Dictionary * `"/array.dict"` - the Array Log-type Dictionary (optional) * `"/0"` - the first tables segment
3-6: Remove unnecessary hyphen and simplify phrasing in the introduction.The adverb “dynamically-structured” doesn’t need a hyphen, and “in order to enable” can be streamlined to “to” for clarity:
-The `clp-s` single-file archive format is designed to offer high compression and fast search on -dynamically-structured log data such as JSON logs. This format is optimized for streaming reads in -order to enable high performance for archives stored on object storage systems such as -[S3][amazon-s3]. +The `clp-s` single-file archive format is designed to offer high compression and fast search on +dynamically structured log data such as JSON logs. This format is optimized for streaming reads to +deliver high performance for archives stored on object storage systems such as [S3][amazon-s3].
280-284: Surround the ingestion-unit properties list with blank lines.Add empty lines before and after the bullet list to satisfy markdown list rules and improve readability:
-By default, we automatically store the following properties about each ingestion unit: +By default, we automatically store the following properties about each ingestion unit: + * `"_filename"` - the original filename... * `"_file_split_number"` - incremented each time... * `"_archive_creator_id"` - UUID associated... + Currently, `clp-s` will associate these default properties...docs/src/dev-guide/design-clp-structured/index.md (2)
3-4: Remove unnecessary hyphen in “dynamically-structured”.Adverbs ending in “-ly” do not require hyphens:
-The compressed-log-processor-structured (abbreviated `clp-s`, also known as [μSlope][μSlope]) -archive format is a storage format for large chunks of dynamically-structured (e.g., JSON) logs. +The compressed-log-processor-structured (abbreviated `clp-s`, also known as [μSlope][μSlope]) +archive format is a storage format for large chunks of dynamically structured (e.g., JSON) logs.
18-18: Correct the possessive construction for the clp-s format.Simplify “
clp-s' format” to “theclp-sformat”:-All information needed to understand `clp-s`' format. +All information needed to understand the `clp-s` format.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/dev-guide/design-clp-structured/index.md(1 hunks)docs/src/dev-guide/design-clp-structured/single-file-archive-format.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
🪛 LanguageTool
docs/src/dev-guide/design-clp-structured/index.md
[uncategorized] ~4-~4: The hyphen in dynamically-structured is redundant.
Context: ...is a storage format for large chunks of dynamically-structured (e.g., JSON) logs. Compared to the [KV-...
(ADVERB_LY_HYPHEN_FIX)
[duplication] ~15-~15: Possible typo: you repeated a word.
Context: ... :gutter: 2 :::{grid-item-card} 🔗 background Background ^^^ All information needed to understan...
(ENGLISH_WORD_REPEAT_RULE)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
[uncategorized] ~3-~3: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...fer high compression and fast search on dynamically-structured log data such as JSON logs. This format...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
[style] ~4-~4: Consider a more concise word here.
Context: ...format is optimized for streaming reads in order to enable high performance for archives st...
(IN_ORDER_TO_PREMIUM)
[duplication] ~58-~58: Possible typo: you repeated a word.
Context: ...] ad["/array.dict"] t["/0"] end end ::: +++ Figure 1: High-level single...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~155-~155: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ... single segment (i.e. there is only one tables segment file), but we still record the ...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[grammar] ~275-~275: The plural determiner ‘these’ does not agree with the singular noun ‘ingestion’.
Context: ...les, the archive range index associates these ingestion unit properties with a logical range of...
(THIS_NNS)
[style] ~276-~276: Consider a more concise word here.
Context: ...s can be split across multiple archives in order to maintain a configured archive size, and...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~285-~285: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...particular invocation of compression Currently clp-s will associate these default pr...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
🪛 markdownlint-cli2 (0.17.2)
docs/src/dev-guide/design-clp-structured/single-file-archive-format.md
16-16: Link fragments should be valid
null
(MD051, link-fragments)
20-20: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
68-68: Link fragments should be valid
null
(MD051, link-fragments)
77-77: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
77-77: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
90-90: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
109-109: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
117-117: Link fragments should be valid
null
(MD051, link-fragments)
122-122: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
122-122: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
133-133: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
146-146: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
154-154: Link fragments should be valid
null
(MD051, link-fragments)
164-164: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
168-168: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
176-176: Link fragments should be valid
null
(MD051, link-fragments)
185-185: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
195-195: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
203-203: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
217-217: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
220-220: Link fragments should be valid
null
(MD051, link-fragments)
224-224: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
224-224: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
253-253: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
281-281: Lists should be surrounded by blank lines
null
(MD032, blanks-around-lists)
290-290: Link fragments should be valid
null
(MD051, link-fragments)
294-294: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
308-308: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
| ```json | ||
| [ | ||
| { | ||
| "s": 0, | ||
| "e": 100, | ||
| "f": { | ||
| "_filename": "/my/file.jsonl", | ||
| "_file_split_number": 0, | ||
| "_archive_creator_id": " 03f2958a-7a2e-448c-a203-60f2cc990d74", | ||
| "arbitrary_user_property": "..." | ||
| } | ||
| }, | ||
| "..." | ||
| ] | ||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Add blank lines around the RangeIndex JSON code block.
Ensure the fenced snippet is separated by an empty line before and after:
::::{card}
-
```json
[
{
"s": 0,
"e": 100,
"f": {
"_filename": "/my/file.jsonl",
"_file_split_number": 0,
"_archive_creator_id": "03f2958a-7a2e-448c-a203-60f2cc990d74",
"arbitrary_user_property": "..."
}
},
"..."
]+++
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>
294-294: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
---
308-308: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 294 to 308, add a blank line before and after the fenced JSON code block
to ensure proper separation from surrounding text. This means inserting an empty
line immediately before the opening json and another empty line immediately after the closing .
</details>
<!-- This is an auto-generated comment by CodeRabbit -->
| ```json | ||
| { | ||
| "num_segments": 1 | ||
| } | ||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Add blank lines around the ArchiveInfo JSON snippet.
Ensure there is an empty line before and after the fenced code block:
(figure-4)=
:::::{card}
-
```json
{
"num_segments": 1
}+++
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>
164-164: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
---
168-168: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 164 to 168, add a blank line before and after the fenced JSON code block
containing the ArchiveInfo snippet to improve readability and formatting
consistency in the markdown file.
</details>
<!-- This is an auto-generated comment by CodeRabbit -->
| ```json | ||
| { | ||
| "files": [ | ||
| { | ||
| "n": "/schema_tree", | ||
| "o": 0 | ||
| }, | ||
| "..." | ||
| ] | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Surround the ArchiveFileInfo JSON example with blank lines.
Add spacing before and after the code fences for improved readability:
:::::{card}
-
```json
{
"files": [
{
"n": "/schema_tree",
"o": 0
},
"..."
]
}+++
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>
185-185: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 185 to 194, the JSON code example for ArchiveFileInfo lacks blank lines
before and after the code fences. Add a blank line before the opening json fence and another blank line after the closing fence to improve readability
and formatting consistency in the markdown file.
</details>
<!-- This is an auto-generated comment by CodeRabbit -->
| ``` | ||
| Header { | ||
| magic_number: uint8_t[4] = {0xFD, 0x2F, 0xC5, 0x30} | ||
| patch_version: uint16_t | ||
| minor_version: uint8_t | ||
| major_version: uint8_t | ||
| original_uncompressed_size: uint64_t | ||
| compressed_archive_size: uint64_t | ||
| reserved_padding1: uint64_t[4] | ||
| metadata_section_size: uint32_t | ||
| compression_type: uint16_t | ||
| reserved_padding2: uint16_t | ||
| } | ||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Specify language and add blank lines around the header code block.
Improving readability and syntax highlighting by declaring cpp and surrounding the fences with blank lines:
-```
+```cpp
Header {
magic_number: uint8_t[4] = {0xFD, 0x2F, 0xC5, 0x30}
patch_version: uint16_t
minor_version: uint8_t
major_version: uint8_t
original_uncompressed_size: uint64_t
compressed_archive_size: uint64_t
reserved_padding1: uint64_t[4]
metadata_section_size: uint32_t
compression_type: uint16_t
reserved_padding2: uint16_t
}
-```
+```
+++🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
77-77: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
77-77: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
90-90: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 77 to 90, the code block showing the Header structure lacks language
specification and surrounding blank lines. To fix this, add a blank line before
and after the triple backticks and specify the language as cpp immediately after
the opening backticks to enable syntax highlighting and improve readability.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| ``` | ||
| enum EncodingType : uint64_t { | ||
| Epoch = 0x1, | ||
| DoubleEpoch = 0x2 | ||
| } | ||
|
|
||
| TimestampRange { | ||
| key_len: uint64_t | ||
| key: char[key_len] | ||
| num_column_ids: uint64_t | ||
| column_ids: int32_t[num_column_ids] | ||
| encoding_type: EncodingType | ||
| epoch_start: union {epochtime_t, double} | ||
| epoch_end: union {epochtime_t, double} | ||
| } | ||
|
|
||
| TimestampPattern { | ||
| pattern_id: uint64_t | ||
| pattern_len: uint64_t | ||
| pattern: char[pattern_len] | ||
| } | ||
|
|
||
| TimestampDictionary { | ||
| num_ranges: uint64_t | ||
| ranges: TimestampRange[num_entries] | ||
| num_patterns: uint64_t | ||
| patterns: TimestampPattern[num_patterns] | ||
| } | ||
|
|
||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Specify language and add blank lines for the TimestampDictionary code block.
Use cpp for syntax highlighting and ensure blank lines around the fences:
-```
+```cpp
enum EncodingType : uint64_t {
Epoch = 0x1,
DoubleEpoch = 0x2
}
...
TimestampDictionary {
num_ranges: uint64_t
ranges: TimestampRange[num_entries]
num_patterns: uint64_t
patterns: TimestampPattern[num_patterns]
}
-```
+```
+++🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
224-224: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
224-224: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
253-253: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 224 to 253, the code block defining EncodingType, TimestampRange,
TimestampPattern, and TimestampDictionary lacks a language specifier and blank
lines around the triple backticks. Add "cpp" immediately after the opening
triple backticks for syntax highlighting and ensure there is a blank line before
the opening and after the closing triple backticks to improve readability and
formatting.
| ``` | ||
| MetadataPacket { | ||
| packet_type: uint8_t | ||
| packet_size: uint32_t | ||
| packet_content: uint8_t[packet_size] | ||
| } | ||
|
|
||
| MetadataPacketStream { | ||
| num_packets: uint8_t | ||
| packets: MetadataPacket[num_packets] | ||
| } | ||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Add language specifier and blank lines for the metadata packet code block.
Use cpp for syntax highlighting and satisfy lint rules by spacing around the fenced block:
-```
+```cpp
MetadataPacket {
packet_type: uint8_t
packet_size: uint32_t
packet_content: uint8_t[packet_size]
}
MetadataPacketStream {
num_packets: uint8_t
packets: MetadataPacket[num_packets]
}
-```
+```
+++🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
122-122: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
122-122: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
133-133: Fenced code blocks should be surrounded by blank lines
null
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
In docs/src/dev-guide/design-clp-structured/single-file-archive-format.md around
lines 122 to 133, the fenced code block for MetadataPacket lacks a language
specifier and proper spacing. Add "cpp" immediately after the opening triple
backticks for syntax highlighting and insert a blank line after the opening and
before the closing triple backticks to satisfy lint rules.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
quinntaylormitchell
left a comment
There was a problem hiding this comment.
looks good Devin! figure 2 on the "Single-file archive format" page looks a lot better. most of the rabbit comments are about leaving blank lines for formatting, so I will leave that to you. other than that I think it's ready to go!
| ``` | ||
| Header { | ||
| magic_number: uint8_t[4] = {0xFD, 0x2F, 0xC5, 0x30} | ||
| patch_version: uint16_t | ||
| minor_version: uint8_t | ||
| major_version: uint8_t | ||
| original_uncompressed_size: uint64_t | ||
| compressed_archive_size: uint64_t | ||
| reserved_padding1: uint64_t[4] | ||
| metadata_section_size: uint32_t | ||
| compression_type: uint16_t | ||
| reserved_padding2: uint16_t | ||
| } | ||
| ``` |
| ``` | ||
| MetadataPacket { | ||
| packet_type: uint8_t | ||
| packet_size: uint32_t | ||
| packet_content: uint8_t[packet_size] | ||
| } | ||
|
|
||
| MetadataPacketStream { | ||
| num_packets: uint8_t | ||
| packets: MetadataPacket[num_packets] | ||
| } | ||
| ``` |
| The TimestampDictionary packet is a binary format that records: | ||
| 1. The key name, corresponding MPT nodes, and range of values for zero or more timestamp columns | ||
| 2. The format and ID of zero or more timestamp patterns used to encode timestamps in the archive | ||
|
|
There was a problem hiding this comment.
might be good to define MPT, I agree!
|
Yeah I did sneak in an expansion of the MPT acronym on first use, but I think for this doc I'll leave explanation for the MPT-specific section later in the doc. For the extra whitespace I decided to ignore rabbit on that one since the |
|
|
||
| TimestampDictionary { | ||
| num_ranges: uint64_t | ||
| ranges: TimestampRange[num_entries] |
There was a problem hiding this comment.
| ranges: TimestampRange[num_entries] | |
| ranges: TimestampRange[num_ranges] |
Description
This PR begins adding developer design documentation for clp-s. It includes a very basic overview, a completely stubbed out background, and a halfway complete doc on the clp-s single-file archive format.
The documentation on the SFA format covers the archive format from the header through every metadata packet type. Detail on the "files" section will be added in a future PR.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit