docs(log-ingestor): Add user guide for using log-ingestor. #1789
…23/clp into log-ingestor-api-doc
Walkthrough
Added a new user guide for the log-ingestor component and integrated it into the documentation: a new guide file describing configuration, ingestion behaviours, APIs and continuous ingestion modes, plus a new overview card and a Guides toctree entry linking to it.
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
docs/src/user-docs/guides-overview.md (1 hunks)
docs/src/user-docs/guides-using-log-ingestor.md (1 hunks)
docs/src/user-docs/index.md (1 hunks)
taskfiles/docs.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
docs/src/user-docs/guides-overview.md
🪛 LanguageTool
docs/src/user-docs/guides-using-log-ingestor.md
[uncategorized] ~9-~9: Possible missing comma found.
Context: ...owing capabilities are not yet supported but are planned for future releases: * One...
(AI_HYDRA_LEO_MISSING_COMMA)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: rust-checks
🔇 Additional comments (13)
taskfiles/docs.yaml (2)
57-57: LGTM! The task call is correctly positioned in the build sequence after `api-server-openapi-json` and before the checksum computation.
130-141: LGTM! The task definition correctly mirrors the structure of `api-server-openapi-json` and appropriately generates the OpenAPI JSON file for log-ingestor documentation.
docs/src/user-docs/index.md (1)
69-69: LGTM! The new guide entry is correctly added to the Guides toctree and properly positioned within the navigation structure.
docs/src/user-docs/guides-overview.md (1)
43-48: LGTM! The new grid card is properly structured and provides a clear, concise description of the log-ingestor guide's purpose.
docs/src/user-docs/guides-using-log-ingestor.md (9)
1-5: LGTM! The introduction clearly establishes the purpose of the log-ingestor component and sets appropriate expectations for the guide.
7-14: LGTM! The note appropriately sets expectations about current limitations and planned features, helping users understand the component's scope.
16-20: LGTM! The configuration instructions are clear and provide the essential information for users to start the log-ingestor component.
22-27: LGTM! The explanation of ingestion jobs effectively communicates the concept and its benefits for compression efficiency.
29-37: LGTM! The API orchestration section clearly explains how to interact with log-ingestor and appropriately notes the current direct-access requirement.
39-49: LGTM! The fault tolerance warning is appropriately emphasised and clearly communicates the current limitations and their implications for users.
51-59: LGTM! The introduction to continuous ingestion effectively previews the two available ingestion methods and establishes a clear structure for the following sections.
61-78: LGTM! The S3 scanner section provides a thorough explanation with clear examples, and the important block appropriately highlights critical assumptions for correct operation.
112-117: LGTM! The reference links are properly formatted and provide appropriate connections to external AWS documentation and internal CLP documentation.
…723/clp into log-ingestor-user-doc
Actionable comments posted: 2
📜 Review details
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-log-ingestor.md(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-log-ingestor.md
3-3: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
73-73: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
🔇 Additional comments (1)
docs/src/user-docs/guides-using-log-ingestor.md (1)
107-110: Verify resolution of past review comment. A previous review flagged a redundant "currently" in this section. The current version correctly reads "Currently, the SQS listener does not support custom S3 endpoint configurations" without duplication. Please confirm this reflects your intended fix.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-log-ingestor.md (2)
3-3: Remove trailing whitespace. Lines contain trailing spaces that should be removed per markdownlint rules.
Apply this diff to remove trailing spaces:
-`clp-json` includes a component called `log-ingestor` that enables users to ingest logs into CLP
+`clp-json` includes a component called `log-ingestor` that enables users to ingest logs into CLP
and
- example, objects with keys `log1` and `log2` will be ingested sequentially. If a new object with
+ example, objects with keys `log1` and `log2` will be ingested sequentially. If a new object with
Also applies to: 73-73
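The MD009 finding above can be reproduced locally with a small check. The following is a minimal Python sketch, assuming the same allowed counts the report states ("Expected: 0 or 2"). The function names `find_trailing_space_violations` and `strip_trailing_spaces` are hypothetical helpers for illustration; they are not part of markdownlint or the CLP repository, and the real rule handles more cases (for example, code blocks and its strict mode) than this sketch does.

```python
def find_trailing_space_violations(text, allowed=(0, 2)):
    """Return (line_number, trailing_space_count) for each offending line.

    A line is flagged when its number of trailing spaces is not one of the
    allowed counts (0, or 2 for a Markdown hard line break).
    """
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        trailing = len(line) - len(line.rstrip(" "))
        if trailing not in allowed:
            violations.append((lineno, trailing))
    return violations


def strip_trailing_spaces(text):
    """Remove all trailing spaces from every line, keeping a final newline if present."""
    cleaned = "\n".join(line.rstrip(" ") for line in text.splitlines())
    return cleaned + "\n" if text.endswith("\n") else cleaned


if __name__ == "__main__":
    sample = "ingest logs into CLP \nhard break  \nclean line\n"
    for lineno, count in find_trailing_space_violations(sample):
        print(f"{lineno}: Trailing spaces (Expected: 0 or 2; Actual: {count})")
```

Note that `strip_trailing_spaces` also removes intentional two-space hard breaks, so it is only appropriate for files that do not rely on them.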
31-31: Fix typo in API terminology. Line 31 contains a typo: "RESTfuls APIs" should be "RESTful APIs" (the adjective form is not pluralized).
Apply this diff:
-`log-ingestor` exposes **RESTfuls APIs** for ingestion job orchestrations. You can explore all
+`log-ingestor` exposes **RESTful APIs** for ingestion job orchestrations. You can explore all
📜 Review details
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-log-ingestor.md(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.
Applied to files:
docs/src/user-docs/guides-using-log-ingestor.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (macos-15)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: check-generated
Actionable comments posted: 1
📜 Review details
📒 Files selected for processing (2)
docs/src/user-docs/guides-overview.md (1 hunks)
docs/src/user-docs/guides-using-log-ingestor.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-log-ingestor.md
88-88: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: check-generated
🔇 Additional comments (2)
docs/src/user-docs/guides-using-log-ingestor.md (1)
1-126: Documentation content and structure are well-executed. The guide effectively covers log-ingestor usage with clear sections on starting the component, managing ingestion jobs, and configuring continuous S3 ingestion. The technical content is accurate, and the use of admonitions (note, warning, important) appropriately guides users through assumptions, limitations, and planned features. Reference links are properly formatted and point to relevant API documentation and external resources.
docs/src/user-docs/guides-overview.md (1)
43-49: New guide card is properly integrated. The new card for log-ingestor guide follows the established structure and formatting of existing cards, with an accurate link target, clear title, and descriptive text. The placement after "Using the API server" maintains a logical flow within the guides section.
guides-external-database
guides-multi-host
guides-retention
guides-using-log-ingestor
How about putting this below the API server guide?
agreed
also just noticed that the order of the card links in guides-overview.md doesn't match the sidebar, and some of the sidebar items don't have card links. making a note to myself to put up a PR to fix that once this is merged.
Perhaps ask coderabbit to open an issue?
log-ingestor.
…#1789) Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca>
Description
This PR adds a user guide for using log-ingestor.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit