docs(log-ingestor): Add user guide for using log-ingestor.#1789

Merged
quinntaylormitchell merged 28 commits into
y-scope:mainfrom
LinZhihao-723:log-ingestor-user-doc
Dec 19, 2025

@LinZhihao-723 LinZhihao-723 commented Dec 16, 2025

Member

Description

This PR adds a user guide for using log-ingestor.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.
  • Ensure the doc can be properly rendered.

Summary by CodeRabbit

  • Documentation
    • Added a new user guide for the log-ingestor component covering setup, configuration, continuous ingestion from S3, job buffering for efficient compression, REST API endpoints with Swagger UI, and operational considerations and limitations.
    • Added a new tile in the Guides overview and updated the Guides index to include the new log-ingestor guide.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner December 16, 2025 22:38
coderabbitai Bot commented Dec 16, 2025

Contributor

Walkthrough

Added a new user guide for the log-ingestor component and integrated it into the documentation: a new guide file describing configuration, ingestion behaviours, APIs and continuous ingestion modes, plus a new overview card and a Guides toctree entry linking to it.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Overview & site index (docs/src/user-docs/guides-overview.md, docs/src/user-docs/index.md) | Inserted a new grid-item-card titled "Using `log-ingestor`" into the guides overview and added guides-using-log-ingestor to the Guides toctree in the index. |
| New guide content (docs/src/user-docs/guides-using-log-ingestor.md) | Added a comprehensive guide for log-ingestor describing startup behaviour, log_ingestor settings, ingestion jobs with buffering/compression, REST APIs and Swagger UI, continuous ingestion via S3 scanner or SQS listener, assumptions, limitations, and external references. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Check Markdown formatting and style consistency with existing docs.
  • Verify that the internal link guides-using-log-ingestor resolves in the site build.
  • Validate external links (AWS S3 events, Swagger UI) and API reference snippets for accuracy.
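
A minimal sketch of the internal-link check listed above, assuming each Guides toctree entry should map to a Markdown file of the same name in the docs source directory. The helper name `unresolved_toctree_entries` and the flat directory layout are illustrative assumptions, not part of the CLP docs tooling; the real site build may resolve links differently.

```python
from pathlib import Path
import tempfile


def unresolved_toctree_entries(entries, docs_dir):
    """Return the toctree entries that have no matching `<entry>.md` file.

    Hypothetical helper: assumes a flat directory of Markdown sources.
    """
    docs_dir = Path(docs_dir)
    return [entry for entry in entries if not (docs_dir / f"{entry}.md").is_file()]


# Usage: build a throwaway docs directory and check two entries against it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "guides-using-log-ingestor.md").write_text("# Using log-ingestor\n")
    missing = unresolved_toctree_entries(
        ["guides-using-log-ingestor", "guides-retention"], tmp
    )
    print(missing)
```

An empty result would suggest every entry has a source file; it does not prove that the rendered link works.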

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding documentation for the log-ingestor component user guide. |

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a8beab and 0c32565.

📒 Files selected for processing (4)
  • docs/src/user-docs/guides-overview.md (1 hunks)
  • docs/src/user-docs/guides-using-log-ingestor.md (1 hunks)
  • docs/src/user-docs/index.md (1 hunks)
  • taskfiles/docs.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • docs/src/user-docs/guides-overview.md
🪛 LanguageTool
docs/src/user-docs/guides-using-log-ingestor.md

[uncategorized] ~9-~9: Possible missing comma found.
Context: ...owing capabilities are not yet supported but are planned for future releases: * One...

(AI_HYDRA_LEO_MISSING_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: rust-checks
🔇 Additional comments (13)
taskfiles/docs.yaml (2)

57-57: LGTM!

The task call is correctly positioned in the build sequence after api-server-openapi-json and before the checksum computation.


130-141: LGTM!

The task definition correctly mirrors the structure of api-server-openapi-json and appropriately generates the OpenAPI JSON file for log-ingestor documentation.

docs/src/user-docs/index.md (1)

69-69: LGTM!

The new guide entry is correctly added to the Guides toctree and properly positioned within the navigation structure.

docs/src/user-docs/guides-overview.md (1)

43-48: LGTM!

The new grid card is properly structured and provides a clear, concise description of the log-ingestor guide's purpose.

docs/src/user-docs/guides-using-log-ingestor.md (9)

1-5: LGTM!

The introduction clearly establishes the purpose of the log-ingestor component and sets appropriate expectations for the guide.


7-14: LGTM!

The note appropriately sets expectations about current limitations and planned features, helping users understand the component's scope.


16-20: LGTM!

The configuration instructions are clear and provide the essential information for users to start the log-ingestor component.


22-27: LGTM!

The explanation of ingestion jobs effectively communicates the concept and its benefits for compression efficiency.


29-37: LGTM!

The API orchestration section clearly explains how to interact with log-ingestor and appropriately notes the current direct-access requirement.


39-49: LGTM!

The fault tolerance warning is appropriately emphasised and clearly communicates the current limitations and their implications for users.


51-59: LGTM!

The introduction to continuous ingestion effectively previews the two available ingestion methods and establishes a clear structure for the following sections.


61-78: LGTM!

The S3 scanner section provides a thorough explanation with clear examples, and the important block appropriately highlights critical assumptions for correct operation.


112-117: LGTM!

The reference links are properly formatted and provide appropriate connections to external AWS documentation and internal CLP documentation.

Comment thread docs/src/user-docs/guides-using-log-ingestor.md Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0c32565 and 6c2e2e1.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-log-ingestor.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-log-ingestor.md

3-3: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


73-73: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

🔇 Additional comments (1)
docs/src/user-docs/guides-using-log-ingestor.md (1)

107-110: Verify resolution of past review comment.

A previous review flagged a redundant "currently" in this section. The current version correctly reads "Currently, the SQS listener does not support custom S3 endpoint configurations" without duplication. Please confirm this reflects your intended fix.

Comment thread docs/src/user-docs/guides-using-log-ingestor.md Outdated
Comment thread docs/src/user-docs/guides-using-log-ingestor.md Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-log-ingestor.md (2)

3-3: Remove trailing whitespace.

Lines contain trailing spaces that should be removed per markdownlint rules.

Apply this diff to remove trailing spaces:

-`clp-json` includes a component called `log-ingestor` that enables users to ingest logs into CLP 
+`clp-json` includes a component called `log-ingestor` that enables users to ingest logs into CLP

and

-  example, objects with keys `log1` and `log2` will be ingested sequentially. If a new object with 
+  example, objects with keys `log1` and `log2` will be ingested sequentially. If a new object with

Also applies to: 73-73
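
As a rough illustration of the MD009 finding above ("Expected: 0 or 2; Actual: 1"), the following sketch flags lines whose trailing-space count is neither 0 nor 2 and then strips them. The function names are hypothetical, and the logic only approximates the rule as reported in this review; it is not markdownlint's implementation.

```python
def md009_violations(text, br_spaces=2):
    """Return (line_number, trailing_space_count) pairs for offending lines.

    A line is flagged when it ends in spaces and the count is not exactly
    `br_spaces` (assumed here to be 2, matching the message in the review).
    """
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        trailing = len(line) - len(line.rstrip(" "))
        if trailing not in (0, br_spaces):
            violations.append((number, trailing))
    return violations


def strip_trailing_spaces(text):
    """Remove trailing spaces from every line, keeping a final newline if present."""
    stripped = "\n".join(line.rstrip(" ") for line in text.splitlines())
    return stripped + ("\n" if text.endswith("\n") else "")


sample = "ingest logs into CLP \nnext line\nhard break  \n"
print(md009_violations(sample))
print(repr(strip_trailing_spaces(sample)))
```

Note that `strip_trailing_spaces` also removes intentional two-space hard breaks, which matches the diff suggested in this review but may not suit every document.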


31-31: Fix typo in API terminology.

Line 31 contains a typo: "RESTfuls APIs" should be "RESTful APIs" (the adjective form is not pluralized).

Apply this diff:

-`log-ingestor` exposes **RESTfuls APIs** for ingestion job orchestrations. You can explore all
+`log-ingestor` exposes **RESTful APIs** for ingestion job orchestrations. You can explore all
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6c2e2e1 and bc109fe.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-log-ingestor.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.

Applied to files:

  • docs/src/user-docs/guides-using-log-ingestor.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (macos-15)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: check-generated

Comment thread docs/src/user-docs/guides-using-log-ingestor.md Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between bc109fe and 89b3d5b.

📒 Files selected for processing (2)
  • docs/src/user-docs/guides-overview.md (1 hunks)
  • docs/src/user-docs/guides-using-log-ingestor.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-log-ingestor.md

88-88: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: check-generated
🔇 Additional comments (2)
docs/src/user-docs/guides-using-log-ingestor.md (1)

1-126: Documentation content and structure are well-executed.

The guide effectively covers log-ingestor usage with clear sections on starting the component, managing ingestion jobs, and configuring continuous S3 ingestion. The technical content is accurate, and the use of admonitions (note, warning, important) appropriately guides users through assumptions, limitations, and planned features. Reference links are properly formatted and point to relevant API documentation and external resources.

docs/src/user-docs/guides-overview.md (1)

43-49: New guide card is properly integrated.

The new card for log-ingestor guide follows the established structure and formatting of existing cards, with an accurate link target, clear title, and descriptive text. The placement after "Using the API server" maintains a logical flow within the guides section.

Comment thread docs/src/user-docs/guides-using-log-ingestor.md Outdated
Comment thread docs/src/user-docs/guides-overview.md Outdated
Comment thread docs/src/user-docs/index.md Outdated
guides-external-database
guides-multi-host
guides-retention
guides-using-log-ingestor

Member


How about putting this below the API server guide?

Collaborator


agreed

also just noticed that the order of the card links in guides-overview.md doesn't match the sidebar, and some of the sidebar items don't have card links. making a note to myself to put up a PR to fix that once this is merged.

Member


Perhaps ask coderabbit to open an issue?

@kirkrodrigues kirkrodrigues changed the title docs(log-ingestor): Add user guide for using log-ingestor. docs(log-ingestor): Add user guide for using log-ingestor. Dec 19, 2025
@quinntaylormitchell quinntaylormitchell merged commit 80757cd into y-scope:main Dec 19, 2025
8 checks passed
davidlion pushed a commit to davidlion/clp that referenced this pull request Jan 17, 2026
…#1789)

Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca>
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…#1789)

Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca>
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…#1789)

Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca>

3 participants