docs(clp-package): Add instructions for ingesting logs from non-AWS S3-compatible object storage.#1796

Merged
hoophalab merged 7 commits into
y-scope:mainfrom
hoophalab:s3doc
Jan 7, 2026

@hoophalab hoophalab commented Dec 18, 2025

Contributor

Description

Add instructions for ingesting logs from S3-compatible object storage.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

`task docs:serve` serves the docs correctly.

Summary by CodeRabbit

  • Documentation
    • Added guidance on accepting HTTP-style endpoints for S3-compatible object storage (object and key-prefix modes), with examples showing the host:port, bucket, and key formats.
    • Clarified CLP will auto-detect appropriate endpoint URLs during compression and reiterated required bucket policy considerations.
    • Noted Web UI currently cannot display extracted streams from custom S3 endpoints; fix planned for a future release.


@hoophalab hoophalab requested a review from a team as a code owner December 18, 2025 05:13

coderabbitai Bot commented Dec 18, 2025

Contributor

Walkthrough

Adds documentation clarifications for S3-compatible endpoints: accepts HTTP-style object and key-prefix URLs for s3-object and s3-key-prefix modes; states CLP auto-determines endpoint URL during compression for non-AWS endpoints; notes the Web UI cannot yet view extracted streams from custom S3 endpoints.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Object storage usage guide: `docs/src/user-docs/guides-using-object-storage/clp-usage.md` | Adds acceptance of HTTP S3-compatible object URLs (e.g., `http://<host>:<port>/<bucket-name>/<object-key>`) and HTTP key-prefix URL examples for s3-key-prefix mode; clarifies accepted URL formats only. |
| Object storage configuration guide: `docs/src/user-docs/guides-using-object-storage/object-storage-config.md` | Adds guidance to ensure equivalent bucket access policies for S3-compatible endpoints; documents that CLP will auto-determine the endpoint URL during compression; adds note that the Web UI currently cannot view extracted streams stored on custom S3 endpoints. |
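
To make the path-style URL format in the summary above concrete, here is a minimal Python sketch of how such a URL could be split into an endpoint, a bucket name, and an object key (or key prefix). It is illustrative only and is not CLP's actual parsing code; the function name `split_path_style_url` and the `localhost:9000` endpoint in the usage note are assumptions made for this example.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_path_style_url(url: str) -> Tuple[str, str, str]:
    """Split a path-style URL into (endpoint_url, bucket, key_or_prefix).

    Hypothetical helper for illustration; not part of CLP.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an HTTP(S) URL: {url!r}")

    # The path looks like "/<bucket-name>/<object-key>". Only the first
    # segment is the bucket; the key itself may contain further slashes.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"No bucket name in URL: {url!r}")

    # The endpoint is the scheme plus host:port, without any path.
    endpoint_url = f"{parsed.scheme}://{parsed.netloc}"
    return endpoint_url, bucket, key
```

For example, `split_path_style_url("http://localhost:9000/logs/app/2025.log")` returns `("http://localhost:9000", "logs", "app/2025.log")`, and a key-prefix URL such as `http://localhost:9000/logs/app/` yields the prefix `app/`.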

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5–10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary change: adding documentation for ingesting logs from S3-compatible object storage, which aligns with the modifications to the two documentation files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 194c34d and 963c0b1.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2024-12-12T19:20:59.778Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 634
File: components/clp-py-utils/clp_py_utils/s3_utils.py:11-16
Timestamp: 2024-12-12T19:20:59.778Z
Learning: S3 roles provided may not have permission to perform `head_bucket` and `delete_object` operations; verification logic should avoid using these methods.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2025-10-13T03:24:35.074Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: check-generated
🔇 Additional comments (2)
docs/src/user-docs/guides-using-object-storage/object-storage-config.md (2)

4-6: Guidance aligns well with S3-compatible endpoint support.

The addition appropriately addresses non-AWS S3-compatible endpoints, following the suggestion from the previous review to mention equivalent bucket access policies. The statement about automatic endpoint URL determination during compression is consistent with the feature scope (as noted in PR #1767 learnings where region_code became optional for non-AWS endpoints).

Based on learnings from PR #1767, this documentation correctly reflects that custom S3-compatible endpoints (MinIO, LocalStack, etc.) support path-style URLs and don't require region codes, allowing CLP to determine the endpoint automatically.
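
The region-code rule described here can be sketched as a small validation class. This is a hypothetical illustration, not the real `S3Config` in `clp_config.py`: the class name `S3ConfigSketch`, the `bucket` field, and the assumption that a missing `endpoint_url` means AWS S3 are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class S3ConfigSketch:
    """Hypothetical stand-in for an S3 config; not CLP's actual class."""

    bucket: str
    region_code: Optional[str] = None
    endpoint_url: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.bucket:
            raise ValueError("bucket must be non-empty")
        # region_code may be omitted, but must not be an empty string.
        if self.region_code is not None and not self.region_code:
            raise ValueError("region_code must be non-empty when set")
        # Assumption: no custom endpoint means AWS S3, which needs a region.
        if self.endpoint_url is None and self.region_code is None:
            raise ValueError("region_code is required when endpoint_url is not set")
```

Under these assumptions, `S3ConfigSketch(bucket="logs", endpoint_url="http://localhost:9000")` is accepted without a region code, while `S3ConfigSketch(bucket="logs")` raises a `ValueError`.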


100-103: Web UI limitation note is well-placed and addresses prior feedback.

The note block appropriately warns users that the Web UI doesn't yet support viewing extracted streams from custom S3 endpoints, as requested in the previous review. This communicates the limitation transparently while indicating it will be addressed in a future release, allowing users to proceed with custom S3 endpoints for non-Web-UI workflows.



@hoophalab hoophalab requested a review from junhaoliao December 18, 2025 05:13

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9650b8c and c6cf9a3.

📒 Files selected for processing (2)
  • docs/src/user-docs/guides-using-object-storage/clp-usage.md (2 hunks)
  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/clp-usage.md
  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
🪛 LanguageTool
docs/src/user-docs/guides-using-object-storage/clp-usage.md

[style] ~34-~34: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...

(MISSING_IT_THERE)


[style] ~89-~89: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...

(MISSING_IT_THERE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: check-generated
🔇 Additional comments (1)
docs/src/user-docs/guides-using-object-storage/object-storage-config.md (1)

3-6: Documentation additions properly positioned and consistent with S3-compatible endpoint support.

The clarification about S3-compatible storages using equivalent bucket policies and automatic endpoint URL determination during compression is well-placed and aligns with the custom S3 endpoint support added in PR #1767 (per learnings, where region_code is now optional for non-AWS endpoints).

Comment on lines +34 to +35
* `<object-url>` can also be an object stored in an S3-compatible storage:
* `http://<host>:<port>/<bucket-name>/<object-key>`

Contributor

⚠️ Potential issue | 🟡 Minor

Fix incomplete sentence structure on line 34.

The bullet point should form a complete sentence. The current phrasing "* <object-url> can also be an object stored..." lacks proper sentence structure.

🔎 Suggested fix:
- * `<object-url>` can also be an object stored in an S3-compatible storage:
+ * `<object-url>` can also point to an object stored in an S3-compatible storage:
📝 Committable suggestion

Suggested change
- * `<object-url>` can also be an object stored in an S3-compatible storage:
- * `http://<host>:<port>/<bucket-name>/<object-key>`
+ * `<object-url>` can also point to an object stored in an S3-compatible storage:
+ * `http://<host>:<port>/<bucket-name>/<object-key>`
🧰 Tools
🪛 LanguageTool

[style] ~34-~34: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...

(MISSING_IT_THERE)

🤖 Prompt for AI Agents
In docs/src/user-docs/guides-using-object-storage/clp-usage.md around lines
34–35, the bullet is a fragment; rewrite it as a complete sentence that states
that the object URL can reference an object stored in S3-compatible storage and
include the example URL format (host:port/bucket-name/object-key) inline so the
bullet reads as a full, self-contained sentence.

Comment thread docs/src/user-docs/guides-using-object-storage/clp-usage.md Outdated

Member

unrelated to the current PR, but i found we haven't updated the webui code to support custom s3 endpoints?

Member

though technically we haven't been mentioning that the custom endpoint feature can be used for streams... lol. so long as we're explicit that the custom endpoint works only for log inputs, i think it's fine to skip the webui support for now

anyways, i created an issue to track: #1797

@hoophalab hoophalab Dec 18, 2025

Contributor Author

Technically, we don't officially support storing archives and streams on S3-compatible storage. The PR title and feature request were "Add support for ingesting logs from S3 compatible endpoints".

That said, users can actually add an "endpoint_url: ..." field in clp-config.yaml to enable this feature. I did a very quick test, and the only issue I found was that extracting streams didn't work in the webui.

Member

sure. let's mention the temporary webui limitation in docs/src/user-docs/guides-using-object-storage/object-storage-config.md if we don't plan to add the webui config in the coming release

@hoophalab hoophalab changed the title from "docs(clp-package): Add instructions for using S3-compatible endpoints." to "docs(clp-package): Add instructions for ingesting logs from S3-compatible object storage." Dec 18, 2025
@hoophalab hoophalab requested a review from junhaoliao December 18, 2025 18:53
@hoophalab

Contributor Author

@junhaoliao updated based on our offline discussion

  stream files but not CLP archives; thus, to view the compressed logs from a CLP archive, CLP first
  converts the compressed logs into stream files. These streams can be cached on the filesystem, or on
- object storage.
+ object storage. Currently, the Web UI does not support viewing extracted streams stored in custom S3

Member

Personally, i would add a :::{note} or :::{warning} for things that require the user's attention.

Anyways, since the release timeline is slightly pushed back, we may submit another PR to add such support in the webui directly. Then we don't have to add this warning

Contributor Author

I still keep the warning. We can remove it in the PR that implements the feature.

Comment thread docs/src/user-docs/guides-using-object-storage/object-storage-config.md Outdated
@hoophalab hoophalab requested a review from junhaoliao January 6, 2026 18:54

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @docs/src/user-docs/guides-using-object-storage/object-storage-config.md:
- Line 4: Fix the spelling error in the documentation sentence that currently
reads "non-AWS S3-compatitble" by changing "compatitble" to "compatible" so the
phrase becomes "non-AWS S3-compatible"; update the text where that exact phrase
appears to correct the typo.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82a1b97 and 194c34d.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2024-12-12T19:20:59.778Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 634
File: components/clp-py-utils/clp_py_utils/s3_utils.py:11-16
Timestamp: 2024-12-12T19:20:59.778Z
Learning: S3 roles provided may not have permission to perform `head_bucket` and `delete_object` operations; verification logic should avoid using these methods.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/object-storage-config.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: check-generated
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
docs/src/user-docs/guides-using-object-storage/object-storage-config.md (2)

5-6: Guidance about endpoint configuration is clear and appropriate.

The documentation correctly clarifies that non-AWS S3-compatible endpoints require equivalent bucket access policies and that CLP automatically determines the endpoint URL during compression. This aligns well with the custom S3 endpoint support introduced in PR #1767. Based on learnings from prior work, this guidance is accurate.


100-103: Clear communication of Web UI limitation.

The new note appropriately documents the current limitation with custom S3 endpoints in the Web UI and sets expectations for future improvement. The placement within the stream storage section is logical, and the directive level is reasonable for this type of limitation.

Comment thread docs/src/user-docs/guides-using-object-storage/object-storage-config.md Outdated

@junhaoliao junhaoliao left a comment

Member

for the title how about

docs(clp-package): Add instructions for ingesting logs from non-AWS S3-compatible object storage.

Comment thread docs/src/user-docs/guides-using-object-storage/object-storage-config.md Outdated
@hoophalab hoophalab changed the title from "docs(clp-package): Add instructions for ingesting logs from S3-compatible object storage." to "docs(clp-package): Add instructions for ingesting logs from non-AWS S3-compatible object storage." Jan 7, 2026
@hoophalab hoophalab requested a review from junhaoliao January 7, 2026 05:03
@hoophalab hoophalab merged commit 356d08d into y-scope:main Jan 7, 2026
8 checks passed
davidlion pushed a commit to davidlion/clp that referenced this pull request Jan 17, 2026
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026

2 participants