docs(clp-package): Add instructions for ingesting logs from non-AWS S3-compatible object storage.#1796
Conversation
WalkthroughAdds documentation clarifications for S3-compatible endpoints: accepts HTTP-style object and key-prefix URLs for s3-object and s3-key-prefix modes; states CLP auto-determines endpoint URL during compression for non-AWS endpoints; notes the Web UI cannot yet view extracted streams from custom S3 endpoints. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~5–10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom Pre-merge Checks in the settings. 📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (6)📓 Common learnings📚 Learning: 2025-12-10T23:32:24.363ZApplied to files:
📚 Learning: 2024-12-12T19:20:59.778ZApplied to files:
📚 Learning: 2025-10-13T03:24:35.074ZApplied to files:
📚 Learning: 2025-11-17T22:58:50.056ZApplied to files:
📚 Learning: 2025-01-13T21:18:54.629ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/user-docs/guides-using-object-storage/clp-usage.md(2 hunks)docs/src/user-docs/guides-using-object-storage/object-storage-config.md(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
docs/src/user-docs/guides-using-object-storage/clp-usage.mddocs/src/user-docs/guides-using-object-storage/object-storage-config.md
🪛 LanguageTool
docs/src/user-docs/guides-using-object-storage/clp-usage.md
[style] ~34-~34: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...
(MISSING_IT_THERE)
[style] ~89-~89: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...
(MISSING_IT_THERE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: check-generated
🔇 Additional comments (1)
docs/src/user-docs/guides-using-object-storage/object-storage-config.md (1)
3-6: Documentation additions properly positioned and consistent with S3-compatible endpoint support.The clarification about S3-compatible storages using equivalent bucket policies and automatic endpoint URL determination during compression is well-placed and aligns with the custom S3 endpoint support added in PR #1767 (per learnings, where region_code is now optional for non-AWS endpoints).
| * `<object-url>` can also be an object stored in an S3-compatible storage: | ||
| * `http://<host>:<port>/<bucket-name>/<object-key>` |
There was a problem hiding this comment.
Fix incomplete sentence structure on line 34.
The bullet point should form a complete sentence. The current phrasing "* <object-url> can also be an object stored..." lacks proper sentence structure.
🔎 Suggested fix:
- * `<object-url>` can also be an object stored in an S3-compatible storage:
+ * `<object-url>` can also point to an object stored in an S3-compatible storage:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| * `<object-url>` can also be an object stored in an S3-compatible storage: | |
| * `http://<host>:<port>/<bucket-name>/<object-key>` | |
| * `<object-url>` can also point to an object stored in an S3-compatible storage: | |
| * `http://<host>:<port>/<bucket-name>/<object-key>` |
🧰 Tools
🪛 LanguageTool
[style] ~34-~34: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...
(MISSING_IT_THERE)
🤖 Prompt for AI Agents
In docs/src/user-docs/guides-using-object-storage/clp-usage.md around lines
34–35, the bullet is a fragment; rewrite it as a complete sentence that states
that the object URL can reference an object stored in S3-compatible storage and
include the example URL format (host:port/bucket-name/object-key) inline so the
bullet reads as a full, self-contained sentence.
There was a problem hiding this comment.
unrelated to the current PR, but i found we haven't update the webui code to support custom s3 endpoints?
There was a problem hiding this comment.
though technically we haven't been mentioning the custom endpoint feature can be used for streams... lol. so long we're explicit that the custom endpoint works only for logs inputs, i think it's fine to skip the webui support for now
anyways, i created an issue to track: #1797
There was a problem hiding this comment.
Technically, we don't officially support storing archives and streams on S3 compatible storages. The PR title and feature request were Add support for ingesting logs from S3 compatible endpoints.
With that being said, the users can actually add an "endpoint_url: ..." field in clp-config.yaml to enable this feature. I did a very quick test, and the only issue I discovered was extracting stream didn't work in webui.
There was a problem hiding this comment.
sure. let's mention the temporary webui limitation in docs/src/user-docs/guides-using-object-storage/object-storage-config.md if we don't plan to add the webui config in the coming release
|
@junhaoliao updated based on our offline discussion |
| stream files but not CLP archives; thus, to view the compressed logs from a CLP archive, CLP first | ||
| converts the compressed logs into stream files. These streams can be cached on the filesystem, or on | ||
| object storage. | ||
| object storage. Currently, the Web UI does not support viewing extracted streams stored in custom S3 |
There was a problem hiding this comment.
Personally, i would add a :::{note} or :::{warning} for things that require user attentions.
Anyways, since the release timeline is slightly pushed back, we may submit another PR to add such support in the webui directly. Then we don't have to add this warning
There was a problem hiding this comment.
I still keep the warning. We can remove it in the PR that implements the feature.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @docs/src/user-docs/guides-using-object-storage/object-storage-config.md:
- Line 4: Fix the spelling error in the documentation sentence that currently
reads "non-AWS S3-compatitble" by changing "compatitble" to "compatible" so the
phrase becomes "non-AWS S3-compatible"; update the text where that exact phrase
appears to correct the typo.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-object-storage/object-storage-config.md
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2024-12-12T19:20:59.778Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 634
File: components/clp-py-utils/clp_py_utils/s3_utils.py:11-16
Timestamp: 2024-12-12T19:20:59.778Z
Learning: S3 roles provided may not have permission to perform `head_bucket` and `delete_object` operations; verification logic should avoid using these methods.
Applied to files:
docs/src/user-docs/guides-using-object-storage/object-storage-config.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
docs/src/user-docs/guides-using-object-storage/object-storage-config.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (macos-15)
- GitHub Check: check-generated
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
docs/src/user-docs/guides-using-object-storage/object-storage-config.md (2)
5-6: Guidance about endpoint configuration is clear and appropriate.The documentation correctly clarifies that non-AWS S3-compatible endpoints require equivalent bucket access policies and that CLP automatically determines the endpoint URL during compression. This aligns well with the custom S3 endpoint support introduced in PR #1767. Based on learnings from prior work, this guidance is accurate.
100-103: Clear communication of Web UI limitation.The new note appropriately documents the current limitation with custom S3 endpoints in the Web UI and sets expectations for future improvement. The placement within the stream storage section is logical, and the directive level is reasonable for this type of limitation.
junhaoliao
left a comment
There was a problem hiding this comment.
for the title how about
docs(clp-package): Add instructions for ingesting logs from non-AWS S3-compatible object storage.
…3-compatible object storage. (y-scope#1796)
…3-compatible object storage. (y-scope#1796)
…3-compatible object storage. (y-scope#1796)
Description
Add instructions for ingesting logs from S3 compatible object storage.
Checklist
breaking change.
Validation performed
task docs:serveserves the doc correctlySummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.