feat(clp-package): Add log-ingestor config interface and Docker Compose orchestration.#1741
Conversation
| "CRITICAL", | ||
| ] | ||
|
|
||
| LoggingLevelRust = Literal[ |
There was a problem hiding this comment.
Since this is not related to Python logging, can we move it to clp_config.py? or some constants / definitions file if clp_config.py is already too long
There was a problem hiding this comment.
Will put it in clp_config.py. Figuring out where to put this constant seems out of scope for this PR. Prefer to do it in a refactoring PR.
| class LogIngestor(BaseModel): | ||
| host: DomainStr = "localhost" | ||
| port: Port = 3270 | ||
| buffer_timeout: PositiveInt = 300 |
There was a problem hiding this comment.
buffer_flush_timeout sounds more intuitive
|
|
||
| class LogIngestor(BaseModel): | ||
| host: DomainStr = "localhost" | ||
| port: Port = 3270 |
There was a problem hiding this comment.
nah, we don't really enforce any rules in picking ports. though maybe we can just do the (API server port + 1) = 3002 so the number appears less random?
| # initial_backoff_ms: 100 | ||
| # max_backoff_ms: 5000 | ||
|
|
||
| ## log-ingestor config |
There was a problem hiding this comment.
actually, maybe we can just remove this comment? since it doesn't add much info (like it's kinda apparent that the log_ingestor config object is for log-ingestor configuration
| #log_ingestor: | ||
| # host: "localhost" | ||
| # port: 3270 | ||
| # # The timeout (in seconds) after which the log buffer is flushed for compression if no new input |
There was a problem hiding this comment.
Users might not care about Pydantic fields, they will only modify this config.
ah i don't mean users need to check the pydantic code. like if they have a YAML LSP capable plugin installed in their IDE (e.g. PyCharm and VS Code), the IDE would show the helper text because we do generate schemas for the config file now.
I agree we should keep the comments in the YAML file, until a reference doc is added for the whole config file. see #1233
| | webui | Web server for the UI | | ||
| | mcp_server | MCP server for AI agent to access CLP functionalities | | ||
| | garbage_collector | Process to manage data retention | | ||
| | log_ingestor | A background service for continuous log ingestion | |
There was a problem hiding this comment.
How about
Server for log ingestion from S3/SQS sources.
The "background service" part isn't that descriptive cuz other services are also services and also run in the background
| host: DomainStr = "localhost" | ||
| port: Port = 3270 | ||
| buffer_timeout: PositiveInt = 300 | ||
| buffer_size_threshold: PositiveInt = 52428800 # 50 MB |
There was a problem hiding this comment.
@coderabbitai submit an issue for fixing the units in other doc-strings
| # # arrives. | ||
| # buffer_timeout: 300 | ||
| # # The log buffer size (in bytes) that triggers a flush for compression. | ||
| # buffer_size_threshold: 52428800 |
There was a problem hiding this comment.
sure (note: we want to use MiB in the docs
| | webui | Web server for the UI | | ||
| | mcp_server | MCP server for AI agent to access CLP functionalities | | ||
| | garbage_collector | Process to manage data retention | | ||
| | log_ingestor | A background service for continuous log ingestion | |
There was a problem hiding this comment.
Server for orchestrating and running continuous log ingestion jobs.
sure
junhaoliao
left a comment
There was a problem hiding this comment.
the title is fine, though i think it can be as simple as:
feat(clp-package): Add log-ingestor config interface and Docker Compose orchestration.
(the purpose of the service will be included in the release notes so we don't have to be explicit about that in this commit message
| """ | ||
| component_name = LOG_INGESTOR_COMPONENT_NAME | ||
| if self._clp_config.log_ingestor is None: | ||
| logger.info(f"log-ingestor is not configured, skipping {component_name} creation...") |
…/LinZhihao-723/clp into log-ingestor-package-integration
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/controller.py(3 hunks)components/package-template/src/etc/clp-config.template.json.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-06-24T20:13:46.758Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-24T20:13:46.758Z
Learning: When users ask CodeRabbit to create an issue after providing suggestions, they want a GitHub issue created with the high-level requirements and context, not specific code implementations.
Applied to files:
components/package-template/src/etc/clp-config.template.json.yaml
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.
Applied to files:
components/package-template/src/etc/clp-config.template.json.yaml
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/clp-package-utils/clp_package_utils/controller.py
🧬 Code graph analysis (1)
components/clp-package-utils/clp_package_utils/controller.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
StorageType(161-163)components/clp-py-utils/clp_py_utils/core.py (1)
resolve_host_path_in_container(82-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: build (macos-15)
- GitHub Check: rust-checks
🔇 Additional comments (1)
components/clp-package-utils/clp_package_utils/controller.py (1)
35-52: VerifyCLP_LOG_INGESTOR_ENABLEDagainst docker-compose templates before mergingThe new
_set_up_env_for_log_ingestorfollows existing patterns: it explicitly setsCLP_LOG_INGESTOR_ENABLED="0"when disabled, and relies on docker-compose defaults to enable the component otherwise. Confirm that docker-compose service definitions use${CLP_LOG_INGESTOR_ENABLED:-1}or equivalent to ensure the implicit enabling behavior matches other*_ENABLEDflags (e.g.,CLP_QUERY_SCHEDULER_ENABLED).
| ## log-ingestor config. Currently, the config is applicable only if `logs_input.type` is "s3". | ||
| #log_ingestor: | ||
| # host: "localhost" | ||
| # port: 3002 | ||
| # # The timeout (in seconds) after which the log buffer is flushed for compression if no new input | ||
| # # arrives. | ||
| # buffer_flush_timeout: 300 | ||
| # # The log buffer size (in bytes) that triggers a flush for compression. | ||
| # buffer_flush_threshold: 268435456 # 256 MiB | ||
| # # The capacity of the internal channel used for communication between an ingestion job and the | ||
| # # log buffer. | ||
| # channel_capacity: 10 | ||
| # logging_level: "INFO" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Log ingestor template block looks consistent; consider clarifying all applicability constraints
The new log_ingestor block and its defaults look sensible and consistent with other sections in this template. To reduce user confusion, consider also mentioning in the comment that log‑ingestor is only supported when the package storage engine is clp-s (since the Python config enforces that), in addition to logs_input.type being "s3".
🤖 Prompt for AI Agents
components/package-template/src/etc/clp-config.template.json.yaml lines 16-28:
the comment for the log_ingestor block only says it applies when logs_input.type
is "s3" but the Python config also requires the package storage engine to be
"clp-s"; update the commented header to state both applicability constraints
(logs_input.type == "s3" AND package storage engine == "clp-s"), keeping the
existing defaults and formatting, and make the note concise so users know both
conditions must be met for log_ingestor to be used.
…se orchestration. (y-scope#1741)
…se orchestration. (y-scope#1741)
…se orchestration. (y-scope#1741)
Description
NOTE: This PR depends on #1740.
This PR integrates log-ingestor into CLP package. This means log-ingestor will be started as a CLP service when starting a CLP package (clp-json) by default inside the Docker Compose.
The changes in this PR are mainly mirrored from #1575 and #1709.
Notice that users need to directly submit requests to log-ingestor's host at this moment. In a future PR, we should integrate log-ingestor into the API server so that users do not need to know where the log ingestor is hosted explicitly.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.