feat(helm): Add compression & query job orchestration and the webui to the chart.#1749
Conversation
…ebui to the Package Helm chart.
WalkthroughAdded Helm helpers for dynamic volume/PVC naming and creation, Celery env var builders, logs hostPath helpers, and an enhanced wait-for initContainer; introduced multiple Deployments, Services, PV/PVC templates, expanded ConfigMap and values, replaced hard-coded StatefulSet volume names, bumped chart version, and updated test script. Changes
Sequence Diagram(s)sequenceDiagram
participant Init as initContainer (clp.waitFor)
participant Job as Job (db-table-creator / indices-creator)
participant Queue as Service (queue)
participant Redis as Service (redis)
participant App as Main container (pod)
Note over Init: clp.waitFor polls resources with timeout (300s)
Init->>Job: poll job status until complete (type=job)
Init->>Queue: check service endpoints/port (type=service)
Init->>Redis: check service endpoints/port (type=service)
Init-->>App: exit when dependencies ready
App->>Queue: connect using BROKER_URL (from clp.celeryBrokerUrlEnvVar)
App->>Redis: connect using RESULT_BACKEND (from clp.celeryResultBackendEnvVar)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| storage: | ||
| directory: "/var/data/archives" | ||
| type: "fs" | ||
| target_archive_size: {{ .Values.clpConfig.archive_output.target_archive_size | int }} |
There was a problem hiding this comment.
Helm like to convert larger numbers into float type when building charts, (confirmed by helm template .), so for the larger numbers we have to add int to explicitly ask for unsigned numbers.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
tools/deployment/package-helm/templates/reducer-deployment.yaml (1)
27-47: Init containers follow the same pattern.The init containers use the same approach as other deployments (busybox:latest with basic TCP checks).
Refer to recommendations in compression-worker-deployment.yaml for improvements.
tools/deployment/package-helm/templates/query-scheduler-deployment.yaml (1)
27-57: Init containers follow the established pattern.The three init containers (database, queue, redis) use the same approach as other deployments.
Refer to recommendations in compression-worker-deployment.yaml for improvements (pin busybox version, add timeouts).
tools/deployment/package-helm/templates/webui-deployment.yaml (1)
25-45: Consider improving init container reliability.The init containers have the same potential issues as in other deployment templates: using
busybox:latest, basic TCP checks, and no timeout mechanisms.Refer to the recommendations in compression-worker-deployment.yaml for improvements (pin busybox version, add timeouts, consider service-specific health checks).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (32)
tools/deployment/package-helm/Chart.yaml(1 hunks)tools/deployment/package-helm/templates/_helpers.tpl(1 hunks)tools/deployment/package-helm/templates/archives-pv.yaml(1 hunks)tools/deployment/package-helm/templates/archives-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/configmap.yaml(4 hunks)tools/deployment/package-helm/templates/query-scheduler-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/query-scheduler-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/query-scheduler-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/query-scheduler-service.yaml(1 hunks)tools/deployment/package-helm/templates/query-worker-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/query-worker-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/query-worker-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/reducer-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/reducer-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/reducer-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/reducer-service.yaml(1 hunks)tools/deployment/package-helm/templates/streams-pv.yaml(1 hunks)tools/deployment/package-helm/templates/streams-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/tmp-pv.yaml(1 hunks)tools/deployment/package-helm/templates/tmp-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/user-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/user-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/webui-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/webui-service.yaml(1 hunks)tools/deployment/package-helm/test.sh(2 hunks)tools/deployment/package-helm/values.yaml(2 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
tools/deployment/package-helm/templates/compression-scheduler-deployment.yamltools/deployment/package-helm/values.yaml
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/templates/streams-pv.yamltools/deployment/package-helm/templates/query-worker-logs-pv.yamltools/deployment/package-helm/templates/query-worker-logs-pvc.yamltools/deployment/package-helm/values.yamltools/deployment/package-helm/templates/configmap.yamltools/deployment/package-helm/templates/query-worker-deployment.yaml
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
tools/deployment/package-helm/values.yamltools/deployment/package-helm/templates/configmap.yaml
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-11-28T15:12:53.530Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.530Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
tools/deployment/package-helm/templates/configmap.yaml
📚 Learning: 2025-06-22T04:01:43.409Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-22T04:01:43.409Z
Learning: The webui server logging can be configured via LOG_LEVEL and WEBUI_LOGS_DIR environment variables, with file logging enabled automatically in non-TTY environments when a logs directory is specified.
Applied to files:
tools/deployment/package-helm/templates/configmap.yaml
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/reducer-logs-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/streams-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/reducer-service.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 11-11: too many spaces inside braces
(braces)
[error] 14-14: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 66-66: too many spaces inside braces
(braces)
[error] 66-66: too many spaces inside braces
(braces)
[error] 71-71: too many spaces inside braces
(braces)
[error] 71-71: too many spaces inside braces
(braces)
[error] 74-74: too many spaces inside braces
(braces)
[error] 74-74: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 104-104: too many spaces inside braces
(braces)
[error] 104-104: too many spaces inside braces
(braces)
[error] 107-107: too many spaces inside braces
(braces)
[error] 107-107: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/reducer-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/streams-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/user-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-worker-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-worker-logs-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/reducer-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 56-56: too many spaces inside braces
(braces)
[error] 56-56: too many spaces inside braces
(braces)
[error] 78-78: too many spaces inside braces
(braces)
[error] 78-78: too many spaces inside braces
(braces)
[error] 81-81: too many spaces inside braces
(braces)
[error] 81-81: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/webui-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 54-54: too many spaces inside braces
(braces)
[error] 54-54: too many spaces inside braces
(braces)
[error] 59-59: too many spaces inside braces
(braces)
[error] 59-59: too many spaces inside braces
(braces)
[error] 90-90: too many spaces inside braces
(braces)
[error] 94-94: too many spaces inside braces
(braces)
[error] 99-99: too many spaces inside braces
(braces)
[error] 99-99: too many spaces inside braces
(braces)
[error] 102-102: too many spaces inside braces
(braces)
[error] 102-102: too many spaces inside braces
(braces)
[error] 105-105: too many spaces inside braces
(braces)
[error] 105-105: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-worker-logs-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-worker-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/user-logs-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-scheduler-logs-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/webui-service.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 11-11: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/archives-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-worker-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 58-58: too many spaces inside braces
(braces)
[error] 58-58: too many spaces inside braces
(braces)
[error] 92-92: too many spaces inside braces
(braces)
[error] 92-92: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/archives-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 58-58: too many spaces inside braces
(braces)
[error] 58-58: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 108-108: too many spaces inside braces
(braces)
[error] 108-108: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/query-scheduler-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/tmp-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-scheduler-service.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 11-11: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/tmp-pvc.yaml
[error] 6-6: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 68-68: too many spaces inside braces
(braces)
[error] 68-68: too many spaces inside braces
(braces)
[error] 73-73: too many spaces inside braces
(braces)
[error] 73-73: too many spaces inside braces
(braces)
[error] 76-76: too many spaces inside braces
(braces)
[error] 76-76: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 104-104: too many spaces inside braces
(braces)
[error] 104-104: too many spaces inside braces
(braces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
🔇 Additional comments (46)
tools/deployment/package-helm/Chart.yaml (1)
3-3: Version bump is appropriate for feature scope.The patch-level increment (0.1.2-dev.2 → 0.1.2-dev.3) is consistent with the extensive chart enhancements (new PVCs, PVs, deployments, services, and configuration) added in this PR.
tools/deployment/package-helm/templates/archives-pvc.yaml (1)
1-6: YAMLlint errors are false positives; PVC pattern is sound.The static analysis warnings about syntax errors and excess braces are false positives: YAMLlint is a pure YAML linter that does not understand Helm template syntax ({{-, dict, list). The template structure is consistent with the other PVC files in this changeset.
Verify that this PVC is correctly mounted in the corresponding archives-related deployment manifests (not included in this review set).
tools/deployment/package-helm/templates/compression-worker-logs-pvc.yaml (1)
1-6: Correct application of clp.createPvc helper pattern.Consistent with other component log PVCs; ReadWriteOnce mode is appropriate for component-specific logs.
tools/deployment/package-helm/templates/reducer-logs-pvc.yaml (1)
1-6: Pattern is consistent and appropriate.Follows the same clp.createPvc pattern as other component log PVCs with proper ReadWriteOnce access for reducer-specific logs.
tools/deployment/package-helm/templates/tmp-pvc.yaml (1)
1-6: ReadWriteMany access mode is appropriate for temporary storage.The 10Gi capacity with ReadWriteMany mode is well-suited for shared temporary storage across compression and query orchestration components.
tools/deployment/package-helm/templates/query-scheduler-logs-pvc.yaml (1)
1-6: Pattern is consistent with other component log PVCs.Correct parameters for query-scheduler-specific logs (5Gi, ReadWriteOnce).
tools/deployment/package-helm/templates/query-worker-logs-pvc.yaml (1)
1-6: Pattern is correct for query-worker logs storage.The ReadWriteOnce access mode is appropriate for component-specific logs. Note: the retrieved learning about query-worker stream output (PR 1690) pertains to the separate streams-pvc (ReadWriteMany), not this logs PVC.
tools/deployment/package-helm/templates/streams-pvc.yaml (1)
1-6: ReadWriteMany access mode is correctly applied for shared stream data.The 20Gi capacity with ReadWriteMany mode is well-suited for stream output storage shared across compression and query orchestration components.
tools/deployment/package-helm/templates/query-scheduler-service.yaml (2)
1-15: YAMLlint errors are false positives; Helm template syntax is valid.The YAMLlint errors reported ("too many spaces inside braces" and "syntax error") are false positives. YAMLlint v1.37.1 does not understand Helm Go template syntax. The template syntax
{{ include "clp.fullname" . }}is standard and correct Helm templating.
14-15: Verify service port alignment with query-scheduler deployment.The service defines targetPort as a named port "query-scheduler" on port 7000. Confirm that the query-scheduler deployment container exposes a port with this name (or uses port 7000).
tools/deployment/package-helm/templates/reducer-service.yaml (1)
13-18: Verify reducer deployment port configuration matches service dynamic ports.The service dynamically generates ports from 14009 to 14009 + (workerConcurrency - 1). The reducer deployment must configure containers to listen on these same port numbers. Confirm the port mapping is consistent between this service and the reducer deployment template.
tools/deployment/package-helm/templates/archives-pv.yaml (1)
1-9: Verify clp.createLocalPv helper implementation and hostPath configuration.This file delegates entirely to the
clp.createLocalPvhelper. Verify that:
- The helper correctly generates a local PersistentVolume with the provided parameters
- The hostPath
.Values.clpConfig.archive_output.storage.directoryis properly populated and exists on control-plane nodes- The hardcoded
nodeRole: "control-plane"affinity is intentional and adequately constrains the deploymenttools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml (1)
1-6: Verify clp.createPvc helper implementation and PVC-PV binding.Confirm that the
clp.createPvchelper:
- Correctly generates a PersistentVolumeClaim with the specified parameters
- Uses a storageClassName (e.g., "local-storage") that matches available storage classes
- Is bound to the corresponding
compression-scheduler-logs-pv.yamlPersistentVolumetools/deployment/package-helm/templates/query-worker-logs-pv.yaml (1)
1-9: Verify query-worker deployment mounts this PV at the correct path.Confirm that the query-worker deployment:
- Mounts this PV at a path consistent with the logs directory structure
- Does not conflict with stream output storage mentioned in related deployments (PR #1690 indicates query-worker writes to /var/data/streams as well)
tools/deployment/package-helm/templates/user-logs-pvc.yaml (1)
1-6: Verify clp.createPvc helper and corresponding PersistentVolume.Confirm that:
- The
clp.createPvchelper generates a correct PersistentVolumeClaim withReadWriteManyaccess- A matching
user-logs-pv.yamlfile exists and is configured withReadWriteManyaccess and at least 10Gi capacitytools/deployment/package-helm/templates/webui-service.yaml (1)
13-16: Verify WebUI deployment port configuration and NodePort value range.Confirm that:
- The webui deployment defines a container port named "webui" or on port 4000 to match this service
.Values.clpConfig.webui.portis defined in values.yaml within the valid NodePort range (30000-32767)tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yaml (1)
1-9: Verify clp.createLocalPv helper and hostPath for compression-scheduler logs.Confirm that:
- The
clp.createLocalPvhelper correctly generates a local PersistentVolume- The constructed hostPath
.Values.clpConfig.logs_directory/compression-schedulerexists on control-plane nodes- The PVC
compression-scheduler-logs-pvc.yamlis properly bound to this PVtools/deployment/package-helm/templates/streams-pv.yaml (1)
1-9: LGTM!The PersistentVolume template correctly invokes the
clp.createLocalPvhelper with appropriate parameters for local hostPath storage. The ReadWriteMany access mode is suitable for this development environment using a local Kubernetes cluster.tools/deployment/package-helm/templates/query-scheduler-logs-pv.yaml (1)
1-9: LGTM!The template correctly creates a local PersistentVolume for query-scheduler logs with ReadWriteOnce access mode, which is appropriate for component-specific logging.
tools/deployment/package-helm/templates/tmp-pv.yaml (1)
1-9: LGTM!The PersistentVolume template is correctly configured for temporary storage with ReadWriteMany access mode, suitable for shared temporary file usage across components.
tools/deployment/package-helm/templates/compression-worker-logs-pv.yaml (1)
1-9: LGTM!The template correctly provisions a local PersistentVolume for compression worker logs with ReadWriteOnce access mode.
tools/deployment/package-helm/templates/reducer-logs-pv.yaml (1)
1-9: LGTM!The PersistentVolume template correctly configures local storage for reducer component logs.
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml (1)
1-111: LGTM!The deployment template is well-structured with proper init containers for dependency readiness checks and appropriate volume configuration. The security context correctly uses the first-party UID/GID as per the CLP design (compression-scheduler runs as current user, not service container account).
tools/deployment/package-helm/templates/user-logs-pv.yaml (1)
1-9: LGTM!The PersistentVolume template correctly defines shared user logs storage with ReadWriteMany access mode.
tools/deployment/package-helm/templates/query-worker-deployment.yaml (1)
1-101: LGTM!The query-worker deployment is properly configured with correct init containers (queue and Redis only, appropriately excluding database), appropriate volume mounts with read-write access for streams and archives (aligning with worker output requirements), and configurable concurrency support.
tools/deployment/package-helm/templates/compression-worker-deployment.yaml (3)
50-66: LGTM: Environment variable configuration is appropriate.The environment variables are properly configured with values from the chart. Credentials are embedded in connection URLs (BROKER_URL, RESULT_BACKEND), which is a common pattern for internal cluster communication. The use of
.Values.credentialsprovides centralized credential management.
67-80: LGTM: Volume mounts are correctly configured.The volume mounts are properly structured:
- Configuration is mounted read-only from a ConfigMap.
- Input logs are correctly mounted read-only.
- Output archives and logs have read-write access as needed.
- Temporary storage is available via a dedicated PVC.
81-91: LGTM: Celery worker command is properly configured.The command is well-structured with:
- Unbuffered Python output (
-uflag) for real-time logging.- Configurable concurrency via
{{ .Values.workerConcurrency }}.- Appropriate queue targeting and worker naming.
tools/deployment/package-helm/test.sh (4)
11-16: LGTM: Directory structure setup is comprehensive.The directory creation properly sets up all required paths for the expanded deployment components (compression-scheduler, compression-worker, query-scheduler, query-worker, reducer), including separate data and log directories, plus archives, streams, and temporary storage.
28-29: Verify the /home mount requirement.The extraMount was changed from a specific path to
/home. This provides broad access to user home directories in the cluster.Please confirm this broad access is intentional and required for the test environment. If only specific directories are needed, consider mounting them individually for better isolation.
39-41: LGTM: Port mapping added for WebUI.The port mapping for 30400 correctly exposes the WebUI service, aligning with the
clpConfig.webui.portvalue.
47-49: LGTM: Sample download synchronization is correct.The script properly waits for the background download to complete before printing the completion message, ensuring the test environment is fully prepared.
tools/deployment/package-helm/templates/webui-deployment.yaml (2)
74-84: LGTM: Volume mounts are correctly configured.The volume mounts are properly set up:
- Client and server settings are mounted read-only from the ConfigMap using subPath for file-level access.
- The streams volume is correctly mounted read-write (query-worker writes to this directory per learnings).
Based on learnings, the streams directory must remain read-write as query-worker writes to it.
89-95: LGTM: Health probes are appropriately configured.The readiness and liveness probes use TCP socket checks on port 4000 with configurable timings from helper templates. The use of YAML anchors reduces duplication.
tools/deployment/package-helm/templates/reducer-deployment.yaml (3)
22-22: Verify termination grace period for reducer.The
terminationGracePeriodSeconds: 10is relatively short for a reducer component that may be processing results. Depending on the reducer's workload and cleanup requirements, this might not provide sufficient time for graceful shutdown.Consider whether 10 seconds is adequate for the reducer to:
- Complete in-flight result processing.
- Flush any pending upserts to the results cache.
- Clean up connections gracefully.
If needed, increase to 30-60 seconds or make it configurable via values.
52-74: LGTM: Reducer command and environment are properly configured.The reducer deployment is well-configured:
- Environment variables provide necessary paths and logging configuration.
- The command uses unbuffered Python output for real-time logging.
- Concurrency and upsert interval are configurable via chart values, allowing tuning for different workloads.
75-81: LGTM: Volume configuration is standard.The volumes are properly configured with a ConfigMap for configuration and a PVC for logs, following the established pattern across components.
tools/deployment/package-helm/templates/query-scheduler-deployment.yaml (2)
22-22: Verify termination grace period for query-scheduler.The
terminationGracePeriodSeconds: 10may be insufficient for the query-scheduler to gracefully handle in-flight query job scheduling operations and cleanup.Consider whether 10 seconds allows adequate time for:
- Completing active job scheduling cycles.
- Releasing queue connections gracefully.
- Cleaning up Redis connections.
Schedulers typically benefit from longer grace periods (30-60 seconds) to ensure clean shutdown.
62-104: LGTM: Query-scheduler configuration is comprehensive.The configuration is well-structured:
- Database credentials are securely referenced from Kubernetes secrets.
- Connection URLs for queue and Redis are properly templated with credentials and service names.
- The exposed port (7000) matches the service and configuration.
- The command is properly formatted for running the query scheduler module.
tools/deployment/package-helm/templates/_helpers.tpl (1)
138-165: LGTM: PVC helper is well-designed.The
clp.createPvchelper template is properly implemented:
- Clear documentation with parameter descriptions.
- Consistent naming pattern using fullname and component.
- Proper label propagation for resource tracking and selection.
- The selector matches on component label, ensuring PVCs bind to the correct PVs.
- Follows the established pattern of the existing
clp.createLocalPvhelper.tools/deployment/package-helm/templates/configmap.yaml (3)
9-27: LGTM: Archive and compression configuration is correct.The archive output and compression scheduler/worker configurations are properly structured:
- The
| intfilter is correctly applied to large numeric values to prevent Helm from converting them to floats, as noted in past review comments.- Configuration paths match the volume mounts in deployment templates.
- All values are properly sourced from the values file.
38-82: LGTM: Query, reducer, stream, and WebUI configurations are comprehensive.The configuration sections are well-structured:
- Service hostnames use templated fullnames for proper cluster DNS resolution.
- Large numeric values (e.g., target_uncompressed_size) correctly use the
| intfilter.- All paths and ports align with corresponding deployment templates.
- Configuration values are properly sourced from the values file.
112-160: LGTM: WebUI client and server settings are properly configured.The WebUI JSON configuration blocks are well-structured:
- All settings are properly sourced from the chart values.
- Database and MongoDB connection details use templated service names for cluster DNS.
- Large numeric values use the
| intfilter to prevent float conversion.- The configuration aligns with the WebUI deployment's volume mounts.
Based on learnings, the WebUI configuration properly separates package settings (in ConfigMap) from environment-specific settings (via environment variables in the deployment).
tools/deployment/package-helm/values.yaml (3)
28-35: Review the logs_input directory configuration.The
logs_input.directoryis set to"/", which exposes the entire filesystem to the compression workers. While the comment warns about symbolic links, this configuration allows workers to compress any file on the host.Please confirm:
- Is root directory access intentional for test/development environments?
- For production deployments, should this be restricted to a specific directory (e.g.,
/var/logor/data/logs)?- Consider documenting recommended production values in the comments.
This broad access may be appropriate for testing but could pose security concerns in production.
42-75: LGTM: Scheduler, worker, and reducer configurations are well-balanced.The configuration values are sensible defaults:
- Poll delays (0.1 seconds) provide responsive job scheduling without excessive overhead.
- Logging levels (INFO) are appropriate for operational visibility.
max_concurrent_tasks_per_job: 0provides unlimited parallelism by default, allowing users to tune based on their workload.- WebUI port (30400) aligns with the test environment and service definitions.
77-131: LGTM: Archive and stream output configurations have sensible defaults.The output configurations are well-thought-out:
- Target sizes (256 MB for archives, 32 MB for dictionaries, 128 MB for streams) are reasonable defaults balancing granularity and overhead.
- Compression level 3 provides a good balance between speed and compression ratio.
retention_period: nullsafely disables automatic deletion by default.- Comments clearly document the byte values in human-readable units (MB).
- All directories are properly configured under /tmp/clp for the test environment.
| # Download sample datasets in the background | ||
| wget -O - https://zenodo.org/records/10516402/files/postgresql.tar.gz?download=1 \ | ||
| | tar xz -C /tmp/clp/samples & | ||
| SAMPLE_DOWNLOAD_PID=$! |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding error handling for sample download.
The background download has no error handling. If the download fails, the script will still proceed with helm install and only fail at the wait command (line 48) without a clear error message.
Consider adding error handling:
# Download sample datasets in the background
-wget -O - https://zenodo.org/records/10516402/files/postgresql.tar.gz?download=1 \
- | tar xz -C /tmp/clp/samples &
+(wget -O - https://zenodo.org/records/10516402/files/postgresql.tar.gz?download=1 \
+ | tar xz -C /tmp/clp/samples) &
SAMPLE_DOWNLOAD_PID=$!Then add error checking:
wait $SAMPLE_DOWNLOAD_PID
+if [ $? -ne 0 ]; then
+ echo "ERROR: Sample download failed"
+ exit 1
+fi
echo "Sample download and extraction complete"Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tools/deployment/package-helm/test.sh around lines 17 to 20, the background
wget|tar pipeline runs without error handling so failures are only detected
later; change it to run the download in a background subshell that captures its
exit status (or write a small wrapper that redirects stdout/stderr to a temp log
and echoes the pipeline exit code to a status file), start that background job
and store its PID as SAMPLE_DOWNLOAD_PID, then before proceeding check the job’s
exit code (e.g., wait $SAMPLE_DOWNLOAD_PID and if non‑zero print a clear error
message with the download log and exit the script with a non‑zero status) so
helm install won’t run when the sample download fails.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tools/deployment/package-helm/templates/webui-deployment.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/webui-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 54-54: too many spaces inside braces
(braces)
[error] 54-54: too many spaces inside braces
(braces)
[error] 59-59: too many spaces inside braces
(braces)
[error] 59-59: too many spaces inside braces
(braces)
[error] 70-70: too many spaces inside braces
(braces)
[error] 70-70: too many spaces inside braces
(braces)
[error] 90-90: too many spaces inside braces
(braces)
[error] 94-94: too many spaces inside braces
(braces)
[error] 99-99: too many spaces inside braces
(braces)
[error] 99-99: too many spaces inside braces
(braces)
[error] 102-102: too many spaces inside braces
(braces)
[error] 102-102: too many spaces inside braces
(braces)
[error] 105-105: too many spaces inside braces
(braces)
[error] 105-105: too many spaces inside braces
(braces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (8)
tools/deployment/package-helm/templates/webui-deployment.yaml (8)
69-70: ✓ RATE_LIMIT now properly configurable.The hardcoded
RATE_LIMITvalue from the previous review has been correctly updated to use{{ .Values.clpConfig.webui.rate_limit | quote }}, making it configurable viavalues.yaml.
20-24: Security context properly configured.The Pod security context correctly uses first-party UID/GID values from the chart configuration and includes the necessary
fsGroupfor volume permission handling.
1-105: YAMLlint warnings about braces are false positives for Helm templates.The YAMLlint tool is reporting "too many spaces inside braces" errors for standard Helm template syntax (e.g.,
{{ include "clp.fullname" . }}). This formatting is conventional and expected in Helm templates for readability. These warnings can be safely ignored or the YAMLlint configuration should be adjusted to exclude Helm template patterns to reduce noise.
74-82: Verify ConfigMap contains required webUI settings keys.The deployment mounts
webui-client-settings.jsonandwebui-server-settings.jsonfrom the config ConfigMap using subPath references. Confirm thattools/deployment/package-helm/templates/configmap.yamldefines these exact keys in the ConfigMap data.
48-49: Verify image reference helper function exists and is properly defined.The deployment uses the
clp.image.refhelper function for the container image at line 48. Confirm that this helper is defined in_helpers.tpland returns the correct webUI image reference.
61-73: Confirm hardcoded PORT aligns with chart configuration.The webUI port is hardcoded to
4000across multiple locations (env var PORT, containerPort, and tcpSocket). Verify that this port value is intentionally fixed in the chart design and that the correspondingwebui-service.yamltemplate also uses this same port (or references it from a shared configuration).
103-105: Verify streams PVC is created before webUI deployment.The deployment references a
streamsPersistentVolumeClaim (lines 103-105). Confirm that the corresponding PVC template creates the PVC with the correct name pattern matching{{ include "clp.fullname" . }}-streamsand that the PVC is deployed before this webUI deployment.
89-95: Verify probe timing helper functions are defined and configured appropriately.The readiness and liveness probes use
clp.readinessProbeTimingsandclp.livenessProbeTimingshelper functions. Confirm these helpers are properly defined in_helpers.tplwith appropriate timeout and retry values for a Node.js web service (consider initialDelaySeconds, periodSeconds, timeoutSeconds, and failureThreshold).
kirkrodrigues
left a comment
There was a problem hiding this comment.
Partial review. Some questions are kind of important to answer before we move on.
| {{- include "clp.createLocalPv" (dict | ||
| "root" . | ||
| "name" (printf "%s-archives" (include "clp.fullname" .)) | ||
| "component" "archives" |
There was a problem hiding this comment.
Maybe we should have some kind of naming convention to differentiate runtime components from "data" components?
There was a problem hiding this comment.
sure. any ideas? i'm thinking about "component" "clp-runtime" now, but that's probably a bad idea?
There was a problem hiding this comment.
Sorry.. i definitely see their distinctions, but for some reason i was mistakenly thinking that you were referring to "archive" as a runtime component and "database" as a data component...
instead of thinking about "runtime", let's analyze their characteristics first:
- database-logs-pv, database-data-pv -> component="database" makes sense because the storage belongs to the "database" service
- archives-pv -> component="archives" can be misleading because "archives" isn't a service, it's shared storage used by compression-worker, query-worker, garbage-collector, etc.
we should still keep the "app.kubernetes.io/component:" label for both types of PV because it's a "well-known" label.
though, i think this is what we can do:
- Service-owned: Use service name: database, compression-scheduler
- Shared storage: Prefix with
shared-: shared-archives, shared-streams, shared-staged-streams
There was a problem hiding this comment.
actually, let's try this:
0. update createLocalPv to handle name differently:
{{/*
Creates a local PersistentVolume.
@param {object} root Root template context
@param {string} component Component category (e.g., "shared-data", "database")
@param {string} name Storage name (e.g., "archives", "logs", "data")
@param {string} nodeRole Node role for affinity. Targets nodes with label
"node-role.kubernetes.io/<nodeRole>". Always falls back to
"node-role.kubernetes.io/control-plane"
@param {string} capacity Storage capacity
@param {string[]} accessModes Access modes
@param {string} hostPath Absolute path on host
@return {string} YAML-formatted PersistentVolume resource
*/}}
{{- define "clp.createLocalPv" -}}
...
metadata:
name: {{ include "clp.fullname" .root }}-{{ .component }}-{{ .name }}
labels:
{{- include "clp.labels" .root | nindent 4 }}
app.kubernetes.io/component: {{ .component | quote }}
...
{{- end }}
for
- pass "shared-data" as the
component(orshared-storage) - pass "archive" as the
name
There was a problem hiding this comment.
Sgtm. Can we rename component to category or component_category?
There was a problem hiding this comment.
Could this have been a statefulset?
There was a problem hiding this comment.
mind explaining why?
for example, the reasons why we are using statefulset for the database are:
- it can have multiple replicas with a stable network identity (though i know we have only one for DB for now)
- the compression scheduler shouldn't need multiple replicas (at least for now?)
- the (celery) compression scheduler shouldn't need network identity because communications happen over DB / queue / redis.
- when multiple replicas are created, the pod creation / deletion are ordered
- again, the compression scheduler is single replica
- the database data (states) stays on a single host so we want to ensure each pod gets a persistent and predictable identify, so its own PVC stays bound to that identity across restarts/rescheduling
- we are creating PVCs for the compression scheduler only for logs. the compression scheduler's state is maintained in the databse
| runAsGroup: {{ .Values.securityContext.firstParty.gid }} | ||
| fsGroup: {{ .Values.securityContext.firstParty.gid }} | ||
| initContainers: | ||
| - name: "wait-for-database" |
There was a problem hiding this comment.
Don't we need to wait for the db-table-creator job? Is that not possible in K8s?
There was a problem hiding this comment.
getting the status of other pods require certain k8s privileges, and there were some difficulties in getting that privilege (i.e. setting up service account) when i tried that previously. there were no correctness issues with the current setup (as in, if the table is not created, those services would exit with an exception and be restarted by k8s) so i just left it.
now i have figured out how to configure the service accounts. i think for all services we should use kubectl wait for waiting containers from now on.
| imagePullPolicy: "{{ .Values.image.clpPackage.pullPolicy }}" | ||
| env: | ||
| - name: "BROKER_URL" | ||
| value: "amqp://{{ .Values.credentials.queue.username }}:{{ .Values.credentials.queue.password }}@{{ include "clp.fullname" . }}-queue:5672" |
There was a problem hiding this comment.
Can we wrap these long lines?
There was a problem hiding this comment.
instead, i created a helper to generate this env var so we don't have to repeat this 4 times
| {{- include "clp.createLocalPv" (dict | ||
| "root" . | ||
| "name" (printf "%s-compression-scheduler-logs" (include "clp.fullname" .)) | ||
| "component" "compression-scheduler-logs" |
There was a problem hiding this comment.
Should this just be compression-scheduler (similar to the database logs volume)?
There was a problem hiding this comment.
right. sorry for the typo
There was a problem hiding this comment.
i also updated the prototype of createLocalPv to match createPvc
| "nodeRole" "control-plane" | ||
| "capacity" "5Gi" | ||
| "accessModes" (list "ReadWriteOnce") | ||
| "hostPath" (printf "%s/compression-scheduler" .Values.clpConfig.logs_directory) |
There was a problem hiding this comment.
Hm, we're using hyphen here but in the volume mount, we use underscore. Any reason we can't pick one for consistency?
There was a problem hiding this comment.
sorry it's a typo. we should use compression_scheduler consistently, until the day we change the service names to use - for both docker compose and k8s (when things are more stabled
There was a problem hiding this comment.
do note that staged-archives will use hyphen, though it's not shown in the current PR (no s3 support here yet)
| @@ -0,0 +1,6 @@ | |||
| {{- include "clp.createPvc" (dict | |||
| "root" . | |||
| "component" "compression-scheduler-logs" | |||
There was a problem hiding this comment.
Like in the PV, it's seem a bit odd to me to pass in compression-scheduler-logs rather than compression-scheduler for the component.
There was a problem hiding this comment.
right, let's add a name field for this createPvc helper
| - name: "archives" | ||
| persistentVolumeClaim: | ||
| claimName: {{ include "clp.fullname" . }}-archives |
There was a problem hiding this comment.
Do you think it's useful to deduplicate things like these volumes since we're repeating them in different deployments? That would avoid cases where we have a typo in "archives" or something.
There was a problem hiding this comment.
sure. let me create some helpers for those
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tools/deployment/package-helm/values.yaml(2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-11-28T15:12:53.530Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.530Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
Applied to files:
tools/deployment/package-helm/values.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build (ubuntu-24.04)
🔇 Additional comments (10)
tools/deployment/package-helm/values.yaml (10)
21-21: Confirm workerConcurrency aligns with deployment resource planning.The
workerConcurrency: 8default should be verified to match resource requests (CPU/memory) in the compression-worker and query-worker deployment templates to prevent resource contention or over-subscription.
42-45: Clarify intent of max_concurrent_tasks_per_job: 0 default.The
max_concurrent_tasks_per_job: 0default disables concurrent task limiting, which may lead to resource exhaustion or unexpected queueing behavior. Confirm whether unlimited concurrency is the intended default for production deployments, or if a reasonable limit (e.g., based on worker count orworkerConcurrency) should be the default instead.
47-51: Configuration looks sound.The query_scheduler settings are reasonable: port 7000 is non-privileged, polling delay of 0.1s is consistent with compression_scheduler, and archive search batching of 16 is a sensible default for query distribution.
57-59: Configuration is appropriate.The reducer settings are reasonable: INFO logging level and 100ms upsert interval (for batching database writes) are sensible defaults.
66-67: Logging configuration is appropriate.The compression_worker logging level default is sensible.
69-70: Logging configuration is appropriate.The query_worker logging level default is sensible. Ensure downstream deployment templates mount
stream_output.storage.directoryas read-write (not read-only) since query-workers write to this directory.
72-75: WebUI configuration is sound.Port 30400 is non-privileged, collection name and rate limit of 1000 are sensible defaults.
77-103: Archive output configuration is well-tuned.The archive configuration is sensible: 256 MB archive size, 32 MB dictionaries, compression level 3 (moderate speed/ratio balance), and null retention (auto-deletion disabled) are reasonable defaults. The path
/tmp/clp/var/data/archivesis isolated from streams and logs. Documentation is clear.
105-114: Stream output configuration is appropriate.The stream output path
/tmp/clp/var/data/streamswith 128 MB target file size is sensible. Path is isolated from archives and logs, and is dedicated to query-worker stream writes (which require read-write access).
127-131: Temporary directory configuration is sound.The
tmp_directory: "/tmp/clp/var/tmp"path is isolated and appropriate for runtime temporary data.
| # Location (e.g., directory) containing any logs you wish to compress. Must be reachable by all | ||
| # workers. | ||
| logs_input: | ||
| type: "fs" | ||
|
|
||
| # NOTE: This directory will be exposed inside the container, so symbolic links to files outside | ||
| # this directory will be ignored. | ||
| directory: "/" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find the values.yaml file and read extended context
fd -e yaml -e yml | grep -i values.yaml | head -5Repository: y-scope/clp
Length of output: 96
🏁 Script executed:
# Read the values.yaml file to get full context around lines 28-35
cat -n tools/deployment/package-helm/values.yaml | head -60Repository: y-scope/clp
Length of output: 1778
🏁 Script executed:
# Search for how logs_input.directory is used in the Helm templates
find tools/deployment/package-helm -type f \( -name "*.yaml" -o -name "*.yml" -o -name "*.tpl" \) | xargs grep -l "logs_input\|directory" | head -10Repository: y-scope/clp
Length of output: 699
🏁 Script executed:
# Search more specifically for logs_input usage in templates
rg "logs_input" tools/deployment/package-helm/templates/ -A 3 -B 1Repository: y-scope/clp
Length of output: 1944
🏁 Script executed:
# Read the configmap.yaml file to understand the logs_input configuration
cat -n tools/deployment/package-helm/templates/configmap.yaml | grep -A 10 -B 10 "logs_input"Repository: y-scope/clp
Length of output: 2012
🏁 Script executed:
# Search for all uses of .Values.clpConfig.logs_input in the templates
rg "\.Values\.clpConfig\.logs_input" tools/deployment/package-helm/templates/Repository: y-scope/clp
Length of output: 464
🏁 Script executed:
# Read the compression-worker-deployment.yaml to see the full context of the hostPath mount
cat -n tools/deployment/package-helm/templates/compression-worker-deployment.yaml | grep -A 15 -B 5 "logs_input.directory"Repository: y-scope/clp
Length of output: 504
🏁 Script executed:
# Read the compression-worker-deployment.yaml to find the volumeMounts for logs-input
cat -n tools/deployment/package-helm/templates/compression-worker-deployment.yaml | grep -A 3 -B 3 "logs-input"Repository: y-scope/clp
Length of output: 679
🏁 Script executed:
# Check compression-scheduler-deployment.yaml for the same volume configuration
cat -n tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml | grep -A 3 -B 3 "logs-input"Repository: y-scope/clp
Length of output: 675
🏁 Script executed:
# Look for README or documentation in the Helm chart directory
find tools/deployment/package-helm -type f \( -name "README*" -o -name "*.md" \) | head -5Repository: y-scope/clp
Length of output: 37
🏁 Script executed:
# Search for any documentation about logs_input in the repository
rg "logs_input|logs-input" --type md -i | head -20Repository: y-scope/clp
Length of output: 854
🏁 Script executed:
# Read the relevant design documentation
cat docs/src/dev-docs/design-deployment-orchestration.md | grep -A 20 -B 5 "logs_input"Repository: y-scope/clp
Length of output: 1239
🏁 Script executed:
# Read the multi-host guide to see how logs_input is documented for users
cat docs/src/user-docs/guides-multi-host.md | grep -A 10 -B 5 "logs_input"Repository: y-scope/clp
Length of output: 592
🏁 Script executed:
# Read the exact context around lines 28-35 in values.yaml with more detail
sed -n '25,40p' tools/deployment/package-helm/values.yamlRepository: y-scope/clp
Length of output: 499
Clarify that logs_input.directory should be user-configured; document the readOnly mount protection.
The default directory: "/" in values.yaml is too permissive for the logs_input mount. Although the hostPath volume is mounted read-only, the comment should explicitly note that users should override this to point to their specific logs directory (e.g., /var/log or a shared filesystem path containing logs). Add a note clarifying that the read-only mount restricts workers to read access only.
🤖 Prompt for AI Agents
In tools/deployment/package-helm/values.yaml around lines 28 to 35, the default
logs_input.directory is set to "/" which is too permissive and lacks guidance;
update the comment to instruct users to override this value to point to their
specific logs directory (for example /var/log or a shared filesystem path) and
explicitly state that the hostPath volume is mounted read-only so workers have
read-only access to that directory; keep the default value but document that it
must be user-configured for safety and that symbolic links outside the mounted
directory will be ignored.
…gory naming Update the PersistentVolume helper to use component_category and name parameters for consistent naming across all resources.
… naming Update the PersistentVolumeClaim helper to use component_category and name parameters for consistent naming across PV and PVC resources.
- Add clp.pvcVolume helper template that creates a volume definition referencing a PersistentVolumeClaim, using component_category and name parameters to generate consistent naming. - Update deployments to use the new pvcVolume helper for volumes: - compression-scheduler-deployment.yaml - compression-worker-deployment.yaml - query-scheduler-deployment.yaml - query-worker-deployment.yaml - reducer-deployment.yaml - webui-deployment.yaml - Update volume names to use component_category prefix: - "archives" -> "shared-data-archives" - "streams" -> "shared-data-streams" - "tmp" -> "compression-worker-tmp" - Reorder volumeMounts in deployments and statefulsets alphabetically.
- Add clp.celeryBrokerUrlEnvVar helper template that creates the BROKER_URL environment variable for Celery workers/schedulers. - Update deployments to use the new helper: - compression-scheduler-deployment.yaml - compression-worker-deployment.yaml - query-scheduler-deployment.yaml - query-worker-deployment.yaml - Reorder env vars: template helpers first, then alphabetically sorted.
- Add clp.celeryResultBackendEnvVar helper template that creates the RESULT_BACKEND environment variable for Celery workers/schedulers. Takes root context and database number as parameters. - Update deployments to use the new helper: - compression-scheduler-deployment.yaml - compression-worker-deployment.yaml - query-scheduler-deployment.yaml - query-worker-deployment.yaml
7ee56dc to
37ea46b
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
tools/deployment/package-helm/test.sh (1)
19-22: Add error handling for the background download.The wget pipeline runs in the background without error capture, so failures are silently ignored until the
waitcommand attempts to synchronize. If the download fails, the script proceeds tohelm installwith incomplete sample data.Wrap the pipeline in a subshell to preserve its exit status:
# Download sample datasets in the background -(wget -O - https://zenodo.org/records/10516402/files/postgresql.tar.gz?download=1 \ +((wget -O - https://zenodo.org/records/10516402/files/postgresql.tar.gz?download=1 \ | tar xz -C /tmp/clp/samples) || echo "1" > /tmp/clp/download_failed) & SAMPLE_DOWNLOAD_PID=$!Then check the status after waiting:
wait $SAMPLE_DOWNLOAD_PID +if [ -f /tmp/clp/download_failed ]; then + echo "ERROR: Sample download or extraction failed" + exit 1 +fi echo "Sample download and extraction complete"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 1758404 and 37ea46bfc29e5cfb85191d1764feaf34fb9f3ef4.
📒 Files selected for processing (41)
tools/deployment/package-helm/templates/_helpers.tpl(3 hunks)tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/compression-scheduler-user-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/compression-scheduler-user-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-tmp-pv.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-tmp-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/database-data-pv.yaml(1 hunks)tools/deployment/package-helm/templates/database-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/database-statefulset.yaml(1 hunks)tools/deployment/package-helm/templates/db-table-creator-job.yaml(1 hunks)tools/deployment/package-helm/templates/job-watcher-role.yaml(1 hunks)tools/deployment/package-helm/templates/job-watcher-rolebinding.yaml(1 hunks)tools/deployment/package-helm/templates/job-watcher-serviceaccount.yaml(1 hunks)tools/deployment/package-helm/templates/query-scheduler-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/query-scheduler-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/query-scheduler-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/query-worker-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/query-worker-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/query-worker-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/queue-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/redis-data-pv.yaml(1 hunks)tools/deployment/package-helm/templates/redis-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/redis-statefulset.yaml(1 hunks)tools/deployment/package-helm/templates/reducer-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/reducer-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/reducer-logs-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/results-cache-data-pv.yaml(1 hunks)tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml(2 hunks)tools/deployment/package-helm/templates/results-cache-logs-pv.yaml(1 hunks)tools/deployment/package-helm/templates/results-cache-statefulset.yaml(1 hunks)tools/deployment/package-helm/templates/shared-data-archives-pv.yaml(1 hunks)tools/deployment/package-helm/templates/shared-data-archives-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/shared-data-streams-pv.yaml(1 hunks)tools/deployment/package-helm/templates/shared-data-streams-pvc.yaml(1 hunks)tools/deployment/package-helm/templates/webui-deployment.yaml(1 hunks)tools/deployment/package-helm/test.sh(2 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/templates/shared-data-streams-pv.yamltools/deployment/package-helm/templates/query-worker-deployment.yamltools/deployment/package-helm/templates/query-worker-logs-pv.yamltools/deployment/package-helm/templates/query-worker-logs-pvc.yaml
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-05-06T10:07:04.654Z
Learnt from: kirkrodrigues
Repo: y-scope/clp PR: 881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-04-17T16:55:06.658Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 831
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2025-04-17T16:55:06.658Z
Learning: In the compression scheduler, the team prefers initializing in-memory caches from the database at startup rather than performing repeated database queries for efficiency reasons. This approach maintains both performance and reliability across process restarts.
Applied to files:
tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yamltools/deployment/package-helm/templates/compression-scheduler-deployment.yamltools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yamltools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yamltools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yamltools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
tools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml
🪛 Shellcheck (0.11.0)
tools/deployment/package-helm/test.sh
[warning] 12-12: This { is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[warning] 13-13: This } is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/compression-scheduler-user-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/job-watcher-serviceaccount.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
tools/deployment/package-helm/templates/shared-data-streams-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/shared-data-streams-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/job-watcher-rolebinding.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 9-9: too many spaces inside braces
(braces)
[error] 9-9: too many spaces inside braces
(braces)
[error] 10-10: too many spaces inside braces
(braces)
[error] 10-10: too many spaces inside braces
(braces)
[error] 13-13: too many spaces inside braces
(braces)
[error] 13-13: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml
[error] 16-16: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
[error] 19-19: too many spaces inside braces
(braces)
[error] 19-19: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 39-39: too many spaces inside braces
(braces)
[error] 40-40: too many spaces inside braces
(braces)
[error] 41-41: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-worker-tmp-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/reducer-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-worker-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 28-28: too many spaces inside braces
(braces)
[error] 34-34: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 78-78: too many spaces inside braces
(braces)
[error] 83-83: too many spaces inside braces
(braces)
[error] 88-88: too many spaces inside braces
(braces)
[error] 91-91: too many spaces inside braces
(braces)
[error] 91-91: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-worker-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/shared-data-archives-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-worker-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/job-watcher-role.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
tools/deployment/package-helm/templates/reducer-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 33-33: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 47-47: too many spaces inside braces
(braces)
[error] 47-47: too many spaces inside braces
(braces)
[error] 71-71: too many spaces inside braces
(braces)
[error] 74-74: too many spaces inside braces
(braces)
[error] 74-74: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/query-scheduler-logs-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/webui-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 31-31: too many spaces inside braces
(braces)
[error] 36-36: too many spaces inside braces
(braces)
[error] 45-45: too many spaces inside braces
(braces)
[error] 45-45: too many spaces inside braces
(braces)
[error] 50-50: too many spaces inside braces
(braces)
[error] 50-50: too many spaces inside braces
(braces)
[error] 61-61: too many spaces inside braces
(braces)
[error] 61-61: too many spaces inside braces
(braces)
[error] 81-81: too many spaces inside braces
(braces)
[error] 85-85: too many spaces inside braces
(braces)
[error] 92-92: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 33-33: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 43-43: too many spaces inside braces
(braces)
[error] 49-49: too many spaces inside braces
(braces)
[error] 53-53: too many spaces inside braces
(braces)
[error] 57-57: too many spaces inside braces
(braces)
[error] 57-57: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
[error] 90-90: too many spaces inside braces
(braces)
[error] 93-93: too many spaces inside braces
(braces)
[error] 93-93: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 31-31: too many spaces inside braces
(braces)
[error] 36-36: too many spaces inside braces
(braces)
[error] 41-41: too many spaces inside braces
(braces)
[error] 47-47: too many spaces inside braces
(braces)
[error] 51-51: too many spaces inside braces
(braces)
[error] 55-55: too many spaces inside braces
(braces)
[error] 55-55: too many spaces inside braces
(braces)
[error] 60-60: too many spaces inside braces
(braces)
[error] 60-60: too many spaces inside braces
(braces)
[error] 63-63: too many spaces inside braces
(braces)
[error] 63-63: too many spaces inside braces
(braces)
[error] 90-90: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/query-scheduler-logs-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/db-table-creator-job.yaml
[error] 16-16: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
[error] 19-19: too many spaces inside braces
(braces)
[error] 19-19: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-worker-tmp-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/reducer-logs-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-scheduler-user-logs-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-worker-logs-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-worker-logs-pvc.yaml
[error] 7-7: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 28-28: too many spaces inside braces
(braces)
[error] 34-34: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 81-81: too many spaces inside braces
(braces)
[error] 86-86: too many spaces inside braces
(braces)
[error] 91-91: too many spaces inside braces
(braces)
[error] 94-94: too many spaces inside braces
(braces)
[error] 94-94: too many spaces inside braces
(braces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (45)
tools/deployment/package-helm/templates/results-cache-statefulset.yaml (2)
41-44: Verify volumeMount ordering consistency across volumeClaimTemplates.The volumeMounts have been reordered (results-cache-data and results-cache-logs now follow the config mount). Confirm that this ordering aligns with the order of volumeClaimTemplates defined at lines 63-87. While this reordering is non-functional, maintaining consistency between volumeMount and volumeClaimTemplates order improves maintainability.
21-24: > Likely an incorrect or invalid review comment.tools/deployment/package-helm/templates/database-statefulset.yaml (1)
60-63: Cosmetic volumeMount reordering.The reordering of database-data and database-logs mounts after the config mount preserves the existing PVC contract and has no functional impact on storage binding or pod lifecycle.
tools/deployment/package-helm/templates/db-table-creator-job.yaml (2)
16-16: Correct addition of RBAC serviceAccountName.The serviceAccountName aligns with the job-watcher RBAC pattern introduced in this PR for job monitoring.
23-27: Verify clp.waitFor helper template produces correct init container.The clp.waitFor helper is being used with type "service" and name "database" to wait for the database service. Ensure the helper is defined correctly in _helpers.tpl and produces a valid init container that correctly resolves the service name and waits for readiness. If using kubectl-based waiting, verify that appropriate RBAC permissions and ServiceAccount configuration are in place for the init container to access cluster resources.
tools/deployment/package-helm/templates/redis-statefulset.yaml (2)
48-51: Cosmetic volumeMount reordering.The reordering preserves the existing storage contract and has no functional impact.
55-61: Readiness probe correctly updated for authenticated Redis.The readiness probe now includes the
-aflag with REDIS_PASSWORD, matching the Redis server's--requirepassconfiguration (line 32) and ensuring authentication is validated during health checks.tools/deployment/package-helm/templates/job-watcher-serviceaccount.yaml (1)
1-6: ServiceAccount correctly structured for job-watcher RBAC pattern.The serviceaccount name uses the release-based naming convention and includes proper labels for cluster discovery. This serves as the foundation for the new RBAC job-watching workflow.
tools/deployment/package-helm/templates/reducer-logs-pvc.yaml (1)
1-7: Helper invocation consistent with other PVC patterns in PR.The PVC follows the established clp.createPvc pattern with appropriate component category, name, and capacity for reducer logs storage.
tools/deployment/package-helm/templates/compression-worker-logs-pvc.yaml (1)
1-7: Helper invocation consistent with established PVC pattern.The compression-worker logs PVC follows the same clp.createPvc pattern as other components with appropriate capacity and access mode.
tools/deployment/package-helm/templates/query-scheduler-logs-pvc.yaml (1)
1-7: Helper invocation consistent with established PVC pattern.The query-scheduler logs PVC follows the established clp.createPvc pattern with appropriate component category and capacity.
tools/deployment/package-helm/templates/shared-data-streams-pvc.yaml (1)
1-7: clp.createPvc helper correctly generates PVC with matching selector labels and storage class.The helper definition confirms the PVC will include selector labels with
app.kubernetes.io/component: "shared-data"matching the corresponding PV (shared-data-streams-pv.yaml), and both templates usestorageClassName: "local-storage"for proper local storage binding. The PVC configuration is correct.tools/deployment/package-helm/templates/database-data-pv.yaml (1)
1-9: Standardized PV naming follows established pattern.The refactoring to use component_category and fixed name aligns with the chart-wide update shown in other PV templates. The parameter structure is consistent.
tools/deployment/package-helm/templates/compression-scheduler-user-logs-pvc.yaml (1)
1-7: New PVC template correctly structured; YAMLlint warnings are false positives.The new PVC uses the clp.createPvc helper with proper component_category/name separation and reasonable capacity/access modes. The YAMLlint syntax errors are misreporting valid Helm template syntax (line 1:
includedirective; line 7: Go template dict syntax).tools/deployment/package-helm/templates/query-worker-logs-pvc.yaml (1)
1-7: Query-worker logs PVC correctly configured.PVC parameters align with the compression-scheduler PVC pattern. ReadWriteOnce is appropriate for single-instance log volumes, and the 5Gi capacity matches the corresponding PV.
tools/deployment/package-helm/templates/results-cache-data-pv.yaml (2)
3-4: Component naming refactor applied consistently.The component_category/name updates follow the established pattern across the chart.
8-8: Theresults_cacheunderscore naming is consistent across the codebase.Line 8 correctly uses
results_cache(underscore), which aligns with:
- How it's defined in
values.yamlat line 61 (results_cache:)- How it's referenced in the corresponding logs PV template (
results-cache-logs-pv.yaml)- The pattern used by all other multi-word directory names in hostPath declarations:
compression_scheduler,compression_worker,query_scheduler,query_worker- The directory structure in
test.shwhere the directory is created asresults_cacheKubernetes identifiers (component names, services) use hyphens (
results-cache), but the actual directory paths consistently use underscores. No changes required.tools/deployment/package-helm/templates/shared-data-archives-pvc.yaml (1)
1-7: Shared archives PVC properly configured for multi-consumer access.The ReadWriteMany access mode and 50Gi capacity are appropriate for shared archive storage serving multiple compression/query workers.
tools/deployment/package-helm/templates/compression-scheduler-logs-pvc.yaml (1)
1-7: Compression scheduler logs PVC correctly implements resolved naming pattern.The component_category/name split (compression-scheduler + logs) matches the naming convention resolved in earlier review discussion and applied consistently across similar components.
tools/deployment/package-helm/templates/query-worker-logs-pv.yaml (1)
1-9: Query-worker logs PV properly paired with PVC and uses consistent hostPath naming.The PV configuration matches the query-worker-logs-pvc.yaml (same component_category, name, capacity, accessModes). The hostPath uses underscore convention (
query_worker) which appears to be a deliberate standardization across the chart refactor.tools/deployment/package-helm/templates/redis-logs-pv.yaml (1)
1-9: Component naming refactor applied; verify hostPath naming consistency.The component_category/name refactoring follows the established pattern. However, the hostPath
redis(line 8) uses different naming than some peer components (e.g.,query_worker,results_cacheuse underscores). Confirm whether this inconsistency is intentional or should be standardized.tools/deployment/package-helm/templates/compression-scheduler-user-logs-pv.yaml (1)
1-9: LGTM — separate user-logs PV follows the pattern.This new PV cleanly separates user-facing logs from system logs, using the established
clp.createLocalPvpattern with appropriate parameters. The 10Gi capacity and ReadWriteMany access are suitable for user log storage.Note: The static analysis reports (YAMLlint errors) are false positives due to Helm template syntax; the
{{-and brace spacing in Helm dicts are standard and correct.tools/deployment/package-helm/templates/database-logs-pv.yaml (1)
1-8: LGTM — PV naming refactoring is consistent.The update from dynamic name generation to fixed name "logs" and from "component" to "component_category" aligns with the broader standardization across the chart. The refactoring is straightforward and maintains all other configuration intact.
tools/deployment/package-helm/templates/compression-scheduler-logs-pv.yaml (1)
1-9: LGTM — compression-scheduler logs PV follows established naming conventions.The underscore notation in the hostPath (
compression_scheduler) is correct and consistent with the PR's decision to use underscores until service names are unified. The 5Gi capacity and ReadWriteOnce mode are appropriate for scheduler logs.tools/deployment/package-helm/templates/shared-data-streams-pv.yaml (1)
1-9: LGTM — ReadWriteMany access is essential for query-worker writes.The
shared-datastreams PV correctly usesReadWriteMany, which is necessary for query-worker to write results to the streams directory (per PR #1690 learnings). The 20Gi capacity and component_category/name structure align with the refactoring pattern.tools/deployment/package-helm/templates/job-watcher-rolebinding.yaml (1)
1-14: LGTM — RBAC RoleBinding is properly structured.The RoleBinding correctly binds the
job-watcherServiceAccount to thejob-readerRole with proper Kubernetes RBAC v1 API versioning and standard metadata. The use of fullname templates ensures consistent naming with other chart resources.Note: Static analysis reports are false positives due to Helm template syntax being misinterpreted by YAMLlint.
tools/deployment/package-helm/templates/query-scheduler-logs-pv.yaml (1)
1-9: LGTM — query-scheduler logs PV mirrors compression-scheduler pattern correctly.The underscore notation in hostPath (
query_scheduler) maintains consistency with prior decisions. The 5Gi capacity and ReadWriteOnce access are appropriate for scheduler logs.tools/deployment/package-helm/templates/redis-data-pv.yaml (1)
1-9: LGTM — redis-data PV refactoring aligns with chart standardization.The update to fixed name "data" and component_category "redis" is consistent with the broader PV naming conventions. The 20Gi capacity and ReadWriteOnce mode are suitable for Redis data storage.
tools/deployment/package-helm/templates/compression-worker-tmp-pv.yaml (1)
1-9: LGTM — temporary storage PV is appropriately sized and accessible.The
ReadWriteManyaccess mode allows compression workers to share temporary space, and 10Gi is a reasonable capacity for intermediate compression data. The direct reference to.Values.clpConfig.tmp_directory(without component-specific suffixing) correctly reflects that temp storage is shared.Verify that
.Values.clpConfig.tmp_directoryis properly defined invalues.yamlwith an appropriate host path value.tools/deployment/package-helm/templates/compression-worker-tmp-pvc.yaml (1)
1-7: Template follows established PVC pattern.The helper invocation is correct and consistent with other PVC templates in the PR. The YAMLlint errors reported are false positives related to Helm's
{{-syntax.tools/deployment/package-helm/templates/results-cache-logs-pv.yaml (1)
3-4: Component naming migration is consistent.The update correctly applies the component_category + static name pattern, with hostPath using the underscore convention ("results_cache"). This aligns with the broader naming refactoring across the chart.
Also applies to: 8-8
tools/deployment/package-helm/templates/reducer-logs-pv.yaml (1)
1-9: New PV template follows established pattern.The reducer logs PV is correctly configured with appropriate capacity (5Gi), access mode, and node affinity. Naming and path structure align with other component PVs in the chart.
tools/deployment/package-helm/templates/compression-worker-logs-pv.yaml (1)
1-9: New PV template is correctly configured.The compression-worker logs PV follows the established pattern with appropriate sizing and path naming using underscores in the hostPath.
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml (1)
1-9: Shared data PV correctly configured for multi-component access.The use of ReadWriteMany access mode and larger capacity (50Gi) is appropriate for shared archive storage accessed by multiple worker components. Configuration aligns with storage architecture.
tools/deployment/package-helm/templates/job-watcher-role.yaml (1)
1-13: RBAC Role correctly scoped for job and pod observation.Permissions are minimally scoped to read-only operations (get, list, watch) across batch jobs and core pods. This follows the principle of least privilege appropriate for monitoring job orchestration.
tools/deployment/package-helm/templates/queue-logs-pv.yaml (1)
3-4: Component naming update is consistent with migration pattern.The transition to component_category "queue" with static name "logs" aligns with the broader refactoring of naming conventions across local PV/PVC resources.
tools/deployment/package-helm/templates/query-worker-deployment.yaml (4)
20-25: Verify security context UID/GID selection.The deployment uses firstParty UID/GID for the security context. Per project learnings, compression-scheduler uses firstParty values, while infrastructure services (database, queue, redis) use service container UID/GID (999:999). Please confirm that query-worker should use firstParty rather than the service container UID/GID.
33-50: Environment configuration is comprehensive.Celery broker/result backend, CLP home/config, logging paths, and Python path are all properly configured. The logging level is configurable per values. Environment setup appears complete for query-worker operation.
51-72: Volume configuration and Celery command are properly set up.Volume mounts include read-only config, writable logs and data directories (archives and streams). The shared-data-streams mount is correctly not marked read-only, as the learning indicates query-worker writes to this directory. Celery worker command properly specifies queue name "query", concurrency from values, and appropriate logging configuration.
73-91: Code snippet appears truncated at configMap volume definition.The volumes section correctly defines four volumes using PVC references and a ConfigMap, but the code snippet ends at line 91 showing only the configMap name without the complete volume definition structure. Please verify the full configMap volume block is present in the actual file (should include the
configMap:key with name specification).tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml (1)
16-27: Job wiring for results-cache looks sound and consistent.Using the
{{ include "clp.fullname" . }}-job-watcherservice account together with the sharedclp.waitForhelper cleanly centralizes readiness logic, and the updated--uri/--stream-collectionarguments correctly bind to theresults_cacheconfig values. No issues from my side here.Also applies to: 38-41
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml (1)
20-41: Compression scheduler deployment looks consistent and aligned with expectations.The use of
securityContext.firstPartyforrunAsUser/runAsGroupand the shared Celery/env helpers matches the intended model for the compression scheduler running under CLP_UID:GID rather than the infra service UID. The dependency waits (db-table-creator, queue, redis), PVC volumes, and hostPath logs-input wiring all look coherent.Based on learnings, this matches the expected scheduler runtime identity.
Also applies to: 46-80, 85-102
tools/deployment/package-helm/templates/query-scheduler-deployment.yaml (1)
20-43: Query scheduler deployment is consistent and well-factored.The init dependency chain, Celery broker/backend envs, securityContext, stable hostname/subdomain, and PVC/config wiring all follow the same pattern as the compression scheduler and look correct for the query path.
Also applies to: 48-84, 86-93
tools/deployment/package-helm/templates/webui-deployment.yaml (1)
20-37: WebUI deployment wiring and configurability look good.The switch to using
.Values.clpConfig.webui.rate_limitforRATE_LIMIT, combined with the shared-data streams PVC, settings ConfigMaps, and TCP health probes on port 4000, gives a coherent and configurable deployment. The init waits for db-table-creator and results-cache-indices-creator also align well with the rest of the chart.Also applies to: 41-61, 80-86, 88-98
tools/deployment/package-helm/templates/_helpers.tpl (1)
97-136: PV/PVC helpers look consistent and should reduce copy-paste errors.The
clp.createLocalPv,clp.createPvc, andclp.pvcVolumehelpers give a clear, uniform naming scheme (<fullname>-<component_category>-<name>for PV/PVC,<component_category>-<name>for volume names) and ensure the PVC selectors match the PV labels viaclp.selectorLabelsand theapp.kubernetes.io/componentlabel. This should make future storage additions less error-prone.Also applies to: 138-180
| {{- define "clp.celeryBrokerUrlEnvVar" -}} | ||
| {{- $user := .Values.credentials.queue.username -}} | ||
| {{- $pass := .Values.credentials.queue.password -}} | ||
| {{- $host := printf "%s-queue" (include "clp.fullname" .) -}} | ||
| name: "BROKER_URL" | ||
| value: {{ printf "amqp://%s:%s@%s:5672" $user $pass $host | quote }} | ||
| {{- end }} | ||
|
|
||
| {{/* | ||
| Creates the RESULT_BACKEND env var for Celery workers. | ||
|
|
||
| @param {object} root Root template context | ||
| @param {string} database Redis database number from config | ||
| @return {string} YAML-formatted env var definition | ||
| */}} | ||
| {{- define "clp.celeryResultBackendEnvVar" -}} | ||
| {{- $pass := .root.Values.credentials.redis.password -}} | ||
| {{- $host := printf "%s-redis" (include "clp.fullname" .root) -}} | ||
| name: "RESULT_BACKEND" | ||
| value: {{ printf "redis://default:%s@%s:6379/%d" $pass $host (int .database) | quote }} | ||
| {{- end }} |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Tighten clp.waitFor robustness: pin kubectl image and scope service selector by release.
The shared clp.waitFor helper is a nice consolidation, but two tweaks would make it safer:
- Avoid
bitnami/kubectl:latest– a mutable tag can change behaviour between deploys. - Scope service waits to the Helm release – currently the selector only filters on
app.kubernetes.io/component={{ .name }}, so another release of the chart in the same namespace could satisfy the wait.
A possible adjustment:
{{- define "clp.waitFor" -}}
-name: "wait-for-{{ .name }}"
-image: "bitnami/kubectl:latest"
+name: "wait-for-{{ .name }}"
+image: {{ .root.Values.image.kubectl | default "bitnami/kubectl:1.32.0" | quote }}
command: [
"kubectl", "wait",
{{- if eq .type "service" }}
"--for=condition=ready",
- "pod", "--selector", "app.kubernetes.io/component={{ .name }}",
+ "pod", "--selector",
+ {{ printf "app.kubernetes.io/component=%s,app.kubernetes.io/instance=%s" .name .root.Release.Name | quote }},
{{- else if eq .type "job" }}
"--for=condition=complete",
"job/{{ include "clp.fullname" .root }}-{{ .name }}",
{{- end }}
"--timeout=300s"
]
{{- end }}You’d also add a corresponding image.kubectl default in values.yaml. This keeps waits release-local and avoids surprises from kubectl image drift.
Also applies to: 219-232
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/_helpers.tpl around lines 188-208
(and also apply same change at 219-232), tighten the clp.waitFor helper by (1)
replacing the mutable kubectl image reference with a configurable, pinned image
value (read from .Values.image.kubectl and add a corresponding default in
values.yaml) instead of hardcoding bitnami/kubectl:latest, and (2) scope service
selector to the specific Helm release by adding the release/instance label (for
example include app.kubernetes.io/instance={{ include "clp.fullname" . }} or the
appropriate instance label in the selector) so the wait only matches services
from this release; apply the same two changes to the other occurrence mentioned.
| - name: "CLP_CONFIG_PATH" | ||
| value: "/etc/clp-config.yaml" | ||
| - name: "CLP_HOME" | ||
| value: "/opt/clp" | ||
| - name: "CLP_LOGGING_LEVEL" | ||
| value: {{ .Values.clpConfig.compression_worker.logging_level | quote }} | ||
| - name: "CLP_LOGS_DIR" | ||
| value: "/var/log/compression_worker" | ||
| - name: "CLP_WORKER_LOG_PATH" | ||
| value: "/var/log/compression_worker/worker.log" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Avoid duplicating the worker log path in env and CLI.
Both CLP_WORKER_LOG_PATH and the -f /var/log/compression_worker/worker.log flag encode the same path; if it ever changes, it has to be updated in two places. Consider either dropping the env var or deriving the CLI flag from a single source of truth to reduce drift risk.
Also applies to: 66-74
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 44-44: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
tools/deployment/package-helm/templates/compression-worker-deployment.yaml lines
39-48 (also apply same change to lines 66-74): the CLP_WORKER_LOG_PATH
environment variable duplicates the literal log path used in the container CLI
flag, which creates drift risk; change to a single source of truth by either (A)
remove the hardcoded CLI literal and derive the flag from the env var (use
shell/entrypoint expansion so args include -f $CLP_WORKER_LOG_PATH), or (B)
remove the CLP_WORKER_LOG_PATH env var and render the CLI flag from the Helm
value (keep one place to update); implement the chosen approach consistently for
both occurrences (lines 39-48 and 66-74).
| serviceAccountName: {{ include "clp.fullname" . }}-job-watcher | ||
| hostname: {{ include "clp.fullname" . }}-reducer | ||
| subdomain: {{ include "clp.fullname" . }}-reducer | ||
| terminationGracePeriodSeconds: 10 | ||
| securityContext: | ||
| runAsUser: {{ .Values.securityContext.firstParty.uid }} | ||
| runAsGroup: {{ .Values.securityContext.firstParty.gid }} | ||
| fsGroup: {{ .Values.securityContext.firstParty.gid }} | ||
| initContainers: | ||
| - {{- include "clp.waitFor" (dict | ||
| "root" . | ||
| "type" "service" | ||
| "name" "query-scheduler" | ||
| ) | nindent 10 }} | ||
| - {{- include "clp.waitFor" (dict | ||
| "root" . | ||
| "type" "job" | ||
| "name" "results-cache-indices-creator" | ||
| ) | nindent 10 }} |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding readiness/liveness probes for the reducer.
The dependency wiring and command look good, but the reducer pod currently has no health probes. Adding at least a basic liveness (and ideally readiness) probe will make failures or hangs surface more quickly and improve self-healing behaviour in production.
Also applies to: 40-65
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 33-33: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/reducer-deployment.yaml around lines
20 to 38 (and similarly for lines 40 to 65), the reducer Pod has no liveness or
readiness probes; add both probes to the reducer container spec to detect hangs
and enable self-healing. Add a livenessProbe (e.g., exec or httpGet/tcpSocket
depending on the reducer process) with a short initialDelaySeconds (10–30),
periodSeconds (10–15), and failureThreshold (3) to restart stuck containers, and
add a readinessProbe (same probe type) with a slightly shorter
initialDelaySeconds or conservative timeout to mark the pod unready until it can
serve traffic; place these probe blocks under the reducer container definition
in the Helm template and make any probe path/port configurable via Values so you
can override for env-specific behavior.
- Add clp.waitFor helper template that creates an initContainer to wait for Kubernetes resources (services or jobs) to be ready. Uses kubectl wait instead of nc-based polling. - Add RBAC resources for job-watcher: - job-watcher-serviceaccount.yaml: ServiceAccount for pods - job-watcher-role.yaml: Role with get/list/watch on jobs and pods - job-watcher-rolebinding.yaml: RoleBinding to connect the two - Update all deployments and jobs to use: - serviceAccountName for RBAC permissions - waitFor helper for init containers Affected files: - compression-scheduler-deployment.yaml - compression-worker-deployment.yaml - query-scheduler-deployment.yaml - query-worker-deployment.yaml - reducer-deployment.yaml - webui-deployment.yaml - db-table-creator-job.yaml - results-cache-indices-creator-job.yaml
- Use underscores in directory names to match component naming - Separate data/log directories into distinct mkdir commands for clarity
37ea46b to
5c7f13e
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (4)
tools/deployment/package-helm/templates/reducer-deployment.yaml (1)
40-68: Consider adding resource limits for production readiness.The reducer container lacks resource requests/limits. Without these, the pod may be scheduled on an overcommitted node or consume unbounded resources, potentially affecting cluster stability.
🔎 Suggested addition:
command: [ "python3", "-u", "-m", "job_orchestration.reducer.reducer", "--config", "/etc/clp-config.yaml", "--concurrency", "{{ .Values.workerConcurrency }}", "--upsert-interval", "{{ .Values.clpConfig.reducer.upsert_interval }}" ] + resources: + requests: + memory: "256Mi" + cpu: "100m" + limits: + memory: "512Mi" + cpu: "500m"Regarding health probes: this has already been flagged in a previous review comment. Since the reducer doesn't expose a port, an
execprobe checking the process or a file-based liveness indicator would be needed.tools/deployment/package-helm/templates/query-worker-deployment.yaml (1)
26-28: Verify if these initContainers should be removed.According to a past review comment, you mentioned that these wait conditions "should not exist for the query worker" and that you would "double check and remove them." These initContainers are still present.
Please confirm whether the query-worker actually requires waiting for queue and Redis services before starting, or if these should be removed as originally intended.
tools/deployment/package-helm/templates/_helpers.tpl (1)
226-250: Improve waitFor helper robustness: pin kubectl image and scope selectors.Two improvements would make this helper more reliable in production:
Pin kubectl image version: The
bitnami/kubectl:latesttag is mutable and can change behavior between deployments. Consider pinning to a specific version (e.g.,bitnami/kubectl:1.32.0) or making it configurable via values.Scope service selector to release: Currently, the service wait only filters by component name (Line 243). If multiple Helm releases of this chart exist in the same namespace, pods from a different release could satisfy the wait condition. Add the instance label to scope the wait to this specific release.
🔎 Apply these improvements
{{- define "clp.waitFor" -}} name: "wait-for-{{ .name }}" -image: "bitnami/kubectl:latest" +image: {{ .root.Values.image.kubectl | default "bitnami/kubectl:1.32.0" | quote }} command: [ "kubectl", "wait", {{- if eq .type "service" }} "--for=condition=ready", - "pod", "--selector", "app.kubernetes.io/component={{ .name }}", + "pod", "--selector", + {{ printf "app.kubernetes.io/component=%s,app.kubernetes.io/instance=%s" .name .root.Release.Name | quote }}, {{- else if eq .type "job" }} "--for=condition=complete", "job/{{ include "clp.fullname" .root }}-{{ .name }}", {{- end }} "--timeout=300s" ] {{- end }}Also add to values.yaml:
image: kubectl: "bitnami/kubectl:1.32.0"tools/deployment/package-helm/templates/compression-worker-deployment.yaml (1)
47-48: Consider consolidating the worker log path to a single source.The log path
/var/log/compression_worker/worker.logappears in bothCLP_WORKER_LOG_PATH(Line 48) and the Celery CLI flag (Line 81). If this path needs to change, both locations require updates, creating a drift risk.🔎 Potential approaches
Option 1: Remove the environment variable if it's only used for the Celery flag:
- - name: "CLP_WORKER_LOG_PATH" - value: "/var/log/compression_worker/worker.log"Option 2: If the env var is needed by application code, derive the CLI flag from a Helm value to maintain a single source of truth in values.yaml.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (11)
tools/deployment/package-helm/templates/_helpers.tpl(3 hunks)tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/compression-worker-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/database-statefulset.yaml(3 hunks)tools/deployment/package-helm/templates/query-scheduler-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/query-worker-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/queue-statefulset.yaml(2 hunks)tools/deployment/package-helm/templates/redis-statefulset.yaml(3 hunks)tools/deployment/package-helm/templates/reducer-deployment.yaml(1 hunks)tools/deployment/package-helm/templates/results-cache-statefulset.yaml(3 hunks)tools/deployment/package-helm/templates/webui-deployment.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-04-17T16:55:06.658Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 831
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2025-04-17T16:55:06.658Z
Learning: In the compression scheduler, the team prefers initializing in-memory caches from the database at startup rather than performing repeated database queries for efficiency reasons. This approach maintains both performance and reliability across process restarts.
Applied to files:
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/templates/query-worker-deployment.yamltools/deployment/package-helm/templates/results-cache-statefulset.yaml
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-10-20T21:05:30.417Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-10-22T21:14:12.225Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-10-17T19:59:35.299Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 33-33: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 43-43: too many spaces inside braces
(braces)
[error] 49-49: too many spaces inside braces
(braces)
[error] 53-53: too many spaces inside braces
(braces)
[error] 57-57: too many spaces inside braces
(braces)
[error] 57-57: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
[error] 78-78: too many spaces inside braces
(braces)
[error] 81-81: too many spaces inside braces
(braces)
[error] 93-93: too many spaces inside braces
(braces)
[error] 96-96: too many spaces inside braces
(braces)
[error] 96-96: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 31-31: too many spaces inside braces
(braces)
[error] 36-36: too many spaces inside braces
(braces)
[error] 41-41: too many spaces inside braces
(braces)
[error] 47-47: too many spaces inside braces
(braces)
[error] 51-51: too many spaces inside braces
(braces)
[error] 55-55: too many spaces inside braces
(braces)
[error] 55-55: too many spaces inside braces
(braces)
[error] 60-60: too many spaces inside braces
(braces)
[error] 60-60: too many spaces inside braces
(braces)
[error] 63-63: too many spaces inside braces
(braces)
[error] 63-63: too many spaces inside braces
(braces)
[error] 76-76: too many spaces inside braces
(braces)
[error] 79-79: too many spaces inside braces
(braces)
[error] 81-81: too many spaces inside braces
(braces)
[error] 84-84: too many spaces inside braces
(braces)
[error] 96-96: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 104-104: too many spaces inside braces
(braces)
[error] 104-104: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/query-worker-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 28-28: too many spaces inside braces
(braces)
[error] 34-34: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 56-56: too many spaces inside braces
(braces)
[error] 59-59: too many spaces inside braces
(braces)
[error] 61-61: too many spaces inside braces
(braces)
[error] 64-64: too many spaces inside braces
(braces)
[error] 66-66: too many spaces inside braces
(braces)
[error] 69-69: too many spaces inside braces
(braces)
[error] 87-87: too many spaces inside braces
(braces)
[error] 92-92: too many spaces inside braces
(braces)
[error] 97-97: too many spaces inside braces
(braces)
[error] 100-100: too many spaces inside braces
(braces)
[error] 100-100: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/queue-statefulset.yaml
[error] 46-46: too many spaces inside braces
(braces)
[error] 49-49: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 28-28: too many spaces inside braces
(braces)
[error] 34-34: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 59-59: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 64-64: too many spaces inside braces
(braces)
[error] 67-67: too many spaces inside braces
(braces)
[error] 69-69: too many spaces inside braces
(braces)
[error] 72-72: too many spaces inside braces
(braces)
[error] 90-90: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 100-100: too many spaces inside braces
(braces)
[error] 103-103: too many spaces inside braces
(braces)
[error] 103-103: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/results-cache-statefulset.yaml
[error] 41-41: too many spaces inside braces
(braces)
[error] 44-44: too many spaces inside braces
(braces)
[error] 46-46: too many spaces inside braces
(braces)
[error] 49-49: too many spaces inside braces
(braces)
[error] 71-71: too many spaces inside braces
(braces)
[error] 74-74: too many spaces inside braces
(braces)
[error] 86-86: too many spaces inside braces
(braces)
[error] 89-89: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/webui-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 23-23: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 31-31: too many spaces inside braces
(braces)
[error] 36-36: too many spaces inside braces
(braces)
[error] 45-45: too many spaces inside braces
(braces)
[error] 45-45: too many spaces inside braces
(braces)
[error] 50-50: too many spaces inside braces
(braces)
[error] 50-50: too many spaces inside braces
(braces)
[error] 61-61: too many spaces inside braces
(braces)
[error] 61-61: too many spaces inside braces
(braces)
[error] 74-74: too many spaces inside braces
(braces)
[error] 77-77: too many spaces inside braces
(braces)
[error] 84-84: too many spaces inside braces
(braces)
[error] 88-88: too many spaces inside braces
(braces)
[error] 95-95: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 98-98: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
[error] 101-101: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/reducer-deployment.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 20-20: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 22-22: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 27-27: too many spaces inside braces
(braces)
[error] 33-33: too many spaces inside braces
(braces)
[error] 38-38: too many spaces inside braces
(braces)
[error] 47-47: too many spaces inside braces
(braces)
[error] 47-47: too many spaces inside braces
(braces)
[error] 57-57: too many spaces inside braces
(braces)
[error] 60-60: too many spaces inside braces
(braces)
[error] 74-74: too many spaces inside braces
(braces)
[error] 77-77: too many spaces inside braces
(braces)
[error] 77-77: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/redis-statefulset.yaml
[error] 48-48: too many spaces inside braces
(braces)
[error] 51-51: too many spaces inside braces
(braces)
[error] 53-53: too many spaces inside braces
(braces)
[error] 56-56: too many spaces inside braces
(braces)
[error] 79-79: too many spaces inside braces
(braces)
[error] 82-82: too many spaces inside braces
(braces)
[error] 94-94: too many spaces inside braces
(braces)
[error] 97-97: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/database-statefulset.yaml
[error] 60-60: too many spaces inside braces
(braces)
[error] 63-63: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
[error] 68-68: too many spaces inside braces
(braces)
[error] 91-91: too many spaces inside braces
(braces)
[error] 94-94: too many spaces inside braces
(braces)
[error] 106-106: too many spaces inside braces
(braces)
[error] 109-109: too many spaces inside braces
(braces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: check-generated
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (14)
tools/deployment/package-helm/templates/queue-statefulset.yaml (1)
46-49: LGTM!The refactoring to use the
clp.volumeNamehelper for dynamic volume naming is consistent with other components in this PR and improves maintainability.Also applies to: 62-65
tools/deployment/package-helm/templates/database-statefulset.yaml (1)
60-68: LGTM!The dynamic volume naming via
clp.volumeNamehelper is correctly applied to both data and logs volumes, consistent with the pattern used across other components.Also applies to: 91-109
tools/deployment/package-helm/templates/redis-statefulset.yaml (1)
48-56: LGTM!The dynamic volume naming is correctly applied to both Redis data and logs volumes, maintaining consistency with the other statefulset templates.
Also applies to: 79-97
tools/deployment/package-helm/templates/results-cache-statefulset.yaml (1)
41-49: LGTM!The dynamic volume naming via
clp.volumeNameis correctly applied to both data and logs volumes for the results-cache component, consistent with the pattern across all statefulsets.Also applies to: 71-89
tools/deployment/package-helm/templates/reducer-deployment.yaml (1)
28-38: LGTM on init container dependencies.The init containers correctly wait for the query-scheduler service and results-cache-indices-creator job, ensuring the reducer starts only after its dependencies are ready.
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml (1)
1-108: LGTM! Well-structured deployment with proper dependency orchestration.The compression-scheduler deployment follows best practices:
- Uses templated helpers for consistency (volume naming, wait logic, Celery env vars)
- Appropriate security context and service account
- Correct init containers to ensure dependencies are ready
- Proper volume mounts with readonly flags where appropriate
- Single replica is correct since state is maintained in the database
tools/deployment/package-helm/templates/compression-worker-deployment.yaml (1)
1-107: LGTM! Worker deployment follows established patterns.The compression-worker deployment is well-configured:
- Proper init containers to wait for dependencies
- Correct volume mounts including writable access to archives for compression output
- Templated concurrency setting for operational flexibility
- Uses helper templates consistently for volume and environment variable definitions
tools/deployment/package-helm/templates/query-worker-deployment.yaml (1)
1-100: Overall structure looks good.The query-worker deployment follows the established patterns and uses templated helpers consistently. Subject to resolving the initContainers question and considering the archives mount improvement.
tools/deployment/package-helm/templates/webui-deployment.yaml (1)
1-101: LGTM! WebUI deployment is well-configured.The WebUI deployment follows best practices:
- Appropriate init containers to wait for required jobs
- Secrets properly referenced for database credentials
- Rate limit now properly templated from values
- Health probes configured with appropriate timings
- Volume mounts for configuration separated by client/server concerns
tools/deployment/package-helm/templates/_helpers.tpl (5)
97-111: LGTM! Clean volume naming abstraction.The
clp.volumeNamehelper provides a consistent naming convention across all volume-related resources, reducing duplication and ensuring consistency.
154-182: LGTM! PVC creation helper follows Kubernetes best practices.The
clp.createPvchelper properly:
- Sets storage class to local-storage
- Uses label selectors to match corresponding PVs
- Includes standard CLP labels
- Makes access modes and capacity configurable
184-196: LGTM! Consistent volume reference generation.The
clp.pvcVolumehelper provides a clean abstraction for referencing PVCs in volume definitions, ensuring naming consistency across deployments.
198-210: LGTM! Celery broker URL properly constructed.The
clp.celeryBrokerUrlEnvVarhelper centralizes broker URL construction, reducing duplication across worker deployments.
212-224: LGTM! Result backend configuration centralized.The
clp.celeryResultBackendEnvVarhelper provides a consistent way to configure Redis-backed result storage across workers, with configurable database selection.
| containers: | ||
| - name: "query-scheduler" | ||
| image: "{{ include "clp.image.ref" . }}" | ||
| imagePullPolicy: "{{ .Values.image.clpPackage.pullPolicy }}" | ||
| env: | ||
| - {{- include "clp.celeryBrokerUrlEnvVar" . | nindent 14 }} | ||
| - {{- include "clp.celeryResultBackendEnvVar" (dict | ||
| "root" . | ||
| "database" .Values.clpConfig.redis.query_backend_database | ||
| ) | nindent 14 }} | ||
| - name: "CLP_DB_PASS" | ||
| valueFrom: | ||
| secretKeyRef: | ||
| name: {{ include "clp.fullname" . }}-database | ||
| key: "password" | ||
| - name: "CLP_DB_USER" | ||
| valueFrom: | ||
| secretKeyRef: | ||
| name: {{ include "clp.fullname" . }}-database | ||
| key: "username" | ||
| - name: "CLP_LOGGING_LEVEL" | ||
| value: {{ .Values.clpConfig.query_scheduler.logging_level | quote }} | ||
| - name: "CLP_LOGS_DIR" | ||
| value: "/var/log/query_scheduler" | ||
| - name: "PYTHONPATH" | ||
| value: "/opt/clp/lib/python3/site-packages" | ||
| ports: | ||
| - name: "query-scheduler" | ||
| containerPort: 7000 | ||
| volumeMounts: | ||
| - name: "config" | ||
| mountPath: "/etc/clp-config.yaml" | ||
| subPath: "clp-config.yaml" | ||
| readOnly: true | ||
| - name: {{ include "clp.volumeName" (dict | ||
| "component_category" "query-scheduler" | ||
| "name" "logs" | ||
| ) | quote }} | ||
| mountPath: "/var/log/query_scheduler" | ||
| command: [ | ||
| "python3", "-u", | ||
| "-m", "job_orchestration.scheduler.query.query_scheduler", | ||
| "--config", "/etc/clp-config.yaml" | ||
| ] |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding resource limits and health probes for production readiness.
The query-scheduler container lacks:
- Resource requests/limits - Without these, the pod may be scheduled on an overcommitted node or consume unbounded resources.
- Readiness/liveness probes - Since the scheduler exposes port 7000, consider adding an HTTP or TCP probe to enable Kubernetes to detect failures and restart hung pods.
🔎 Suggested additions to the container spec:
command: [
"python3", "-u",
"-m", "job_orchestration.scheduler.query.query_scheduler",
"--config", "/etc/clp-config.yaml"
]
+ resources:
+ requests:
+ memory: "256Mi"
+ cpu: "100m"
+ limits:
+ memory: "512Mi"
+ cpu: "500m"
+ readinessProbe:
+ tcpSocket:
+ port: 7000
+ initialDelaySeconds: 10
+ periodSeconds: 10
+ livenessProbe:
+ tcpSocket:
+ port: 7000
+ initialDelaySeconds: 30
+ periodSeconds: 15📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| containers: | |
| - name: "query-scheduler" | |
| image: "{{ include "clp.image.ref" . }}" | |
| imagePullPolicy: "{{ .Values.image.clpPackage.pullPolicy }}" | |
| env: | |
| - {{- include "clp.celeryBrokerUrlEnvVar" . | nindent 14 }} | |
| - {{- include "clp.celeryResultBackendEnvVar" (dict | |
| "root" . | |
| "database" .Values.clpConfig.redis.query_backend_database | |
| ) | nindent 14 }} | |
| - name: "CLP_DB_PASS" | |
| valueFrom: | |
| secretKeyRef: | |
| name: {{ include "clp.fullname" . }}-database | |
| key: "password" | |
| - name: "CLP_DB_USER" | |
| valueFrom: | |
| secretKeyRef: | |
| name: {{ include "clp.fullname" . }}-database | |
| key: "username" | |
| - name: "CLP_LOGGING_LEVEL" | |
| value: {{ .Values.clpConfig.query_scheduler.logging_level | quote }} | |
| - name: "CLP_LOGS_DIR" | |
| value: "/var/log/query_scheduler" | |
| - name: "PYTHONPATH" | |
| value: "/opt/clp/lib/python3/site-packages" | |
| ports: | |
| - name: "query-scheduler" | |
| containerPort: 7000 | |
| volumeMounts: | |
| - name: "config" | |
| mountPath: "/etc/clp-config.yaml" | |
| subPath: "clp-config.yaml" | |
| readOnly: true | |
| - name: {{ include "clp.volumeName" (dict | |
| "component_category" "query-scheduler" | |
| "name" "logs" | |
| ) | quote }} | |
| mountPath: "/var/log/query_scheduler" | |
| command: [ | |
| "python3", "-u", | |
| "-m", "job_orchestration.scheduler.query.query_scheduler", | |
| "--config", "/etc/clp-config.yaml" | |
| ] | |
| containers: | |
| - name: "query-scheduler" | |
| image: "{{ include "clp.image.ref" . }}" | |
| imagePullPolicy: "{{ .Values.image.clpPackage.pullPolicy }}" | |
| env: | |
| - {{- include "clp.celeryBrokerUrlEnvVar" . | nindent 14 }} | |
| - {{- include "clp.celeryResultBackendEnvVar" (dict | |
| "root" . | |
| "database" .Values.clpConfig.redis.query_backend_database | |
| ) | nindent 14 }} | |
| - name: "CLP_DB_PASS" | |
| valueFrom: | |
| secretKeyRef: | |
| name: {{ include "clp.fullname" . }}-database | |
| key: "password" | |
| - name: "CLP_DB_USER" | |
| valueFrom: | |
| secretKeyRef: | |
| name: {{ include "clp.fullname" . }}-database | |
| key: "username" | |
| - name: "CLP_LOGGING_LEVEL" | |
| value: {{ .Values.clpConfig.query_scheduler.logging_level | quote }} | |
| - name: "CLP_LOGS_DIR" | |
| value: "/var/log/query_scheduler" | |
| - name: "PYTHONPATH" | |
| value: "/opt/clp/lib/python3/site-packages" | |
| ports: | |
| - name: "query-scheduler" | |
| containerPort: 7000 | |
| volumeMounts: | |
| - name: "config" | |
| mountPath: "/etc/clp-config.yaml" | |
| subPath: "clp-config.yaml" | |
| readOnly: true | |
| - name: {{ include "clp.volumeName" (dict | |
| "component_category" "query-scheduler" | |
| "name" "logs" | |
| ) | quote }} | |
| mountPath: "/var/log/query_scheduler" | |
| command: [ | |
| "python3", "-u", | |
| "-m", "job_orchestration.scheduler.query.query_scheduler", | |
| "--config", "/etc/clp-config.yaml" | |
| ] | |
| resources: | |
| requests: | |
| memory: "256Mi" | |
| cpu: "100m" | |
| limits: | |
| memory: "512Mi" | |
| cpu: "500m" | |
| readinessProbe: | |
| tcpSocket: | |
| port: 7000 | |
| initialDelaySeconds: 10 | |
| periodSeconds: 10 | |
| livenessProbe: | |
| tcpSocket: | |
| port: 7000 | |
| initialDelaySeconds: 30 | |
| periodSeconds: 15 |
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 49-49: too many spaces inside braces
(braces)
[error] 53-53: too many spaces inside braces
(braces)
[error] 57-57: too many spaces inside braces
(braces)
[error] 57-57: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 62-62: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
[error] 65-65: too many spaces inside braces
(braces)
[error] 78-78: too many spaces inside braces
(braces)
[error] 81-81: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
around lines 44 to 87, the container spec for "query-scheduler" is missing
resource requests/limits and health probes; add a resources block with
appropriate cpu/memory requests and limits (e.g., small request and reasonable
limit) and add liveness and readiness probes (HTTP or TCP to port 7000 or an
application health endpoint) under the container spec so Kubernetes can restart
unhealthy pods and schedule them correctly; ensure probe paths,
initialDelaySeconds, periodSeconds and failureThreshold are set to sane defaults
and make these values configurable via Helm values.
| - name: "CLP_WORKER_LOG_PATH" | ||
| value: "/var/log/query_worker/worker.log" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider consolidating the worker log path to a single source.
Similar to the compression-worker, the log path appears in both CLP_WORKER_LOG_PATH (Line 48) and the Celery CLI flag (Line 78), creating a drift risk if the path needs to change.
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/query-worker-deployment.yaml around
lines 47-48 (env var) and line 78 (Celery CLI flag), the log path is defined
twice which risks drift; keep a single source by keeping the CLP_WORKER_LOG_PATH
environment variable and replacing the hardcoded path in the Celery CLI args
with a reference to that env var (e.g., use the env var substitution pattern
used elsewhere in this chart such as $(CLP_WORKER_LOG_PATH) or the Helm template
equivalent), removing the duplicate literal so the CLI always reads the path
from CLP_WORKER_LOG_PATH.
| - name: {{ include "clp.volumeName" (dict | ||
| "component_category" "shared-data" | ||
| "name" "archives" | ||
| ) | quote }} | ||
| mountPath: "/var/data/archives" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding readonly flag to archives mount.
The query-worker reads from compressed archives but does not write to them. Adding readOnly: true to this mount would follow the principle of least privilege and prevent accidental writes.
🔎 Apply this diff
- name: {{ include "clp.volumeName" (dict
"component_category" "shared-data"
"name" "archives"
) | quote }}
mountPath: "/var/data/archives"
+ readOnly: true📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: {{ include "clp.volumeName" (dict | |
| "component_category" "shared-data" | |
| "name" "archives" | |
| ) | quote }} | |
| mountPath: "/var/data/archives" | |
| - name: {{ include "clp.volumeName" (dict | |
| "component_category" "shared-data" | |
| "name" "archives" | |
| ) | quote }} | |
| mountPath: "/var/data/archives" | |
| readOnly: true |
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 61-61: too many spaces inside braces
(braces)
[error] 64-64: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/query-worker-deployment.yaml around
lines 61 to 65, the archives volume mount is writable by default even though the
query-worker only reads from compressed archives; update the volumeMount to
include readOnly: true to enforce least privilege and prevent accidental writes,
ensuring the mount still uses the same name and mountPath but explicitly sets
readOnly to true.
| - name: {{ include "clp.volumeName" (dict | ||
| "component_category" "shared-data" | ||
| "name" "streams" | ||
| ) | quote }} | ||
| mountPath: "/var/data/streams" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for any references to /var/data/streams or streams directory writes in webui code
find components/webui -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \) 2>/dev/null | head -20Repository: y-scope/clp
Length of output: 1251
🏁 Script executed:
#!/bin/bash
# Search for write operations or streams directory references in webui code
rg -i "writeFile|appendFile|createWriteStream|write\(|/var/data|streams" components/webui/client/src/ --type ts --type tsx -B 2 -A 2Repository: y-scope/clp
Length of output: 82
🏁 Script executed:
#!/bin/bash
# Search for write operations or streams directory references in webui code
rg -i "writeFile|appendFile|createWriteStream|write\(|/var/data|streams" components/webui/client/src/ -B 2 -A 2Repository: y-scope/clp
Length of output: 927
🏁 Script executed:
#!/bin/bash
# Search for stream-files API implementation and backend operations
find components/webui -name "*.ts" -o -name "*.js" | xargs grep -l "stream" | head -20Repository: y-scope/clp
Length of output: 879
🏁 Script executed:
#!/bin/bash
# Examine StreamFileManager and stream-files API to understand read/write operations
echo "=== StreamFileManager.ts ===" && head -100 components/webui/server/src/plugins/app/StreamFileManager.ts
echo ""
echo "=== stream-files API route ===" && head -100 components/webui/server/src/routes/api/stream-files/index.tsRepository: y-scope/clp
Length of output: 6233
🏁 Script executed:
#!/bin/bash
# Check the webui deployment config for /var/data/streams mount
cat tools/deployment/package-helm/templates/webui-deployment.yaml | grep -A 30 "volumeMounts:"Repository: y-scope/clp
Length of output: 1288
🏁 Script executed:
#!/bin/bash
# Check how query-worker mounts the streams directory
grep -A 20 "query-worker" tools/deployment/package/docker-compose-all.yaml | grep -A 10 "volumes:"Repository: y-scope/clp
Length of output: 697
🏁 Script executed:
#!/bin/bash
# Look for where streams are actually written/created in query-worker or other services
rg "CLP_STREAM_OUTPUT_DIR" tools/deployment/package/ -B 2 -A 2Repository: y-scope/clp
Length of output: 2017
🏁 Script executed:
#!/bin/bash
# Check which services write to /var/data/streams
echo "=== docker-compose services mounting /var/data/streams ===" && \
grep -B 5 "/var/data/streams" tools/deployment/package/docker-compose-all.yaml | grep -E "^\s+[a-z-]+:|/var/data/streams"Repository: y-scope/clp
Length of output: 502
🏁 Script executed:
#!/bin/bash
# Get service names more clearly
echo "=== Services and their stream mounts ===" && \
grep -B 20 "/var/data/streams" tools/deployment/package/docker-compose-all.yaml | grep -E "^\s+[a-z-]+:|/var/data/streams" | paste - -Repository: y-scope/clp
Length of output: 158
🏁 Script executed:
#!/bin/bash
# Identify services with stream mounts
echo "=== WebUI service ===" && \
sed -n '/^ webui:/,/^ [a-z-]*:/p' tools/deployment/package/docker-compose-all.yaml | grep -A 15 "volumes:"
echo ""
echo "=== Query Worker service ===" && \
sed -n '/^ query-worker:/,/^ [a-z-]*:/p' tools/deployment/package/docker-compose-all.yaml | grep -A 15 "volumes:"
echo ""
echo "=== Compression Worker service ===" && \
sed -n '/^ compression-worker:/,/^ [a-z-]*:/p' tools/deployment/package/docker-compose-all.yaml | grep -A 15 "volumes:"Repository: y-scope/clp
Length of output: 2150
🏁 Script executed:
#!/bin/bash
# Check if there are any other services that also mount /var/data/streams
echo "=== All services mounting /var/data/streams ===" && \
grep -B 30 "/var/data/streams" tools/deployment/package/docker-compose-all.yaml | grep -E "^ [a-z-]*:" | tail -10Repository: y-scope/clp
Length of output: 162
🏁 Script executed:
#!/bin/bash
# Check API server and garbage-collector mounts
echo "=== API Server service ===" && \
sed -n '/^ api-server:/,/^ [a-z-]*:/p' tools/deployment/package/docker-compose-all.yaml | grep -A 15 "volumes:"
echo ""
echo "=== Garbage Collector service ===" && \
sed -n '/^ garbage-collector:/,/^ [a-z-]*:/p' tools/deployment/package/docker-compose-all.yaml | grep -A 15 "volumes:"Repository: y-scope/clp
Length of output: 1264
🏁 Script executed:
#!/bin/bash
# Verify if webui actually performs write operations
grep -r "writeFile\|fs\.write\|appendFile" components/webui/server/src/ --include="*.ts" | grep -v "\.d\.ts"Repository: y-scope/clp
Length of output: 37
🏁 Script executed:
#!/bin/bash
# Check how webui server accesses the streams directory
grep -r "/var/data/streams\|CLP_STREAM_OUTPUT" components/webui/server/src/ --include="*.ts" -B 2 -A 2Repository: y-scope/clp
Length of output: 37
🏁 Script executed:
#!/bin/bash
# Check webui server main file and how it uses streams
head -200 components/webui/server/src/main.tsRepository: y-scope/clp
Length of output: 1561
Add readOnly: true to the streams volume mount in the WebUI deployment.
The WebUI only reads stream files for displaying query results and performs no write operations. The docker-compose configuration already mounts this directory as read-only (:ro), and the Helm template should match this for consistency and to follow the principle of least privilege. Only the query-worker, api-server, and garbage-collector services require write access to this directory.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 74-74: too many spaces inside braces
(braces)
[error] 77-77: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/webui-deployment.yaml around lines 74
to 78, the streams volumeMount for the WebUI is currently writable but should be
read-only; update the volumeMount spec for the named "streams" mount to include
readOnly: true (properly indented under the mount entry) so the WebUI mounts
/var/data/streams as read-only to match docker-compose and least-privilege
requirements.
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
tools/deployment/package-helm/test.sh (1)
65-68: Add error handling for sample download failure.The background download pipeline still lacks error handling after the
waitcommand. If the download fails, the script proceeds towait_for_podswithout a clear indication that sample data is missing, leading to confusing timeout errors.🔎 Apply this diff to add error checking:
wait $SAMPLE_DOWNLOAD_PID +if [ $? -ne 0 ]; then + echo "ERROR: Sample download failed" + exit 1 +fi echo "Sample download and extraction complete"Also applies to: 96-97
tools/deployment/package-helm/values.yaml (1)
28-35: Past review concern not fully addressed: clarify that logs_input.directory requires user configuration.The comment has been updated to mention symbolic links, but the past review's main concern remains unaddressed. The default
directory: "/"is too permissive and lacks explicit guidance that users must override this value. The comment should:
- Explicitly instruct users to override this to point to their specific logs directory (e.g.,
/var/logor a shared filesystem path)- State that the hostPath volume is mounted read-only in the worker containers
- Note that the "/" default is a placeholder requiring user configuration for safe operation
🔎 View suggested documentation improvement:
- # Location (e.g., directory) containing any logs you wish to compress. Must be reachable by all - # workers. + # Location (e.g., directory) containing logs to compress. Must be reachable by all workers. + # IMPORTANT: Override this default "/" value to point to your actual logs directory (e.g., + # /var/log or a shared filesystem path). The directory is mounted read-only in worker containers + # for safety. logs_input: type: "fs" # NOTE: This directory will be exposed inside the container, so symbolic links to files outside # this directory will be ignored. directory: "/"
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
tools/deployment/package-helm/test.sh(2 hunks)tools/deployment/package-helm/values.yaml(2 hunks)
🧰 Additional context used
🧠 Learnings (18)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-05-06T10:07:04.654Z
Learnt from: kirkrodrigues
Repo: y-scope/clp PR: 881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-08-25T00:45:05.464Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1242
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:39-41
Timestamp: 2025-08-25T00:45:05.464Z
Learning: Task v3.44.1 has a regression that breaks shell command processing, particularly rsync commands with brace expansion (e.g., `file.{d.ts,js,wasm}`). This causes CI failures in clp-ffi-js project (issue #110), so CLP should avoid v3.44.1 and use v3.44.0 instead, which fixes the dynamic variable bug without the shell processing regression.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-07-07T17:43:04.349Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-10-20T21:05:30.417Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-07-08T17:20:02.560Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1078
File: components/webui/server/src/plugins/MongoSocketIoServer/index.ts:105-108
Timestamp: 2025-07-08T17:20:02.560Z
Learning: The y-scope/clp codebase has a 100-character line length limit that takes precedence over other style preferences like template literals when they conflict.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-09-22T15:46:34.873Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1311
File: integration-tests/tests/test_identity_transformation.py:93-97
Timestamp: 2025-09-22T15:46:34.873Z
Learning: In the CLP project, multi-line formatting is preferred over single-line ternary operators when the single line would exceed the project's line length limits, prioritizing code readability and adherence to coding standards.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-11-28T15:12:53.530Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.530Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
Applied to files:
tools/deployment/package-helm/values.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check-generated
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
tools/deployment/package-helm/test.sh (3)
9-10: LGTM: CLP_HOME variable improves maintainability.Defining
CLP_HOMEat the top makes the script easier to maintain and reduces repetition throughout.
58-63: LGTM: Directory structure properly created.The expanded directory structure with proper quoting and line continuation is correct and aligns with the enhanced deployment requirements.
76-79: LGTM: Cluster configuration correctly updated.The additional mount points and port 30000 mapping are properly configured and align with the enhanced deployment requirements for WebUI support.
Also applies to: 87-89
| kind delete cluster --name clp-test | ||
| rm -rf /tmp/clp | ||
| mkdir -p /tmp/clp/var/{data,log}/{database,queue,redis,results_cache} | ||
| rm -rf "$CLP_HOME" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add validation before destructive rm operation.
While set -o nounset protects against unset variables, it's safer to explicitly validate CLP_HOME before rm -rf to prevent accidental deletion if the variable is modified to an empty string or critical path.
🔎 Apply this diff to add validation:
+if [[ -z "$CLP_HOME" || "$CLP_HOME" == "/" ]]; then
+ echo "ERROR: Invalid CLP_HOME value: '$CLP_HOME'"
+ exit 1
+fi
rm -rf "$CLP_HOME"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rm -rf "$CLP_HOME" | |
| if [[ -z "$CLP_HOME" || "$CLP_HOME" == "/" ]]; then | |
| echo "ERROR: Invalid CLP_HOME value: '$CLP_HOME'" | |
| exit 1 | |
| fi | |
| rm -rf "$CLP_HOME" |
🤖 Prompt for AI Agents
In tools/deployment/package-helm/test.sh around line 57, the script does rm -rf
"$CLP_HOME" without validating the variable; add explicit validation to ensure
CLP_HOME is set, non-empty, and not a dangerous path (e.g. not "/" and not empty
string) before removing. Implement checks that fail fast with an error message
and exit if CLP_HOME is empty or equals "/" (or other critical paths you deem
unsafe), and only run rm -rf "$CLP_HOME" after those validations pass.
| pullPolicy: "Always" | ||
| tag: "main" | ||
|
|
||
| workerConcurrency: 8 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add a comment explaining workerConcurrency.
The workerConcurrency setting lacks documentation. Add a brief comment explaining that this controls the number of concurrent workers (compression and query workers) and guidance on how users should tune this value based on their system resources.
🔎 View suggested documentation addition:
+# Number of concurrent workers for compression and query tasks. Adjust based on available CPU
+# cores and memory.
workerConcurrency: 8📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| workerConcurrency: 8 | |
| # Number of concurrent workers for compression and query tasks. Adjust based on available CPU | |
| # cores and memory. | |
| workerConcurrency: 8 |
🤖 Prompt for AI Agents
In tools/deployment/package-helm/values.yaml around line 21, the
workerConcurrency key lacks documentation; add a concise comment above it
explaining that it sets the number of concurrent workers (both compression and
query workers) and advise tuning based on CPU cores and memory—recommend
starting with 1 worker per CPU core (or slightly less) and lowering if CPU or
memory is constrained, and note that increasing this improves throughput at the
cost of higher resource usage.
| webui: | ||
| port: 30000 | ||
| results_metadata_collection_name: "results-metadata" | ||
| rate_limit: 1000 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Document the rate_limit unit and scope.
The rate_limit: 1000 setting lacks documentation about its unit (requests per second? per minute? per hour?) and scope (per IP? per user? global?). Add a comment clarifying what this limit applies to.
🔎 View suggested documentation addition:
webui:
port: 30000
results_metadata_collection_name: "results-metadata"
+ # Maximum number of requests per time window (specify unit and scope, e.g., per IP per minute)
rate_limit: 1000📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| webui: | |
| port: 30000 | |
| results_metadata_collection_name: "results-metadata" | |
| rate_limit: 1000 | |
| webui: | |
| port: 30000 | |
| results_metadata_collection_name: "results-metadata" | |
| # Maximum number of requests per time window (specify unit and scope, e.g., per IP per minute) | |
| rate_limit: 1000 |
🤖 Prompt for AI Agents
In tools/deployment/package-helm/values.yaml around lines 72 to 75, the
webui.rate_limit key has no unit or scope and is ambiguous; update the file by
adding a concise inline comment directly above or beside rate_limit explaining
the unit (e.g., "requests per second") and the scope (e.g., "global", "per-IP",
or "per-user/session"), and, if applicable, note whether it applies to
authenticated users only or includes unauthenticated traffic and whether it is
enforced by the application or an ingress/proxy; keep the comment brief and
accurate so operators know how to interpret and tune the value.
| # Where archives should be output to | ||
| archive_output: | ||
| storage: | ||
| type: "fs" | ||
| # NOTE: This directory must not overlap with any path used in CLP's execution container. An | ||
| # error will be raised if so. | ||
| directory: "/tmp/clp/var/data/archives" | ||
|
|
||
| # Retention period for archives, in minutes. Set to null to disable automatic deletion. | ||
| retention_period: null | ||
|
|
||
| # How much data CLP should try to compress into each archive | ||
| target_archive_size: 268435456 # 256 MB | ||
|
|
||
| # How large the dictionaries should be allowed to get before the archive is | ||
| # closed and a new one is created | ||
| target_dictionaries_size: 33554432 # 32 MB | ||
|
|
||
| # How large each encoded file should be before being split into a new encoded | ||
| # file | ||
| target_encoded_file_size: 268435456 # 256 MB | ||
|
|
||
| # How much data CLP should try to fit into each segment within an archive | ||
| target_segment_size: 268435456 # 256 MB | ||
|
|
||
| # How much archives should be compressed: 1 (fast/low compression) to 19 (slow/high compression) | ||
| compression_level: 3 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding a warning about the /tmp path for production.
The default directory: "/tmp/clp/var/data/archives" uses /tmp, which on most systems is ephemeral and cleared on reboot. While this may be intentional for testing (e.g., with kind clusters), production deployments would lose archive data. Consider adding a comment warning users to change this path for production environments.
🔎 View suggested documentation addition:
# Where archives should be output to
archive_output:
storage:
type: "fs"
+ # WARNING: The default /tmp path is suitable for testing only. For production, use a
+ # persistent path on durable storage to prevent data loss on system reboot.
# NOTE: This directory must not overlap with any path used in CLP's execution container. An
# error will be raised if so.
directory: "/tmp/clp/var/data/archives"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Where archives should be output to | |
| archive_output: | |
| storage: | |
| type: "fs" | |
| # NOTE: This directory must not overlap with any path used in CLP's execution container. An | |
| # error will be raised if so. | |
| directory: "/tmp/clp/var/data/archives" | |
| # Retention period for archives, in minutes. Set to null to disable automatic deletion. | |
| retention_period: null | |
| # How much data CLP should try to compress into each archive | |
| target_archive_size: 268435456 # 256 MB | |
| # How large the dictionaries should be allowed to get before the archive is | |
| # closed and a new one is created | |
| target_dictionaries_size: 33554432 # 32 MB | |
| # How large each encoded file should be before being split into a new encoded | |
| # file | |
| target_encoded_file_size: 268435456 # 256 MB | |
| # How much data CLP should try to fit into each segment within an archive | |
| target_segment_size: 268435456 # 256 MB | |
| # How much archives should be compressed: 1 (fast/low compression) to 19 (slow/high compression) | |
| compression_level: 3 | |
| # Where archives should be output to | |
| archive_output: | |
| storage: | |
| type: "fs" | |
| # WARNING: The default /tmp path is suitable for testing only. For production, use a | |
| # persistent path on durable storage to prevent data loss on system reboot. | |
| # NOTE: This directory must not overlap with any path used in CLP's execution container. An | |
| # error will be raised if so. | |
| directory: "/tmp/clp/var/data/archives" | |
| # Retention period for archives, in minutes. Set to null to disable automatic deletion. | |
| retention_period: null | |
| # How much data CLP should try to compress into each archive | |
| target_archive_size: 268435456 # 256 MB | |
| # How large the dictionaries should be allowed to get before the archive is | |
| # closed and a new one is created | |
| target_dictionaries_size: 33554432 # 32 MB | |
| # How large each encoded file should be before being split into a new encoded | |
| # file | |
| target_encoded_file_size: 268435456 # 256 MB | |
| # How much data CLP should try to fit into each segment within an archive | |
| target_segment_size: 268435456 # 256 MB | |
| # How much archives should be compressed: 1 (fast/low compression) to 19 (slow/high compression) | |
| compression_level: 3 |
🤖 Prompt for AI Agents
In tools/deployment/package-helm/values.yaml around lines 77 to 103 the default
archive directory is set to /tmp/clp/var/data/archives which is ephemeral on
many systems and will be cleared on reboot; add a clear inline warning comment
above or next to the directory setting advising that /tmp is unsuitable for
production, recommend using a persistent path (e.g., a hostPath, PVC mount, or
cloud storage), and suggest examples or a note that this should be changed for
production deployments to avoid data loss.
| # Where CLP stream files (e.g., IR streams) should be output | ||
| stream_output: | ||
| storage: | ||
| type: "fs" | ||
| # NOTE: This directory must not overlap with any path used in CLP's execution container. An | ||
| # error will be raised if so. | ||
| directory: "/tmp/clp/var/data/streams" | ||
|
|
||
| # How large each stream file should be before being split into a new stream file | ||
| target_uncompressed_size: 134217728 # 128 MB |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add warning about /tmp path for production (same as archive_output).
The default directory: "/tmp/clp/var/data/streams" also uses /tmp, which is ephemeral. Since query workers write to this directory (based on learnings), data loss on reboot would affect active query results. Add a similar warning as suggested for archive_output.
Based on learnings, query-worker writes to the stream_output directory and requires read-write access.
🔎 View suggested documentation addition:
# Where CLP stream files (e.g., IR streams) should be output
stream_output:
storage:
type: "fs"
+ # WARNING: The default /tmp path is suitable for testing only. For production, use a
+ # persistent path on durable storage to prevent data loss on system reboot.
# NOTE: This directory must not overlap with any path used in CLP's execution container. An
# error will be raised if so.
directory: "/tmp/clp/var/data/streams"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Where CLP stream files (e.g., IR streams) should be output | |
| stream_output: | |
| storage: | |
| type: "fs" | |
| # NOTE: This directory must not overlap with any path used in CLP's execution container. An | |
| # error will be raised if so. | |
| directory: "/tmp/clp/var/data/streams" | |
| # How large each stream file should be before being split into a new stream file | |
| target_uncompressed_size: 134217728 # 128 MB | |
| # Where CLP stream files (e.g., IR streams) should be output | |
| stream_output: | |
| storage: | |
| type: "fs" | |
| # WARNING: The default /tmp path is suitable for testing only. For production, use a | |
| # persistent path on durable storage to prevent data loss on system reboot. | |
| # NOTE: This directory must not overlap with any path used in CLP's execution container. An | |
| # error will be raised if so. | |
| directory: "/tmp/clp/var/data/streams" | |
| # How large each stream file should be before being split into a new stream file | |
| target_uncompressed_size: 134217728 # 128 MB |
🤖 Prompt for AI Agents
In tools/deployment/package-helm/values.yaml around lines 105 to 114, add a
warning next to stream_output.directory that /tmp is ephemeral and should not be
used in production, mirror the same caution used for archive_output, and note
that query-worker writes to this directory and therefore requires read-write
access; update the comment to recommend using a persistent path (e.g., a mounted
volume) for production deployments and ensure the warning clearly states data
loss on reboot if left as /tmp.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tools/deployment/package-helm/templates/reducer-service.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/reducer-service.yaml
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 11-11: too many spaces inside braces
(braces)
[error] 14-14: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check-generated
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
| apiVersion: "v1" | ||
| kind: "Service" | ||
| metadata: | ||
| name: {{ include "clp.fullname" . }}-reducer | ||
| labels: | ||
| {{- include "clp.labels" . | nindent 4 }} | ||
| app.kubernetes.io/component: "reducer" | ||
| spec: | ||
| clusterIP: "None" | ||
| selector: | ||
| {{- include "clp.selectorLabels" . | nindent 4 }} | ||
| app.kubernetes.io/component: "reducer" | ||
| ports: | ||
| {{- range $i := until (int .Values.workerConcurrency) }} | ||
| - name: "reducer-{{ $i }}" | ||
| port: {{ add 14009 $i }} | ||
| {{- end }} |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
LGTM! Headless service configuration is appropriate for reducer.
The headless service (clusterIP: None) with dynamically generated ports based on workerConcurrency is well-designed for direct pod-to-pod communication. The port numbering scheme (14009 + index) is clear, and omitting targetPort is correct since it defaults to the port value.
Optional improvement:
Consider explicitly specifying protocol: TCP for each port entry to improve clarity and make the configuration more explicit:
🔎 View optional enhancement:
ports:
{{- range $i := until (int .Values.workerConcurrency) }}
- name: "reducer-{{ $i }}"
port: {{ add 14009 $i }}
+ protocol: TCP
{{- end }}🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 11-11: too many spaces inside braces
(braces)
[error] 14-14: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
[error] 16-16: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/reducer-service.yaml around lines 1
to 17, the port entries are missing an explicit protocol; add `protocol: TCP` to
each generated port block in the range so every port entry includes protocol:
TCP (keep the loop and port calculation unchanged) to make the service
configuration explicit and clear.
…Apply suggestions from code review Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
| storageClassName: "local-storage" | ||
| local: | ||
| path: {{ .hostPath }} | ||
| path: {{ .hostPath | quote }} |
There was a problem hiding this comment.
not directly related to this PR but we should fix, for consistency. and just in case users configure paths with # or : which doesn't automatically be considered part of a string in the YAML syntax
…m code review Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
tools/deployment/package-helm/templates/_helpers.tpl (1)
236-250: Pin the kubectl image version and scope service selector to the release.The
clp.waitForhelper has two issues that affect reliability and multi-release deployments:
- Line 238: Using
bitnami/kubectl:latestcreates unpredictable behaviour across deployments as the image can change.- Line 243: The service selector only filters on
app.kubernetes.io/component, which could incorrectly match pods from other Helm releases in the same namespace.🔎 Apply this diff to address both issues:
{{- define "clp.waitFor" -}} -name: "wait-for-{{ .name }}" -image: "bitnami/kubectl:latest" +name: "wait-for-{{ .name }}" +image: {{ .root.Values.image.kubectl | default "bitnami/kubectl:1.32.0" | quote }} command: [ "kubectl", "wait", {{- if eq .type "service" }} "--for=condition=ready", - "pod", "--selector", "app.kubernetes.io/component={{ .name }}", + "pod", "--selector", + {{ printf "app.kubernetes.io/component=%s,app.kubernetes.io/instance=%s" .name .root.Release.Name | quote }}, {{- else if eq .type "job" }} "--for=condition=complete", "job/{{ include "clp.fullname" .root }}-{{ .name }}", {{- end }} "--timeout=300s" ] {{- end }}You'll also need to add a corresponding default in
values.yaml:image: kubectl: "bitnami/kubectl:1.32.0"
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
tools/deployment/package-helm/templates/_helpers.tpl(4 hunks)tools/deployment/package-helm/templates/configmap.yaml(4 hunks)tools/deployment/package-helm/templates/shared-data-archives-pv.yaml(1 hunks)tools/deployment/package-helm/templates/shared-data-streams-pv.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
tools/deployment/package-helm/templates/configmap.yaml
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
tools/deployment/package-helm/templates/configmap.yaml
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/templates/configmap.yamltools/deployment/package-helm/templates/shared-data-streams-pv.yaml
📚 Learning: 2025-06-22T04:01:43.409Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-22T04:01:43.409Z
Learning: The webui server logging can be configured via LOG_LEVEL and WEBUI_LOGS_DIR environment variables, with file logging enabled automatically in non-TTY environments when a logs directory is specified.
Applied to files:
tools/deployment/package-helm/templates/configmap.yaml
📚 Learning: 2025-08-13T15:07:37.767Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170
Timestamp: 2025-08-13T15:07:37.767Z
Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved/absolute Path object because FsStorage.make_config_paths_absolute() processes the directory path during configuration loading, making additional .resolve() calls redundant.
Applied to files:
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/shared-data-streams-pv.yaml
[error] 9-9: too many spaces inside braces
(braces)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: check-generated
🔇 Additional comments (7)
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml (1)
1-9: LGTM!The PersistentVolume template correctly invokes the
clp.createLocalPvhelper with appropriate parameters for the archives volume. The hostPath correctly references the configured archive output directory.tools/deployment/package-helm/templates/shared-data-streams-pv.yaml (1)
1-9: LGTM!The PersistentVolume template correctly invokes the
clp.createLocalPvhelper with appropriate parameters for the streams volume. The hostPath correctly references the configured stream output directory, and the ReadWriteMany access mode is appropriate for the query-worker write requirements.tools/deployment/package-helm/templates/_helpers.tpl (5)
97-111: LGTM!The
clp.volumeNamehelper provides a clean abstraction for standardized volume naming. The documentation is comprehensive and the implementation correctly formats the volume name.
154-182: LGTM!The
clp.createPvchelper is well-implemented with comprehensive documentation. It correctly uses theclp.volumeNamehelper for naming consistency and includes proper label selectors to bind to the corresponding PersistentVolumes.
184-196: LGTM!The
clp.pvcVolumehelper correctly creates volume definitions that reference PersistentVolumeClaims with consistent naming.
198-210: LGTM!The
clp.celeryBrokerUrlEnvVarhelper correctly constructs the Celery broker URL using the standard AMQP format and derives the host from the Helm release name.
212-224: LGTM!The
clp.celeryResultBackendEnvVarhelper correctly constructs the Celery result backend URL for Redis with proper authentication and database selection.
| kind: "PersistentVolume" | ||
| metadata: | ||
| name: {{ include "clp.fullname" .root }}-{{ .component_category }}-{{ .name }} | ||
| name: {{ include "clp.fullname" .root }}-{{ include "clp.volumeName" . }} |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
LGTM!
Good refactoring to use the clp.volumeName helper for consistent naming across all PV/PVC resources.
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/_helpers.tpl around line 131, the
refactor replaced inline volume naming with the clp.volumeName helper and is
correct as-is; no changes are necessary—approve and keep the current usage to
ensure consistent PV/PVC naming across the chart.
| storageClassName: "local-storage" | ||
| local: | ||
| path: {{ .hostPath }} | ||
| path: {{ .hostPath | quote }} |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
LGTM!
The hostPath is now properly quoted to handle paths containing special YAML characters like # or :.
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/_helpers.tpl around line 142, the
hostPath value is already correctly quoted (path: {{ .hostPath | quote }}) to
handle special YAML characters; no change required—keep the quoted hostPath
as-is to ensure safe serialization of paths containing characters like # or :.
| storage: | ||
| directory: "/var/data/archives" | ||
| type: "fs" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider making storage type configurable.
The storage type is hardcoded as "fs" here and in similar sections (lines 40, 74). For consistency and future flexibility (e.g., supporting S3 or other storage backends), consider reading this from chart values similar to how logs_input.type is referenced in the webui-client-settings.json at line 116.
🔎 View suggested enhancement
storage:
directory: "/var/data/archives"
- type: "fs"
+ type: {{ .Values.clpConfig.archive_output.storage.type | default "fs" | quote }}Apply similar changes to stream_output.storage.type at line 74.
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/configmap.yaml around lines 11 to 13
(and also update the similar occurrences at ~line 40 and ~line 74), the
storage.type value is hardcoded to "fs"; make it configurable by reading from
Helm chart values (e.g., add a values key like storage.type with a sensible
default "fs") and replace the literal "fs" in each location to reference that
values path (same pattern used for logs_input.type in
webui-client-settings.json), ensuring stream_output.storage.type is updated as
well so all storage backends can be switched via chart values.
| "StreamFilesS3Region": null, | ||
| "StreamFilesS3PathPrefix": null, | ||
| "StreamFilesS3Profile": null, |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider making S3 and Presto settings configurable.
The S3 stream storage settings (lines 146-148) and Presto query engine settings (lines 160-161) are hardcoded as null. For future extensibility, consider making these configurable via chart values with null as the default, similar to the pattern used for other optional settings.
🔎 View suggested enhancement
- "StreamFilesS3Region": null,
- "StreamFilesS3PathPrefix": null,
- "StreamFilesS3Profile": null,
+ "StreamFilesS3Region": {{ .Values.clpConfig.stream_output.s3_region | default "null" }},
+ "StreamFilesS3PathPrefix": {{ .Values.clpConfig.stream_output.s3_path_prefix | default "null" }},
+ "StreamFilesS3Profile": {{ .Values.clpConfig.stream_output.s3_profile | default "null" }},- "PrestoHost": null,
- "PrestoPort": null
+ "PrestoHost": {{ .Values.clpConfig.webui.presto_host | default "null" }},
+ "PrestoPort": {{ .Values.clpConfig.webui.presto_port | default "null" }}Also applies to: 160-161
🤖 Prompt for AI Agents
In tools/deployment/package-helm/templates/configmap.yaml around lines 146-148
and 160-161 the S3 stream storage and Presto settings are hardcoded to null; add
corresponding chart values in values.yaml (with null as the default) and replace
the literal nulls in the configmap with template expressions that read those
values so they remain configurable and default to null if unset; follow the same
templating pattern used elsewhere in this chart (add entries to values.yaml,
reference .Values.<key> in the template, and ensure the rendered output
preserves null when no value is provided).
…dd the missing mount in the webui
…ression user logs and temporary storage
…in both client and server settings.json
| max_concurrent_tasks_per_job: 0 # A value of 0 disables the limit | ||
|
|
||
| query_scheduler: | ||
| port: 7000 |
There was a problem hiding this comment.
Note this value is unused. Since query_scheduler is not a user interfacing service, we should remove this configuration field to avoid confusion. @coderabbitai submit an issue
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track removal of unused query_scheduler.port configuration
gh issue create \
--title "Remove unused query_scheduler.port configuration field from values.yaml" \
--body "## Description
The \`query_scheduler.port\` configuration field in \`tools/deployment/package-helm/values.yaml\` is unused and should be removed to avoid confusion. Since query_scheduler is not a user-interfacing service, this port configuration is unnecessary.
## Related
- PR: https://github.com/y-scope/clp/pull/1749
- Comment: https://github.com/y-scope/clp/pull/1749#discussion_r2633530695
- Reported by: @junhaoliao
## Location
\`tools/deployment/package-helm/values.yaml\` - Lines 47-48" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1812 to track the removal of the unused query_scheduler.port configuration field from values.yaml.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Learnt from: haiqi96
Repo: y-scope/clp PR: 569
File: components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py:76-77
Timestamp: 2024-11-17T23:24:08.862Z
Learning: The `query_scheduler.py` file operates with single threading, and multithreading is not used.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
| "type" "service" | ||
| "name" "redis" | ||
| ) | nindent 10 }} | ||
| containers: |
There was a problem hiding this comment.
readinessProbeTimings and livenessProbeTimings could have been added to enhance healthcheck and also match docker-compose-all.yaml. @coderabbitai submit an issue
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find docker-compose-all.yaml and examine healthcheck configurations
echo "=== Looking for docker-compose-all.yaml ==="
fd -t f "docker-compose-all.yaml" -x cat {}
echo -e "\n=== Checking query-scheduler-deployment.yaml for probe definitions ==="
cat tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
echo -e "\n=== Checking webui-deployment.yaml for probe examples ==="
cat tools/deployment/package-helm/templates/webui-deployment.yaml
echo -e "\n=== Checking _helpers.tpl for probe timing helpers ==="
grep -A 20 "readinessProbeTimings\|livenessProbeTimings" tools/deployment/package-helm/templates/_helpers.tpl || echo "No probe timing helpers found"
Length of output: 30278
🏁 Script executed:
#!/bin/bash
# Create GitHub issue for adding health probes to query-scheduler deployment
gh issue create \
--title "Add readiness and liveness probes to query-scheduler deployment" \
--body "## Description
The query-scheduler deployment in the Package Helm chart is missing readinessProbe and livenessProbe configurations that would match the healthcheck settings in docker-compose-all.yaml.
## Current State
The query-scheduler-deployment.yaml currently has no health probes defined, while other deployments (e.g., webui-deployment.yaml) include both readiness and liveness probes using the helper templates \`clp.readinessProbeTimings\` and \`clp.livenessProbeTimings\`.
## Expected Behaviour
The query-scheduler deployment should include:
- A readinessProbe with tcpSocket check on port 7000
- A livenessProbe with tcpSocket check on port 7000
- Both should use the existing helper templates for timing consistency
This would align the Helm deployment with the healthcheck configuration present in docker-compose-all.yaml.
## References
- PR #1749: https://github.com/y-scope/clp/pull/1749
- Comment: https://github.com/y-scope/clp/pull/1749#discussion_r2633531839
- Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1813 to track adding readiness and liveness probes to the query-scheduler deployment.
The issue documents that query-scheduler-deployment.yaml currently lacks health probes, while other deployments like webui use the existing clp.readinessProbeTimings and clp.livenessProbeTimings helper templates with tcpSocket checks. Adding these probes would align the Helm deployment with the healthcheck configuration in docker-compose-all.yaml (which checks port 7000 with interval 30s, timeout 2s, etc.).
Issue: #1813
🧠 Learnings used
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
…o the chart. (y-scope#1749) Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
…o the chart. (y-scope#1749) Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
…o the chart. (y-scope#1749) Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Important
This PR should be merged:
createLocalPv: #1771createLocalPv: #1771)waitForhelper leveraging newjob-watcherservice account privileges. #1772because changes done in #1771 & #1772 are incremental to reach the state in #1749
Description
Note
This PR is part of the ongoing work for #1309. More PRs will be submitted until the Helm chart is complete and fully functional.
This PR enhances the CLP Package Helm chart to support compressions and queries in the WebUI. Changes include:
PersistentVolumeClaimhelperAdded clp.createPvc template for generating PVCs for components.
Component PVs and PVCs
Added PV and PVC templates for:
Deployments
Each deployment includes:
Services
ConfigMap updates
Worker concurrency are configurable via chart values.yaml.
These changes lay the groundwork for full CLP deployment on Kubernetes with all main services, logging, and persistent storage correctly configured.
Checklist
breaking change.
Validation performed
Verified that the clp-s (CLP-JSON) flavour is working
test.shto start the cluster in kind, which maps the host's/homeand/tmpinto the k8s host.kubectl get pods -wand observed all pods were up./tmp/clp/samples/postgresql/postgresql.log/home/junhao/samples/postgresql.jsonlVerified that the clp (CLP-Text) flavour is working
In
tools/deployment/package-helm/values.yaml, modifiedclpConfig.package.storage_engineandclpConfig.package.query_engineboth as "clp". Repeated Step 1-5 above.Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.