feat(clp-package): Add configurable services restart policy for the Package Docker Compose project (resolves #1747).#1773
Conversation
…ackage Docker Compose project.
WalkthroughAdds a validated Click parameter type for Docker restart policies, exposes a Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as start_clp.py
participant Validator as RestartPolicyParamType
participant Controller as DockerComposeController
participant Env as Process Environment
participant Compose as docker-compose
User->>CLI: run CLI with --restart-policy on-failure:5
CLI->>Validator: validate "on-failure:5"
alt invalid
Validator-->>CLI: raise BadParameter
CLI-->>User: error
else valid
Validator-->>CLI: return "on-failure:5"
CLI->>Controller: DockerComposeController(..., restart_policy="on-failure:5")
Controller->>Controller: store self._restart_policy
Controller->>Controller: set_up_env()
Controller->>Env: export CLP_RESTART_POLICY=on-failure:5
Controller->>Compose: run docker-compose (env provided)
Compose->>Compose: apply restart: ${CLP_RESTART_POLICY:-on-failure:3}
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
components/clp-package-utils/clp_package_utils/cli_utils.py(1 hunks)components/clp-package-utils/clp_package_utils/controller.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/start_clp.py(4 hunks)tools/deployment/package/docker-compose-all.yaml(3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/start_clp.py
📚 Learning: 2025-10-20T20:19:40.504Z
Learnt from: 20001020ycx
Repo: y-scope/clp PR: 1436
File: components/clp-mcp-server/clp_mcp_server/clp_mcp_server.py:22-27
Timestamp: 2025-10-20T20:19:40.504Z
Learning: In the clp-mcp-server project (components/clp-mcp-server/clp_mcp_server/clp_mcp_server.py), the --config Click option should use exists=True to validate the configuration file path at option processing time, even though this requires the default path to exist on all machines.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/start_clp.py
🧬 Code graph analysis (1)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
components/clp-package-utils/clp_package_utils/controller.py (1)
DockerComposeController(897-1073)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
tools/deployment/package/docker-compose-all.yaml (1)
10-10: Restart policy wiring and init-job overrides look soundUsing a global
restart: "${CLP_RESTART_POLICY:-on-failure:3}"inx-service-defaultsaligns with the controller’sCLP_RESTART_POLICYenv var and keeps a sane default, while explicitly pinningdb-table-creatorandresults-cache-indices-creatortorestart: "on-failure:3"ensures init-style jobs don’t get stuck in unbounded restarts when users choose more aggressive policies. Also confirmed that thequery-worker’s/var/data/streamsbind remains read‑write (no:ro), so this change doesn’t regress the known requirement on that volume. Based on learnings, ...Also applies to: 88-91, 213-216
components/clp-package-utils/clp_package_utils/cli_utils.py (1)
1-43: RestartPolicyParamType cleanly captures the intended restart policy spaceThe custom
RestartPolicyParamTypecorrectly accepts the four base policies pluson-failure:<max-retries>with a positive integer, and the failure messages are clear and helpful. Exporting a sharedRESTART_POLICYinstance keeps the Click wiring simple and consistent across scripts. The current lower‑case‑only handling is reasonable given Docker’s own convention.components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
12-12: CLI wiring for restart policy is consistent and end‑to‑endImporting
RESTART_POLICY, adding the--restart-policyoption with a validated type and default"on-failure:3", threadingrestart_policy: strthroughmain, and passing it intoDockerComposeControllerties the CLI, controller, and compose configuration together cleanly. The help text reusingRESTART_POLICY.VALID_POLICIES_STRalso keeps user‑facing documentation in sync with the validator.Also applies to: 35-40, 54-54, 105-105
| def __init__(self, clp_config: ClpConfig, instance_id: str, restart_policy: str = "on-failure:3") -> None: | ||
| self._project_name = f"clp-package-{instance_id}" | ||
| self._restart_policy = restart_policy | ||
| super().__init__(clp_config) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Controller restart policy handling is correct; consider de‑duplicating the default
The new restart_policy parameter is threaded cleanly into DockerComposeController and exposed as CLP_RESTART_POLICY in set_up_env, so the compose file can honour the CLI choice without changing existing call sites thanks to the default argument.
One thing to consider for maintainability: the default "on-failure:3" now lives in three places (this constructor default, the --restart-policy Click option, and the compose file’s ${CLP_RESTART_POLICY:-on-failure:3}). If you expect this default to evolve, it may be worth centralising the canonical value in Python (e.g., a shared constant imported by both the CLI and controller) to reduce drift risk, accepting that the compose default will still need manual sync.
Also applies to: 915-918
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines
902-905 (and also 915-918), the literal default "on-failure:3" is duplicated;
create a single canonical Python constant (e.g., CLP_DEFAULT_RESTART_POLICY) in
a shared module (such as
components/clp-package-utils/clp_package_utils/constants.py), import and use
that constant as the default value for the DockerComposeController __init__
restart_policy parameter and wherever the CLI Click option defines its default,
and update set_up_env to use that constant when setting CLP_RESTART_POLICY so
the Python-side defaults are centralized (the compose file’s fallback default
can remain unchanged).
sitaowang1998
left a comment
There was a problem hiding this comment.
The code LGTM. Should we add user doc for this change?
Co-authored-by: sitaowang1998 <sitaowang1998@outlook.com>
…-time jobs" This reverts commit 5780ced.
sitaowang1998
left a comment
There was a problem hiding this comment.
Code LGTM. Doc needs some improvements.
| |---------------------------|---------------------------------------------------------------------------------------------------------------------------------| | ||
| | `-c, --config PATH` | CLP package configuration file. Defaults to `etc/clp-config.yaml`. | | ||
| | `--restart-policy POLICY` | Docker restart policy for containers. See [Restart policies](#restart-policies) below. Defaults to `on-failure:3`. | | ||
| | `--setup-only` | Validate configuration and prepare directories without starting services. Useful for verifying configuration before deployment. | |
There was a problem hiding this comment.
setup-only sounds like we are running some setup operations. Maybe dry-run or validate? @kirkrodrigues What's your thought on this?
Also, since this is not directly related to this PR, we can open another issue on this if we decide to change.
There was a problem hiding this comment.
Hm, but we are only doing setup operations, no? What would you consider is not a setup operation?
There was a problem hiding this comment.
Do we consider building the container a setup?
| * Use `stop-clp.sh` to properly stop CLP. | ||
| * Initialization jobs (`db-table-creator`, `results-cache-indices-creator`) always use | ||
| `on-failure:3` regardless of this setting. | ||
| * Docker daemon restart (e.g., system reboot or `systemctl restart docker`) may cause some services |
There was a problem hiding this comment.
We should add instructions on how to handle the unexpected restart.
There was a problem hiding this comment.
addded
* If services restart unexpectedly after a Docker daemon restart, run `stop-clp.sh` to clean up
any partially started services bofore running `start-clp.sh`.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docs/src/user-docs/reference-sbin-scripts/index.md:
- Line 56: Fix the spelling error in the sentence that currently reads "any
partially started services bofore running `start-clp.sh`" by changing "bofore"
to "before" so it reads "any partially started services before running
`start-clp.sh`".
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/reference-sbin-scripts/index.md
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-06-18T20:39:05.899Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Applied to files:
docs/src/user-docs/reference-sbin-scripts/index.md
📚 Learning: 2025-08-25T06:32:48.313Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: docs/src/dev-guide/components-core/manylinux-2-28-deps-install.md:24-24
Timestamp: 2025-08-25T06:32:48.313Z
Learning: In the CLP project documentation, when linking to scripts or other files within the repository, use relative paths (e.g., components/core/tools/scripts/...) rather than commit-pinned GitHub URLs to ensure docs and referenced files always belong to the same commit and stay synchronized.
Applied to files:
docs/src/user-docs/reference-sbin-scripts/index.md
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Applied to files:
docs/src/user-docs/reference-sbin-scripts/index.md
📚 Learning: 2025-05-06T09:48:55.408Z
Learnt from: kirkrodrigues
Repo: y-scope/clp PR: 881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Applied to files:
docs/src/user-docs/reference-sbin-scripts/index.md
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Applied to files:
docs/src/user-docs/reference-sbin-scripts/index.md
🪛 LanguageTool
docs/src/user-docs/reference-sbin-scripts/index.md
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...to clean up any partially started services bofore runningstart-clp.sh`. ::: ---...
(AI_HYDRA_LEO_MISSING_COMMA)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check-generated
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (5)
docs/src/user-docs/reference-sbin-scripts/index.md (5)
1-8: LGTM: Clear introduction and structure.The introduction provides a concise overview of the sbin scripts with helpful navigation links.
11-28: LGTM: Comprehensive documentation of start-clp.sh options.The options table clearly documents the new
--restart-policyoption with its default value (on-failure:3) and cross-references the detailed restart policies section below.
30-41: LGTM: Accurate restart policy documentation.The restart policies table correctly describes the standard Docker restart policies with clear, concise descriptions. The inclusion of an example (
on-failure:3) is helpful for users.
61-76: LGTM: Clear stop-clp.sh documentation.The documentation for
stop-clp.shis clear and appropriately concise for this script.
77-83: LGTM: Proper Sphinx toctree structure.The toctree directive correctly links to the admin-tools documentation page.
| restarts containers that exited non-zero. Most CLP services exit cleanly (code 0) and won't | ||
| restart, but some may restart and eventually fail without their dependencies. | ||
| * If services restart unexpectedly after a Docker daemon restart, run `stop-clp.sh` to clean up | ||
| any partially started services bofore running `start-clp.sh`. |
There was a problem hiding this comment.
Fix typo: "bofore" should be "before".
There's a spelling error in the note.
📝 Proposed fix
- any partially started services bofore running `start-clp.sh`.
+ any partially started services before running `start-clp.sh`.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| any partially started services bofore running `start-clp.sh`. | |
| any partially started services before running `start-clp.sh`. |
🧰 Tools
🪛 LanguageTool
[uncategorized] ~56-~56: Possible missing comma found.
Context: ...to clean up any partially started services bofore runningstart-clp.sh`. ::: ---...
(AI_HYDRA_LEO_MISSING_COMMA)
🤖 Prompt for AI Agents
In @docs/src/user-docs/reference-sbin-scripts/index.md at line 56, Fix the
spelling error in the sentence that currently reads "any partially started
services bofore running `start-clp.sh`" by changing "bofore" to "before" so it
reads "any partially started services before running `start-clp.sh`".
…ackage Docker Compose project (resolves y-scope#1747). (y-scope#1773) Co-authored-by: sitaowang1998 <sitaowang1998@outlook.com>
…ackage Docker Compose project (resolves y-scope#1747). (y-scope#1773) Co-authored-by: sitaowang1998 <sitaowang1998@outlook.com>
…ackage Docker Compose project (resolves y-scope#1747). (y-scope#1773) Co-authored-by: sitaowang1998 <sitaowang1998@outlook.com>
Description
This PR adds a
--restart-policycommand-line option tostart_clp.py, allowing users to configure Docker restart policies for CLP services. This addresses the need for automatic service recovery in production deployments.Changes
1. Added
RestartPolicyParamTypeto newcli_utils.pymoduleclp_package_utils/cli_utils.pyfor CLI-related utilitiesRestartPolicyParamTypevalidates Docker Compose restart policiesno- Never restarton-failure[:max-retries]- Restart only on non-zero exit code (optionally limit retries)always- Always restart, including after Docker daemon restartsunless-stopped- Likealways, but won't restart after daemon restart if manually stopped2. Added
--restart-policyCLI option tostart_clp.pyon-failure:3for production readiness (services restart up to 3 times on failure)3. Updated
DockerComposeControllerrestart_policyparameter to constructorCLP_RESTART_POLICYenvironment variable for Docker Compose4. Updated
docker-compose-all.yamlrestart: "${CLP_RESTART_POLICY:-on-failure:3}"to service defaultsdb-table-creator,results-cache-indices-creator) have explicitrestart: "on-failure:3"to ensure they complete successfully regardless of the user-specified policyImpact
--restart-policyflagon-failure:3)--restart-policy always,--restart-policy no, etc.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.