Skip to content

feat(deployment): Support configuration of external third-party services (resolves #1680).#1681

Merged
junhaoliao merged 13 commits into
y-scope:mainfrom
junhaoliao:config-ext-3rd-party
Feb 25, 2026
Merged

feat(deployment): Support configuration of external third-party services (resolves #1680).#1681
junhaoliao merged 13 commits into
y-scope:mainfrom
junhaoliao:config-ext-3rd-party

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Nov 27, 2025

Copy link
Copy Markdown
Member

Description

Enable users to deploy CLP with external third-party services (database, queue, Redis,
results cache) instead of the bundled ones, for both the CLP package (Docker Compose)
and Helm chart deployments. Users configure which services are bundled via the bundled
list in clp-config.yaml; services omitted from the list are treated as external and
must be provided by the user.

Changes

clp_config.py — Host/port transformation with is_bundled

  • Database, Queue, Redis, ResultsCache: transform_for_container() now accepts
    is_bundled: bool. The host is always set to the internal component name (e.g.,
    database, queue) so containers use a consistent hostname. The port is only reset
    to the default when bundled; when external, the user-configured port is preserved.
  • Database.get_container_url() uses self.host and self.port (post-transform)
    instead of hardcoded values, so JDBC URLs reflect the actual (possibly external)
    connection.
  • ClpConfig.transform_for_container() passes the is_bundled flag based on
    self.bundled membership.

controller.py — Conditional env var export and external host resolution

  • _set_up_env_for_database(), _set_up_env_for_queue(), _set_up_env_for_redis(),
    _set_up_env_for_results_cache(): Only emit CLP_EXTRA_HOST_*_NAME,
    CLP_EXTRA_HOST_*_ADDR, and CLP_*_PORT when the service is not bundled. When
    bundled, Docker DNS resolves service names without extra_hosts.
  • New _resolve_external_host(): Returns Docker's host-gateway token for
    localhost/127.0.0.1 (so containers can reach services running on the Docker host
    via Docker's built-in host resolution), and falls back to DNS resolution for other
    hostnames.

docker-compose-all.yamlextra_hosts and env-var-driven ports

  • Added extra_hosts to x-service-defaults with env-var-driven entries. When bundled,
    defaults map no-op hostnames to 127.0.0.2 (harmless, Docker DNS takes precedence).
    When external, the controller populates them to override DNS with the external address.
  • Connection strings use hardcoded service hostnames (queue, redis, database) and
    env vars for ports (previously hardcoded).
  • results-cache-indices-creator URI uses ${CLP_EXTRA_HOST_RESULTS_CACHE_ADDR:-results_cache}
    so that when external, the addr resolves directly (needed because initialize-results-cache.py
    uses the URI hostname to configure the MongoDB replica set member, and the external MongoDB
    server must be able to resolve that hostname to itself).

Helm chart — Conditional bundling

  • Added clpConfig.bundled list to values.yaml (default: all four services).
  • New helper templates in _helpers.tpl: clp.isBundled, clp.databaseHost/Port,
    clp.queueHost/Port, clp.redisHost/Port, clp.resultsCacheHost/Port. These
    resolve to the bundled K8s service name or the user-configured external host/port.
  • configmap.yaml uses the host/port helpers instead of hardcoded service names.
  • 18 bundled-only resources (StatefulSets, Services, PVs, Secrets) wrapped with
    {{- if has "service" .Values.clpConfig.bundled }} conditionals.
  • Scheduler deployments conditionally include queue/redis init-wait containers.
  • results-cache-indices-creator-job and db-table-creator-job use the helpers and
    conditionally include init-wait containers.
  • Chart version bumped to 0.1.4-dev.3.

Config templates & docs

  • clp-config.template.*.yaml: Improved bundled section comments; explicit YAML list
    format with guidance on removing services.
  • guides-external-database.md: Restructured to use the new bundled configuration
    workflow (configure bundled list + host/port, then start CLP). Added warning about
    localhost MongoDB not being supported when external.
  • guides-docker-compose-deployment.md: Simplified multi-host guide; removed manual
    post-setup config editing (now auto-generated).
  • quick-start/clp-json.md, clp-text.md: Cross-reference to external database guide.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Tested on the actual built package (task to build, then sbin/start-clp.sh).

Scenario 1: Bundled mode (default config)

All services bundled (default clp-config.yaml), no external services running.

Task: Verify that the default bundled deployment starts and all services become healthy.

Command:

$ cd build/clp-package
$ ./sbin/start-clp.sh

Output:

2026-02-20T06:04:46.852 INFO [controller] Setting up environment for bundling database...
2026-02-20T06:04:46.853 INFO [controller] Setting up environment for bundling queue...
2026-02-20T06:04:46.854 INFO [controller] Setting up environment for bundling redis...
2026-02-20T06:04:46.854 INFO [controller] Setting up environment for bundling results_cache...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for database...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for queue...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for redis...
2026-02-20T06:04:46.855 INFO [controller] spider_scheduler is not configured, skipping environment setup...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for results_cache...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for compression_scheduler...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for query_scheduler...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for compression_worker...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for query_worker...
2026-02-20T06:04:46.855 INFO [controller] Setting up environment for reducer...
2026-02-20T06:04:46.856 INFO [controller] Setting up environment for api_server...
2026-02-20T06:04:46.857 INFO [controller] log_ingestor is only applicable for S3 logs input type, skipping environment setup...
2026-02-20T06:04:46.857 INFO [controller] Setting up environment for webui...
2026-02-20T06:04:46.858 INFO [controller] The MCP Server is not configured, skipping mcp_server creation...
2026-02-20T06:04:46.858 INFO [controller] Setting up environment for garbage_collector...
2026-02-20T06:04:46.914 INFO [controller] Starting CLP using Docker Compose (full deployment)...
...
2026-02-20T06:05:16.932 INFO [controller] Started CLP.

Explanation: All 14 containers started and became healthy (database, queue, redis,
results-cache, db-table-creator, results-cache-indices-creator, compression-scheduler,
query-scheduler, compression-worker, query-worker, reducer, webui, api-server,
garbage-collector). .env contains no CLP_EXTRA_HOST_* or CLP_*_PORT entries —
bundled mode uses Docker DNS only, no extra_hosts overrides needed.

Scenario 2: All four services external

All four services (database, queue, redis, results_cache) unbundled and running
externally on the Docker host.

Task: Start 4 external services using the same images as CLP, configure
bundled: [], and verify all CLP containers connect to the external services.

Setup — start four external services on the Docker host:

$ docker run -d --name ext-mariadb \
    -p 3306:3306 \
    -e MYSQL_ROOT_PASSWORD=qHV6ufRG-0A \
    -e MYSQL_DATABASE=clp-db \
    -e MYSQL_USER=clp-user \
    -e MYSQL_PASSWORD=0moraekQbUk \
    mariadb:10-jammy

$ docker run -d --name ext-rabbitmq \
    -p 5672:5672 \
    -e RABBITMQ_DEFAULT_USER=clp-user \
    -e RABBITMQ_DEFAULT_PASS=2ksymJxbhFw \
    rabbitmq:3.9.8

$ docker run -d --name ext-redis \
    -p 6379:6379 \
    redis:7.2.4 \
    redis-server --requirepass 'Jf0zpUx7uZ3Kdj0yE8LibA'

$ docker run -d --network=host --name ext-mongodb \
    mongo:7.0.1 \
    mongod --bind_ip_all

(Credentials match etc/credentials.yaml. Images match those used by CLP. MongoDB is
started with --bind_ip_all and initialized at the host's non-loopback IP since
localhost MongoDB is not supported when external — see docs warning.)

Config (etc/clp-config-ext-all.yaml):

bundled: []

database:
  host: "localhost"
  port: 3306

queue:
  host: "localhost"
  port: 5672

redis:
  host: "localhost"
  port: 6379

results_cache:
  host: "192.168.3.247"
  port: 27017

Command:

$ ./sbin/start-clp.sh --config etc/clp-config-ext-all.yaml

Output:

2026-02-20T06:43:04.153 INFO [controller] database is not included in the 'bundled' configuration, skipping service bundling...
2026-02-20T06:43:04.153 INFO [controller] queue is not configured or part of the 'bundled' configuration, skipping service bundling...
2026-02-20T06:43:04.154 INFO [controller] redis is not configured or part of the 'bundled' configuration, skipping service bundling...
2026-02-20T06:43:04.154 INFO [controller] results_cache is not included in the 'bundled' configuration, skipping service bundling...
2026-02-20T06:43:04.154 INFO [controller] Setting up environment for database...
2026-02-20T06:43:04.154 INFO [controller] Setting up environment for queue...
2026-02-20T06:43:04.154 INFO [controller] Setting up environment for redis...
2026-02-20T06:43:04.154 INFO [controller] spider_scheduler is not configured, skipping environment setup...
2026-02-20T06:43:04.154 INFO [controller] Setting up environment for results_cache...
2026-02-20T06:43:04.154 INFO [controller] Setting up environment for compression_scheduler...
2026-02-20T06:43:04.155 INFO [controller] Setting up environment for query_scheduler...
2026-02-20T06:43:04.155 INFO [controller] Setting up environment for compression_worker...
2026-02-20T06:43:04.155 INFO [controller] Setting up environment for query_worker...
2026-02-20T06:43:04.155 INFO [controller] Setting up environment for reducer...
2026-02-20T06:43:04.155 INFO [controller] Setting up environment for api_server...
2026-02-20T06:43:04.155 INFO [controller] log_ingestor is only applicable for S3 logs input type, skipping environment setup...
2026-02-20T06:43:04.155 INFO [controller] Setting up environment for webui...
2026-02-20T06:43:04.156 INFO [controller] The MCP Server is not configured, skipping mcp_server creation...
2026-02-20T06:43:04.156 INFO [controller] Setting up environment for garbage_collector...
2026-02-20T06:43:04.197 INFO [controller] Starting CLP using Docker Compose (full deployment)...
...
2026-02-20T06:43:21.952 INFO [controller] Started CLP.

Verification — container status:

$ docker ps --filter "name=clp-package-3fcd" --format "table {{.Names}}\t{{.Status}}"
NAMES                                      STATUS
clp-package-3fcd-reducer-1                 Up About a minute
clp-package-3fcd-query-scheduler-1         Up 2 minutes (healthy)
clp-package-3fcd-webui-1                   Up 2 minutes (healthy)
clp-package-3fcd-garbage-collector-1       Up 2 minutes
clp-package-3fcd-compression-scheduler-1   Up 2 minutes
clp-package-3fcd-api-server-1              Up 2 minutes (healthy)
clp-package-3fcd-compression-worker-1      Up 2 minutes
clp-package-3fcd-query-worker-1            Up 2 minutes

No database, queue, redis, or results-cache containers — all are external.

Verification — .env external host entries:

CLP_DATABASE_ENABLED=0
CLP_QUEUE_ENABLED=0
CLP_REDIS_ENABLED=0
CLP_RESULTS_CACHE_ENABLED=0
CLP_DB_PORT=3306
CLP_EXTRA_HOST_DATABASE_NAME=database
CLP_EXTRA_HOST_DATABASE_ADDR=host-gateway
CLP_QUEUE_PORT=5672
CLP_EXTRA_HOST_QUEUE_NAME=queue
CLP_EXTRA_HOST_QUEUE_ADDR=host-gateway
CLP_REDIS_PORT=6379
CLP_EXTRA_HOST_REDIS_NAME=redis
CLP_EXTRA_HOST_REDIS_ADDR=host-gateway
CLP_RESULTS_CACHE_PORT=27017
CLP_EXTRA_HOST_RESULTS_CACHE_NAME=results_cache
CLP_EXTRA_HOST_RESULTS_CACHE_ADDR=192.168.3.247

Database, queue, and redis configured with localhost resolved to Docker's host-gateway
token, which Docker maps to the host's IP at container start. Results cache uses the
non-loopback IP directly (192.168.3.247).

Verification — service logs:

  • results-cache-indices-creator connected to external MongoDB and exited cleanly (exit
    code 0, no warnings — replica set was initialized at the same IP 192.168.3.247).

  • db-table-creator connected to external MariaDB:

    2026-02-20 06:43:14,846 [INFO] No spider database configured. Skipping Spider database initialization.
    
  • compression-scheduler started successfully:

    2026-02-20 06:43:19,222 compression_scheduler [INFO] Starting compression_scheduler
    
  • query-scheduler connected to the external database via extra_hosts mapping:

    2026-02-20 06:43:19,552 search-job-handler [INFO] Connected to archive database database:3306.
    2026-02-20 06:43:19,553 search-job-handler [INFO] query_scheduler started.
    

Task: Verify end-to-end data flow (compression) with all external services.

Command:

$ ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql.log

Output:

2026-02-20T06:45:41.180 INFO [compress] Compression job 1 submitted.
2026-02-20T06:45:52.726 INFO [compress] Compression finished.
2026-02-20T06:45:52.726 INFO [compress] Compressed 385.21MB into 10.06MB (38.31x). Speed: 34.56MB/s.

Explanation: Compression job was submitted through external RabbitMQ, stored
metadata in external MariaDB, used external Redis for task coordination, and the
results-cache-indices-creator successfully created indices on the external MongoDB.
Full data pipeline works with all four services external.

Scenario 3: Bundled mode in Kubernetes (kind cluster)

All services bundled (default values.yaml), deployed via set-up-test.sh.

Task: Verify that the default Helm chart deployment starts all pods successfully.

Command:

$ bash tools/deployment/package-helm/set-up-test.sh

After fix (helm upgrade):

$ kubectl get pods
NAME                                              READY   STATUS      RESTARTS   AGE
test-clp-api-server-cfb5f7c4c-8jqn9               1/1     Running     0          6m23s
test-clp-compression-scheduler-77dff7bb67-kmdwl   1/1     Running     0          40s
test-clp-compression-worker-dc489ccb5-l8qv7       1/1     Running     0          40s
test-clp-database-0                               1/1     Running     0          6m23s
test-clp-db-table-creator-mkz2j                   0/1     Completed   0          6m24s
test-clp-garbage-collector-557d7f86bd-6mjbn       1/1     Running     0          6m23s
test-clp-query-scheduler-6d5987fc5f-cvx7j         1/1     Running     0          40s
test-clp-query-worker-776c9fd74f-45jcg            1/1     Running     0          40s
test-clp-queue-0                                  1/1     Running     0          6m23s
test-clp-redis-0                                  1/1     Running     0          6m23s
test-clp-reducer-854587454d-cpzwk                 1/1     Running     0          6m23s
test-clp-results-cache-0                          1/1     Running     0          6m23s
test-clp-results-cache-indices-creator-gd876      0/1     Completed   0          6m24s
test-clp-webui-645c796944-2k2js                   1/1     Running     0          6m23s

Explanation: All 14 pods running/completed. Workers connected to queues:

$ kubectl logs test-clp-compression-worker-dc489ccb5-l8qv7 --tail=3
 -------------- [queues]
                .> compression      exchange=compression(direct) key=compression

$ kubectl logs test-clp-query-worker-776c9fd74f-45jcg --tail=3
 -------------- [queues]
                .> query            exchange=query(direct) key=query

Scenario 4: All four services external in Kubernetes (kind cluster)

All four services unbundled, running as separate Docker containers on the host, accessible
from kind pods via the host's non-loopback IP (192.168.3.247).

Setup — start four external services on the host:

$ docker run -d --name ext-k8s-mariadb \
    -p 192.168.3.247:13306:3306 \
    -e MYSQL_ROOT_PASSWORD=root-pass \
    -e MYSQL_DATABASE=clp-db \
    -e MYSQL_USER=clp-user \
    -e MYSQL_PASSWORD=pass \
    mariadb:10-jammy

$ docker run -d --name ext-k8s-rabbitmq \
    -p 192.168.3.247:15672:5672 \
    -e RABBITMQ_DEFAULT_USER=clp-user \
    -e RABBITMQ_DEFAULT_PASS=pass \
    rabbitmq:3.9.8

$ docker run -d --name ext-k8s-redis \
    -p 192.168.3.247:16379:6379 \
    redis:7.2.4 \
    redis-server --requirepass 'pass'

$ docker run -d --name ext-k8s-mongodb \
    -p 192.168.3.247:17017:27017 \
    mongo:7.0.1 \
    mongod --replSet rs0 --bind_ip_all

(Credentials match values.yaml defaults. Non-default ports used to avoid conflicts
with the bundled deployment that was running earlier. No manual rs.initiate() needed —
initialize-results-cache.py detects the uninitialized replica set and initializes it
using the configured URI hostname.)

Override values (ext-values.yaml):

clpConfig:
  bundled: []

  database:
    host: "192.168.3.247"
    port: 13306

  queue:
    host: "192.168.3.247"
    port: 15672

  redis:
    host: "192.168.3.247"
    port: 16379

  results_cache:
    host: "192.168.3.247"
    port: 17017

Command:

$ helm install test tools/deployment/package-helm/ -f tools/deployment/package-helm/ext-values.yaml

Pod status:

$ kubectl get pods
NAME                                              READY   STATUS      RESTARTS   AGE
test-clp-api-server-cfb5f7c4c-7j24m               1/1     Running     0          86s
test-clp-compression-scheduler-58685bff4f-njd6v   1/1     Running     0          86s
test-clp-compression-worker-6646444bf7-vmfwl      1/1     Running     0          86s
test-clp-db-table-creator-2q52r                   0/1     Completed   0          86s
test-clp-garbage-collector-557d7f86bd-mgcfx       1/1     Running     0          86s
test-clp-query-scheduler-574c896989-ml47n         1/1     Running     0          86s
test-clp-query-worker-f56bbfc65-mr2xx             1/1     Running     0          86s
test-clp-reducer-854587454d-g7qfk                 1/1     Running     0          86s
test-clp-results-cache-indices-creator-vntsq      0/1     Completed   0          86s
test-clp-webui-645c796944-7qkzz                   1/1     Running     0          86s

No database, queue, redis, or results-cache statefulsets — all external.

Verification — no bundled resources:

$ kubectl get statefulsets
No resources found in default namespace.

$ kubectl get svc
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
kubernetes                 ClusterIP   10.96.0.1       <none>        443/TCP           2m
test-clp-api-server        NodePort    10.96.147.249   <none>        3001:30301/TCP    111s
test-clp-query-scheduler   ClusterIP   None            <none>        7000/TCP          111s
test-clp-reducer           ClusterIP   None            <none>        14009/TCP,...     111s
test-clp-webui             NodePort    10.96.24.47     <none>        4000:30000/TCP    111s

Verification — service logs (external connections):

  • db-table-creator connected to external MariaDB:

    2026-02-22 16:11:21,716 [INFO] No spider database configured. Skipping Spider database initialization.
    
  • results-cache-indices-creator completed (exit 0, no output — replica set auto-initialized).

  • compression-worker connected to external queue and redis:

    .> transport:   amqp://clp-user:**@192.168.3.247:15672//
    .> results:     redis://default:**@192.168.3.247:16379/1
    
  • query-scheduler connected to external database:

    2026-02-22 16:11:39,790 search-job-handler [INFO] Connected to archive database 192.168.3.247:13306.
    2026-02-22 16:11:39,790 search-job-handler [INFO] query_scheduler started.
    

Verification — configmap (external hosts):

{
  "SqlDbHost": "192.168.3.247",
  "SqlDbPort": 13306,
  "MongoDbHost": "192.168.3.247",
  "MongoDbPort": 17017
}

Explanation: Helm chart correctly skips all bundled service resources (StatefulSets,
Services, PVs, Secrets for queue/redis) when bundled: []. The configmap, Celery
connection strings, and init-job URIs all correctly resolve to the user-configured
external host/port. All CLP pods connect to external services successfully.

@coderabbitai

coderabbitai Bot commented Nov 27, 2025

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Container-specific host/port mappings and JDBC URL construction were parameterized and conditionalized: transform_for_container methods now accept an is_bundled flag and container host/port/env provisioning is only applied when a service is marked bundled. Deployment templates and docs updated accordingly.

Changes

Cohort / File(s) Summary
Core config handling
components/clp-py-utils/clp_py_utils/clp_config.py
Updated transform_for_container(...) signatures for Database, Redis, ResultsCache, and Queue to accept is_bundled: bool; JDBC/container URLs now use configured host/port and host/port mappings are applied only when the service is bundled.
Docker composition
tools/deployment/package/docker-compose-all.yaml
Replaced hardcoded service host/port references with environment-variable templates and added extra_hosts entries so database, queue, redis, and results-cache endpoints are driven by env vars (CLP_ADDR/PORT and CLP_EXTRA_HOST).
Package controller env setup
components/clp-package-utils/clp_package_utils/controller.py
Env var provisioning for database, queue, redis, and results_cache now conditions emission of direct CLP_HOST/CLP_PORT vs CLP_EXTRA_HOST_* entries based on whether each service is bundled.
Helm configmap templates
tools/deployment/package-helm/templates/configmap.yaml
Replaced hard-coded host/port template includes with dedicated helpers (clp.databaseHost/Port, clp.queueHost/Port, clp.redisHost/Port, clp.resultsCacheHost/Port) and updated webui/server settings to use these helpers.
Package config templates
components/package-template/src/etc/clp-config.template.json.yaml, components/package-template/src/etc/clp-config.template.text.yaml
Changed bundled services presentation to an explicit commented multi-line block (documentation-style) listing database, queue, redis, and results_cache rather than an active single-line default.
Documentation / Quick start
docs/src/user-docs/guides-external-database.md, docs/src/user-docs/quick-start/clp-json.md, docs/src/user-docs/quick-start/clp-text.md
Rewrote external-database guide into a multi-step flow (bundled list, external connection details, credentials, host-network guidance, multi-host notes) and added references to external-database guidance in quick-start tips.

Sequence Diagram(s)

sequenceDiagram
    participant Operator
    participant Controller
    participant CLPConfig
    participant Database
    participant Queue
    participant Redis
    participant ResultsCache
    participant DockerCompose

    Operator->>Controller: request containerized deployment
    Controller->>CLPConfig: load clp-config (bundled list)
    CLPConfig->>Controller: bundled membership for services
    Controller->>Database: transform_for_container(is_bundled?)
    Controller->>Queue: transform_for_container(is_bundled?)
    Controller->>Redis: transform_for_container(is_bundled?)
    Controller->>ResultsCache: transform_for_container(is_bundled?)
    Database-->>Controller: return host/port/env only if bundled
    Queue-->>Controller: return host/port/env only if bundled
    Redis-->>Controller: return host/port/env only if bundled
    ResultsCache-->>Controller: return host/port/env only if bundled
    Controller->>DockerCompose: emit env vars / extra_hosts accordingly
    DockerCompose->>Operator: start containers with configured endpoints
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 64.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding support for external third-party services configuration in deployment, which aligns with the substantial changes across configuration, controller, Docker Compose, and Helm files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao marked this pull request as ready for review November 27, 2025 17:16
@junhaoliao junhaoliao requested a review from a team as a code owner November 27, 2025 17:16
@junhaoliao junhaoliao requested a review from hoophalab November 27, 2025 17:16
@junhaoliao junhaoliao changed the title feat(deployment): Support configuration of external 3rd-party services (resolves #1680). feat(deployment): Support configuration of external third-party services (resolves #1680). Nov 27, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-external-database.md (1)

176-215: Clarify how managed-service “connection strings” map onto the new host/port config

The new bundling and external database/results_cache host/port examples look good and line up with the code. One small docs mismatch: the earlier “AWS DocumentDB or MongoDB Atlas” section tells users to “use the provided connection string when configuring CLP (see below)”, but the new config steps only show host/port fields.

To avoid confusion for users coming from a full MongoDB/Atlas URI, consider adding a short clarification here, for example:

  • Explicitly state that CLP currently expects separate host/port (plus optional db_name), not a raw mongodb+srv://... URI; and/or
  • Briefly describe how to extract the host and port from a typical managed-service connection string.

That would align the step-by-step config with the earlier managed-service guidance.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82f73d8 and f6be19af0140a79bcd170147451f64a674bacfb5.

📒 Files selected for processing (6)
  • components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
  • docs/src/user-docs/guides-external-database.md (1 hunks)
  • docs/src/user-docs/guides-multi-host.md (2 hunks)
  • docs/src/user-docs/quick-start/clp-json.md (1 hunks)
  • docs/src/user-docs/quick-start/clp-text.md (1 hunks)
  • tools/deployment/package/docker-compose-all.yaml (5 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.
📚 Learning: 2025-11-10T05:19:56.600Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.

Applied to files:

  • components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-06-18T20:39:05.899Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
  • docs/src/user-docs/quick-start/clp-text.md
📚 Learning: 2025-07-05T03:38:16.779Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-06-18T20:48:48.990Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
  • tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

870-877: Bundled checks correctly avoid clobbering external service endpoints

Conditioning the transform_for_container() calls on BundledService membership cleanly prevents overwriting host/port when services are external while preserving existing behaviour when they’re bundled. This aligns with the new docs and bundling model.

docs/src/user-docs/quick-start/clp-json.md (1)

21-24: Tip text and external-database cross-link read clearly

The expanded tip nicely combines --setup-only guidance with a clear pointer to the external-database guide, and the relative link path looks correct for this location.

docs/src/user-docs/quick-start/clp-text.md (1)

23-26: Consistent external-database guidance across quick-starts

The updated tip matches the clp-json quick-start and clearly directs users who want external databases/third-party services to the dedicated guide. Consistent and easy to follow.

docs/src/user-docs/guides-multi-host.md (1)

40-41: Bundled vs external service guidance is aligned and clear

The multi-host steps now cleanly explain configuring the bundled list and explicitly tie external-database usage to skipping database/results-cache services while still running the init jobs. This matches the config model and should make multi-host + external DB setups much easier to reason about.

Also applies to: 55-65, 123-127

Comment thread tools/deployment/package/docker-compose-all.yaml Outdated
Comment thread .yamllint.yaml Outdated
@@ -37,7 +37,7 @@ rules:

line-length:
max: 100
allow-non-breakable-words: false
allow-non-breakable-words: true

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has to be set as true; otherwise, the clp-config.yaml with the long URL inside would fail the rule.

@coderabbitai create an issue to track this lint rule change, and i will submit the lint rule change in a separate PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track the yamllint rule change
gh issue create \
  --repo y-scope/clp \
  --title "Update yamllint configuration to allow non-breakable words in long lines" \
  --body "## Context

This issue tracks the yamllint configuration change to set \`allow-non-breakable-words: true\` in the \`line-length\` rule. This change is necessary because \`clp-config.yaml\` contains long URLs that would fail the yamllint check with the current setting.

## Background

While working on PR #1681 (support for external third-party services configuration), it was identified that the yamllint \`allow-non-breakable-words\` setting needs to be changed from \`false\` to \`true\` to accommodate long URLs in configuration files.

## Related

- PR: https://github.com/y-scope/clp/pull/1681
- Comment: https://github.com/y-scope/clp/pull/1681#discussion_r2569846542
- Requested by: @junhaoliao

## Proposed Change

In \`.yamllint.yaml\`:
\`\`\`yaml
line-length:
  max: 100
  allow-non-breakable-words: true  # Changed from false
\`\`\`" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao, I've created issue #1683 to track this yamllint configuration change. The issue includes the context about why this change is needed (to accommodate long URLs in clp-config.yaml) and references this PR and comment as backlinks. The issue has been assigned to you for the follow-up PR.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-external-database.md (1)

218-223: Multi-host deployment note is helpful but could be more explicit.

The note clarifies that db-table-creator and results-cache-indices-creator jobs are still needed, but it would be helpful to explicitly state that these initialization jobs should be run before starting other services. Consider adding a reference or cross-link to the initialization sequence in the multi-host guide.

Suggested refinement:

 When using external databases in a multi-host deployment, you do **not** need to start the
 `database` and `results-cache` Docker Compose services. Skip those services when following the
-[multi-host deployment guide][multi-host-guide]. However, you still need to run the database
-initialization jobs (`db-table-creator` and `results-cache-indices-creator`).
+[multi-host deployment guide][multi-host-guide]. However, you still need to run the database
+initialization jobs (`db-table-creator` and `results-cache-indices-creator`) **on one host before**
+starting the other CLP services.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6d50515a0b2f9782fdd6a0c230e3a09db81583c and cc2aa722850e767e7c374445772df84e6f4f48d7.

📒 Files selected for processing (3)
  • .yamllint.yaml (1 hunks)
  • components/package-template/src/etc/clp-config.yaml (2 hunks)
  • docs/src/user-docs/guides-external-database.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-08T17:20:02.560Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1078
File: components/webui/server/src/plugins/MongoSocketIoServer/index.ts:105-108
Timestamp: 2025-07-08T17:20:02.560Z
Learning: The y-scope/clp codebase has a 100-character line length limit that takes precedence over other style preferences like template literals when they conflict.

Applied to files:

  • .yamllint.yaml
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/package-template/src/etc/clp-config.yaml
📚 Learning: 2025-10-22T21:14:12.225Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.

Applied to files:

  • components/package-template/src/etc/clp-config.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
.yamllint.yaml (1)

40-40: Verify that non-breakable-words relaxation is necessary.

The change allows non-breakable words to exceed the 100-character line-length limit. Confirm that the new environment-variable-based URIs and multi-service configurations in the accompanying changes (ClpConfig transformations, docker-compose-all.yaml) actually introduce non-breakable tokens that exceed this threshold and justify relaxing the constraint.

docs/src/user-docs/guides-external-database.md (2)

178-189: YAML syntax and bundled services example look good.

The structured step-by-step configuration flow is clear and provides helpful YAML examples. The commented-out services demonstrate which services can be removed from the bundled list to use external instances.


210-216: I'm unable to access the repository to verify the credentials.yaml structure against the codebase configuration expectations due to cloning failures. However, I can provide a rewritten review comment that reflects this limitation:


Unable to verify credentials.yaml structure without codebase access.

The documentation shows credentials nested under database: with username and password fields. This structure should be verified against the actual configuration parser in the codebase to confirm it matches expectations for external database credential handling. Due to technical limitations preventing repository access, this verification could not be completed.

components/package-template/src/etc/clp-config.yaml (1)

19-20: Documentation URL reference is correct and accessible.

The URL https://docs.yscope.com/clp/main/user-docs/guides-external-database.html has been verified to exist and contains the "External database setup" guide with sections covering MariaDB/MySQL and MongoDB configuration for CLP. The anchor fragment #configuring-clp-to-use-external-databases follows standard markdown heading conventions and appropriately references a configuration section on that page.

Comment on lines +202 to +208
:::{important}
If your external service is running on the host machine, **do not use `127.0.0.1`** as the host.
The CLP components run in containers, so `127.0.0.1` would refer to the container itself.
Instead, use:
- `host.docker.internal` to refer to the host machine.
- The host's actual network IP address (e.g., `192.168.1.10`)
:::

@coderabbitai coderabbitai Bot Nov 27, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Host networking guidance needs platform-specific clarification.

The note about 127.0.0.1 and host.docker.internal is helpful, but host.docker.internal is primarily available on Docker Desktop (Mac/Windows). For Linux Docker hosts without Docker Desktop, users need different guidance—typically using the actual host IP or DNS hostname. Consider adding a platform-specific note.

Suggested addition (after line 207):

   - `host.docker.internal` to refer to the host machine.
   - The host's actual network IP address (e.g., `192.168.1.10`)
+
+  ::::{note}
+  **For Linux Docker hosts**: `host.docker.internal` is not available. Use the actual host IP address or hostname instead.
+  ::::

Committable suggestion skipped: line range outside the PR's diff.

@hoophalab hoophalab Nov 27, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether this coderabbit comment is valid or not

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🌐 Web query:

host.docker.internal support Linux Docker 2024 2025

💡 Result:

Short answer: Yes — but with caveats.

  • Docker Desktop (macOS, Windows — and Docker Desktop for Linux) provides host.docker.internal by default so containers can resolve the host localhost. [1][2][6]
  • On plain Docker Engine on Linux you must create the mapping yourself (common options):
    • docker run --add-host=host.docker.internal:host-gateway (or in Compose: extra_hosts: - "host.docker.internal:host-gateway"). The dockerd daemon exposes the special host-gateway value and can be configured with host-gateway-ip. [1][3][5]
    • Or map the host gateway/docker0 IP directly (e.g. 172.17.0.1) in /etc/hosts or via extra_hosts. [4][2]

References

  • Docker docs — dockerd / host-gateway and --add-host examples. [1]
  • Docker docs — host.docker.internal behavior / Networking how‑tos. [2]
  • Docker Engine 20.10 release notes (adds Linux support hooks). [3]
  • Community notes / common workarounds (docker0 / extra_hosts). [4]

If you want, I can show the exact docker run or docker‑compose lines to use for your setup.


@hoophalab The comment is valid. On Docker Desktop (Mac/Windows/Docker Desktop for Linux), host.docker.internal is available by default, but on plain Docker Engine on Linux, you must create the mapping yourself.

For Linux Docker Engine users, they need to use docker run --add-host=host.docker.internal:host-gateway or configure it in Docker Compose with extra_hosts: - "host.docker.internal:host-gateway".

Suggested documentation improvement:

Update the important block in the documentation (around line 203-207) to include platform-specific guidance:

   :::{important}
   If your external service is running on the host machine, **do not use `127.0.0.1`** as the host.
   The CLP components run in containers, so `127.0.0.1` would refer to the container itself.
   Instead, use:
   - `host.docker.internal` to refer to the host machine (available by default on Docker Desktop)
   - The host's actual network IP address (e.g., `192.168.1.10`)
   
   **For Linux Docker Engine users** (not Docker Desktop): `host.docker.internal` requires manual 
   configuration. Add the following to your `docker-compose-all.yaml` services that need to 
   connect to the host:
   ```yaml
   extra_hosts:
     - "host.docker.internal:host-gateway"

Alternatively, use the host's actual IP address instead.
:::


This will help Linux users avoid confusion and provide them with a clear path forward.

---

<details>
<summary>🧠 Learnings used</summary>

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).


</details>



<!-- This is an auto-generated reply by CodeRabbit -->

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this requires more thoughts / investigations. Let's put this review on hold

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@hoophalab hoophalab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some comments

@@ -1,4 +1,5 @@
# yaml-language-server: $schema=../usr/share/config-schemas/clp-config.schema.json
#

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#

@@ -15,6 +16,8 @@
## File containing credentials for services
#credentials_file_path: "etc/credentials.yaml"
#
## Remove any bundled services below if you wish to use your own. For more information, see
## https://docs.yscope.com/clp/main/user-docs/guides-external-database.html#configuring-clp-to-use-external-databases
#bundled: ["database", "queue", "redis", "results_cache"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to keep consistent with the doc

Suggested change
#bundled: ["database", "queue", "redis", "results_cache"]
#bundled:
# - "database"
# - "queue"
# - "redis"
# - "results_cache"

Comment thread docs/src/user-docs/guides-external-database.md
Comment on lines +202 to +208
:::{important}
If your external service is running on the host machine, **do not use `127.0.0.1`** as the host.
The CLP components run in containers, so `127.0.0.1` would refer to the container itself.
Instead, use:
- `host.docker.internal` to refer to the host machine.
- The host's actual network IP address (e.g., `192.168.1.10`)
:::

@hoophalab hoophalab Nov 27, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether this coderabbit comment is valid or not

@junhaoliao junhaoliao marked this pull request as draft November 28, 2025 23:27
@junhaoliao junhaoliao modified the milestones: Backlog, February 2026 Jan 19, 2026
@junhaoliao junhaoliao marked this pull request as ready for review February 17, 2026 20:08

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/clp-package-utils/clp_package_utils/controller.py`:
- Around line 193-200: The code eagerly resolves the DB hostname via
_get_ip_from_hostname when building env vars (BundledService.DATABASE branch),
which can produce stale extra_hosts for dynamic DNS; update the controller to
log or document this limitation and avoid surprising operators by: add a brief
warning message in the same setup routine (e.g., set_up_env or the surrounding
method) noting that hostname is resolved at setup time and may become stale for
cloud-managed/floating IP DB endpoints, include the raw hostname in the log (not
just the IP) and recommend using stable static IPs or service discovery, and
add/update the controller class docstring or README to mention this behavior so
operators are aware.

In `@components/clp-py-utils/clp_py_utils/clp_config.py`:
- Around line 404-407: Change the boolean positional parameter is_bundled on
each transform_for_container method to be keyword-only to satisfy FBT001: update
the signatures (e.g., transform_for_container(self, *, is_bundled: bool)) for
the four implementations found in clp_config.py and keep the type hint and
behavior unchanged; update any internal docstring or comments if present and
ensure call sites use is_bundled=... (e.g., is_bundled=BundledService.DATABASE
in self.bundled) so calls remain explicit and linting passes.

Comment on lines +193 to +200
if BundledService.DATABASE not in self._clp_config.bundled:
env_vars |= {
"CLP_DB_PORT": str(self._clp_config.database.port),
"CLP_EXTRA_HOST_DATABASE_NAME": DB_COMPONENT_NAME,
"CLP_EXTRA_HOST_DATABASE_ADDR": _get_ip_from_hostname(
self._clp_config.database.host
),
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

DNS resolution at setup time could become stale for external services.

_get_ip_from_hostname resolves the hostname to an IP eagerly during set_up_env. If the external database's IP changes between setup and when containers actually connect (e.g., cloud-managed DB with floating IPs), the extra_hosts mapping will be stale.

This is an inherent limitation of Docker's extra_hosts mechanism and likely acceptable for the target deployment model, but worth documenting for operators using dynamic DNS endpoints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/clp-package-utils/clp_package_utils/controller.py` around lines
193 - 200, The code eagerly resolves the DB hostname via _get_ip_from_hostname
when building env vars (BundledService.DATABASE branch), which can produce stale
extra_hosts for dynamic DNS; update the controller to log or document this
limitation and avoid surprising operators by: add a brief warning message in the
same setup routine (e.g., set_up_env or the surrounding method) noting that
hostname is resolved at setup time and may become stale for
cloud-managed/floating IP DB endpoints, include the raw hostname in the log (not
just the IP) and recommend using stable static IPs or service discovery, and
add/update the controller class docstring or README to mention this behavior so
operators are aware.

Comment on lines +404 to +407
def transform_for_container(self, is_bundled: bool):
if is_bundled:
self.host = DB_COMPONENT_NAME
self.port = self.DEFAULT_PORT

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider making is_bundled a keyword-only argument to address FBT001.

Ruff flags FBT001 (boolean positional argument) on all four transform_for_container signatures (Lines 404, 482, 513, 550). Making it keyword-only would improve call-site readability and satisfy the lint rule.

♻️ Example fix (apply similarly to Redis, ResultsCache, Queue)
-    def transform_for_container(self, is_bundled: bool):
+    def transform_for_container(self, *, is_bundled: bool):

Callers would then use:

self.database.transform_for_container(is_bundled=BundledService.DATABASE in self.bundled)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def transform_for_container(self, is_bundled: bool):
if is_bundled:
self.host = DB_COMPONENT_NAME
self.port = self.DEFAULT_PORT
def transform_for_container(self, *, is_bundled: bool):
if is_bundled:
self.host = DB_COMPONENT_NAME
self.port = self.DEFAULT_PORT
🧰 Tools
🪛 Ruff (0.15.1)

[error] 404-404: Boolean-typed positional argument in function definition

(FBT001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/clp-py-utils/clp_py_utils/clp_config.py` around lines 404 - 407,
Change the boolean positional parameter is_bundled on each
transform_for_container method to be keyword-only to satisfy FBT001: update the
signatures (e.g., transform_for_container(self, *, is_bundled: bool)) for the
four implementations found in clp_config.py and keep the type hint and behavior
unchanged; update any internal docstring or comments if present and ensure call
sites use is_bundled=... (e.g., is_bundled=BundledService.DATABASE in
self.bundled) so calls remain explicit and linting passes.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/deployment/package/docker-compose-all.yaml`:
- Around line 235-236: The URI uses CLP_EXTRA_HOST_RESULTS_CACHE_ADDR
(defaulting to results_cache) directly in the MongoDB connection string instead
of the usual hardcoded hostname + extra_hosts pattern; add a short inline
comment above the "--uri" line explaining that this deliberate divergence is
required for MongoDB replica set member hostname resolution (so replica members
resolve their actual hostnames) and should not be changed to the other services'
hostname/extra_hosts pattern; reference the exact symbol
CLP_EXTRA_HOST_RESULTS_CACHE_ADDR and the URI fragment
"mongodb://${CLP_EXTRA_HOST_RESULTS_CACHE_ADDR:-results_cache}:.../${CLP_RESULTS_CACHE_DB_NAME:-clp-query-results}"
in the comment so future maintainers understand the intent.

Comment on lines +235 to +236
"--uri", "mongodb://${CLP_EXTRA_HOST_RESULTS_CACHE_ADDR:-results_cache}\
:${CLP_RESULTS_CACHE_PORT:-27017}/${CLP_RESULTS_CACHE_DB_NAME:-clp-query-results}",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Intentional divergence from other services' host resolution pattern.

This service uses CLP_EXTRA_HOST_RESULTS_CACHE_ADDR directly in the URI (defaulting to results_cache) instead of using the hardcoded hostname + extra_hosts redirection pattern used by all other services. The PR description explains this is for MongoDB replica set member hostname resolution.

Consider adding a brief inline comment above this line explaining why the address is used directly here rather than the hostname-based approach, so future maintainers don't "fix" it to match the others.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package/docker-compose-all.yaml` around lines 235 - 236, The
URI uses CLP_EXTRA_HOST_RESULTS_CACHE_ADDR (defaulting to results_cache)
directly in the MongoDB connection string instead of the usual hardcoded
hostname + extra_hosts pattern; add a short inline comment above the "--uri"
line explaining that this deliberate divergence is required for MongoDB replica
set member hostname resolution (so replica members resolve their actual
hostnames) and should not be changed to the other services' hostname/extra_hosts
pattern; reference the exact symbol CLP_EXTRA_HOST_RESULTS_CACHE_ADDR and the
URI fragment
"mongodb://${CLP_EXTRA_HOST_RESULTS_CACHE_ADDR:-results_cache}:.../${CLP_RESULTS_CACHE_DB_NAME:-clp-query-results}"
in the comment so future maintainers understand the intent.

Comment thread docs/src/user-docs/guides-external-database.md
@junhaoliao junhaoliao requested a review from hoophalab February 20, 2026 07:16
hoophalab
hoophalab previously approved these changes Feb 25, 2026

@hoophalab hoophalab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One nitpick and a high level question.

Validation: Successfully compressed and searched logs in WebUI with external databases in both helm and docker compose.

Comment thread components/clp-py-utils/clp_py_utils/clp_config.py
Comment thread components/clp-package-utils/clp_package_utils/controller.py Outdated
@junhaoliao junhaoliao merged commit d754cbf into y-scope:main Feb 25, 2026
23 checks passed
junhaoliao added a commit to junhaoliao/clp that referenced this pull request Feb 26, 2026
The {{- if has "..." .Values.clpConfig.bundled }} guards from PR y-scope#1681
were lost when resolving merge conflicts with main. Restore them for:
- database-statefulset.yaml
- db-table-creator-job.yaml (initContainers)
- queue-statefulset.yaml
- redis-statefulset.yaml
- results-cache-statefulset.yaml
- results-cache-indices-creator-job.yaml (initContainers)

Without these guards, Helm renders bundled third-party resources even
when configured to use external services.
junhaoliao added a commit to junhaoliao/clp that referenced this pull request Feb 26, 2026
The merge incorrectly hardcoded the MongoDB URI using clp.fullname
instead of the clp.resultsCacheHost/clp.resultsCachePort helpers
introduced in PR y-scope#1681 for external service support.
@junhaoliao junhaoliao deleted the config-ext-3rd-party branch May 7, 2026 19:46
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…ces (resolves y-scope#1680). (y-scope#1681)

Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants