
fix(clp-package): Restore CLP_*_PORT env vars for bundled services (fixes #2065); Separate published ports from connection ports. #2066

Merged
junhaoliao merged 1 commit into y-scope:main from junhaoliao:bundled-ports
Mar 5, 2026


@junhaoliao

@junhaoliao junhaoliao commented Mar 4, 2026

Member

Description

#1681 stopped emitting CLP_DB_PORT, CLP_QUEUE_PORT, CLP_REDIS_PORT, and
CLP_RESULTS_CACHE_PORT for bundled services, causing Docker Compose to fall back to
hardcoded defaults for the published port in each service's ports section. This makes
it impossible to customize the host-side published port for bundled third-party services
(e.g., to expose the database on port 13306 instead of 3306, or to avoid port conflicts
on the host). This is the port-side analogue of #2055 (which addressed the same
regression for CLP_*_HOST variables, fixed in #2056).

A naive fix of simply restoring CLP_*_PORT in the bundled branch would break
inter-container connection strings (BROKER_URL, RESULT_BACKEND, JDBC URLs,
initialize-results-cache URI) that reference the same variables: bundled containers
always listen on their default internal ports (3306, 5672, 6379, 27017) regardless of
the user-configured published port.
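A minimal Python sketch of why the naive fix breaks. It mimics Compose's ${VAR:-default} interpolation (the default is used when the variable is unset or empty). The BROKER_URL-style templates, the "queue" hostname, and the placeholder password are assumptions for illustration, not lines copied from docker-compose-all.yaml.

```python
import re
from urllib.parse import urlsplit

# Compose-style ${NAME:-default} reference; other interpolation forms are not handled.
_REF = re.compile(r"\$\{([A-Z0-9_]+):-([^}]*)\}")


def interpolate(template: str, env: dict[str, str]) -> str:
    """Substitute ${NAME:-default}, falling back to default if NAME is unset or empty."""
    return _REF.sub(lambda m: env.get(m.group(1)) or m.group(2), template)


# Hypothetical connection-string templates (placeholder password, assumed "queue" hostname).
NAIVE = "amqp://clp-user:pass@queue:${CLP_QUEUE_PORT:-5672}//"
FIXED = "amqp://clp-user:pass@queue:${CLP_QUEUE_CONNECT_PORT:-5672}//"

# .env under the naive fix: the bundled queue is published on host port 15672,
# but RabbitMQ still listens on 5672 inside its container.
env = {"CLP_QUEUE_PORT": "15672"}

print(urlsplit(interpolate(NAIVE, env)).port)  # 15672: wrong port inside the Compose network
print(urlsplit(interpolate(FIXED, env)).port)  # 5672: the container's real listening port
```

Keeping the published port and the connection port in separate variables is what lets the second template ignore the custom host-side port.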

docker-compose-all.yaml

  • Published ports (published: fields): Unchanged -- still reference CLP_*_PORT.
  • Connection strings: Changed from CLP_*_PORT to CLP_*_CONNECT_PORT in all
    BROKER_URL, RESULT_BACKEND, JDBC, and MongoDB URI entries (11 occurrences across
    spider-scheduler, compression-scheduler, compression-worker, spider-compression-worker,
    query-scheduler, query-worker, and results-cache-indices-creator).

This separation ensures:

  • Bundled: CLP_*_PORT is set (custom published port), CLP_*_CONNECT_PORT is
    unset (defaults to standard container port in compose).
  • External: CLP_*_PORT is unset (irrelevant, service disabled), CLP_*_CONNECT_PORT
    is set (external service port used in connection strings).
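The bundled/external split above can be sketched as a small function. ServiceConfig, port_env, and effective_connect_port are hypothetical names, not the actual helpers in controller.py, and the real controller also writes other entries (for example the CLP_*_ENABLED flags seen in Scenario 3) that this sketch leaves out.

```python
from dataclasses import dataclass

# Default listening ports inside the bundled containers (per the PR description).
CONTAINER_PORTS = {"DB": 3306, "QUEUE": 5672, "REDIS": 6379, "RESULTS_CACHE": 27017}


@dataclass
class ServiceConfig:
    """Hypothetical stand-in for one service entry in clp-config.yaml."""

    name: str  # env-var infix, e.g. "DB" -> CLP_DB_*
    bundled: bool
    host: str
    port: int


def port_env(svc: ServiceConfig) -> dict[str, str]:
    """Hypothetical: .env entries for one service under the split scheme."""
    prefix = f"CLP_{svc.name}"
    if svc.bundled:
        # Host/port only drive the published binding; CONNECT_PORT stays unset.
        return {f"{prefix}_HOST": svc.host, f"{prefix}_PORT": str(svc.port)}
    # External: nothing is published, so only the connection port is emitted.
    return {f"{prefix}_CONNECT_PORT": str(svc.port)}


def effective_connect_port(env: dict[str, str], name: str) -> int:
    """Port a CLP container dials, assuming a ${CLP_<name>_CONNECT_PORT:-<default>} reference."""
    return int(env.get(f"CLP_{name}_CONNECT_PORT") or CONTAINER_PORTS[name])


bundled = port_env(ServiceConfig("DB", True, "0.0.0.0", 13306))
external = port_env(ServiceConfig("DB", False, "localhost", 13306))
print(bundled)   # {'CLP_DB_HOST': '0.0.0.0', 'CLP_DB_PORT': '13306'}
print(external)  # {'CLP_DB_CONNECT_PORT': '13306'}
print(effective_connect_port(bundled, "DB"), effective_connect_port(external, "DB"))  # 3306 13306
```

The same configured port (13306) ends up as the published port in bundled mode and as the dialed port in external mode, which is exactly the distinction the two variable families encode.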

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Tested on the actual built package (built with task package, then started with sbin/start-clp.sh).

Scenario 1: Bundled mode (default config)

All services bundled (default clp-config.yaml), no external services running.

Task: Verify that the default bundled deployment starts, all services become healthy,
and compression works.

Command:

$ cd build/clp-package
$ ./sbin/start-clp.sh

Output:

2026-03-04T17:49:15.888 INFO [controller] Setting up environment for bundling database...
2026-03-04T17:49:15.889 INFO [controller] Setting up environment for bundling queue...
2026-03-04T17:49:15.890 INFO [controller] Setting up environment for bundling redis...
2026-03-04T17:49:15.890 INFO [controller] Setting up environment for bundling results_cache...
2026-03-04T17:49:15.891 INFO [controller] Setting up environment for database...
2026-03-04T17:49:15.892 INFO [controller] Setting up environment for queue...
2026-03-04T17:49:15.892 INFO [controller] Setting up environment for redis...
2026-03-04T17:49:15.892 INFO [controller] spider_scheduler is not configured, skipping environment setup...
2026-03-04T17:49:15.892 INFO [controller] Setting up environment for results_cache...
2026-03-04T17:49:15.892 INFO [controller] Setting up environment for compression_scheduler...
2026-03-04T17:49:15.893 INFO [controller] Setting up environment for query_scheduler...
2026-03-04T17:49:15.893 INFO [controller] Setting up environment for compression_worker...
2026-03-04T17:49:15.893 INFO [controller] Setting up environment for query_worker...
2026-03-04T17:49:15.893 INFO [controller] Setting up environment for reducer...
2026-03-04T17:49:15.893 INFO [controller] Setting up environment for api_server...
2026-03-04T17:49:15.894 INFO [controller] log_ingestor is only applicable for S3 logs input type, skipping environment setup...
2026-03-04T17:49:15.894 INFO [controller] Setting up environment for webui...
2026-03-04T17:49:15.895 INFO [controller] The MCP Server is not configured, skipping mcp_server creation...
2026-03-04T17:49:15.895 INFO [controller] Setting up environment for garbage_collector...
2026-03-04T17:49:15.959 INFO [controller] Starting CLP using Docker Compose (full deployment)...
...
2026-03-04T17:49:29.034 INFO [controller] Started CLP.

Verification -- .env now contains bundled host and port vars:

CLP_DB_HOST=127.0.0.1
CLP_DB_PORT=3306
CLP_QUEUE_HOST=127.0.0.1
CLP_QUEUE_PORT=5672
CLP_REDIS_HOST=127.0.0.1
CLP_REDIS_PORT=6379
CLP_RESULTS_CACHE_HOST=127.0.0.1
CLP_RESULTS_CACHE_PORT=27017

Explanation: All containers started and became healthy. The .env now contains both
CLP_*_HOST and CLP_*_PORT entries for bundled services. Before this fix, CLP_*_PORT
was absent and Docker Compose fell back to the hardcoded defaults in the compose file.
That is functionally the same in the default case, but the ports are now set explicitly,
so users can override them.
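The .env check above can be repeated with a short script that confirms every bundled service has both entries. This is a sketch assuming plain KEY=VALUE lines (no quoting or export prefixes); the generated .env's location is not stated here, so the sample reuses the lines shown above.

```python
# Sample .env lines copied from the bundled-mode verification above.
SAMPLE_ENV = """\
CLP_DB_HOST=127.0.0.1
CLP_DB_PORT=3306
CLP_QUEUE_HOST=127.0.0.1
CLP_QUEUE_PORT=5672
CLP_REDIS_HOST=127.0.0.1
CLP_REDIS_PORT=6379
CLP_RESULTS_CACHE_HOST=127.0.0.1
CLP_RESULTS_CACHE_PORT=27017
"""

BUNDLED = ("DB", "QUEUE", "REDIS", "RESULTS_CACHE")


def parse_env(text: str) -> dict[str, str]:
    """Parse plain KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_bundled_vars(env: dict[str, str], services=BUNDLED) -> list[str]:
    """List CLP_<svc>_HOST / CLP_<svc>_PORT keys absent for bundled services."""
    return [
        key
        for svc in services
        for key in (f"CLP_{svc}_HOST", f"CLP_{svc}_PORT")
        if key not in env
    ]


print(missing_bundled_vars(parse_env(SAMPLE_ENV)))  # [] -> all eight entries present
```

For a real deployment, pass the text of the generated .env file instead of SAMPLE_ENV; before this fix, the four CLP_*_PORT keys would have been reported as missing.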

Task: Verify end-to-end data flow (compression) in bundled mode.

Command:

$ ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql-simple.jsonl

Output:

2026-03-04T17:49:54.379 INFO [compress] Compression job 2 submitted.
2026-03-04T17:49:54.881 INFO [compress] Compressed 3.94KB into 1.57KB (2.50x). Speed: 9.08KB/s.
2026-03-04T17:49:55.382 INFO [compress] Compression finished.
2026-03-04T17:49:55.383 INFO [compress] Compressed 3.94KB into 1.57KB (2.50x). Speed: 6.75KB/s.

Scenario 2: Bundled mode with custom ports

All services bundled, but with non-default published ports and host: "0.0.0.0".

Task: Verify that custom ports are respected in the Docker port bindings while
inter-container connections still use default internal ports.

Config (etc/clp-config-custom-port.yaml):

database:
  host: "0.0.0.0"
  port: 13306

queue:
  host: "0.0.0.0"
  port: 15672

redis:
  host: "0.0.0.0"
  port: 16379

results_cache:
  host: "0.0.0.0"
  port: 17017

Command:

$ ./sbin/start-clp.sh --config etc/clp-config-custom-port.yaml

Output:

2026-03-04T17:50:45.775 INFO [controller] Setting up environment for bundling database...
...
2026-03-04T17:50:59.698 INFO [controller] Started CLP.

Verification -- .env contains custom port vars:

CLP_DB_HOST=0.0.0.0
CLP_DB_PORT=13306
CLP_QUEUE_HOST=0.0.0.0
CLP_QUEUE_PORT=15672
CLP_REDIS_HOST=0.0.0.0
CLP_REDIS_PORT=16379
CLP_RESULTS_CACHE_HOST=0.0.0.0
CLP_RESULTS_CACHE_PORT=17017

Verification -- Docker port bindings use custom ports:

$ docker ps --format "table {{.Names}}\t{{.Ports}}" | grep -E "database|queue|redis|results"
clp-package-7782-redis-1              0.0.0.0:16379->6379/tcp
clp-package-7782-database-1           0.0.0.0:13306->3306/tcp
clp-package-7782-queue-1              ..., 0.0.0.0:15672->5672/tcp
clp-package-7782-results-cache-1      0.0.0.0:17017->27017/tcp

Explanation: All four bundled services now publish on the user-configured ports
(13306, 15672, 16379, 17017) instead of the defaults. The host_ip is 0.0.0.0
as configured. No CLP_*_CONNECT_PORT vars are set, so inter-container connection
strings (BROKER_URL, RESULT_BACKEND, JDBC, MongoDB URI) correctly fall back to the
default container-internal ports (3306, 5672, 6379, 27017).
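The binding check can be automated by parsing the Ports column of docker ps. A sketch, assuming IPv4 bindings in the host_ip:published->container/tcp form shown above (IPv6 entries such as [::]:... are ignored); EXPECTED pairs each configured host and port from etc/clp-config-custom-port.yaml with the default container port.

```python
import re

# One "host_ip:published->container/tcp" binding in the docker ps Ports column.
_BINDING = re.compile(r"(?P<ip>[\d.]+):(?P<published>\d+)->(?P<container>\d+)/tcp")

# Expected (host_ip, published, container) per service.
EXPECTED = {
    "database": ("0.0.0.0", 13306, 3306),
    "queue": ("0.0.0.0", 15672, 5672),
    "redis": ("0.0.0.0", 16379, 6379),
    "results-cache": ("0.0.0.0", 17017, 27017),
}

# Output copied from the verification step above ("..." kept as in the original).
DOCKER_PS = """\
clp-package-7782-redis-1              0.0.0.0:16379->6379/tcp
clp-package-7782-database-1           0.0.0.0:13306->3306/tcp
clp-package-7782-queue-1              ..., 0.0.0.0:15672->5672/tcp
clp-package-7782-results-cache-1      0.0.0.0:17017->27017/tcp
"""


def bindings(ports_column: str) -> list[tuple[str, int, int]]:
    """Extract every IPv4 TCP binding from a Ports column."""
    return [
        (m["ip"], int(m["published"]), int(m["container"]))
        for m in _BINDING.finditer(ports_column)
    ]


def mismatches(ps_output: str, expected: dict) -> list[str]:
    """Return a message for each matching container lacking its expected binding."""
    problems = []
    for line in ps_output.splitlines():
        name, _, ports = line.partition(" ")
        for service, want in expected.items():
            if f"-{service}-" in name and want not in bindings(ports):
                problems.append(f"{name}: expected {want}, got {bindings(ports)}")
    return problems


print(mismatches(DOCKER_PS, EXPECTED))  # [] when every binding matches
```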

Task: Verify end-to-end data flow (compression) with custom ports.

Command:

$ ./sbin/compress.sh --config etc/clp-config-custom-port.yaml \
    --timestamp-key timestamp ~/samples/postgresql-simple.jsonl

Output:

2026-03-04T17:55:45.252 INFO [compress] Compression job 1 submitted.
2026-03-04T17:55:45.755 INFO [compress] Compressed 3.94KB into 1.57KB (2.50x). Speed: 8.65KB/s.
2026-03-04T17:55:46.257 INFO [compress] Compression finished.
2026-03-04T17:55:46.257 INFO [compress] Compressed 3.94KB into 1.57KB (2.50x). Speed: 6.77KB/s.

Scenario 3: All four services external (non-default ports)

All four services (database, queue, redis, results_cache) unbundled and running
externally on the Docker host with non-default ports.

Task: Start four external services, configure bundled: [] with non-default ports, and
verify that all CLP containers connect to the external services using CLP_*_CONNECT_PORT.

Setup -- start four external services on the Docker host:

$ docker run -d --name ext-mariadb \
    -p 13306:3306 \
    -e MYSQL_ROOT_PASSWORD=qCirQ7pszck \
    -e MYSQL_DATABASE=clp-db \
    -e MYSQL_USER=clp-user \
    -e MYSQL_PASSWORD=rCaJLsBsg2g \
    mariadb:10-jammy

$ docker run -d --name ext-rabbitmq \
    -p 15672:5672 \
    -e RABBITMQ_DEFAULT_USER=clp-user \
    -e RABBITMQ_DEFAULT_PASS=fg0oaBD6jTA \
    rabbitmq:3.9.8

$ docker run -d --name ext-redis \
    -p 16379:6379 \
    redis:7.2.4 \
    redis-server --requirepass 'Cz1tkSufuwgntT2BbNrqIg'

$ docker run -d --network=host --name ext-mongodb \
    mongo:7.0.1 \
    mongod --replSet rs0 --bind_ip_all --port 17017

(Credentials match etc/credentials.yaml. Images match those used by CLP.)
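Before starting CLP against the external services, it can help to confirm that each one accepts TCP connections on its host-side port. A minimal sketch using Python's socket.create_connection; it only proves that a listener exists, not that credentials or service initialization are correct, and the service names are labels chosen for this example.

```python
import socket

# Host-side ports from the docker run commands above
# (MongoDB uses --network=host and listens on 17017 directly).
EXTERNAL_PORTS = {"mariadb": 13306, "rabbitmq": 15672, "redis": 16379, "mongodb": 17017}


def unreachable(host: str, ports: dict[str, int], timeout: float = 2.0) -> list[str]:
    """Return names of services whose TCP port refuses the connection or times out."""
    down = []
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connected; close immediately
        except OSError:
            down.append(name)
    return down
```

For example, unreachable("localhost", EXTERNAL_PORTS) returns an empty list when all four listeners are up; any name it returns points at a container to inspect before running start-clp.sh.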

Config (etc/clp-config-ext-all.yaml):

bundled: []

database:
  host: "localhost"
  port: 13306

queue:
  host: "localhost"
  port: 15672

redis:
  host: "localhost"
  port: 16379

results_cache:
  host: "192.168.3.89"
  port: 17017

Command:

$ ./sbin/start-clp.sh --config etc/clp-config-ext-all.yaml

Output:

2026-03-04T17:57:30.212 INFO [controller] database is not included in the 'bundled' configuration, skipping service bundling...
2026-03-04T17:57:30.213 INFO [controller] queue is not configured or part of the 'bundled' configuration, skipping service bundling...
2026-03-04T17:57:30.213 INFO [controller] redis is not configured or part of the 'bundled' configuration, skipping service bundling...
2026-03-04T17:57:30.213 INFO [controller] results_cache is not included in the 'bundled' configuration, skipping service bundling...
...
2026-03-04T17:57:36.661 INFO [controller] Started CLP.

Verification -- .env external connection port entries:

CLP_QUEUE_ENABLED=0
CLP_REDIS_ENABLED=0
CLP_RESULTS_CACHE_ENABLED=0
CLP_DB_CONNECT_PORT=13306
CLP_QUEUE_CONNECT_PORT=15672
CLP_REDIS_CONNECT_PORT=16379
CLP_RESULTS_CACHE_CONNECT_PORT=17017

No CLP_*_PORT or CLP_*_HOST entries are written (they are only relevant for bundled
services), and no bundled database/queue/redis/results-cache containers are running.

Verification -- container status (no bundled 3rd-party services):

$ docker ps --filter "name=clp-package-3bef" --format "table {{.Names}}\t{{.Status}}"
clp-package-3bef-reducer-1                 Up 8 seconds
clp-package-3bef-compression-scheduler-1   Up 11 seconds
clp-package-3bef-api-server-1              Up 11 seconds (healthy)
clp-package-3bef-webui-1                   Up 11 seconds (healthy)
clp-package-3bef-garbage-collector-1       Up 11 seconds
clp-package-3bef-query-scheduler-1         Up 11 seconds (healthy)
clp-package-3bef-compression-worker-1      Up 13 seconds
clp-package-3bef-query-worker-1            Up 13 seconds

Verification -- service logs (external connections on non-default ports):

  • compression-worker connected to external queue and redis on custom ports:

    .> transport:   amqp://clp-user:**@queue:15672//
    .> results:     redis://default:**@redis:16379/1
    
  • query-scheduler connected to external database on custom port:

    2026-03-04 17:57:34,364 search-job-handler [INFO] Connected to archive database database:13306.
    2026-03-04 17:57:34,365 search-job-handler [INFO] query_scheduler started.
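The port checks in these log excerpts can also be scripted. A sketch that pulls the port out of the amqp:// and redis:// URLs with urllib.parse.urlsplit, and out of the query scheduler's "Connected to archive database" line with a regex; the patterns are fitted to the excerpts above and may not match other log formats.

```python
import re
from urllib.parse import urlsplit

# Log excerpts copied from the Scenario 3 verification above.
LOG = """\
.> transport:   amqp://clp-user:**@queue:15672//
.> results:     redis://default:**@redis:16379/1
2026-03-04 17:57:34,364 search-job-handler [INFO] Connected to archive database database:13306.
"""

_URL = re.compile(r"\b(?:amqp|redis)://\S+")
_DB = re.compile(r"Connected to archive database ([^\s:]+):(\d+)")


def connection_ports(log_text: str) -> dict[str, int]:
    """Map URL scheme (or "database") to the port the client actually dialed."""
    found = {}
    for m in _URL.finditer(log_text):
        parts = urlsplit(m.group(0))
        found[parts.scheme] = parts.port
    db = _DB.search(log_text)
    if db:
        found["database"] = int(db.group(2))
    return found


print(connection_ports(LOG))  # {'amqp': 15672, 'redis': 16379, 'database': 13306}
```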
    

Task: Verify end-to-end data flow (compression) with all external services.

Command:

$ ./sbin/compress.sh --config etc/clp-config-ext-all.yaml \
    --timestamp-key timestamp ~/samples/postgresql-simple.jsonl

Output:

2026-03-04T17:58:06.018 INFO [compress] Compression job 1 submitted.
2026-03-04T17:58:06.520 INFO [compress] Compressed 3.94KB into 1.57KB (2.50x). Speed: 9.97KB/s.
2026-03-04T17:58:07.021 INFO [compress] Compression finished.
2026-03-04T17:58:07.022 INFO [compress] Compressed 3.94KB into 1.57KB (2.50x). Speed: 6.79KB/s.

Explanation: All CLP services connected to the external services on non-default ports
using CLP_*_CONNECT_PORT. The compression job was submitted through external RabbitMQ
(port 15672), stored metadata in external MariaDB (port 13306), and used external Redis
(port 16379) for task coordination, while the results-cache-indices-creator connected to
external MongoDB (port 17017). The full data pipeline works with all four services external
on non-default ports.

Summary by CodeRabbit

  • Chores
    • Standardized connectivity environment variable naming conventions for database, queue, Redis, and cache services across deployment configurations to improve consistency.
    • Enhanced host and port binding in bundled service deployment scenarios for more reliable connectivity management.
    • Updated deployment configuration files to reference standardized variable naming throughout.

…and separate published ports from connection ports (fixes y-scope#2065).
@junhaoliao junhaoliao requested a review from a team as a code owner March 4, 2026 18:04
@junhaoliao junhaoliao requested a review from hoophalab March 4, 2026 18:04
@coderabbitai

coderabbitai Bot commented Mar 4, 2026

Contributor

Walkthrough

Environment variable handling for bundled and non-bundled services is updated to separate inter-container connection ports from published host ports. Non-bundled services now use CONNECT_PORT variable names, while bundled services set both host and port values to enable customization of published ports in Docker Compose.

Changes

  • Controller Environment Setup (components/clp-package-utils/clp_package_utils/controller.py):
    Updated environment variable logic for database, queue, Redis, and results-cache services.
    For non-bundled services, port variables renamed to CLP_*_CONNECT_PORT to denote
    inter-container connectivity. For bundled services, now sets both CLP_*_HOST and
    CLP_*_PORT entries (previously only host or only port was set).
  • Docker Compose Configuration (tools/deployment/package/docker-compose-all.yaml):
    Renamed environment variables across service definitions from CLP_*_PORT to
    CLP_*_CONNECT_PORT for database, queue, Redis, and results-cache services. Updated all
    connection URLs and references to use new CONNECT_PORT variants.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Linked Issues check: ✅ Passed. All objectives from issue #2065 are met: CLP_*_PORT vars
    restored for bundled services, CLP_*_CONNECT_PORT introduced for connections, and published
    ports are now user-configurable without breaking inter-container connections.
  • Out of Scope Changes check: ✅ Passed. All changes are directly related to the stated
    objectives of separating published ports from connection ports across bundled and external
    service configurations.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The
    required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly and accurately summarizes the main changes:
    restoring CLP_*_PORT environment variables for bundled services and separating published
    ports from connection ports.


@junhaoliao junhaoliao changed the title fix(clp-package): Restore CLP_*_PORT env vars for bundled services … fix(clp-package): Restore CLP_*_PORT env vars for bundled services and separate published ports from connection ports (fixes #2065). Mar 4, 2026
@junhaoliao junhaoliao changed the title fix(clp-package): Restore CLP_*_PORT env vars for bundled services and separate published ports from connection ports (fixes #2065). fix(clp-package): Restore CLP_*_PORT env vars for bundled services; Separate published ports from connection ports (fixes #2065). Mar 4, 2026
@junhaoliao junhaoliao changed the title fix(clp-package): Restore CLP_*_PORT env vars for bundled services; Separate published ports from connection ports (fixes #2065). fix(clp-package): Restore CLP_*_PORT env vars for bundled services (fixes #2065); Separate published ports from connection ports. Mar 4, 2026
@junhaoliao junhaoliao added this to the February 2026 milestone Mar 4, 2026

@hoophalab hoophalab left a comment

Contributor


LGTM.

Validations: modifying "port" fields of bundled services in clp-config.yaml exposes ports correctly.

@junhaoliao junhaoliao merged commit dbc1799 into y-scope:main Mar 5, 2026
24 of 25 checks passed
@junhaoliao junhaoliao deleted the bundled-ports branch March 5, 2026 15:03
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026

2 participants