Skip to content

fix(deployment): Use CLP Package's Docker Compose network for Presto deployments (fixes #1498).#1501

Merged
junhaoliao merged 17 commits into
y-scope:mainfrom
junhaoliao:codex/fix-issue-1498
Oct 27, 2025
Merged

fix(deployment): Use CLP Package's Docker Compose network for Presto deployments (fixes #1498).#1501
junhaoliao merged 17 commits into
y-scope:mainfrom
junhaoliao:codex/fix-issue-1498

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Oct 25, 2025

Copy link
Copy Markdown
Member

Description

This PR aligns the Presto deployment and runtime configuration with the CLP package's Docker network and containerized defaults.

Summary of changes:

  • clp_package_utils.controller:
    • When QueryEngine is Presto, the WebUI now reads PrestoHost/PrestoPort from the container-transformed config (container_clp_config.presto.*) instead of the host config, ensuring correct in-network addressing for the UI.
  • clp_py_utils.clp_config:
    • Add PRESTO_COORDINATOR_COMPONENT_NAME = "presto-coordinator".
    • Add Presto.DEFAULT_PORT = 8080 and Presto.transform_for_container() to set host to the coordinator service name and port to the default.
    • Call self.presto.transform_for_container() within package transform_for_container() when present.
  • tools/deployment/presto-clp/docker-compose.yaml:
    • Switch services to the external CLP package network (clp-package) and remove the internal presto bridge network.
    • Network name is now driven by ${CLP_PACKAGE_NETWORK_NAME:-clp-package_default} to bind Presto to the active CLP package network.
  • tools/deployment/presto-clp/scripts/init.py:
    • Point Presto metadata DB URL to the database service (jdbc:mysql://database:3306) rather than host-bound values.
    • Read the CLP package instance ID from logs_directory/instance-id and export CLP_PACKAGE_NETWORK_NAME=clp-package-<instance>_default, ensuring Presto composes onto the correct external network.
  • docs/src/user-docs/guides-using-presto.md:
    • Remove the requirement to set database.host to a non-localhost value; this is no longer needed with the shared Docker network.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Tested both on a bare metal Ubuntu 22.04 machine and in WSL.

  1. Apply the changes in fix(deployment): Update archives directory mount paths in Presto Docker Compose project to match the CLP package Compose project (fixes #1496). #1500 to resolve an issue in getting valid rows in the results.
  2. Started both projects and compressed sample data:
    cd <project-root>
    task
    
    cd build/clp-package
    
    # follow https://docs.yscope.com/clp/main/user-docs/guides-using-presto.html#setting-up-clp and configure `etc/clp-config.yaml`
    
    ./sbin/start-clp.sh
    ./sbin/compress.sh --timestamp-key=timestamp ~/samples/postgresql.jsonl
    
    cd <project-root>
    cd tools/deployment/presto-clp
    # follow https://docs.yscope.com/clp/main/user-docs/guides-using-presto.html#setting-up-presto and configure `coordinator/config-template/split-filter.json`
    #
    # NOTE: Do NOT follow the instructions about setting the database's host to non-localhost. i.e.,
    # keep the database's host as `localhost`
    #
    ./scripts/set-up-config.sh <project-root>/build/clp-package
    docker compose up --wait
    
  3. Accessed the presto web console at http://localhost:8889/ui/sql_client.html
    1. Set "Catalog" to clp, "Schema" to default
    2. Ran query:
      select timestamp, message from default limit 100
    3. Observed rows are returned in the results:
      image
  4. Performed a query CLP's WebUI http://localhost:4000 and observed results listed:
    image

Summary by CodeRabbit

  • New Features

    • Presto container setup now prefers component-specific host/port and applies coordinator and database defaults for containerized runs.
  • Bug Fixes

    • Environment variable retrieval is stricter to surface missing configuration earlier.
  • Documentation

    • Simplified Presto guide by removing a redundant database configuration step.
  • Infrastructure

    • Docker deployment switched to the unified CLP package network and derives network names from the package instance.

@coderabbitai

coderabbitai Bot commented Oct 25, 2025

Copy link
Copy Markdown
Contributor

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Presto configuration and deployment were updated to prefer container-specific Presto host/port when QueryEngine is PRESTO, add a Presto coordinator name/default port and transform_for_container behavior, tighten env-var handling, switch Compose services to an external clp-package network, add init helpers to derive CLP package instance/network, and include minor unrelated calculator edits.

Changes

Cohort / File(s) Summary
Presto configuration model
components/clp-py-utils/clp_py_utils/clp_config.py
Added PRESTO_COORDINATOR_COMPONENT_NAME; Presto.DEFAULT_PORT = 8080; added Presto.transform_for_container() to set host to coordinator name and port to default; _get_env_var() now raises ValueError if missing; CLPConfig.transform_for_container() calls Presto transform when appropriate.
Controller: Presto host/port selection
components/clp-package-utils/clp_package_utils/controller.py
Changed web UI population logic to use container_clp_config Presto host/port when QueryEngine is PRESTO, instead of top-level CLP config.
Docker Compose: Presto networking
tools/deployment/presto-clp/docker-compose.yaml
Switched presto-coordinator and presto-worker services from local presto network to external clp-package network; top-level networks now reference external clp-package (name derived from CLP_PACKAGE_NETWORK_NAME).
Init script: CLP package instance & env
tools/deployment/presto-clp/scripts/init.py
Added DATABASE_COMPONENT_NAME and DATABASE_DEFAULT_PORT; replaced direct DB host/port extraction with constants; added _get_clp_package_instance_id(clp_config, clp_package_dir) to read package instance id from logs and derive CLP_PACKAGE_NETWORK_NAME; integrated into env generation with error handling for missing/invalid instance id.
Documentation
docs/src/user-docs/guides-using-presto.md
Removed a YAML snippet and explanatory note about non-localhost DB host configuration; other guidance retained.
Unrelated demo edits
src/calculator.py
Added coderabbit_add(x, y); changed coderabbit_formula(x, y) signature to coderabbit_formula(x, y, z); renamed global var old_global_varnew_global_var.

Sequence Diagram(s)

sequenceDiagram
  participant UI as Web UI controller
  participant CLP as Top-level CLPConfig
  participant CCLP as Container CLPConfig
  participant PRESTO as Presto model

  Note over UI,CLP: Populate Presto host/port for web UI
  UI->>CLP: request package config & query_engine
  alt query_engine == PRESTO
    UI->>CCLP: request container_clp_config
    CCLP->>PRESTO: provide presto.host / presto.port (may be transformed)
    PRESTO-->>UI: host/port from container config
  else other engines or None
    CLP->>PRESTO: provide presto.host / presto.port
    PRESTO-->>UI: host/port from top-level config or None
  end
Loading
sequenceDiagram
  participant Init as init.py
  participant Logs as CLP package logs
  participant Env as Generated environment

  Note over Init,Logs: Derive CLP package network name
  Init->>Logs: read package instance id (logs file)
  alt instance id found
    Logs-->>Init: instance id
    Init->>Env: set CLP_PACKAGE_NETWORK_NAME based on instance id
  else missing/invalid
    Logs-->>Init: error/empty
    Init-->>Env: fail env generation / surface error
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas needing extra attention:

  • components/clp-py-utils/clp_py_utils/clp_config.py: callers that expected _get_env_var() to return None must be updated; validate Presto.transform_for_container idempotence and impacts.
  • components/clp-package-utils/clp_package_utils/controller.py: ensure UI remains consistent when switching to container_clp_config for PRESTO.
  • tools/deployment/presto-clp/scripts/init.py: verify robustness/error messages of _get_clp_package_instance_id() and handling when logs are absent.
  • tools/deployment/presto-clp/docker-compose.yaml: confirm deployment environments create/attach the external clp-package network as expected.

Possibly related issues

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix(deployment): Use CLP Package's Docker Compose network for Presto deployments (fixes #1498)" is directly and specifically related to the main change in the changeset. The title accurately captures the core objective of aligning Presto deployment and runtime configuration with the CLP package's Docker network and containerized defaults, as evidenced by the changes across multiple files (docker-compose.yaml, clp_config.py, controller.py, init.py, and documentation). The title is concise, clear, and specific without unnecessary noise or vague terminology, and a teammate scanning the commit history would immediately understand that this PR resolves a networking issue for Presto deployments.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdb5c96 and 14501cf.

📒 Files selected for processing (1)
  • tools/deployment/presto-clp/scripts/init.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: davemarco
PR: y-scope/clp#1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: package-image
🔇 Additional comments (3)
tools/deployment/presto-clp/scripts/init.py (3)

10-13: Database constants are well-documented for container-internal use.

The comment clearly indicates these are for the CLP Package Docker network and must match the constants in clp_py_utils/clp_config.py. The implementation is appropriate for the containerized networking approach.


109-112: Database URL correctly uses container-internal endpoint.

The metadata database URL now uses the database service name and default port for in-network communication. This aligns with the PR's objective of using the CLP package's Docker network for Presto deployments.


285-322: Well-implemented helper function with comprehensive error handling.

The function properly handles missing files, read errors, and empty content with appropriate logging for each case. Using logger.exception for the OSError provides valuable stack trace information for debugging.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao marked this pull request as ready for review October 25, 2025 13:56
@junhaoliao junhaoliao requested a review from a team as a code owner October 25, 2025 13:56
Comment on lines 13 to +14
networks:
- "presto"
- "clp-package"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't plan to add more networks

Suggested change
networks:
- "presto"
- "clp-package"
networks: ["clp-package"]

Comment on lines 41 to +42
networks:
- "presto"
- "clp-package"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't plan to add more networks

Suggested change
networks:
- "presto"
- "clp-package"
networks: ["clp-package"]

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 107 to 110
database_name = _get_config_value(clp_config, "database.name", "clp-db")
env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL"] = (
f"jdbc:mysql://{database_host}:{database_port}"
f"jdbc:mysql://{DATABASE_COMPONENT_NAME}:{DATABASE_DEFAULT_PORT}"
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use configured database host/port when building Presto metadata URL

The new logic hard‑codes the metadata database endpoint to jdbc:mysql://database:3306 regardless of the values in clp-config.yml. Prior to this change _add_clp_env_vars read database.host and database.port, which allowed deployments that point CLP at an external database or a non‑default port. With the constant URL, any such deployment will have Presto fail to connect because the database service on port 3306 may not exist. This regression affects every environment where the CLP database is not the default container at 3306.

Useful? React with 👍 / 👎.

@junhaoliao junhaoliao Oct 25, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex the presto-coordinator now uses the database's internal host and port for communication so no need to change dynamically? that's also the whole point of this PR

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
tools/deployment/presto-clp/scripts/init.py (1)

108-111: Don’t hard‑code JDBC to database:3306 — respect configured DB host/port (with container-aware remap).

This regresses deployments using an external DB or non‑default port; Presto would point to the wrong database.

-    env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL"] = (
-        f"jdbc:mysql://{DATABASE_COMPONENT_NAME}:{DATABASE_DEFAULT_PORT}"
-    )
+    # Prefer configured host/port; remap localhost to the container service name.
+    db_host = _get_config_value(clp_config, "database.host", DATABASE_COMPONENT_NAME)
+    if str(db_host) in ("localhost", "127.0.0.1"):
+        db_host = DATABASE_COMPONENT_NAME
+    db_port = _get_config_value(clp_config, "database.port", DATABASE_DEFAULT_PORT)
+    try:
+        db_port = int(db_port)
+    except (TypeError, ValueError):
+        db_port = DATABASE_DEFAULT_PORT
+    env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL"] = (
+        f"jdbc:mysql://{db_host}:{db_port}"
+    )

If you’d prefer to always target the CLP package’s DB container when PRESTO is selected, add an explicit validation that disallows external DB hosts for PRESTO to avoid config drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c6de99 and b7ed53b.

📒 Files selected for processing (5)
  • components/clp-package-utils/clp_package_utils/controller.py (1 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (3 hunks)
  • docs/src/user-docs/guides-using-presto.md (0 hunks)
  • tools/deployment/presto-clp/docker-compose.yaml (3 hunks)
  • tools/deployment/presto-clp/scripts/init.py (4 hunks)
💤 Files with no reviewable changes (1)
  • docs/src/user-docs/guides-using-presto.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
PR: y-scope/clp#1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • components/clp-package-utils/clp_package_utils/controller.py
🪛 Ruff (0.14.1)
tools/deployment/presto-clp/scripts/init.py

309-311: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (5)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

35-35: Constant name for Presto coordinator aligns with service naming.

No issues.

tools/deployment/presto-clp/scripts/init.py (1)

167-171: Deriving CLP package network name from instance ID — LGTM.

Ensures presto-compose joins the correct external network.

Consider logging the resolved network name at INFO for easier troubleshooting.

tools/deployment/presto-clp/docker-compose.yaml (3)

14-14: Attach coordinator to external clp-package network — LGTM.

Matches the package network strategy.


43-43: Attach worker to external clp-package network — LGTM.

Keeps intra‑cluster DNS consistent.


53-55: External network with overridable name — LGTM.

Default aligns with DockerComposeController project naming.

Comment thread components/clp-package-utils/clp_package_utils/controller.py
Comment on lines +595 to +603
DEFAULT_PORT: ClassVar[int] = 8080

host: DomainStr
port: Port

def transform_for_container(self):
self.host = PRESTO_COORDINATOR_COMPONENT_NAME
self.port = self.DEFAULT_PORT

@coderabbitai coderabbitai Bot Oct 26, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Set sensible defaults for Presto host/port to simplify config.

Provide defaults to avoid requiring users to specify them when running in the packaged setup; still allow override if needed.

 class Presto(BaseModel):
-    DEFAULT_PORT: ClassVar[int] = 8080
-
-    host: DomainStr
-    port: Port
+    DEFAULT_PORT: ClassVar[int] = 8080
+    # Disallow unknown fields to catch typos early.
+    model_config = ConfigDict(extra="forbid")
+
+    host: DomainStr = PRESTO_COORDINATOR_COMPONENT_NAME
+    port: Port = DEFAULT_PORT
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DEFAULT_PORT: ClassVar[int] = 8080
host: DomainStr
port: Port
def transform_for_container(self):
self.host = PRESTO_COORDINATOR_COMPONENT_NAME
self.port = self.DEFAULT_PORT
DEFAULT_PORT: ClassVar[int] = 8080
# Disallow unknown fields to catch typos early.
model_config = ConfigDict(extra="forbid")
host: DomainStr = PRESTO_COORDINATOR_COMPONENT_NAME
port: Port = DEFAULT_PORT
def transform_for_container(self):
self.host = PRESTO_COORDINATOR_COMPONENT_NAME
self.port = self.DEFAULT_PORT
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 595-603, the
Presto host and port fields have no class-level defaults so callers must always
set them; add sensible defaults by giving host a default of
PRESTO_COORDINATOR_COMPONENT_NAME and port a default of DEFAULT_PORT (use
Pydantic Field or direct default assignment consistent with the model in this
file) so that instances created for the packaged setup get correct values while
still allowing overrides; leave transform_for_container() as-is to explicitly
set these values when transforming for container runtime.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would not work when the config needs to be transformed for container uses

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, you're absolutely right. The suggested defaults would break non-container usage.

Looking at the pattern used by other components like Database, Queue, and Redis, they default to "localhost" for the host (not their component names), and transform_for_container() updates them to the service names. This allows the config to work both on the host machine and inside containers.

However, Presto is different because it's an external service deployed separately from the CLP package. There's no sensible default for the host in non-container mode since users need to specify where their Presto instance is actually running.

The current implementation (requiring explicit host and port configuration) is correct for this use case. The transform_for_container() method appropriately updates these values when the CLP components run inside Docker and need to connect to Presto via the Docker network.

Comment thread components/clp-py-utils/clp_py_utils/clp_config.py Outdated
Comment thread tools/deployment/presto-clp/scripts/init.py Outdated

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.

For the PR title, how about:

fix(deployment): Use CLP Package's Docker Compose network for Presto deployments (fixes #1498).

presto:
driver: "bridge"
clp-package:
name: "${CLP_PACKAGE_NETWORK_NAME:-clp-package_default}"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make it required? Or do you think the default would be useful in some way?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default is useful when users don't use start_clp.py to set up the Package compose project (so instance IDs won't be appended to the project name), and when they don't use the init.py script to set up the Presto compose project

which is honestly not recommended to general users...

That said, at least the clp-package part will be consistent with the default Package compose project name:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Gotcha.

Comment thread tools/deployment/presto-clp/scripts/init.py Outdated

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to post this as well.

Comment thread tools/deployment/presto-clp/scripts/init.py Outdated
junhaoliao and others added 2 commits October 27, 2025 02:24
…eview

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
…uggestions from code review

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@junhaoliao junhaoliao changed the title fix(deployment): Align Presto Docker Compose networking with CLP Package (fixes #1498). fix(deployment): Use CLP Package's Docker Compose network for Presto deployments (fixes #1498). Oct 27, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

595-603: Harden the Presto model against config typos.

Functional change LGTM. Recommend forbidding extra fields on Presto to catch misspelled keys early.

 class Presto(BaseModel):
+    # Disallow unknown keys to catch misspellings in config.
+    model_config = ConfigDict(extra="forbid")
     DEFAULT_PORT: ClassVar[int] = 8080
tools/deployment/presto-clp/scripts/init.py (1)

109-111: Fail fast when CLP is configured to use an external DB or non‑default port.

Hard-coding jdbc:mysql://database:3306 will break deployments pointing CLP at an external DB or a different port. Validate config and emit a clear error instead of failing later at runtime.

-    env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL"] = (
-        f"jdbc:mysql://{DATABASE_COMPONENT_NAME}:{DATABASE_DEFAULT_PORT}"
-    )
+    # Validate that CLP is using the packaged DB; otherwise Presto cannot reach it via service name.
+    db_host_cfg = _get_config_value(clp_config, "database.host", "localhost")
+    db_port_cfg = _get_config_value(clp_config, "database.port", DATABASE_DEFAULT_PORT)
+    try:
+        db_port_cfg_int = int(db_port_cfg)
+    except (TypeError, ValueError):
+        logger.error("Invalid 'database.port' value in '%s': %r", clp_config_file_path, db_port_cfg)
+        return False
+    if db_host_cfg not in ("localhost", DATABASE_COMPONENT_NAME) or db_port_cfg_int != DATABASE_DEFAULT_PORT:
+        logger.error(
+            "Presto deployment assumes the packaged database at '%s:%d' on the CLP package network, "
+            "but clp-config.yml is set to database.host=%r, database.port=%r. "
+            "Use the packaged DB or adjust the Presto deployment accordingly.",
+            DATABASE_COMPONENT_NAME,
+            DATABASE_DEFAULT_PORT,
+            db_host_cfg,
+            db_port_cfg,
+        )
+        return False
+    env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL"] = (
+        f"jdbc:mysql://{DATABASE_COMPONENT_NAME}:{DATABASE_DEFAULT_PORT}"
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7ed53b and b791f44.

📒 Files selected for processing (2)
  • components/clp-py-utils/clp_py_utils/clp_config.py (3 hunks)
  • tools/deployment/presto-clp/scripts/init.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: davemarco
PR: y-scope/clp#1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: package-image
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/clp_config.py (2)

825-826: Good guard: only transform Presto when it’s the selected engine.

Prevents unnecessary mutation in non‑Presto deployments.


35-35: Naming alignment verified; constant is correct.

The constant "presto-coordinator" at line 35 matches the docker-compose service name exactly and aligns with all deployment configurations. In-network DNS resolution will work as intended with no naming drift detected.

tools/deployment/presto-clp/scripts/init.py (2)

167-171: Deriving CLP_PACKAGE_NETWORK_NAME from instance ID is solid.

Ensures Presto attaches to the correct external network name for the current package instance.


283-320: Instance ID reader is robust and well‑logged.

Good path resolution, explicit messages, and exception logging.

Comment thread tools/deployment/presto-clp/scripts/init.py

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor fixes.

Comment thread tools/deployment/presto-clp/scripts/init.py Outdated
Comment thread tools/deployment/presto-clp/scripts/init.py Outdated
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b791f44 and fdb5c96.

📒 Files selected for processing (1)
  • tools/deployment/presto-clp/scripts/init.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: davemarco
PR: y-scope/clp#1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
tools/deployment/presto-clp/scripts/init.py (2)

285-321: LGTM! Well-structured helper with comprehensive error handling.

The function properly handles all edge cases (missing file, read errors, empty content) and provides clear error messages. The use of logger.exception for OSError ensures stack traces are captured for debugging.


109-112: Verify deployment context: presto-clp must be deployed as part of the CLP package.

The database service named "database" on port 3306 is not defined in the presto-clp docker-compose.yaml. These hard-coded values assume integration with the parent CLP package, which provides the database service on the shared "clp-package" network. Ensure this deployment is part of the full CLP package infrastructure, not standalone Presto deployment.

Comment on lines +10 to +13
# Database endpoint inside the CLP Package Docker network. Must match the constants defined in
# `components/clp-py-utils/clp_py_utils/clp_config.py`.
DATABASE_COMPONENT_NAME = "database"
DATABASE_DEFAULT_PORT = 3306

@coderabbitai coderabbitai Bot Oct 27, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider importing these constants instead of duplicating them.

The comment indicates these must match constants in clp_config.py, but duplication creates a maintenance risk. If the values diverge, it could cause connection failures between Presto and the database.

Consider importing from clp_py_utils.clp_config instead:

+from clp_py_utils.clp_config import DATABASE_COMPONENT_NAME, DATABASE_DEFAULT_PORT
+
-# Database endpoint inside the CLP Package Docker network. Must match the constants defined in
-# `components/clp-py-utils/clp_py_utils/clp_config.py`.
-DATABASE_COMPONENT_NAME = "database"
-DATABASE_DEFAULT_PORT = 3306

If clp_py_utils is not available in this context, document why duplication is necessary and consider adding a runtime check that validates the values match.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 10 to 13, the
DATABASE_COMPONENT_NAME and DATABASE_DEFAULT_PORT are duplicated values that
should be imported from clp_py_utils.clp_config to avoid drift; replace the
hardcoded constants with imports from clp_py_utils.clp_config (import the two
names and use them directly), and if clp_py_utils is not available in this
runtime, add a short comment explaining why duplication is necessary and
implement a runtime check that compares the local constants to the values from
clp_py_utils (raising a clear error or logging and exiting if they differ).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the clp_py_utils package is not expected to be installed into the Presto init.py script environment, though it's technically feasible. the two source files are in complete separate flows anyways and that's been well documented.

that said, we should merge both flows after the coming release

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for the clarification! That makes sense—if clp_py_utils isn't available in the Presto init script's runtime environment, the duplication is the right approach for now. I appreciate you documenting the reasoning in the comment.

Looking forward to seeing the flows merged after the release, which should eliminate this maintenance concern. 👍


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp#1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.889Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +169 to +172
instance_id = _get_clp_package_instance_id(clp_config, clp_package_dir)
if instance_id is None:
return False
env_vars["CLP_PACKAGE_NETWORK_NAME"] = f"clp-package-{instance_id}_default"

@coderabbitai coderabbitai Bot Oct 27, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the Docker Compose network name format is consistent across deployments.

The network name is constructed as clp-package-{instance_id}_default, which assumes a specific Docker Compose project naming convention. If the CLP package's compose project name or network configuration changes, Presto will fail to join the network.

Run the following script to verify the network name format matches the CLP package's actual network:


🏁 Script executed:

#!/bin/bash
# Description: Check if the network name format matches Docker Compose conventions

# Find docker-compose files in CLP package and check network configuration
echo "Checking CLP package docker-compose network configuration:"
fd -t f 'docker-compose.ya?ml' -E 'presto-clp' | head -1 | xargs rg -A 10 '^networks:' 

echo -e "\nChecking Presto docker-compose network reference:"
cat tools/deployment/presto-clp/docker-compose.yaml | rg -A 5 'CLP_PACKAGE_NETWORK_NAME'

Length of output: 189



---

🏁 Script executed:

```shell
#!/bin/bash
# Find _get_clp_package_instance_id function implementation
echo "=== Finding _get_clp_package_instance_id function ==="
rg -n '_get_clp_package_instance_id' tools/deployment/presto-clp/scripts/init.py -A 15

echo -e "\n=== Searching for CLP package docker-compose in repository ==="
fd -t f 'docker-compose' | grep -v presto-clp

echo -e "\n=== Searching for network configuration in CLP or clp-package ==="
fd -t f 'docker-compose.ya?ml' | xargs rg -l 'networks:' 2>/dev/null

Length of output: 1529



---

🏁 Script executed:

```shell
#!/bin/bash
# Get full _get_clp_package_instance_id function
echo "=== Full _get_clp_package_instance_id function ==="
rg -n '_get_clp_package_instance_id' tools/deployment/presto-clp/scripts/init.py -A 25

echo -e "\n=== CLP package docker-compose.yaml networks and project config ==="
cat tools/deployment/package/docker-compose.yaml | rg -B 5 -A 10 'networks:|^services:|name:'

echo -e "\n=== CLP package docker-compose.base.yaml networks ==="
cat tools/deployment/package/docker-compose.base.yaml | rg -B 2 -A 10 'networks:'

Length of output: 4911



---

🏁 Script executed:

```shell
#!/bin/bash
# Check if CLP package docker-compose defines networks explicitly
echo "=== Checking for explicit networks definition in CLP package ==="
cat tools/deployment/package/docker-compose.yaml | head -50
cat tools/deployment/package/docker-compose.base.yaml | grep -A 15 "networks:"

# Check if project name is dynamically set or if there's instance-id usage
echo -e "\n=== Looking for instance-id usage in deployment scripts ==="
fd -t f 'docker-compose|\.sh' tools/deployment/package/ | xargs grep -l 'instance' 2>/dev/null || echo "No instance-id references found"

# Check how the network is actually used
echo -e "\n=== Network name format consistency check ==="
grep -r 'clp-package.*_default\|instance.*id' tools/deployment/ --include='*.yaml' --include='*.yml' --include='*.py' | grep -E 'network|project' | head -20

Length of output: 1895



---

🏁 Script executed:

```shell
#!/bin/bash
# Search for where instance-id file is created and how it's used
echo "=== Where is instance-id file created/written? ==="
rg -n 'instance-id' tools/deployment/ -B 3 -A 3 --include='*.py' --include='*.sh'

echo -e "\n=== Check if CLP package has any external networks defined ==="
cat tools/deployment/package/docker-compose.base.yaml

echo -e "\n=== Check for explicit network creation/configuration in scripts ==="
rg -n 'networks:' tools/deployment/package/ --include='*.yaml' -A 5

Length of output: 13250


Fix network name format: underscore instead of hyphen before instance ID.

The network name is constructed as clp-package-{instance_id}_default (line 172), but Docker Compose's default network naming convention is <project_name>_default. With CLP package's project name set to "clp-package", the default network would be clp-package_default. If using a custom instance ID, the format should be clp-package_{instance_id}_default (underscore separator), not clp-package-{instance_id}_default (hyphen separator).

Change line 172 from:

env_vars["CLP_PACKAGE_NETWORK_NAME"] = f"clp-package-{instance_id}_default"

to:

env_vars["CLP_PACKAGE_NETWORK_NAME"] = f"clp-package_{instance_id}_default"
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 169 to 172, the
constructed Docker network name uses a hyphen before the instance ID which
yields clp-package-{instance_id}_default; change this to use an underscore so
the network matches Docker Compose convention by setting
CLP_PACKAGE_NETWORK_NAME to clp-package_{instance_id}_default instead of
clp-package-{instance_id}_default.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't, or it won't match

self._project_name = f"clp-package-{instance_id}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the network name format in controller.py around line 603
echo "=== Checking controller.py line 603 and surrounding context ==="
cat components/clp-package-utils/clp_package_utils/controller.py | sed -n '595,615p'

echo -e "\n=== Search for all occurrences of network name construction in controller.py ==="
rg -n 'clp-package.*instance.*default|instance.*clp-package.*default' components/clp-package-utils/clp_package_utils/controller.py -B 2 -A 2

Length of output: 809


🏁 Script executed:

#!/bin/bash
# Search for network name patterns more broadly
echo "=== Search for 'clp-package' in controller.py ==="
rg -n 'clp-package' components/clp-package-utils/clp_package_utils/controller.py -C 3

echo -e "\n=== Search for 'instance' combined with 'network' or 'default' ==="
rg -n 'instance.*network|network.*instance|instance.*default' components/clp-package-utils/clp_package_utils/controller.py -C 2

echo -e "\n=== Check total lines in controller.py ==="
wc -l components/clp-package-utils/clp_package_utils/controller.py

Length of output: 461


@junhaoliao You're absolutely right! I can confirm that line 642 in controller.py sets the project name as:

self._project_name = f"clp-package-{instance_id}"

Since Docker Compose creates the default network as {project_name}_default, the network name constructed in init.py line 172 as clp-package-{instance_id}_default is correct and consistent with the project naming convention. The hyphen before instance_id is intentional and necessary for the network names to match.

Thank you for the clarification! The current implementation is correct as-is.

@junhaoliao junhaoliao merged commit fdd2f82 into y-scope:main Oct 27, 2025
20 checks passed
@junhaoliao junhaoliao deleted the codex/fix-issue-1498 branch May 7, 2026 19:46
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants