Skip to content

fix(deployment): Update archives directory mount paths in Presto Docker Compose project to match the CLP package Compose project (fixes #1496).#1500

Merged
junhaoliao merged 4 commits into
y-scope:mainfrom
junhaoliao:presto-mount-path
Oct 26, 2025

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Oct 25, 2025

Copy link
Copy Markdown
Member

Description

This change standardizes archive-related mount points in the Presto Docker Compose setup to align with the CLP Package Compose conventions.

  • Docker Compose (presto-clp):
    • Mount host archives to /var/data/archives:ro using ${CLP_ARCHIVES_DIR:-empty}.
    • Mount staged archives to /var/data/staged-archives:ro using ${CLP_STAGED_ARCHIVES_DIR:-empty}.
    • Introduce a named empty volume as a safe default so Compose can start even when env vars are unset.
  • Init script:
    • For s3 archive output storage, set CLP_STAGED_ARCHIVES_DIR based on the configured staging_directory (previously set CLP_ARCHIVES_DIR).

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

fs output

  1. Started both projects and compressed sample data:
    cd <project-root>
    task
    
    cd build/clp-package
    
    # follow https://docs.yscope.com/clp/main/user-docs/guides-using-presto.html#setting-up-clp and configure `etc/clp-config.yaml`
    
    ./sbin/start-clp.sh
    ./sbin/compress.sh --timestamp-key=timestamp ~/samples/postgresql.jsonl
    
    
    cd <project-root>
    cd tools/deployment/presto-clp
    # follow https://docs.yscope.com/clp/main/user-docs/guides-using-presto.html#setting-up-presto and configure `coordinator/config-template/split-filter.json`
    ./scripts/set-up-config.sh <project-root>/build/clp-package
    docker compose up --wait
    
  2. Accessed the presto web console at http://localhost:8889/ui/sql_client.html
    1. Set "Catalog" to clp, "Schema" to default
    2. Ran query:
      select timestamp, message from default limit 100
    3. Observed rows are returned in the results:
      image

s3 output

  1. Clean built the project.
  2. Followed Step 1 in above "fs output" instructions to start both projects and compress sample data. When configuring clp-config.yml, also configure the archives output to use s3 output:
    archive_output:
      storage:
        type: "s3"
        s3_config:
          ...
  3. Followed Step 1 in above "fs output" instructions. Performed a query and observed results:
    image

Summary by CodeRabbit

  • Chores
    • Updated deployment to use conditional mounts with a dummy fallback volume for archives and staged archives, improving robustness when host paths are unset.
    • Adjusted environment variable for S3-based storage to use a staging-specific name so staged archive paths are consistently set for cloud deployments.

… Compose project to match CLP Package Compose (fixes y-scope#1496).
@junhaoliao junhaoliao requested a review from a team as a code owner October 25, 2025 10:09
@coderabbitai

coderabbitai Bot commented Oct 25, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

The PR updates the Presto worker Docker Compose volume mounts to add explicit /var/data/archives and /var/data/staged-archives read-only mounts with environment-variable fallbacks to an empty volume, removes an existing absolute bind, and renames an environment variable in the deployment init script for S3 staging from CLP_ARCHIVES_DIR to CLP_STAGED_ARCHIVES_DIR.

Changes

Cohort / File(s) Summary
Docker Compose volume configuration
tools/deployment/presto-clp/docker-compose.yaml
Added two bind mounts to services.presto-worker.volumes: \${CLP_ARCHIVES_DIR:-empty}:/var/data/archives:ro and \${CLP_STAGED_ARCHIVES_DIR:-empty}:/var/data/staged-archives:ro; removed \${CLP_ARCHIVES_DIR}:\${CLP_ARCHIVES_DIR} bind; added top-level empty volume.
Init script env var mapping
tools/deployment/presto-clp/scripts/init.py
In _add_clp_env_vars, when archive_output.storage.type == "s3", changed the environment variable name from CLP_ARCHIVES_DIR to CLP_STAGED_ARCHIVES_DIR; value construction from staging_directory unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Init as init.py
  participant Env as Environment
  participant Compose as docker-compose

  rect rgb(230,245,255)
  Note over Init,Env: Initialization phase
  Init->>Env: set CLP_STAGED_ARCHIVES_DIR=<staging_directory> (for S3)
  Init->>Env: (no longer sets CLP_ARCHIVES_DIR for S3)
  end

  rect rgb(245,255,230)
  Note over Env,Compose: Compose uses env vars for mounts
  Env->>Compose: CLP_ARCHIVES_DIR or fallback "empty" -> mount /var/data/archives:ro
  Env->>Compose: CLP_STAGED_ARCHIVES_DIR or fallback "empty" -> mount /var/data/staged-archives:ro
  end

  Note over Compose: presto-worker mounts resolved paths (or empty dummy volume)
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify the empty volume purpose and its effect when env vars unset.
  • Search repository for other CLP_ARCHIVES_DIR usages to ensure no unintended regressions.
  • Confirm Docker Compose variable expansion semantics in target deployment environment.

Possibly related issues

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The PR title "Update archives directory mount paths in Presto Docker Compose project to match the CLP package Compose project" is fully related to the main change in the changeset. It accurately describes the primary modifications—updating mount paths for archives and staged archives in the Presto Docker Compose configuration to align with the CLP package conventions. The title is specific and clear enough that a teammate scanning the repository history would immediately understand the core change (standardizing archive mount paths), and it appropriately uses conventional commit format (fix:) to indicate this is a bugfix. While the title doesn't enumerate every detail such as the introduction of the dummy empty volume or the init.py environment variable name change, such comprehensive coverage is not expected and would detract from clarity.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


volumes:
coordinator-config:
empty:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This volume differs from the one used in the Package project, where a tmpfs mount with size 0 is configured.

In this case, the volume is mounted as read-only, so we don't need to account for any data written to the dummy empty volume.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should add a comment for future editors?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one moment. i think it's worth marking the volume as readonly to avoid mistakes

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the readonly attribute can only be applied when defining the mount, not in the volume definition. changing the definition to tmpfs like the Package project should just work, but it will take time to re-validate. i think it's fine to keep it simple as currently is, so long it works and has been validated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb7f0d5 and f1d4155.

📒 Files selected for processing (2)
  • tools/deployment/presto-clp/docker-compose.yaml (1 hunks)
  • tools/deployment/presto-clp/scripts/init.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
tools/deployment/presto-clp/scripts/init.py (1)

127-127: LGTM! Environment variable renamed to align with staging archives semantics.

The change from CLP_ARCHIVES_DIR to CLP_STAGED_ARCHIVES_DIR correctly reflects that S3 storage uses staging directories rather than direct archive directories. This aligns with the docker-compose.yaml updates that mount this variable to /var/data/staged-archives.

tools/deployment/presto-clp/docker-compose.yaml (1)

37-38: LGTM! Unified mount paths with safe defaults.

The fixed mount paths (/var/data/archives and /var/data/staged-archives) with fallback to the empty volume provide a clean solution that:

  • Standardizes paths to match CLP Package Compose conventions
  • Allows the service to start even when environment variables are unset
  • Uses read-only mounts appropriately for archive directories


volumes:
coordinator-config:
empty:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider a more descriptive volume name.

The volume name empty is functional but somewhat generic. Consider renaming it to something more descriptive like presto-empty-default or empty-fallback to clarify its purpose as a safe fallback volume.

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/docker-compose.yaml around line 47 the Docker
volume is named `empty`, which is too generic; rename it to a more descriptive
name (e.g., `presto-empty-default` or `empty-fallback`) across the compose file
and any services that reference it so it clearly communicates its role as a safe
fallback volume, update the volumes: section and all volume mounts to use the
new name, and run a quick docker-compose config validation to ensure no
references were missed.

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good except for the question I posted.

For the PR title, how about:

fix(deployment): Update `/var/data` mount paths in Presto Docker Compose project to match the CLP package Compose project (fixes #1496).

@junhaoliao junhaoliao changed the title fix(deployment): Use unified /var/data mount paths in Presto Docker Compose project to match CLP Package Compose (fixes #1496). fix(deployment): Update /var/data mount paths in Presto Docker Compose project to match the CLP package Compose project (fixes #1496). Oct 26, 2025
@junhaoliao junhaoliao requested review from kirkrodrigues and removed request for kirkrodrigues October 26, 2025 19:24
Comment thread tools/deployment/presto-clp/docker-compose.yaml
Comment thread tools/deployment/presto-clp/docker-compose.yaml Outdated
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I confused myself before with the PR title. How about:

fix(deployment): Update archives directory mount paths in Presto Docker Compose project to match the CLP package Compose project (fixes #1496).

@junhaoliao junhaoliao changed the title fix(deployment): Update /var/data mount paths in Presto Docker Compose project to match the CLP package Compose project (fixes #1496). fix(deployment): Update archives directory mount paths in Presto Docker Compose project to match the CLP package Compose project (fixes #1496). Oct 26, 2025
@junhaoliao junhaoliao merged commit 6c6de99 into y-scope:main Oct 26, 2025
21 of 22 checks passed
@junhaoliao junhaoliao deleted the presto-mount-path branch October 26, 2025 22:08
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…er Compose project to match the CLP package Compose project (fixes y-scope#1496). (y-scope#1500)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants