Skip to content

fix(package): Don't remap output paths when mounting into CLP execution containers (fixes #960).#998

Merged
haiqi96 merged 9 commits into
y-scope:mainfrom
haiqi96:container_path
Jun 18, 2025
Merged

fix(package): Don't remap output paths when mounting into CLP execution containers (fixes #960).#998
haiqi96 merged 9 commits into
y-scope:mainfrom
haiqi96:container_path

Conversation

@haiqi96

@haiqi96 haiqi96 commented Jun 10, 2025

Copy link
Copy Markdown
Contributor

Description

This PR fixes issue mentioned in #960; it mounts the output path in the execution container without any remapping, so the path of logs and archives are consistent inside and outside the container. This prevent package referring to any confusing path that does not exist in user's host file system.

This PR also adds a prefix_blacklist to prevent user from mounting output to any path that could potentially affect the container.

Note this PR doesn't modify any logic related to input logs mount. As discussed in #960, we plan to leave the input log mount as it is for now. For CLP, the "path_prefix_to_remove" is already properly configured so that path of logs will be displayed properly in the log viewer webui. For CLP_S, we don't support displaying the original path of the logs.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. Tested the following output directory cases and ensured that start-clp.py returns an error:
  • The output directory is the same as one of the restricted paths.
  • The output directory is a sub-directory of any of the restricted paths.
  1. Trigger a worker error with a corrupted clp/clp-s executable and ensure that the error message shows the actual on-disk path.
  • When compression fails, the webui shows the actual on disk path for the log file.
  • When search fails, the search scheduler logs shows the on disk path for clg/clp-s error log.
  1. Ensured path like /mnt/c/user//logs work.
  2. Pulled changes from feat(clp-json): Use dataset-specific tables and archive directories for compression, decompression, and search. #868, ensured that archive entires in clp_datasets display the correct on-disk path.
  3. Confirmed that aws profile still works.

Summary by CodeRabbit

  • New Features

    • Added validation to prevent mounting disallowed or system-critical directories into containers.
    • Included cautionary notes in configuration files to guide users on directory path restrictions.
  • Bug Fixes

    • Improved handling of container directory paths to avoid redundant assignments and ensure only valid directories are used.

@haiqi96 haiqi96 requested a review from a team as a code owner June 10, 2025 21:36
@coderabbitai

coderabbitai Bot commented Jun 10, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

The update introduces a new function to validate directory paths before they are mounted into containers, ensuring they are absolute and not under restricted system directories. Existing functions are modified to use this validation, and redundant assignments of container directory paths are removed from the configuration generation logic. Additionally, clarifying comments were added to the configuration file warning against overlapping storage directories with container paths.

Changes

File(s) Change Summary
components/clp-package-utils/clp_package_utils/general.py Added CONTAINER_AWS_CONFIG_DIRECTORY constant and validate_path_for_container_mount function; integrated path validation into config loading and worker config validation; removed redundant container path assignments in config generation.
components/package-template/src/etc/clp-config.yml Added cautionary comments to storage directory paths warning against overlap with container execution paths.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Loader as load_config_file
    participant Validator as validate_path_for_container_mount

    User->>Loader: Load configuration file
    Loader->>Validator: Validate data_directory path
    Validator-->>Loader: Path valid or raise ValueError
    Loader->>Validator: Validate logs_directory path
    Validator-->>Loader: Path valid or raise ValueError
    Loader-->>User: Return validated config or error
Loading
sequenceDiagram
    participant Validator as validate_worker_config
    participant PathValidator as validate_path_for_container_mount

    Validator->>PathValidator: Validate archive_output path
    PathValidator-->>Validator: Path valid or raise ValueError
    Validator->>PathValidator: Validate stream_output path
    PathValidator-->>Validator: Path valid or raise ValueError
    Validator-->>Caller: Return validation result
Loading
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
components/clp-package-utils/clp_package_utils/general.py (2)

544-572: 🛠️ Refactor suggestion

⚠️ Potential issue

validate_path_for_container_mount — portability & overly broad restrictions

  1. path.is_relative_to(...) is Python 3.9+. The rest of the project still targets 3.8 (per pyproject.toml / CI matrix). Falling back to a try/except path.relative_to(...) keeps 3.8 compatibility.

  2. Blanket-banning /var prevents many standard Linux layouts (/var/lib/clp, /var/log/clp). Blocking the entire hierarchy may break existing deployments and contradict the PR goal (only avoid remapping host paths that collide with container internals). Narrow the restriction or make it configurable.

  3. Missing docstring and RESTRICTED_PREFIXES could be _RESTRICTED_PREFIXES to silence Pylint C0103.

Suggested compatibility shim:

-        if path.is_relative_to(prefix):
+        try:
+            path.relative_to(prefix)
+        except ValueError:
+            continue
+        else:
             raise ValueError(...)
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 544-544: Missing function or method docstring

(C0116)


[convention] 545-545: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)


368-374: 💡 Verification agent

❓ Verification inconclusive

Validate after hostname-suffix is appended

validate_path_for_container_mount runs before the data & logs directories are transformed into node-specific paths (path /= hostname).
If the appended hostname makes the path invalid (e.g., the parent dir passes validation but the suffixed full path exceeds a length restriction, changes its symlink resolution, etc.), the new value is never re-checked, and an invalid mount path may slip through.

Diff illustrating the suggested sequence:

-    validate_path_for_container_mount(clp_config.data_directory)
-    validate_path_for_container_mount(clp_config.logs_directory)
-
     # Make data and logs directories node-specific
     hostname = socket.gethostname()
     clp_config.data_directory /= hostname
     clp_config.logs_directory /= hostname

+    # Re-validate the final absolute paths
+    validate_path_for_container_mount(clp_config.data_directory)
+    validate_path_for_container_mount(clp_config.logs_directory)

Re-validate container mount paths after appending hostname

The current code only validates the base directories before adding the hostname suffix. Any changes introduced by appending the hostname (e.g. path-length limits, symlink resolution) won’t be caught. Move the calls to validate_path_for_container_mount to after the hostname is appended:

• File: components/clp-package-utils/clp_package_utils/general.py
Lines: ~368–374

Suggested diff:

-    validate_path_for_container_mount(clp_config.data_directory)
-    validate_path_for_container_mount(clp_config.logs_directory)
-
     # Append hostname to make data and logs directories node-specific
     hostname = socket.gethostname()
     clp_config.data_directory /= hostname
     clp_config.logs_directory /= hostname

+    # Re-validate the final absolute paths
+    validate_path_for_container_mount(clp_config.data_directory)
+    validate_path_for_container_mount(clp_config.logs_directory)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e1c32b and 77b47d7.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/general.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • get_directory (541-542)
  • get_directory (563-564)
🪛 Pylint (3.3.7)
components/clp-package-utils/clp_package_utils/general.py

[convention] 544-544: Missing function or method docstring

(C0116)


[convention] 545-545: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)

Comment thread components/clp-package-utils/clp_package_utils/general.py
@haiqi96 haiqi96 changed the title fix(package): Prevent remapping of host filesystem paths when mounting output directory in CLP execution containers. (fixes #960) fix(package): Prevent remapping of output paths when mounting in CLP execution containers. (fixes #960) Jun 10, 2025
@haiqi96

haiqi96 commented Jun 11, 2025

Copy link
Copy Markdown
Contributor Author

Let me know if you think there's any docs to update.

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can we add some comments in clp-config.yml for each directory path? Something like:
    • This path cannot overlap with a path in clp's execution container.
  • We need to check aws_config_directory as well, right?

Comment thread components/clp-package-utils/clp_package_utils/general.py
Comment thread components/clp-package-utils/clp_package_utils/general.py Outdated
@haiqi96

haiqi96 commented Jun 16, 2025

Copy link
Copy Markdown
Contributor Author

I don't think we need to check aws_config_directory since

  1. It is a read only mount, so we don't write to it
  2. It has to be mounted to a specific path in the container to make it work for the default user.

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77b47d7 and e3cd1f0.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/general.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • get_directory (541-542)
  • get_directory (563-564)
🪛 Pylint (3.3.7)
components/clp-package-utils/clp_package_utils/general.py

[convention] 544-544: Missing function or method docstring

(C0116)


[convention] 545-545: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)

🔇 Additional comments (1)
components/clp-package-utils/clp_package_utils/general.py (1)

368-370: ```shell
#!/bin/bash
sed -n '530,590p' components/clp-package-utils/clp_package_utils/general.py


</details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines +544 to +574
def validate_path_for_container_mount(path: pathlib.Path) -> None:
RESTRICTED_PREFIXES: List[pathlib.Path] = [
pathlib.Path("/bin"),
pathlib.Path("/boot"),
pathlib.Path("/dev"),
pathlib.Path("/etc"),
pathlib.Path("/lib"),
pathlib.Path("/lib32"),
pathlib.Path("/lib64"),
pathlib.Path("/libx32"),
pathlib.Path("/proc"),
pathlib.Path("/root"),
pathlib.Path("/run"),
pathlib.Path("/sbin"),
pathlib.Path("/srv"),
pathlib.Path("/sys"),
pathlib.Path("/usr"),
pathlib.Path("/var"),
CONTAINER_CLP_HOME,
CONTAINER_INPUT_LOGS_ROOT_DIR,
]

if not path.is_absolute():
raise ValueError(f"Path: `{path}` must be absolute:")

for prefix in RESTRICTED_PREFIXES:
if path.is_relative_to(prefix):
raise ValueError(
f"Invalid path: `{path}` cannot be under '{prefix}' which may overlap with a path"
f" in the container."
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

pathlib.Path.is_relative_to breaks Python 3.8 compatibility

Path.is_relative_to was introduced in Python 3.9. The rest of this file purposefully avoids it (using try/except around relative_to) to stay 3.8-compatible; introducing it here will raise AttributeError on 3.8 runtimes.

Proposed drop-in patch that retains compatibility and fixes the trailing-colon message typo:

@@
-    if not path.is_absolute():
-        raise ValueError(f"Path: `{path}` must be absolute:")
+    if not path.is_absolute():
+        raise ValueError(f"Path `{path}` must be absolute.")

-    for prefix in RESTRICTED_PREFIXES:
-        if path.is_relative_to(prefix):
+    for prefix in RESTRICTED_PREFIXES:
+        try:
+            path.relative_to(prefix)
+        except ValueError:
+            continue
         raise ValueError(
             f"Invalid path: `{path}` cannot be under '{prefix}' which may overlap with a path"
             f" in the container."
         )

Minor extras:
• Add a short doc-string – Pylint is already flagging this.
RESTRICTED_PREFIXES is fine in ALL_CAPS (constant), you can silence the style warning with # pylint: disable=invalid-name if desired.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def validate_path_for_container_mount(path: pathlib.Path) -> None:
RESTRICTED_PREFIXES: List[pathlib.Path] = [
pathlib.Path("/bin"),
pathlib.Path("/boot"),
pathlib.Path("/dev"),
pathlib.Path("/etc"),
pathlib.Path("/lib"),
pathlib.Path("/lib32"),
pathlib.Path("/lib64"),
pathlib.Path("/libx32"),
pathlib.Path("/proc"),
pathlib.Path("/root"),
pathlib.Path("/run"),
pathlib.Path("/sbin"),
pathlib.Path("/srv"),
pathlib.Path("/sys"),
pathlib.Path("/usr"),
pathlib.Path("/var"),
CONTAINER_CLP_HOME,
CONTAINER_INPUT_LOGS_ROOT_DIR,
]
if not path.is_absolute():
raise ValueError(f"Path: `{path}` must be absolute:")
for prefix in RESTRICTED_PREFIXES:
if path.is_relative_to(prefix):
raise ValueError(
f"Invalid path: `{path}` cannot be under '{prefix}' which may overlap with a path"
f" in the container."
)
def validate_path_for_container_mount(path: pathlib.Path) -> None:
RESTRICTED_PREFIXES: List[pathlib.Path] = [
pathlib.Path("/bin"),
pathlib.Path("/boot"),
pathlib.Path("/dev"),
pathlib.Path("/etc"),
pathlib.Path("/lib"),
pathlib.Path("/lib32"),
pathlib.Path("/lib64"),
pathlib.Path("/libx32"),
pathlib.Path("/proc"),
pathlib.Path("/root"),
pathlib.Path("/run"),
pathlib.Path("/sbin"),
pathlib.Path("/srv"),
pathlib.Path("/sys"),
pathlib.Path("/usr"),
pathlib.Path("/var"),
CONTAINER_CLP_HOME,
CONTAINER_INPUT_LOGS_ROOT_DIR,
]
if not path.is_absolute():
raise ValueError(f"Path `{path}` must be absolute.")
for prefix in RESTRICTED_PREFIXES:
try:
path.relative_to(prefix)
except ValueError:
continue
raise ValueError(
f"Invalid path: `{path}` cannot be under '{prefix}' which may overlap with a path"
f" in the container."
)
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 544-544: Missing function or method docstring

(C0116)


[convention] 545-545: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 544 to
574, replace the use of pathlib.Path.is_relative_to with a try-except block
using path.relative_to to maintain Python 3.8 compatibility. Add a short
doc-string to the function to satisfy Pylint. Fix the trailing colon typo in the
ValueError message for non-absolute paths. Optionally, keep RESTRICTED_PREFIXES
in all caps and add a pylint disable comment for invalid-name to avoid style
warnings.

@haiqi96

haiqi96 commented Jun 16, 2025

Copy link
Copy Markdown
Contributor Author

@kirkrodrigues how about
# Note: This path cannot (or must not?) overlap with any path used in clp's execution container.

@kirkrodrigues

Copy link
Copy Markdown
Member

I don't think we need to check aws_config_directory since

1. It is a read only mount, so we don't write to it

2. It has to be mounted  to a specific path in the container to make it work for the default user.

Wouldn't we need to guard against someone setting the data/logs/archive/stream output path to the same path?

@kirkrodrigues

Copy link
Copy Markdown
Member

@kirkrodrigues how about # Note: This path cannot (or must not?) overlap with any path used in clp's execution container.

# NOTE: This directory must not overlap with any path used in CLP's execution container. An error
# will be raised if so.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/general.py (1)

545-577: Path.is_relative_to breaks Py 3.8 compatibility (still unresolved)

This helper still relies on path.is_relative_to, introduced in Python 3.9; CLP currently supports 3.8. On 3.8 the call raises AttributeError, blocking package startup.

Please switch to the safe try/except relative_to pattern, add a minimal doc-string, and fix the stray colon in the “must be absolute” error.

@@
-    if not path.is_absolute():
-        raise ValueError(f"Path: `{path}` must be absolute:")
+    if not path.is_absolute():
+        raise ValueError(f"Path `{path}` must be absolute.")
@@
-        if path.is_relative_to(prefix):
+        try:
+            path.relative_to(prefix)
+        except ValueError:
+            continue
         raise ValueError(
             f"Invalid path: `{path}` cannot be under '{prefix}' which may overlap with a path"
             f" in the container."
         )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3cd1f0 and 33f9d7a.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/general.py (5 hunks)
  • components/package-template/src/etc/clp-config.yml (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • get_directory (541-542)
  • get_directory (563-564)
🪛 Pylint (3.3.7)
components/clp-package-utils/clp_package_utils/general.py

[convention] 545-545: Missing function or method docstring

(C0116)


[convention] 546-546: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (3)
components/package-template/src/etc/clp-config.yml (1)

74-76: Clear warning comments added – good improvement

The repeated NOTE blocks succinctly alert operators that these directories must not overlap with container-internal paths and document the resulting error behaviour. This should prevent a class of support tickets. Nice touch.

Also applies to: 99-101, 108-110, 113-115

components/clp-package-utils/clp_package_utils/general.py (2)

270-276: Assignment aligns with new constant

The conditional mount now uses DEFAULT_AWS_CONFIG_DIRECTORY, which keeps the literal in one place—good for maintainability.


369-371: Early validation looks correct

Validating data_directory and logs_directory before mutating them with the hostname keeps the check simple and still guarantees safety. 👍

Comment on lines +512 to +514
validate_path_for_container_mount(clp_config.archive_output.get_directory())
validate_path_for_container_mount(clp_config.stream_output.get_directory())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Redundant but harmless re-validation

validate_worker_config repeats the same mount-safety check performed during config load. While not wrong, it adds negligible value and another traversal of the restricted list. Optional: drop the duplicate to reduce cognitive overhead.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 512 to
514, the calls to validate_path_for_container_mount are redundant because the
same mount-safety checks are already performed during config load. To reduce
unnecessary duplicate validation and cognitive overhead, remove these repeated
calls from validate_worker_config.

Comment on lines 44 to 45
DEFAULT_AWS_CONFIG_DIRECTORY = pathlib.Path("/") / ".aws"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Constant name is fine, but double-check intent of "/.aws"

DEFAULT_AWS_CONFIG_DIRECTORY = pathlib.Path("/") / ".aws" mounts to /​.aws at the container root.
If the intent was /home/<user>/.aws, ensure all call-sites expect the root-level location; otherwise consider documenting the rationale to avoid confusion.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py at lines 44-45, the
constant DEFAULT_AWS_CONFIG_DIRECTORY is set to the root-level path "/.aws".
Verify if the intended directory is actually the root-level or a user-specific
path like "/home/<user>/.aws". If the root-level path is correct, add a comment
explaining this choice to clarify intent for future maintainers. If not, update
the path to point to the appropriate user directory and ensure all code using
this constant aligns with the new location.

@haiqi96 haiqi96 requested a review from kirkrodrigues June 16, 2025 19:55

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/clp-package-utils/clp_package_utils/general.py (2)

568-570: Tighten the “absolute path” error message.

The current string has an unnecessary “Path:” prefix and a trailing colon that makes the message look like a typo.

-    if not path.is_absolute():
-        raise ValueError(f"Path: `{path}` must be absolute:")
+    if not path.is_absolute():
+        raise ValueError(f"Path `{path}` must be absolute.")

571-573: Path.is_relative_to breaks Python 3.8 support – use relative_to with try/except.

pathlib.Path.is_relative_to only exists from 3.9 onwards.
The rest of the project still targets 3.8, so importing this module in a 3.8 runtime will raise AttributeError, making the CLI unusable.

-    for prefix in RESTRICTED_PREFIXES:
-        if path.is_relative_to(prefix):
+    for prefix in RESTRICTED_PREFIXES:
+        try:
+            path.relative_to(prefix)
+        except ValueError:
+            continue  # not under this prefix
         raise ValueError(
             f"Invalid path: `{path}` cannot be under '{prefix}' which may overlap with a path"
             f" in the container."
         )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33f9d7a and 873c0aa.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/general.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • get_directory (541-542)
  • get_directory (563-564)
🪛 Pylint (3.3.7)
components/clp-package-utils/clp_package_utils/general.py

[convention] 545-545: Missing function or method docstring

(C0116)


[convention] 546-546: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)

Comment on lines +545 to +546
def validate_path_for_container_mount(path: pathlib.Path) -> None:
RESTRICTED_PREFIXES: List[pathlib.Path] = [

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add a short doc-string for validate_path_for_container_mount.

Pylint is already complaining. A one-liner describing the purpose of the guard is enough.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 545-545: Missing function or method docstring

(C0116)


[convention] 546-546: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 545 to
546, the function validate_path_for_container_mount lacks a doc-string, causing
a pylint warning. Add a concise one-line doc-string immediately below the
function definition that briefly describes its purpose, such as indicating it
validates if a given path is safe for container mounting.

@kirkrodrigues kirkrodrigues changed the title fix(package): Prevent remapping of output paths when mounting in CLP execution containers. (fixes #960) fix(package): Don't remap output paths when mounting into CLP execution containers (fixes #960). Jun 18, 2025

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited the PR title directly.

Comment on lines +547 to +565
pathlib.Path("/bin"),
pathlib.Path("/boot"),
pathlib.Path("/dev"),
pathlib.Path("/etc"),
pathlib.Path("/lib"),
pathlib.Path("/lib32"),
pathlib.Path("/lib64"),
pathlib.Path("/libx32"),
pathlib.Path("/proc"),
pathlib.Path("/root"),
pathlib.Path("/run"),
pathlib.Path("/sbin"),
pathlib.Path("/srv"),
pathlib.Path("/sys"),
pathlib.Path("/usr"),
pathlib.Path("/var"),
CONTAINER_CLP_HOME,
CONTAINER_INPUT_LOGS_ROOT_DIR,
CONTAINER_AWS_CONFIG_DIRECTORY,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should've mentioned this in the previous review, but can we alphabetize these?

@haiqi96 haiqi96 requested a review from kirkrodrigues June 18, 2025 15:06

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/general.py (1)

545-576: Python 3.8 incompatibility & minor polish in validate_path_for_container_mount

  1. Path.is_relative_to is a Python 3.9 feature – calling it will raise an AttributeError in the 3.8 runtimes we still support.
  2. The “must be absolute” error message has an extra trailing colon.
  3. Pylint is flagging the missing doc-string.
  4. The loop pattern can be kept 3.8-compatible with a try/except relative_to block.

Proposed drop-in patch:

-def validate_path_for_container_mount(path: pathlib.Path) -> None:
-    RESTRICTED_PREFIXES: List[pathlib.Path] = [
+def validate_path_for_container_mount(path: pathlib.Path) -> None:
+    """
+    Validate that a host path is absolute and does not live under a container-reserved
+    prefix that would shadow critical directories once mounted.
+    """
+    # pylint: disable=invalid-name
+    RESTRICTED_PREFIXES: List[pathlib.Path] = [
@@
-    if not path.is_absolute():
-        raise ValueError(f"Path: `{path}` must be absolute:")
+    if not path.is_absolute():
+        raise ValueError(f"Path `{path}` must be absolute.")
@@
-    for prefix in RESTRICTED_PREFIXES:
-        if path.is_relative_to(prefix):
+    for prefix in RESTRICTED_PREFIXES:
+        try:
+            path.relative_to(prefix)
+        except ValueError:
+            continue
         raise ValueError(
             f"Invalid path: `{path}` cannot be under '{prefix}' which may overlap with a path"
             f" in the container."
         )

This restores 3.8 compatibility, silences the style warning, and fixes the typo.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 873c0aa and 17952a6.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/general.py (5 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
components/clp-package-utils/clp_package_utils/general.py

[convention] 545-545: Missing function or method docstring

(C0116)


[convention] 546-546: Variable name "RESTRICTED_PREFIXES" doesn't conform to snake_case naming style

(C0103)

Comment on lines +369 to +371
validate_path_for_container_mount(clp_config.data_directory)
validate_path_for_container_mount(clp_config.logs_directory)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider de-duplicating mount-path validation

validate_path_for_container_mount is invoked here for data_directory and logs_directory, and later again for archive & stream outputs in validate_worker_config.
If every CLP entry-point already flows through load_config_file, it may be cleaner to centralise all four validations in one place to avoid future drift.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 369 to
371, the function validate_path_for_container_mount is called separately for
data_directory and logs_directory, and then again for archive and stream outputs
in validate_worker_config. To avoid duplication and potential inconsistencies,
consolidate all four path validations into a single location, ideally within
load_config_file, so that all mount-path validations happen centrally and
consistently for every CLP entry-point.

@haiqi96 haiqi96 merged commit f3ffa53 into y-scope:main Jun 18, 2025
7 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…on containers (fixes y-scope#960). (y-scope#998)

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants