Skip to content

fix(clp-package): Resolve relative paths and symlinks in host input and output paths passed through the package's config or CLI arguments (fixes #1558).#1559

Merged
junhaoliao merged 15 commits into
y-scope:mainfrom
junhaoliao:resolve-host-path-space
Nov 6, 2025

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Nov 5, 2025

Copy link
Copy Markdown
Member

Description

This change strengthens host-path handling and removes .resolve() usages that could misinterpret paths inside bootstrap containers of the sbin scripts.

Summary of changes:

  • clp_py_utils.core
    • Enhance resolve_host_path_in_container to:
      • Expand ~ to the user’s home directory.
      • Resolve relative paths against CLP_PWD_HOST (with a clear error if unset).
      • Perform recursive symlink resolution with cycle detection.
      • Guarantee an absolute, container-mount path output under /mnt/host.
    • Add new helper resolve_host_path(host_path: Path) -> Path, returning the fully resolved host-space path (without the /mnt/host prefix).
  • clp_package_utils.general
    • Use resolve_host_path(...) for logs_input.directory when building container configs.
  • clp_package_utils.scripts.compress
    • Replace Path(...).resolve() with resolve_host_path(...) for input paths and lists.
  • clp_package_utils.scripts.decompress
    • Stop pre-resolving extraction_dir before translation; pass through to resolve_host_path_in_container(...).
    • Default --extraction-dir to "." instead of relying on CLP_PWD_HOST at arg-parse time (resolution now happens in core).
    • resolve_host_path(extraction_dir) as the source for the docker bind mount.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Tested compress.sh - input paths that are relative, contains ~, symlinks, or regular but whose parent is a symlink can now be properly ingested

junhao@baker22:~/Downloads/clp-package$ ll ~/samples/mongod 
-rw-r--r-- 1 junhao 1000071 393636231 Jan  9  2024 /home/junhao/samples/mongod
junhao@baker22:~/Downloads/clp-package$ ll ~/nfs_junhao
lrwxrwxrwx 1 junhao Domain Users 13 Oct  4  2023 /home/junhao/nfs_junhao -> /homes/junhao/
junhao@baker22:~/Downloads/clp-package$ ll ~/nfs_junhao/mongod 
-rw-r--r-- 1 junhao Domain Users 393636231 Nov  5 12:20 /home/junhao/nfs_junhao/mongod
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ~/samples/mongod 
2025-11-05T21:40:10.377 INFO [compress] Compression job 4 submitted.
2025-11-05T21:40:17.398 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 53.85MB/s.
2025-11-05T21:40:17.900 INFO [compress] Compression finished.
2025-11-05T21:40:17.901 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 52.49MB/s.
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ../../samples/mongod 
2025-11-05T21:44:27.202 INFO [compress] Compression job 5 submitted.
2025-11-05T21:44:33.222 INFO [compress] Compression finished.
2025-11-05T21:44:33.222 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 63.75MB/s.
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ~/samples/
2025-11-05T21:44:49.783 INFO [compress] Compression job 6 submitted.
2025-11-05T21:44:55.801 INFO [compress] Compression finished.
2025-11-05T21:44:55.801 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 64.94MB/s.
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ../../samples/
2025-11-05T21:45:15.504 INFO [compress] Compression job 7 submitted.
2025-11-05T21:45:21.532 INFO [compress] Compression finished.
2025-11-05T21:45:21.532 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 64.82MB/s.
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ~/nfs_junhao/mongod 
2025-11-05T21:45:42.551 INFO [compress] Compression job 8 submitted.
2025-11-05T21:45:49.572 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 53.46MB/s.
2025-11-05T21:45:50.074 INFO [compress] Compression finished.
2025-11-05T21:45:50.074 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 52.82MB/s.
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ../../nfs_junhao/mongod 
2025-11-05T21:46:16.597 INFO [compress] Compression job 9 submitted.
2025-11-05T21:46:23.115 INFO [compress] Compression finished.
2025-11-05T21:46:23.115 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 60.30MB/s.
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ~/nfs_junhao
2025-11-05T21:46:50.268 INFO [compress] Compression job 10 submitted.
2025-11-05T21:46:56.298 INFO [compress] Compression finished.
2025-11-05T21:46:56.298 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 65.35MB/s.
junhao@baker22:~/Downloads/clp-package$ ./sbin/compress.sh ../../nfs_junhao
2025-11-05T21:48:31.205 INFO [compress] Compression job 11 submitted.
2025-11-05T21:48:37.235 INFO [compress] Compression finished.
2025-11-05T21:48:37.236 INFO [compress] Compressed 375.40MB into 2.92MB (128.48x). Speed: 67.64MB/s.

Tested the decompression script in various directories

./sbin/decompress.sh x
# observed successful decompression and inspected the decompressed files fine

./sbin/decompress.sh x --extraction-dir ..
# observed successful decompression and inspected the decompressed files in `..` fine


cd ~/nfs_junhao
readlink -f .
# /homes/junhao
~/Downloads/clp-package/sbin/decompress.sh x
# observed successful decompression and inspected the decompressed files fine

Reset and restarted the package with below settings that set the input directory to a symlink one

logs_input:
 type: "fs"

 # NOTE: This directory will be exposed inside the container, so symbolic links to files outside
 # this directory will be ignored.
 directory: "../../nfs_junhao"

->

junhao@baker22:~/Downloads/clp-package/sbin$ ./compress.sh ~/nfs_junhao/mongod 
2025-11-05T22:01:14.130 INFO [compress] Compression job 1 submitted.
2025-11-05T22:01:14.633 ERROR [compress] Compression failed. At least one of your input paths could not be processed. See the error log at 'user/job_1_failed_paths.txt' inside your configured logs directory (`logs_directory`) for more details.

->

Failed input paths log.
Generated at 2025-11-05T22:01:14.

"/mnt/logs/homes/junhao/mongod" does not exist.

which is expected. The rationale can be seen in the clp-config.yml comment.

Summary by CodeRabbit

Release Notes

  • Refactor
    • Enhanced path resolution for input logs with improved symbolic link handling and cycle detection.
    • Updated default extraction directory to use the current working directory for improved usability.

y-scope#1558); Resolve symlinks recursively in `resolve_host_path_in_container`.
@coderabbitai

coderabbitai Bot commented Nov 5, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This change introduces a new resolve_host_path function in the CLP utilities to standardize host path resolution across the package. The function replaces direct .resolve() calls in compress.py, decompress.py, and general.py with a consistent resolution mechanism that handles symlinks and relative paths using the CLP_PWD_HOST environment variable.

Changes

Cohort / File(s) Summary
Core path resolution utility
components/clp-py-utils/clp_py_utils/core.py
Added new public function resolve_host_path() to resolve host paths by converting container-space results to host-space. Added internal helper _resolve_symlinks_in_path() with symlink cycle detection via inode tracking. Updated resolve_host_path_in_container() to expand user tilde, handle relative paths via CLP_PWD_HOST, and use dedicated symlink resolution flow. Added os import.
Script path resolution updates
components/clp-package-utils/clp_package_utils/scripts/compress.py, components/clp-package-utils/clp_package_utils/scripts/decompress.py, components/clp-package-utils/clp_package_utils/general.py
Imported resolve_host_path from clp_py_utils.core. Replaced direct pathlib.Path(...).resolve() calls with resolve_host_path(pathlib.Path(...)) for host path normalization. In decompress.py, changed default --extraction-dir from computed environment-based path to ".".

Sequence Diagram

sequenceDiagram
    participant User as User Input
    participant Script as compress.py/<br/>decompress.py
    participant resolve_host_path as resolve_host_path()
    participant resolve_symlinks as _resolve_symlinks_<br/>in_path()
    participant Container as resolve_host_path_<br/>in_container()
    
    User->>Script: Provide host path (relative or absolute)
    Script->>resolve_host_path: resolve_host_path(Path)
    activate resolve_host_path
    resolve_host_path->>Container: Call resolve_host_path_in_container()
    activate Container
    Container->>Container: Expand user tilde<br/>Convert relative→absolute<br/>(via CLP_PWD_HOST)
    Container->>resolve_symlinks: _resolve_symlinks_in_path()
    activate resolve_symlinks
    resolve_symlinks->>resolve_symlinks: Walk components,<br/>follow symlinks<br/>(with inode tracking)
    alt Symlinks resolved
        resolve_symlinks-->>Container: Resolved Path
    else Resolution failed
        resolve_symlinks-->>Container: None
    end
    deactivate resolve_symlinks
    
    alt Got resolved symlinks
        Container-->>resolve_host_path: Container path (resolved)
    else Fallback to mount root
        Container-->>resolve_host_path: Container path (mount-rooted)
    end
    deactivate Container
    
    resolve_host_path->>resolve_host_path: Convert to host-space<br/>(rooted at /)
    resolve_host_path-->>Script: Host Path
    deactivate resolve_host_path
    
    Script->>Script: Use resolved path
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • High-attention areas:
    • _resolve_symlinks_in_path() symlink following logic with inode tracking for cycle detection—verify correctness of symlink traversal and edge case handling
    • resolve_host_path_in_container() refactoring—confirm that relative path expansion via CLP_PWD_HOST is correctly applied and documented
    • Verification that all three scripts (compress.py, decompress.py, general.py) correctly use resolve_host_path() instead of .resolve(), particularly around Docker bind mount construction in general.py
    • In decompress.py, confirm that the default --extraction-dir change to "." combined with resolve_host_path() behaves as intended

Possibly related issues

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately and specifically describes the main changes: resolving relative paths and symlinks in host input and output paths, with a clear reference to the issue being fixed.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao changed the title fix(clp-package): Resolve log input paths in the host path space (fixes #1558); Resolve symlinks recursively in resolve_host_path_in_container. fix(clp-package): Resolve log input paths in the host path space (fixes #1558); Resolve symlinks recursively and user home in resolve_host_path_in_container. Nov 5, 2025
@junhaoliao junhaoliao marked this pull request as ready for review November 5, 2025 22:42
@junhaoliao junhaoliao requested a review from a team as a code owner November 5, 2025 22:42

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the validation steps for the rationale of adding those extra handling in resolve_host_path_in_container

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

fix(clp-package): Resolve symlinks in host input and output paths passed through the package's config or CLI arguments (fixes #1558).

Comment thread components/clp-py-utils/clp_py_utils/core.py Outdated
Comment thread components/clp-py-utils/clp_py_utils/core.py Outdated
Comment thread components/clp-py-utils/clp_py_utils/core.py Outdated
Comment thread components/clp-py-utils/clp_py_utils/core.py Outdated
Comment thread components/clp-py-utils/clp_py_utils/core.py Outdated
Comment thread components/clp-py-utils/clp_py_utils/core.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd0601f and d82d6c6.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/core.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
🪛 Ruff (0.14.3)
components/clp-py-utils/clp_py_utils/core.py

81-81: Avoid specifying long messages outside the exception class

(TRY003)


108-108: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/core.py (4)

1-1: LGTM!

The os import is necessary for accessing the CLP_PWD_HOST environment variable.


68-74: LGTM!

The docstring clearly describes the function's behaviour and appropriately references resolve_host_path for resolution details.


78-83: Verify that .absolute() is necessary.

After joining a relative host_path with pwd_host, the result should already be absolute if pwd_host itself is absolute. The explicit .absolute() call on line 83 might be redundant.

If you want to be defensive against cases where pwd_host might be relative, consider validating that pwd_host is absolute instead:

     if not host_path.is_absolute():
         pwd_host = os.getenv("CLP_PWD_HOST")
         if pwd_host is None:
             raise ValueError("Relative host path provided but CLP_PWD_HOST is not set.")
+        pwd_host = pathlib.Path(pwd_host)
+        if not pwd_host.is_absolute():
+            raise ValueError("CLP_PWD_HOST must be an absolute path.")
-        host_path = pathlib.Path(pwd_host) / host_path
-        host_path = host_path.absolute()
+        host_path = pwd_host / host_path

116-128: LGTM!

The new resolve_host_path helper is well-designed with a clear docstring. The implementation correctly delegates to resolve_host_path_in_container and translates the result back to host path space.

Comment thread components/clp-py-utils/clp_py_utils/core.py
Comment thread components/clp-py-utils/clp_py_utils/core.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d82d6c6 and ffed08e.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/core.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: docs/src/dev-guide/components-core/manylinux-2-28-deps-install.md:24-24
Timestamp: 2025-08-25T06:32:48.313Z
Learning: In the CLP project documentation, when linking to scripts or other files within the repository, use relative paths (e.g., components/core/tools/scripts/...) rather than commit-pinned GitHub URLs to ensure docs and referenced files always belong to the same commit and stay synchronized.
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170
Timestamp: 2025-08-13T15:07:37.767Z
Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved Path object, so additional .resolve() calls on its return value are redundant.
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170
Timestamp: 2025-08-13T15:07:37.767Z
Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved Path object, so additional .resolve() calls on its return value are redundant.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170
Timestamp: 2025-08-13T15:07:37.767Z
Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved/absolute Path object because FsStorage.make_config_paths_absolute() processes the directory path during configuration loading, making additional .resolve() calls redundant.
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170
Timestamp: 2025-08-13T15:07:37.767Z
Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved/absolute Path object because FsStorage.make_config_paths_absolute() processes the directory path during configuration loading, making additional .resolve() calls redundant.
🪛 Ruff (0.14.3)
components/clp-py-utils/clp_py_utils/core.py

96-96: Avoid specifying long messages outside the exception class

(TRY003)


144-144: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
components/clp-py-utils/clp_py_utils/core.py (2)

1-1: LGTM!

The os import is necessary for accessing environment variables in the updated resolution logic.


68-81: LGTM!

The new resolve_host_path function provides a clean public API for host-space path resolution. The implementation correctly delegates to resolve_host_path_in_container and strips the /mnt/host prefix to return the resolved host path.

Comment on lines 67 to +104
def resolve_host_path_in_container(host_path: pathlib.Path) -> pathlib.Path:
"""
Translates a host path to its container-mount equivalent. It also resolves a single level of
symbolic link if the host path itself is a symlink.
Resolves a host path and translates it into its container-mount equivalent absolute path. See
`resolve_host_path` for details about the resolution.

:param host_path: The host path.
:return: The translated path.
:return: The translated path (with /mnt/host prefix).
"""
host_path = host_path.absolute()
translated_path = CONTAINER_DIR_FOR_HOST_ROOT / host_path.relative_to("/")
host_path = host_path.expanduser()

try:
if not translated_path.is_symlink():
return translated_path

link_target = translated_path.readlink()
if link_target.is_absolute():
return CONTAINER_DIR_FOR_HOST_ROOT / link_target.relative_to("/")
else:
# If the symlink points to a relative path, resolve it relative to the symlink's parent.
return (translated_path.parent / link_target).resolve()
except OSError:
# Ignore if reading the symlink fails (e.g., broken link or permission error).
pass
if not host_path.is_absolute():
pwd_host = os.getenv("CLP_PWD_HOST")
if pwd_host is None:
raise ValueError("Relative host path provided but CLP_PWD_HOST is not set.")
host_path = pathlib.Path(pwd_host) / host_path
host_path = host_path.absolute()

return translated_path
resolved_path = _resolve_symlinks_in_path(host_path)
if resolved_path is not None:
return resolved_path

return CONTAINER_DIR_FOR_HOST_ROOT / host_path.relative_to("/")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

LGTM!

The enhanced resolution logic correctly handles tilde expansion, relative paths via CLP_PWD_HOST, and delegates symlink resolution to the new helper function. The fallback behaviour is appropriate.

Optional: Consider extracting the error message (line 96).

Static analysis suggests defining the error message outside the exception class for reusability, though the current inline approach is acceptable for this single use case.

🧰 Tools
🪛 Ruff (0.14.3)

96-96: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/core.py around lines 83 to 104, extract
the inline error string used when CLP_PWD_HOST is missing into a named constant
(e.g., RELATIVE_PATH_PWD_HOST_ERROR = "Relative host path provided but
CLP_PWD_HOST is not set.") at module scope, then use that constant when raising
the ValueError so the message is reusable and easier to test/maintain (replace
the inline string in raise ValueError(...) with the constant).

Comment on lines +117 to +147
def _resolve_symlinks_in_path(host_path: pathlib.Path) -> pathlib.Path | None:
"""
Resolves symlinks in a path by walking through each component and following symlinks.

:param host_path: The host path.
:return: The resolved path with all symlinks followed, or None on error.
"""
try:
visited_symlink_inodes = set()
current_path = CONTAINER_DIR_FOR_HOST_ROOT
for part in host_path.relative_to("/").parts:
current_path = current_path / part

while current_path.is_symlink():
stat = current_path.lstat()
if stat.st_ino in visited_symlink_inodes:
break
visited_symlink_inodes.add(stat.st_ino)

link_target = current_path.readlink()
if link_target.is_absolute():
current_path = CONTAINER_DIR_FOR_HOST_ROOT / link_target.relative_to("/")
else:
# If the symlink points to a relative path, resolve it relative to the symlink's
# parent.
current_path = (current_path.parent / link_target).resolve()

return current_path
except OSError:
# Ignore if reading the symlink fails (e.g., broken link or permission error).
return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Remove .resolve() usage that contradicts PR objectives.

Line 142 uses .resolve(), which contradicts the PR's core objective of removing .resolve() usages that could misinterpret paths inside bootstrap containers. This call resolves symlinks in the container's filesystem context rather than the host's context, bypassing the cycle detection mechanism implemented in this helper.

As noted in the previous review, the .resolve() call should be replaced with manual path normalization.

Apply this diff to manually normalize the relative symlink target:

                else:
                    # If the symlink points to a relative path, resolve it relative to the symlink's
                    # parent.
-                    current_path = (current_path.parent / link_target).resolve()
+                    normalized_path_str = os.path.normpath(str(current_path.parent / link_target))
+                    current_path = pathlib.Path(normalized_path_str)

This approach normalizes the path (handling .. and . components) without resolving symlinks, allowing the outer loop to continue handling any additional symlinks encountered.


Optional: Consider restructuring the try-except for clarity.

Static analysis suggests moving the return current_path statement (line 144) to an else block to follow the TRY300 style guideline, though the current structure is acceptable.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.3)

144-144: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
components/clp-py-utils/clp_py_utils/core.py around lines 117-147: the use of
.resolve() on line ~142 must be removed because it resolves symlinks in the
container FS and bypasses the helper's cycle-detection; replace that .resolve()
call with manual path normalization: compute the candidate path as
current_path.parent / link_target, then normalize its parts by processing each
component (skip '.'; on '..' remove the previous component if any; otherwise
append the part) to produce a normalized path without following symlinks, and
assign that normalized path to current_path so the outer loop can continue
detecting and handling any subsequent symlinks; keep the visited_symlink_inodes
logic intact. Optionally, you can move the final return current_path into an
else block of the try for clearer flow.

@junhaoliao

Copy link
Copy Markdown
Member Author

Retested following the "Validation performed" steps

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See title suggested in previous review.

@junhaoliao junhaoliao changed the title fix(clp-package): Resolve log input paths in the host path space (fixes #1558); Resolve symlinks recursively and user home in resolve_host_path_in_container. fix(clp-package): Resolve relative paths and symlinks in host input and output paths passed through the package's config or CLI arguments (fixes #1558). Nov 6, 2025
@junhaoliao junhaoliao merged commit 10cdc91 into y-scope:main Nov 6, 2025
20 checks passed
@junhaoliao junhaoliao deleted the resolve-host-path-space branch November 6, 2025 06:18
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…nd output paths passed through the package's config or CLI arguments (fixes y-scope#1558). (y-scope#1559)

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants