Skip to content

fix(clp-package): Expand ~ in aws_config_directory config (fixes #1257).#1258

Merged
junhaoliao merged 13 commits into
y-scope:mainfrom
junhaoliao:resolve-user-home-aws
Sep 27, 2025
Merged

fix(clp-package): Expand ~ in aws_config_directory config (fixes #1257).#1258
junhaoliao merged 13 commits into
y-scope:mainfrom
junhaoliao:resolve-user-home-aws

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Aug 24, 2025

Copy link
Copy Markdown
Member

Description

see #1257 for the reason of the change.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. built the package
  2. cd build/clp-package/
  3. set up profile credentials as instructed at https://docs.yscope.com/clp/v0.5.0/user-guide/guides-using-object-storage/clp-config.html#profile
  4. added below line to the bottom of etc/clp-config.yml
    logs_input:
      type: "s3"
      aws_authentication:
        type: "profile"
        profile: "default"
    package:
      storage_engine: "clp-s"
      query_engine: "clp-s"
    aws_config_directory: "~/.aws"
  5. sbin/start-clp.sh and observed the package was started successfully.
  6. compressed sample log file with s3 path and observed the job completed successfully: ./compress.sh --timestamp-key=timestamp https://<file-on-s3>
  7. docker inspect clp-compression_worker-c415 | jq .[0].HostConfig.Mounts and observed a correct full path showing my .aws in my user's home directory.

Summary by CodeRabbit

  • Bug Fixes
    • The AWS config directory setting now expands the home-directory shortcut (~) before validation, resolving paths like ~/... to full paths. Empty or unspecified configurations remain unchanged. Error messages for nonexistent config paths now include the actual path to aid troubleshooting. No user action required; improves reliability across environments.

@junhaoliao junhaoliao requested a review from a team as a code owner August 24, 2025 09:00
@coderabbitai

coderabbitai Bot commented Aug 24, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a Pydantic validator expand_profile_user_home to CLPConfig.aws_config_directory that expands a leading tilde via pathlib.Path.expanduser; None remains unchanged. The existing existence check/error remains and its message now includes the path.

Changes

Cohort / File(s) Summary
Config validation
components/clp-py-utils/clp_py_utils/clp_config.py
Adds expand_profile_user_home(cls, value: Optional[pathlib.Path]) Pydantic validator to expand a leading ~ via pathlib.Path.expanduser for aws_config_directory; preserves None and existing validation flow. Augments the non-existent-path error message to include the path.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant CLPConfig
    participant Validator as expand_profile_user_home
    participant Path as pathlib.Path.expanduser

    Caller->>CLPConfig: set/load aws_config_directory
    CLPConfig->>Validator: validate(value)
    alt value is not None
        Validator->>Path: expanduser(value)
        Path-->>Validator: expanded_path
        Validator-->>CLPConfig: expanded_path
    else value is None
        Validator-->>CLPConfig: None
    end
    CLPConfig-->>Caller: normalised aws_config_directory (or error with path included)
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly summarises the primary change—expanding the tilde in the aws_config_directory config—while referencing the relevant component and issue number. It is concise, specific, and aligned with standard commit message conventions. A reader scanning PR history will immediately understand the main purpose of this changeset.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d55e6e and 63befcf.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/clp_config.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: package-image

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.13.1)
components/clp-py-utils/clp_py_utils/clp_config.py

�[1;31mruff failed�[0m
�[1mCause:�[0m Failed to load configuration /ruff.toml
�[1mCause:�[0m Failed to parse /ruff.toml
�[1mCause:�[0m TOML parse error at line 26, column 3
|
26 | "RSE100", # Use of assert detected
| ^^^^^^^^
Unknown rule selector: RSE100


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao requested a review from haiqi96 August 24, 2025 09:00
_os_release_file_path: pathlib.Path = PrivateAttr(default=OS_RELEASE_FILE_PATH)

@validator("aws_config_directory", pre=True)
def validate_aws_config_directory(cls, value):

@junhaoliao junhaoliao Aug 24, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency with other validators, i prefixed this with validate though it does not really validate the path (the actual valuation is done in validate_aws_config_dir())

we may rename the method as expand_profile_user_home if the validate prefix convention does not need to be followed

@haiqi96 haiqi96 Aug 25, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly looked at some examples and docs online, I don't think we have to name it with a "validate" prefix.

let's go with expand_profile_user_home.

Btw, any chance we can use [expanduser] (https://docs.python.org/3/library/pathlib.html#pathlib.Path.expanduser)from Path()?

Since the type annotation of aws_directory is pathlib.Path, I feel it's more consistent to use the library from Path().

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. let me double check if the pre=True is still needed for good in that case

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4fa8a55 and bda9058.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
🔇 Additional comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

814-819: LGTM: sensible pre-validator to expand a leading tilde for aws_config_directory

Using a pre-validator here ensures the expansion occurs before Pydantic casts to pathlib.Path. This aligns with the PR objective and is low-risk.

Comment on lines +814 to +819
@validator("aws_config_directory", pre=True)
def validate_aws_config_directory(cls, value):
if value is not None:
return os.path.expanduser(value)
return value

@coderabbitai coderabbitai Bot Aug 24, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add focused tests for tilde expansion and edge cases

Recommend adding unit tests that cover:

  • "~/.aws" expands to Path.home()/".aws"
  • pathlib.Path("~/.aws") also expands correctly
  • None stays None
  • "" (or " ") normalizes to None (with the suggested change)

I can provide a test scaffold if helpful.

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 814 to 819,
add focused unit tests and adjust behavior so tilde expansion and
empty/whitespace inputs are handled: ensure "~/.aws" expands to
Path.home()/".aws", pathlib.Path("~/.aws") is expanded the same, None remains
None, and empty string or pure-whitespace is normalized to None; update the
validator to accept Path inputs and call os.path.expanduser on strings (or
convert Path to str first), and add pytest cases for the four scenarios
including both string and Path inputs to cover edge cases.

🛠️ Refactor suggestion

Return a Path and guard against empty strings to avoid accidental CWD usage

As written, an empty string would be preserved and later coerced to Path(""), which resolves to the current working directory. That can accidentally pass exists() checks and point AWS config resolution at the wrong directory. Also, returning a pathlib.Path directly avoids an extra conversion step and handles Path-like inputs cleanly.

Consider this tweak:

 @validator("aws_config_directory", pre=True)
 def validate_aws_config_directory(cls, value):
-    if value is not None:
-        return os.path.expanduser(value)
-    return value
+    if value is None:
+        return None
+    # Treat an empty/whitespace-only string as unset to force an explicit path when required.
+    if isinstance(value, str) and value.strip() == "":
+        return None
+    # Normalise to a Path and expand user (~) for both str and Path-like inputs.
+    return pathlib.Path(value).expanduser()

This makes misconfiguration harder and keeps the field’s effective type consistent.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 814-819, the
aws_config_directory validator currently returns strings (and preserves empty
strings) which can become Path("") and resolve to CWD; change the validator to
reject/normalize empty-string inputs and return a pathlib.Path for non-empty
values: if the incoming value is falsy (None or empty string) return None;
otherwise expanduser on the string (or convert Path-like input) and wrap with
pathlib.Path before returning so callers always get a Path or None and
accidental CWD usage is avoided.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i updated the code so the hook is no longer with pre=True. could you check if the latest code makes sense to you?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@junhaoliao Yes, the updated code makes much more sense! Removing pre=True is the right approach here. Here's why this implementation is better:

What pre=True removal accomplishes:

  • Without pre=True, Pydantic first converts the input to the field's declared type (Optional[pathlib.Path])
  • So when your validator runs, value is already a pathlib.Path object (or None)
  • You can then call value.expanduser() directly using the pathlib method

Why this is cleaner:

  • Type consistency: The validator receives and returns the same type (Optional[pathlib.Path])
  • Better method: Uses pathlib.Path.expanduser() instead of os.path.expanduser(), which is more appropriate
  • Simpler logic: No need to handle different input types (string vs Path) since Pydantic handles the conversion first

The current implementation handles:

  • NoneNone
  • pathlib.Path("~/.aws") → expanded Path object ✅
  • String inputs like "~/.aws" → converted to Path by Pydantic, then expanded ✅

This is a much cleaner approach than the previous pre=True version that had to deal with mixed input types. The flow is now: raw input → Pydantic conversion → your validator → final Path object.


🏁 Script executed:

#!/bin/bash
# Get the current aws_config_directory validator implementation
rg -nA 10 -B 2 "validate_aws_config_directory" components/clp-py-utils/clp_py_utils/clp_config.py

Length of output: 777


@junhaoliao Yes, the updated code makes perfect sense! Removing pre=True is exactly the right approach here. Here's why this implementation is much better:

What the updated code achieves:

  • Without pre=True, Pydantic first converts the input to the field's declared type (Optional[pathlib.Path])
  • Your validator then receives a pathlib.Path object (or None)
  • You can call value.expanduser() directly using the pathlib method

Why this is cleaner:

  • Type consistency: The validator receives and returns Optional[pathlib.Path]
  • Proper method usage: Uses pathlib.Path.expanduser() instead of os.path.expanduser()
  • Simpler logic: No need to handle mixed input types since Pydantic handles conversion first
  • No edge case handling: Pydantic's conversion from string to Path is robust

Flow is now clean:

  1. Raw input (e.g., "~/.aws") → Pydantic converts to pathlib.Path("~/.aws")
  2. Your validator expands the tilde → pathlib.Path("/home/user/.aws")
  3. Result is a proper pathlib.Path object

This addresses all the concerns from my previous suggestions about handling different input types and avoiding accidental current working directory usage. The implementation is now both simpler and more robust. Well done! 👍

@junhaoliao junhaoliao requested review from haiqi96 and removed request for haiqi96 August 27, 2025 20:03
Comment thread components/clp-py-utils/clp_py_utils/clp_config.py Outdated
Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
@junhaoliao junhaoliao requested a review from haiqi96 August 27, 2025 22:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/clp-py-utils/clp_py_utils/clp_config.py (2)

814-819: Method name: keep as-is for consistency, or explicitly resolve prior suggestion

Earlier thread proposed renaming to expand_profile_user_home; current validate_aws_config_directory matches this file’s validator naming pattern. I’d keep it, or if you want a non-“validate” name, consider expand_aws_config_directory (clearer than “profile”). Please resolve with the reviewer to close the thread.


814-819: Normalise blank inputs to None to avoid accidental cwd (“.”) passing exists()

If users provide an empty string, Pydantic coerces it to Path("."), which will make validate_aws_config_dir’s exists() check succeed unintentionally. Add a small pre-validator to treat empty/whitespace-only strings as unset.

Apply this diff near the same block:

+    @validator("aws_config_directory", pre=True)
+    def _normalise_aws_config_directory(cls, value):
+        if value is None:
+            return None
+        if isinstance(value, str) and value.strip() == "":
+            return None
+        return value
@@
     @validator("aws_config_directory")
     def validate_aws_config_directory(cls, value: Optional[pathlib.Path]):
         if value is not None:
             value = value.expanduser()
         return value
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cba8a38 and b64f7b5.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

814-819: Tilde expansion is correct and type-safe

Expanding via Path.expanduser post-parsing keeps the field as Path or None and aligns with the rest of the model validators. Looks good.

Comment on lines +814 to +819
@validator("aws_config_directory")
def validate_aws_config_directory(cls, value: Optional[pathlib.Path]):
if value is not None:
value = value.expanduser()
return value

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add focused tests for expansion and edge cases

Please add tests to cover:

  • "~/.aws" (string) expands to Path.home()/".aws"
  • Path("~/.aws") expands identically
  • None remains None
  • Empty string (e.g., " ") normalises to None (with the suggested pre-validator)

I can provide a pytest scaffold if helpful.

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 814 to 819,
add focused pytest tests and a small pre-validator to normalize empty/whitespace
strings to None so the existing aws_config_directory validator only needs to
expand paths; create tests that assert: the string "~/.aws" expands to
Path.home()/".aws"; Path("~/.aws") expands identically; None remains None; and
an empty/whitespace string like "  " is normalized to None before expansion.
Ensure tests call the model/validator in the same way production code does and
cover both str and pathlib.Path inputs.

_os_release_file_path: pathlib.Path = PrivateAttr(default=OS_RELEASE_FILE_PATH)

@validator("aws_config_directory")
def validate_aws_config_directory(cls, value: Optional[pathlib.Path]):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as we agreed a while ago, can we rename it to expand_profile_user_home ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might to good to update the exception message at

raise ValueError("aws_config_directory does not exist")
to include the resolved path for ease of debugging

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

fix(clp-package): Expand `~` in `aws_config_directory` config (fixes #1257).

Comment thread components/clp-py-utils/clp_py_utils/clp_config.py Outdated
@junhaoliao junhaoliao changed the title feat(CLPConfig): Add validator to expand user home directory in aws_config_directory (fixes #1257). fix(clp-package): Expand ~ in aws_config_directory config (fixes #1257). Sep 27, 2025
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@junhaoliao junhaoliao merged commit 0b05346 into y-scope:main Sep 27, 2025
19 checks passed
@junhaoliao junhaoliao deleted the resolve-user-home-aws branch September 27, 2025 04:34
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…-scope#1257). (y-scope#1258)

Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants