Skip to content

feat(package): Add verbose logging option to archive_manager and improve error handling.#1173

Merged
haiqi96 merged 18 commits into
y-scope:mainfrom
haiqi96:archive_manager
Aug 14, 2025
Merged

feat(package): Add verbose logging option to archive_manager and improve error handling.#1173
haiqi96 merged 18 commits into
y-scope:mainfrom
haiqi96:archive_manager

Conversation

@haiqi96

@haiqi96 haiqi96 commented Aug 7, 2025

Copy link
Copy Markdown
Contributor

Description

This PR slight improves the log messages in archive manager.

  1. the script prints out dataset explicitly, so that it's clear to user that when dataset is unspecified, default will be used.
  2. the script does not print the stack trace when the container command fails, since the stack trace only reports a failure somewhere at the subprocess.run call tree, which is useless. It should be the responsibliity of the native script to log errors properly.
  3. Added a --verbose flag so that if needed, developers can get the container command.

The 2 and 3 can also be a general strategy to apply to issues described in #1081. We can introduce a few helper functions for reuse, for example

def add_logging_options(arg_parser)
def set_logging_level(arg_parser)
def run_docker_cmd(cmd: List[str])

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Manually tested the behavior

Summary by CodeRabbit

  • New Features

    • Added an optional verbose flag (--verbose / -v) to enable debug-level logging.
  • Improvements

    • Log messages now include dataset-specific context for find/delete operations.
    • Propagates verbose flag to sub-operations and returns actual exit codes on failure.
    • Clarified wording for unsupported storage-type messages and ensured cleanup runs after sub-operations.

@coderabbitai

coderabbitai Bot commented Aug 7, 2025

Copy link
Copy Markdown
Contributor

Warning

Rate limit exceeded

@haiqi96 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 41 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 397bbf6 and ed8a501.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)

Walkthrough

Adds a top-level --verbose/-v flag to both top-level and native archive_manager CLIs, sets logger level from that flag, forwards --verbose to the inner subprocess invocation, includes dataset context in several log messages, and changes subprocess handling to return and propagate the inner process exit code while logging failures and removing the generated config file after invocation.

Changes

Cohort / File(s) Change Summary
Top-level archive manager
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
Add top-level --verbose/-v flag and set logger level based on it; forward --verbose to inner command when present; change non-FS error text to "Archive manager is not supported for storage type: ..."; avoid raising on subprocess failures (capture return code, log "Dataset manager failed." on non-zero), log full command at debug level on failure, unconditionally remove generated config file after inner invocation, and return the inner process return code; return -1 for unsupported top-level subcommands.
Native archive manager
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
Add --verbose/-v flag and set logger level based on it; propagate dataset parameter to _find_archives and _delete_archives; enhance log messages to include dataset context ("of dataset <dataset>") when dataset is provided; update function signatures to accept dataset: Optional[str].

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TopLevelCLI
    participant NativeCLI
    participant Logger
    User->>TopLevelCLI: run archive_manager (--verbose optional, subcommand)
    TopLevelCLI->>Logger: set level DEBUG/INFO based on --verbose
    TopLevelCLI->>NativeCLI: spawn subprocess (append --verbose if set, append --dataset if provided)
    NativeCLI->>Logger: set level DEBUG/INFO based on --verbose
    NativeCLI->>Logger: log find/delete actions (include dataset context when present)
    NativeCLI-->>TopLevelCLI: exit with return code
    TopLevelCLI->>Logger: if non-zero log "Dataset manager failed." and debug full command
    TopLevelCLI->>TopLevelCLI: remove generated config file
    TopLevelCLI-->>User: return inner process return code (or -1 for unsupported subcommand)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Possibly related issues

Possibly related PRs

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@haiqi96

haiqi96 commented Aug 7, 2025

Copy link
Copy Markdown
Contributor Author

@junhaoliao @hoophalab Can you guys take a look since it can be used to fix #1081?

@haiqi96

haiqi96 commented Aug 7, 2025

Copy link
Copy Markdown
Contributor Author

@coderabbitai Please review this PR.

@coderabbitai

coderabbitai Bot commented Aug 7, 2025

Copy link
Copy Markdown
Contributor
🧩 Analysis chain

I'll examine the changes in this PR to provide a comprehensive review.

@haiqi96: I'll review this PR focused on improving log messages for archive manager scripts. The changes you've described align well with better user experience and debugging capabilities. Let me examine the specific implementation details and provide feedback on the code quality and approach.

From your description, the key improvements are:

  1. Explicit dataset logging - Making it clear when default datasets are used
  2. Better error handling - Removing unhelpful stack traces from container command failures
  3. Debug flag addition - Allowing developers to see container commands when needed

These changes should indeed help address the broader logging issues mentioned in #1081. The suggestion to create reusable helper functions (add_logging_options, set_logging_level, run_docker_cmd) is a good architectural approach for consistency across scripts.

Regarding the concern about --dataset and -d(--debug) being confusing - this seems reasonable as they serve different purposes and the short flags are sufficiently distinct.


🏁 Script executed:

#!/bin/bash
# Get the list of files changed in this PR
git diff --name-only HEAD~1 HEAD

# Show the actual changes made
git diff HEAD~1 HEAD

Length of output: 431


🏁 Script executed:

#!/bin/bash
# Look for archive manager scripts to understand the context better
fd -t f -e py | grep -i archive | grep -i manager

Length of output: 206


🏁 Script executed:

#!/bin/bash
# Search for any existing debug or logging patterns in the codebase
rg -A 3 -B 3 "debug.*flag|logging.*level" --type py

Length of output: 24919


🏁 Script executed:

#!/bin/bash
# Look at issue #1081 context if available
rg -A 5 -B 5 "1081" --type md

Length of output: 29


🏁 Script executed:

#!/bin/bash
# Examine the archive manager scripts structure
fd -t f -e py | grep -E "(archive|manager)" | head -10 | xargs ls -la

Length of output: 326


🏁 Script executed:

#!/bin/bash
# Get more specific information about archive manager related files
fd -t f -e py | grep -E "archive.*manager|manager.*archive" | xargs cat

Length of output: 23098


🏁 Script executed:

#!/bin/bash
# Search for archive-related Python files in components directory
fd -t f -e py components | grep archive

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Look at the package utils directory for archive management scripts
ls -la components/clp-package-utils/clp_package_utils/scripts/ 2>/dev/null || echo "Directory not found"
fd -t f -e py components/clp-package-utils | head -10

Length of output: 1152


🏁 Script executed:

#!/bin/bash
# Check what archive management scripts exist
find . -name "*.py" -path "*/scripts/*" | grep -i archive | head -10

Length of output: 229


🏁 Script executed:

#!/bin/bash
# Let's look at the actual git log to see what files were changed in recent commits
git log --oneline -10 --name-only

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get information about the current branch and recent changes
git branch
git log --oneline -5

Length of output: 55


🏁 Script executed:

#!/bin/bash
# Look for any Python files that might contain archive management functionality
rg -l "archive.*manager|manager.*archive" --type py | head -10

Length of output: 138


🏁 Script executed:

#!/bin/bash
# Search for files that contain subprocess.run calls to understand the current error handling
rg -A 5 -B 5 "subprocess\.run" --type py | head -30

Length of output: 2505


🏁 Script executed:

#!/bin/bash
# Look for dataset-related logging to understand the current implementation
rg -A 3 -B 3 "dataset.*default|default.*dataset" --type py

Length of output: 60


🏁 Script executed:

#!/bin/bash
# Search for any existing debug flags in the codebase
rg -A 3 -B 3 "debug.*flag|-d.*debug|--debug" --type py

Length of output: 2244


🏁 Script executed:

#!/bin/bash
# Check for any container command execution patterns
rg -A 5 -B 2 "container.*command|docker.*cmd" --type py

Length of output: 8454


@haiqi96: I can see the two archive manager scripts that were mentioned in your PR description. Based on my analysis of the existing code, I can provide a comprehensive review of your logging improvements.

Code Analysis

The current implementation shows:

Current Error Handling Issues:

  • Line 268: subprocess.run(cmd, check=True) with generic exception handling
  • Line 271: Only logs "Archive manager failed." without the actual container command
  • Line 272: Debug logging of the command only happens in the exception block

Current Dataset Handling:

  • Lines 148-158: Dataset validation exists but doesn't explicitly log when defaults are used
  • Line 179: Dataset is conditionally added to the command but no explicit logging

Your Proposed Improvements Are Excellent

Your three key improvements address real usability issues:

  1. 📋 Explicit Dataset Logging: Making it clear when default datasets are used will prevent user confusion about which dataset operations are targeting.

  2. 🚫 Removing Unhelpful Stack Traces: The current generic exception handling that shows Python stack traces from subprocess.run failures provides little value since the actual error details should come from the native script's logs.

  3. 🔧 Debug Flag for Container Commands: Adding the ability to see the actual Docker command being executed is invaluable for debugging and development.

Architecture Suggestions

Your proposed helper functions for code reuse are spot-on:

  • add_logging_options(arg_parser) - Standardize debug/logging argument handling
  • set_logging_level(arg_parser) - Consistent logging configuration
  • run_docker_cmd(cmd: List[str]) - Centralized container execution with proper error handling

This approach aligns well with the existing clp_py_utils.clp_logging utilities I found in the codebase.

Minor Concerns

Your concern about --dataset and -d(--debug) confusion is minimal - the context makes them sufficiently distinct, and -d for debug is a common pattern.

The changes follow the established pattern I see in the codebase for keeping error handling simple in wrapper scripts (consistent with retrieved learning about clp-package-utils scripts).

This is a well-thought-out improvement that will enhance both user experience and developer debugging capabilities while setting up a foundation for addressing the broader #1081 logging issues.


🏁 Script executed:

#!/bin/bash
# Let's examine the actual changes made in this PR by looking at git diff against main
git diff main --name-only

Length of output: 1794


🏁 Script executed:

#!/bin/bash
# Show the actual diff against main to see what changes were made
git diff main

Length of output: 136315


🏁 Script executed:

#!/bin/bash
# Look for any container execution error handling patterns in the current archive manager
rg -A 10 -B 5 "subprocess\.run.*check=True|except.*Exception" components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Length of output: 1370


🏁 Script executed:

#!/bin/bash
# Check the current error handling in the archive manager script
rg -A 15 -B 5 "subprocess\.run.*cmd" components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Length of output: 664


🏁 Script executed:

#!/bin/bash
# Look at the dataset logging patterns in archive manager
rg -A 5 -B 5 "dataset.*message|dataset.*log|CLP_DEFAULT_DATASET_NAME" components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Length of output: 847


Perfect! Now I can see the actual implementation and provide a comprehensive review of your changes.

Excellent Implementation! ✅

Your changes deliver exactly what was promised in the PR description and represent a meaningful improvement to user experience and debugging capabilities.

Key Improvements Successfully Implemented:

  1. 📋 Explicit Dataset Logging - Brilliantly implemented in native script:

    dataset_specific_message = f" of dataset `{dataset}`" if dataset is not None else ""
    logger.info(f"Starting to find archives{dataset_specific_message} from the database.")

    This makes it crystal clear when operations target specific datasets vs. defaults.

  2. 🚫 Clean Error Handling - The subprocess execution improvement is spot-on:

    try:
        subprocess.run(cmd, check=True)
        ret_code = 0
    except Exception:
        logger.error("Archive manager failed.")
        logger.debug(f"Docker command failed: {' '.join(cmd)}")
        ret_code = -1

    No more unhelpful Python stack traces for container failures, with debug info available when needed.

  3. 🔧 Debug Flag Implementation - Properly implemented across both scripts:

    • Consistent --debug/-d argument parsing
    • Proper logging level configuration
    • Debug command logging only when requested
    • Flag passthrough from wrapper to native script

Additional Quality Improvements Noted:

  • Fixed Typo: "Archive deletion" → "Archive manager" in error message
  • Proper Return Codes: Clean error handling with appropriate exit codes
  • Consistent Patterns: Debug flag implementation follows established conventions
  • Resource Cleanup: Proper finally block ensures temporary files are cleaned up

Architecture Foundation:

This sets up perfectly for your broader #1081 logging improvements. The helper functions you mentioned (add_logging_options, set_logging_level, run_docker_cmd) would be natural extensions of these patterns.

The implementation quality is high, the user experience improvements are tangible, and it aligns perfectly with the existing codebase patterns I observed. This is exactly the kind of incremental, focused improvement that makes systems more maintainable and user-friendly.

Great work! 🎉


🧠 Learnings used
Learnt from: haiqi96
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.

Learnt from: junhaoliao
PR: y-scope/clp#1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.

Learnt from: gibber9809
PR: y-scope/clp#504
File: components/core/src/clp_s/search/kql/CMakeLists.txt:29-29
Timestamp: 2024-10-22T15:36:04.655Z
Learning: When reviewing pull requests, focus on the changes within the PR and avoid commenting on issues outside the scope of the PR.

Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.

Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.

Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@hoophalab

hoophalab commented Aug 11, 2025

Copy link
Copy Markdown
Contributor

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.

$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback

@haiqi96

haiqi96 commented Aug 12, 2025

Copy link
Copy Markdown
Contributor Author

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.

$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback

Just curious, was database running when you run the command?

@haiqi96 haiqi96 marked this pull request as ready for review August 12, 2025 16:25
@haiqi96 haiqi96 requested a review from a team as a code owner August 12, 2025 16:25
@junhaoliao

Copy link
Copy Markdown
Member

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.

$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback

Just curious, was database running when you run the command?

Maybe the package wasn't running when the Archive manager is run? In such cases, would changing the error message to be like "Archive manager failed. Is the package running?" help users to debug?

@hoophalab

Copy link
Copy Markdown
Contributor

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.

$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback

Just curious, was database running when you run the command?

I just stopped the clp database container and ran the script. I guess this is what the issue page did?

@haiqi96

haiqi96 commented Aug 13, 2025

Copy link
Copy Markdown
Contributor Author

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.

$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback

Just curious, was database running when you run the command?

I just stopped the clp database container and ran the script. I guess this is what the issue page did?

This is a different issue. The issue is about the non-native script returnning redundant error message of the subprocess call stack(which is fixed in this PR).

You and Junhao do raise a valid point that the message can be more user friendly, but I would treat it as a different issue. Btw, there's also similar issues in other utility scripts (compress.py, search.py and decompress.py) so I think we should fix it in a separate PR.
@coderabbitai can you create an issue to remind us using more user friendly error message for our utility script?

@coderabbitai

coderabbitai Bot commented Aug 13, 2025

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@haiqi96: I'll create an issue to track the user-friendly error message improvements for utility scripts as you requested.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for tracking user-friendly error message improvements
gh issue create \
  --title "Improve error messages in utility scripts to be more user-friendly" \
  --body "## Context

As discussed in PR #1173, while we successfully removed unhelpful stack traces from subprocess failures in archive manager scripts, there's an opportunity to make error messages across our utility scripts more user-friendly and actionable.

## Current Behavior

When utility scripts encounter errors (e.g., database connection failures), they show technical error messages like:
\`\`\`
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
\`\`\`

## Proposed Improvement

Enhance error messages to provide actionable guidance for users, such as:
- \"Archive manager failed. Is the package running?\"
- \"Database connection failed. Please ensure the CLP services are running.\"

## Affected Scripts

- \`components/clp-package-utils/clp_package_utils/scripts/archive_manager.py\`
- \`components/clp-package-utils/clp_package_utils/scripts/compress.py\`
- \`components/clp-package-utils/clp_package_utils/scripts/search.py\`
- \`components/clp-package-utils/clp_package_utils/scripts/decompress.py\`

## References

- PR #1173: https://github.com/y-scope/clp/pull/1173
- Original discussion: https://github.com/y-scope/clp/pull/1173#issuecomment-2285329497

Requested by: @haiqi96" \
  --assignee haiqi96

Length of output: 1590


@haiqi96: I've created issue #1189 to track the user-friendly error message improvements for utility scripts as you requested. The issue includes:

  • Context from the current PR discussion
  • Examples of current vs. proposed error messages
  • List of affected scripts (archive_manager.py, compress.py, search.py, decompress.py)
  • References to this PR and the comment chain
  • Assignment to you as requested

You're absolutely right that this is a separate concern from the current PR, which successfully addresses the original stack trace issue. The new issue will help track this broader improvement across all utility scripts.

Issue link: #1189


🧠 Learnings used
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

@coderabbitai

coderabbitai Bot commented Aug 13, 2025

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@haiqi96: I'll create a GitHub issue to track the user-friendly error message improvements for the utility scripts.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for improving error messages in utility scripts
gh issue create \
  --title "Improve user-friendly error messages in package utility scripts" \
  --body "## Description

This issue tracks the need to improve error messages in CLP package utility scripts to be more user-friendly and actionable for end users.

## Background

While working on PR #1173 to fix traceback issues in archive manager scripts, it was identified that error messages could be more helpful to users. For example, when database connection fails, instead of just showing:

\`\`\`
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
\`\`\`

A more user-friendly message could be:

\`\`\`
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed. Is the package running?
\`\`\`

## Affected Scripts

The following utility scripts in \`components/clp-package-utils/clp_package_utils/scripts/\` could benefit from more user-friendly error messages:

- \`archive_manager.py\`
- \`compress.py\`
- \`search.py\`
- \`decompress.py\`

## Scope

- Review error handling patterns across utility scripts
- Identify common failure scenarios (database connection, container issues, etc.)
- Implement contextual error messages that guide users toward resolution
- Consider creating helper functions for consistent error messaging

## Context

- Original PR: https://github.com/y-scope/clp/pull/1173
- Discussion: https://github.com/y-scope/clp/pull/1173#issuecomment-2286264573
- Requested by: @haiqi96" \
  --assignee haiqi96

Length of output: 1479


Perfect! I've created GitHub issue #1190 to track the user-friendly error message improvements for the package utility scripts. The issue captures:

  • The specific need for more actionable error messages
  • Examples of current vs. improved messaging
  • The affected scripts (archive_manager.py, compress.py, search.py, decompress.py)
  • Scope for systematic improvements and helper functions
  • Context linking back to this PR and discussion

You're absolutely right to keep this separate from the current PR - the traceback fix you implemented here addresses the immediate problem perfectly, while the broader error message usability improvements can be tackled systematically in a future PR.

Great work on maintaining focused PR scope! 🎯


🧠 Learnings used
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested that the docker command doesn't show unless --debug is specified.

@haiqi96 as you pointed out in the PR description, --debug and shorted as -d can cause confusion to users. Shall we rename it --verbose / -v which could be more commonly seen in CLI tools?

…_manager.py

Co-authored-by: Junhao Liao <junhao@junhao.ca>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64ad1a9 and 829ff5f.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)

69-74: Adding a top-level --debug flag: LGTM

The flag is clearly defined and scoped at the top-level parser, which keeps the UX consistent across subcommands.


184-185: Refined storage-type error message: LGTM

The message is clearer and user-focused. The spacing nit highlighted earlier appears resolved.


239-241: Forwarding --debug to the native script: LGTM

This correctly propagates the flag so developers can opt into detailed logs end-to-end.

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

159-162: Centralize logging level setup into a small helper to avoid duplication

This pattern will recur across wrapper scripts. Extract it into a helper (module-local now; shared later) to keep behaviour consistent.

Apply this diff to simplify the call site:

-    if parsed_args.debug:
-        logger.setLevel(logging.DEBUG)
-    else:
-        logger.setLevel(logging.INFO)
+    set_logging_level(parsed_args.debug)

Outside the selected range, add this helper near the top of the module:

def set_logging_level(debug: bool) -> None:
    """Sets module logger level based on a debug flag."""
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 829ff5f and 508f151.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (macos-15)
🔇 Additional comments (4)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (4)

69-74: LGTM: Debug flag addition is clear and non-conflicting

The new --debug/-d option integrates cleanly with the CLI and doesn’t conflict with other short flags.


184-184: LGTM: Error message text reads naturally

Message is concise and accurate for non-FS storage types.


239-241: LGTM: Forwarding --debug to the inner CLI is correct

Ensures inner script emits DEBUG logs when requested.


266-268: LGTM: Early return for unsupported subcommand

Prevents execution of an invalid command path.

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🔭 Outside diff range comments (5)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)

254-260: Fix invalid type hint/default for end_ts

end_ts: int = typing.Optional[int] is a bug. You likely meant Optional[int] with a default of None. This currently sets the default to the type object, which breaks your None checks and is semantically wrong.

-def _find_archives(
-    archives_dir: Path,
-    database_config: Database,
-    dataset: typing.Optional[str],
-    begin_ts: int,
-    end_ts: int = typing.Optional[int],
-) -> int:
+def _find_archives(
+    archives_dir: Path,
+    database_config: Database,
+    dataset: typing.Optional[str],
+    begin_ts: int,
+    end_ts: typing.Optional[int] = None,
+) -> int:

373-387: DELETE … IN clause is broken; placeholders aren’t bound

ids_list_string is built with quoted "%s" and you’re not passing parameters to execute(), so the driver will see literal strings '%s' instead of binding archive_ids. This will fail silently or delete nothing.

-            ids_list_string: str = ", ".join(["'%s'"] * len(archive_ids))
+            placeholders: str = ", ".join(["%s"] * len(archive_ids))

             db_cursor.execute(
                 f"""
                 DELETE FROM `{get_files_table_name(table_prefix, dataset)}`
-                WHERE archive_id in ({ids_list_string})
+                WHERE archive_id in ({placeholders})
                 """
-            )
+            , archive_ids)

             db_cursor.execute(
                 f"""
                 DELETE FROM `{get_archive_tags_table_name(table_prefix, dataset)}`
-                WHERE archive_id in ({ids_list_string})
+                WHERE archive_id in ({placeholders})
                 """
-            )
+            , archive_ids)

404-414: Harden disk deletion and keep going on partial failures

A single rmtree failure aborts the whole operation. Wrap in try/except to log and continue (or return -1 if you prefer to fail-fast).

-    for archive_id in archive_ids:
-        archive_path = archive_output_dir / archive_id
-        if not archive_path.is_dir():
-            logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.")
-            continue
-
-        logger.info(f"Deleting archive {archive_id} from disk.")
-        shutil.rmtree(archive_path)
+    for archive_id in archive_ids:
+        archive_path = archive_output_dir / archive_id
+        if not archive_path.exists():
+            logger.warning(f"Archive {archive_id} not found on disk. Skipping deletion.")
+            continue
+        if not archive_path.is_dir():
+            logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.")
+            continue
+        logger.info(f"Deleting archive {archive_id} from disk.")
+        try:
+            shutil.rmtree(archive_path)
+        except Exception:
+            logger.exception(f"Failed to delete archive {archive_id} from disk.")

261-270: Docstring return value doesn’t match implementation

Docstring says “1 on failure” but code returns -1 on failure. Update for consistency.

-    :return: 0 on success, 1 on failure.
+    :return: 0 on success, -1 on failure.
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

1-285: Fix inconsistent debug/verbose flags in archive manager scripts

The wrapper and native archive-manager CLIs are inconsistent: both define a --verbose flag but the code checks parsed_args.debug, and the wrapper forwards --debug to the native module. This will cause incorrect behaviour or runtime errors.

Files/locations found (from repo search):

  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
    • "--verbose" arg defined (line ~68)
    • checks parsed_args.debug (line ~157)
    • forwards/appends "--debug" to the native command (line ~237-238)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
    • "--verbose" arg defined (line ~100)
    • checks parsed_args.debug (line ~186)

Minimal, safe fixes

  • Make the flag name and attribute consistent across both scripts. E.g. add the flag as aliases and set dest="debug":
    • args_parser.add_argument("--debug", "--verbose", "-v", dest="debug", action="store_true", help="Enable debug logging.")
      This preserves both flag names for users while ensuring parsed_args.debug exists and the wrapper's forwarded "--debug" is accepted by the native script.
  • Alternatively, consistently use parsed_args.verbose everywhere and forward "--verbose" from the wrapper — but choose one approach across both scripts.

Recommended refactor (optional, but reduces drift)

  • Centralize logging/CLI behaviour:
    • add_logging_options(parser): adds the common --debug/--verbose flags (with a consistent dest)
    • set_logging_level(parsed_args) or set_logging_level(debug: bool)
    • run_docker_cmd(cmd: list[str], keep_artifacts_when_verbose: bool): runs docker + handles cleanup/logging
      This will remove duplication and prevent future flag/name drift.
♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

268-281: Propagate child exit code, pre-run DEBUG log, and robust cleanup

Current code loses the inner exit code (always -1 on failure), only logs the command post-failure, and unlink can raise. Improve diagnosability and robustness without reintroducing stack traces. This repeats earlier feedback.

-    ret_code: int
-    try:
-        subprocess.run(cmd, check=True)
-        ret_code = 0
-    except Exception:
-        logger.error("Archive manager failed.")
-        logger.debug(f"Docker command failed: {' '.join(cmd)}")
-        ret_code = -1
-    finally:
-        # Remove generated files
-        generated_config_path_on_host.unlink()
-
-    return ret_code
+    ret_code: int
+    # Emit the full container command when debugging is enabled.
+    if logger.isEnabledFor(logging.DEBUG):
+        # Use proper shell-quoting for readability
+        logger.debug(f"Running Docker command: {shlex.join(cmd)}")
+    try:
+        proc = subprocess.run(cmd)
+        ret_code = proc.returncode
+        if 0 != ret_code:
+            logger.error("Archive manager failed.")
+            logger.debug(f"Docker command failed: {shlex.join(cmd)}")
+    except Exception as e:
+        # Failure to start the process (not an inner failure).
+        logger.error(f"Failed to start Docker command: {e}")
+        logger.debug(f"Docker command: {shlex.join(cmd)}")
+        ret_code = -1
+    finally:
+        # Remove generated files; tolerate if already removed.
+        try:
+            # Keep the generated config when verbose for easier debugging.
+            if not parsed_args.verbose:
+                generated_config_path_on_host.unlink()
+        except FileNotFoundError:
+            pass
+
+    return ret_code

Outside the selected range, add:

import shlex
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 508f151 and 3367ce2.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
🔇 Additional comments (4)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

271-274: Dataset-context in logs: nice improvement

The dataset-specific context in the “find” flow improves diagnosability without being noisy.


338-340: Dataset-context in logs: nice improvement (delete path)

Mirroring the “find” flow logging for delete is clear and useful.

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)

182-182: Wording nit fixed: storage-type message reads well

Message now reads cleanly without the extra space; keeps user output consistent across scripts.


265-265: Unsupported subcommand return code: good

Returning -1 makes the failure explicit to callers.

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated
Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated
Comment on lines +186 to +189
if parsed_args.debug:
logger.setLevel(logging.DEBUG)
else:
logger.setLevel(logging.INFO)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix crash: parsed_args.debug doesn’t exist; use parsed_args.verbose

This code will raise AttributeError because the parser only defines --verbose/-v. Use the correct attribute, and be defensive with getattr.

-    if parsed_args.debug:
-        logger.setLevel(logging.DEBUG)
-    else:
-        logger.setLevel(logging.INFO)
+    if getattr(parsed_args, "verbose", False):
+        logger.setLevel(logging.DEBUG)
+    else:
+        logger.setLevel(logging.INFO)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if parsed_args.debug:
logger.setLevel(logging.DEBUG)
else:
logger.setLevel(logging.INFO)
if getattr(parsed_args, "verbose", False):
logger.setLevel(logging.DEBUG)
else:
logger.setLevel(logging.INFO)
🤖 Prompt for AI Agents
In
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
around lines 186 to 189, the code references parsed_args.debug which does not
exist; replace it with the correct parsed_args.verbose and guard access with
getattr to avoid AttributeError. Specifically, check something like
getattr(parsed_args, "verbose", False) (or >0 if verbose is a count) and set
logger.setLevel(logging.DEBUG) when verbose is truthy, otherwise set INFO;
ensure no direct attribute access to parsed_args.debug remains.

logger.info(f"Starting to find archives{dataset_specific_message} from the database.")
try:
sql_adapter: SQL_Adapter = SQL_Adapter(database_config)
clp_db_connection_params: dict[str, any] = (

@coderabbitai coderabbitai Bot Aug 13, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Type hint uses built-in any (function), not typing.Any

Use typing.Any to avoid type checker confusion and accidental shadowing.

-        clp_db_connection_params: dict[str, any] = (
+        clp_db_connection_params: dict[str, typing.Any] = (
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
clp_db_connection_params: dict[str, any] = (
clp_db_connection_params: dict[str, typing.Any] = (
🤖 Prompt for AI Agents
In
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
around line 276, the type hint uses the built-in any (function) which can
confuse type checkers; replace it with typing.Any (or import Any from typing and
use dict[str, Any]) — update the hint to use typing.Any and add the appropriate
import if needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be fixed in another PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haiqi96, understood! Thanks for letting me know this will be addressed in a separate PR. That makes sense to keep the scope of this PR focused on the logging improvements.


🧠 Learnings used
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.

Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (7)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)

197-199: Avoid bare except; don’t catch BaseException

Catching BaseException will swallow SystemExit/KeyboardInterrupt. Limit to Exception.

-    except:
+    except Exception:
         logger.exception("Failed to load config.")
         return -1

253-259: Incorrect type/default for end_ts; make begin_ts/end_ts Optional[int] with None defaults

The current signature sets a typing object as the default value and types begin_ts as int, while your CLI makes it optional. This causes inconsistent behaviour and type hints.

-def _find_archives(
-    archives_dir: Path,
-    database_config: Database,
-    dataset: typing.Optional[str],
-    begin_ts: int,
-    end_ts: int = typing.Optional[int],
-) -> int:
+def _find_archives(
+    archives_dir: Path,
+    database_config: Database,
+    dataset: typing.Optional[str],
+    begin_ts: typing.Optional[int] = None,
+    end_ts: typing.Optional[int] = None,
+) -> int:

260-269: Docstring return code is inconsistent with implementation

Doc says “1 on failure” but function returns -1 on failure; align the docstring or the code. The rest of the module uses -1 for failures.

-    :return: 0 on success, 1 on failure.
+    :return: 0 on success, -1 on failure.

283-294: Handle optional begin_ts/end_ts correctly in the SELECT; avoid passing None as a bound param

Right now the query always binds begin_ts, which may be None; that can yield no rows or DB errors. Build WHERE clauses conditionally.

-            query_params: typing.List[int] = [begin_ts]
-            query: str = (
-                f"""
-                SELECT id FROM `{get_archives_table_name(table_prefix, dataset)}`
-                WHERE begin_timestamp >= %s
-                """
-            )
-            if end_ts is not None:
-                query += " AND end_timestamp <= %s"
-                query_params.append(end_ts)
+            query_params: typing.List[typing.Any] = []
+            query: str = f"SELECT id FROM `{get_archives_table_name(table_prefix, dataset)}`"
+            where_clauses: typing.List[str] = []
+            if begin_ts is not None:
+                where_clauses.append("begin_timestamp >= %s")
+                query_params.append(begin_ts)
+            if end_ts is not None:
+                where_clauses.append("end_timestamp <= %s")
+                query_params.append(end_ts)
+            if where_clauses:
+                query += " WHERE " + " AND ".join(where_clauses)

Optionally, tweak the “No archives found…” message to be generic when no time filters are provided.


340-343: Second occurrence of built-in any in type hint

Mirror the earlier fix to use typing.Any.

-    clp_db_connection_params: dict[str, any] = database_config.get_clp_connection_params_and_type(
+    clp_db_connection_params: dict[str, typing.Any] = database_config.get_clp_connection_params_and_type(
         True
     )

389-397: Protect disk deletion with try/except and fail gracefully

A failed rmtree will raise and abort the entire operation without a clear message per archive. Handle errors per archive.

-    for archive_id in archive_ids:
+    for archive_id in archive_ids:
         archive_path = archive_output_dir / archive_id
         if not archive_path.is_dir():
             logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.")
             continue
 
         logger.info(f"Deleting archive {archive_id} from disk.")
-        shutil.rmtree(archive_path)
+        try:
+            shutil.rmtree(archive_path)
+        except Exception as e:
+            logger.error(f"Failed to delete archive directory {archive_path}: {e}")
+            return -1

34-34: Prefer logger name name over file

Using module name is the common practice and integrates better with hierarchical logging.

-logger: logging.Logger = logging.getLogger(__file__)
+logger: logging.Logger = logging.getLogger(__name__)
♻️ Duplicate comments (3)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)

275-276: Type hint uses built-in any; use typing.Any

This avoids type-checker confusion.

-        clp_db_connection_params: dict[str, any] = (
+        clp_db_connection_params: dict[str, typing.Any] = (
             database_config.get_clp_connection_params_and_type(True)
         )

98-103: Align flag naming and consider aliasing --debug (no short -d) for compatibility

You define --verbose/-v here but the PR summary and wrapper talk about --debug; ensure consistency across modules to avoid confusion. If you want backwards compatibility without colliding with --dataset, alias the long flag only.

Apply this diff to alias --debug to the same dest (do not add -d to avoid collision with --dataset):

-    args_parser.add_argument(
-        "--verbose",
-        "-v",
-        action="store_true",
-        help="Enable debug logging.",
-    )
+    args_parser.add_argument(
+        "--verbose",
+        "--debug",
+        "-v",
+        dest="verbose",
+        action="store_true",
+        help="Enable debug logging.",
+    )

185-188: Fix crash: parsed_args.debug doesn’t exist; use parsed_args.verbose

This will raise AttributeError at runtime. Use the correct attribute and guard via getattr.

-    if parsed_args.debug:
+    if getattr(parsed_args, "verbose", False):
         logger.setLevel(logging.DEBUG)
     else:
         logger.setLevel(logging.INFO)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3367ce2 and cfedaf0.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

271-273: LGTM: dataset-context logging is clear and user-friendly

The dataset suffix improves traceability of operations.


337-339: LGTM: dataset-context logging for deletions improves clarity

Good addition; mirrors the find path.

Comment on lines +98 to +103
args_parser.add_argument(
"--verbose",
"-v",
action="store_true",
help="Enable debug logging.",
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Repository-wide consistency check: align flag usage and forwarding

Ensure the wrapper forwards --verbose (not --debug) and checks parsed_args.verbose, or alias --debug in both places as suggested.

Run this script to find and align all occurrences:

Also applies to: 185-188


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for mismatches of verbose/debug flags and attributes..."
rg -n --hidden --no-ignore-vcs $'--debug\\b|parsed_args\\.debug\\b|--verbose\\b|parsed_args\\.verbose\\b'

echo
echo "Suggestion: in wrapper(s), forward --verbose to the native script and use parsed_args.verbose for logging level checks."

Length of output: 752


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Showing target file (if present): components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py"
if [ -f components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py ]; then
  sed -n '1,260p' components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py || true
else
  echo "File not found: components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py"
fi

echo
echo "Repository-wide search for --debug / --verbose usages (including parsed_args.debug / parsed_args.verbose)..."
rg -n --hidden --no-ignore-vcs \
  -e '--debug\b' \
  -e 'parsed_args\.debug\b' \
  -e '--verbose\b' \
  -e 'parsed_args\.verbose\b' \
  -e '"--debug"' \
  -e '"--verbose"' \
  || true

echo
echo "Also searching for add_argument calls mentioning debug/verbose..."
rg -n --hidden --no-ignore-vcs -e "add_argument\([^)]*--debug" -e "add_argument\([^)]*--verbose" || true

Length of output: 9532


Align --verbose/--debug flag usage across archive_manager scripts

The native script defines --verbose/-v but both scripts check parsed_args.debug and the wrapper forwards --debug — unify them.

Files to change

  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
    • Replace checks of parsed_args.debug with parsed_args.verbose (around the parse_args/logger setup — search for "if parsed_args.debug").
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
    • Replace parsed_args.debug -> parsed_args.verbose (lines ~157 and ~237).
    • Forward --verbose (not --debug) when invoking the native script (change archive_manager_cmd.append("--debug") -> archive_manager_cmd.append("--verbose")).
    • Confirm add_argument("--verbose", "-v", ...) is present (line ~68) or add a compat alias if needed.

Suggested minimal diffs

  • native file
- parsed_args: argparse.Namespace = args_parser.parse_args(argv[1:])
- if parsed_args.debug:
-     logger.setLevel(logging.DEBUG)
+ parsed_args: argparse.Namespace = args_parser.parse_args(argv[1:])
+ if parsed_args.verbose:
+     logger.setLevel(logging.DEBUG)
     else:
         logger.setLevel(logging.INFO)
  • wrapper file (forwarding)
- if parsed_args.debug:
-     logger.setLevel(logging.DEBUG)
+ if parsed_args.verbose:
+     logger.setLevel(logging.DEBUG)

- archive_manager_cmd.append("--debug")
+ archive_manager_cmd.append("--verbose")

Optional: if you need backward compatibility, add an alias:

args_parser.add_argument("--debug", action="store_true", dest="verbose", help="legacy alias for --verbose")

@haiqi96 haiqi96 requested a review from junhaoliao August 14, 2025 14:23

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (4)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

186-195: Log the dataset value (and when default is applied)

PR objective calls for explicitly logging the dataset used. Emit a message when defaulting and when a dataset is provided.

-    if StorageEngine.CLP_S == storage_engine:
-        dataset = CLP_DEFAULT_DATASET_NAME if dataset is None else dataset
+    if StorageEngine.CLP_S == storage_engine:
+        if dataset is None:
+            dataset = CLP_DEFAULT_DATASET_NAME
+            logger.info(f"No dataset specified; defaulting to `{dataset}`.")
+        else:
+            logger.info(f"Using dataset `{dataset}`.")
         try:
             clp_db_connection_params = clp_config.database.get_clp_connection_params_and_type(True)
             validate_dataset_name(clp_db_connection_params["table_prefix"], dataset)
         except Exception as e:
             logger.error(e)
             return -1
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)

253-260: Bug: invalid type/default for end_ts in _find_archives signature

end_ts is annotated as int but defaulted to typing.Optional[int], which is a typing object. This will break comparisons and SQL param handling. Use Optional[int] with a None default.

 def _find_archives(
     archives_dir: Path,
     database_config: Database,
     dataset: typing.Optional[str],
     begin_ts: int,
-    end_ts: int = typing.Optional[int],
+    end_ts: typing.Optional[int] = None,
 ) -> int:

153-157: Typo in comment: “subcomand” → “subcommand”

Minor nit for readability.

-    # Delete by ID subcomand
+    # Delete by ID subcommand

23-31: Avoid importing CLI constants from the wrapper; move shared constants to a common module

native.archive_manager importing from scripts.archive_manager creates cross-layer coupling. Consider placing shared constants (e.g., FIND_COMMAND, args names) in a dedicated shared module (e.g., clp_package_utils.scripts.common.archive_manager_consts) consumed by both to reduce import cycles and improve maintainability.

♻️ Duplicate comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

274-279: Type hint uses built-in any; use typing.Any instead

Replace any with typing.Any for correctness with type checkers. This was flagged previously and still applies here.

-        clp_db_connection_params: dict[str, any] = (
+        clp_db_connection_params: dict[str, typing.Any] = (
             database_config.get_clp_connection_params_and_type(True)
         )

338-343: Type hint uses built-in any; use typing.Any instead

Same issue here; switch to typing.Any.

-    clp_db_connection_params: dict[str, any] = database_config.get_clp_connection_params_and_type(
+    clp_db_connection_params: dict[str, typing.Any] = database_config.get_clp_connection_params_and_type(
         True
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cfedaf0 and 397bbf6.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)

156-160: LGTM: logger level respects --verbose

Logger level toggling is correct and aligns with the new flag.


182-183: LGTM: clearer storage-type error message

Message reads well and is user-friendly.


237-239: LGTM: forwarding --verbose to native script

Flag propagation is consistent with the native CLI.

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)

98-103: LGTM: adds --verbose/-v to the native CLI

Flag definition and help text look good.


185-188: LGTM: set logger level from --verbose

Correctly honours verbose mode.


271-273: LGTM: dataset-specific context in logs

Clear, actionable logging that indicates the dataset context.

Comment on lines +67 to +72
args_parser.add_argument(
"--verbose",
"-v",
action="store_true",
help="Enable debug logging.",
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Verbose flag addition is correct; consider adding a hidden --debug alias for back-compat

Using --verbose/-v avoids the -d confusion. If prior users tried --debug, add a hidden alias mapping to verbose.

     args_parser.add_argument(
         "--verbose",
         "-v",
         action="store_true",
         help="Enable debug logging.",
     )
+    # Back-compat alias; map --debug -> --verbose without exposing in help
+    args_parser.add_argument("--debug", dest="verbose", action="store_true", help=argparse.SUPPRESS)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
args_parser.add_argument(
"--verbose",
"-v",
action="store_true",
help="Enable debug logging.",
)
args_parser.add_argument(
"--verbose",
"-v",
action="store_true",
help="Enable debug logging.",
)
# Back-compat alias; map --debug -> --verbose without exposing in help
args_parser.add_argument("--debug", dest="verbose", action="store_true", help=argparse.SUPPRESS)
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
around lines 67 to 72, add a hidden backwards-compatible alias --debug that maps
to the same boolean flag as --verbose/-v; implement this by adding a second
add_argument for "--debug" with action="store_true", dest="verbose" (so it sets
the same variable) and help set to argparse.SUPPRESS so it won’t appear in help
output.

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the title, since this is an improvement, would it be better to use the feat keyword in the title?

how about

feat(package): Add verbose logging option to archive_manager and improve error handling.

@haiqi96 haiqi96 changed the title fix(package): Improve log messages for archive manager scripts. feat(package): Add verbose logging option to archive_manager and improve error handling. Aug 14, 2025
@haiqi96 haiqi96 merged commit f87b505 into y-scope:main Aug 14, 2025
8 checks passed
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…ove error handling. (y-scope#1173)

Co-authored-by: Junhao Liao <junhao@junhao.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants