feat(package): Add verbose logging option to archive_manager and improve error handling. by haiqi96 · Pull Request #1173 · y-scope/clp

haiqi96 · 2025-08-07T15:25:18Z

Description

This PR slight improves the log messages in archive manager.

the script prints out dataset explicitly, so that it's clear to user that when dataset is unspecified, default will be used.
the script does not print the stack trace when the container command fails, since the stack trace only reports a failure somewhere at the subprocess.run call tree, which is useless. It should be the responsibliity of the native script to log errors properly.
Added a --verbose flag so that if needed, developers can get the container command.

The 2 and 3 can also be a general strategy to apply to issues described in #1081. We can introduce a few helper functions for reuse, for example

def add_logging_options(arg_parser)
def set_logging_level(arg_parser)
def run_docker_cmd(cmd: List[str])

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Manually tested the behavior

Summary by CodeRabbit

New Features
- Added an optional verbose flag (--verbose / -v) to enable debug-level logging.
Improvements
- Log messages now include dataset-specific context for find/delete operations.
- Propagates verbose flag to sub-operations and returns actual exit codes on failure.
- Clarified wording for unsupported storage-type messages and ensured cleanup runs after sub-operations.

coderabbitai · 2025-08-07T15:25:25Z

Warning

Rate limit exceeded

@haiqi96 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 41 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 397bbf6 and ed8a501.

📒 Files selected for processing (1)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)

Walkthrough

Adds a top-level --verbose/-v flag to both top-level and native archive_manager CLIs, sets logger level from that flag, forwards --verbose to the inner subprocess invocation, includes dataset context in several log messages, and changes subprocess handling to return and propagate the inner process exit code while logging failures and removing the generated config file after invocation.

Changes

Cohort / File(s)	Change Summary
Top-level archive manager `components/clp-package-utils/clp_package_utils/scripts/archive_manager.py`	Add top-level `--verbose`/`-v` flag and set logger level based on it; forward `--verbose` to inner command when present; change non-FS error text to "Archive manager is not supported for storage type: ..."; avoid raising on subprocess failures (capture return code, log "Dataset manager failed." on non-zero), log full command at debug level on failure, unconditionally remove generated config file after inner invocation, and return the inner process return code; return `-1` for unsupported top-level subcommands.
Native archive manager `components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py`	Add `--verbose`/`-v` flag and set logger level based on it; propagate dataset parameter to `_find_archives` and `_delete_archives`; enhance log messages to include dataset context ("of dataset `<dataset>`") when dataset is provided; update function signatures to accept `dataset: Optional[str]`.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TopLevelCLI
    participant NativeCLI
    participant Logger
    User->>TopLevelCLI: run archive_manager (--verbose optional, subcommand)
    TopLevelCLI->>Logger: set level DEBUG/INFO based on --verbose
    TopLevelCLI->>NativeCLI: spawn subprocess (append --verbose if set, append --dataset if provided)
    NativeCLI->>Logger: set level DEBUG/INFO based on --verbose
    NativeCLI->>Logger: log find/delete actions (include dataset context when present)
    NativeCLI-->>TopLevelCLI: exit with return code
    TopLevelCLI->>Logger: if non-zero log "Dataset manager failed." and debug full command
    TopLevelCLI->>TopLevelCLI: remove generated config file
    TopLevelCLI-->>User: return inner process return code (or -1 for unsupported subcommand)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Possibly related issues

Improve user-friendly error messages in package utility scripts #1190 — Addresses user-facing error-message and logging improvements in archive_manager, matching the message and logging adjustments here.
Improve error messages in utility scripts to be more user-friendly #1189 — Related to subprocess return-code and logging/verbosity handling; this PR modifies subprocess invocation and failure logging accordingly.

Possibly related PRs

feat(clp-package): Refactor del_archives into archive_manager scripts for improved archive discovery and deletion. #665 — Touches the same archive_manager scripts and earlier CLI/logging/subprocess behavior; changes are directly connected to the CLI/verbosity and native invocation logic updated here.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

haiqi96 · 2025-08-07T15:26:28Z

@junhaoliao @hoophalab Can you guys take a look since it can be used to fix #1081?

…into archive_manager

haiqi96 · 2025-08-07T19:59:47Z

@coderabbitai Please review this PR.

coderabbitai · 2025-08-07T20:01:41Z

🧩 Analysis chain

I'll examine the changes in this PR to provide a comprehensive review.

@haiqi96: I'll review this PR focused on improving log messages for archive manager scripts. The changes you've described align well with better user experience and debugging capabilities. Let me examine the specific implementation details and provide feedback on the code quality and approach.

From your description, the key improvements are:

Explicit dataset logging - Making it clear when default datasets are used
Better error handling - Removing unhelpful stack traces from container command failures
Debug flag addition - Allowing developers to see container commands when needed

These changes should indeed help address the broader logging issues mentioned in #1081. The suggestion to create reusable helper functions (add_logging_options, set_logging_level, run_docker_cmd) is a good architectural approach for consistency across scripts.

Regarding the concern about --dataset and -d(--debug) being confusing - this seems reasonable as they serve different purposes and the short flags are sufficiently distinct.

🏁 Script executed:

#!/bin/bash
# Get the list of files changed in this PR
git diff --name-only HEAD~1 HEAD

# Show the actual changes made
git diff HEAD~1 HEAD

Length of output: 431

🏁 Script executed:

#!/bin/bash
# Look for archive manager scripts to understand the context better
fd -t f -e py | grep -i archive | grep -i manager

Length of output: 206

🏁 Script executed:

#!/bin/bash
# Search for any existing debug or logging patterns in the codebase
rg -A 3 -B 3 "debug.*flag|logging.*level" --type py

Length of output: 24919

🏁 Script executed:

#!/bin/bash
# Look at issue #1081 context if available
rg -A 5 -B 5 "1081" --type md

Length of output: 29

🏁 Script executed:

#!/bin/bash
# Examine the archive manager scripts structure
fd -t f -e py | grep -E "(archive|manager)" | head -10 | xargs ls -la

Length of output: 326

🏁 Script executed:

#!/bin/bash
# Get more specific information about archive manager related files
fd -t f -e py | grep -E "archive.*manager|manager.*archive" | xargs cat

Length of output: 23098

🏁 Script executed:

#!/bin/bash
# Search for archive-related Python files in components directory
fd -t f -e py components | grep archive

Length of output: 41

🏁 Script executed:

#!/bin/bash
# Look at the package utils directory for archive management scripts
ls -la components/clp-package-utils/clp_package_utils/scripts/ 2>/dev/null || echo "Directory not found"
fd -t f -e py components/clp-package-utils | head -10

Length of output: 1152

🏁 Script executed:

#!/bin/bash
# Check what archive management scripts exist
find . -name "*.py" -path "*/scripts/*" | grep -i archive | head -10

Length of output: 229

🏁 Script executed:

#!/bin/bash
# Let's look at the actual git log to see what files were changed in recent commits
git log --oneline -10 --name-only

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Get information about the current branch and recent changes
git branch
git log --oneline -5

Length of output: 55

🏁 Script executed:

#!/bin/bash
# Look for any Python files that might contain archive management functionality
rg -l "archive.*manager|manager.*archive" --type py | head -10

Length of output: 138

🏁 Script executed:

#!/bin/bash
# Search for files that contain subprocess.run calls to understand the current error handling
rg -A 5 -B 5 "subprocess\.run" --type py | head -30

Length of output: 2505

🏁 Script executed:

#!/bin/bash
# Look for dataset-related logging to understand the current implementation
rg -A 3 -B 3 "dataset.*default|default.*dataset" --type py

Length of output: 60

🏁 Script executed:

#!/bin/bash
# Search for any existing debug flags in the codebase
rg -A 3 -B 3 "debug.*flag|-d.*debug|--debug" --type py

Length of output: 2244

🏁 Script executed:

#!/bin/bash
# Check for any container command execution patterns
rg -A 5 -B 2 "container.*command|docker.*cmd" --type py

Length of output: 8454

@haiqi96: I can see the two archive manager scripts that were mentioned in your PR description. Based on my analysis of the existing code, I can provide a comprehensive review of your logging improvements.

Code Analysis

The current implementation shows:

Current Error Handling Issues:

Line 268: subprocess.run(cmd, check=True) with generic exception handling
Line 271: Only logs "Archive manager failed." without the actual container command
Line 272: Debug logging of the command only happens in the exception block

Current Dataset Handling:

Lines 148-158: Dataset validation exists but doesn't explicitly log when defaults are used
Line 179: Dataset is conditionally added to the command but no explicit logging

Your Proposed Improvements Are Excellent

Your three key improvements address real usability issues:

📋 Explicit Dataset Logging: Making it clear when default datasets are used will prevent user confusion about which dataset operations are targeting.
🚫 Removing Unhelpful Stack Traces: The current generic exception handling that shows Python stack traces from subprocess.run failures provides little value since the actual error details should come from the native script's logs.
🔧 Debug Flag for Container Commands: Adding the ability to see the actual Docker command being executed is invaluable for debugging and development.

Architecture Suggestions

Your proposed helper functions for code reuse are spot-on:

add_logging_options(arg_parser) - Standardize debug/logging argument handling
set_logging_level(arg_parser) - Consistent logging configuration
run_docker_cmd(cmd: List[str]) - Centralized container execution with proper error handling

This approach aligns well with the existing clp_py_utils.clp_logging utilities I found in the codebase.

Minor Concerns

Your concern about --dataset and -d(--debug) confusion is minimal - the context makes them sufficiently distinct, and -d for debug is a common pattern.

The changes follow the established pattern I see in the codebase for keeping error handling simple in wrapper scripts (consistent with retrieved learning about clp-package-utils scripts).

This is a well-thought-out improvement that will enhance both user experience and developer debugging capabilities while setting up a foundation for addressing the broader #1081 logging issues.

🏁 Script executed:

#!/bin/bash
# Let's examine the actual changes made in this PR by looking at git diff against main
git diff main --name-only

Length of output: 1794

🏁 Script executed:

#!/bin/bash
# Show the actual diff against main to see what changes were made
git diff main

Length of output: 136315

🏁 Script executed:

#!/bin/bash
# Look for any container execution error handling patterns in the current archive manager
rg -A 10 -B 5 "subprocess\.run.*check=True|except.*Exception" components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Length of output: 1370

🏁 Script executed:

#!/bin/bash
# Check the current error handling in the archive manager script
rg -A 15 -B 5 "subprocess\.run.*cmd" components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Length of output: 664

🏁 Script executed:

#!/bin/bash
# Look at the dataset logging patterns in archive manager
rg -A 5 -B 5 "dataset.*message|dataset.*log|CLP_DEFAULT_DATASET_NAME" components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Length of output: 847

Perfect! Now I can see the actual implementation and provide a comprehensive review of your changes.

Excellent Implementation! ✅

Your changes deliver exactly what was promised in the PR description and represent a meaningful improvement to user experience and debugging capabilities.

Key Improvements Successfully Implemented:

📋 Explicit Dataset Logging - Brilliantly implemented in native script:

dataset_specific_message = f" of dataset `{dataset}`" if dataset is not None else ""
logger.info(f"Starting to find archives{dataset_specific_message} from the database.")

This makes it crystal clear when operations target specific datasets vs. defaults.

🚫 Clean Error Handling - The subprocess execution improvement is spot-on:

try:
    subprocess.run(cmd, check=True)
    ret_code = 0
except Exception:
    logger.error("Archive manager failed.")
    logger.debug(f"Docker command failed: {' '.join(cmd)}")
    ret_code = -1

No more unhelpful Python stack traces for container failures, with debug info available when needed.

🔧 Debug Flag Implementation - Properly implemented across both scripts:
- Consistent --debug/-d argument parsing
- Proper logging level configuration
- Debug command logging only when requested
- Flag passthrough from wrapper to native script

Additional Quality Improvements Noted:

Fixed Typo: "Archive deletion" → "Archive manager" in error message
Proper Return Codes: Clean error handling with appropriate exit codes
Consistent Patterns: Debug flag implementation follows established conventions
Resource Cleanup: Proper finally block ensures temporary files are cleaned up

Architecture Foundation:

This sets up perfectly for your broader #1081 logging improvements. The helper functions you mentioned (add_logging_options, set_logging_level, run_docker_cmd) would be natural extensions of these patterns.

The implementation quality is high, the user experience improvements are tangible, and it aligns perfectly with the existing codebase patterns I observed. This is exactly the kind of incremental, focused improvement that makes systems more maintainable and user-friendly.

Great work! 🎉

🧠 Learnings used

Learnt from: haiqi96
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.

Learnt from: junhaoliao
PR: y-scope/clp#1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.

Learnt from: gibber9809
PR: y-scope/clp#504
File: components/core/src/clp_s/search/kql/CMakeLists.txt:29-29
Timestamp: 2024-10-22T15:36:04.655Z
Learning: When reviewing pull requests, focus on the changes within the PR and avoid commenting on issues outside the scope of the PR.

Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.

Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.

Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

hoophalab · 2025-08-11T22:02:44Z

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.

$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback

haiqi96 · 2025-08-12T16:25:35Z

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.
$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback

Just curious, was database running when you run the command?

junhaoliao · 2025-08-12T17:41:30Z

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.
$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback
Just curious, was database running when you run the command?

Maybe the package wasn't running when the Archive manager is run? In such cases, would changing the error message to be like "Archive manager failed. Is the package running?" help users to debug?

hoophalab · 2025-08-12T18:27:41Z

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.
$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback
Just curious, was database running when you run the command?

I just stopped the clp database container and ran the script. I guess this is what the issue page did?

haiqi96 · 2025-08-13T02:25:55Z

I confirm that archive-manager.sh no longer shows a traceback, but I have little experience with package scripts. So probably someone needs to take a look at the code.
$ ./build/clp-package/sbin/admin-tools/archive-manager.sh find
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
# <-- no traceback
Just curious, was database running when you run the command?
I just stopped the clp database container and ran the script. I guess this is what the issue page did?

This is a different issue. The issue is about the non-native script returnning redundant error message of the subprocess call stack(which is fixed in this PR).

You and Junhao do raise a valid point that the message can be more user friendly, but I would treat it as a different issue. Btw, there's also similar issues in other utility scripts (compress.py, search.py and decompress.py) so I think we should fix it in a separate PR.
@coderabbitai can you create an issue to remind us using more user friendly error message for our utility script?

coderabbitai · 2025-08-13T02:26:43Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@haiqi96: I'll create an issue to track the user-friendly error message improvements for utility scripts as you requested.

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for tracking user-friendly error message improvements
gh issue create \
  --title "Improve error messages in utility scripts to be more user-friendly" \
  --body "## Context

As discussed in PR #1173, while we successfully removed unhelpful stack traces from subprocess failures in archive manager scripts, there's an opportunity to make error messages across our utility scripts more user-friendly and actionable.

## Current Behavior

When utility scripts encounter errors (e.g., database connection failures), they show technical error messages like:
\`\`\`
2025-08-11T21:54:21.671 ERROR [sql_adapter] Error connecting to MariaDB: Can't connect to server on '127.0.0.1' (115)
2025-08-11T21:54:21.671 ERROR [archive_manager] Can't connect to server on '127.0.0.1' (115)
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
\`\`\`

## Proposed Improvement

Enhance error messages to provide actionable guidance for users, such as:
- \"Archive manager failed. Is the package running?\"
- \"Database connection failed. Please ensure the CLP services are running.\"

## Affected Scripts

- \`components/clp-package-utils/clp_package_utils/scripts/archive_manager.py\`
- \`components/clp-package-utils/clp_package_utils/scripts/compress.py\`
- \`components/clp-package-utils/clp_package_utils/scripts/search.py\`
- \`components/clp-package-utils/clp_package_utils/scripts/decompress.py\`

## References

- PR #1173: https://github.com/y-scope/clp/pull/1173
- Original discussion: https://github.com/y-scope/clp/pull/1173#issuecomment-2285329497

Requested by: @haiqi96" \
  --assignee haiqi96

Length of output: 1590

@haiqi96: I've created issue #1189 to track the user-friendly error message improvements for utility scripts as you requested. The issue includes:

Context from the current PR discussion
Examples of current vs. proposed error messages
List of affected scripts (archive_manager.py, compress.py, search.py, decompress.py)
References to this PR and the comment chain
Assignment to you as requested

You're absolutely right that this is a separate concern from the current PR, which successfully addresses the original stack trace issue. The new issue will help track this broader improvement across all utility scripts.

Issue link: #1189

🧠 Learnings used

Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

coderabbitai · 2025-08-13T02:26:56Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@haiqi96: I'll create a GitHub issue to track the user-friendly error message improvements for the utility scripts.

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for improving error messages in utility scripts
gh issue create \
  --title "Improve user-friendly error messages in package utility scripts" \
  --body "## Description

This issue tracks the need to improve error messages in CLP package utility scripts to be more user-friendly and actionable for end users.

## Background

While working on PR #1173 to fix traceback issues in archive manager scripts, it was identified that error messages could be more helpful to users. For example, when database connection fails, instead of just showing:

\`\`\`
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed.
\`\`\`

A more user-friendly message could be:

\`\`\`
2025-08-11T17:54:22.390 ERROR [archive_manager] Archive manager failed. Is the package running?
\`\`\`

## Affected Scripts

The following utility scripts in \`components/clp-package-utils/clp_package_utils/scripts/\` could benefit from more user-friendly error messages:

- \`archive_manager.py\`
- \`compress.py\`
- \`search.py\`
- \`decompress.py\`

## Scope

- Review error handling patterns across utility scripts
- Identify common failure scenarios (database connection, container issues, etc.)
- Implement contextual error messages that guide users toward resolution
- Consider creating helper functions for consistent error messaging

## Context

- Original PR: https://github.com/y-scope/clp/pull/1173
- Discussion: https://github.com/y-scope/clp/pull/1173#issuecomment-2286264573
- Requested by: @haiqi96" \
  --assignee haiqi96

Length of output: 1479

Perfect! I've created GitHub issue #1190 to track the user-friendly error message improvements for the package utility scripts. The issue captures:

The specific need for more actionable error messages
Examples of current vs. improved messaging
The affected scripts (archive_manager.py, compress.py, search.py, decompress.py)
Scope for systematic improvements and helper functions
Context linking back to this PR and discussion

You're absolutely right to keep this separate from the current PR - the traceback fix you implemented here addresses the immediate problem perfectly, while the broader error message usability improvements can be tackled systematically in a future PR.

Great work on maintaining focused PR scope! 🎯

🧠 Learnings used

Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

junhaoliao

tested that the docker command doesn't show unless --debug is specified.

@haiqi96 as you pointed out in the PR description, --debug and shorted as -d can cause confusion to users. Shall we rename it --verbose / -v which could be more commonly seen in CLI tools?

…_manager.py Co-authored-by: Junhao Liao <junhao@junhao.ca>

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64ad1a9 and 829ff5f.

📒 Files selected for processing (1)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: lint-check (ubuntu-24.04)

🔇 Additional comments (3)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)

69-74: Adding a top-level --debug flag: LGTM

The flag is clearly defined and scoped at the top-level parser, which keeps the UX consistent across subcommands.

184-185: Refined storage-type error message: LGTM

The message is clearer and user-focused. The spacing nit highlighted earlier appears resolved.

239-241: Forwarding --debug to the native script: LGTM

This correctly propagates the flag so developers can opt into detailed logs end-to-end.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
159-162: Centralize logging level setup into a small helper to avoid duplication

This pattern will recur across wrapper scripts. Extract it into a helper (module-local now; shared later) to keep behaviour consistent.

Apply this diff to simplify the call site:
-    if parsed_args.debug:
-        logger.setLevel(logging.DEBUG)
-    else:
-        logger.setLevel(logging.INFO)
+    set_logging_level(parsed_args.debug)
Outside the selected range, add this helper near the top of the module:
def set_logging_level(debug: bool) -> None:
    """Sets module logger level based on a debug flag."""
    logger.setLevel(logging.DEBUG if debug else logging.INFO)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 829ff5f and 508f151.

📒 Files selected for processing (1)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: lint-check (ubuntu-24.04)
GitHub Check: build (macos-15)

🔇 Additional comments (4)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (4)

69-74: LGTM: Debug flag addition is clear and non-conflicting

The new --debug/-d option integrates cleanly with the CLI and doesn’t conflict with other short flags.

184-184: LGTM: Error message text reads naturally

Message is concise and accurate for non-FS storage types.

239-241: LGTM: Forwarding --debug to the inner CLI is correct

Ensures inner script emits DEBUG logs when requested.

266-268: LGTM: Early return for unsupported subcommand

Prevents execution of an invalid command path.

coderabbitai

Actionable comments posted: 6

🔭 Outside diff range comments (5)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)
254-260: Fix invalid type hint/default for end_ts

end_ts: int = typing.Optional[int] is a bug. You likely meant Optional[int] with a default of None. This currently sets the default to the type object, which breaks your None checks and is semantically wrong.
-def _find_archives(
-    archives_dir: Path,
-    database_config: Database,
-    dataset: typing.Optional[str],
-    begin_ts: int,
-    end_ts: int = typing.Optional[int],
-) -> int:
+def _find_archives(
+    archives_dir: Path,
+    database_config: Database,
+    dataset: typing.Optional[str],
+    begin_ts: int,
+    end_ts: typing.Optional[int] = None,
+) -> int:
373-387: DELETE … IN clause is broken; placeholders aren’t bound

ids_list_string is built with quoted "%s" and you’re not passing parameters to execute(), so the driver will see literal strings '%s' instead of binding archive_ids. This will fail silently or delete nothing.
-            ids_list_string: str = ", ".join(["'%s'"] * len(archive_ids))
+            placeholders: str = ", ".join(["%s"] * len(archive_ids))

             db_cursor.execute(
                 f"""
                 DELETE FROM `{get_files_table_name(table_prefix, dataset)}`
-                WHERE archive_id in ({ids_list_string})
+                WHERE archive_id in ({placeholders})
                 """
-            )
+            , archive_ids)

             db_cursor.execute(
                 f"""
                 DELETE FROM `{get_archive_tags_table_name(table_prefix, dataset)}`
-                WHERE archive_id in ({ids_list_string})
+                WHERE archive_id in ({placeholders})
                 """
-            )
+            , archive_ids)
404-414: Harden disk deletion and keep going on partial failures

A single rmtree failure aborts the whole operation. Wrap in try/except to log and continue (or return -1 if you prefer to fail-fast).
-    for archive_id in archive_ids:
-        archive_path = archive_output_dir / archive_id
-        if not archive_path.is_dir():
-            logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.")
-            continue
-
-        logger.info(f"Deleting archive {archive_id} from disk.")
-        shutil.rmtree(archive_path)
+    for archive_id in archive_ids:
+        archive_path = archive_output_dir / archive_id
+        if not archive_path.exists():
+            logger.warning(f"Archive {archive_id} not found on disk. Skipping deletion.")
+            continue
+        if not archive_path.is_dir():
+            logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.")
+            continue
+        logger.info(f"Deleting archive {archive_id} from disk.")
+        try:
+            shutil.rmtree(archive_path)
+        except Exception:
+            logger.exception(f"Failed to delete archive {archive_id} from disk.")
261-270: Docstring return value doesn’t match implementation

Docstring says “1 on failure” but code returns -1 on failure. Update for consistency.
-    :return: 0 on success, 1 on failure.
+    :return: 0 on success, -1 on failure.
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

1-285: Fix inconsistent debug/verbose flags in archive manager scripts

The wrapper and native archive-manager CLIs are inconsistent: both define a --verbose flag but the code checks parsed_args.debug, and the wrapper forwards --debug to the native module. This will cause incorrect behaviour or runtime errors.

Files/locations found (from repo search):

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

"--verbose" arg defined (line ~68)

checks parsed_args.debug (line ~157)

forwards/appends "--debug" to the native command (line ~237-238)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py

"--verbose" arg defined (line ~100)

checks parsed_args.debug (line ~186)

Minimal, safe fixes

Make the flag name and attribute consistent across both scripts. E.g. add the flag as aliases and set dest="debug":

args_parser.add_argument("--debug", "--verbose", "-v", dest="debug", action="store_true", help="Enable debug logging.")
This preserves both flag names for users while ensuring parsed_args.debug exists and the wrapper's forwarded "--debug" is accepted by the native script.

Alternatively, consistently use parsed_args.verbose everywhere and forward "--verbose" from the wrapper — but choose one approach across both scripts.

Recommended refactor (optional, but reduces drift)

Centralize logging/CLI behaviour:

add_logging_options(parser): adds the common --debug/--verbose flags (with a consistent dest)

set_logging_level(parsed_args) or set_logging_level(debug: bool)

run_docker_cmd(cmd: list[str], keep_artifacts_when_verbose: bool): runs docker + handles cleanup/logging
This will remove duplication and prevent future flag/name drift.

♻️ Duplicate comments (1)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)

268-281: Propagate child exit code, pre-run DEBUG log, and robust cleanup

Current code loses the inner exit code (always -1 on failure), only logs the command post-failure, and unlink can raise. Improve diagnosability and robustness without reintroducing stack traces. This repeats earlier feedback.

-    ret_code: int
-    try:
-        subprocess.run(cmd, check=True)
-        ret_code = 0
-    except Exception:
-        logger.error("Archive manager failed.")
-        logger.debug(f"Docker command failed: {' '.join(cmd)}")
-        ret_code = -1
-    finally:
-        # Remove generated files
-        generated_config_path_on_host.unlink()
-
-    return ret_code
+    ret_code: int
+    # Emit the full container command when debugging is enabled.
+    if logger.isEnabledFor(logging.DEBUG):
+        # Use proper shell-quoting for readability
+        logger.debug(f"Running Docker command: {shlex.join(cmd)}")
+    try:
+        proc = subprocess.run(cmd)
+        ret_code = proc.returncode
+        if 0 != ret_code:
+            logger.error("Archive manager failed.")
+            logger.debug(f"Docker command failed: {shlex.join(cmd)}")
+    except Exception as e:
+        # Failure to start the process (not an inner failure).
+        logger.error(f"Failed to start Docker command: {e}")
+        logger.debug(f"Docker command: {shlex.join(cmd)}")
+        ret_code = -1
+    finally:
+        # Remove generated files; tolerate if already removed.
+        try:
+            # Keep the generated config when verbose for easier debugging.
+            if not parsed_args.verbose:
+                generated_config_path_on_host.unlink()
+        except FileNotFoundError:
+            pass
+
+    return ret_code

Outside the selected range, add:

import shlex

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 508f151 and 3367ce2.

📒 Files selected for processing (2)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

🔇 Additional comments (4)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

271-274: Dataset-context in logs: nice improvement

The dataset-specific context in the “find” flow improves diagnosability without being noisy.

338-340: Dataset-context in logs: nice improvement (delete path)

Mirroring the “find” flow logging for delete is clear and useful.

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)

182-182: Wording nit fixed: storage-type message reads well

Message now reads cleanly without the extra space; keeps user output consistent across scripts.

265-265: Unsupported subcommand return code: good

Returning -1 makes the failure explicit to callers.

coderabbitai · 2025-08-13T20:37:21Z

+    if parsed_args.debug:
+        logger.setLevel(logging.DEBUG)
+    else:
+        logger.setLevel(logging.INFO)


⚠️ Potential issue

Fix crash: parsed_args.debug doesn’t exist; use parsed_args.verbose

This code will raise AttributeError because the parser only defines --verbose/-v. Use the correct attribute, and be defensive with getattr.

- if parsed_args.debug: - logger.setLevel(logging.DEBUG) - else: - logger.setLevel(logging.INFO) + if getattr(parsed_args, "verbose", False): + logger.setLevel(logging.DEBUG) + else: + logger.setLevel(logging.INFO)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if parsed_args.debug:

logger.setLevel(logging.DEBUG)

else:

logger.setLevel(logging.INFO)

if getattr(parsed_args, "verbose", False):

logger.setLevel(logging.DEBUG)

else:

logger.setLevel(logging.INFO)

🤖 Prompt for AI Agents

In components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py around lines 186 to 189, the code references parsed_args.debug which does not exist; replace it with the correct parsed_args.verbose and guard access with getattr to avoid AttributeError. Specifically, check something like getattr(parsed_args, "verbose", False) (or >0 if verbose is a count) and set logger.setLevel(logging.DEBUG) when verbose is truthy, otherwise set INFO; ensure no direct attribute access to parsed_args.debug remains.

coderabbitai · 2025-08-13T20:37:21Z

+    logger.info(f"Starting to find archives{dataset_specific_message} from the database.")
    try:
        sql_adapter: SQL_Adapter = SQL_Adapter(database_config)
        clp_db_connection_params: dict[str, any] = (


🧹 Nitpick (assertive)

Type hint uses built-in any (function), not typing.Any

Use typing.Any to avoid type checker confusion and accidental shadowing.

- clp_db_connection_params: dict[str, any] = ( + clp_db_connection_params: dict[str, typing.Any] = (

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

clp_db_connection_params: dict[str, any] = (

clp_db_connection_params: dict[str, typing.Any] = (

🤖 Prompt for AI Agents

In components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py around line 276, the type hint uses the built-in any (function) which can confuse type checkers; replace it with typing.Any (or import Any from typing and use dict[str, Any]) — update the hint to use typing.Any and add the appropriate import if needed.

This will be fixed in another PR.

@haiqi96, understood! Thanks for letting me know this will be addressed in a separate PR. That makes sense to keep the scope of this PR focused on the logging improvements.

🧠 Learnings used

Learnt from: haiqi96 PR: y-scope/clp#594 File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110 Timestamp: 2024-11-15T16:21:52.122Z Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.

Learnt from: junhaoliao PR: y-scope/clp#1152 File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613 Timestamp: 2025-08-08T06:59:42.436Z Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (7)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)

197-199: Avoid bare except; don’t catch BaseException

Catching BaseException will swallow SystemExit/KeyboardInterrupt. Limit to Exception.

-    except:
+    except Exception:
         logger.exception("Failed to load config.")
         return -1

253-259: Incorrect type/default for end_ts; make begin_ts/end_ts Optional[int] with None defaults

The current signature sets a typing object as the default value and types begin_ts as int, while your CLI makes it optional. This causes inconsistent behaviour and type hints.

-def _find_archives(
-    archives_dir: Path,
-    database_config: Database,
-    dataset: typing.Optional[str],
-    begin_ts: int,
-    end_ts: int = typing.Optional[int],
-) -> int:
+def _find_archives(
+    archives_dir: Path,
+    database_config: Database,
+    dataset: typing.Optional[str],
+    begin_ts: typing.Optional[int] = None,
+    end_ts: typing.Optional[int] = None,
+) -> int:

260-269: Docstring return code is inconsistent with implementation

Doc says “1 on failure” but function returns -1 on failure; align the docstring or the code. The rest of the module uses -1 for failures.

-    :return: 0 on success, 1 on failure.
+    :return: 0 on success, -1 on failure.

283-294: Handle optional begin_ts/end_ts correctly in the SELECT; avoid passing None as a bound param

Right now the query always binds begin_ts, which may be None; that can yield no rows or DB errors. Build WHERE clauses conditionally.

-            query_params: typing.List[int] = [begin_ts]
-            query: str = (
-                f"""
-                SELECT id FROM `{get_archives_table_name(table_prefix, dataset)}`
-                WHERE begin_timestamp >= %s
-                """
-            )
-            if end_ts is not None:
-                query += " AND end_timestamp <= %s"
-                query_params.append(end_ts)
+            query_params: typing.List[typing.Any] = []
+            query: str = f"SELECT id FROM `{get_archives_table_name(table_prefix, dataset)}`"
+            where_clauses: typing.List[str] = []
+            if begin_ts is not None:
+                where_clauses.append("begin_timestamp >= %s")
+                query_params.append(begin_ts)
+            if end_ts is not None:
+                where_clauses.append("end_timestamp <= %s")
+                query_params.append(end_ts)
+            if where_clauses:
+                query += " WHERE " + " AND ".join(where_clauses)

Optionally, tweak the “No archives found…” message to be generic when no time filters are provided.

340-343: Second occurrence of built-in any in type hint

Mirror the earlier fix to use typing.Any.

-    clp_db_connection_params: dict[str, any] = database_config.get_clp_connection_params_and_type(
+    clp_db_connection_params: dict[str, typing.Any] = database_config.get_clp_connection_params_and_type(
         True
     )

389-397: Protect disk deletion with try/except and fail gracefully

A failed rmtree will raise and abort the entire operation without a clear message per archive. Handle errors per archive.

-    for archive_id in archive_ids:
+    for archive_id in archive_ids:
         archive_path = archive_output_dir / archive_id
         if not archive_path.is_dir():
             logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.")
             continue
 
         logger.info(f"Deleting archive {archive_id} from disk.")
-        shutil.rmtree(archive_path)
+        try:
+            shutil.rmtree(archive_path)
+        except Exception as e:
+            logger.error(f"Failed to delete archive directory {archive_path}: {e}")
+            return -1

34-34: Prefer logger name name over file

Using module name is the common practice and integrates better with hierarchical logging.

-logger: logging.Logger = logging.getLogger(__file__)
+logger: logging.Logger = logging.getLogger(__name__)

♻️ Duplicate comments (3)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)
275-276: Type hint uses built-in any; use typing.Any

This avoids type-checker confusion.
-        clp_db_connection_params: dict[str, any] = (
+        clp_db_connection_params: dict[str, typing.Any] = (
             database_config.get_clp_connection_params_and_type(True)
         )
98-103: Align flag naming and consider aliasing --debug (no short -d) for compatibility

You define --verbose/-v here but the PR summary and wrapper talk about --debug; ensure consistency across modules to avoid confusion. If you want backwards compatibility without colliding with --dataset, alias the long flag only.

Apply this diff to alias --debug to the same dest (do not add -d to avoid collision with --dataset):
-    args_parser.add_argument(
-        "--verbose",
-        "-v",
-        action="store_true",
-        help="Enable debug logging.",
-    )
+    args_parser.add_argument(
+        "--verbose",
+        "--debug",
+        "-v",
+        dest="verbose",
+        action="store_true",
+        help="Enable debug logging.",
+    )
185-188: Fix crash: parsed_args.debug doesn’t exist; use parsed_args.verbose

This will raise AttributeError at runtime. Use the correct attribute and guard via getattr.
-    if parsed_args.debug:
+    if getattr(parsed_args, "verbose", False):
         logger.setLevel(logging.DEBUG)
     else:
         logger.setLevel(logging.INFO)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3367ce2 and cfedaf0.

📒 Files selected for processing (1)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: lint-check (ubuntu-24.04)

🔇 Additional comments (2)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)

271-273: LGTM: dataset-context logging is clear and user-friendly

The dataset suffix improves traceability of operations.

337-339: LGTM: dataset-context logging for deletions improves clarity

Good addition; mirrors the find path.

coderabbitai · 2025-08-13T22:37:31Z

+    args_parser.add_argument(
+        "--verbose",
+        "-v",
+        action="store_true",
+        help="Enable debug logging.",
+    )


💡 Verification agent

🧩 Analysis chain

Repository-wide consistency check: align flag usage and forwarding

Ensure the wrapper forwards --verbose (not --debug) and checks parsed_args.verbose, or alias --debug in both places as suggested.

Run this script to find and align all occurrences:

Also applies to: 185-188

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Searching for mismatches of verbose/debug flags and attributes..." rg -n --hidden --no-ignore-vcs $'--debug\\b|parsed_args\\.debug\\b|--verbose\\b|parsed_args\\.verbose\\b' echo echo "Suggestion: in wrapper(s), forward --verbose to the native script and use parsed_args.verbose for logging level checks."

Length of output: 752

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Showing target file (if present): components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py" if [ -f components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py ]; then sed -n '1,260p' components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py || true else echo "File not found: components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py" fi echo echo "Repository-wide search for --debug / --verbose usages (including parsed_args.debug / parsed_args.verbose)..." rg -n --hidden --no-ignore-vcs \ -e '--debug\b' \ -e 'parsed_args\.debug\b' \ -e '--verbose\b' \ -e 'parsed_args\.verbose\b' \ -e '"--debug"' \ -e '"--verbose"' \ || true echo echo "Also searching for add_argument calls mentioning debug/verbose..." rg -n --hidden --no-ignore-vcs -e "add_argument\([^)]*--debug" -e "add_argument\([^)]*--verbose" || true

Length of output: 9532

Align --verbose/--debug flag usage across archive_manager scripts

The native script defines --verbose/-v but both scripts check parsed_args.debug and the wrapper forwards --debug — unify them.

Files to change

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py

Replace checks of parsed_args.debug with parsed_args.verbose (around the parse_args/logger setup — search for "if parsed_args.debug").

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Replace parsed_args.debug -> parsed_args.verbose (lines ~157 and ~237).

Forward --verbose (not --debug) when invoking the native script (change archive_manager_cmd.append("--debug") -> archive_manager_cmd.append("--verbose")).

Confirm add_argument("--verbose", "-v", ...) is present (line ~68) or add a compat alias if needed.

Suggested minimal diffs

native file

- parsed_args: argparse.Namespace = args_parser.parse_args(argv[1:]) - if parsed_args.debug: - logger.setLevel(logging.DEBUG) + parsed_args: argparse.Namespace = args_parser.parse_args(argv[1:]) + if parsed_args.verbose: + logger.setLevel(logging.DEBUG) else: logger.setLevel(logging.INFO)

wrapper file (forwarding)

- if parsed_args.debug: - logger.setLevel(logging.DEBUG) + if parsed_args.verbose: + logger.setLevel(logging.DEBUG) - archive_manager_cmd.append("--debug") + archive_manager_cmd.append("--verbose")

Optional: if you need backward compatibility, add an alias:

args_parser.add_argument("--debug", action="store_true", dest="verbose", help="legacy alias for --verbose")

…_manager.py

…_manager.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 2

🔭 Outside diff range comments (4)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
186-195: Log the dataset value (and when default is applied)

PR objective calls for explicitly logging the dataset used. Emit a message when defaulting and when a dataset is provided.
-    if StorageEngine.CLP_S == storage_engine:
-        dataset = CLP_DEFAULT_DATASET_NAME if dataset is None else dataset
+    if StorageEngine.CLP_S == storage_engine:
+        if dataset is None:
+            dataset = CLP_DEFAULT_DATASET_NAME
+            logger.info(f"No dataset specified; defaulting to `{dataset}`.")
+        else:
+            logger.info(f"Using dataset `{dataset}`.")
         try:
             clp_db_connection_params = clp_config.database.get_clp_connection_params_and_type(True)
             validate_dataset_name(clp_db_connection_params["table_prefix"], dataset)
         except Exception as e:
             logger.error(e)
             return -1
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)
253-260: Bug: invalid type/default for end_ts in _find_archives signature

end_ts is annotated as int but defaulted to typing.Optional[int], which is a typing object. This will break comparisons and SQL param handling. Use Optional[int] with a None default.
 def _find_archives(
     archives_dir: Path,
     database_config: Database,
     dataset: typing.Optional[str],
     begin_ts: int,
-    end_ts: int = typing.Optional[int],
+    end_ts: typing.Optional[int] = None,
 ) -> int:
153-157: Typo in comment: “subcomand” → “subcommand”

Minor nit for readability.
-    # Delete by ID subcomand
+    # Delete by ID subcommand
23-31: Avoid importing CLI constants from the wrapper; move shared constants to a common module

native.archive_manager importing from scripts.archive_manager creates cross-layer coupling. Consider placing shared constants (e.g., FIND_COMMAND, args names) in a dedicated shared module (e.g., clp_package_utils.scripts.common.archive_manager_consts) consumed by both to reduce import cycles and improve maintainability.

♻️ Duplicate comments (2)

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
274-279: Type hint uses built-in any; use typing.Any instead

Replace any with typing.Any for correctness with type checkers. This was flagged previously and still applies here.
-        clp_db_connection_params: dict[str, any] = (
+        clp_db_connection_params: dict[str, typing.Any] = (
             database_config.get_clp_connection_params_and_type(True)
         )
338-343: Type hint uses built-in any; use typing.Any instead

Same issue here; switch to typing.Any.
-    clp_db_connection_params: dict[str, any] = database_config.get_clp_connection_params_and_type(
+    clp_db_connection_params: dict[str, typing.Any] = database_config.get_clp_connection_params_and_type(
         True
     )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cfedaf0 and 397bbf6.

📒 Files selected for processing (2)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (5 hunks)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: lint-check (macos-15)

🔇 Additional comments (6)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)

156-160: LGTM: logger level respects --verbose

Logger level toggling is correct and aligns with the new flag.

182-183: LGTM: clearer storage-type error message

Message reads well and is user-friendly.

237-239: LGTM: forwarding --verbose to native script

Flag propagation is consistent with the native CLI.

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)

98-103: LGTM: adds --verbose/-v to the native CLI

Flag definition and help text look good.

185-188: LGTM: set logger level from --verbose

Correctly honours verbose mode.

271-273: LGTM: dataset-specific context in logs

Clear, actionable logging that indicates the dataset context.

coderabbitai · 2025-08-14T19:58:10Z

+    args_parser.add_argument(
+        "--verbose",
+        "-v",
+        action="store_true",
+        help="Enable debug logging.",
+    )


🧹 Nitpick (assertive)

Verbose flag addition is correct; consider adding a hidden --debug alias for back-compat

Using --verbose/-v avoids the -d confusion. If prior users tried --debug, add a hidden alias mapping to verbose.

args_parser.add_argument( "--verbose", "-v", action="store_true", help="Enable debug logging.", ) + # Back-compat alias; map --debug -> --verbose without exposing in help + args_parser.add_argument("--debug", dest="verbose", action="store_true", help=argparse.SUPPRESS)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

args_parser.add_argument(

"--verbose",

"-v",

action="store_true",

help="Enable debug logging.",

)

args_parser.add_argument(

"--verbose",

"-v",

action="store_true",

help="Enable debug logging.",

)

# Back-compat alias; map --debug -> --verbose without exposing in help

args_parser.add_argument("--debug", dest="verbose", action="store_true", help=argparse.SUPPRESS)

🤖 Prompt for AI Agents

In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py around lines 67 to 72, add a hidden backwards-compatible alias --debug that maps to the same boolean flag as --verbose/-v; implement this by adding a second add_argument for "--debug" with action="store_true", dest="verbose" (so it sets the same variable) and help set to argparse.SUPPRESS so it won’t appear in help output.

…_manager.py

junhaoliao

for the title, since this is an improvement, would it be better to use the feat keyword in the title?

how about

feat(package): Add verbose logging option to archive_manager and improve error handling.

…ove error handling. (y-scope#1173) Co-authored-by: Junhao Liao <junhao@junhao.ca>

initial commit

69f9549

haiqi96 added 4 commits August 7, 2025 11:26

Merge branch 'main' into archive_manager

00cc187

fix

7e27aa6

Merge branch 'archive_manager' of https://github.com/haiqi96/clp_fork …

75c9800

…into archive_manager

Merge branch 'main' into archive_manager

64ad1a9

haiqi96 marked this pull request as ready for review August 12, 2025 16:25

haiqi96 requested a review from a team as a code owner August 12, 2025 16:25

coderabbitai Bot mentioned this pull request Aug 13, 2025

Improve error messages in utility scripts to be more user-friendly #1189

Open

coderabbitai Bot mentioned this pull request Aug 13, 2025

Improve user-friendly error messages in package utility scripts #1190

Closed

Merge branch 'main' into archive_manager

0507f9c

haiqi96 requested review from junhaoliao and kirkrodrigues August 13, 2025 14:13

junhaoliao reviewed Aug 13, 2025

View reviewed changes

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

junhaoliao reviewed Aug 13, 2025

View reviewed changes

Update components/clp-package-utils/clp_package_utils/scripts/archive…

829ff5f

…_manager.py Co-authored-by: Junhao Liao <junhao@junhao.ca>

coderabbitai Bot reviewed Aug 13, 2025

View reviewed changes

haiqi96 commented Aug 13, 2025

View reviewed changes

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

Fix previous errors

508f151

coderabbitai Bot reviewed Aug 13, 2025

View reviewed changes

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

haiqi96 added 2 commits August 13, 2025 15:29

Update cmdline option

83fe5b2

Merge branch 'main' into archive_manager

3367ce2

coderabbitai Bot reviewed Aug 13, 2025

View reviewed changes

Merge branch 'main' into archive_manager

cfedaf0

coderabbitai Bot reviewed Aug 13, 2025

View reviewed changes

haiqi96 requested a review from junhaoliao August 14, 2025 14:23

haiqi96 and others added 4 commits August 14, 2025 10:23

Merge branch 'main' into archive_manager

fd82ba9

Update components/clp-package-utils/clp_package_utils/scripts/archive…

e813b49

…_manager.py

Update components/clp-package-utils/clp_package_utils/scripts/archive…

4acf9bd

…_manager.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

fix silly mistake

397bbf6

coderabbitai Bot reviewed Aug 14, 2025

View reviewed changes

Merge branch 'main' into archive_manager

b0ccf76

haiqi96 commented Aug 14, 2025

View reviewed changes

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py Outdated

Update components/clp-package-utils/clp_package_utils/scripts/archive…

d31ecf4

…_manager.py

haiqi96 commented Aug 14, 2025

View reviewed changes

Comment thread components/clp-package-utils/clp_package_utils/scripts/archive_manager.py

Update components/clp-package-utils/clp_package_utils/scripts/archive…

ed8a501

…_manager.py

junhaoliao approved these changes Aug 14, 2025

View reviewed changes

haiqi96 changed the title ~~fix(package): Improve log messages for archive manager scripts.~~ feat(package): Add verbose logging option to archive_manager and improve error handling. Aug 14, 2025

haiqi96 merged commit f87b505 into y-scope:main Aug 14, 2025
8 checks passed

haiqi96 mentioned this pull request Aug 15, 2025

feat(package): Add --verbose option to package_utils scripts and improve error diagnostics (fixes #1081). #1212

Merged

3 tasks

junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026

feat(package): Add verbose logging option to archive_manager and impr…

0294d55

…ove error handling. (y-scope#1173) Co-authored-by: Junhao Liao <junhao@junhao.ca>

	clp_db_connection_params: dict[str, any] = (
	clp_db_connection_params: dict[str, typing.Any] = (

Conversation

haiqi96 commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

haiqi96 commented Aug 7, 2025

Uh oh!

haiqi96 commented Aug 7, 2025

Uh oh!

coderabbitai Bot commented Aug 7, 2025

Code Analysis

Your Proposed Improvements Are Excellent

Architecture Suggestions

Minor Concerns

Excellent Implementation! ✅

Key Improvements Successfully Implemented:

Additional Quality Improvements Noted:

Architecture Foundation:

Uh oh!

hoophalab commented Aug 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

haiqi96 commented Aug 12, 2025

Uh oh!

junhaoliao commented Aug 12, 2025

Uh oh!

hoophalab commented Aug 12, 2025

Uh oh!

haiqi96 commented Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Aug 13, 2025

Uh oh!

coderabbitai Bot commented Aug 13, 2025

Uh oh!

Uh oh!

junhaoliao left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Aug 13, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

haiqi96 Aug 14, 2025

Choose a reason for hiding this comment

haiqi96 commented Aug 7, 2025 •

edited

Loading

coderabbitai Bot commented Aug 7, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

hoophalab commented Aug 11, 2025 •

edited

Loading

haiqi96 commented Aug 13, 2025 •

edited

Loading

coderabbitai Bot Aug 13, 2025 •

edited

Loading