feat(package): Add verbose logging option to archive_manager and improve error handling.#1173
Conversation
|
Warning Rate limit exceeded@haiqi96 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 41 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughAdds a top-level --verbose/-v flag to both top-level and native archive_manager CLIs, sets logger level from that flag, forwards --verbose to the inner subprocess invocation, includes dataset context in several log messages, and changes subprocess handling to return and propagate the inner process exit code while logging failures and removing the generated config file after invocation. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant TopLevelCLI
participant NativeCLI
participant Logger
User->>TopLevelCLI: run archive_manager (--verbose optional, subcommand)
TopLevelCLI->>Logger: set level DEBUG/INFO based on --verbose
TopLevelCLI->>NativeCLI: spawn subprocess (append --verbose if set, append --dataset if provided)
NativeCLI->>Logger: set level DEBUG/INFO based on --verbose
NativeCLI->>Logger: log find/delete actions (include dataset context when present)
NativeCLI-->>TopLevelCLI: exit with return code
TopLevelCLI->>Logger: if non-zero log "Dataset manager failed." and debug full command
TopLevelCLI->>TopLevelCLI: remove generated config file
TopLevelCLI-->>User: return inner process return code (or -1 for unsupported subcommand)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15 minutes Possibly related issues
Possibly related PRs
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
@junhaoliao @hoophalab Can you guys take a look since it can be used to fix #1081? |
|
@coderabbitai Please review this PR. |
🧩 Analysis chainI'll examine the changes in this PR to provide a comprehensive review. @haiqi96: I'll review this PR focused on improving log messages for archive manager scripts. The changes you've described align well with better user experience and debugging capabilities. Let me examine the specific implementation details and provide feedback on the code quality and approach. From your description, the key improvements are:
These changes should indeed help address the broader logging issues mentioned in #1081. The suggestion to create reusable helper functions ( Regarding the concern about 🏁 Script executed: Length of output: 431 🏁 Script executed: Length of output: 206 🏁 Script executed: Length of output: 24919 🏁 Script executed: Length of output: 29 🏁 Script executed: Length of output: 326 🏁 Script executed: Length of output: 23098 🏁 Script executed: Length of output: 41 🏁 Script executed: Length of output: 1152 🏁 Script executed: Length of output: 229 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 55 🏁 Script executed: Length of output: 138 🏁 Script executed: Length of output: 2505 🏁 Script executed: Length of output: 60 🏁 Script executed: Length of output: 2244 🏁 Script executed: Length of output: 8454 @haiqi96: I can see the two archive manager scripts that were mentioned in your PR description. Based on my analysis of the existing code, I can provide a comprehensive review of your logging improvements. Code AnalysisThe current implementation shows: Current Error Handling Issues:
Current Dataset Handling:
Your Proposed Improvements Are ExcellentYour three key improvements address real usability issues:
Architecture SuggestionsYour proposed helper functions for code reuse are spot-on:
This approach aligns well with the existing Minor ConcernsYour concern about The changes follow the established pattern I see in the codebase for keeping error handling simple in wrapper scripts (consistent with retrieved learning about This is a well-thought-out improvement that will enhance both user experience and developer debugging capabilities while setting up a foundation for addressing the broader #1081 logging issues. 🏁 Script executed: Length of output: 1794 🏁 Script executed: Length of output: 136315 🏁 Script executed: Length of output: 1370 🏁 Script executed: Length of output: 664 🏁 Script executed: Length of output: 847 Perfect! Now I can see the actual implementation and provide a comprehensive review of your changes. Excellent Implementation! ✅Your changes deliver exactly what was promised in the PR description and represent a meaningful improvement to user experience and debugging capabilities. Key Improvements Successfully Implemented:
Additional Quality Improvements Noted:
Architecture Foundation:This sets up perfectly for your broader #1081 logging improvements. The helper functions you mentioned ( The implementation quality is high, the user experience improvements are tangible, and it aligns perfectly with the existing codebase patterns I observed. This is exactly the kind of incremental, focused improvement that makes systems more maintainable and user-friendly. Great work! 🎉 🧠 Learnings used✅ Actions performedReview triggered.
|
|
I confirm that |
Just curious, was database running when you run the command? |
Maybe the package wasn't running when the Archive manager is run? In such cases, would changing the error message to be like "Archive manager failed. Is the package running?" help users to debug? |
I just stopped the clp database container and ran the script. I guess this is what the issue page did? |
This is a different issue. The issue is about the non-native script returnning redundant error message of the subprocess call stack(which is fixed in this PR). You and Junhao do raise a valid point that the message can be more user friendly, but I would treat it as a different issue. Btw, there's also similar issues in other utility scripts (compress.py, search.py and decompress.py) so I think we should fix it in a separate PR. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@haiqi96: I'll create an issue to track the user-friendly error message improvements for utility scripts as you requested. 🏁 Script executed: Length of output: 1590 @haiqi96: I've created issue #1189 to track the user-friendly error message improvements for utility scripts as you requested. The issue includes:
You're absolutely right that this is a separate concern from the current PR, which successfully addresses the original stack trace issue. The new issue will help track this broader improvement across all utility scripts. Issue link: #1189 🧠 Learnings used |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@haiqi96: I'll create a GitHub issue to track the user-friendly error message improvements for the utility scripts. 🏁 Script executed: Length of output: 1479 Perfect! I've created GitHub issue #1190 to track the user-friendly error message improvements for the package utility scripts. The issue captures:
You're absolutely right to keep this separate from the current PR - the traceback fix you implemented here addresses the immediate problem perfectly, while the broader error message usability improvements can be tackled systematically in a future PR. Great work on maintaining focused PR scope! 🎯 🧠 Learnings used |
junhaoliao
left a comment
There was a problem hiding this comment.
tested that the docker command doesn't show unless --debug is specified.
@haiqi96 as you pointed out in the PR description, --debug and shorted as -d can cause confusion to users. Shall we rename it --verbose / -v which could be more commonly seen in CLI tools?
…_manager.py Co-authored-by: Junhao Liao <junhao@junhao.ca>
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)
69-74: Adding a top-level --debug flag: LGTMThe flag is clearly defined and scoped at the top-level parser, which keeps the UX consistent across subcommands.
184-185: Refined storage-type error message: LGTMThe message is clearer and user-focused. The spacing nit highlighted earlier appears resolved.
239-241: Forwarding --debug to the native script: LGTMThis correctly propagates the flag so developers can opt into detailed logs end-to-end.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
159-162: Centralize logging level setup into a small helper to avoid duplicationThis pattern will recur across wrapper scripts. Extract it into a helper (module-local now; shared later) to keep behaviour consistent.
Apply this diff to simplify the call site:
- if parsed_args.debug: - logger.setLevel(logging.DEBUG) - else: - logger.setLevel(logging.INFO) + set_logging_level(parsed_args.debug)Outside the selected range, add this helper near the top of the module:
def set_logging_level(debug: bool) -> None: """Sets module logger level based on a debug flag.""" logger.setLevel(logging.DEBUG if debug else logging.INFO)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build (macos-15)
🔇 Additional comments (4)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (4)
69-74: LGTM: Debug flag addition is clear and non-conflictingThe new --debug/-d option integrates cleanly with the CLI and doesn’t conflict with other short flags.
184-184: LGTM: Error message text reads naturallyMessage is concise and accurate for non-FS storage types.
239-241: LGTM: Forwarding --debug to the inner CLI is correctEnsures inner script emits DEBUG logs when requested.
266-268: LGTM: Early return for unsupported subcommandPrevents execution of an invalid command path.
There was a problem hiding this comment.
Actionable comments posted: 6
🔭 Outside diff range comments (5)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (4)
254-260: Fix invalid type hint/default for end_tsend_ts: int = typing.Optional[int] is a bug. You likely meant Optional[int] with a default of None. This currently sets the default to the type object, which breaks your None checks and is semantically wrong.
-def _find_archives( - archives_dir: Path, - database_config: Database, - dataset: typing.Optional[str], - begin_ts: int, - end_ts: int = typing.Optional[int], -) -> int: +def _find_archives( + archives_dir: Path, + database_config: Database, + dataset: typing.Optional[str], + begin_ts: int, + end_ts: typing.Optional[int] = None, +) -> int:
373-387: DELETE … IN clause is broken; placeholders aren’t boundids_list_string is built with quoted "%s" and you’re not passing parameters to execute(), so the driver will see literal strings '%s' instead of binding archive_ids. This will fail silently or delete nothing.
- ids_list_string: str = ", ".join(["'%s'"] * len(archive_ids)) + placeholders: str = ", ".join(["%s"] * len(archive_ids)) db_cursor.execute( f""" DELETE FROM `{get_files_table_name(table_prefix, dataset)}` - WHERE archive_id in ({ids_list_string}) + WHERE archive_id in ({placeholders}) """ - ) + , archive_ids) db_cursor.execute( f""" DELETE FROM `{get_archive_tags_table_name(table_prefix, dataset)}` - WHERE archive_id in ({ids_list_string}) + WHERE archive_id in ({placeholders}) """ - ) + , archive_ids)
404-414: Harden disk deletion and keep going on partial failuresA single rmtree failure aborts the whole operation. Wrap in try/except to log and continue (or return -1 if you prefer to fail-fast).
- for archive_id in archive_ids: - archive_path = archive_output_dir / archive_id - if not archive_path.is_dir(): - logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.") - continue - - logger.info(f"Deleting archive {archive_id} from disk.") - shutil.rmtree(archive_path) + for archive_id in archive_ids: + archive_path = archive_output_dir / archive_id + if not archive_path.exists(): + logger.warning(f"Archive {archive_id} not found on disk. Skipping deletion.") + continue + if not archive_path.is_dir(): + logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.") + continue + logger.info(f"Deleting archive {archive_id} from disk.") + try: + shutil.rmtree(archive_path) + except Exception: + logger.exception(f"Failed to delete archive {archive_id} from disk.")
261-270: Docstring return value doesn’t match implementationDocstring says “1 on failure” but code returns -1 on failure. Update for consistency.
- :return: 0 on success, 1 on failure. + :return: 0 on success, -1 on failure.components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
1-285: Fix inconsistent debug/verbose flags in archive manager scriptsThe wrapper and native archive-manager CLIs are inconsistent: both define a --verbose flag but the code checks parsed_args.debug, and the wrapper forwards --debug to the native module. This will cause incorrect behaviour or runtime errors.
Files/locations found (from repo search):
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
- "--verbose" arg defined (line ~68)
- checks parsed_args.debug (line ~157)
- forwards/appends "--debug" to the native command (line ~237-238)
- components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
- "--verbose" arg defined (line ~100)
- checks parsed_args.debug (line ~186)
Minimal, safe fixes
- Make the flag name and attribute consistent across both scripts. E.g. add the flag as aliases and set dest="debug":
- args_parser.add_argument("--debug", "--verbose", "-v", dest="debug", action="store_true", help="Enable debug logging.")
This preserves both flag names for users while ensuring parsed_args.debug exists and the wrapper's forwarded "--debug" is accepted by the native script.- Alternatively, consistently use parsed_args.verbose everywhere and forward "--verbose" from the wrapper — but choose one approach across both scripts.
Recommended refactor (optional, but reduces drift)
- Centralize logging/CLI behaviour:
- add_logging_options(parser): adds the common --debug/--verbose flags (with a consistent dest)
- set_logging_level(parsed_args) or set_logging_level(debug: bool)
- run_docker_cmd(cmd: list[str], keep_artifacts_when_verbose: bool): runs docker + handles cleanup/logging
This will remove duplication and prevent future flag/name drift.
♻️ Duplicate comments (1)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
268-281: Propagate child exit code, pre-run DEBUG log, and robust cleanupCurrent code loses the inner exit code (always -1 on failure), only logs the command post-failure, and unlink can raise. Improve diagnosability and robustness without reintroducing stack traces. This repeats earlier feedback.
- ret_code: int - try: - subprocess.run(cmd, check=True) - ret_code = 0 - except Exception: - logger.error("Archive manager failed.") - logger.debug(f"Docker command failed: {' '.join(cmd)}") - ret_code = -1 - finally: - # Remove generated files - generated_config_path_on_host.unlink() - - return ret_code + ret_code: int + # Emit the full container command when debugging is enabled. + if logger.isEnabledFor(logging.DEBUG): + # Use proper shell-quoting for readability + logger.debug(f"Running Docker command: {shlex.join(cmd)}") + try: + proc = subprocess.run(cmd) + ret_code = proc.returncode + if 0 != ret_code: + logger.error("Archive manager failed.") + logger.debug(f"Docker command failed: {shlex.join(cmd)}") + except Exception as e: + # Failure to start the process (not an inner failure). + logger.error(f"Failed to start Docker command: {e}") + logger.debug(f"Docker command: {shlex.join(cmd)}") + ret_code = -1 + finally: + # Remove generated files; tolerate if already removed. + try: + # Keep the generated config when verbose for easier debugging. + if not parsed_args.verbose: + generated_config_path_on_host.unlink() + except FileNotFoundError: + pass + + return ret_codeOutside the selected range, add:
import shlex
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(5 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
🔇 Additional comments (4)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
271-274: Dataset-context in logs: nice improvementThe dataset-specific context in the “find” flow improves diagnosability without being noisy.
338-340: Dataset-context in logs: nice improvement (delete path)Mirroring the “find” flow logging for delete is clear and useful.
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2)
182-182: Wording nit fixed: storage-type message reads wellMessage now reads cleanly without the extra space; keeps user output consistent across scripts.
265-265: Unsupported subcommand return code: goodReturning -1 makes the failure explicit to callers.
| if parsed_args.debug: | ||
| logger.setLevel(logging.DEBUG) | ||
| else: | ||
| logger.setLevel(logging.INFO) |
There was a problem hiding this comment.
Fix crash: parsed_args.debug doesn’t exist; use parsed_args.verbose
This code will raise AttributeError because the parser only defines --verbose/-v. Use the correct attribute, and be defensive with getattr.
- if parsed_args.debug:
- logger.setLevel(logging.DEBUG)
- else:
- logger.setLevel(logging.INFO)
+ if getattr(parsed_args, "verbose", False):
+ logger.setLevel(logging.DEBUG)
+ else:
+ logger.setLevel(logging.INFO)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if parsed_args.debug: | |
| logger.setLevel(logging.DEBUG) | |
| else: | |
| logger.setLevel(logging.INFO) | |
| if getattr(parsed_args, "verbose", False): | |
| logger.setLevel(logging.DEBUG) | |
| else: | |
| logger.setLevel(logging.INFO) |
🤖 Prompt for AI Agents
In
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
around lines 186 to 189, the code references parsed_args.debug which does not
exist; replace it with the correct parsed_args.verbose and guard access with
getattr to avoid AttributeError. Specifically, check something like
getattr(parsed_args, "verbose", False) (or >0 if verbose is a count) and set
logger.setLevel(logging.DEBUG) when verbose is truthy, otherwise set INFO;
ensure no direct attribute access to parsed_args.debug remains.
| logger.info(f"Starting to find archives{dataset_specific_message} from the database.") | ||
| try: | ||
| sql_adapter: SQL_Adapter = SQL_Adapter(database_config) | ||
| clp_db_connection_params: dict[str, any] = ( |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Type hint uses built-in any (function), not typing.Any
Use typing.Any to avoid type checker confusion and accidental shadowing.
- clp_db_connection_params: dict[str, any] = (
+ clp_db_connection_params: dict[str, typing.Any] = (📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| clp_db_connection_params: dict[str, any] = ( | |
| clp_db_connection_params: dict[str, typing.Any] = ( |
🤖 Prompt for AI Agents
In
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
around line 276, the type hint uses the built-in any (function) which can
confuse type checkers; replace it with typing.Any (or import Any from typing and
use dict[str, Any]) — update the hint to use typing.Any and add the appropriate
import if needed.
There was a problem hiding this comment.
This will be fixed in another PR.
There was a problem hiding this comment.
@haiqi96, understood! Thanks for letting me know this will be addressed in a separate PR. That makes sense to keep the scope of this PR focused on the logging improvements.
🧠 Learnings used
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (7)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)
197-199: Avoid bare except; don’t catch BaseExceptionCatching BaseException will swallow SystemExit/KeyboardInterrupt. Limit to Exception.
- except: + except Exception: logger.exception("Failed to load config.") return -1
253-259: Incorrect type/default for end_ts; make begin_ts/end_ts Optional[int] with None defaultsThe current signature sets a typing object as the default value and types begin_ts as int, while your CLI makes it optional. This causes inconsistent behaviour and type hints.
-def _find_archives( - archives_dir: Path, - database_config: Database, - dataset: typing.Optional[str], - begin_ts: int, - end_ts: int = typing.Optional[int], -) -> int: +def _find_archives( + archives_dir: Path, + database_config: Database, + dataset: typing.Optional[str], + begin_ts: typing.Optional[int] = None, + end_ts: typing.Optional[int] = None, +) -> int:
260-269: Docstring return code is inconsistent with implementationDoc says “1 on failure” but function returns -1 on failure; align the docstring or the code. The rest of the module uses -1 for failures.
- :return: 0 on success, 1 on failure. + :return: 0 on success, -1 on failure.
283-294: Handle optional begin_ts/end_ts correctly in the SELECT; avoid passing None as a bound paramRight now the query always binds begin_ts, which may be None; that can yield no rows or DB errors. Build WHERE clauses conditionally.
- query_params: typing.List[int] = [begin_ts] - query: str = ( - f""" - SELECT id FROM `{get_archives_table_name(table_prefix, dataset)}` - WHERE begin_timestamp >= %s - """ - ) - if end_ts is not None: - query += " AND end_timestamp <= %s" - query_params.append(end_ts) + query_params: typing.List[typing.Any] = [] + query: str = f"SELECT id FROM `{get_archives_table_name(table_prefix, dataset)}`" + where_clauses: typing.List[str] = [] + if begin_ts is not None: + where_clauses.append("begin_timestamp >= %s") + query_params.append(begin_ts) + if end_ts is not None: + where_clauses.append("end_timestamp <= %s") + query_params.append(end_ts) + if where_clauses: + query += " WHERE " + " AND ".join(where_clauses)Optionally, tweak the “No archives found…” message to be generic when no time filters are provided.
340-343: Second occurrence of built-in any in type hintMirror the earlier fix to use typing.Any.
- clp_db_connection_params: dict[str, any] = database_config.get_clp_connection_params_and_type( + clp_db_connection_params: dict[str, typing.Any] = database_config.get_clp_connection_params_and_type( True )
389-397: Protect disk deletion with try/except and fail gracefullyA failed rmtree will raise and abort the entire operation without a clear message per archive. Handle errors per archive.
- for archive_id in archive_ids: + for archive_id in archive_ids: archive_path = archive_output_dir / archive_id if not archive_path.is_dir(): logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.") continue logger.info(f"Deleting archive {archive_id} from disk.") - shutil.rmtree(archive_path) + try: + shutil.rmtree(archive_path) + except Exception as e: + logger.error(f"Failed to delete archive directory {archive_path}: {e}") + return -1
34-34: Prefer logger name name over fileUsing module name is the common practice and integrates better with hierarchical logging.
-logger: logging.Logger = logging.getLogger(__file__) +logger: logging.Logger = logging.getLogger(__name__)
♻️ Duplicate comments (3)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)
275-276: Type hint uses built-in any; use typing.AnyThis avoids type-checker confusion.
- clp_db_connection_params: dict[str, any] = ( + clp_db_connection_params: dict[str, typing.Any] = ( database_config.get_clp_connection_params_and_type(True) )
98-103: Align flag naming and consider aliasing --debug (no short -d) for compatibilityYou define --verbose/-v here but the PR summary and wrapper talk about --debug; ensure consistency across modules to avoid confusion. If you want backwards compatibility without colliding with --dataset, alias the long flag only.
Apply this diff to alias --debug to the same dest (do not add -d to avoid collision with --dataset):
- args_parser.add_argument( - "--verbose", - "-v", - action="store_true", - help="Enable debug logging.", - ) + args_parser.add_argument( + "--verbose", + "--debug", + "-v", + dest="verbose", + action="store_true", + help="Enable debug logging.", + )
185-188: Fix crash: parsed_args.debug doesn’t exist; use parsed_args.verboseThis will raise AttributeError at runtime. Use the correct attribute and guard via getattr.
- if parsed_args.debug: + if getattr(parsed_args, "verbose", False): logger.setLevel(logging.DEBUG) else: logger.setLevel(logging.INFO)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
271-273: LGTM: dataset-context logging is clear and user-friendlyThe dataset suffix improves traceability of operations.
337-339: LGTM: dataset-context logging for deletions improves clarityGood addition; mirrors the find path.
| args_parser.add_argument( | ||
| "--verbose", | ||
| "-v", | ||
| action="store_true", | ||
| help="Enable debug logging.", | ||
| ) |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Repository-wide consistency check: align flag usage and forwarding
Ensure the wrapper forwards --verbose (not --debug) and checks parsed_args.verbose, or alias --debug in both places as suggested.
Run this script to find and align all occurrences:
Also applies to: 185-188
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Searching for mismatches of verbose/debug flags and attributes..."
rg -n --hidden --no-ignore-vcs $'--debug\\b|parsed_args\\.debug\\b|--verbose\\b|parsed_args\\.verbose\\b'
echo
echo "Suggestion: in wrapper(s), forward --verbose to the native script and use parsed_args.verbose for logging level checks."Length of output: 752
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Showing target file (if present): components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py"
if [ -f components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py ]; then
sed -n '1,260p' components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py || true
else
echo "File not found: components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py"
fi
echo
echo "Repository-wide search for --debug / --verbose usages (including parsed_args.debug / parsed_args.verbose)..."
rg -n --hidden --no-ignore-vcs \
-e '--debug\b' \
-e 'parsed_args\.debug\b' \
-e '--verbose\b' \
-e 'parsed_args\.verbose\b' \
-e '"--debug"' \
-e '"--verbose"' \
|| true
echo
echo "Also searching for add_argument calls mentioning debug/verbose..."
rg -n --hidden --no-ignore-vcs -e "add_argument\([^)]*--debug" -e "add_argument\([^)]*--verbose" || trueLength of output: 9532
Align --verbose/--debug flag usage across archive_manager scripts
The native script defines --verbose/-v but both scripts check parsed_args.debug and the wrapper forwards --debug — unify them.
Files to change
- components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
- Replace checks of parsed_args.debug with parsed_args.verbose (around the parse_args/logger setup — search for "if parsed_args.debug").
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
- Replace parsed_args.debug -> parsed_args.verbose (lines ~157 and ~237).
- Forward --verbose (not --debug) when invoking the native script (change archive_manager_cmd.append("--debug") -> archive_manager_cmd.append("--verbose")).
- Confirm add_argument("--verbose", "-v", ...) is present (line ~68) or add a compat alias if needed.
Suggested minimal diffs
- native file
- parsed_args: argparse.Namespace = args_parser.parse_args(argv[1:])
- if parsed_args.debug:
- logger.setLevel(logging.DEBUG)
+ parsed_args: argparse.Namespace = args_parser.parse_args(argv[1:])
+ if parsed_args.verbose:
+ logger.setLevel(logging.DEBUG)
else:
logger.setLevel(logging.INFO)
- wrapper file (forwarding)
- if parsed_args.debug:
- logger.setLevel(logging.DEBUG)
+ if parsed_args.verbose:
+ logger.setLevel(logging.DEBUG)
- archive_manager_cmd.append("--debug")
+ archive_manager_cmd.append("--verbose")
Optional: if you need backward compatibility, add an alias:
args_parser.add_argument("--debug", action="store_true", dest="verbose", help="legacy alias for --verbose")
…_manager.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (4)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
186-195: Log the dataset value (and when default is applied)PR objective calls for explicitly logging the dataset used. Emit a message when defaulting and when a dataset is provided.
- if StorageEngine.CLP_S == storage_engine: - dataset = CLP_DEFAULT_DATASET_NAME if dataset is None else dataset + if StorageEngine.CLP_S == storage_engine: + if dataset is None: + dataset = CLP_DEFAULT_DATASET_NAME + logger.info(f"No dataset specified; defaulting to `{dataset}`.") + else: + logger.info(f"Using dataset `{dataset}`.") try: clp_db_connection_params = clp_config.database.get_clp_connection_params_and_type(True) validate_dataset_name(clp_db_connection_params["table_prefix"], dataset) except Exception as e: logger.error(e) return -1components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)
253-260: Bug: invalid type/default for end_ts in _find_archives signatureend_ts is annotated as int but defaulted to typing.Optional[int], which is a typing object. This will break comparisons and SQL param handling. Use Optional[int] with a None default.
def _find_archives( archives_dir: Path, database_config: Database, dataset: typing.Optional[str], begin_ts: int, - end_ts: int = typing.Optional[int], + end_ts: typing.Optional[int] = None, ) -> int:
153-157: Typo in comment: “subcomand” → “subcommand”Minor nit for readability.
- # Delete by ID subcomand + # Delete by ID subcommand
23-31: Avoid importing CLI constants from the wrapper; move shared constants to a common modulenative.archive_manager importing from scripts.archive_manager creates cross-layer coupling. Consider placing shared constants (e.g., FIND_COMMAND, args names) in a dedicated shared module (e.g., clp_package_utils.scripts.common.archive_manager_consts) consumed by both to reduce import cycles and improve maintainability.
♻️ Duplicate comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (2)
274-279: Type hint uses built-in any; use typing.Any insteadReplace any with typing.Any for correctness with type checkers. This was flagged previously and still applies here.
- clp_db_connection_params: dict[str, any] = ( + clp_db_connection_params: dict[str, typing.Any] = ( database_config.get_clp_connection_params_and_type(True) )
338-343: Type hint uses built-in any; use typing.Any insteadSame issue here; switch to typing.Any.
- clp_db_connection_params: dict[str, any] = database_config.get_clp_connection_params_and_type( + clp_db_connection_params: dict[str, typing.Any] = database_config.get_clp_connection_params_and_type( True )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(5 hunks)components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (3)
156-160: LGTM: logger level respects --verboseLogger level toggling is correct and aligns with the new flag.
182-183: LGTM: clearer storage-type error messageMessage reads well and is user-friendly.
237-239: LGTM: forwarding --verbose to native scriptFlag propagation is consistent with the native CLI.
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (3)
98-103: LGTM: adds --verbose/-v to the native CLIFlag definition and help text look good.
185-188: LGTM: set logger level from --verboseCorrectly honours verbose mode.
271-273: LGTM: dataset-specific context in logsClear, actionable logging that indicates the dataset context.
| args_parser.add_argument( | ||
| "--verbose", | ||
| "-v", | ||
| action="store_true", | ||
| help="Enable debug logging.", | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Verbose flag addition is correct; consider adding a hidden --debug alias for back-compat
Using --verbose/-v avoids the -d confusion. If prior users tried --debug, add a hidden alias mapping to verbose.
args_parser.add_argument(
"--verbose",
"-v",
action="store_true",
help="Enable debug logging.",
)
+ # Back-compat alias; map --debug -> --verbose without exposing in help
+ args_parser.add_argument("--debug", dest="verbose", action="store_true", help=argparse.SUPPRESS)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| args_parser.add_argument( | |
| "--verbose", | |
| "-v", | |
| action="store_true", | |
| help="Enable debug logging.", | |
| ) | |
| args_parser.add_argument( | |
| "--verbose", | |
| "-v", | |
| action="store_true", | |
| help="Enable debug logging.", | |
| ) | |
| # Back-compat alias; map --debug -> --verbose without exposing in help | |
| args_parser.add_argument("--debug", dest="verbose", action="store_true", help=argparse.SUPPRESS) |
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
around lines 67 to 72, add a hidden backwards-compatible alias --debug that maps
to the same boolean flag as --verbose/-v; implement this by adding a second
add_argument for "--debug" with action="store_true", dest="verbose" (so it sets
the same variable) and help set to argparse.SUPPRESS so it won’t appear in help
output.
junhaoliao
left a comment
There was a problem hiding this comment.
for the title, since this is an improvement, would it be better to use the feat keyword in the title?
how about
feat(package): Add verbose logging option to archive_manager and improve error handling.
archive manager scripts.…ove error handling. (y-scope#1173) Co-authored-by: Junhao Liao <junhao@junhao.ca>
Description
This PR slight improves the log messages in archive manager.
The 2 and 3 can also be a general strategy to apply to issues described in #1081. We can introduce a few helper functions for reuse, for example
Checklist
breaking change.
Validation performed
Manually tested the behavior
Summary by CodeRabbit
New Features
Improvements