fix(package): Fail job and report failure to user for compression jobs that encounter at least one invalid input path (fixes #308).#1125
Conversation
…atus_msg to user on compression job success
WalkthroughThe scheduler’s filesystem input processing now validates and buffers paths while collecting invalid-path messages. A new helper writes these messages to a per-job user log. The main search-and-schedule flow handles FS invalid paths, writes the log, marks jobs FAILED, broadens S3 error handling, and adjusts buffer flush/sequence. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant S as Scheduler
participant FS as _process_fs_input_paths
participant L as _write_failed_path_log
participant J as JobStore
rect rgba(230,245,255,0.6)
note over S: FS input handling (updated)
S->>FS: Validate and buffer FS paths
FS-->>S: invalid_path_messages (List[str])
alt invalid_path_messages not empty
S->>L: Write failed paths log (messages, logs_dir, job_id)
L-->>S: log_path or None
S->>J: Update job -> FAILED with status_msg (log_path or scheduler logs path)
S->>J: Commit
S-->>S: Continue to next job
else no invalid messages
S->>S: Flush buffer, collect tasks/partitions
note over S: Proceed with scheduling flow
end
end
sequenceDiagram
autonumber
participant S as Scheduler
participant S3 as _process_s3_input
participant J as JobStore
rect rgba(255,240,230,0.6)
note over S: S3 input handling (broadened)
S->>S3: Process S3 input
alt success
S-->>S: Continue normal scheduling
else exception
S3-->>S: throw Error
S->>J: Update job -> FAILED with "S3 Failure: {err}"
S->>J: Commit
S-->>S: Continue to next job
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
Quick question: what would appear on webui if a job partially fail? Edit: I guess this is irrelevant now since a job will either pass or fail. |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.
Learnt from: gibber9809
PR: y-scope/clp#504
File: components/core/src/clp_s/search/kql/CMakeLists.txt:29-29
Timestamp: 2024-10-22T15:36:04.655Z
Learning: When reviewing pull requests, focus on the changes within the PR and avoid commenting on issues outside the scope of the PR.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
📚 Learning: in the clp-package compression flow, path validation and error handling is performed at the schedule...
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: in `fs_compression_task.py`, when refactoring the archive processing loop in python, the `src_archiv...
Learnt from: haiqi96
PR: y-scope/clp#634
File: components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py:0-0
Timestamp: 2024-12-16T21:49:06.483Z
Learning: In `fs_compression_task.py`, when refactoring the archive processing loop in Python, the `src_archive_file.unlink()` operation should remain outside of the `process_archive` function.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (4)
100-100: Inconsistency between PR objectives and implementation.The PR objectives state the goal is to "report all partial failures during otherwise-successful compression jobs" and mention adding an
appendparameter toupdate_compression_job_metadata()to concatenate error messages. However, the implementation now fails entire jobs when any path is invalid, contradicting the stated objective of allowing partial failures.Likely an incorrect or invalid review comment.
127-127: Consistent error handling but may be overly aggressive.Similar to line 112, this raises a
ValueErrorfor invalid internal paths within directories, causing the entire job to fail. While consistent with the top-level path handling, this approach prevents processing any valid paths when even one invalid path is encountered.⛔ Skipped due to learnings
Learnt from: AVMatthews PR: y-scope/clp#543 File: components/core/src/clp_s/clp-s.cpp:256-258 Timestamp: 2024-10-03T14:47:22.293Z Learning: Error handling for file path validation is performed in the `JsonParser` constructor.Learnt from: AVMatthews PR: y-scope/clp#543 File: components/core/src/clp_s/clp-s.cpp:256-258 Timestamp: 2024-10-08T15:52:50.753Z Learning: Error handling for file path validation is performed in the `JsonParser` constructor.Learnt from: haiqi96 PR: y-scope/clp#651 File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0 Timestamp: 2025-01-16T16:58:43.190Z Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.Learnt from: AVMatthews PR: y-scope/clp#543 File: components/core/src/clp_s/JsonParser.cpp:654-656 Timestamp: 2024-10-08T15:52:50.753Z Learning: In the `parse_kv_log_event` function, catch-all exception handlers are acceptable because exceptions are intentionally rethrown to be handled gracefully at a higher level.Learnt from: AVMatthews PR: y-scope/clp#543 File: components/core/src/clp_s/JsonParser.cpp:654-656 Timestamp: 2024-10-07T21:18:02.510Z Learning: In the `parse_kv_log_event` function, catch-all exception handlers are acceptable because exceptions are intentionally rethrown to be handled gracefully at a higher level.Learnt from: haiqi96 PR: y-scope/clp#594 File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110 Timestamp: 2024-11-15T16:21:52.122Z Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
210-223: Verify partial-failure behaviour against PR objectives and issue #308The current logic aborts the entire compression job on any invalid input path, but the PR’s stated goal is to record individual path failures and allow the overall job to succeed. We need to confirm the intended semantics before merging.
Please:
- Review
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(lines 210–223) to ensure invalid paths are logged and marked as failed, without terminating the whole job.- Consult the CLP issue #308 (internal tracker) or related PR documentation to clarify whether “partial success” behaviour was specified.
- Update the error-handling logic so that:
- Each bad path is reported in job metadata.
- The job status reflects a partial-success state rather than outright failure, if that aligns with the intended fix.
112-112: [file content truncated for brevity]
Let's focus on the function that handles compression scheduling. We'll search fordef schedule_compressionor similar.
| except ValueError as ex: | ||
| logger.error(str(ex)) | ||
| continue | ||
| raise ValueError(str(ex)) |
There was a problem hiding this comment.
We are duplicating the try-catch in both this method and it's caller, which is unnecessary. In addition, if we are going to log an exception, we can use logger.exception which will print some fancy stack trace for you.
@kirkrodrigues
Not an expert of error handling so want to get your opinion. I feel it's not a good practice to catch an error and re-raise it. We can go either
- Remove the try-except block in
_process_fs_input_pathsso it will throw the exception, and let caller handle the exception with try-catch. What Quinn currently have in the caller code should work. - Update
_process_fs_input_pathsrun try-catch, but return a True/False to indicate success or failure
I feel we should go with 2? but 1 isn't that bad either imo.
There was a problem hiding this comment.
Nvm, now I realized we also need to report the path that causes the error. which gives a good reason to re-raise with a different error message.
Can you raise an error message saying something like "blabla file is invalid blabla" and store that as our status message? (it should be within the 512 char limit if we only report the first file that causes the error)
@kirkrodrigues wonder if you see any better way to propagate the error message? Alternatively, the method can return a tuple of [Bool, Optional[str]] (success_flag, failure_flag), but I don't feel it is any better.
There was a problem hiding this comment.
I've changed it so that the first invalid file path is now reported; @kirkrodrigues let us know if you see a better way to propagate the error message.
There was a problem hiding this comment.
now I think about it again, we don't really need to catch in _process_fs_input_paths since we are just rethrowing the error?
we can simply let caller at line 211 to catch the exception (and I think you don't even need to change the caller try except)
I will ask Kirk to confirm
…ess/compression_scheduler.py Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
| except ValueError as ex: | ||
| logger.error(str(ex)) | ||
| continue | ||
| raise ValueError(str(ex)) |
There was a problem hiding this comment.
now I think about it again, we don't really need to catch in _process_fs_input_paths since we are just rethrowing the error?
we can simply let caller at line 211 to catch the exception (and I think you don't even need to change the caller try except)
I will ask Kirk to confirm
|
@haiqi96 and @kirkrodrigues, here are my thoughts on what I should change re. catching the error: I could omit OR I could omit Not sure about the whole "double-raising" thing; I don't know how else to pass the error message from |
|
Sorry, just getting to this now. I feel failing on the first invalid path is a bit harsh since a user might be compressing a directory with files that change or get deleted (e.g., files in Ideally, we should report a partial error with the job and what paths failed, so that the user could retry just those paths if they like (and maybe one day we can even support retrying the files at the click of a button). However, before then, I think a reasonable compromise is to log the failing paths and inform the user to check the logs for which paths failed. What do y'all think? |
I am not against it. in that case, the
@kirkrodrigues any comment? |
Sure |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
267-278: Revise redundant/ambiguous failure branch when no tasks are created.Given early failure on invalid paths, this branch likely means “no compressible inputs found” (e.g., only empty dirs). Update the message for clarity, or remove the branch if it’s truly unreachable.
- "status_msg": "invalid input path", + "status_msg": "No compressible inputs found",
♻️ Duplicate comments (2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
114-119: Avoid catch-and-rethrow; preserve original traceback and remove duplicate logging.Catching ValueError, logging, and re-raising a new ValueError loses the original traceback and duplicates logs. Let the original exception propagate; the caller already logs with logger.exception. This mirrors prior feedback.
- except ValueError as ex: - logger.error(str(ex)) - raise ValueError(str(ex)) + except ValueError: + # Let caller log and handle; preserve original traceback. + raise
127-134: Same as above for internal paths.Propagate the original ValueError without re-wrapping and avoid duplicate logging.
- except ValueError as ex: - logger.error(str(ex)) - raise ValueError(str(ex)) + except ValueError: + raise
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
128-137: Also collect invalid internal paths from directory traversal.Same reasoning as above; don't just flip a flag.
Apply:
- except ValueError as ex: - logger.error(str(ex)) - paths_ok = False + except ValueError as ex: + logger.error(str(ex)) + invalid_paths.append(str(internal_path)) continue
241-254: Bound status_msg to column size; preserve details.Truncate to avoid DB errors and keep the salient part.
Apply:
- update_compression_job_metadata( + update_compression_job_metadata( db_cursor, job_id, { "status": CompressionJobStatus.FAILED, - "status_msg": f"S3 Failure: {err}", + "status_msg": f"S3 Failure: {err}"[:512], }, )
♻️ Duplicate comments (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
224-241: Fail the job early with a per-job invalid-paths log; don't proceed to schedule tasks.Current code only sets a note and continues, which contradicts the PR objective and the reviewers’ consensus to fail on invalid inputs and point users to a generated log.
Apply:
- if input_type == InputType.FS.value: - paths_ok = _process_fs_input_paths(input_config, paths_to_compress_buffer) - if not paths_ok: - error_msg = ( - "NOTE: At least one input path could not be processed. " - "Check the compression_scheduler log for more details." - ) - update_compression_job_metadata( - db_cursor, - job_id, - { - "status_msg": error_msg, - }, - ) - db_conn.commit() + if input_type == InputType.FS.value: + invalid_paths = _process_fs_input_paths(input_config, paths_to_compress_buffer) + if invalid_paths: + logs_dir = Path(os.getenv("CLP_LOGS_DIR", ".")) + invalid_log = logs_dir / f"compression_job_{job_id}_invalid_paths.txt" + try: + invalid_log.write_text("\n".join(invalid_paths), encoding="utf-8") + except Exception: + logger.exception("Failed to write invalid paths log") + error_msg = f"Invalid input path(s). See {invalid_log.name}." + update_compression_job_metadata( + db_cursor, + job_id, + { + "status": CompressionJobStatus.FAILED, + "status_msg": error_msg[:512], + }, + ) + db_conn.commit() + continue
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py(1 hunks)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/native/compress.pycomponents/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/native/compress.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
components/clp-py-utils/clp_py_utils/compression.py (2)
validate_path_and_get_info(94-128)add_file(20-25)components/job-orchestration/job_orchestration/scheduler/compress/partition.py (2)
add_file(52-57)add_empty_directory(59-62)
🔇 Additional comments (2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
272-281: Message clarity LGTM; keep as a safety net.“All provided paths were invalid.” is clear. If you adopt early-fail on first invalid detection, this block still protects against edge cases where none yielded tasks but no explicit invalids were captured.
144-145: Return the list of invalid paths.Apply:
- return paths_ok + return invalid_pathsLikely an incorrect or invalid review comment.
|
@haiqi96 and @kirkrodrigues, I’ve pushed a new implementation. The core elements of the new strategy are a combination of your proposals, as follows:
That being said, I felt that part of the solution that Haiqi proposed, i.e.
could instead be fulfilled by the logging infrastructure already used in this function (the failed path is already being recorded to the Let me know what you think. |
I don't agree with the approach of logging the paths in compression_scheduer.
Because of the points above, it would makes sense to generate a separate log file for clear/efficient error reporting. |
@haiqi96 that's true. But you also mention:
... I could rig it so that it prints out the specific path of the As for this point:
that's fair. @haiqi96, I guess my objection to your proposal that " Also, see my comment re. the "immediate failure" vs "continue compression" discussion. |
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-08-25T06:29:59.610Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
components/clp-py-utils/clp_py_utils/compression.py (2)
validate_path_and_get_info(94-128)add_file(20-25)components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: package-image
🔇 Additional comments (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
121-126: Confirm whether validate_path_and_get_info already logs; avoid duplicate error entries.If
validate_path_and_get_infodoesn’t log, keepinglogger.error(error_msg)here is correct. If it does, this will double-log per invalid path. Please confirm and adjust.Also applies to: 141-145
| # if len(tasks) == 0: | ||
| # logger.warning(f"No tasks were created for job {job_id}") | ||
| # update_compression_job_metadata( | ||
| # db_cursor, | ||
| # job_id, | ||
| # { | ||
| # "status": CompressionJobStatus.FAILED, | ||
| # "status_msg": "All provided paths were invalid.", | ||
| # }, | ||
| # ) | ||
| # db_conn.commit() | ||
| # continue |
There was a problem hiding this comment.
@kirkrodrigues do you think that this commented section can be deleted? I have confirmed (best that I can) that this if statement will only be entered if all paths are invalid (there are no other conditions that would cause len(tasks) to be 0)
kirkrodrigues
left a comment
There was a problem hiding this comment.
For the PR title, how about:
fix(package): Fail job and report failure to user for compression jobs that encounter at least one invalid input path (fixes #308).
@quinntaylormitchell can you re-test before I approve?
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(5 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-09-15T22:20:40.750Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-08-25T06:29:59.610Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-09-19T18:28:26.747Z
Learnt from: gibber9809
PR: y-scope/clp#1169
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:463-469
Timestamp: 2025-09-19T18:28:26.747Z
Learning: In the compression scheduler (components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py), the SIGTERM handling uses a coarse-grained approach that only checks received_sigterm before calling search_and_schedule_new_tasks. The maintainers consider the race condition where SIGTERM arrives mid-execution to be benign, as scheduling a few extra tasks during shutdown is acceptable and will be handled by cleanup mechanisms.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
components/clp-py-utils/clp_py_utils/compression.py (2)
validate_path_and_get_info(94-128)add_file(20-25)components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (4)
9-9: Typed imports expansion looks good.This aligns with the new return types and helper signature.
111-151: Path gating + full error collection is consistent with fail‑on‑any‑invalid.The
paths_okgate prevents unnecessary buffering once failure is inevitable, while still enumerating all invalid entries for the per‑job log. Good trade‑off.
175-205: Don’t silently swallow log‑write failures; emit warnings and dedupe messages.If directory/file creation fails, we return None without any trace. Log at warning level and dedupe to keep user logs concise.
def _write_failed_path_log( invalid_path_messages: List[str], logs_directory: Path, job_id: Any ) -> Optional[Path]: @@ - try: - user_logs_dir.mkdir(parents=True, exist_ok=True) - except Exception: - return None + try: + user_logs_dir.mkdir(parents=True, exist_ok=True) + except Exception as e: + logger.warning(f"Failed to create user logs dir {user_logs_dir}: {e}") + return None @@ - try: - with log_path.open("w", encoding="utf-8") as f: + try: + with log_path.open("w", encoding="utf-8") as f: timestamp = datetime.datetime.now().isoformat(timespec="seconds") f.write(f"Failed input paths log.\nGenerated at {timestamp}.\n\n") - for msg in invalid_path_messages: + for msg in dict.fromkeys(invalid_path_messages): # preserve order; remove dups f.write(f"{msg.rstrip()}\n") - except Exception: - return None + except Exception as e: + logger.warning(f"Failed to write user log {log_path}: {e}") + return None
265-289: Point users to the concrete scheduler log file in the fallback path.The current message references a directory; link the actual file for better discoverability.
- if user_log_path is None: - error_msg = base_msg + ( - f" Check the compression scheduler logs in {clp_config.logs_directory} for" - " more details." - ) + if user_log_path is None: + scheduler_log = (Path(os.getenv("CLP_LOGS_DIR", str(clp_config.logs_directory))) + / "compression_scheduler.log") + error_msg = f"{base_msg} Check {scheduler_log} for more details."
…s that encounter at least one invalid input path (fixes y-scope#308). (y-scope#1125) Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com> Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Description
This PR fixes issue #308 by changing the reporting system that provides the user with information about the status of their compression jobs.
Any compression job that encounters at least one invalid path will be failed. The user will be notified that at least one invalid path has been encountered, and they will be directed to a log file that communicates all these invalid paths.
Checklist
breaking change.
Validation performed
Compressed three arrangements of log files:
The correct log file was produced in all cases, and the user was notified correctly in all cases.
Summary by CodeRabbit
New Features
Bug Fixes