fix(clp-package): Write compression task failure errors to a log, and store the log path (instead of the errors) in status_msg (fixes #716).#1425
Conversation
WalkthroughReplaced Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Scheduler
participant Search as "search_and_schedule_new_tasks"
participant Poll as "poll_running_jobs"
participant Logger as "_write_user_failure_log"
rect rgb(235,245,255)
Scheduler->>Search: validate inputs
alt invalid inputs
Search->>Logger: _write_user_failure_log("Failed input paths log.", content, logs_directory, job_id, "failed_paths")
Logger-->>Search: returns relative path or None
alt returned path
Search-->>Scheduler: raise/error referencing relative log path
else None
Search-->>Scheduler: raise/error indicating log write failure
end
end
end
rect rgb(245,255,235)
Scheduler->>Poll: poll_running_jobs(logs_directory, db_conn, db_cursor)
Poll->>Poll: detect task failures -> collect List[str]
alt task failures
Poll->>Logger: _write_user_failure_log("Compression task errors.", content, logs_directory, job_id, "task_errors")
Logger-->>Poll: returns relative path or None
alt returned path
Poll-->>Scheduler: status/error referencing relative log path
else None
Poll-->>Scheduler: status/error indicating log write failure
end
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related issues
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (3)📓 Common learnings📚 Learning: 2025-01-16T16:58:43.190ZApplied to files:
📚 Learning: 2025-09-28T15:00:22.170ZApplied to files:
🧬 Code graph analysis (1)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
🪛 Ruff (0.14.2)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py187-187: Dynamically typed expressions (typing.Any) are disallowed in (ANN401) 206-206: Do not catch blind exception: (BLE001) 207-207: Use Replace with (TRY400) 213-213: (DTZ005) 217-217: Do not catch blind exception: (BLE001) 218-218: Use Replace with (TRY400) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
213-242: Log exceptions before returning None.Similar to
_write_failed_compression_log, the exception handlers should log errors before returningNone.Apply this diff:
user_logs_dir = Path(logs_directory) / "user" try: user_logs_dir.mkdir(parents=True, exist_ok=True) - except Exception: + except Exception as e: + logger.error(f"Failed to create user logs directory: {e}") return None log_path = user_logs_dir / f"failed_paths_{job_id}.txt" try: with log_path.open("w", encoding="utf-8") as f: timestamp = datetime.datetime.now().isoformat(timespec="seconds") f.write(f"Failed input paths log.\nGenerated at {timestamp}.\n\n") for msg in invalid_path_messages: f.write(f"{msg.rstrip()}\n") - except Exception: + except Exception as e: + logger.error(f"Failed to write invalid paths log: {e}") return None
235-235: Consider adding timezone to datetime.now().For consistency with the suggestion in
_write_failed_compression_log, consider usingdatetime.datetime.now(datetime.timezone.utc).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/clp-package-utils/clp_package_utils/general.py(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/compress.py(3 hunks)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(576-769)components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
🪛 Ruff (0.14.0)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
183-183: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
198-198: Do not catch blind exception: Exception
(BLE001)
204-204: datetime.datetime.now() called without a tz argument
(DTZ005)
207-207: Do not catch blind exception: Exception
(BLE001)
214-214: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py
94-94: Logging statement uses f-string
(G004)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
🔇 Additional comments (6)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (2)
93-94: LGTM! Environment variable expansion enables macro paths.The expansion of environment variables in
status_msgcorrectly enables the use of macro paths like${CLP_LOGS_DIR}/...introduced by this PR. SinceCLP_LOGS_DIRis set at line 247 before job handling, it will be available for expansion here.Note: The static analysis hint about f-strings in logging (G004) is a minor style preference and doesn't affect functionality.
247-247: LGTM! Setting CLP_LOGS_DIR enables macro path expansion.Setting
CLP_LOGS_DIRafter loading the configuration ensures the environment variable is available for macro expansion when the job fails and the status message is logged (line 93).components/clp-package-utils/clp_package_utils/general.py (1)
436-439: LGTM! Defensive check prevents double-appending hostname.The conditional check ensures that the hostname is only appended to
data_directoryandlogs_directoryif it's not already the final path component. This prevents issues where the configuration might already include the hostname or where the function is called multiple times.components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (3)
307-318: LGTM! Macro path construction handles 512-char limit.The use of macro paths (
${CLP_LOGS_DIR}/...) with relative log paths is a clean solution to avoid exceeding the 512-characterstatus_msglimit. The fallback to the scheduler log when user log writing fails is appropriate.
417-417: LGTM! Function signature updated to accept clp_config.The addition of the
clp_configparameter is necessary for accessinglogs_directorywhen writing failure logs.
560-560: LGTM! Updated function call to pass clp_config.The call to
poll_running_jobsnow correctly passesclp_configas required by the updated function signature.
junhaoliao
left a comment
There was a problem hiding this comment.
Path resolution and handling may be simplified once the Docker Compose changes in #1178 are merged, so it might be worth holding off on these changes
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (3)
205-205: Add timezone to datetime.now().For consistency and to avoid timezone-related issues, use
datetime.datetime.now(datetime.timezone.utc)instead ofdatetime.datetime.now().Apply this diff:
with log_path.open("w", encoding="utf-8") as f: - timestamp = datetime.datetime.now().isoformat(timespec="seconds") + timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds") f.write(f"Failed compression job log.\nGenerated at {timestamp}.\n\n")
208-210: Consider using logger.error instead of logger.warning for critical failures.When the compression failure log cannot be written, this prevents users from seeing detailed failure information. Using
logger.errorwould better reflect the severity and make debugging easier.Apply this diff:
try: with log_path.open("w", encoding="utf-8") as f: timestamp = datetime.datetime.now().isoformat(timespec="seconds") f.write(f"Failed compression job log.\nGenerated at {timestamp}.\n\n") f.write(f"{status_msg.rstrip()}\n") except Exception as e: - logger.warning(f"Failed to write compression failure log: {e}") + logger.error(f"Failed to write compression failure log: {e}") return None
198-200: Consider using logger.error instead of logger.warning for critical failures.When the user logs directory cannot be created, this prevents users from seeing detailed failure information. Using
logger.errorwould better reflect the severity and make debugging easier.Apply this diff:
try: user_logs_dir.mkdir(parents=True, exist_ok=True) except Exception as e: - logger.warning(f"Failed to create user logs directory: {e}") + logger.error(f"Failed to create user logs directory: {e}") return None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(576-769)components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
🪛 Ruff (0.14.0)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
183-183: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
198-198: Do not catch blind exception: Exception
(BLE001)
199-199: Logging statement uses f-string
(G004)
205-205: datetime.datetime.now() called without a tz argument
(DTZ005)
208-208: Do not catch blind exception: Exception
(BLE001)
209-209: Logging statement uses f-string
(G004)
216-216: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
231-231: Do not catch blind exception: Exception
(BLE001)
232-232: Logging statement uses f-string
(G004)
242-242: Do not catch blind exception: Exception
(BLE001)
243-243: Logging statement uses f-string
(G004)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
311-322: LGTM! Clean error handling with appropriate fallback.The macro path construction and fallback to the scheduler log when user log creation fails is well-implemented. The error message clearly directs users to the relevant log file.
421-421: LGTM! Function signature correctly updated.Adding
clp_configparameter enables access tologs_directoryfor writing user logs, which aligns with the PR objectives.
/var/log/user and store log path in status_msg (fixes #716).
| clp_config.validate_logs_input_config() | ||
| clp_config.validate_logs_dir() | ||
| clp_config.database.load_credentials_from_env() | ||
| os.environ["CLP_LOGS_DIR"] = str(clp_config.logs_directory) |
There was a problem hiding this comment.
I believe that the CLP_LOGS_DIR environment variable has already been set in the compression worker container in
There was a problem hiding this comment.
not sure why, but when I remove this line, the "CLP_LOGS_DIR" in the path stored in status_msg can't be expanded with os.path.expandvars(), and we get
Compression failed. One or more compression tasks failed. See ${CLP_LOGS_DIR}/user/failed_compression_log_3.txt for more details.
as our ERROR output. any thoughts as to why that might be happening?
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
🪛 Ruff (0.14.1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
10-10: typing.Dict is deprecated, use dict instead
(UP035)
10-10: typing.List is deprecated, use list instead
(UP035)
10-10: typing.Set is deprecated, use set instead
(UP035)
184-184: Use X | Y for type annotations
Convert to X | Y
(UP007)
186-186: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
206-206: Do not catch blind exception: Exception
(BLE001)
207-207: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
213-213: datetime.datetime.now() called without a tz argument
(DTZ005)
220-220: Do not catch blind exception: Exception
(BLE001)
221-221: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (5)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (5)
10-10: LGTM! Import addition is necessary.The
Unionimport is required for the new_write_user_failure_logfunction signature.
289-304: LGTM! Error handling is appropriate.The invalid path handling correctly writes the user log, validates the result, and provides a clear error message referencing the relative log path within the configured logs directory. The RuntimeError propagation is appropriate since
_write_user_failure_logalready logs the failure details.
403-403: LGTM! Signature update aligns with past feedback.The addition of
logs_directory: Pathparameter is necessary for writing user failure logs and aligns with the agreed approach to pass only the required parameter instead of the entire config object.Based on learnings.
454-470: LGTM! Consistent error handling pattern.The failed compression job handling mirrors the approach used for invalid paths (lines 289-304), maintaining consistency across failure scenarios. The error message clearly directs users to the log file location within their configured logs directory.
553-553: LGTM! Call site properly updated.The call to
poll_running_jobscorrectly passesclp_config.logs_directoryto match the updated function signature.
| def _write_user_failure_log( | ||
| content: Union[str, List[str]], | ||
| logs_directory: Path, | ||
| job_id: Any, | ||
| filename_prefix: str, | ||
| header_title: str, | ||
| ) -> Optional[Path]: | ||
| """ | ||
| Writes the error messages in `invalid_path_messages` to a log file, | ||
| `{logs_directory}/user/failed_paths_{job_id}.txt`. The directory will be created if it doesn't | ||
| already exist. | ||
| :param invalid_path_messages: | ||
| Writes a user-visible failure log to `{logs_directory}/user/{filename_prefix}_{job_id}.txt`. The | ||
| /user directory will be created if it doesn't already exist. `content` may be either a single | ||
| string or a List of strings. | ||
|
|
||
| :param content: | ||
| :param logs_directory: | ||
| :param job_id: | ||
| :return: Path to the written log file or `None` if error is encountered. | ||
| :param filename_prefix: | ||
| :param header_title: | ||
| :return: Path to the written log file relative to `logs_directory`, or None on error. | ||
| """ | ||
|
|
||
| user_logs_dir = Path(logs_directory) / "user" | ||
| relative_log_path = Path("user") / f"{filename_prefix}_{job_id}.txt" | ||
| user_logs_dir = Path(logs_directory) / relative_log_path.parent | ||
| try: | ||
| user_logs_dir.mkdir(parents=True, exist_ok=True) | ||
| except Exception: | ||
| except Exception as e: | ||
| logger.error("Failed to create user logs directory: '%s' - %s", user_logs_dir, e) | ||
| return None | ||
|
|
||
| log_path = user_logs_dir / f"failed_paths_{job_id}.txt" | ||
| log_path = Path(logs_directory) / relative_log_path | ||
| try: | ||
| with log_path.open("w", encoding="utf-8") as f: | ||
| timestamp = datetime.datetime.now().isoformat(timespec="seconds") | ||
| f.write(f"Failed input paths log.\nGenerated at {timestamp}.\n\n") | ||
| for msg in invalid_path_messages: | ||
| f.write(f"{msg.rstrip()}\n") | ||
| except Exception: | ||
| f.write(f"{header_title}\nGenerated at {timestamp}.\n\n") | ||
| if isinstance(content, str): | ||
| f.write(f"{content.rstrip()}\n") | ||
| else: | ||
| for item in content: | ||
| f.write(f"{str(item).rstrip()}\n") | ||
| except Exception as e: | ||
| logger.error("Failed to write compression failure user log: '%s' - %s", log_path, e) | ||
| return None | ||
|
|
||
| return log_path | ||
| return relative_log_path |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Use logger.exception for exception handlers.
The exception handlers currently use logger.error, but logger.exception is preferred as it automatically includes the stack trace, which is valuable for debugging log writing failures.
Apply this diff:
try:
user_logs_dir.mkdir(parents=True, exist_ok=True)
except Exception as e:
- logger.error("Failed to create user logs directory: '%s' - %s", user_logs_dir, e)
+ logger.exception("Failed to create user logs directory: '%s'", user_logs_dir)
return None
log_path = Path(logs_directory) / relative_log_path
try:
with log_path.open("w", encoding="utf-8") as f:
timestamp = datetime.datetime.now().isoformat(timespec="seconds")
f.write(f"{header_title}\nGenerated at {timestamp}.\n\n")
if isinstance(content, str):
f.write(f"{content.rstrip()}\n")
else:
for item in content:
f.write(f"{str(item).rstrip()}\n")
except Exception as e:
- logger.error("Failed to write compression failure user log: '%s' - %s", log_path, e)
+ logger.exception("Failed to write compression failure user log: '%s'", log_path)
return NoneNote: logger.exception automatically includes the exception details, so the e parameter can be removed from the format string.
🧰 Tools
🪛 Ruff (0.14.1)
184-184: Use X | Y for type annotations
Convert to X | Y
(UP007)
186-186: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
206-206: Do not catch blind exception: Exception
(BLE001)
207-207: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
213-213: datetime.datetime.now() called without a tz argument
(DTZ005)
220-220: Do not catch blind exception: Exception
(BLE001)
221-221: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
lines 183-224: the except handlers use logger.error and pass the exception
object into the format string; change both to logger.exception and remove the
exception argument from the format string so the stack trace is automatically
logged (i.e., call logger.exception("Failed to create user logs directory:
'%s'", user_logs_dir) and logger.exception("Failed to write compression failure
user log: '%s'", log_path)).
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
206-207: Uselogger.exceptionfor automatic stack trace inclusion.The exception handlers currently use
logger.error, butlogger.exceptionautomatically includes the stack trace, which is valuable for debugging log writing failures.Apply this diff:
try: user_logs_dir.mkdir(parents=True, exist_ok=True) - except Exception as e: - logger.error("Failed to create user logs directory: '%s' - %s", user_logs_dir, e) + except Exception: + logger.exception("Failed to create user logs directory: '%s'", user_logs_dir) return None log_path = logs_directory / relative_log_path try: with log_path.open("w", encoding="utf-8") as f: timestamp = datetime.datetime.now().isoformat(timespec="seconds") f.write(f"{title}\nGenerated at {timestamp}.\n\n") for item in content: f.write(f"{item.rstrip()}\n") - except Exception as e: - logger.error("Failed to write compression failure user log: '%s' - %s", log_path, e) + except Exception: + logger.exception("Failed to write compression failure user log: '%s'", log_path) return NoneBased on learnings
Also applies to: 217-218
213-213: Add timezone todatetime.now()for consistency.For consistency and to avoid timezone-related issues, use
datetime.datetime.now(datetime.timezone.utc)instead ofdatetime.datetime.now().Apply this diff:
- timestamp = datetime.datetime.now().isoformat(timespec="seconds") + timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")Based on learnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(8 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
🪛 Ruff (0.14.2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
10-10: typing.Dict is deprecated, use dict instead
(UP035)
10-10: typing.List is deprecated, use list instead
(UP035)
10-10: typing.Set is deprecated, use set instead
(UP035)
186-186: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
206-206: Do not catch blind exception: Exception
(BLE001)
207-207: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
213-213: datetime.datetime.now() called without a tz argument
(DTZ005)
217-217: Do not catch blind exception: Exception
(BLE001)
218-218: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (5)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (5)
286-301: LGTM! Consistent error handling for invalid input paths.The error handling correctly writes the failure log and provides clear guidance to users about where to find detailed error information. The relative path approach ensures the status message stays within the 512-character limit.
400-400: LGTM! Parameter addition enables user log writing.Adding the
logs_directoryparameter enables the function to write user-facing failure logs, aligning with the PR's objectives.
411-411: LGTM! Improved error collection structure.Changing
error_messagefrom a string toList[str]enables cleaner collection of individual task errors and integrates well with the new_write_user_failure_logfunction.Also applies to: 428-428
451-466: LGTM! Consistent error handling for task failures.The error handling pattern mirrors the invalid input path handling (lines 286-301), writing detailed errors to a user log and providing clear guidance about the log location. The approach successfully avoids exceeding the 512-character limit for
status_msg.
549-549: LGTM! Correct parameter passing.The call site correctly passes
clp_config.logs_directoryto match the updated function signature.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (2)
411-429: Name reflects plurality.
error_messageis a list. Rename toerror_messagesfor clarity; update uses.- error_message: List[str] = [] + error_messages: List[str] = [] @@ - job_success = False - error_message.append(f"task {task_result.task_id}: {task_result.error_message}") + job_success = False + error_messages.append( + f"task {task_result.task_id}: {task_result.error_message}" + )
434-436: Log stack trace when result retrieval fails.Use
logger.exceptionto aid debugging.- except Exception as e: - logger.error(f"Error while getting results for job {job_id}: {e}") + except Exception: + logger.exception("Error while getting results for job %s", job_id) job_success = False
♻️ Duplicate comments (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
202-221: Harden logging and time handling in user-log writer.
- Catch
OSErrorinstead of blindException(BLE001).- Use
logger.exceptionto capture stack traces (TRY400).- Use UTC, timezone-aware timestamp (DTZ005).
user_logs_dir = logs_directory / relative_log_path.parent try: user_logs_dir.mkdir(parents=True, exist_ok=True) - except Exception as e: - logger.error("Failed to create user logs directory: '%s' - %s", user_logs_dir, e) + except OSError: + logger.exception("Failed to create user logs directory: '%s'", user_logs_dir) return None log_path = logs_directory / relative_log_path try: with log_path.open("w", encoding="utf-8") as f: - timestamp = datetime.datetime.now().isoformat(timespec="seconds") + timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat( + timespec="seconds" + ) f.write(f"{title}\nGenerated at {timestamp}.\n\n") for item in content: f.write(f"{item.rstrip()}\n") - except Exception as e: - logger.error("Failed to write compression failure user log: '%s' - %s", log_path, e) + except OSError: + logger.exception("Failed to write compression failure user log: '%s'", log_path) return None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(7 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
🪛 Ruff (0.14.2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
186-186: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
206-206: Do not catch blind exception: Exception
(BLE001)
207-207: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
213-213: datetime.datetime.now() called without a tz argument
(DTZ005)
217-217: Do not catch blind exception: Exception
(BLE001)
218-218: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
| def _write_user_failure_log( | ||
| content: List[str], | ||
| logs_directory: Path, | ||
| job_id: Any, | ||
| filename_suffix: str, | ||
| title: str, | ||
| ) -> Optional[Path]: | ||
| """ | ||
| Writes the error messages in `invalid_path_messages` to a log file, | ||
| `{logs_directory}/user/failed_paths_{job_id}.txt`. The directory will be created if it doesn't | ||
| already exist. | ||
| :param invalid_path_messages: | ||
| Writes a user-oriented failure log to | ||
| `{logs_directory}/user/job_{job_id}_{filename_suffix}.txt`. The `{logs_directory}/user` | ||
| directory will be created if it does not already exist. | ||
|
|
||
| :param content: | ||
| :param logs_directory: | ||
| :param job_id: | ||
| :return: Path to the written log file or `None` if error is encountered. | ||
| :param filename_suffix: | ||
| :param title: | ||
| :return: Path to the written log file relative to `logs_directory`, or `None` on error. | ||
| """ |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Narrow job_id type; align with lint and intent.
job_id: Any trips ANN401 and weakens the API. Use an integer.
def _write_user_failure_log(
- content: List[str],
- logs_directory: Path,
- job_id: Any,
+ content: List[str],
+ logs_directory: Path,
+ job_id: int,
filename_suffix: str,
title: str,
) -> Optional[Path]:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _write_user_failure_log( | |
| content: List[str], | |
| logs_directory: Path, | |
| job_id: Any, | |
| filename_suffix: str, | |
| title: str, | |
| ) -> Optional[Path]: | |
| """ | |
| Writes the error messages in `invalid_path_messages` to a log file, | |
| `{logs_directory}/user/failed_paths_{job_id}.txt`. The directory will be created if it doesn't | |
| already exist. | |
| :param invalid_path_messages: | |
| Writes a user-oriented failure log to | |
| `{logs_directory}/user/job_{job_id}_{filename_suffix}.txt`. The `{logs_directory}/user` | |
| directory will be created if it does not already exist. | |
| :param content: | |
| :param logs_directory: | |
| :param job_id: | |
| :return: Path to the written log file or `None` if error is encountered. | |
| :param filename_suffix: | |
| :param title: | |
| :return: Path to the written log file relative to `logs_directory`, or `None` on error. | |
| """ | |
| def _write_user_failure_log( | |
| content: List[str], | |
| logs_directory: Path, | |
| job_id: int, | |
| filename_suffix: str, | |
| title: str, | |
| ) -> Optional[Path]: | |
| """ | |
| Writes a user-oriented failure log to | |
| `{logs_directory}/user/job_{job_id}_{filename_suffix}.txt`. The `{logs_directory}/user` | |
| directory will be created if it does not already exist. | |
| :param content: | |
| :param logs_directory: | |
| :param job_id: | |
| :param filename_suffix: | |
| :param title: | |
| :return: Path to the written log file relative to `logs_directory`, or `None` on error. | |
| """ |
🧰 Tools
🪛 Ruff (0.14.2)
186-186: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
around lines 183 to 201, change the job_id parameter's type from Any to int to
satisfy ANN401 and reflect intent; update the function signature to job_id: int,
adjust any internal code that formats or concatenates job_id (use str(job_id) or
f-strings) if necessary, and update the docstring/type hints accordingly so the
API and typing are consistent.
| user_log_relative_path = _write_user_failure_log( | ||
| invalid_path_messages, | ||
| clp_config.logs_directory, | ||
| job_id, | ||
| filename_suffix="failed_paths", | ||
| title="Failed input paths log.", | ||
| ) | ||
| if user_log_relative_path is None: | ||
| err_msg = "Failed to write user log for invalid input paths." | ||
| raise RuntimeError(err_msg) | ||
|
|
||
| error_msg = ( | ||
| "At least one of your input paths could not be processed." | ||
| f" See the error log at '{user_log_relative_path}' inside your configured logs" | ||
| " directory (`logs_directory`) for more details." | ||
| ) |
There was a problem hiding this comment.
Don’t raise on log-write failure; fail job gracefully and continue.
Raising here aborts the scheduler loop (main returns -1), stalling all jobs. Update job status with a short fallback message instead, commit, and continue.
- if user_log_relative_path is None:
- err_msg = "Failed to write user log for invalid input paths."
- raise RuntimeError(err_msg)
+ if user_log_relative_path is None:
+ fallback_msg = (
+ "At least one of your input paths could not be processed."
+ " Additionally, writing a user error log failed. See the scheduler logs"
+ " for details."
+ )
+ update_compression_job_metadata(
+ db_cursor,
+ job_id,
+ {
+ "status": CompressionJobStatus.FAILED,
+ "status_msg": fallback_msg,
+ },
+ )
+ db_conn.commit()
+ continueBased on learnings.
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
around lines 286 to 301, the current code raises a RuntimeError when
_write_user_failure_log returns None which aborts the scheduler loop; instead,
replace the raise with logic that sets the job status to a graceful failure
using a short fallback message (e.g., "Could not write user failure log; see
scheduler logs for details"), persist/commit that status update to the
datastore, and then continue execution without throwing so the scheduler loop is
not interrupted.
| error_log_relative_path = _write_user_failure_log( | ||
| error_message, | ||
| logs_directory, | ||
| job_id, | ||
| filename_suffix="task_errors", | ||
| title="Compression task errors.", | ||
| ) | ||
| if error_log_relative_path is None: | ||
| err_msg = "Failed to write user log for failed compression job." | ||
| raise RuntimeError(err_msg) | ||
|
|
||
| error_msg = ( | ||
| "One or more compression tasks failed." | ||
| f" See the error log at '{error_log_relative_path}' inside your configured logs" | ||
| " directory (`logs_directory`) for more details." | ||
| ) | ||
|
|
There was a problem hiding this comment.
Avoid crashing the scheduler on task-error log-write failure.
Same concern as the invalid-path branch: don’t raise. Mark job FAILED with a compact message, commit, add to jobs_to_delete, and continue.
- error_log_relative_path = _write_user_failure_log(
- error_message,
+ error_log_relative_path = _write_user_failure_log(
+ error_messages,
logs_directory,
job_id,
filename_suffix="task_errors",
title="Compression task errors.",
)
- if error_log_relative_path is None:
- err_msg = "Failed to write user log for failed compression job."
- raise RuntimeError(err_msg)
+ if error_log_relative_path is None:
+ fallback_msg = (
+ "One or more compression tasks failed."
+ " Additionally, writing the user error log failed. See worker logs for details."
+ )
+ update_compression_job_metadata(
+ db_cursor,
+ job_id,
+ dict(
+ status=CompressionJobStatus.FAILED,
+ status_msg=fallback_msg,
+ ),
+ )
+ db_conn.commit()
+ jobs_to_delete.append(job_id)
+ continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| error_log_relative_path = _write_user_failure_log( | |
| error_message, | |
| logs_directory, | |
| job_id, | |
| filename_suffix="task_errors", | |
| title="Compression task errors.", | |
| ) | |
| if error_log_relative_path is None: | |
| err_msg = "Failed to write user log for failed compression job." | |
| raise RuntimeError(err_msg) | |
| error_msg = ( | |
| "One or more compression tasks failed." | |
| f" See the error log at '{error_log_relative_path}' inside your configured logs" | |
| " directory (`logs_directory`) for more details." | |
| ) | |
| error_log_relative_path = _write_user_failure_log( | |
| error_messages, | |
| logs_directory, | |
| job_id, | |
| filename_suffix="task_errors", | |
| title="Compression task errors.", | |
| ) | |
| if error_log_relative_path is None: | |
| fallback_msg = ( | |
| "One or more compression tasks failed." | |
| " Additionally, writing the user error log failed. See worker logs for details." | |
| ) | |
| update_compression_job_metadata( | |
| db_cursor, | |
| job_id, | |
| dict( | |
| status=CompressionJobStatus.FAILED, | |
| status_msg=fallback_msg, | |
| ), | |
| ) | |
| db_conn.commit() | |
| jobs_to_delete.append(job_id) | |
| continue | |
| error_msg = ( | |
| "One or more compression tasks failed." | |
| f" See the error log at '{error_log_relative_path}' inside your configured logs" | |
| " directory (`logs_directory`) for more details." | |
| ) |
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
around lines 451 to 467, the code raises a RuntimeError when writing the
user-facing task-error log fails; instead follow the same non-crashing behavior
used for the invalid-path branch: do not raise, set the job state to FAILED with
a compact message noting the inability to write the user error log, commit the
job state change, append the job_id to jobs_to_delete, and continue execution;
ensure the compact message is concise, persist the update via the same commit
method used elsewhere, and keep control flow consistent with the surrounding
error-handling pattern.
kirkrodrigues
left a comment
There was a problem hiding this comment.
For the PR title, how about:
fix(clp-package): Write compression task failure errors to a log, and store the log path (instead of the errors) in `status_msg` (fixes #716).
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
435-437: Log exceptions with stack traces.Use
logger.exceptionto retain context.- except Exception as e: - logger.error(f"Error while getting results for job {job_id}: {e}") + except Exception: + logger.exception("Error while getting results for job %s", job_id) job_success = False
♻️ Duplicate comments (4)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (4)
183-189: Narrowjob_idtyping; avoidAny.
job_idis a DB identifier; type it asintto satisfy ANN401 and tighten the API.def _write_user_failure_log( title: str, content: List[str], logs_directory: Path, - job_id: Any, + job_id: int, filename_suffix: str, ) -> Optional[Path]:
206-219: Uselogger.exception, catch narrower errors, and make timestamp TZ‑aware.
- Catch
OSErrorfor filesystem ops instead of blindException(BLE001).- Use
logger.exception(...)to include stack traces (TRY400).- Write UTC timestamps to avoid naive datetimes (DTZ005).
- except Exception as e: - logger.error("Failed to create user logs directory: '%s' - %s", user_logs_dir, e) + except OSError: + logger.exception("Failed to create user logs directory: '%s'", user_logs_dir) return None log_path = logs_directory / relative_log_path try: with log_path.open("w", encoding="utf-8") as f: - timestamp = datetime.datetime.now().isoformat(timespec="seconds") + timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds") f.write(f"{title}\nGenerated at {timestamp}.\n\n") for item in content: f.write(f"{item.rstrip()}\n") - except Exception as e: - logger.error("Failed to write compression failure user log: '%s' - %s", log_path, e) + except OSError: + logger.exception("Failed to write compression failure user log: '%s'", log_path) return None
286-301: Don’t raise when writing the user log fails; fail the job gracefully.Throwing here aborts the scheduler loop. Persist a compact fallback message and continue.
- if user_log_relative_path is None: - err_msg = "Failed to write user log for invalid input paths." - raise RuntimeError(err_msg) + if user_log_relative_path is None: + fallback_msg = ( + "At least one of your input paths could not be processed." + " Additionally, writing a user error log failed. See the scheduler logs" + " for details." + ) + update_compression_job_metadata( + db_cursor, + job_id, + { + "status": CompressionJobStatus.FAILED, + "status_msg": fallback_msg, + }, + ) + db_conn.commit() + continue
451-460: Avoid crashing on task‑error log‑write failure; mark FAILED and continue.Mirrors the invalid‑path branch; keeps the scheduler alive.
- if error_log_relative_path is None: - err_msg = "Failed to write user log for failed compression job." - raise RuntimeError(err_msg) + if error_log_relative_path is None: + fallback_msg = ( + "One or more compression tasks failed." + " Additionally, writing the user error log failed. See worker logs for details." + ) + update_compression_job_metadata( + db_cursor, + job_id, + dict( + status=CompressionJobStatus.FAILED, + status_msg=fallback_msg, + ), + ) + db_conn.commit() + jobs_to_delete.append(job_id) + continue
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(7 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(27-32)
🪛 Ruff (0.14.2)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
187-187: Dynamically typed expressions (typing.Any) are disallowed in job_id
(ANN401)
206-206: Do not catch blind exception: Exception
(BLE001)
207-207: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
213-213: datetime.datetime.now() called without a tz argument
(DTZ005)
217-217: Do not catch blind exception: Exception
(BLE001)
218-218: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (1)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
549-549: LGTM: correct propagation oflogs_directoryintopoll_running_jobs.Signature and call site are aligned; enables relative‑path log reporting.
/var/log/user and store log path in status_msg (fixes #716).status_msg (fixes #716).
… store the log path (instead of the errors) in `status_msg` (fixes y-scope#716). (y-scope#1425)
Description
This PR changes the way errors are reported in the case that a compression job fails due to one or more of its compression tasks failing. In this case, the
status_msgwhich holds information about the failed task(s) is written to a user log file at/clp-package/var/log/user, and the path to this log file is reported to the user throughstatus_msg.If the full path to this user log were to be stored in
status_msg, then the length ofstatus_msgwould possibly exceed that field's 512-char limit. We instead store the path relative to theclp_config.logs_directorypath. The absolute path is then constructed when it is reported to the user via the command line.Note: this PR applies the same "path-length guard" fix to the already-existing "invalid input path" compression job failure sequence, as there was no guard against exceeding 512 characters in that sequence.
Checklist
breaking change.
Validation performed
Revoked read permissions on all files in
mongodb, a 65GB dataset and then attempted to compress the dataset. The compression job launched 260 tasks, all of which failed because of the lack of read permissions.compression_schedulerdoes not crash, the user log is written accurately, and the path to this user log is reported accurately via the command line.In addition, tested and confirmed that the user log path is reported accurately in the case of a compression job failing due to an invalid input path.
Summary by CodeRabbit
New Features
Bug Fixes