fix(clp-package): Use job duration for final compression speed summary.#823
Conversation
WalkthroughThis update refines how compression speed is computed in the Changes
Sequence Diagram(s)sequenceDiagram
participant HU as handle_job_update
participant PC as print_compression_job_status
participant DB as Database
HU->>DB: Execute SQL query (selects duration, start_time, job status)
DB-->>HU: Return job_row with duration, start_time, status
HU->>PC: Call print_compression_job_status(job_row)
alt Job status is SUCCEEDED
PC->>PC: Compute speed using job_row.duration
else Job status is not SUCCEEDED
PC->>PC: Compute speed as (current time - job_row.start_time)
end
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
duration for final compression speed summary.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (3)
43-49: Improved compression speed calculation provides more accurate reportingThe code now correctly uses the job's
durationfield for completed jobs, while calculating on-the-fly duration for in-progress jobs. This aligns with the PR objective of enhancing accuracy in the compression speed reporting.However, there's no explicit handling for the case where the
durationfield might be NULL or invalid for a successful job.Consider adding error handling:
if CompressionJobStatus.SUCCEEDED == job_row["status"]: + if job_row["duration"] is None or job_row["duration"] <= 0: + # Handle invalid duration (fallback to calculation) + compression_speed = job_uncompressed_size / (datetime.datetime.now() - job_row["start_time"]).total_seconds() + else: compression_speed = job_uncompressed_size / job_row["duration"]
64-65: SQL query updated to include necessary duration fieldGood change to include the
durationfield in the query, which is required for the new compression speed calculation logic in theprint_compression_job_statusfunction.Note that the SQL query is constructed using string concatenation, which could potentially be vulnerable to SQL injection.
Consider using parameterized queries for better security:
polling_query = ( f"SELECT start_time, status, status_msg, uncompressed_size, compressed_size, duration " - f"FROM {COMPRESSION_JOBS_TABLE_NAME} WHERE id={job_id}" + f"FROM {COMPRESSION_JOBS_TABLE_NAME} WHERE id=%s" ) -db_cursor.execute(polling_query) +db_cursor.execute(polling_query, (job_id,))
39-54: Add documentation to clarify function behaviorWhile the changes are good, it would be helpful to add a docstring to explain the function's purpose, parameters, and behavior, especially regarding how it now handles different job statuses.
def print_compression_job_status(job_row): + """ + Print the status of a compression job, including the compression ratio and speed. + + Args: + job_row (dict): A dictionary containing job information, including status, + uncompressed_size, compressed_size, start_time, and duration. + + Note: + For successful jobs, compression speed is calculated using the job's duration. + For in-progress jobs, it's calculated using the time elapsed since job start. + """ job_uncompressed_size = job_row["uncompressed_size"] job_compressed_size = job_row["compressed_size"] compression_ratio = float(job_uncompressed_size) / job_compressed_size
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
CompressionJobStatus(14-18)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (2)
39-39: Function signature update removes external time dependencyGood change to remove the
current_timeparameter from the function signature. This makes the function more self-contained and easier to reason about as it will now calculate time internally.
86-86: Function calls updated to match new signatureAll calls to
print_compression_job_statushave been correctly updated to match the new function signature without thecurrent_timeparameter.Also applies to: 97-97
|
Could you add more details about what tests you ran? Just want to make sure you covered the base cases. |
Description
For accurate reporting of the final compression speed after a job succeeds, the
durationfield of the job should be used rather than relying on the progress reporter's polling interval.Checklist
breaking change.
Validation performed
Report with the PR change:
Report without the PR change and with a polling interval of
0.5sSpeed becomes inaccurate (slower) because we accounted for the polling interval when calculating the speed.
Report without the PR change and with the polling interval changed to
10s(to magnify the inaccuracy we get by depending on the polling interval to report compression speed status)Summary by CodeRabbit