Skip to content

fix(clp-package): Use job duration for final compression speed summary.#823

Merged
Bill-hbrhbr merged 2 commits into
y-scope:mainfrom
Bill-hbrhbr:compression-worker-fix-status-summary
Apr 15, 2025
Merged

fix(clp-package): Use job duration for final compression speed summary.#823
Bill-hbrhbr merged 2 commits into
y-scope:mainfrom
Bill-hbrhbr:compression-worker-fix-status-summary

Conversation

@Bill-hbrhbr

@Bill-hbrhbr Bill-hbrhbr commented Apr 13, 2025

Copy link
Copy Markdown
Contributor

Description

For accurate reporting of the final compression speed after a job succeeds, the duration field of the job should be used rather than relying on the progress reporter's polling interval.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Tested with CLP package.

Report with the PR change:

bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh --config etc/clp-config.yml fs ~/dataset/data/
2025-04-15T15:14:57.294 INFO [compress] Compression job 1 submitted.
2025-04-15T15:14:57.797 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 4.32KB/s.
2025-04-15T15:14:58.299 INFO [compress] Compression finished.
2025-04-15T15:14:58.299 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 2.80KB/s.
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh --config etc/clp-config.yml fs ~/dataset/data/
2025-04-15T15:15:06.585 INFO [compress] Compression job 2 submitted.
2025-04-15T15:15:07.088 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 4.55KB/s.
2025-04-15T15:15:07.590 INFO [compress] Compression finished.
2025-04-15T15:15:07.590 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 3.63KB/s.

Report without the PR change and with a polling interval of 0.5s

bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh --config etc/clp-config.yml fs ~/dataset/data/
2025-04-15T15:19:55.590 INFO [compress] Compression job 3 submitted.
2025-04-15T15:19:56.092 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 4.21KB/s.
2025-04-15T15:19:56.594 INFO [compress] Compression finished.
2025-04-15T15:19:56.594 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 2.01KB/s.
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh --config etc/clp-config.yml fs ~/dataset/data/
2025-04-15T15:20:01.539 INFO [compress] Compression job 4 submitted.
2025-04-15T15:20:02.041 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 4.09KB/s.
2025-04-15T15:20:02.543 INFO [compress] Compression finished.
2025-04-15T15:20:02.543 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 1.98KB/s.

Speed becomes inaccurate (slower) because we accounted for the polling interval when calculating the speed.

Report without the PR change and with the polling interval changed to 10s (to magnify the inaccuracy we get by depending on the polling interval to report compression speed status)

bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh --config etc/clp-config.yml fs ~/dataset/data/
2025-04-15T15:21:15.329 INFO [compress] Compression job 5 submitted.
2025-04-15T15:21:25.341 INFO [compress] Compression finished.
2025-04-15T15:21:25.341 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 198.21B/s.
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh --config etc/clp-config.yml fs ~/dataset/data/
2025-04-15T15:22:17.688 INFO [compress] Compression job 6 submitted.
2025-04-15T15:22:27.700 INFO [compress] Compression finished.
2025-04-15T15:22:27.701 INFO [compress] Compressed 1.93KB into 827.00B (2.39x). Speed: 198.61B/s.

Summary by CodeRabbit

  • Refactor
    • Updated the logic for calculating compression job performance. Speed metrics now adjust automatically: successful jobs display a recorded duration value, while others reflect elapsed time since the start, resulting in more accurate feedback for users.

@coderabbitai

coderabbitai Bot commented Apr 13, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This update refines how compression speed is computed in the print_compression_job_status function. The function now calculates speed using the job's status: if the status is SUCCEEDED, it uses the duration from the job record; otherwise, it computes the difference between the current time and the job's start_time. Additionally, the SQL query within handle_job_update is amended to select the duration field, and all calls to print_compression_job_status have been revised to reflect the updated function signature.

Changes

File(s) Change Summary
components/clp-package-utils/clp_package_utils/scripts/native/compress.py Removed the current_time parameter from print_compression_job_status; updated logic to compute speed based on job status (using duration for SUCCEEDED jobs and current time minus start_time otherwise); modified the SQL query in handle_job_update to include the duration field and adjusted all function calls accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant HU as handle_job_update
    participant PC as print_compression_job_status
    participant DB as Database

    HU->>DB: Execute SQL query (selects duration, start_time, job status)
    DB-->>HU: Return job_row with duration, start_time, status
    HU->>PC: Call print_compression_job_status(job_row)
    alt Job status is SUCCEEDED
        PC->>PC: Compute speed using job_row.duration
    else Job status is not SUCCEEDED
        PC->>PC: Compute speed as (current time - job_row.start_time)
    end
Loading

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Bill-hbrhbr Bill-hbrhbr changed the title Compression worker fix summary fix(clp-package): Use job duration for final compression speed summary. Apr 14, 2025
@Bill-hbrhbr Bill-hbrhbr marked this pull request as ready for review April 14, 2025 00:14
@Bill-hbrhbr Bill-hbrhbr requested a review from a team as a code owner April 14, 2025 00:14

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (3)

43-49: Improved compression speed calculation provides more accurate reporting

The code now correctly uses the job's duration field for completed jobs, while calculating on-the-fly duration for in-progress jobs. This aligns with the PR objective of enhancing accuracy in the compression speed reporting.

However, there's no explicit handling for the case where the duration field might be NULL or invalid for a successful job.

Consider adding error handling:

if CompressionJobStatus.SUCCEEDED == job_row["status"]:
+    if job_row["duration"] is None or job_row["duration"] <= 0:
+        # Handle invalid duration (fallback to calculation)
+        compression_speed = job_uncompressed_size / (datetime.datetime.now() - job_row["start_time"]).total_seconds()
+    else:
        compression_speed = job_uncompressed_size / job_row["duration"]

64-65: SQL query updated to include necessary duration field

Good change to include the duration field in the query, which is required for the new compression speed calculation logic in the print_compression_job_status function.

Note that the SQL query is constructed using string concatenation, which could potentially be vulnerable to SQL injection.

Consider using parameterized queries for better security:

polling_query = (
    f"SELECT start_time, status, status_msg, uncompressed_size, compressed_size, duration "
-    f"FROM {COMPRESSION_JOBS_TABLE_NAME} WHERE id={job_id}"
+    f"FROM {COMPRESSION_JOBS_TABLE_NAME} WHERE id=%s"
)
-db_cursor.execute(polling_query)
+db_cursor.execute(polling_query, (job_id,))

39-54: Add documentation to clarify function behavior

While the changes are good, it would be helpful to add a docstring to explain the function's purpose, parameters, and behavior, especially regarding how it now handles different job statuses.

def print_compression_job_status(job_row):
+    """
+    Print the status of a compression job, including the compression ratio and speed.
+    
+    Args:
+        job_row (dict): A dictionary containing job information, including status, 
+                        uncompressed_size, compressed_size, start_time, and duration.
+    
+    Note:
+        For successful jobs, compression speed is calculated using the job's duration.
+        For in-progress jobs, it's calculated using the time elapsed since job start.
+    """
    job_uncompressed_size = job_row["uncompressed_size"]
    job_compressed_size = job_row["compressed_size"]
    compression_ratio = float(job_uncompressed_size) / job_compressed_size
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e7f63c and 70321df.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/native/compress.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
  • CompressionJobStatus (14-18)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (2)

39-39: Function signature update removes external time dependency

Good change to remove the current_time parameter from the function signature. This makes the function more self-contained and easier to reason about as it will now calculate time internally.


86-86: Function calls updated to match new signature

All calls to print_compression_job_status have been correctly updated to match the new function signature without the current_time parameter.

Also applies to: 97-97

@kirkrodrigues

Copy link
Copy Markdown
Member

Could you add more details about what tests you ran? Just want to make sure you covered the base cases.

@Bill-hbrhbr Bill-hbrhbr merged commit ffbd533 into y-scope:main Apr 15, 2025
@Bill-hbrhbr Bill-hbrhbr deleted the compression-worker-fix-status-summary branch April 15, 2025 18:34
anlowee pushed a commit to anlowee/clp that referenced this pull request Apr 25, 2025
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants