Skip to content

fix(job-orchestration): Set query job duration only when start_time is not None (fixes #1806).#1809

Merged
sitaowang1998 merged 11 commits into
y-scope:mainfrom
sitaowang1998:cancel-none-fix
Jan 5, 2026
Merged

fix(job-orchestration): Set query job duration only when start_time is not None (fixes #1806).#1809
sitaowang1998 merged 11 commits into
y-scope:mainfrom
sitaowang1998:cancel-none-fix

Conversation

@sitaowang1998

@sitaowang1998 sitaowang1998 commented Dec 19, 2025

Copy link
Copy Markdown
Contributor

Description

This PR fixes #1806 by only setting the query job duration in database if local job's start_time is not None during job cancellation handling.

This PR fixes another bug by checking database cursor rowcount right after execute instead of after commit. After commit, rowcount is always 0.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Follow the reproduce step in #1806, now

  • Search and cancel does not hang forever, and the job exits with correct state 5.
Container clp-package-clp-runtime-run-0b7c3a441ed2 Creating
Container clp-package-clp-runtime-run-0b7c3a441ed2 Created
2025-12-19T01:52:25.418 INFO [search] Job submitted.
2025-12-19T01:52:25.527 INFO [search] Job cancelled.
2025-12-19T01:52:26.053 ERROR [search] job 6 finished with unexpected status: 5
  • Query scheduler does not crash, and the log looks correct.
2025-12-19 01:52:14,513 search-job-handler [INFO] Connected to archive database database:3306.
2025-12-19 01:52:14,513 search-job-handler [INFO] query_scheduler started.
2025-12-19 01:52:25,635 search-job-handler [INFO] Cancelled job 6.

Summary by CodeRabbit

  • Bug Fixes
    • Improved job cancellation handling to skip invalid duration calculations when a job’s start time is unavailable, preventing errors during cancellation.
    • Enhanced reliability of scheduler status updates so cancellation outcomes are tracked and applied more consistently across jobs.

✏️ Tip: You can customize this high-level summary in your review settings.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner December 19, 2025 01:53
@coderabbitai

coderabbitai Bot commented Dec 19, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Fixed a crash in handle_cancelling_search_jobs by avoiding duration calculation when a job's start_time is None. set_job_or_task_status call site now conditionally supplies duration only if available; public signature remains unchanged.

Changes

Cohort / File(s) Summary
Query scheduler: cancel handling
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
handle_cancelling_search_jobs now builds set_job_or_task_status kwargs and adds duration only when job.start_time is not None, preventing a TypeError for cancelled jobs without start times. Public set_job_or_task_status signature unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: fixing job duration setting by checking if start_time exists, directly addressing the root cause in issue #1806.
Linked Issues check ✅ Passed The PR changes directly implement the objective from #1806: avoiding duration calculation when start_time is None by conditionally adding duration only if start_time is not None.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the handle_cancelling_search_jobs function to prevent TypeError when start_time is None, directly addressing the linked issue with no unrelated modifications.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f393e8 and e1cf69e.

📒 Files selected for processing (1)
  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-09-19T18:28:26.747Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1169
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:463-469
Timestamp: 2025-09-19T18:28:26.747Z
Learning: In the compression scheduler (components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py), the SIGTERM handling uses a coarse-grained approach that only checks received_sigterm before calling search_and_schedule_new_tasks. The maintainers consider the race condition where SIGTERM arrives mid-execution to be benign, as scheduling a few extra tasks during shutdown is acceptable and will be handled by cleanup mechanisms.

Applied to files:

  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
📚 Learning: 2025-06-24T20:13:46.758Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 939
File: components/package-template/src/etc/clp-config.yml:64-64
Timestamp: 2025-06-24T20:13:46.758Z
Learning: When users ask CodeRabbit to create an issue after providing suggestions, they want a GitHub issue created with the high-level requirements and context, not specific code implementations.

Applied to files:

  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
📚 Learning: 2024-11-15T20:07:22.256Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 569
File: components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py:380-392
Timestamp: 2024-11-15T20:07:22.256Z
Learning: The current implementation assumes single-threaded execution, so race conditions in functions like `is_target_extracted` in `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` are not currently a concern.

Applied to files:

  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
📚 Learning: 2025-07-29T21:00:07.757Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.

Applied to files:

  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
📚 Learning: 2025-07-29T14:04:13.769Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.

Applied to files:

  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Applied to files:

  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
  • QueryJobStatus (50-61)
🪛 Ruff (0.14.10)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py

369-369: datetime.datetime.now() called without a tz argument

(DTZ005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: check-generated
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (1)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)

366-378: LGTM! Fix correctly prevents crash when cancelling jobs before start_time is set.

The conditional duration calculation properly addresses issue #1806 by avoiding the TypeError that occurred when job.start_time was None. The implementation ensures that the job duration is only computed and passed to the database when the job has actually started.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes in handle_cancelling_search_jobs look correct and optimal to me. i left a comment about splitting the fix of logging messages - correct me if i understood the fix wrong

for the title, i believe a better scope can be job-orchestration? i.e., fix(job-orchestration)

Comment thread components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py Outdated
Comment thread components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py Outdated
@sitaowang1998 sitaowang1998 changed the title fix(query): Set query job duration only when start_time is not None (fixes #1806). fix(job-orchestration): Set query job duration only when start_time is not None (fixes #1806). Jan 5, 2026
@sitaowang1998 sitaowang1998 merged commit 6f5bf1e into y-scope:main Jan 5, 2026
21 checks passed
@sitaowang1998 sitaowang1998 deleted the cancel-none-fix branch January 5, 2026 17:27
davidlion pushed a commit to davidlion/clp that referenced this pull request Jan 17, 2026
… is not None (fixes y-scope#1806). (y-scope#1809)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
… is not None (fixes y-scope#1806). (y-scope#1809)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
… is not None (fixes y-scope#1806). (y-scope#1809)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Query job cancellation crashes query scheduler

2 participants