Skip to content

fix(job-orchestration): Make tag_ids a required list[int] for compatibility with Spider compressor.#1453

Merged
sitaowang1998 merged 7 commits into
y-scope:mainfrom
sitaowang1998:tag-id-none
Oct 21, 2025
Merged

fix(job-orchestration): Make tag_ids a required list[int] for compatibility with Spider compressor.#1453
sitaowang1998 merged 7 commits into
y-scope:mainfrom
sitaowang1998:tag-id-none

Conversation

@sitaowang1998

@sitaowang1998 sitaowang1998 commented Oct 20, 2025

Copy link
Copy Markdown
Contributor

Description

Compression orchestration fails, as identified by #1446, because tag_ids could be None, which is not supported by Spider.

This PR fixes #1446 by initializing tag_ids to empty list instead of None, so that tag_ids will never be None, and is safe to be directly passed into Spider.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Compression works end-to-end.
  • GitHub workflows pass.

Summary by CodeRabbit

  • Bug Fixes

    • Avoided unnecessary tag updates when no tags are provided, improving processing efficiency.
    • Ensured consistent tag initialization in the compression scheduler to make task scheduling more reliable.
  • Chores

    • Added and standardized internal type annotations to improve code clarity and maintainability (no user-facing changes).

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner October 20, 2025 19:43
@coderabbitai

coderabbitai Bot commented Oct 20, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Initialized tag_ids to an empty list in the compression scheduler and changed the executor to only call tag updates when tag_ids is non-empty; replaced List[int] type hints with PEP 585 list[int] annotations in executor and Celery entry points. No other control-flow changes.

Changes

Cohort / File(s) Summary
Scheduler: tag_ids initialization
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
Changed local tag_ids initial value from None to [] in search_and_schedule_new_tasks to avoid iterating over None.
Executor: conditional tag update & type annotations
components/job-orchestration/job_orchestration/executor/compress/compression_task.py
Replaced List[int] with list[int] type annotations for tag_ids; changed check from if tag_ids is not None to if len(tag_ids) > 0 before calling update_tags. Added tag_ids: list[int] annotations on run_clp and compression_entry_point.
Celery task: type annotation
components/job-orchestration/job_orchestration/executor/compress/celery_compress.py
Annotated tag_ids parameter as list[int] in the compress task signature (no runtime behavior changes).

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler
    participant DB
    participant WorkerExecutor
    rect #DFF0D8
        note right of Scheduler: scheduling phase
        Scheduler->>DB: read/create tag rows (if any)
        DB-->>Scheduler: tag_ids (may be [])
        Scheduler->>WorkerExecutor: enqueue compress task with tag_ids (list[int])
    end
    rect #FFF4D6
        note right of WorkerExecutor: execution phase
        WorkerExecutor->>WorkerExecutor: run_clp / compression_entry_point(tag_ids)
        alt tag_ids non-empty
            WorkerExecutor->>WorkerExecutor: update_job_metadata_and_tags -> update_tags(tag_ids)
        else tag_ids empty
            WorkerExecutor->>WorkerExecutor: skip update_tags (no-op)
        end
        WorkerExecutor->>DB: persist compression metadata
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues Check ✅ Passed The code changes directly satisfy the objective stated in linked issue #1446. The primary modification—initializing tag_ids to an empty list ([]) instead of None in compression_scheduler.py—ensures that tag_ids will never be None and can be safely iterated by the scheduler and Spider. Supporting changes include updating the conditional logic from if tag_ids is not None to if len(tag_ids) > 0 in compression_task.py and adding explicit type annotations (tag_ids: list[int]) across function signatures, all of which reinforce that tag_ids is always a list and prevent the iteration failure described in the bug report.
Out of Scope Changes Check ✅ Passed All changes in the pull request are directly related to the stated objective of ensuring tag_ids is never None. The core fix—initializing tag_ids to an empty list in the scheduler—addresses the bug directly. The addition of explicit type annotations (tag_ids: list[int]) to previously unannotated parameters is part of the fix, making it clear that tag_ids should always be a list. The modernization of type hints from List[int] to list[int] in already-typed functions is a minor stylistic improvement that is closely aligned with the overall fix and does not introduce unrelated changes.
Title Check ✅ Passed The pull request title accurately reflects the main changes in the changeset. The primary modifications involve ensuring tag_ids is always a list[int] type by initializing it to an empty list instead of None, updating type hints from List[int] to list[int] across multiple function signatures, and adding type annotations where previously missing. The title clearly summarizes these changes by stating the objective to "Make tag_ids a required list[int]" and connects this to the concrete purpose of "compatibility with Spider compressor," which aligns directly with the PR's objective to fix the compression failure issue. A teammate scanning the repository history would immediately understand that this PR addresses type consistency and prevents None values for tag_ids.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)

315-315: Add type hint for tag_ids parameter.

The tag_ids parameter lacks a type hint, which is inconsistent with update_job_metadata_and_tags (line 90) where it's typed as List[int].

Apply this diff to add the type hint:

     tag_ids,
+    tag_ids: List[int],

Note: This also applies to compression_entry_point at line 525.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11d73fb and 812bb25.

📒 Files selected for processing (2)
  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1 hunks)
  • components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (2)
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)

93-102: The fix correctly addresses the issue.

The conditional guard ensures that update_tags is only called when tag_ids is non-empty, while increment_compression_job_metadata is always executed. This aligns with the scheduler initializing tag_ids to an empty list and prevents the failure when iterating over None.

components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)

345-359: Excellent fix for the root cause.

Initializing tag_ids to an empty list instead of None ensures that the variable is always iterable. The subsequent logic correctly populates tag_ids from the database when tags are present, or leaves it as an empty list otherwise. This prevents the failure described in issue #1446.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 812bb25 and 62b2c2c.

📒 Files selected for processing (2)
  • components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (2 hunks)
  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
components/job-orchestration/job_orchestration/executor/compress/celery_compress.py

1-1: typing.List is deprecated, use list instead

(UP035)

🔇 Additional comments (3)
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (3)

93-94: Conditional check appropriately addresses past feedback.

The guard condition if len(tag_ids) > 0: prevents unnecessary database calls when there are no tags to update. This implementation addresses the previous review feedback and is clear and explicit.


315-315: Type annotation correctly addresses reviewer feedback.

The List[int] type annotation makes the contract explicit and addresses LinZhihao-723's specific request to update type annotations throughout the call chain.


525-525: Type annotation correctly added.

The List[int] type annotation completes the chain of type annotations from the Celery task through to the core compression logic, ensuring consistency throughout.

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the pr title, how about:

fix(job-orchestration): Make `tag_ids` a required `list[int]` for compatibility with spider compressor.

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junhao's title should be ok, but make sure to use Spider not spider.

@sitaowang1998 sitaowang1998 changed the title fix(job-orchestration): Use empty list instead of None for tag_ids. fix(job-orchestration): Make tag_ids a required list[int] for compatibility with Spider compressor. Oct 21, 2025
@sitaowang1998 sitaowang1998 merged commit 332fde3 into y-scope:main Oct 21, 2025
22 checks passed
@sitaowang1998 sitaowang1998 deleted the tag-id-none branch October 21, 2025 20:19
LinZhihao-723 added a commit to LinZhihao-723/clp that referenced this pull request Oct 22, 2025
* feat(webui): Add drawer to display guided query and errors. (y-scope#1421)

* docs: Add Slack community invite badge to home page README. (y-scope#1418)

* refactor(clp-package): Simplify StrEnum and Path serialization via Annotated serializers. (y-scope#1417)

Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* build(clp-package): Adopt uv + hatchling as the build and packaging backend for Python components (resolves y-scope#1396); Upgrade dependencies for Python components. (y-scope#1405)

* chore(docker): Add `--link` flags to COPY/ADD operations for improved build performance (fixes y-scope#1408). (y-scope#1411)

* fix(ci): Correctly update and restore cache of `lint:check-cpp-lint-static-full`'s generated files (fixes y-scope#1419): (y-scope#1430)

- Save cache entries using unique key per entry.
- Restore latest entries using key prefix.
- Avoid using outputs from optionally-run `restore` task.

* fix(clp-rust-utils): Use AWS SDK default configuration with latest behavior version for S3 client. (y-scope#1445)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* refactor(clp-package): Remove unused `python-dotenv` dependency and related imports (fixes y-scope#1443). (y-scope#1444)

* fix(webui): Submit queries that failed ANTLR validation to Presto.  (y-scope#1450)

* feat(clp-s): Explicitly reject unstructured log inputs during compression. (y-scope#1434)

* feat(webui): Show query speed in native search status. (y-scope#1429)

* fix(job-orchestration): Make `tag_ids` a required `list[int]` for compatibility with Spider compressor. (y-scope#1453)

* feat(clp-mcp-server): Add log viewer links to query results for displaying in LLM output. (y-scope#1454)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* feat(ci): Add tasks for checking and updating Rust lock file (`Cargo.lock`); Add check to GH workflow. (y-scope#1448)

* feat(webui): Trigger submit action when pressing Enter on Monaco single line editor. (y-scope#1459)

* Add filters.

* Update cargo lock.

---------

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>
Co-authored-by: Abigail Matthews <abigail.v.matthews@gmail.com>
Co-authored-by: sitaowang1998 <sitaowang1998@outlook.com>
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao.liao@yscope.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
Co-authored-by: Huangshi Tian <All-less@users.noreply.github.com>
LinZhihao-723 added a commit to LinZhihao-723/clp that referenced this pull request Oct 22, 2025
* feat(webui): Add drawer to display guided query and errors. (y-scope#1421)

* docs: Add Slack community invite badge to home page README. (y-scope#1418)

* refactor(clp-package): Simplify StrEnum and Path serialization via Annotated serializers. (y-scope#1417)

Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* build(clp-package): Adopt uv + hatchling as the build and packaging backend for Python components (resolves y-scope#1396); Upgrade dependencies for Python components. (y-scope#1405)

* chore(docker): Add `--link` flags to COPY/ADD operations for improved build performance (fixes y-scope#1408). (y-scope#1411)

* fix(ci): Correctly update and restore cache of `lint:check-cpp-lint-static-full`'s generated files (fixes y-scope#1419): (y-scope#1430)

- Save cache entries using unique key per entry.
- Restore latest entries using key prefix.
- Avoid using outputs from optionally-run `restore` task.

* fix(clp-rust-utils): Use AWS SDK default configuration with latest behavior version for S3 client. (y-scope#1445)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* refactor(clp-package): Remove unused `python-dotenv` dependency and related imports (fixes y-scope#1443). (y-scope#1444)

* fix(webui): Submit queries that failed ANTLR validation to Presto.  (y-scope#1450)

* feat(clp-s): Explicitly reject unstructured log inputs during compression. (y-scope#1434)

* feat(webui): Show query speed in native search status. (y-scope#1429)

* fix(job-orchestration): Make `tag_ids` a required `list[int]` for compatibility with Spider compressor. (y-scope#1453)

* feat(clp-mcp-server): Add log viewer links to query results for displaying in LLM output. (y-scope#1454)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* feat(ci): Add tasks for checking and updating Rust lock file (`Cargo.lock`); Add check to GH workflow. (y-scope#1448)

* feat(webui): Trigger submit action when pressing Enter on Monaco single line editor. (y-scope#1459)

* Add filters.

* Update cargo lock.

---------

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>
Co-authored-by: Abigail Matthews <abigail.v.matthews@gmail.com>
Co-authored-by: sitaowang1998 <sitaowang1998@outlook.com>
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao.liao@yscope.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
Co-authored-by: Huangshi Tian <All-less@users.noreply.github.com>
LinZhihao-723 added a commit to LinZhihao-723/clp that referenced this pull request Oct 22, 2025
* feat(webui): Add drawer to display guided query and errors. (y-scope#1421)

* docs: Add Slack community invite badge to home page README. (y-scope#1418)

* refactor(clp-package): Simplify StrEnum and Path serialization via Annotated serializers. (y-scope#1417)

Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* build(clp-package): Adopt uv + hatchling as the build and packaging backend for Python components (resolves y-scope#1396); Upgrade dependencies for Python components. (y-scope#1405)

* chore(docker): Add `--link` flags to COPY/ADD operations for improved build performance (fixes y-scope#1408). (y-scope#1411)

* fix(ci): Correctly update and restore cache of `lint:check-cpp-lint-static-full`'s generated files (fixes y-scope#1419): (y-scope#1430)

- Save cache entries using unique key per entry.
- Restore latest entries using key prefix.
- Avoid using outputs from optionally-run `restore` task.

* fix(clp-rust-utils): Use AWS SDK default configuration with latest behavior version for S3 client. (y-scope#1445)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* refactor(clp-package): Remove unused `python-dotenv` dependency and related imports (fixes y-scope#1443). (y-scope#1444)

* fix(webui): Submit queries that failed ANTLR validation to Presto.  (y-scope#1450)

* feat(clp-s): Explicitly reject unstructured log inputs during compression. (y-scope#1434)

* feat(webui): Show query speed in native search status. (y-scope#1429)

* fix(job-orchestration): Make `tag_ids` a required `list[int]` for compatibility with Spider compressor. (y-scope#1453)

* feat(clp-mcp-server): Add log viewer links to query results for displaying in LLM output. (y-scope#1454)

Co-authored-by: Junhao Liao <junhao.liao@yscope.com>

* feat(ci): Add tasks for checking and updating Rust lock file (`Cargo.lock`); Add check to GH workflow. (y-scope#1448)

* feat(webui): Trigger submit action when pressing Enter on Monaco single line editor. (y-scope#1459)

* Add filters.

* Update cargo lock.

* Stupid fix

---------

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>
Co-authored-by: Abigail Matthews <abigail.v.matthews@gmail.com>
Co-authored-by: sitaowang1998 <sitaowang1998@outlook.com>
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao.liao@yscope.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
Co-authored-by: Huangshi Tian <All-less@users.noreply.github.com>
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compression fails due to None tag_ids

3 participants