feat(job-orchestration): Add Spider compression task; Move Celery compression task into a dedicated submodule.#1340
Conversation
WalkthroughAdds a Celery-based compression task and worker-shutdown handler, introduces a Spider wrapper and byte-string utilities for compression, refactors the original compression task into a core plain function Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Scheduler
participant Broker as Celery Broker
participant Worker as Celery Worker
participant Task as celery_compress.compress
participant Core as compression_entry_point
Scheduler->>Broker: Enqueue compress(job_id, task_id, tag_ids, configs)
Broker->>Worker: Deliver task
Worker->>Task: Execute task
note right of Task: Uses Celery task logger\nand delegates to core entry point
Task->>Core: compression_entry_point(params, logger)
Core-->>Task: result
Task-->>Scheduler: result
rect rgb(245,250,255)
note over Worker,Task: worker_shutdown signal logs shutdown
end
sequenceDiagram
autonumber
actor Spider as Spider TaskContext
participant SpiderWrap as spider_compress.compress
participant Utils as spider_utils
participant Core as compression_entry_point
Spider->>SpiderWrap: Invoke compress(ctx, job_id, task_id, tag_ids, configs_bytes)
note right of SpiderWrap: Initializes logger "spider_compression"
SpiderWrap->>Utils: int8_list_to_utf8_str(...) for each JSON input
SpiderWrap->>Core: compression_entry_point(params, logger)
Core-->>SpiderWrap: result
SpiderWrap->>Utils: utf8_str_to_int8_list(json.dumps(result))
SpiderWrap-->>Spider: result (encoded bytes/list[Int8])
sequenceDiagram
autonumber
participant Old as compression_task (previous)
participant New as compression_task (refactored)
participant Core as compression_entry_point
note over Old: Previously Celery-bound with module logger and signals
note over New: Now plain function(s) with explicit logger parameter
New->>Core: compression_entry_point(params, logger)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
521-571: Add missingloggerargument to downstream wrappers
- In
components/job-orchestration/job_orchestration/executor/compress/spider_comrpess.py(line 18) andcelery_compress.py(line 26), calls togeneral_compressomit the requiredloggerparameter. Update each wrapper’s signature to accept aloggerand pass it as the last argument togeneral_compress.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/job-orchestration/job_orchestration/executor/compress/celery_compress.py(1 hunks)components/job-orchestration/job_orchestration/executor/compress/celeryconfig.py(1 hunks)components/job-orchestration/job_orchestration/executor/compress/compression_task.py(4 hunks)components/job-orchestration/job_orchestration/executor/compress/spider_comrpess.py(1 hunks)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
🧬 Code graph analysis (4)
components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (2)
components/job-orchestration/job_orchestration/executor/compress/spider_comrpess.py (1)
compress(9-26)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
general_compress(521-602)
components/job-orchestration/job_orchestration/executor/compress/celeryconfig.py (1)
components/job-orchestration/job_orchestration/scheduler/constants.py (1)
SchedulerType(9-11)
components/job-orchestration/job_orchestration/executor/compress/spider_comrpess.py (3)
components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
compress(17-34)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
general_compress(521-602)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
compress(17-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: package-image
🔇 Additional comments (3)
components/job-orchestration/job_orchestration/executor/compress/celeryconfig.py (1)
9-16: Queue route updated to match relocated taskThe module path and routing key now align with
celery_compress.compress, so the worker will continue to discover the task on the intended queue.components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (1)
34-34: Scheduler import follows task relocationImport now targets the Celery-facing wrapper, keeping the existing
compress.s(...)scheduling flow intact.components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
11-34: Celery wrapper cleanly delegates to shared core
get_task_loggerkeeps Celery-managed logging while centralising the heavy lifting ingeneral_compress, which keeps behaviour consistent across execution contexts.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/executor/compress/spider_comrpess.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/compress/spider_comrpess.py (3)
components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
compress(17-34)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
general_compress(521-602)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
| from clp_py_utils.clp_logging import get_logger | ||
| from job_orchestration.executor.compress.compression_task import general_compress | ||
| from spider_py import TaskContext | ||
|
|
||
| # Setup logging | ||
| logger = get_logger("spider_compression_scheduler") | ||
|
|
||
|
|
||
| def compress( | ||
| _: TaskContext, | ||
| job_id: int, | ||
| task_id: int, | ||
| tag_ids: list[int] | None, | ||
| clp_io_config_json: str, | ||
| paths_to_compress_json: str, | ||
| clp_metadata_db_connection_config: dict[str, bool | int | str | None], | ||
| ) -> dict[str, int | float | str | None]: |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Widen the metadata config annotation.
Database.parse_obj consumes the raw JSON payload, which can contain nested dict structures or scalar types outside bool | int | str | None (e.g., floats, sub-mappings for SSL/client options). The new hint is therefore misleading—it flags legitimate configs as incompatible even though runtime accepts them. Please keep this argument typed as a generic mapping.
-from clp_py_utils.clp_logging import get_logger
-from job_orchestration.executor.compress.compression_task import general_compress
-from spider_py import TaskContext
+from typing import Any, Mapping
+
+from clp_py_utils.clp_logging import get_logger
+from job_orchestration.executor.compress.compression_task import general_compress
+from spider_py import TaskContext
@@
- clp_metadata_db_connection_config: dict[str, bool | int | str | None],
+ clp_metadata_db_connection_config: Mapping[str, Any],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from clp_py_utils.clp_logging import get_logger | |
| from job_orchestration.executor.compress.compression_task import general_compress | |
| from spider_py import TaskContext | |
| # Setup logging | |
| logger = get_logger("spider_compression_scheduler") | |
| def compress( | |
| _: TaskContext, | |
| job_id: int, | |
| task_id: int, | |
| tag_ids: list[int] | None, | |
| clp_io_config_json: str, | |
| paths_to_compress_json: str, | |
| clp_metadata_db_connection_config: dict[str, bool | int | str | None], | |
| ) -> dict[str, int | float | str | None]: | |
| from typing import Any, Mapping | |
| from clp_py_utils.clp_logging import get_logger | |
| from job_orchestration.executor.compress.compression_task import general_compress | |
| from spider_py import TaskContext | |
| # Setup logging | |
| logger = get_logger("spider_compression_scheduler") | |
| def compress( | |
| _: TaskContext, | |
| job_id: int, | |
| task_id: int, | |
| tag_ids: list[int] | None, | |
| clp_io_config_json: str, | |
| paths_to_compress_json: str, | |
| clp_metadata_db_connection_config: Mapping[str, Any], | |
| ) -> dict[str, int | float | str | None]: | |
| ... |
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/compress/spider_comrpess.py
around lines 1 to 17, the clp_metadata_db_connection_config parameter is
currently annotated as dict[str, bool | int | str | None] which is too narrow
and rejects valid nested or scalar values; change its type to a generic mapping
such as dict[str, Any] (or Mapping[str, Any]) and add the necessary import for
Any from typing, so the function accepts arbitrary JSON-like nested structures
without altering runtime behavior.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Before I post other comments: passing the logger by adding a param looks like a hack tbh...
We should probably create a function to setup/get the global logger instead.
| _: TaskContext, | ||
| job_id: int, | ||
| task_id: int, | ||
| tag_ids: list[int] | None, |
There was a problem hiding this comment.
Since the package is running on Python3.9, I think we should use Union or Optional instead.
There was a problem hiding this comment.
How about from __future__ import annotations?
There was a problem hiding this comment.
Sure. Just make sure it's compatible with Python 3.9.
|
|
||
|
|
||
| def compress( | ||
| _: TaskContext, |
There was a problem hiding this comment.
Are we planning to use this variable in the future PRs?
There was a problem hiding this comment.
Maybe. Depend on how we plan the failure handling.
There was a problem hiding this comment.
Can u post a draft PR for the actual integration? Without seeing how things get integrated, it's a little hard to determine whether this PR is the best practice.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
…to spider-job-executor
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/job-orchestration/job_orchestration/executor/compress/celery_compress.py(1 hunks)components/job-orchestration/job_orchestration/executor/compress/spider_compress.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.108Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
📚 Learning: 2025-09-28T15:00:22.108Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.108Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/celery_compress.pycomponents/job-orchestration/job_orchestration/executor/compress/spider_compress.py
📚 Learning: 2025-07-03T13:33:27.460Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-03T13:33:27.460Z
Learning: In the CLP codebase, `from __future__ import annotations` imports are used to enable Python pipe notation for union types (e.g., `int | str`) in Python versions before 3.10, and should not be considered unused imports.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
🧬 Code graph analysis (2)
components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (3)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (1)
compress(11-43)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
compression_entry_point(521-602)components/clp-py-utils/clp_py_utils/sql_adapter.py (1)
connect(31-51)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (3)
components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
compress(17-34)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
compression_entry_point(521-602)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
📚 Learning: 2025-07-03T13:33:27.460Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-03T13:33:27.460Z
Learning: In the CLP codebase, `from __future__ import annotations` imports are used to enable Python pipe notation for union types (e.g., `int | str`) in Python versions before 3.10, and should not be considered unused imports.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (4)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(168-269)components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
compress(17-34)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
compression_entry_point(521-602)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
📚 Learning: 2025-07-03T13:33:27.460Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-03T13:33:27.460Z
Learning: In the CLP codebase, `from __future__ import annotations` imports are used to enable Python pipe notation for union types (e.g., `int | str`) in Python versions before 3.10, and should not be considered unused imports.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (4)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(168-269)components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
compress(17-34)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
compression_entry_point(521-602)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (2)
10-10: Align logger name with module name.The logger is named
"spider_compression_scheduler"but the module isspider_compress.py. This inconsistency makes logs harder to trace.Apply this diff:
-logger = get_logger("spider_compression_scheduler") +logger = get_logger("spider_compress")
1-7: Addfrom __future__ import annotationsfor forward-compatible type hints.The file uses modern type syntax like
list[Int8](lines 17-21) which requires Python 3.9+. Adding the__future__import enables this syntax to work reliably across Python 3.9 environments and aligns with the codebase convention.Based on learnings.
Apply this diff:
+from __future__ import annotations + import json from clp_py_utils.clp_config import Database
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py(1 hunks)components/job-orchestration/job_orchestration/utils/spider_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-03T13:33:27.460Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-03T13:33:27.460Z
Learning: In the CLP codebase, `from __future__ import annotations` imports are used to enable Python pipe notation for union types (e.g., `int | str`) in Python versions before 3.10, and should not be considered unused imports.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (4)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(177-287)components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
compression_entry_point(521-602)components/job-orchestration/job_orchestration/utils/spider_utils.py (2)
int8_list_to_utf8_str(4-11)utf8_str_to_int8_list(14-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (macos-15)
🔇 Additional comments (3)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (1)
38-52: LGTM: JSON serialization flow is correct.The function properly:
- Converts Spider's
Int8lists to UTF-8 strings for JSON inputs- Parses the database config using Pydantic
- Calls
compression_entry_pointwhich returns a dict (via.model_dump())- Serializes the result dict with
json.dumps- Converts back to
Int8list for SpiderThis matches the expected Spider task interface.
components/job-orchestration/job_orchestration/utils/spider_utils.py (2)
4-11: LGTM: Conversion logic is sound.The function correctly constructs bytes from the Int8 list by casting to int, then decodes to UTF-8. This is the proper way to deserialize Spider's byte representation.
14-20: LGTM: Inverse conversion is correctly structured.The function properly encodes the string to UTF-8 bytes and wraps each byte as a
spider_py.Int8. Once the variable name bug on line 21 is fixed, this will work as intended.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (1)
9-10: Logger name doesn't match module name.The logger is named
"spider_compression_scheduler"but the module isspider_compress.py. Consider renaming to"spider_compress"for consistency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py(1 hunks)components/job-orchestration/job_orchestration/utils/spider_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
📚 Learning: 2025-07-03T13:33:27.460Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-03T13:33:27.460Z
Learning: In the CLP codebase, `from __future__ import annotations` imports are used to enable Python pipe notation for union types (e.g., `int | str`) in Python versions before 3.10, and should not be considered unused imports.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (5)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(177-287)components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/celery_compress.py (1)
compress(17-34)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
compression_entry_point(521-602)components/job-orchestration/job_orchestration/utils/spider_utils.py (2)
int8_list_to_utf8_str(6-13)utf8_str_to_int8_list(16-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (3)
components/job-orchestration/job_orchestration/utils/spider_utils.py (2)
6-13: LGTM!The function correctly converts
spider_py.Int8values to Python integers, constructs a bytes object, and decodes it as UTF-8.
16-23: LGTM!The function correctly encodes the UTF-8 string and wraps each byte in
spider_py.Int8. The previous issue with the undefined variable has been resolved.components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (1)
38-52: Result serialization is correct.The function properly serializes the dict returned by
compression_entry_pointto JSON, then converts it tolist[Int8]for Spider's return type.
✅ Actions performedReview triggered.
|
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-03T13:33:27.460Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-03T13:33:27.460Z
Learning: In the CLP codebase, `from __future__ import annotations` imports are used to enable Python pipe notation for union types (e.g., `int | str`) in Python versions before 3.10, and should not be considered unused imports.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
PR: y-scope/clp#1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (4)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(177-287)components/clp-py-utils/clp_py_utils/clp_logging.py (1)
get_logger(17-25)components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
compression_entry_point(521-602)components/job-orchestration/job_orchestration/utils/spider_utils.py (2)
int8_list_to_utf8_str(6-13)utf8_str_to_int8_list(16-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (macos-15)
🔇 Additional comments (1)
components/job-orchestration/job_orchestration/executor/compress/spider_compress.py (1)
1-10: No changes needed for import annotationsPEP 585 (introduced in Python 3.9) enables built-in collection generics (e.g.,
list[int]) without requiringfrom __future__ import annotations, so the imports and type annotations can remain as-is.Likely an incorrect or invalid review comment.
…8/clp into spider-job-executor
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
| Database.model_validate( | ||
| json.loads(int8_list_to_utf8_str(clp_metadata_db_connection_config_json)) | ||
| ), | ||
| json.loads(int8_list_to_utf8_str(clp_metadata_db_connection_config_json)), |
There was a problem hiding this comment.
@sitaowang1998 compression_entry_point requires the database config to be a dictionary. I believe this was a mistake.
LinZhihao-723
left a comment
There was a problem hiding this comment.
- Tested that the celeray compression worked.
- Can't test Spider compression since the scheduler side is not ready.
…pression task into a dedicated submodule. (y-scope#1340) Co-authored-by: LinZhihao-723 <zh.lin@mail.utoronto.ca>
Description
This PR
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Refactor
Chores