feat(clp-package): Add an option to enable outputting search results to a file.#1630
Conversation
WalkthroughAdds a new boolean configuration field Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant API as API Server
participant Scheduler as Job Scheduler
participant Executor as Worker Executor
participant CLP as CLP Binary
note right of API `#E5F6FF`: New field: write_to_file
API->>Scheduler: Submit job (includes write_to_file)
Scheduler->>Executor: Dispatch job config
alt write_to_file == true
Executor->>Executor: check guard in _make_core_clp_command_and_env_vars
opt supported
Executor->>Executor: create output dir & compute output path
Executor->>CLP: build command with file-output arg and path
CLP-->>Executor: write results to file
end
opt unsupported
Executor-->>Scheduler: abort command assembly (returns (None, None))
end
else write_to_file == false
Executor->>CLP: build standard command (streaming/storage)
CLP-->>Executor: stream results
end
Executor->>Scheduler: report job status
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
components/api-server/src/client.rs(2 hunks)components/clp-rust-utils/src/job_config/search.rs(1 hunks)components/job-orchestration/job_orchestration/executor/query/fs_search_task.py(2 hunks)components/job-orchestration/job_orchestration/scheduler/job_config.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
get_directory(552-553)get_directory(563-564)
🪛 Ruff (0.14.5)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
44-44: Avoid equality comparisons to True; use search_config.write_to_file: for truth checks
Replace with search_config.write_to_file
(E712)
46-46: f-string without any placeholders
Remove extraneous f prefix
(F541)
46-47: Logging statement uses f-string
(G004)
171-171: Logging statement uses f-string
(G004)
171-171: Statement ends with an unnecessary semicolon
Remove unnecessary semicolon
(E703)
🔇 Additional comments (5)
components/api-server/src/client.rs (2)
29-30: LGTM!The new
write_to_filefield is properly defined with#[serde(default)], which will default tofalsewhen not provided in deserialization.
42-42: LGTM!The
write_to_filefield is correctly propagated fromQueryConfigtoSearchJobConfig.components/clp-rust-utils/src/job_config/search.rs (1)
24-24: LGTM!The
write_to_filefield is correctly added toSearchJobConfigand aligns with the Python-side definition injob_config.py.components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
96-96: LGTM!The
write_to_filefield is correctly added with an appropriate default value ofFalse, maintaining synchronization with the Rust-sideSearchJobConfig.components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
44-49: Verify test coverage and engine-specific behavior.The implementation guards against using
write_to_filewith the core CLP engine but allows it with CLP_S. Additionally, no automated tests appear to be included for this new functionality. Please verify:
- Is the asymmetric support between CLP and CLP_S engines intentional and properly documented?
- Are there plans to add automated tests for the
write_to_filefunctionality, or will this be covered in the follow-up PR mentioned in the description?Also applies to: 162-172
| if True == search_config.write_to_file: | ||
| logger.error( | ||
| f"Outputting search results to the file system is not supported while using the" | ||
| "'{worker_config.package.storage_engine}' storage engine." | ||
| ) | ||
| return None, None |
There was a problem hiding this comment.
Fix code quality issues in the guard logic.
The guard logic has several style and correctness issues:
- Line 44: Use Pythonic truth check instead of
True ==comparison. - Lines 46-47: The error message has formatting issues - line 46 has an unnecessary
fprefix and line 47 is missing thefprefix for the storage engine interpolation. - Logging with f-strings is discouraged; use lazy % formatting instead.
Apply this diff:
- if True == search_config.write_to_file:
- logger.error(
- f"Outputting search results to the file system is not supported while using the"
- "'{worker_config.package.storage_engine}' storage engine."
- )
+ if search_config.write_to_file:
+ logger.error(
+ "Outputting search results to the file system is not supported while using the"
+ " '%s' storage engine.",
+ worker_config.package.storage_engine
+ )
return None, None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if True == search_config.write_to_file: | |
| logger.error( | |
| f"Outputting search results to the file system is not supported while using the" | |
| "'{worker_config.package.storage_engine}' storage engine." | |
| ) | |
| return None, None | |
| if search_config.write_to_file: | |
| logger.error( | |
| "Outputting search results to the file system is not supported while using the" | |
| " '%s' storage engine.", | |
| worker_config.package.storage_engine | |
| ) | |
| return None, None |
🧰 Tools
🪛 Ruff (0.14.5)
44-44: Avoid equality comparisons to True; use search_config.write_to_file: for truth checks
Replace with search_config.write_to_file
(E712)
46-46: f-string without any placeholders
Remove extraneous f prefix
(F541)
46-47: Logging statement uses f-string
(G004)
There was a problem hiding this comment.
@LinZhihao-723, thank you for confirming! The suggested changes will improve the code quality and align with Python best practices.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
| if True == search_config.write_to_file: | ||
| logger.error( | ||
| f"Outputting search results to the file system is not supported while using the" | ||
| "'{worker_config.package.storage_engine}' storage engine." | ||
| ) | ||
| return None, None |
LinZhihao-723
left a comment
There was a problem hiding this comment.
For the PR title, how about:
feat(clp-package): Add an option to enable outputting search results to a file.
This matches #1590.
Description
Will put up another PR to collect the results from the file system.
breaking change.
Validation performed
{"query_string": "Timestamp < 1633615049894", "dataset": "spark", "write_to_file": True}tolocalhost:3001var/data/streamsSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.