Skip to content

feat(clp-package): Add an option to enable outputting search results to a file.#1630

Merged
hoophalab merged 7 commits into
y-scope:mainfrom
hoophalab:fsjob-submit
Nov 20, 2025
Merged

feat(clp-package): Add an option to enable outputting search results to a file.#1630
hoophalab merged 7 commits into
y-scope:mainfrom
hoophalab:fsjob-submit

Conversation

@hoophalab

@hoophalab hoophalab commented Nov 19, 2025

Copy link
Copy Markdown
Contributor

Description

  1. Add a new job config option “write results to disk”.
  2. fs_search_task.py writes query results to the file system if that option is enabled.
  3. Users can enable the option from the API server

Will put up another PR to collect the results from the file system.

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. submit a query {"query_string": "Timestamp < 1633615049894", "dataset": "spark", "write_to_file": True} to localhost:3001
  2. output files are stored to var/data/streams

Summary by CodeRabbit

  • New Features
    • Option to persist search results to files so query outputs can be saved to the worker’s output directory.
  • Bug Fixes / Behaviour
    • Improved handling when file output is requested: an output path is created and unsupported contexts will now skip file writing instead of failing.

✏️ Tip: You can customize this high-level summary in your review settings.

@hoophalab hoophalab requested a review from a team as a code owner November 19, 2025 12:57
@coderabbitai

coderabbitai Bot commented Nov 19, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a new boolean configuration field write_to_file to API, Rust and Python job configs and propagates it into job conversion; updates executor to handle write_to_file by creating output paths and appending file-output arguments, and adds an early guard returning (None, None) for unsupported combinations.

Changes

Cohort / File(s) Summary
API config & conversion
components/api-server/src/client.rs
Added public write_to_file: bool to QueryConfig with serde(default); From<QueryConfig> for SearchJobConfig conversion now propagates this field.
Rust job config
components/clp-rust-utils/src/job_config/search.rs, Cargo.toml
Added pub write_to_file: bool to SearchJobConfig (affects public API, serialization and initialization).
Scheduler Python config
components/job-orchestration/job_orchestration/scheduler/job_config.py
Added write_to_file: bool = False to SearchJobConfig model (default False).
Executor output handling
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
_make_core_clp_command_and_env_vars returns (None, None) early for unsupported combos when write_to_file is true; _make_command_and_env_vars creates output directory/path and appends file-output mode and path to the CLP command when write_to_file is enabled.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant API as API Server
    participant Scheduler as Job Scheduler
    participant Executor as Worker Executor
    participant CLP as CLP Binary

    note right of API `#E5F6FF`: New field: write_to_file
    API->>Scheduler: Submit job (includes write_to_file)
    Scheduler->>Executor: Dispatch job config

    alt write_to_file == true
        Executor->>Executor: check guard in _make_core_clp_command_and_env_vars
        opt supported
            Executor->>Executor: create output dir & compute output path
            Executor->>CLP: build command with file-output arg and path
            CLP-->>Executor: write results to file
        end
        opt unsupported
            Executor-->>Scheduler: abort command assembly (returns (None, None))
        end
    else write_to_file == false
        Executor->>CLP: build standard command (streaming/storage)
        CLP-->>Executor: stream results
    end

    Executor->>Scheduler: report job status
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review focus:
    • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py: correctness of the early guard, path concatenation, directory creation, and error handling.
    • components/api-server/src/client.rs: serde(default) usage and correct propagation in From<QueryConfig>.
    • components/clp-rust-utils/src/job_config/search.rs and components/job-orchestration/job_orchestration/scheduler/job_config.py: public API/serialization impact and default compatibility.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: adding a file output option for search results across the API and job configuration.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c539bb and 0c76616.

📒 Files selected for processing (4)
  • components/api-server/src/client.rs (2 hunks)
  • components/clp-rust-utils/src/job_config/search.rs (1 hunks)
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/scheduler/job_config.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Applied to files:

  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
🧬 Code graph analysis (1)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • get_directory (552-553)
  • get_directory (563-564)
🪛 Ruff (0.14.5)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py

44-44: Avoid equality comparisons to True; use search_config.write_to_file: for truth checks

Replace with search_config.write_to_file

(E712)


46-46: f-string without any placeholders

Remove extraneous f prefix

(F541)


46-47: Logging statement uses f-string

(G004)


171-171: Logging statement uses f-string

(G004)


171-171: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)

🔇 Additional comments (5)
components/api-server/src/client.rs (2)

29-30: LGTM!

The new write_to_file field is properly defined with #[serde(default)], which will default to false when not provided in deserialization.


42-42: LGTM!

The write_to_file field is correctly propagated from QueryConfig to SearchJobConfig.

components/clp-rust-utils/src/job_config/search.rs (1)

24-24: LGTM!

The write_to_file field is correctly added to SearchJobConfig and aligns with the Python-side definition in job_config.py.

components/job-orchestration/job_orchestration/scheduler/job_config.py (1)

96-96: LGTM!

The write_to_file field is correctly added with an appropriate default value of False, maintaining synchronization with the Rust-side SearchJobConfig.

components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)

44-49: Verify test coverage and engine-specific behavior.

The implementation guards against using write_to_file with the core CLP engine but allows it with CLP_S. Additionally, no automated tests appear to be included for this new functionality. Please verify:

  1. Is the asymmetric support between CLP and CLP_S engines intentional and properly documented?
  2. Are there plans to add automated tests for the write_to_file functionality, or will this be covered in the follow-up PR mentioned in the description?

Also applies to: 162-172

Comment on lines +44 to +49
if True == search_config.write_to_file:
logger.error(
f"Outputting search results to the file system is not supported while using the"
"'{worker_config.package.storage_engine}' storage engine."
)
return None, None

@coderabbitai coderabbitai Bot Nov 19, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix code quality issues in the guard logic.

The guard logic has several style and correctness issues:

  1. Line 44: Use Pythonic truth check instead of True == comparison.
  2. Lines 46-47: The error message has formatting issues - line 46 has an unnecessary f prefix and line 47 is missing the f prefix for the storage engine interpolation.
  3. Logging with f-strings is discouraged; use lazy % formatting instead.

Apply this diff:

-    if True == search_config.write_to_file:
-        logger.error(
-            f"Outputting search results to the file system is not supported while using the"
-            "'{worker_config.package.storage_engine}' storage engine."
-        )
+    if search_config.write_to_file:
+        logger.error(
+            "Outputting search results to the file system is not supported while using the"
+            " '%s' storage engine.",
+            worker_config.package.storage_engine
+        )
         return None, None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if True == search_config.write_to_file:
logger.error(
f"Outputting search results to the file system is not supported while using the"
"'{worker_config.package.storage_engine}' storage engine."
)
return None, None
if search_config.write_to_file:
logger.error(
"Outputting search results to the file system is not supported while using the"
" '%s' storage engine.",
worker_config.package.storage_engine
)
return None, None
🧰 Tools
🪛 Ruff (0.14.5)

44-44: Avoid equality comparisons to True; use search_config.write_to_file: for truth checks

Replace with search_config.write_to_file

(E712)


46-46: f-string without any placeholders

Remove extraneous f prefix

(F541)


46-47: Logging statement uses f-string

(G004)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the rabbit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723, thank you for confirming! The suggested changes will improve the code quality and align with Python best practices.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm

Comment on lines +44 to +49
if True == search_config.write_to_file:
logger.error(
f"Outputting search results to the file system is not supported while using the"
"'{worker_config.package.storage_engine}' storage engine."
)
return None, None

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the rabbit.

Comment thread components/job-orchestration/job_orchestration/executor/query/fs_search_task.py Outdated

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat(clp-package): Add an option to enable outputting search results to a file.

This matches #1590.

@hoophalab hoophalab changed the title feat(api): Add option to write query results to file. feat(clp-package): Add an option to enable outputting search results to a file. Nov 20, 2025
@hoophalab hoophalab merged commit 42c18c7 into y-scope:main Nov 20, 2025
20 checks passed
@hoophalab hoophalab deleted the fsjob-submit branch December 2, 2025 18:22
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants