Skip to content

feat(clp-rust-utils): Add serde-compatible mirror structs for package and search job configs based on existing Python definitions.#1535

Merged
hoophalab merged 6 commits into
y-scope:mainfrom
hoophalab:configyml
Nov 3, 2025

Conversation

@hoophalab

@hoophalab hoophalab commented Oct 31, 2025

Copy link
Copy Markdown
Contributor

Description

  1. Add the necessary ser/de types for clp-config.yml and credentials.yml (only used fields are added for now)
  2. Add the necessary ser/de types for search jobs in the sql database

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Those structs works correctly in #1534

Summary by CodeRabbit

  • Chores
    • Introduced search/query job configuration and status types to the crate, enabling structured job records and status values.
    • Expanded the configuration package with components for package, database, API server, results cache and credentials, including sensible defaults and deserializable types.
    • Added enum-serialization support as a dependency to improve serialized enum handling.

@hoophalab hoophalab requested a review from a team as a code owner October 31, 2025 16:02
@coderabbitai

coderabbitai Bot commented Oct 31, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Added one dependency and multiple new modules and types: a search job config module with job enums and struct, and a clp_config package submodule with config and credentials types plus default file path constants and re-exports.

Changes

Cohort / File(s) Summary
Dependency Addition
components/clp-rust-utils/Cargo.toml
Added dependency num_enum = "0.7.5".
Job Search Module
components/clp-rust-utils/src/job_config.rs, components/clp-rust-utils/src/job_config/search.rs
Added mod search; and pub use search::*;. Introduced QUERY_JOBS_TABLE_NAME, SearchJobConfig struct, QueryJobStatus enum, and QueryJobType enum with serde and num_enum derives and #[repr(i32)].
clp_config Package Module
components/clp-rust-utils/src/clp_config.rs, components/clp-rust-utils/src/clp_config/package.rs
Added internal package module and pub use package::*;, declared config and credentials submodules, and added DEFAULT_CONFIG_FILE_PATH / DEFAULT_CREDENTIALS_FILE_PATH constants.
Package Configuration Types
components/clp-rust-utils/src/clp_config/package/config.rs
Added Config, Package, Database, ApiServer, ResultsCache structs and StorageEngine enum; types derive Deserialize/Clone (and others), use #[serde(default)], and several implement Default.
Credentials Types
components/clp-rust-utils/src/clp_config/package/credentials.rs
Added Credentials and nested Database structs; Database uses secrecy::SecretString for password; both derive Deserialize, Clone, Debug.

Sequence Diagram(s)

(omitted — changes add types and re-exports only; no runtime control-flow modifications)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to check closely:
    • num_enum usage and Cargo feature compatibility with other crates.
    • #[serde(default)] and Default impls in config.rs for consistent deserialization behavior.
    • Visibility and re-exports: pub use package::*; and pub use search::*; to ensure no unintended API collisions.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and accurately summarizes the main objective of the changeset. It indicates that the PR adds serde-compatible mirror structs for package and search job configurations based on existing Python definitions. This directly aligns with the raw summary, which shows the addition of new Rust struct definitions for configuration types in clp-rust-utils, including SearchJobConfig, QueryJobStatus, QueryJobType in the search module, and Config, Database, ApiServer, Package, StorageEngine, ResultsCache in the package module, along with Credentials types. The title is concise, specific, and accurately represents the primary change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9a6c5c and bcb123e.

📒 Files selected for processing (3)
  • components/clp-rust-utils/src/clp_config/package/config.rs (1 hunks)
  • components/clp-rust-utils/src/clp_config/package/credentials.rs (1 hunks)
  • components/clp-rust-utils/src/job_config/search.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.214Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.
📚 Learning: 2025-11-03T16:17:40.214Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.214Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/credentials.rs
  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Applied to files:

  • components/clp-rust-utils/src/job_config/search.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-10-18T15:38:01.210Z
Learnt from: sitaowang1998
Repo: y-scope/clp PR: 1417
File: components/clp-py-utils/clp_py_utils/serialization_utils.py:6-13
Timestamp: 2025-10-18T15:38:01.210Z
Learning: In the y-scope/clp repository, parameter descriptions in docstrings are intentionally omitted when the parameter name is self-explanatory. This is the project's preferred documentation style.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-09-02T13:35:33.396Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1228
File: tools/deployment/presto-clp/scripts/init.py:298-306
Timestamp: 2025-09-02T13:35:33.396Z
Learning: In the CLP codebase, all configuration values loaded from YAML files are string values, so functions that retrieve config values should use Optional[str] rather than Optional[Any] for type hints.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-08-20T08:36:38.391Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/__init__.py:1-1
Timestamp: 2025-08-20T08:36:38.391Z
Learning: In the CLP project, __init__.py files should include short module-level docstrings describing the package purpose, as required by their ruff linting configuration, rather than being left empty.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
🧬 Code graph analysis (3)
components/clp-rust-utils/src/clp_config/package/credentials.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • Database (162-249)
components/clp-rust-utils/src/job_config/search.rs (1)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
  • SearchJobConfig (85-103)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (4)
  • Package (135-159)
  • Database (162-249)
  • ResultsCache (327-341)
  • StorageEngine (95-97)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: package-image
  • GitHub Check: antlr-code-committed (macos-15)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (6)
components/clp-rust-utils/src/job_config/search.rs (4)

1-4: LGTM! Imports and constant are appropriate.

The imports support the enum conversions and serialisation requirements, and the table name constant is well-defined.


6-10: Good documentation improvements.

The documentation clearly indicates the struct mirrors Python and explains the aggregation_config placeholder. This addresses previous feedback about clarifying the relationship with Python code.


26-37: LGTM! QueryJobStatus enum is well-defined.

The documentation clearly indicates the Python mirror, and the enum structure with explicit discriminants and conversion traits is appropriate for database integration.


39-46: LGTM! QueryJobType enum is well-defined.

Consistent with QueryJobStatus, this enum has clear documentation and appropriate structure for database integration.

components/clp-rust-utils/src/clp_config/package/credentials.rs (1)

4-13: Good alignment with credentials schema.

Using SecretString for the password while keeping the username as a plain String lines up well with credentials.yml and keeps sensitive data appropriately wrapped. Looks great.

components/clp-rust-utils/src/clp_config/package/config.rs (1)

10-113: Defaults + serde(default) look solid.

Deriving Default on the nested mirrors and combining that with #[serde(default)] on Config keeps deserialisation resilient when YAML omits unused fields. Thanks for documenting the sync requirements too.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f24600 and a167af3.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • components/clp-rust-utils/Cargo.toml (1 hunks)
  • components/clp-rust-utils/src/job_config.rs (1 hunks)
  • components/clp-rust-utils/src/job_config/search.rs (1 hunks)
  • components/clp-rust-utils/src/lib.rs (1 hunks)
  • components/clp-rust-utils/src/package_config.rs (1 hunks)
  • components/clp-rust-utils/src/package_config/config.rs (1 hunks)
  • components/clp-rust-utils/src/package_config/credentials.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/clp-rust-utils/src/package_config.rs
  • components/clp-rust-utils/src/lib.rs
  • components/clp-rust-utils/Cargo.toml
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Applied to files:

  • components/clp-rust-utils/Cargo.toml
🧬 Code graph analysis (4)
components/clp-rust-utils/src/lib.rs (1)
integration-tests/tests/fixtures/integration_test_config.py (1)
  • package_config (24-28)
components/clp-rust-utils/src/package_config/credentials.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • Database (162-249)
components/clp-rust-utils/src/package_config/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (4)
  • Package (135-159)
  • Database (162-249)
  • ResultsCache (327-341)
  • StorageEngine (95-97)
components/clp-rust-utils/src/job_config/search.rs (1)
components/job-orchestration/job_orchestration/scheduler/job_config.py (2)
  • AggregationConfig (61-66)
  • SearchJobConfig (85-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (7)
components/clp-rust-utils/src/lib.rs (1)

1-2: LGTM!

The module declarations correctly expose the new job_config and package_config modules as part of the crate's public API.

components/clp-rust-utils/src/job_config.rs (1)

1-2: LGTM!

The module structure and re-export pattern are idiomatic and correctly expose the search submodule's public API.

components/clp-rust-utils/src/package_config/credentials.rs (1)

1-13: LGTM!

The use of SecretString for the password field is appropriate for secure handling of credentials. The structure aligns well with the Python equivalent.

components/clp-rust-utils/src/package_config.rs (1)

1-7: LGTM!

The module structure, re-exports, and default file path constants are well-organized and provide a clean public API for package configuration.

components/clp-rust-utils/src/job_config/search.rs (1)

24-42: LGTM!

The enum definitions for QueryJobStatus and QueryJobType are well-structured with appropriate derives (Debug, Serialize, Deserialize, IntoPrimitive, TryFromPrimitive) and correct #[repr(i32)] for database storage compatibility.

components/clp-rust-utils/src/package_config/config.rs (1)

15-87: LGTM!

The configuration structures are well-designed with appropriate defaults that match the Python equivalents. The use of #[serde(default)] combined with Default implementations provides robust deserialization with sensible fallbacks. The StorageEngine enum correctly uses #[serde(rename = ...)] for kebab-case serialization.

components/clp-rust-utils/Cargo.toml (1)

10-12: Dependency versions verified as current and secure.

num_enum "0.7" resolves to the latest 0.7.5, and serde "1.0" resolves to 1.0.228. No known security vulnerabilities exist for either dependency.

Comment on lines +6 to +7
// TODO: Add definition.
pub type AggregationConfig = ();

@coderabbitai coderabbitai Bot Oct 31, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Track the TODO for AggregationConfig definition.

The placeholder type alias should be replaced with the actual definition. Based on the Python equivalent in components/job-orchestration/job_orchestration/scheduler/job_config.py, AggregationConfig should include fields like job_id, reducer_host, reducer_port, do_count_aggregation, and count_by_time_bucket_size.

Do you want me to generate the complete AggregationConfig struct definition based on the Python implementation, or open a new issue to track this task?

🤖 Prompt for AI Agents
In components/clp-rust-utils/src/job_config/search.rs around lines 6 to 7, the
AggregationConfig alias is a TODO placeholder; replace it with a proper struct
mirroring the Python version: define an AggregationConfig struct with fields
job_id (String), reducer_host (String), reducer_port (u16 or u32 as appropriate
for your ports), do_count_aggregation (bool), and count_by_time_bucket_size
(Option<u64> or appropriate numeric type), derive serde Serialize/Deserialize
(and Debug/Clone/PartialEq as used elsewhere), and export it from the module;
ensure field names and types align with the Python implementation and other Rust
code that will consume this struct.

@hoophalab hoophalab Oct 31, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that we will never need aggregation config in rust

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hoophalab, understood! In that case, would you like to update the TODO comment to clarify that AggregationConfig is intentionally left as a placeholder type and won't need a full implementation? Something like:

// Intentionally left as a no-op placeholder; aggregation config is not needed in Rust.
pub type AggregationConfig = ();

This would make the intent clearer for future maintainers. Alternatively, if the field isn't being used at all, you could consider removing it from SearchJobConfig entirely to simplify the API.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Comment on lines +9 to +22
#[derive(Default, Serialize, Deserialize)]
#[serde(default)]
pub struct SearchJobConfig {
pub dataset: Option<String>,
pub query_string: String,
pub max_num_results: u32,
pub tags: Option<Vec<String>>,
pub begin_timestamp: Option<i64>,
pub end_timestamp: Option<i64>,
pub ignore_case: bool,
pub path_filter: Option<String>,
pub network_address: Option<(String, u16)>,
pub aggregation_config: Option<AggregationConfig>,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reconsider the Default derive for SearchJobConfig.

Deriving Default for SearchJobConfig allows creating instances with an empty query_string, which is semantically invalid for a search configuration. This could lead to runtime errors or invalid database entries.

Consider one of the following solutions:

  1. Remove the Default derive and require explicit construction.
  2. Make query_string an Option<String> to explicitly represent the absence of a query.
  3. Add a validation method that must be called before use.

Apply this diff to remove the problematic Default derive:

-#[derive(Default, Serialize, Deserialize)]
+#[derive(Serialize, Deserialize)]
 #[serde(default)]
 pub struct SearchJobConfig {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[derive(Default, Serialize, Deserialize)]
#[serde(default)]
pub struct SearchJobConfig {
pub dataset: Option<String>,
pub query_string: String,
pub max_num_results: u32,
pub tags: Option<Vec<String>>,
pub begin_timestamp: Option<i64>,
pub end_timestamp: Option<i64>,
pub ignore_case: bool,
pub path_filter: Option<String>,
pub network_address: Option<(String, u16)>,
pub aggregation_config: Option<AggregationConfig>,
}
#[derive(Serialize, Deserialize)]
#[serde(default)]
pub struct SearchJobConfig {
pub dataset: Option<String>,
pub query_string: String,
pub max_num_results: u32,
pub tags: Option<Vec<String>>,
pub begin_timestamp: Option<i64>,
pub end_timestamp: Option<i64>,
pub ignore_case: bool,
pub path_filter: Option<String>,
pub network_address: Option<(String, u16)>,
pub aggregation_config: Option<AggregationConfig>,
}
🤖 Prompt for AI Agents
In components/clp-rust-utils/src/job_config/search.rs around lines 9 to 22,
remove the automatic Default implementation that allows an empty query_string:
delete the #[derive(Default, Serialize, Deserialize)] default derive and the
#[serde(default)] attribute (or at least the Default part) so SearchJobConfig
cannot be constructed with an empty query by accident; then require explicit
construction of instances (e.g., provide a constructor/new function or make
callers supply all fields) or change query_string to Option<String> or add a
validate() method as appropriate — for this change apply the diff that removes
the Default derive and serde default usage and update callers to
construct/validate SearchJobConfig explicitly.

Comment thread components/clp-rust-utils/src/job_config/search.rs
Comment thread components/clp-rust-utils/src/clp_config/package/config.rs Outdated
@LinZhihao-723 LinZhihao-723 self-requested a review October 31, 2025 16:17
Comment thread components/clp-rust-utils/src/clp_config/package/config.rs
Comment thread components/clp-rust-utils/src/job_config/search.rs Outdated
Comment thread components/clp-rust-utils/src/job_config/search.rs Outdated
Comment thread components/clp-rust-utils/src/job_config/search.rs
Comment thread components/clp-rust-utils/src/clp_config/package/config.rs
Comment thread components/clp-rust-utils/src/clp_config/package/config.rs
Comment thread components/clp-rust-utils/Cargo.toml Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/clp-rust-utils/Cargo.toml (1)

4-4: Invalid Rust edition specified.

The edition is set to "2024", which is not a valid Rust edition. Valid editions are 2015, 2018, and 2021. This will cause compilation errors.

Apply this diff to fix the edition:

-edition = "2024"
+edition = "2021"
♻️ Duplicate comments (3)
components/clp-rust-utils/src/job_config/search.rs (2)

22-22: Consider adding port validation for network_address.

While u16 naturally enforces the range [0, 65535], port 0 is typically invalid for network addresses. The Python implementation validates that the port is in the range [1, 65535].

Consider adding a validation method:

impl SearchJobConfig {
    pub fn validate_network_address(&self) -> Result<(), String> {
        if let Some((_, port)) = &self.network_address {
            if *port < 1 {
                return Err("Port must be in the range [1, 65535]".to_string());
            }
        }
        Ok(())
    }
}

11-24: Reconsider the Default derive for SearchJobConfig.

The Default derive allows creating instances with an empty query_string, which is semantically invalid for a search configuration. This could lead to runtime errors or invalid database entries.

Consider removing the Default derive and requiring explicit construction:

-#[derive(Default, Serialize, Deserialize)]
+#[derive(Serialize, Deserialize)]
-#[serde(default)]
 pub struct SearchJobConfig {
components/clp-rust-utils/src/clp_config/package/config.rs (1)

10-20: Consider deriving Default for Config.

For consistency with the child structs that all implement Default, and given that all fields use #[serde(default)], it would be convenient to also derive Default for Config itself.

Apply this diff:

-#[derive(Deserialize, Clone)]
+#[derive(Deserialize, Clone, Default)]
 pub struct Config {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd9a09c and b9e148d.

📒 Files selected for processing (6)
  • components/clp-rust-utils/Cargo.toml (1 hunks)
  • components/clp-rust-utils/src/clp_config.rs (1 hunks)
  • components/clp-rust-utils/src/clp_config/package.rs (1 hunks)
  • components/clp-rust-utils/src/clp_config/package/config.rs (1 hunks)
  • components/clp-rust-utils/src/clp_config/package/credentials.rs (1 hunks)
  • components/clp-rust-utils/src/job_config/search.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/clp-rust-utils/src/clp_config/package.rs
  • components/clp-rust-utils/src/clp_config/package/config.rs
  • components/clp-rust-utils/Cargo.toml
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Applied to files:

  • components/clp-rust-utils/src/job_config/search.rs
  • components/clp-rust-utils/Cargo.toml
📚 Learning: 2025-10-18T15:38:01.210Z
Learnt from: sitaowang1998
Repo: y-scope/clp PR: 1417
File: components/clp-py-utils/clp_py_utils/serialization_utils.py:6-13
Timestamp: 2025-10-18T15:38:01.210Z
Learning: In the y-scope/clp repository, parameter descriptions in docstrings are intentionally omitted when the parameter name is self-explanatory. This is the project's preferred documentation style.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-09-02T13:35:33.396Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1228
File: tools/deployment/presto-clp/scripts/init.py:298-306
Timestamp: 2025-09-02T13:35:33.396Z
Learning: In the CLP codebase, all configuration values loaded from YAML files are string values, so functions that retrieve config values should use Optional[str] rather than Optional[Any] for type hints.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
📚 Learning: 2025-08-20T08:36:38.391Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/__init__.py:1-1
Timestamp: 2025-08-20T08:36:38.391Z
Learning: In the CLP project, __init__.py files should include short module-level docstrings describing the package purpose, as required by their ruff linting configuration, rather than being left empty.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
🧬 Code graph analysis (3)
components/clp-rust-utils/src/job_config/search.rs (1)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
  • SearchJobConfig (85-103)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (4)
  • Package (135-159)
  • Database (162-249)
  • ResultsCache (327-341)
  • StorageEngine (95-97)
components/clp-rust-utils/src/clp_config/package/credentials.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • Database (162-249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (14)
components/clp-rust-utils/Cargo.toml (1)

11-11: LGTM!

The num_enum dependency is correctly added to support the IntoPrimitive and TryFromPrimitive derives used by the query job enums in the search module. The specific version 0.7.5 appears to be intentionally chosen.

components/clp-rust-utils/src/job_config/search.rs (4)

1-2: LGTM!

The imports are appropriate for the enum conversions and serialization/deserialization functionality.


4-4: LGTM!

The constant correctly defines the table name for query jobs.


26-37: LGTM!

The QueryJobStatus enum correctly mirrors the Python definition with appropriate documentation and derives for primitive conversions. The explicit repr(i32) and variant values ensure synchronization with the Python implementation.


39-46: LGTM!

The QueryJobType enum is properly documented and structured with correct primitive conversions matching the Python implementation.

components/clp-rust-utils/src/clp_config/package/config.rs (4)

22-45: LGTM!

The Database struct is well-documented and the default values correctly match the Python implementation in clp_py_utils.clp_config.Database.


63-82: LGTM!

The Package struct is properly documented and the default storage engine correctly matches the Python implementation.


84-91: LGTM!

The StorageEngine enum correctly mirrors the Python definition with proper serde rename attributes for "clp" and "clp-s".


93-116: LGTM!

The ResultsCache struct is well-documented and all default values correctly match the Python implementation.

components/clp-rust-utils/src/clp_config.rs (1)

4-5: LGTM!

The package module is properly declared and re-exported, making the configuration types available through the crate's public API.

components/clp-rust-utils/src/clp_config/package/credentials.rs (2)

9-13: LGTM!

The Database struct correctly uses SecretString for the password field, which provides important security benefits (prevents accidental logging via Debug, zeroizes memory on drop). The field name "user" matches the YAML key structure used in the Python implementation.


4-7: LGTM!

The Credentials struct provides a clean wrapper for database credentials, properly structured for deserialization from YAML configuration files.

components/clp-rust-utils/src/clp_config/package.rs (2)

1-4: LGTM!

The module structure cleanly organizes configuration and credentials types with appropriate public visibility and re-exports.


6-8: Verification complete — paths match Python implementation.

The Rust constants in components/clp-rust-utils/src/clp_config/package.rs correctly mirror the Python paths in components/clp-py-utils/clp_py_utils/clp_config.py:

  • DEFAULT_CONFIG_FILE_PATH"etc/clp-config.yml" (matches CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH)
  • DEFAULT_CREDENTIALS_FILE_PATH"etc/credentials.yml" (matches CLP_DEFAULT_CREDENTIALS_FILE_PATH)

No issues identified.

Comment on lines +47 to +61
#[derive(Deserialize, Clone)]
#[serde(default)]
pub struct ApiServer {
pub host: String,
pub port: u16,
}

impl Default for ApiServer {
fn default() -> Self {
Self {
host: "localhost".to_owned(),
port: 3001,
}
}
}

@coderabbitai coderabbitai Bot Nov 3, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding documentation for ApiServer.

Unlike the other structs, ApiServer lacks documentation about its Python counterpart and whether the default values need to be kept in sync. For consistency with other types in this file, consider adding similar documentation.

/// Mirror of `clp_py_utils.clp_config.ApiServer`.
///
/// # NOTE
///
/// * This type is partially defined: unused fields are omitted and discarded through
///   deserialization.
/// * The default values must be kept in sync with the Python definition.
#[derive(Deserialize, Clone)]
#[serde(default)]
pub struct ApiServer {
🤖 Prompt for AI Agents
In components/clp-rust-utils/src/clp_config/package/config.rs around lines 47 to
61, add documentation comments for ApiServer to mirror the style used by other
structs: include a top-line indicating it mirrors
clp_py_utils.clp_config.ApiServer, a NOTE block stating that the type is
partially defined and unused fields are discarded during deserialization, and a
NOTE that default values must be kept in sync with the Python definition; place
this doc comment immediately above the #[derive(...)] so it documents the
ApiServer struct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api server is in rust and is not a mirror of python code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hoophalab, thank you for the clarification! I understand now that ApiServer is Rust-native and doesn't correspond to a Python definition like the other structs in this file.

In that case, my suggestion was incorrect. However, you might still consider adding a brief doc comment to explain what ApiServer represents (without the "mirror" language), such as:

/// Configuration for the CLP API server.
#[derive(Deserialize, Clone)]
#[serde(default)]
pub struct ApiServer {

This would provide context for future maintainers, but it's entirely optional since the field names are self-explanatory.


✏️ Learnings added
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.214Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

LinZhihao-723
LinZhihao-723 previously approved these changes Nov 3, 2025

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat(clp-rust-utils): Add serde-compatible mirror structs for package and search job configs based on existing Python definitions.

@hoophalab hoophalab changed the title feat(clp-rust-utils): Add serde structs for package configuration files and search job configuration. feat(clp-rust-utils): Add serde-compatible mirror structs for package and search job configs based on existing Python definitions. Nov 3, 2025
@hoophalab hoophalab merged commit bf1937d into y-scope:main Nov 3, 2025
22 checks passed
@hoophalab hoophalab deleted the configyml branch December 2, 2025 18:21
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
… and search job configs based on existing Python definitions. (y-scope#1535)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants