Skip to content

feat(api-server): Initial commit for CLP API server:#1534

Merged
hoophalab merged 15 commits into
y-scope:mainfrom
hoophalab:apiserver
Nov 7, 2025
Merged

feat(api-server): Initial commit for CLP API server:#1534
hoophalab merged 15 commits into
y-scope:mainfrom
hoophalab:apiserver

Conversation

@hoophalab

@hoophalab hoophalab commented Oct 31, 2025

Copy link
Copy Markdown
Contributor

Description

This PR adds the API server. Only clp-json is supported for now.

  1. Add the necessary serialization types for clp-config.yml and credentials.yml (This is moved to feat(clp-rust-utils): Add serde-compatible mirror structs for package and search job configs based on existing Python definitions. #1535)
  2. Add a clp client which can be also used to create a cli app.
  3. /query route: insert a compression job to jobs table
  4. /query_results route: fetch results from the results_cache

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. start the package using

CLP_HOME=<...>/build/clp-package RUST_LOG=trace cargo run --bin api_server

or directly calling the binary in /target/debug/

CLP_HOME=<...>/build/clp-package RUST_LOG=trace api_server
2. Submit a query

POST http://localhost:3001/query

{"query_string": "pid:7891", "dataset": "default", "begin_timestamp": 1761923334000000}

Response: {"query_results_uri":"/query_results/10"}

  1. Get results in the event stream format
    GET http://localhost:3001/query_results/10
    Response:
data: {...}

data: {...}

Summary by CodeRabbit

  • New Features

    • Added an API server with health, query submission and streaming (SSE) endpoints.
    • Added a client API to submit queries, poll status and stream results; query options include dataset, max results, timestamp range and case sensitivity.
    • Added YAML config loading utility.
  • Bug Fixes / Improvements

    • Unified and improved error handling and reporting (includes YAML errors) and graceful shutdown.
  • Chores

    • Updated AWS SDK versions and added serde_yaml dependency.
  • Other

    • Extended config defaults (logs directory, polling/backoff and default max results).

@hoophalab hoophalab requested a review from a team as a code owner October 31, 2025 15:23
@coderabbitai

coderabbitai Bot commented Oct 31, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a new api-server crate (library clp_client and binary api_server) implementing an Axum HTTP API and a distributed query client (MySQL metadata, MongoDB results) with SSE streaming. Extends clp-rust-utils with YAML deserialization, config defaults and polling settings, and a serde_yaml error variant.

Changes

Cohort / File(s) Summary
Workspace
Cargo.toml
Added components/api-server to workspace members.
API server manifest
components/api-server/Cargo.toml
New Cargo manifest for api-server: package metadata, lib clp_client (src/lib.rs), bin api_server (src/bin/api_server.rs), and dependencies with features (anyhow, axum[json], clap[derive], clp-rust-utils (path), mongodb, sqlx[runtime-tokio,mysql], serde[derive], tokio[full], tracing, tracing-subscriber[json,env-filter], serde_yaml, etc.).
API server binary
components/api-server/src/bin/api_server.rs
New Axum binary: CLI parsing, YAML config/credentials loading, client construction, routes GET /, GET /health, POST /query, GET /query_results/{search_job_id} streaming SSE results; graceful shutdown, tracing, and handler error conversions.
API server library surface
components/api-server/src/lib.rs
Added mod client; and mod error; and pub use client::*;.
Query client implementation
components/api-server/src/client.rs
New Client with MySQL pool and MongoDB client; QueryConfig type; async connect, submit_query (inserts msgpack payload into SQL), get_status, and fetch_results (polls until succeeded, then streams MongoDB documents); ClientError variants and conversions.
Client error module
components/api-server/src/error.rs
New ClientError enum (Sql, Mongo, Io, QueryNotSucceeded, MalformedData), private IsMalformedData trait and impls, and From<clp_rust_utils::Error> mapping.
Utilities — Cargo updates
components/clp-rust-utils/Cargo.toml
Bumped aws-config/aws-sdk-s3/aws-sdk-sqs patch versions and added serde_yaml = "0.9.34".
Utilities — error enum
components/clp-rust-utils/src/error.rs
Added SerdeYaml(#[from] serde_yaml::Error) variant to the Error enum.
Utilities — config defaults & types
components/clp-rust-utils/src/clp_config/package/config.rs
Added logs_directory: String to Config with an explicit Default impl; extended ApiServer with query_job_polling: QueryJobPollingConfig and default_max_num_query_results: u32 plus defaults; introduced QueryJobPollingConfig with initial_backoff_ms and max_backoff_ms.
Utilities — serde YAML helper
components/clp-rust-utils/src/serde.rs, components/clp-rust-utils/src/serde/yaml.rs
Added pub mod yaml; and yaml::from_path<T>(path) helper to read and deserialize YAML via serde_yaml::from_reader, returning Result<T, Error> and forwarding IO/YAML errors.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Server as Axum
    participant Client as clp_client::Client
    participant MySQL
    participant MongoDB

    User->>Server: POST /query (QueryConfig)
    Server->>Client: submit_query(query_config)
    Client->>MySQL: INSERT serialized SearchJobConfig (msgpack)
    MySQL-->>Client: last_insert_id (job_id)
    Client-->>Server: job_id
    Server-->>User: { "job_id": u64 }

    User->>Server: GET /query_results/{job_id}
    Server->>Client: fetch_results(job_id)
    loop poll status
        Client->>MySQL: query job status
        MySQL-->>Client: status
    end
    alt status == Succeeded
        Client->>MongoDB: read collection for job_id (stream)
        MongoDB-->>Client: documents stream
        Client-->>Server: stream messages
        Server-->>User: SSE events (keep-alive)
    else terminal failure
        Client-->>Server: QueryNotSucceeded/Error
        Server-->>User: 500 InternalServer
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • Error mapping between clp-rust-utils::Error and api-server::ClientError (components/api-server/src/error.rs, components/clp-rust-utils/src/error.rs).
    • Polling/backoff defaults and field-name consistency in ApiServer default impl (components/clp-rust-utils/src/clp_config/package/config.rs).
    • Async stream lifetimes, SSE handler correctness and backpressure in fetch_results() and handler (components/api-server/src/client.rs, components/api-server/src/bin/api_server.rs).
    • Serialization/deserialization (msgpack insertion, MongoDB message extraction, YAML parsing) and error propagation.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding an initial API server implementation for CLP. It directly reflects the primary objective of the pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hoophalab hoophalab changed the title feat(api): Add api server. feat(api): Add API server for clp-json. Oct 31, 2025
coderabbitai[bot]

This comment was marked as resolved.

@hoophalab

This comment was marked as outdated.

Comment thread components/api-server/src/client.rs Outdated

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed the axum server code

i agree with the RESTful API endpoint designs in the code. as discussed offline, in a future PR, please consider integrating frameworks to generate OpenAPI specs to make the code more testable (maybe https://docs.rs/utoipa/latest/utoipa/ ?)

@LinZhihao-723 please help with the rest of the review as i observe and learn

Comment on lines +6 to +8
[lib]
name = "clp_client"
path = "src/lib.rs"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: do we have to define a [lib] in this Cargo project? as in, do we plan to reuse this lib in other projects at the moment?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess it is useful if other Rust projects want to use the client without the HTTP server?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of the reason to define it is to allow unused public symbols. It also helps when writing test cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel another benefit is that a separate client library enables us to create an alternative to search.py -- simply create a clp_client, submit_query and print fetch results to stdout.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess that's to say we don't need those for now. i'm fine if we want to keep it given we should add the tests soon

Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suppose we use anyhow::Result<()> because we want the error type to be flexible?

@hoophalab hoophalab Nov 4, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only using anyhow::Result<()> to pretty print the error message, but the new version also uses anyhow to add context as you suggested.

Comment thread components/api-server/src/bin/api_server.rs
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

LinZhihao-723

This comment was marked as resolved.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
components/api-server/src/client.rs (1)

70-75: Pass MongoDB credentials when connecting.
We drop credentials.results_cache here and attempt an anonymous MongoDB connection, so any secured deployment will reject /query_results outright. Please plumb the results-cache username and password into the ClientOptions/URI (retrieving the secret via ExposeSecret) before constructing the client. Without that change the API server cannot function when auth is enabled.

-        let mongo_client = mongodb::Client::with_uri_str(mongo_uri).await?;
+        use secrecy::ExposeSecret;
+        let mut client_options =
+            mongodb::options::ClientOptions::parse(&mongo_uri).await?;
+        client_options.credential = Some(
+            mongodb::options::Credential::builder()
+                .username(Some(credentials.results_cache.user.clone()))
+                .password(Some(
+                    credentials.results_cache.password.expose_secret().to_owned(),
+                ))
+                .build(),
+        );
+        let mongo_client = mongodb::Client::with_options(client_options)?;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70604b9 and fce5626.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • components/api-server/Cargo.toml (1 hunks)
  • components/api-server/src/bin/api_server.rs (1 hunks)
  • components/api-server/src/client.rs (1 hunks)
  • components/api-server/src/error.rs (1 hunks)
  • components/api-server/src/lib.rs (1 hunks)
  • components/clp-rust-utils/Cargo.toml (1 hunks)
  • components/clp-rust-utils/src/clp_config/package/config.rs (1 hunks)
  • components/clp-rust-utils/src/error.rs (1 hunks)
  • components/clp-rust-utils/src/serde.rs (1 hunks)
  • components/clp-rust-utils/src/serde/yaml_file.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (20)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/clp-rust-utils/src/serde.rs
  • components/api-server/src/bin/api_server.rs
  • components/clp-rust-utils/Cargo.toml
  • components/api-server/Cargo.toml
  • components/api-server/src/client.rs
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
  • components/api-server/src/bin/api_server.rs
  • components/api-server/Cargo.toml
  • components/api-server/src/client.rs
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-09-27T23:19:17.079Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-17T19:59:35.299Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Applied to files:

  • components/api-server/src/bin/api_server.rs
  • components/clp-rust-utils/Cargo.toml
📚 Learning: 2025-10-20T22:20:45.733Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1448
File: components/clp-rust-utils/Cargo.toml:4-4
Timestamp: 2025-10-20T22:20:45.733Z
Learning: Rust edition 2024 is a valid edition value in Cargo.toml files. It was released with Rust 1.85 on February 20, 2025.

Applied to files:

  • components/clp-rust-utils/Cargo.toml
📚 Learning: 2025-10-31T15:41:23.952Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1534
File: components/api-server/src/client.rs:129-133
Timestamp: 2025-10-31T15:41:23.952Z
Learning: In Rust 2024 edition, the syntax `+ use<>` in return-position `impl Trait` types (e.g., `-> impl Stream<Item = T> + use<>`) is valid and indicates that the opaque type captures no generic parameters from the surrounding scope. This is part of the new precise capture rules introduced in Rust 2024 edition.

Applied to files:

  • components/api-server/src/client.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-10-08T15:52:50.753Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.

Applied to files:

  • components/api-server/src/client.rs
🧬 Code graph analysis (3)
components/clp-rust-utils/src/error.rs (1)
components/api-server/src/error.rs (2)
  • from (32-34)
  • from (38-46)
components/api-server/src/bin/api_server.rs (2)
components/clp-rust-utils/src/serde/yaml_file.rs (1)
  • from_path (20-23)
components/api-server/src/client.rs (2)
  • connect (67-81)
  • from (32-42)
components/api-server/src/client.rs (4)
components/clp-rust-utils/src/database/mysql.rs (1)
  • create_mysql_pool (20-36)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
  • SearchJobConfig (85-103)
components/api-server/src/bin/api_server.rs (2)
  • from (110-112)
  • query (79-86)
components/clp-rust-utils/src/clp_config/package/config.rs (4)
  • default (35-41)
  • default (55-63)
  • default (80-84)
  • default (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
🔇 Additional comments (6)
components/clp-rust-utils/Cargo.toml (1)

7-9: LGTM! Dependency updates are appropriate.

The AWS SDK minor version bumps and the addition of serde_yaml align well with the new YAML deserialization functionality introduced in this PR.

Also applies to: 15-15

components/clp-rust-utils/src/clp_config/package/config.rs (1)

49-51: LGTM! Configuration fields are well-defined.

The new polling configuration fields have clear names and sensible defaults. The exponential backoff range (100ms to 5000ms) and result limit (1000) are reasonable starting values.

Also applies to: 59-61

components/clp-rust-utils/src/error.rs (1)

14-15: LGTM! YAML error variant correctly integrated.

The new SerdeYaml variant follows the established pattern and correctly uses the #[from] attribute for automatic error conversion. This aligns with the new YAML deserialization functionality.

components/api-server/src/error.rs (2)

22-35: Elegant error conversion pattern with IsMalformedData trait.

The private IsMalformedData marker trait with a blanket From<T: IsMalformedData> implementation is a clean way to unify multiple serialization error types into MalformedData. This avoids boilerplate while maintaining type safety.


37-47: LGTM! Correct delegation of clp_rust_utils errors.

The From<clp_rust_utils::Error> implementation correctly:

  • Maps serialization errors (MsgpackEncode, SerdeYaml) to MalformedData
  • Delegates Io and Sqlx to their respective From implementations via .into()

This maintains clean error semantics across crate boundaries.

components/api-server/src/lib.rs (1)

1-3: No issues found — code is properly designed as-is.

The verification confirms that ClientError is already publicly re-exported through the client module at client.rs:13 with pub use crate::error::ClientError;. This means pub use client::*; in lib.rs already exposes error types to users of the library.

The current design is intentional and clean: error types are accessible via the client namespace (e.g., api_server::client::ClientError), which creates a logical grouping. The suggested change to directly re-export the error module is unnecessary.

Likely an incorrect or invalid review comment.

Comment on lines +14 to +29
[dependencies]
anyhow = "1.0.100"
axum = { version = "0.8.6", features = ["json"] }
clap = { version = "4.5.51", features = ["derive"] }
clp-rust-utils = { path = "../clp-rust-utils" }
futures = "0.3.31"
mongodb = "3.3.0"
num_enum = "0.7.5"
rmp-serde = "1.3.0"
serde = { version = "1.0.228", features = ["derive"] }
serde_json = "1.0.145"
sqlx = { version = "0.8.6", features = ["runtime-tokio", "mysql"] }
thiserror = "2.0.17"
tokio = { version = "1.48.0", features = ["full"] }
tracing = "0.1.41"
tracing-subscriber = { version = "0.3.20", features = ["json", "env-filter"] }

@coderabbitai coderabbitai Bot Nov 4, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding tower-http for request logging.

The current setup uses tracing-subscriber, which is good. However, consider adding tower-http with the trace feature for automatic per-request logging middleware. This provides structured request/response logging (method, URI, status, latency) without manual instrumentation.

Add to dependencies:

tower-http = { version = "0.6", features = ["trace"] }

This would complement your existing tracing setup and provide production-ready request observability.

Based on past review comments.

🤖 Prompt for AI Agents
In components/api-server/Cargo.toml around lines 14 to 29, add the tower-http
crate with the "trace" feature to enable automatic per-request tracing
middleware; update the [dependencies] section to include tower-http (use a
compatible version such as 0.6) and then run cargo update / cargo build to fetch
the crate and confirm compatibility with existing tracing/tokio versions so you
can wire the TraceLayer into your axum server.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the referenced discussion comment from the PR
gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/y-scope/clp/pulls/comments/2491919405

Length of output: 2719


@hoophalab, understood! Since you already have per-request logging via RUST_LOG=trace with axum's built-in tracing, the tower-http suggestion isn't necessary. The existing setup provides connection-level logging, which covers the core observability needs.

If you later want to add request/response body logging or structured request metadata (method, URI, status, latency), tower-http's TraceLayer would be an option, but your current approach is perfectly valid for now.


🧠 Learnings used
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1078
File: components/webui/server/src/plugins/MongoSocketIoServer/index.ts:105-108
Timestamp: 2025-07-08T17:20:02.560Z
Learning: The y-scope/clp codebase has a 100-character line length limit that takes precedence over other style preferences like template literals when they conflict.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.

Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1291
File: docs/src/user-docs/quick-start/text-v-json.md:84-88
Timestamp: 2025-09-08T21:41:32.059Z
Learning: In the y-scope/clp project documentation, maintain informal contractions like "can't" and "don't" rather than formal alternatives like "cannot" and "do not" - the maintainer prefers a conversational tone.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/clp-rust-utils/src/serde.rs Outdated
Comment thread components/clp-rust-utils/src/serde/yaml.rs
@hoophalab

Copy link
Copy Markdown
Contributor Author

@junhaoliao @LinZhihao-723 What do you think about renaming /query to /submit_query? I feel like /query on its own implies you should get the results back immediately, which might not be what's happening.

LinZhihao-723

This comment was marked as resolved.

@LinZhihao-723

Copy link
Copy Markdown
Member

@junhaoliao @LinZhihao-723 What do you think about renaming /query to /submit_query? I feel like /query on its own implies you should get the results back immediately, which might not be what's happening.

I agree with submit_query.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
components/api-server/src/bin/api_server.rs (1)

64-64: Emit the bound address as a tracing field

tracing::event! doesn’t interpolate {addr}, so the log prints the braces literally. Record the address as a structured field to get useful output.

Apply this diff to log the address correctly:

-    tracing::event!(tracing::Level::INFO, "Server started at {addr}");
+    tracing::event!(tracing::Level::INFO, %addr, "Server started");
components/api-server/src/client.rs (1)

70-74: Add MongoDB credentials when connecting to results cache.

The MongoDB client is created without using credentials from credentials.results_cache, which will cause authentication failures in secured deployments. The credentials must be plumbed into the MongoDB connection.

Apply this diff to add credential support:

+        let mut mongo_client_options = mongodb::options::ClientOptions::parse(
+            format!(
+                "mongodb://{}:{}/{}?directConnection=true",
+                config.results_cache.host, config.results_cache.port, config.results_cache.db_name,
+            )
+        ).await?;
+        
+        if let (Some(username), Some(password)) = (
+            &credentials.results_cache.username,
+            &credentials.results_cache.password,
+        ) {
+            mongo_client_options.credential = Some(
+                mongodb::options::Credential::builder()
+                    .username(username.clone())
+                    .password(password.expose_secret().to_string())
+                    .build()
+            );
+        }
+        
-        let mongo_uri = format!(
-            "mongodb://{}:{}/{}?directConnection=true",
-            config.results_cache.host, config.results_cache.port, config.results_cache.db_name,
-        );
-        let mongo_client = mongodb::Client::with_uri_str(mongo_uri).await?;
+        let mongo_client = mongodb::Client::with_options(mongo_client_options)?;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fce5626 and 458a4b7.

📒 Files selected for processing (6)
  • components/api-server/src/bin/api_server.rs (1 hunks)
  • components/api-server/src/client.rs (1 hunks)
  • components/api-server/src/error.rs (1 hunks)
  • components/clp-rust-utils/src/error.rs (1 hunks)
  • components/clp-rust-utils/src/serde.rs (1 hunks)
  • components/clp-rust-utils/src/serde/yaml.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (19)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/api-server/src/error.rs
  • components/api-server/src/bin/api_server.rs
  • components/clp-rust-utils/src/serde/yaml.rs
  • components/api-server/src/client.rs
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-09-27T23:19:17.079Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-17T19:59:35.299Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/api-server/src/bin/api_server.rs
  • components/clp-rust-utils/src/serde.rs
  • components/api-server/src/client.rs
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-31T15:41:23.952Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1534
File: components/api-server/src/client.rs:129-133
Timestamp: 2025-10-31T15:41:23.952Z
Learning: In Rust 2024 edition, the syntax `+ use<>` in return-position `impl Trait` types (e.g., `-> impl Stream<Item = T> + use<>`) is valid and indicates that the opaque type captures no generic parameters from the surrounding scope. This is part of the new precise capture rules introduced in Rust 2024 edition.

Applied to files:

  • components/api-server/src/client.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-10-08T15:52:50.753Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.

Applied to files:

  • components/api-server/src/client.rs
🧬 Code graph analysis (3)
components/clp-rust-utils/src/error.rs (1)
components/api-server/src/error.rs (2)
  • from (32-34)
  • from (38-46)
components/api-server/src/bin/api_server.rs (3)
components/clp-rust-utils/src/serde/yaml.rs (1)
  • from_path (18-21)
components/api-server/src/client.rs (2)
  • connect (67-81)
  • from (32-42)
components/api-server/src/error.rs (2)
  • from (32-34)
  • from (38-46)
components/api-server/src/client.rs (5)
components/clp-rust-utils/src/database/mysql.rs (1)
  • create_mysql_pool (20-36)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
  • SearchJobConfig (85-103)
components/api-server/src/bin/api_server.rs (2)
  • from (117-119)
  • query (79-86)
components/api-server/src/error.rs (2)
  • from (32-34)
  • from (38-46)
components/clp-rust-utils/src/clp_config/package/config.rs (4)
  • default (35-41)
  • default (55-63)
  • default (80-84)
  • default (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
components/api-server/src/client.rs (1)

149-164: Excellent implementation of exponential backoff.

The polling logic correctly implements exponential backoff with configurable initial and maximum delays, preventing excessive database load while waiting for job completion.

Comment thread components/api-server/src/bin/api_server.rs
Comment thread components/api-server/src/client.rs
/// * Forwards [`create_mysql_pool`]'s errors on failure.
/// * Forwards [`mongodb::Client::with_uri_str`]'s errors on failure.
pub async fn connect(config: &Config, credentials: &Credentials) -> Result<Self, ClientError> {
let sql_pool = create_mysql_pool(&config.database, &credentials.database, 10).await?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider making MySQL max connections configurable.

The max_connections value is hardcoded to 10. Consider adding a field to the ApiServer config struct to make this configurable based on deployment needs.

🤖 Prompt for AI Agents
In components/api-server/src/client.rs around line 68, the MySQL pool is created
with a hardcoded max_connections value of 10; add a new configurable field
(e.g., database_max_connections or mysql_pool_size) to the ApiServer config
struct, propagate it through config loading/validation, and replace the literal
10 with that config value when calling create_mysql_pool so deployments can tune
the pool size; ensure the new field has a sensible default and is documented in
config parsing/loading code.

}

#[derive(Clone)]
pub struct Client {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess u're using Client for CLP client. After further thinking this class isn't exactly a client...
How about Middleware suggested by @kirkrodrigues? And the lib should be named to clp_middleware.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to suggestions, but here's my initial thinking:

It seems there are two ways to look at it. If the library is only useful to the API server, then Middleware is a fitting name.

However, because we could easily create something similar to search.py using this library, we might also consider it an internal client, which would be faster than a typical API client if the user has direct access to the database and file system.

For example, the SeaweedFS module that discovers volume nodes and builds a direct connecting URL is also named wdclient.

Comment thread components/api-server/src/bin/api_server.rs
Comment thread components/api-server/src/bin/api_server.rs
Comment thread components/api-server/src/bin/api_server.rs
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/Cargo.toml
Comment thread components/api-server/src/bin/api_server.rs Outdated
hoophalab and others added 2 commits November 6, 2025 13:01
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
hoophalab and others added 2 commits November 6, 2025 13:04
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
components/api-server/src/bin/api_server.rs (2)

61-61: Fix the route path parameter syntax.

Axum requires path parameters to use :param syntax, not {param}. The current route pattern /query_results/{search_job_id} will be treated literally and won't match requests like /query_results/10, resulting in 404 errors.

Apply this diff to fix the route:

-        .route("/query_results/{search_job_id}", get(query_results))
+        .route("/query_results/:search_job_id", get(query_results))

64-64: Prefer structured logging fields over string interpolation.

While the current format string interpolation works, structured logging is the recommended pattern with tracing as it enables better filtering and analysis.

Apply this diff:

-    tracing::info!("Server started at {addr}");
+    tracing::info!(%addr, "Server started");
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 458a4b7 and b7962b1.

📒 Files selected for processing (1)
  • components/api-server/src/bin/api_server.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-09-27T23:19:17.079Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-17T19:59:35.299Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.

Applied to files:

  • components/api-server/src/bin/api_server.rs
🧬 Code graph analysis (1)
components/api-server/src/bin/api_server.rs (2)
components/clp-rust-utils/src/serde/yaml.rs (1)
  • from_path (18-21)
components/api-server/src/client.rs (2)
  • connect (67-81)
  • from (32-42)
🔇 Additional comments (3)
components/api-server/src/bin/api_server.rs (3)

96-98: Verify the single-line message requirement.

The handler rejects any message that spans multiple lines, returning a 500 error. This validation appears strict—if the results_cache can produce valid multi-line JSON or formatted output, those results will be rejected.

Confirm whether single-line formatting is guaranteed by the upstream data source or if this check should be relaxed.


117-121: All errors map to Internal Server Error.

The blanket From implementation converts all errors (database, network, deserialization) into a generic 500 response without logging or preserving diagnostic context. This design prioritizes security (not leaking internals) but may complicate troubleshooting.

Consider whether structured logging of the original error (before conversion) would aid operations without compromising security.


93-100: Consider extracting the mapping closure for clarity.

The inline closure that maps results to SSE events could be extracted into a separate statement to improve readability, as suggested by junhaoliao.

For example:

     let results_stream = client.fetch_results(search_job_id).await?;
-    let event_stream = results_stream.map(|res| {
+    let event_stream = results_stream.map(|res| -> Result<Event, HandlerError> {
         let message = res?;
         let trimmed_message = message.trim();
         if trimmed_message.lines().count() != 1 {
             return Err(HandlerError::InternalServer);
         }
         Ok(Event::default().data(trimmed_message))
     });

Or extract into a helper function if the logic becomes more complex.

Likely an incorrect or invalid review comment.

Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment missed from previous rounds. Otherwise lgtm.

Comment thread components/clp-rust-utils/src/clp_config/package/config.rs
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ec25e8 and 3510bda.

📒 Files selected for processing (1)
  • components/clp-rust-utils/src/clp_config/package/config.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
🧬 Code graph analysis (1)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
  • Package (135-159)
  • Database (162-249)
  • ResultsCache (327-341)

Comment thread components/clp-rust-utils/src/clp_config/package/config.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/api-server/src/client.rs (1)

67-81: Ensure MongoDB credentials are included and connection pool size is configurable.

Two issues remain from previous reviews:

  1. Critical: Lines 70-74 construct the MongoDB URI without using credentials.results_cache, causing authentication failures in secured deployments. This was flagged in previous reviews.

  2. Minor: Line 68 hardcodes max_connections to 10. This was also flagged previously as a candidate for configuration.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3510bda and 0c51a0a.

📒 Files selected for processing (2)
  • components/api-server/src/client.rs (1 hunks)
  • components/clp-rust-utils/src/clp_config/package/config.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
  • components/api-server/src/client.rs
📚 Learning: 2025-10-31T15:41:23.952Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1534
File: components/api-server/src/client.rs:129-133
Timestamp: 2025-10-31T15:41:23.952Z
Learning: In Rust 2024 edition, the syntax `+ use<>` in return-position `impl Trait` types (e.g., `-> impl Stream<Item = T> + use<>`) is valid and indicates that the opaque type captures no generic parameters from the surrounding scope. This is part of the new precise capture rules introduced in Rust 2024 edition.

Applied to files:

  • components/api-server/src/client.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/api-server/src/client.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-10-08T15:52:50.753Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.

Applied to files:

  • components/api-server/src/client.rs
🧬 Code graph analysis (2)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • Database (162-249)
  • ResultsCache (327-341)
components/api-server/src/client.rs (6)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • StorageEngine (95-97)
components/clp-rust-utils/src/database/mysql.rs (1)
  • create_mysql_pool (20-36)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
  • SearchJobConfig (85-103)
components/api-server/src/error.rs (2)
  • from (32-34)
  • from (38-46)
components/api-server/src/bin/api_server.rs (2)
  • from (126-128)
  • query (87-94)
components/clp-rust-utils/src/clp_config/package/config.rs (6)
  • default (21-29)
  • default (48-54)
  • default (67-74)
  • default (87-92)
  • default (109-113)
  • default (141-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (11)
components/clp-rust-utils/src/clp_config/package/config.rs (4)

10-30: LGTM!

The addition of logs_directory and the explicit Default implementation for Config are well-structured and consistent with the existing pattern for mirrored Python types.


59-64: LGTM!

The nested query_job_polling structure and renamed default_max_num_query_results field align with the previous review feedback and improve clarity.


66-75: LGTM!

The Default implementation now correctly instantiates the nested QueryJobPollingConfig and uses default_max_num_query_results, resolving the previous compilation issue.


86-93: LGTM!

The default polling configuration values (100ms initial, 5000ms max) are reasonable for the exponential backoff strategy used in the API server.

components/api-server/src/client.rs (7)

1-14: LGTM!

The imports are well-organized and all appear to be used appropriately throughout the file.


15-29: LGTM!

The QueryConfig struct is correctly defined with appropriate serde defaults. The field structure maps cleanly to the SearchJobConfig conversion.


31-43: LGTM!

The conversion from QueryConfig to SearchJobConfig is clean and correctly maps all fields while using sensible defaults for the remainder.


45-51: LGTM!

The Client struct appropriately encapsulates the MongoDB client, MySQL pool, and configuration needed for distributed query operations.


83-119: LGTM!

The submit_query implementation correctly applies storage-engine-specific defaults and safely inserts the job configuration using parameterized queries. The use of rmp_serde for binary serialization is appropriate for the database schema.


121-186: LGTM!

The fetch_results implementation is well-designed:

  • The exponential backoff polling correctly uses the new query_job_polling configuration.
  • The MongoDB cursor mapping efficiently extracts and validates the "message" field.
  • The + use<> syntax is valid Rust 2024 edition for precise capture control.

188-210: LGTM!

The get_status method correctly queries the jobs table with parameterized binding and safely converts the status integer to the appropriate enum type.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c51a0a and 72528db.

📒 Files selected for processing (1)
  • components/clp-rust-utils/src/clp_config/package/config.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/clp-rust-utils/src/clp_config/package/config.rs
🧬 Code graph analysis (1)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
  • Package (135-159)
  • Database (162-249)
  • ResultsCache (327-341)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (5)
components/clp-rust-utils/src/clp_config/package/config.rs (5)

10-18: LGTM: Config struct extended correctly.

The removal of Default from the derive list and the addition of the logs_directory field are correct. The explicit Default implementation below (lines 20-30) properly initializes all fields including the new one.


20-30: LGTM: Default implementation is correct.

The explicit Default implementation properly initializes all fields, including the new logs_directory with a sensible default value.


57-64: LGTM: ApiServer structure follows reviewer feedback.

The nested QueryJobPollingConfig and the renamed default_max_num_query_results field incorporate the suggestions from previous reviews. The structure is clean and logically organized.


66-75: LGTM: Default implementation correctly updated.

The Default implementation properly initializes all fields, including the nested query_job_polling configuration. The defaults are sensible for an API server.


86-93: LGTM: Sensible polling defaults.

The default backoff values (100ms initial, 5000ms maximum) provide a reasonable exponential backoff configuration for query job polling.

Comment thread components/clp-rust-utils/src/clp_config/package/config.rs

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using a multi-line commit message and PR title:

feat(api-server): Initial commit for CLP API server:
* Add CLP client to send queries and retrieve results from clp-json.
* Define the package-level config schema for the API server.
* Implement an API server to forward incoming query requests to the underlying CLP client.

@hoophalab hoophalab changed the title feat(api): Add API server for clp-json. feat(api-server): Initial commit for CLP API server: Nov 7, 2025
@hoophalab hoophalab merged commit fe2720e into y-scope:main Nov 7, 2025
22 checks passed
@hoophalab hoophalab deleted the apiserver branch December 2, 2025 18:21

let config_path = home.join(package::DEFAULT_CONFIG_FILE_PATH);
let config: package::config::Config = yaml::from_path(&config_path).context(format!(
"Config file {} does not exist",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears that the log message is not so accurate because YAML schema mismatch can also cause failure:

[2026-01-05  15:20:01]	(test-clp-api-server-5f57bdf49d-nz7wr wait-for-db-table-creator)	job.batch/test-clp-db-table-creator condition met
[2026-01-05  15:20:02]	(test-clp-api-server-5f57bdf49d-nz7wr wait-for-results-cache-indices-creator)	job.batch/test-clp-results-cache-indices-creator condition met
[2026-01-05  15:21:42]	(test-clp-api-server-5f57bdf49d-nz7wr api-server)	Error: Config file /etc/clp-config.yaml does not exist
[2026-01-05  15:21:42]	(test-clp-api-server-5f57bdf49d-nz7wr api-server)	
[2026-01-05  15:21:42]	(test-clp-api-server-5f57bdf49d-nz7wr api-server)	Caused by:
[2026-01-05  15:21:42]	(test-clp-api-server-5f57bdf49d-nz7wr api-server)	    0: `serde_yaml::Error`: stream_output: unknown variant `profile`, expected `credentials` at line 85 column 3
[2026-01-05  15:21:42]	(test-clp-api-server-5f57bdf49d-nz7wr api-server)	    1: stream_output: unknown variant `profile`, expected `credentials` at line 85 column 3

can we improve the log message? @coderabbitai create an issue to track, and assign to @hoophalab

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@junhaoliao, I'll create a GitHub issue to track improving the error message.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for improving the config loading error message

gh issue create \
  --repo y-scope/clp \
  --title "Improve error message when config file loading fails in API server" \
  --body "## Description

The error message when loading the config file in the API server is misleading. It currently says \"Config file {path} does not exist\", but the actual failure can be due to multiple reasons, including YAML schema mismatches.

## Example

When there's a YAML schema mismatch, the error message is:
\`\`\`
Error: Config file /etc/clp-config.yaml does not exist

Caused by:
    0: \`serde_yaml::Error\`: stream_output: unknown variant \`profile\`, expected \`credentials\` at line 85 column 3
    1: stream_output: unknown variant \`profile\`, expected \`credentials\` at line 85 column 3
\`\`\`

The file exists, but the schema is invalid. The message should be more generic, such as \"Failed to load config file\" to avoid confusion.

## Location

\`components/api-server/src/bin/api_server.rs\` lines 31-33

## Related Links

- PR: https://github.com/y-scope/clp/pull/1534
- Comment: https://github.com/y-scope/clp/pull/1534#discussion_r2662713050

Requested by: @junhaoliao" \
  --assignee hoophalab

Length of output: 42


@junhaoliao, I've created issue #1839 to track improving the error message when config file loading fails. The issue has been assigned to @hoophalab as you requested.


🧠 Learnings used
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
* Add CLP client to send queries and retrieve results from clp-json.
* Define the package-level config schema for the API server.
* Implement an API server to forward incoming query requests to the underlying CLP client.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants