feat(api-server): Initial commit for CLP API server:#1534
Conversation
WalkthroughAdds a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Server as Axum
participant Client as clp_client::Client
participant MySQL
participant MongoDB
User->>Server: POST /query (QueryConfig)
Server->>Client: submit_query(query_config)
Client->>MySQL: INSERT serialized SearchJobConfig (msgpack)
MySQL-->>Client: last_insert_id (job_id)
Client-->>Server: job_id
Server-->>User: { "job_id": u64 }
User->>Server: GET /query_results/{job_id}
Server->>Client: fetch_results(job_id)
loop poll status
Client->>MySQL: query job status
MySQL-->>Client: status
end
alt status == Succeeded
Client->>MongoDB: read collection for job_id (stream)
MongoDB-->>Client: documents stream
Client-->>Server: stream messages
Server-->>User: SSE events (keep-alive)
else terminal failure
Client-->>Server: QueryNotSucceeded/Error
Server-->>User: 500 InternalServer
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment was marked as outdated.
This comment was marked as outdated.
junhaoliao
left a comment
There was a problem hiding this comment.
reviewed the axum server code
i agree with the RESTful API endpoint designs in the code. as discussed offline, in a future PR, please consider integrating frameworks to generate OpenAPI specs to make the code more testable (maybe https://docs.rs/utoipa/latest/utoipa/ ?)
@LinZhihao-723 please help with the rest of the review as i observe and learn
| [lib] | ||
| name = "clp_client" | ||
| path = "src/lib.rs" |
There was a problem hiding this comment.
question: do we have to define a [lib] in this Cargo project? as in, do we plan to reuse this lib in other projects at the moment?
There was a problem hiding this comment.
i guess it is useful if other Rust projects want to use the client without the HTTP server?
There was a problem hiding this comment.
I think one of the reason to define it is to allow unused public symbols. It also helps when writing test cases.
There was a problem hiding this comment.
I feel another benefit is that a separate client library enables us to create an alternative to search.py -- simply create a clp_client, submit_query and print fetch results to stdout.
There was a problem hiding this comment.
i guess that's to say we don't need those for now. i'm fine if we want to keep it given we should add the tests soon
| } | ||
|
|
||
| #[tokio::main] | ||
| async fn main() -> anyhow::Result<()> { |
There was a problem hiding this comment.
i suppose we use anyhow::Result<()> because we want the error type to be flexible?
There was a problem hiding this comment.
I was only using anyhow::Result<()> to pretty print the error message, but the new version also uses anyhow to add context as you suggested.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
components/api-server/src/client.rs (1)
70-75: Pass MongoDB credentials when connecting.
We dropcredentials.results_cachehere and attempt an anonymous MongoDB connection, so any secured deployment will reject/query_resultsoutright. Please plumb the results-cache username and password into theClientOptions/URI (retrieving the secret viaExposeSecret) before constructing the client. Without that change the API server cannot function when auth is enabled.- let mongo_client = mongodb::Client::with_uri_str(mongo_uri).await?; + use secrecy::ExposeSecret; + let mut client_options = + mongodb::options::ClientOptions::parse(&mongo_uri).await?; + client_options.credential = Some( + mongodb::options::Credential::builder() + .username(Some(credentials.results_cache.user.clone())) + .password(Some( + credentials.results_cache.password.expose_secret().to_owned(), + )) + .build(), + ); + let mongo_client = mongodb::Client::with_options(client_options)?;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
components/api-server/Cargo.toml(1 hunks)components/api-server/src/bin/api_server.rs(1 hunks)components/api-server/src/client.rs(1 hunks)components/api-server/src/error.rs(1 hunks)components/api-server/src/lib.rs(1 hunks)components/clp-rust-utils/Cargo.toml(1 hunks)components/clp-rust-utils/src/clp_config/package/config.rs(1 hunks)components/clp-rust-utils/src/error.rs(1 hunks)components/clp-rust-utils/src/serde.rs(1 hunks)components/clp-rust-utils/src/serde/yaml_file.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (20)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Applied to files:
components/clp-rust-utils/src/serde.rscomponents/api-server/src/bin/api_server.rscomponents/clp-rust-utils/Cargo.tomlcomponents/api-server/Cargo.tomlcomponents/api-server/src/client.rs
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-rust-utils/src/clp_config/package/config.rscomponents/api-server/src/bin/api_server.rscomponents/api-server/Cargo.tomlcomponents/api-server/src/client.rs
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-09-27T23:19:17.079Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-17T19:59:35.299Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.
Applied to files:
components/api-server/src/bin/api_server.rscomponents/clp-rust-utils/Cargo.toml
📚 Learning: 2025-10-20T22:20:45.733Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1448
File: components/clp-rust-utils/Cargo.toml:4-4
Timestamp: 2025-10-20T22:20:45.733Z
Learning: Rust edition 2024 is a valid edition value in Cargo.toml files. It was released with Rust 1.85 on February 20, 2025.
Applied to files:
components/clp-rust-utils/Cargo.toml
📚 Learning: 2025-10-31T15:41:23.952Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1534
File: components/api-server/src/client.rs:129-133
Timestamp: 2025-10-31T15:41:23.952Z
Learning: In Rust 2024 edition, the syntax `+ use<>` in return-position `impl Trait` types (e.g., `-> impl Stream<Item = T> + use<>`) is valid and indicates that the opaque type captures no generic parameters from the surrounding scope. This is part of the new precise capture rules introduced in Rust 2024 edition.
Applied to files:
components/api-server/src/client.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-10-08T15:52:50.753Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
Applied to files:
components/api-server/src/client.rs
🧬 Code graph analysis (3)
components/clp-rust-utils/src/error.rs (1)
components/api-server/src/error.rs (2)
from(32-34)from(38-46)
components/api-server/src/bin/api_server.rs (2)
components/clp-rust-utils/src/serde/yaml_file.rs (1)
from_path(20-23)components/api-server/src/client.rs (2)
connect(67-81)from(32-42)
components/api-server/src/client.rs (4)
components/clp-rust-utils/src/database/mysql.rs (1)
create_mysql_pool(20-36)components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
SearchJobConfig(85-103)components/api-server/src/bin/api_server.rs (2)
from(110-112)query(79-86)components/clp-rust-utils/src/clp_config/package/config.rs (4)
default(35-41)default(55-63)default(80-84)default(112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
🔇 Additional comments (6)
components/clp-rust-utils/Cargo.toml (1)
7-9: LGTM! Dependency updates are appropriate.The AWS SDK minor version bumps and the addition of
serde_yamlalign well with the new YAML deserialization functionality introduced in this PR.Also applies to: 15-15
components/clp-rust-utils/src/clp_config/package/config.rs (1)
49-51: LGTM! Configuration fields are well-defined.The new polling configuration fields have clear names and sensible defaults. The exponential backoff range (100ms to 5000ms) and result limit (1000) are reasonable starting values.
Also applies to: 59-61
components/clp-rust-utils/src/error.rs (1)
14-15: LGTM! YAML error variant correctly integrated.The new
SerdeYamlvariant follows the established pattern and correctly uses the#[from]attribute for automatic error conversion. This aligns with the new YAML deserialization functionality.components/api-server/src/error.rs (2)
22-35: Elegant error conversion pattern with IsMalformedData trait.The private
IsMalformedDatamarker trait with a blanketFrom<T: IsMalformedData>implementation is a clean way to unify multiple serialization error types intoMalformedData. This avoids boilerplate while maintaining type safety.
37-47: LGTM! Correct delegation of clp_rust_utils errors.The
From<clp_rust_utils::Error>implementation correctly:
- Maps serialization errors (
MsgpackEncode,SerdeYaml) toMalformedData- Delegates
IoandSqlxto their respectiveFromimplementations via.into()This maintains clean error semantics across crate boundaries.
components/api-server/src/lib.rs (1)
1-3: No issues found — code is properly designed as-is.The verification confirms that
ClientErroris already publicly re-exported through the client module atclient.rs:13withpub use crate::error::ClientError;. This meanspub use client::*;inlib.rsalready exposes error types to users of the library.The current design is intentional and clean: error types are accessible via the client namespace (e.g.,
api_server::client::ClientError), which creates a logical grouping. The suggested change to directly re-export the error module is unnecessary.Likely an incorrect or invalid review comment.
| [dependencies] | ||
| anyhow = "1.0.100" | ||
| axum = { version = "0.8.6", features = ["json"] } | ||
| clap = { version = "4.5.51", features = ["derive"] } | ||
| clp-rust-utils = { path = "../clp-rust-utils" } | ||
| futures = "0.3.31" | ||
| mongodb = "3.3.0" | ||
| num_enum = "0.7.5" | ||
| rmp-serde = "1.3.0" | ||
| serde = { version = "1.0.228", features = ["derive"] } | ||
| serde_json = "1.0.145" | ||
| sqlx = { version = "0.8.6", features = ["runtime-tokio", "mysql"] } | ||
| thiserror = "2.0.17" | ||
| tokio = { version = "1.48.0", features = ["full"] } | ||
| tracing = "0.1.41" | ||
| tracing-subscriber = { version = "0.3.20", features = ["json", "env-filter"] } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding tower-http for request logging.
The current setup uses tracing-subscriber, which is good. However, consider adding tower-http with the trace feature for automatic per-request logging middleware. This provides structured request/response logging (method, URI, status, latency) without manual instrumentation.
Add to dependencies:
tower-http = { version = "0.6", features = ["trace"] }This would complement your existing tracing setup and provide production-ready request observability.
Based on past review comments.
🤖 Prompt for AI Agents
In components/api-server/Cargo.toml around lines 14 to 29, add the tower-http
crate with the "trace" feature to enable automatic per-request tracing
middleware; update the [dependencies] section to include tower-http (use a
compatible version such as 0.6) and then run cargo update / cargo build to fetch
the crate and confirm compatibility with existing tracing/tokio versions so you
can wire the TraceLayer into your axum server.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Fetch the referenced discussion comment from the PR
gh api \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
/repos/y-scope/clp/pulls/comments/2491919405
Length of output: 2719
@hoophalab, understood! Since you already have per-request logging via RUST_LOG=trace with axum's built-in tracing, the tower-http suggestion isn't necessary. The existing setup provides connection-level logging, which covers the core observability needs.
If you later want to add request/response body logging or structured request metadata (method, URI, status, latency), tower-http's TraceLayer would be an option, but your current approach is perfectly valid for now.
🧠 Learnings used
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1078
File: components/webui/server/src/plugins/MongoSocketIoServer/index.ts:105-108
Timestamp: 2025-07-08T17:20:02.560Z
Learning: The y-scope/clp codebase has a 100-character line length limit that takes precedence over other style preferences like template literals when they conflict.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1291
File: docs/src/user-docs/quick-start/text-v-json.md:84-88
Timestamp: 2025-09-08T21:41:32.059Z
Learning: In the y-scope/clp project documentation, maintain informal contractions like "can't" and "don't" rather than formal alternatives like "cannot" and "do not" - the maintainer prefers a conversational tone.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.
|
@junhaoliao @LinZhihao-723 What do you think about renaming |
I agree with |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
components/api-server/src/bin/api_server.rs (1)
64-64: Emit the bound address as a tracing field
tracing::event!doesn’t interpolate{addr}, so the log prints the braces literally. Record the address as a structured field to get useful output.Apply this diff to log the address correctly:
- tracing::event!(tracing::Level::INFO, "Server started at {addr}"); + tracing::event!(tracing::Level::INFO, %addr, "Server started");components/api-server/src/client.rs (1)
70-74: Add MongoDB credentials when connecting to results cache.The MongoDB client is created without using credentials from
credentials.results_cache, which will cause authentication failures in secured deployments. The credentials must be plumbed into the MongoDB connection.Apply this diff to add credential support:
+ let mut mongo_client_options = mongodb::options::ClientOptions::parse( + format!( + "mongodb://{}:{}/{}?directConnection=true", + config.results_cache.host, config.results_cache.port, config.results_cache.db_name, + ) + ).await?; + + if let (Some(username), Some(password)) = ( + &credentials.results_cache.username, + &credentials.results_cache.password, + ) { + mongo_client_options.credential = Some( + mongodb::options::Credential::builder() + .username(username.clone()) + .password(password.expose_secret().to_string()) + .build() + ); + } + - let mongo_uri = format!( - "mongodb://{}:{}/{}?directConnection=true", - config.results_cache.host, config.results_cache.port, config.results_cache.db_name, - ); - let mongo_client = mongodb::Client::with_uri_str(mongo_uri).await?; + let mongo_client = mongodb::Client::with_options(mongo_client_options)?;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
components/api-server/src/bin/api_server.rs(1 hunks)components/api-server/src/client.rs(1 hunks)components/api-server/src/error.rs(1 hunks)components/clp-rust-utils/src/error.rs(1 hunks)components/clp-rust-utils/src/serde.rs(1 hunks)components/clp-rust-utils/src/serde/yaml.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (19)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/api-server/src/error.rscomponents/api-server/src/bin/api_server.rscomponents/clp-rust-utils/src/serde/yaml.rscomponents/api-server/src/client.rs
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-09-27T23:19:17.079Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-17T19:59:35.299Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Applied to files:
components/api-server/src/bin/api_server.rscomponents/clp-rust-utils/src/serde.rscomponents/api-server/src/client.rs
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-31T15:41:23.952Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1534
File: components/api-server/src/client.rs:129-133
Timestamp: 2025-10-31T15:41:23.952Z
Learning: In Rust 2024 edition, the syntax `+ use<>` in return-position `impl Trait` types (e.g., `-> impl Stream<Item = T> + use<>`) is valid and indicates that the opaque type captures no generic parameters from the surrounding scope. This is part of the new precise capture rules introduced in Rust 2024 edition.
Applied to files:
components/api-server/src/client.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-10-08T15:52:50.753Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
Applied to files:
components/api-server/src/client.rs
🧬 Code graph analysis (3)
components/clp-rust-utils/src/error.rs (1)
components/api-server/src/error.rs (2)
from(32-34)from(38-46)
components/api-server/src/bin/api_server.rs (3)
components/clp-rust-utils/src/serde/yaml.rs (1)
from_path(18-21)components/api-server/src/client.rs (2)
connect(67-81)from(32-42)components/api-server/src/error.rs (2)
from(32-34)from(38-46)
components/api-server/src/client.rs (5)
components/clp-rust-utils/src/database/mysql.rs (1)
create_mysql_pool(20-36)components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
SearchJobConfig(85-103)components/api-server/src/bin/api_server.rs (2)
from(117-119)query(79-86)components/api-server/src/error.rs (2)
from(32-34)from(38-46)components/clp-rust-utils/src/clp_config/package/config.rs (4)
default(35-41)default(55-63)default(80-84)default(112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
components/api-server/src/client.rs (1)
149-164: Excellent implementation of exponential backoff.The polling logic correctly implements exponential backoff with configurable initial and maximum delays, preventing excessive database load while waiting for job completion.
| /// * Forwards [`create_mysql_pool`]'s errors on failure. | ||
| /// * Forwards [`mongodb::Client::with_uri_str`]'s errors on failure. | ||
| pub async fn connect(config: &Config, credentials: &Credentials) -> Result<Self, ClientError> { | ||
| let sql_pool = create_mysql_pool(&config.database, &credentials.database, 10).await?; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider making MySQL max connections configurable.
The max_connections value is hardcoded to 10. Consider adding a field to the ApiServer config struct to make this configurable based on deployment needs.
🤖 Prompt for AI Agents
In components/api-server/src/client.rs around line 68, the MySQL pool is created
with a hardcoded max_connections value of 10; add a new configurable field
(e.g., database_max_connections or mysql_pool_size) to the ApiServer config
struct, propagate it through config loading/validation, and replace the literal
10 with that config value when calling create_mysql_pool so deployments can tune
the pool size; ensure the new field has a sensible default and is documented in
config parsing/loading code.
| } | ||
|
|
||
| #[derive(Clone)] | ||
| pub struct Client { |
There was a problem hiding this comment.
I guess u're using Client for CLP client. After further thinking this class isn't exactly a client...
How about Middleware suggested by @kirkrodrigues? And the lib should be named to clp_middleware.
There was a problem hiding this comment.
Open to suggestions, but here's my initial thinking:
It seems there are two ways to look at it. If the library is only useful to the API server, then Middleware is a fitting name.
However, because we could easily create something similar to search.py using this library, we might also consider it an internal client, which would be faster than a typical API client if the user has direct access to the database and file system.
For example, the SeaweedFS module that discovers volume nodes and builds a direct connecting URL is also named wdclient.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
components/api-server/src/bin/api_server.rs (2)
61-61: Fix the route path parameter syntax.Axum requires path parameters to use
:paramsyntax, not{param}. The current route pattern/query_results/{search_job_id}will be treated literally and won't match requests like/query_results/10, resulting in 404 errors.Apply this diff to fix the route:
- .route("/query_results/{search_job_id}", get(query_results)) + .route("/query_results/:search_job_id", get(query_results))
64-64: Prefer structured logging fields over string interpolation.While the current format string interpolation works, structured logging is the recommended pattern with
tracingas it enables better filtering and analysis.Apply this diff:
- tracing::info!("Server started at {addr}"); + tracing::info!(%addr, "Server started");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/api-server/src/bin/api_server.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-09-27T23:19:17.079Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:717-726
Timestamp: 2024-09-27T23:19:17.079Z
Learning: In the `JsonParser::parse_from_IR` function, when deserializing, it's intentional to break out of the loop without additional error handling when `kv_log_event_result.has_error()` returns true for errors other than `no_message_available` and `result_out_of_range`, as this signifies the end of the IR file.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-17T19:59:35.299Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:757-757
Timestamp: 2025-10-17T19:59:35.299Z
Learning: In the y-scope/clp repository, prefer long-form command-line flags (e.g., `--recursive`) over short-form flags (e.g., `-R`) for readability, even if the long form is GNU-specific.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-07-01T14:52:15.217Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Applied to files:
components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:01:31.391Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:01:31.391Z
Learning: In y-scope/clp, for the clp-rust-checks workflow (GitHub Actions) and the Taskfile target deps:lock:check-rust, we should verify Rust Cargo.lock is in sync with Cargo.toml using a non-mutating method (e.g., cargo metadata --locked / cargo check --locked) to keep CI deterministic and avoid updating dependencies during validation.
Applied to files:
components/api-server/src/bin/api_server.rs
🧬 Code graph analysis (1)
components/api-server/src/bin/api_server.rs (2)
components/clp-rust-utils/src/serde/yaml.rs (1)
from_path(18-21)components/api-server/src/client.rs (2)
connect(67-81)from(32-42)
🔇 Additional comments (3)
components/api-server/src/bin/api_server.rs (3)
96-98: Verify the single-line message requirement.The handler rejects any message that spans multiple lines, returning a 500 error. This validation appears strict—if the results_cache can produce valid multi-line JSON or formatted output, those results will be rejected.
Confirm whether single-line formatting is guaranteed by the upstream data source or if this check should be relaxed.
117-121: All errors map to Internal Server Error.The blanket
Fromimplementation converts all errors (database, network, deserialization) into a generic 500 response without logging or preserving diagnostic context. This design prioritizes security (not leaking internals) but may complicate troubleshooting.Consider whether structured logging of the original error (before conversion) would aid operations without compromising security.
93-100: Consider extracting the mapping closure for clarity.The inline closure that maps results to SSE events could be extracted into a separate statement to improve readability, as suggested by junhaoliao.
For example:
let results_stream = client.fetch_results(search_job_id).await?; - let event_stream = results_stream.map(|res| { + let event_stream = results_stream.map(|res| -> Result<Event, HandlerError> { let message = res?; let trimmed_message = message.trim(); if trimmed_message.lines().count() != 1 { return Err(HandlerError::InternalServer); } Ok(Event::default().data(trimmed_message)) });Or extract into a helper function if the logic becomes more complex.
Likely an incorrect or invalid review comment.
LinZhihao-723
left a comment
There was a problem hiding this comment.
One last comment missed from previous rounds. Otherwise lgtm.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-rust-utils/src/clp_config/package/config.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-rust-utils/src/clp_config/package/config.rs
🧬 Code graph analysis (1)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
Package(135-159)Database(162-249)ResultsCache(327-341)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/api-server/src/client.rs (1)
67-81: Ensure MongoDB credentials are included and connection pool size is configurable.Two issues remain from previous reviews:
Critical: Lines 70-74 construct the MongoDB URI without using
credentials.results_cache, causing authentication failures in secured deployments. This was flagged in previous reviews.Minor: Line 68 hardcodes
max_connectionsto10. This was also flagged previously as a candidate for configuration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/api-server/src/client.rs(1 hunks)components/clp-rust-utils/src/clp_config/package/config.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-rust-utils/src/clp_config/package/config.rscomponents/api-server/src/client.rs
📚 Learning: 2025-10-31T15:41:23.952Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1534
File: components/api-server/src/client.rs:129-133
Timestamp: 2025-10-31T15:41:23.952Z
Learning: In Rust 2024 edition, the syntax `+ use<>` in return-position `impl Trait` types (e.g., `-> impl Stream<Item = T> + use<>`) is valid and indicates that the opaque type captures no generic parameters from the surrounding scope. This is part of the new precise capture rules introduced in Rust 2024 edition.
Applied to files:
components/api-server/src/client.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Applied to files:
components/api-server/src/client.rs
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-10-08T15:52:50.753Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
Applied to files:
components/api-server/src/client.rs
🧬 Code graph analysis (2)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
Database(162-249)ResultsCache(327-341)
components/api-server/src/client.rs (6)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
StorageEngine(95-97)components/clp-rust-utils/src/database/mysql.rs (1)
create_mysql_pool(20-36)components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
SearchJobConfig(85-103)components/api-server/src/error.rs (2)
from(32-34)from(38-46)components/api-server/src/bin/api_server.rs (2)
from(126-128)query(87-94)components/clp-rust-utils/src/clp_config/package/config.rs (6)
default(21-29)default(48-54)default(67-74)default(87-92)default(109-113)default(141-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (11)
components/clp-rust-utils/src/clp_config/package/config.rs (4)
10-30: LGTM!The addition of
logs_directoryand the explicitDefaultimplementation forConfigare well-structured and consistent with the existing pattern for mirrored Python types.
59-64: LGTM!The nested
query_job_pollingstructure and renameddefault_max_num_query_resultsfield align with the previous review feedback and improve clarity.
66-75: LGTM!The
Defaultimplementation now correctly instantiates the nestedQueryJobPollingConfigand usesdefault_max_num_query_results, resolving the previous compilation issue.
86-93: LGTM!The default polling configuration values (100ms initial, 5000ms max) are reasonable for the exponential backoff strategy used in the API server.
components/api-server/src/client.rs (7)
1-14: LGTM!The imports are well-organized and all appear to be used appropriately throughout the file.
15-29: LGTM!The
QueryConfigstruct is correctly defined with appropriate serde defaults. The field structure maps cleanly to theSearchJobConfigconversion.
31-43: LGTM!The conversion from
QueryConfigtoSearchJobConfigis clean and correctly maps all fields while using sensible defaults for the remainder.
45-51: LGTM!The
Clientstruct appropriately encapsulates the MongoDB client, MySQL pool, and configuration needed for distributed query operations.
83-119: LGTM!The
submit_queryimplementation correctly applies storage-engine-specific defaults and safely inserts the job configuration using parameterized queries. The use ofrmp_serdefor binary serialization is appropriate for the database schema.
121-186: LGTM!The
fetch_resultsimplementation is well-designed:
- The exponential backoff polling correctly uses the new
query_job_pollingconfiguration.- The MongoDB cursor mapping efficiently extracts and validates the "message" field.
- The
+ use<>syntax is valid Rust 2024 edition for precise capture control.
188-210: LGTM!The
get_statusmethod correctly queries the jobs table with parameterized binding and safely converts the status integer to the appropriate enum type.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-rust-utils/src/clp_config/package/config.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/clp-rust-utils/src/clp_config/package/config.rs
🧬 Code graph analysis (1)
components/clp-rust-utils/src/clp_config/package/config.rs (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
Package(135-159)Database(162-249)ResultsCache(327-341)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (5)
components/clp-rust-utils/src/clp_config/package/config.rs (5)
10-18: LGTM: Config struct extended correctly.The removal of
Defaultfrom the derive list and the addition of thelogs_directoryfield are correct. The explicitDefaultimplementation below (lines 20-30) properly initializes all fields including the new one.
20-30: LGTM: Default implementation is correct.The explicit
Defaultimplementation properly initializes all fields, including the newlogs_directorywith a sensible default value.
57-64: LGTM: ApiServer structure follows reviewer feedback.The nested
QueryJobPollingConfigand the renameddefault_max_num_query_resultsfield incorporate the suggestions from previous reviews. The structure is clean and logically organized.
66-75: LGTM: Default implementation correctly updated.The Default implementation properly initializes all fields, including the nested
query_job_pollingconfiguration. The defaults are sensible for an API server.
86-93: LGTM: Sensible polling defaults.The default backoff values (100ms initial, 5000ms maximum) provide a reasonable exponential backoff configuration for query job polling.
LinZhihao-723
left a comment
There was a problem hiding this comment.
How about using a multi-line commit message and PR title:
feat(api-server): Initial commit for CLP API server:
* Add CLP client to send queries and retrieve results from clp-json.
* Define the package-level config schema for the API server.
* Implement an API server to forward incoming query requests to the underlying CLP client.
|
|
||
| let config_path = home.join(package::DEFAULT_CONFIG_FILE_PATH); | ||
| let config: package::config::Config = yaml::from_path(&config_path).context(format!( | ||
| "Config file {} does not exist", |
There was a problem hiding this comment.
it appears that the log message is not so accurate because YAML schema mismatch can also cause failure:
[2026-01-05 15:20:01] (test-clp-api-server-5f57bdf49d-nz7wr wait-for-db-table-creator) job.batch/test-clp-db-table-creator condition met
[2026-01-05 15:20:02] (test-clp-api-server-5f57bdf49d-nz7wr wait-for-results-cache-indices-creator) job.batch/test-clp-results-cache-indices-creator condition met
[2026-01-05 15:21:42] (test-clp-api-server-5f57bdf49d-nz7wr api-server) Error: Config file /etc/clp-config.yaml does not exist
[2026-01-05 15:21:42] (test-clp-api-server-5f57bdf49d-nz7wr api-server)
[2026-01-05 15:21:42] (test-clp-api-server-5f57bdf49d-nz7wr api-server) Caused by:
[2026-01-05 15:21:42] (test-clp-api-server-5f57bdf49d-nz7wr api-server) 0: `serde_yaml::Error`: stream_output: unknown variant `profile`, expected `credentials` at line 85 column 3
[2026-01-05 15:21:42] (test-clp-api-server-5f57bdf49d-nz7wr api-server) 1: stream_output: unknown variant `profile`, expected `credentials` at line 85 column 3
can we improve the log message? @coderabbitai create an issue to track, and assign to @hoophalab
There was a problem hiding this comment.
🧩 Analysis chain
@junhaoliao, I'll create a GitHub issue to track improving the error message.
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue for improving the config loading error message
gh issue create \
--repo y-scope/clp \
--title "Improve error message when config file loading fails in API server" \
--body "## Description
The error message when loading the config file in the API server is misleading. It currently says \"Config file {path} does not exist\", but the actual failure can be due to multiple reasons, including YAML schema mismatches.
## Example
When there's a YAML schema mismatch, the error message is:
\`\`\`
Error: Config file /etc/clp-config.yaml does not exist
Caused by:
0: \`serde_yaml::Error\`: stream_output: unknown variant \`profile\`, expected \`credentials\` at line 85 column 3
1: stream_output: unknown variant \`profile\`, expected \`credentials\` at line 85 column 3
\`\`\`
The file exists, but the schema is invalid. The message should be more generic, such as \"Failed to load config file\" to avoid confusion.
## Location
\`components/api-server/src/bin/api_server.rs\` lines 31-33
## Related Links
- PR: https://github.com/y-scope/clp/pull/1534
- Comment: https://github.com/y-scope/clp/pull/1534#discussion_r2662713050
Requested by: @junhaoliao" \
--assignee hoophalab
Length of output: 42
@junhaoliao, I've created issue #1839 to track improving the error message when config file loading fails. The issue has been assigned to @hoophalab as you requested.
🧠 Learnings used
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1136
File: components/webui/client/src/pages/IngestPage/Details/sql.ts:1-1
Timestamp: 2025-07-29T21:00:07.757Z
Learning: User junhaoliao requested creating a GitHub issue to track server-side SQL query error handling improvements, specifically to prevent uncaught query errors from causing 500 errors to reach the client.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
* Add CLP client to send queries and retrieve results from clp-json. * Define the package-level config schema for the API server. * Implement an API server to forward incoming query requests to the underlying CLP client.
Description
This PR adds the API server. Only clp-json is supported for now.
clp-config.ymlandcredentials.yml(This is moved to feat(clp-rust-utils): Add serde-compatible mirror structs for package and search job configs based on existing Python definitions. #1535)/queryroute: insert a compression job to jobs table/query_resultsroute: fetch results from the results_cacheChecklist
breaking change.
Validation performed
CLP_HOME=<...>/build/clp-package RUST_LOG=trace cargo run --bin api_serveror directly calling the binary in
/target/debug/CLP_HOME=<...>/build/clp-package RUST_LOG=trace api_server2. Submit a query
Response:
{"query_results_uri":"/query_results/10"}GET http://localhost:3001/query_results/10Response:
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Chores
Other