Skip to content

feat(api-server): Document APIs using OpenAPI and serve on /openapi.json endpoint.#1663

Merged
hoophalab merged 16 commits into
y-scope:mainfrom
hoophalab:openapi
Nov 27, 2025
Merged

feat(api-server): Document APIs using OpenAPI and serve on /openapi.json endpoint.#1663
hoophalab merged 16 commits into
y-scope:mainfrom
hoophalab:openapi

Conversation

@hoophalab

@hoophalab hoophalab commented Nov 25, 2025

Copy link
Copy Markdown
Contributor

Description

This PR adds an endpoint to server open api docs.
Move routes to lib.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Put http://localhost:3001/openapi.json on https://petstore.swagger.io/. The endpoints are shown correctly.

Summary by CodeRabbit

  • New Features
    • Added OpenAPI documentation at /openapi.json with improved API docs.
    • Implemented CORS support for cross-origin requests.
    • Added a health check endpoint.
    • Expanded query configuration with dataset selection, result limits, timestamp filtering, case sensitivity, and file-output options.
    • Added Server-Sent Events streaming for query results.

✏️ Tip: You can customize this high-level summary in your review settings.

@hoophalab hoophalab requested a review from a team as a code owner November 25, 2025 20:34
@coderabbitai

coderabbitai Bot commented Nov 25, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Centralizes Axum route definitions into a new routes module with OpenAPI (utoipa) and CORS support, exposes a router factory from_client, expands QueryConfig schema, makes client public, and renames the library target in Cargo.toml.

Changes

Cohort / File(s) Summary
Library configuration and dependencies
components/api-server/Cargo.toml
Library target renamed from clp_client to api_server. Added dependencies: tower-http (with cors feature), utoipa (with axum_extras feature), and utoipa-axum.
Module exports and entrypoint
components/api-server/src/lib.rs, components/api-server/src/bin/api_server.rs
client module made public and new routes module added; wildcard re-export pub use client::* removed. Binary now obtains a router from api_server::routes::from_client() and no longer contains inline route handlers or the previous HandlerError/handler functions.
QueryConfig schema expansion
components/api-server/src/client.rs
QueryConfig now derives utoipa::ToSchema and adds fields: dataset: Option<String>, max_num_results: u32, begin_timestamp: Option<i64>, end_timestamp: Option<i64>, ignore_case: bool, and write_to_file: bool (with serde defaults and field docs).
Router, handlers, OpenAPI, SSE
components/api-server/src/routes.rs
New module exposing pub fn from_client(client: Client) -> Result<axum::Router, serde_json::Error> and ApiDoc. Registers routes (/, /openapi.json, /health, /query, /query_results/{search_job_id}), integrates utoipa/utoipa_axum for OpenAPI, applies CORS layer, provides handlers for health, query (returns QueryResultsUri), and SSE streaming for query results, and defines HandlerError with conversions and IntoResponse mapping.

Sequence Diagram(s)

sequenceDiagram
    participant Browser as API Client
    participant Router as Axum Router
    participant OpenAPI as OpenAPI Layer
    participant Handler as Route Handler
    participant CLP as CLP Client

    Browser->>Router: GET /openapi.json
    Router->>OpenAPI: Request ApiDoc JSON
    OpenAPI-->>Browser: Return OpenAPI JSON

    Browser->>Router: POST /query (QueryConfig)
    Router->>Handler: invoke query handler
    Handler->>CLP: submit query with config
    CLP-->>Handler: return QueryResultsUri
    Handler-->>Browser: 200 OK with results URI

    Browser->>Router: GET /query_results/{id}
    Router->>Handler: invoke streaming handler
    Handler->>CLP: request stream
    loop per result line
        CLP-->>Handler: single-line JSON
        Handler-->>Browser: SSE event (JSON)
    end
    Note over Router: CORS applied to all routes
Loading
sequenceDiagram
    participant App as Application (bin)
    participant Factory as routes::from_client()
    participant Router as Axum Router
    participant State as Router State

    App->>Factory: from_client(client)
    Factory->>Router: build OpenApiRouter
    Factory->>Router: register handlers (/health, /query, /query_results)
    Factory->>Router: add /openapi.json
    Factory->>Router: apply CORS middleware
    Factory->>State: attach CLP client as state
    Factory-->>App: return configured Router (Result)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review components/api-server/src/routes.rs for OpenAPI registration, SSE single-line validation, and correct error-to-response mappings.
  • Verify binary (src/bin/api_server.rs) switch to from_client() preserves initialization (state, tracing, middleware).
  • Check client::QueryConfig serde defaults and how new fields affect downstream query logic and validation.
  • Confirm Cargo rename and dependency additions do not break build or CI integration.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding OpenAPI documentation and serving it on /openapi.json, with secondary refactoring of routes into a separate module.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/api-server/src/bin/api_server.rs (2)

193-200: **Multi-line messages will cause 500 Internal Server Error.**Log messages containing newlines will cause a 500 Internal Server Error. "When the EventSource receives multiple consecutive lines that begin with data:, it concatenates them, inserting a newline character between each one." The SSE specification supports multi-line data by sending multiple data: lines.

Consider handling multi-line messages properly instead of rejecting them:

 let event_stream = results_stream.map(|res| {
     let message = res?;
     let trimmed_message = message.trim();
-    if trimmed_message.lines().count() != 1 {
-        return Err(HandlerError::InternalServer);
-    }
-    Ok(Event::default().data(trimmed_message))
+    // Axum's Event::data handles multi-line content by sending multiple data: lines
+    Ok(Event::default().data(trimmed_message))
 });

Alternatively, if single-line is a deliberate constraint, return a more descriptive error or escape newlines within JSON.


217-221: Error context is discarded.

The blanket From<T> implementation converts all errors to InternalServer without preserving the original error. This makes debugging production issues difficult. Consider logging the original error before conversion.

 impl<T: IntoHandlerError> From<T> for HandlerError {
-    fn from(_: T) -> Self {
+    fn from(err: T) -> Self {
+        tracing::error!("Handler error: {:?}", std::any::type_name::<T>());
         Self::InternalServer
     }
 }

For more detailed logging, you could use a trait bound like std::fmt::Debug:

impl<T: IntoHandlerError + std::fmt::Debug> From<T> for HandlerError {
    fn from(err: T) -> Self {
        tracing::error!("Handler error: {:?}", err);
        Self::InternalServer
    }
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca3947c and dbfa38e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • components/api-server/Cargo.toml (1 hunks)
  • components/api-server/README.md (1 hunks)
  • components/api-server/src/bin/api_server.rs (3 hunks)
  • components/api-server/src/client.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/api-server/src/client.rs
  • components/api-server/src/bin/api_server.rs
🪛 markdownlint-cli2 (0.18.1)
components/api-server/README.md

12-12: Blank line inside blockquote

(MD028, no-blanks-blockquote)


15-15: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


56-56: Blank line inside blockquote

(MD028, no-blanks-blockquote)


91-91: Blank line inside blockquote

(MD028, no-blanks-blockquote)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: rust-checks
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
components/api-server/src/client.rs (1)

16-21: LGTM!

The ToSchema derive is correctly added to enable OpenAPI schema generation for QueryConfig, which is used as the request body for the /query endpoint.

components/api-server/README.md (1)

1-3: Documentation provides good coverage of the API.

The README effectively documents the endpoints and schemas. For the Swagger UI link, consider noting that users need to enter the full URL including protocol.

components/api-server/src/bin/api_server.rs (1)

103-109: OpenAPI router setup looks correct.

The setup registers /health, /query, and /query_results endpoints with OpenAPI metadata. The root path / also serves the health check, which is a common pattern for load balancer probes.

Comment on lines +32 to +37
tower-http = { version = "0.6.7", features = ["cors"] }
tracing = "0.1.41"
tracing-appender = "0.2.3"
tracing-subscriber = { version = "0.3.20", features = ["json", "env-filter", "fmt", "std"] }
utoipa = { version = "5.4.0", features = ["axum_extras"] }
utoipa-axum = "0.2.0"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

utoipa-axum crate latest version 2024 2025


<web_search_result>
As of Nov 25, 2025 the latest published version of the utoipa-axum crate is 0.2.0. [1][2]

Sources:

  • docs.rs crate page for utoipa-axum (shows v0.2.0). [1]
  • deps.rs / crate metadata for utoipa-axum v0.2.0. [2]
    </web_search_result>

utoipa crate version 5.4.0 latest 2024 2025


<web_search_result>
Yes — utoipa 5.4.0 was released June 16, 2025. Prior releases: 5.3.0 on Dec 19, 2024 (and 5.3.1 on Jan 5, 2025). Sources: [1][2][3].

Sources

  • Release list (lib.rs) — shows 5.4.0 (Jun 16, 2025) and 5.3.0 (Dec 19, 2024). [1]
  • docs.rs page for utoipa 5.4.0. [2]
  • utoipa-gen changelog showing 5.4.0 entry (Jun 16, 2025). [3]
    </web_search_result>

tower-http crate version 0.6.7 latest 2024 2025


<web_search_result>
As of today (2025-11-25) the latest published tower-http release is 0.6.6 (released Jun 3, 2025). I couldn't find a 0.6.7 release. [1][2]

Sources:

  • docs.rs/tower-http (shows 0.6.6 builds). [1]
  • lib.rs (crate versions listing, latest 0.6.6). [2]
    </web_search_result>

tower-http version 0.6.7 does not exist — the latest published version is 0.6.6 (released June 3, 2025).

This will cause a build failure. Update line 32 to use "0.6.6" instead.

The other versions are correct:

  • utoipa 5.4.0 is the latest (released June 16, 2025)
  • utoipa-axum 0.2.0 is the latest (released November 25, 2025)
🤖 Prompt for AI Agents
In components/api-server/Cargo.toml around lines 32 to 37, the dependency entry
for tower-http specifies version "0.6.7" which does not exist and will break the
build; change that version to "0.6.6" (the latest published 0.6.x), leaving the
other dependency lines unchanged so the file uses tower-http = { version =
"0.6.6", features = ["cors"] } instead of "0.6.7".

Comment thread components/api-server/README.md Outdated
Comment on lines +15 to +17
```
"API server is running"
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add language specifier to fenced code block.

The code block is missing a language specifier. Since this is a plain text response example, use text or plain to satisfy linting and improve rendering consistency.

-```
+```text
 "API server is running"

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

15-15: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In components/api-server/README.md around lines 15 to 17, the fenced code block
lacks a language specifier; update the block to include a plain text language
tag (e.g., "text" or "plain") so it reads as a fenced block with a language
specifier — replace the existing wrapper withtext (or ```plain) and keep
the inner content unchanged.


</details>

<!-- This is an auto-generated comment by CodeRabbit -->

"/openapi.json",
get(|| async { (StatusCode::OK, api_json) }),
)
.layer(CorsLayer::new().allow_origin(Any).allow_headers(Any));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

CORS configuration is very permissive.

allow_origin(Any).allow_headers(Any) allows requests from any origin with any headers. While this enables Swagger UI and similar tools to work, it also permits cross-origin requests from any website. Consider:

  1. Adding .allow_methods([Method::GET, Method::POST]) to explicitly allow only the required methods.
  2. Documenting that this permissive configuration is intentional for development/tooling purposes.
  3. For production deployments, consider making the allowed origins configurable.
-        .layer(CorsLayer::new().allow_origin(Any).allow_headers(Any));
+        .layer(
+            CorsLayer::new()
+                .allow_origin(Any)
+                .allow_headers(Any)
+                .allow_methods([axum::http::Method::GET, axum::http::Method::POST]),
+        );

You'll need to add the import:

use axum::http::Method;
🤖 Prompt for AI Agents
In components/api-server/src/bin/api_server.rs around line 116, the CORS layer
is overly permissive using allow_origin(Any).allow_headers(Any); update it to
restrict methods by adding .allow_methods([Method::GET, Method::POST]) (or
whichever methods are required), make allowed origins configurable via
environment/config (replace Any with a configurable list in production), and add
a comment documenting that the current permissive settings are intentional for
development/Swagger only; also add the import use axum::http::Method; at the top
of the file.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/api-server/src/bin/api_server.rs (2)

175-203: Consider logging malformed SSE messages before returning InternalServer

In query_results, when trimmed_message.lines().count() != 1 you immediately return HandlerError::InternalServer (Line 199). That protects clients from malformed/multi-line events, but without logging it will be hard to diagnose why the stream failed.

Consider adding a tracing::error! (including search_job_id and perhaps the raw message length) before returning the error so you have observability when upstream produces bad data.


212-222: Simplify IntoHandlerError/From pattern for clarity

Using an empty IntoHandlerError marker trait plus a blanket impl<T: IntoHandlerError> From<T> for HandlerError works, but it hides exactly which error types are converted and can make future additions less explicit.

You could simplify this by dropping the marker trait and writing concrete impls instead:

-enum HandlerError {
-    #[error("Internal server error")]
-    InternalServer,
-}
-
-trait IntoHandlerError {}
-
-impl IntoHandlerError for axum::Error {}
-impl IntoHandlerError for clp_client::ClientError {}
-
-impl<T: IntoHandlerError> From<T> for HandlerError {
-    fn from(_: T) -> Self {
-        Self::InternalServer
-    }
-}
+enum HandlerError {
+    #[error("Internal server error")]
+    InternalServer,
+}
+
+impl From<axum::Error> for HandlerError {
+    fn from(_: axum::Error) -> Self {
+        HandlerError::InternalServer
+    }
+}
+
+impl From<clp_client::ClientError> for HandlerError {
+    fn from(_: clp_client::ClientError) -> Self {
+        HandlerError::InternalServer
+    }
+}

This keeps behaviour the same while making the mapping between concrete errors and HandlerError easier to follow.

♻️ Duplicate comments (1)
components/api-server/src/bin/api_server.rs (1)

104-121: Tighten CORS and serve /openapi.json with proper JSON content-type

Two points here:

  1. /openapi.json currently returns a raw String, which Axum will serve as text/plain. For OpenAPI consumers it’s better to return a JSON body (application/json) via Json.
  2. CorsLayer::new().allow_origin(Any) enables all origins for every route, which is quite permissive for production. It’s fine for local Swagger usage, but consider tightening it (at least to explicit methods, and ideally to configurable origins).

A possible adjustment:

+use axum::http::Method;
@@
-    let (router, api) = OpenApiRouter::with_openapi(ApiDoc::openapi())
+    let (router, api) = OpenApiRouter::with_openapi(ApiDoc::openapi())
         .route("/", get(health))
         .routes(routes!(health))
         .routes(routes!(query))
         .routes(routes!(query_results))
         .with_state(client)
         .split_for_parts();
-    let api_json = api.to_json()?;
-    let router = router
-        .route(
-            "/openapi.json",
-            get(|| async { (StatusCode::OK, api_json) }),
-        )
-        .layer(CorsLayer::new().allow_origin(Any));
+    let api = api.clone();
+    let router = router
+        .route(
+            "/openapi.json",
+            // Serve OpenAPI JSON with the correct content type.
+            get(move || {
+                let api = api.clone();
+                async move { Json(api) }
+            }),
+        )
+        .layer(
+            CorsLayer::new()
+                .allow_origin(Any) // consider making this configurable for production
+                .allow_methods([Method::GET, Method::POST]),
+        );

This keeps current behaviour for tooling while improving correctness and reducing some CORS surface area; for production you may still want to drive the allowed origins from config.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbfa38e and fb250a4.

📒 Files selected for processing (1)
  • components/api-server/src/bin/api_server.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/api-server/src/bin/api_server.rs
🔇 Additional comments (2)
components/api-server/src/bin/api_server.rs (2)

11-21: New routing/OpenAPI imports look correct

The added imports for routing::get, CorsLayer, Any, OpenApi, and OpenApiRouter/routes line up with their usage below; nothing problematic here.


143-173: Based on my investigation, I cannot access the repository to verify the actual imports in components/api-server/src/bin/api_server.rs due to cloning failures in the sandbox environment. However, the web search confirms that the review comment's concern is technically valid: the json! macro from serde_json must be explicitly imported or qualified in scope to use it within utoipa attribute macros.

The documentation shows that use serde_json::json; is the standard required import when using json!(...) in utoipa path annotations.


Verify json! macro visibility in query OpenAPI annotation

The annotation at lines 157–162 uses example = json!({"query_results_uri":"/query_results/1"}), which requires serde_json::json to be in scope. Confirm that one of the following exists in the file:

  • use serde_json::json; at module level, or
  • #[macro_use] extern crate serde_json; at crate level, or
  • Qualification as serde_json::json!(...)

If none of these are present, add:

+use serde_json::json;

Comment on lines +126 to +132
#[derive(utoipa::OpenApi)]
#[openapi(info(
title = "API Server",
description = "API Server for CLP",
contact(name = "YScope")
))]
struct ApiDoc;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Enrich OpenAPI metadata (version, servers, etc.)

ApiDoc currently sets only title/description/contact. For external consumers it’s often useful to also specify version and servers (and optionally license, terms_of_service) in the #[openapi(info(...))] block so clients can better understand the API surface and deployment environment. This is optional but improves docs quality.

🤖 Prompt for AI Agents
In components/api-server/src/bin/api_server.rs around lines 126 to 132, the
#[openapi(info(...))] block for ApiDoc only includes title/description/contact;
update that info block to also provide a version string and a servers array (and
optionally license and terms_of_service) so the generated OpenAPI metadata
includes API version and deployment server URLs—add entries like version =
"x.y.z" and servers([OpenApiServer::new("https://api.example.com")]) or
equivalent attributes supported by utoipa, keeping the existing fields intact.

Comment thread components/api-server/src/bin/api_server.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
components/api-server/src/bin/api_server.rs (3)

199-206: Log errors before returning HandlerError for better observability.

When a multi-line message is encountered (Line 203), the code returns HandlerError::InternalServer without logging the problematic message or any context. This makes debugging production issues difficult.

Apply this diff to add logging:

     let event_stream = results_stream.map(|res| {
         let message = res?;
         let trimmed_message = message.trim();
         if trimmed_message.lines().count() != 1 {
+            tracing::error!("Received multi-line message in SSE stream: {}", trimmed_message);
             return Err(HandlerError::InternalServer);
         }
         Ok(Event::default().data(trimmed_message))
     });

223-227: Generic error conversion loses all diagnostic context.

The From<T> implementation (lines 223-227) converts all errors into HandlerError::InternalServer without preserving any error details. This severely impacts observability and makes debugging production issues nearly impossible.

Apply this diff to add logging before error conversion:

 impl<T: IntoHandlerError> From<T> for HandlerError {
-    fn from(_: T) -> Self {
+    fn from(error: T) -> Self {
+        tracing::error!("Handler error: {:?}", error);
         Self::InternalServer
     }
 }

Note: This requires T: IntoHandlerError + std::fmt::Debug. You may need to add Debug bounds to the trait implementations or use separate From implementations for each error type.


217-221: Consider removing the marker trait for simpler error handling.

The IntoHandlerError trait serves only as a marker with no methods. While it provides type safety for error conversions, it adds complexity. Consider using individual From implementations instead:

impl From<axum::Error> for HandlerError {
    fn from(error: axum::Error) -> Self {
        tracing::error!("Axum error: {:?}", error);
        Self::InternalServer
    }
}

impl From<clp_client::ClientError> for HandlerError {
    fn from(error: clp_client::ClientError) -> Self {
        tracing::error!("Client error: {:?}", error);
        Self::InternalServer
    }
}

This approach is more explicit and makes it easier to add error-specific handling or logging in the future.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb250a4 and fdaf062.

📒 Files selected for processing (1)
  • components/api-server/src/bin/api_server.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/api-server/src/bin/api_server.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks
🔇 Additional comments (1)
components/api-server/src/bin/api_server.rs (1)

1-21: Imports look good.

The new imports for CORS, OpenAPI, and utoipa-axum are appropriate for the functionality being added.

Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great

instead of serving the openapi stats on the API server (i.e., the API server needs to be up and running so developers can read the specs), can we make it accessible in the docs site?

We can:

  1. create a binary to call the to_yaml() and output to a file during docs build? We can refer to #1654 for a similar setup.
  2. Then we serve the specs with the swagger UI:
    1. We can either use https://github.com/swagger-api/swagger-codegen to generate a static site during docs build and place that somewhere in the dist assets
    2. OR use sphinxcontrib-openapi which should better integrate with the docs site

Then any developers / users who visit our docs site can read the API specs effortlessly

Let me know what you think

Comment thread components/api-server/README.md Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this / can this be auto-generated?

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially reviewed. Will get back to u tmr.

Comment thread components/api-server/src/bin/api_server.rs
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/bin/api_server.rs Outdated
Comment thread components/api-server/src/client.rs
Comment thread components/api-server/src/bin/api_server.rs Outdated
hoophalab and others added 7 commits November 26, 2025 14:29
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
@hoophalab hoophalab changed the title feat(api-server): Add OpenAPI endpoint and README.md. feat(api-server): Add OpenAPI endpoint; Move API routes to a separate module. Nov 26, 2025
Comment thread components/api-server/src/client.rs
Comment thread components/api-server/src/routes.rs Outdated
Comment thread components/api-server/src/routes.rs Outdated
Comment thread components/api-server/src/routes.rs Outdated
"/openapi.json",
get(|| async { (StatusCode::OK, api_json) }),
)
.layer(CorsLayer::new().allow_origin(Any));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: Why do we need this?

@hoophalab hoophalab Nov 26, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, the user cannot put openapi.json url to swagger ui in the browser, because the json url is from a different host.

Comment thread components/api-server/src/routes.rs
Comment thread components/api-server/src/routes.rs
.context(format!("Cannot listen to {addr}"))?;

let client = clp_client::Client::connect(&config, &credentials)
let client = api_server::client::Client::connect(&config, &credentials)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we name the library crate to api_server, shall we rename client module to clp_client for clarity?

@hoophalab hoophalab Nov 26, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I still feel we shouldn't mention clp in any place of the code. Let's rename this after we have another client

hoophalab and others added 5 commits November 26, 2025 18:43
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
components/api-server/Cargo.toml (1)

32-32: tower-http version 0.6.7 does not exist.

This was already flagged in a previous review. The latest published version is 0.6.6.

components/api-server/src/routes.rs (1)

156-172: Factory function placement and CORS configuration.

Per past review feedback, consider moving from_client to the top of the file since it's the only public symbol.

The CORS configuration with allow_origin(Any) was flagged in past reviews as permissive. While necessary for Swagger UI access from different origins, consider:

  1. Adding .allow_methods([Method::GET, Method::POST]) to restrict HTTP methods
  2. Documenting that this is intentional for tooling purposes
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdaf062 and ddf6294.

📒 Files selected for processing (5)
  • components/api-server/Cargo.toml (2 hunks)
  • components/api-server/src/bin/api_server.rs (1 hunks)
  • components/api-server/src/client.rs (1 hunks)
  • components/api-server/src/lib.rs (1 hunks)
  • components/api-server/src/routes.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.

Applied to files:

  • components/api-server/Cargo.toml
  • components/api-server/src/routes.rs
  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/api-server/Cargo.toml
📚 Learning: 2025-11-10T05:19:56.600Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.

Applied to files:

  • components/api-server/Cargo.toml
  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-08-25T00:45:05.464Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1242
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:39-41
Timestamp: 2025-08-25T00:45:05.464Z
Learning: Task v3.44.1 causes issues in the clp-ffi-js project (issue #110), so CLP should avoid upgrading to v3.44.1 and stick with v3.44.0 for now.

Applied to files:

  • components/api-server/Cargo.toml
📚 Learning: 2025-10-20T22:20:45.733Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1448
File: components/clp-rust-utils/Cargo.toml:4-4
Timestamp: 2025-10-20T22:20:45.733Z
Learning: Rust edition 2024 is a valid edition value in Cargo.toml files. It was released with Rust 1.85 on February 20, 2025.

Applied to files:

  • components/api-server/Cargo.toml
📚 Learning: 2025-07-05T03:38:16.779Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-09-22T15:46:34.873Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1311
File: integration-tests/tests/test_identity_transformation.py:93-97
Timestamp: 2025-09-22T15:46:34.873Z
Learning: In the CLP project, multi-line formatting is preferred over single-line ternary operators when the single line would exceed the project's line length limits, prioritizing code readability and adherence to coding standards.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-09-08T21:41:32.059Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1291
File: docs/src/user-docs/quick-start/text-v-json.md:84-88
Timestamp: 2025-09-08T21:41:32.059Z
Learning: In the y-scope/clp project documentation, maintain informal contractions like "can't" and "don't" rather than formal alternatives like "cannot" and "do not" - the maintainer prefers a conversational tone.

Applied to files:

  • components/api-server/src/bin/api_server.rs
📚 Learning: 2025-09-03T07:29:35.223Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1107
File: components/core/src/clp_s/CMakeLists.txt:106-116
Timestamp: 2025-09-03T07:29:35.223Z
Learning: In the CLP project, do not suggest adding CONFIG or MODULE specifications to find_package calls. The maintainers prefer using find_package without these mode specifications.

Applied to files:

  • components/api-server/src/bin/api_server.rs
🧬 Code graph analysis (2)
components/api-server/src/routes.rs (1)
components/api-server/src/client.rs (1)
  • from (54-65)
components/api-server/src/bin/api_server.rs (2)
components/api-server/src/client.rs (1)
  • connect (90-104)
components/api-server/src/routes.rs (1)
  • from_client (156-172)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: rust-checks
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (11)
components/api-server/Cargo.toml (2)

7-7: Library rename looks good.

Renaming the library from clp_client to api_server aligns with the module restructuring and better reflects the crate's purpose.


36-37: OpenAPI dependencies look correct.

The utoipa and utoipa-axum versions are appropriate for the OpenAPI integration.

components/api-server/src/lib.rs (1)

1-3: Module restructuring is clean.

Making client public directly (instead of re-exporting with pub use client::*) and adding the new routes module improves the API surface organization. Note that removing the wildcard re-export is a breaking change for any external consumers that relied on api_server::Client or api_server::QueryConfig directly—they now need to use api_server::client::Client etc.

components/api-server/src/client.rs (2)

16-16: ToSchema import supports OpenAPI integration.

Adding the ToSchema derive enables automatic OpenAPI schema generation for QueryConfig.


21-51: QueryConfig documentation and structure look good.

The field-level documentation is comprehensive, and the structure addresses the feedback from previous reviews. The #[serde(deny_unknown_fields)] attribute provides good input validation.

components/api-server/src/bin/api_server.rs (1)

84-88: Clean separation of concerns.

The binary now properly delegates routing to the routes module, significantly simplifying this file. This addresses the past review feedback about the file being "too overloaded."

components/api-server/src/routes.rs (5)

21-32: OpenAPI metadata is minimal but functional.

The ApiDoc struct provides basic metadata. As noted in past reviews, consider enriching with version and servers for better documentation, but this is optional.


34-41: Health endpoint is straightforward.

Simple and effective health check implementation.


43-77: Query endpoint looks good.

The OpenAPI annotations are comprehensive with examples, and the handler properly delegates to the client.


108-116: Empty message edge case may cause unexpected error.

If message.trim() results in an empty string, lines().count() returns 0, which fails the != 1 check and returns InternalServer error. This might be the intended behaviour for malformed data, but consider whether empty results should be silently skipped instead:

     let event_stream = results_stream.map(|res| {
         let message = res?;
         let trimmed_message = message.trim();
+        if trimmed_message.is_empty() {
+            return Ok(Event::default().data(""));
+        }
         if trimmed_message.lines().count() != 1 {
             return Err(HandlerError::InternalServer);
         }
         Ok(Event::default().data(trimmed_message))
     });

Verify if empty messages are expected from the results stream and how they should be handled.


157-159: Health endpoint is mounted at both / and /health.

Line 158 mounts health at /, and line 159 uses routes!(health) which registers it at /health per the #[utoipa::path] annotation. This is intentional but worth documenting—clients can use either path.

Comment thread components/api-server/src/bin/api_server.rs

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat(api-server): Document APIs using OpenAPI and serve on `/openapi.json` endpoint.

@hoophalab hoophalab changed the title feat(api-server): Add OpenAPI endpoint; Move API routes to a separate module. feat(api-server): Document APIs using OpenAPI and serve on /openapi.json endpoint. Nov 27, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
components/api-server/src/routes.rs (1)

21-48: Move public function to the top of the file.

As noted in previous review, since from_client is the only public symbol in this module, it should be positioned at the top for better discoverability and consistency with Rust conventions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddf6294 and db3ca73.

📒 Files selected for processing (1)
  • components/api-server/src/routes.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/api-server/src/routes.rs (1)
components/api-server/src/client.rs (1)
  • from (54-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: rust-checks

Comment on lines +140 to +142
if trimmed_message.lines().count() != 1 {
return Err(HandlerError::InternalServer);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the routes.rs file
find . -name "routes.rs" -type f | head -20

Repository: y-scope/clp

Length of output: 166


🏁 Script executed:

# Analyzing the optimization suggestion without needing repository files
cat << 'EOF'
# Verification of the optimization suggestion in the review comment

# Original code:
# if trimmed_message.lines().count() != 1 {
#     return Err(HandlerError::InternalServer);
# }

# Suggested code:
# if trimmed_message.is_empty() || trimmed_message.contains('\n') {
#     return Err(HandlerError::InternalServer);
# }

# Analysis:
# 1. lines().count() vs contains('\n'):
#    - lines().count() iterates entire string and counts line breaks + 1
#    - contains('\n') stops at first occurrence
#    - For single-line messages: lines().count() == 1, contains('\n') == false (both pass)
#    - For multi-line messages: lines().count() > 1, contains('\n') == true (both reject)
#    - For empty string: lines().count() == 0, contains('\n') == false
#
# 2. Issue with suggested optimization:
#    - Original: trimmed_message.lines().count() != 1 (rejects if NOT exactly 1 line)
#    - Suggested: trimmed_message.is_empty() || trimmed_message.contains('\n')
#    - Empty string case:
#      * Original: lines().count() on "" returns 0, so 0 != 1 is true -> rejects
#      * Suggested: is_empty() is true -> rejects
#      * Both reject, so equivalent here
#    - Single line case:
#      * Original: "hello".lines().count() == 1, so 1 != 1 is false -> accepts
#      * Suggested: is_empty() is false, contains('\n') is false, so false || false = false -> accepts
#      * Both accept, so equivalent here
#    - Multi-line case:
#      * Original: "a\nb".lines().count() == 2, so 2 != 1 is true -> rejects
#      * Suggested: is_empty() is false, contains('\n') is true, so false || true = true -> rejects
#      * Both reject, so equivalent here
#
# CONCLUSION: The optimization is semantically correct!
# Performance claim is also valid:
#    - lines().count() scans entire string
#    - contains('\n') stops at first newline
#    - For large single-line messages, the optimization is faster

echo "Technical analysis complete"
EOF

Repository: y-scope/clp

Length of output: 166


🌐 Web query:

Rust lines().count() performance vs contains() for checking single line

💡 Result:

Short answer: don't use lines().count() to check for a single line — it's slower and allocates more. Read the file (or stream) and search directly.

Why:

  • std::io::Lines yields owned Strings (allocations) per line when reading via BufRead::lines(), which adds significant overhead compared with scanning bytes. (reddit.com)
  • A single-pass substring/byte search (e.g., str::contains or a memchr/memmem-based search) is linear and much faster in practice than splitting into owned lines. Benchmarks and high‑performance tools use byte-oriented scans (memchr/memmem/optimized libraries) for line/substring work. (github.com)

Practical options (fast → faster):

  1. If file fits memory: read_to_string and use either:
    • s.lines().any(|l| l == target) — avoids allocating new Strings because std::str::lines yields &str slices; or
    • s.contains(&format!("{}\n", target)) or a memmem search for exact-line-with-newline (fast but be careful about file end without newline). (reddit.com)
  2. If streaming large files: read large byte buffers and search with memchr/memmem (or use an optimized crate/tool like ripwc/higher‑performance line counters) to avoid per-line allocations. (reddit.com)

If you want, tell me your exact code or file-size constraints and I’ll give a concrete snippet (BufReader chunked scan, memchr/memmem, or read_to_string + any/contains).

Citations:


Based on my analysis, I can now provide the rewritten review comment:

Optimize single-line validation using contains() instead of lines().count().

The current check uses lines().count() which scans the entire message and allocates owned Strings—unnecessary overhead for this validation. Use contains('\n') instead, which stops at the first newline and avoids allocation.

The suggested optimization is semantically equivalent: it rejects empty strings and multi-line messages the same way while being significantly faster.

-        if trimmed_message.lines().count() != 1 {
-            return Err(HandlerError::InternalServer);
-        }
+        if trimmed_message.is_empty() || trimmed_message.contains('\n') {
+            return Err(HandlerError::InternalServer);
+        }

Verify that the backend cannot legitimately return multi-line results (e.g., pretty-printed JSON). If it can, this validation logic may need adjustment or the constraint should be enforced at the source.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if trimmed_message.lines().count() != 1 {
return Err(HandlerError::InternalServer);
}
if trimmed_message.is_empty() || trimmed_message.contains('\n') {
return Err(HandlerError::InternalServer);
}
🤖 Prompt for AI Agents
In components/api-server/src/routes.rs around lines 140 to 142, the current
single-line validation uses trimmed_message.lines().count(), which iterates and
allocates; replace it with a simpler check that rejects empty messages or any
message containing a newline: use trimmed_message.is_empty() ||
trimmed_message.contains('\n') to return Err(HandlerError::InternalServer). Also
verify whether the backend can legitimately return multi-line output (e.g.,
pretty JSON) and adjust the validation or enforce the constraint upstream if
needed.

Comment on lines +148 to +172
/// Generic errors for request handlers.
#[derive(Error, Debug)]
enum HandlerError {
#[error("Internal server error")]
InternalServer,
}

trait IntoHandlerError {}

impl IntoHandlerError for axum::Error {}

impl IntoHandlerError for ClientError {}

impl<T: IntoHandlerError> From<T> for HandlerError {
fn from(_: T) -> Self {
Self::InternalServer
}
}

/// Converts [`HandlerError`] into an HTTP response.
impl IntoResponse for HandlerError {
fn into_response(self) -> axum::response::Response {
StatusCode::INTERNAL_SERVER_ERROR.into_response()
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider preserving error context for better observability.

All errors are currently mapped to a generic InternalServer (500) response, which loses valuable context for debugging and monitoring. While the marker trait pattern works (as noted in previous reviews), consider whether some errors should be differentiated (e.g., 4xx client errors vs 5xx server errors) or at least logged with their original context before conversion.

For example, you could log errors before converting:

impl<T: IntoHandlerError + std::fmt::Display> From<T> for HandlerError {
    fn from(err: T) -> Self {
        tracing::error!("Handler error: {}", err);
        Self::InternalServer
    }
}

Or add variants to distinguish error types:

enum HandlerError {
    #[error("Internal server error")]
    InternalServer,
    #[error("Client error: {0}")]
    Client(String),
}

@hoophalab hoophalab merged commit f172935 into y-scope:main Nov 27, 2025
21 checks passed
@hoophalab hoophalab deleted the openapi branch December 2, 2025 18:22
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants