fix(api-server): Enforce `max_num_results` across all result output paths (fixes #1858). by goynam · Pull Request #2180 · y-scope/clp

goynam · 2026-04-07T12:03:06Z

The max_num_results parameter was only enforced in the results-cache (MongoDB) output path at the worker level. The file and S3 output paths had no limit applied, so all results were returned regardless of the configured max.

This applies the limit in the API server when streaming results back to the client across all three paths (file, S3, MongoDB). A value of 0 continues to mean unlimited.

Summary by CodeRabbit

New Features
- Search queries now honor a configurable maximum results setting so you can limit how many items are returned.
- Result limits are uniformly enforced across file, object storage and database backends.
- Streaming and pagination now respect configured limits, producing more predictable, capped result sets.

coderabbitai · 2026-04-07T12:03:24Z

Walkthrough

Client::fetch_results reads max_num_results from SearchJobConfig and forwards it to file, S3, and Mongo helpers; file/S3 streams apply take(...) when >0; Mongo helper sets FindOptions.limit when >0.

Changes

Cohort / File(s)	Summary
Result fetching & limiting `components/api-server/src/client.rs`	`Client::fetch_results` now passes `max_num_results` into all backend helpers. `fetch_results_from_file` and `fetch_results_from_s3` gain a `max_num_results: u32` parameter and convert/apply it as a stream limit (`take(...)`) when >0. `fetch_results_from_s3` binds the stream before applying the limit. `fetch_results_from_mongo` gains `max_num_results: u32` and applies `FindOptions.limit` (via `.with_options`) when >0.
Error variants `components/api-server/src/error.rs`	Added `ClientError::InvalidSearchJobConfig(String)` to represent failures converting or using the persisted `max_num_results` value.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Config
  participant FileBackend
  participant S3Backend
  participant MongoDB
  Client->>Config: read SearchJobConfig.max_num_results
  Client->>FileBackend: fetch_results_from_file(search_job_id, max_num_results)
  FileBackend-->>Client: stream (apply take if max_num_results>0)
  Client->>S3Backend: fetch_results_from_s3(search_job_id, max_num_results)
  S3Backend-->>Client: stream (apply take if max_num_results>0)
  Client->>MongoDB: fetch_results_from_mongo(search_job_id, max_num_results)
  MongoDB->>MongoDB: set FindOptions.limit if max_num_results>0
  MongoDB-->>Client: cursor stream (limited if applicable)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check	✅ Passed	The title clearly summarizes the main change: enforcing max_num_results across all result output paths (file, S3, MongoDB) in the API server, matching the primary objective of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

goynam · 2026-04-07T12:04:57Z

@kirkrodrigues , @junhaoliao , @LinZhihao-723 for awareness.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

components/api-server/src/client.rs (2)

365-368: ⚠️ Potential issue | 🔴 Critical

Undefined variable max_num_results in S3 stream — code will not compile.

The stream at lines 427-432 references max_num_results, but this variable is not in scope. The function fetch_results_from_s3 does not accept max_num_results as a parameter.

🐛 Proposed fix — add parameter to function signature

     async fn fetch_results_from_s3(
         &self,
         search_job_id: u64,
+        max_num_results: u32,
     ) -> Result<impl Stream<Item = Result<String, ClientError>> + use<>, ClientError> {

Then update the call site at line 210:

-                inner: self.fetch_results_from_s3(search_job_id).await?,
+                inner: self.fetch_results_from_s3(search_job_id, job_config.max_num_results).await?,

Also applies to: 403-432

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@components/api-server/src/client.rs` around lines 365 - 368, The stream uses
an out-of-scope variable max_num_results; update the function signature of
fetch_results_from_s3 to accept a max_num_results: usize (or appropriate integer
type) parameter, update any internal references to use that parameter (e.g.,
inside the stream closure that currently references max_num_results), and then
update all call sites (such as the caller at the earlier invocation) to pass the
desired max_num_results value; ensure the return type and error handling remain
unchanged and adjust any tests or usages that call fetch_results_from_s3
accordingly.

204-213: ⚠️ Potential issue | 🟠 Major

File output path does not enforce max_num_results.

The PR description states the limit should be enforced for all three output paths, but the file path at line 207 does not pass or enforce max_num_results. The fetch_results_from_file method (lines 308-331) lacks any limiting logic, so file-based results will still return all results regardless of the configured maximum.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@components/api-server/src/client.rs` around lines 204 - 213, The file-output
path doesn't enforce max_num_results: when job_config.write_to_file is true and
StreamOutputStorage::Fs is chosen you construct SearchResultStream::File using
fetch_results_from_file(search_job_id) which neither receives nor enforces
job_config.max_num_results; update the call site to pass the max (e.g.,
fetch_results_from_file(search_job_id, job_config.max_num_results)) and change
the fetch_results_from_file function signature and implementation to cap the
returned results accordingly (or apply the limit when constructing
SearchResultStream::File), referencing job_config.write_to_file,
StreamOutputStorage::Fs, SearchResultStream::File, fetch_results_from_file, and
the max_num_results parameter so all three output paths consistently honor the
configured limit.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/client.rs`:
- Around line 216-218: In fetch_results, the variable max_num_results is
undefined; update the call to fetch_results_from_mongo(search_job_id,
max_num_results) to pass the correct value from the available job_config (i.e.,
use job_config.max_num_results) so the call becomes
fetch_results_from_mongo(search_job_id, job_config.max_num_results); ensure you
reference job_config in the fetch_results scope and retain mapping to
SearchResultStream::Mongo { inner: s }.
- Around line 473-480: The current FindOptions branch applies a
timestamp-descending sort only when max_num_results > 0, causing inconsistent
ordering between limited and unlimited queries; update the construction of
find_options (the mongodb::options::FindOptions used where max_num_results is
checked) so that the sort(mongodb::bson::doc! { "timestamp": -1 }) is always
applied and only set .limit(i64::from(max_num_results)) when max_num_results > 0
(i.e., build a FindOptions::builder() with the sort unconditionally, and
conditionally add the limit), ensuring consistent ordering for both limited and
unlimited queries.

---

Outside diff comments:
In `@components/api-server/src/client.rs`:
- Around line 365-368: The stream uses an out-of-scope variable max_num_results;
update the function signature of fetch_results_from_s3 to accept a
max_num_results: usize (or appropriate integer type) parameter, update any
internal references to use that parameter (e.g., inside the stream closure that
currently references max_num_results), and then update all call sites (such as
the caller at the earlier invocation) to pass the desired max_num_results value;
ensure the return type and error handling remain unchanged and adjust any tests
or usages that call fetch_results_from_s3 accordingly.
- Around line 204-213: The file-output path doesn't enforce max_num_results:
when job_config.write_to_file is true and StreamOutputStorage::Fs is chosen you
construct SearchResultStream::File using fetch_results_from_file(search_job_id)
which neither receives nor enforces job_config.max_num_results; update the call
site to pass the max (e.g., fetch_results_from_file(search_job_id,
job_config.max_num_results)) and change the fetch_results_from_file function
signature and implementation to cap the returned results accordingly (or apply
the limit when constructing SearchResultStream::File), referencing
job_config.write_to_file, StreamOutputStorage::Fs, SearchResultStream::File,
fetch_results_from_file, and the max_num_results parameter so all three output
paths consistently honor the configured limit.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e40d351a-f6e6-4bcc-b88f-3ab1c8a3f2d1

📥 Commits

Reviewing files that changed from the base of the PR and between 2f48844 and b25963abba27aae82f6dd5b69d23cc9c8867ffa6.

📒 Files selected for processing (1)

components/api-server/src/client.rs

coderabbitai · 2026-04-07T12:09:15Z

+        let find_options = if max_num_results > 0 {
+            mongodb::options::FindOptions::builder()
+                .sort(mongodb::bson::doc! { "timestamp": -1 })
+                .limit(i64::from(max_num_results))
+                .build()
+        } else {
+            mongodb::options::FindOptions::default()
+        };


🧹 Nitpick | 🔵 Trivial

Result ordering differs between limited and unlimited queries.

When max_num_results > 0, results are sorted by timestamp descending (newest first). When max_num_results == 0 (unlimited), no sort is applied, so results follow MongoDB's natural/insertion order. This inconsistency may surprise consumers who expect the same ordering regardless of the limit.

Consider applying the same sort for both cases to ensure consistent behaviour:

♻️ Suggested refactor for consistent ordering

let find_options = if max_num_results > 0 { mongodb::options::FindOptions::builder() .sort(mongodb::bson::doc! { "timestamp": -1 }) .limit(i64::from(max_num_results)) .build() } else { - mongodb::options::FindOptions::default() + mongodb::options::FindOptions::builder() + .sort(mongodb::bson::doc! { "timestamp": -1 }) + .build() };

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let find_options = if max_num_results > 0 {

mongodb::options::FindOptions::builder()

.sort(mongodb::bson::doc! { "timestamp": -1 })

.limit(i64::from(max_num_results))

.build()

} else {

mongodb::options::FindOptions::default()

};

let find_options = if max_num_results > 0 {

mongodb::options::FindOptions::builder()

.sort(mongodb::bson::doc! { "timestamp": -1 })

.limit(i64::from(max_num_results))

.build()

} else {

mongodb::options::FindOptions::builder()

.sort(mongodb::bson::doc! { "timestamp": -1 })

.build()

};

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/api-server/src/client.rs` around lines 473 - 480, The current FindOptions branch applies a timestamp-descending sort only when max_num_results > 0, causing inconsistent ordering between limited and unlimited queries; update the construction of find_options (the mongodb::options::FindOptions used where max_num_results is checked) so that the sort(mongodb::bson::doc! { "timestamp": -1 }) is always applied and only set .limit(i64::from(max_num_results)) when max_num_results > 0 (i.e., build a FindOptions::builder() with the sort unconditionally, and conditionally add the limit), ensuring consistent ordering for both limited and unlimited queries.

hoophalab · 2026-04-07T16:20:59Z

Hi @goynam , thank you for submitting this PR. We're aware of this issue -- it's currently tracked in #1858. That said, truncating the number of results in the API server alone wouldn't prevent query workers from submitting a full result set, so a more comprehensive fix will be needed in the future.

Do you have a specific motivation for applying this quick patch now?

goynam · 2026-04-07T20:05:58Z

Hi Naman, thank you for submitting this PR. We're aware of this issue -- it's currently tracked in #1858. That said, truncating the number of results in the API server alone wouldn't prevent query workers from submitting a full result set, so a more comprehensive fix will be needed in the future.

Do you have a specific motivation for applying this quick patch now?

Hi @hoophalab , I was working on a canary monitoring system that periodically queries an API server (at every tk interval) to verify its operational status. The canary’s primary role is to detect issues and trigger alerts when anomalies are detected.

During validation testing, I configured max_num_results = to limit the number of responses for simplicity. However, the API server consistently returns more results than this limit allows, due to which i tried to give this a go. But I’m happy to wait for a more comprehensive fix if one is planned for future updates.

hoophalab

Thank you for your contribution - let's patch this behavior in the API server. Here are some comments regarding this PR.

hoophalab · 2026-04-09T17:11:39Z

@@ -213,7 +213,7 @@ impl Client {
            return Ok(stream);
        }

-        self.fetch_results_from_mongo(search_job_id)
+        self.fetch_results_from_mongo(search_job_id, max_num_results)


Suggested change

self.fetch_results_from_mongo(search_job_id, max_num_results)

self.fetch_results_from_mongo(search_job_id, job_config.max_num_results)

This doesn't compile

hoophalab · 2026-04-09T17:15:19Z

We probably also need to patch fetch_results_from_file

hoophalab · 2026-04-09T17:20:30Z

+            mongodb::options::FindOptions::builder()
+                .sort(mongodb::bson::doc! { "timestamp": -1 })
+                .limit(i64::from(max_num_results))
+                .build()


Suggested change

mongodb::options::FindOptions::builder()

.sort(mongodb::bson::doc! { "timestamp": -1 })

.limit(i64::from(max_num_results))

.build()

mongodb::options::FindOptions::builder()

.limit(i64::from(max_num_results))

.build()

Sort sounds like an expensive operation. S3 also doesn't support sort. Can we drop this sort for consistency?

hoophalab · 2026-04-09T17:28:40Z

        Ok(stream! {
-            while let Some(object_page) = object_pages.next().await {
+            let mut num_results: u32 = 0;


Instead of counting manually, we could utilize StreamExt::take

How about:

let stream = stream! { ... }; Ok(stream.take(max_num_results))

max_num_results isn't passed as an argument of this function

checking max_num_results == 0 is unnecessarily

hoophalab · 2026-04-09T17:29:37Z

@@ -456,13 +463,25 @@ impl Client {
    async fn fetch_results_from_mongo(
        &self,
        search_job_id: u64,
+        max_num_results: u32,


We need to update the docstring of this function

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

components/api-server/src/client.rs (1)
409-440: ⚠️ Potential issue | 🟠 Major

S3 still downloads the full object before the limit can bite.

take() caps the number of yielded items, but each touched object is still fully fetched and buffered via obj.body.collect().await? before the first record from that object is yielded. With max_num_results = 1, a large shard can still incur the full S3 download and memory cost, which undercuts the canary use-case. Please switch this path to incremental decoding so the read can stop as soon as the limit is reached.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/api-server/src/client.rs` around lines 409 - 440, The current
stream implementation fetches entire S3 object bodies via
obj.body.collect().await? before deserializing which defeats
stream.take(max_num_results) — change to incremental decoding by converting the
response body into an async reader (e.g., obj.body.into_async_read() or
tokio_util::io::StreamReader) and feed that reader into rmp_serde::Deserializer
(use from_read or from_read_ref variant that accepts an AsyncRead adapter) and
deserialize events one-by-one inside the yielded loop so you can stop reading as
soon as the outer stream reaches max_num_results; update the loop inside the
stream! block where object_pages, obj.body.collect().await?,
Deserialize::deserialize and deserializer are used to perform incremental reads
and ensure the stream.take(max_num_results) actually prevents further S3 reads.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/client.rs`:
- Around line 203-217: The API currently conflates "omitted" and "zero" for
max_num_results so callers can't request an unlimited stream; change the
representation to Option<u32> for job_config.max_num_results (and the incoming
request type used by submit_query) so None = omitted (apply
default_max_num_query_results later) and Some(0) = explicit "no limit"; update
submit_query to stop rewriting 0 into default_max_num_query_results and to
forward Option<u32> as-is, and update fetch_results / fetch_results_from_mongo /
fetch_results_from_file /fetch_results_from_s3 call sites to accept Option<u32>
and treat Some(0) as unbounded while treating None as "use default" where
appropriate.

---

Outside diff comments:
In `@components/api-server/src/client.rs`:
- Around line 409-440: The current stream implementation fetches entire S3
object bodies via obj.body.collect().await? before deserializing which defeats
stream.take(max_num_results) — change to incremental decoding by converting the
response body into an async reader (e.g., obj.body.into_async_read() or
tokio_util::io::StreamReader) and feed that reader into rmp_serde::Deserializer
(use from_read or from_read_ref variant that accepts an AsyncRead adapter) and
deserialize events one-by-one inside the yielded loop so you can stop reading as
soon as the outer stream reaches max_num_results; update the loop inside the
stream! block where object_pages, obj.body.collect().await?,
Deserialize::deserialize and deserializer are used to perform incremental reads
and ensure the stream.take(max_num_results) actually prevents further S3 reads.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ea73f67-21b6-4e49-95ee-c4a8e61b32d2

📥 Commits

Reviewing files that changed from the base of the PR and between b25963abba27aae82f6dd5b69d23cc9c8867ffa6 and 9b2ab87b10e2087750d6e86fc04fe39009e8f68c.

📒 Files selected for processing (1)

components/api-server/src/client.rs

coderabbitai · 2026-04-14T18:11:09Z

+        let max_num_results = job_config.max_num_results;

        if job_config.write_to_file {
            let stream = match &self.config.stream_output.storage {
                StreamOutputStorage::Fs { .. } => SearchResultStream::File {
-                    inner: self.fetch_results_from_file(search_job_id),
+                    inner: self.fetch_results_from_file(search_job_id, max_num_results),
                },
                StreamOutputStorage::S3 { .. } => SearchResultStream::S3 {
-                    inner: self.fetch_results_from_s3(search_job_id).await?,
+                    inner: self.fetch_results_from_s3(search_job_id, max_num_results).await?,
                },
            };
            return Ok(stream);
        }

-        self.fetch_results_from_mongo(search_job_id)
+        self.fetch_results_from_mongo(search_job_id, max_num_results)


⚠️ Potential issue | 🟠 Major

0 still cannot produce an unlimited stream for API-submitted jobs.

fetch_results now treats job_config.max_num_results == 0 as unbounded, but submit_query still rewrites an incoming 0 to default_max_num_query_results on Lines 134-137. In practice, callers of this API still cannot request “no limit”, so the docs on Line 32 and these new 0 branches do not hold end-to-end. This needs a representation that distinguishes “omitted” from 0 (for example, Option<u32>).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/api-server/src/client.rs` around lines 203 - 217, The API currently conflates "omitted" and "zero" for max_num_results so callers can't request an unlimited stream; change the representation to Option<u32> for job_config.max_num_results (and the incoming request type used by submit_query) so None = omitted (apply default_max_num_query_results later) and Some(0) = explicit "no limit"; update submit_query to stop rewriting 0 into default_max_num_query_results and to forward Option<u32> as-is, and update fetch_results / fetch_results_from_mongo / fetch_results_from_file /fetch_results_from_s3 call sites to accept Option<u32> and treat Some(0) as unbounded while treating None as "use default" where appropriate.

hoophalab

Overall good. One comment

hoophalab · 2026-04-15T09:52:38Z

+        if max_num_results > 0 {
+            Ok(stream.take(max_num_results as usize).left_stream())
+        } else {
+            Ok(stream.right_stream())
+        }


max_num_results = 0 in struct SearchJobConfig simply means that the limit is 0. fn submit_query sets SearchJobConfig::max_num_results to the default value if QueryConfig::max_num_results is 0. We also need to remove A value of 0 means no limit. in the docstring.

We cannot convert u32 to usize safely on all hardware platforms. It's best to use try_into instead.

Suggested change

if max_num_results > 0 {

Ok(stream.take(max_num_results as usize).left_stream())

} else {

Ok(stream.right_stream())

}

Ok(stream.take(max_num_results.try_into().map_err(|_| {

ClientError::InvalidSearchJobConfig(

"cannot convert `max_num_results` from `u32` to `usize`".to_owned(),

)

})?))

and add the following error item in api-server/src/error.rs

#[error("Invalid search job config: {0}")] InvalidSearchJobConfig(String),

When updating fetch_results_from_file, you might need to make the function return a Result<impl Stream<...>, ClientError

hoophalab

LGTM. Two minor issues left:

The CI workflow doesn't pass. Can you run task lint:fix-rust to fix lint?
How about the following title, which follows our convention?

fix(api-server): Enforce `max_num_results` across all result output paths (fixes #1858).

…aths (fixes y-scope#1858).

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/client.rs`:
- Around line 311-315: Update the docstring for the function
fetch_results_from_file to document the max_num_results parameter and its
semantics (same behavior described in fetch_results_from_mongo: how - or zero -
values are handled, limits applied, and any streaming/truncation behavior).
Locate the fetch_results_from_file function and add a brief sentence or param
block describing max_num_results, mirroring the wording used in
fetch_results_from_mongo for consistency with existing docs.
- Around line 440-445: The current code always applies stream.take(limit) which
treats max_num_results == 0 as "take 0" (empty stream); change the logic to
treat 0 as "no limit": if max_num_results == 0 return Ok(stream) directly,
otherwise convert max_num_results to usize (preserving the existing
ClientError::InvalidSearchJobConfig conversion on failure) and return
Ok(stream.take(limit)); update the branch around max_num_results, stream.take,
and the conversion to implement this conditional behavior.
- Around line 334-339: The current code always applies stream.take(limit) which
causes stream.take(0) to yield no items when max_num_results == 0 (meaning
unlimited). Fix by branching on max_num_results: convert max_num_results to
usize only when > 0 and call stream.take(limit) in that branch, otherwise use a
take that effectively means unlimited (e.g., stream.take(usize::MAX)) or return
the unbounded stream variant so the return type still matches; update the block
around the conversion/return (the code using max_num_results, limit, and
stream.take(limit)) to implement this conditional behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b5fa69c2-4a42-4e20-bc21-af2dbfa1cb56

📥 Commits

Reviewing files that changed from the base of the PR and between 9b2ab87b10e2087750d6e86fc04fe39009e8f68c and 076c135.

📒 Files selected for processing (2)

components/api-server/src/client.rs
components/api-server/src/error.rs

coderabbitai · 2026-04-15T18:23:30Z

    fn fetch_results_from_file(
        &self,
        search_job_id: u64,
-    ) -> impl Stream<Item = Result<String, ClientError>> + use<> {
+        max_num_results: u32,
+    ) -> Result<impl Stream<Item = Result<String, ClientError>> + use<>, ClientError> {


🧹 Nitpick | 🔵 Trivial

Consider updating docstring to document max_num_results parameter.

The docstring for fetch_results_from_mongo (lines 450-452) documents the new max_num_results behaviour, but this function's docstring does not mention it. Adding similar documentation would improve consistency.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/api-server/src/client.rs` around lines 311 - 315, Update the docstring for the function fetch_results_from_file to document the max_num_results parameter and its semantics (same behavior described in fetch_results_from_mongo: how - or zero - values are handled, limits applied, and any streaming/truncation behavior). Locate the fetch_results_from_file function and add a brief sentence or param block describing max_num_results, mirroring the wording used in fetch_results_from_mongo for consistency with existing docs.

coderabbitai · 2026-04-15T18:23:30Z

+        let limit: usize = max_num_results.try_into().map_err(|_| {
+            ClientError::InvalidSearchJobConfig(
+                "cannot convert `max_num_results` from `u32` to `usize`".to_owned(),
+            )
+        })?;
+        Ok(stream.take(limit))


⚠️ Potential issue | 🔴 Critical

Critical: stream.take(0) returns an empty stream when unlimited results are expected.

When max_num_results is 0 (meaning unlimited per the docstring on line 32), stream.take(0) yields zero items immediately. The MongoDB path correctly handles this with if max_num_results > 0 (line 485), but file and S3 paths do not.

🐛 Proposed fix

let limit: usize = max_num_results.try_into().map_err(|_| { ClientError::InvalidSearchJobConfig( "cannot convert `max_num_results` from `u32` to `usize`".to_owned(), ) })?; - Ok(stream.take(limit)) + if limit > 0 { + Ok(stream.take(limit).boxed_local()) + } else { + Ok(stream.boxed_local()) + }

Note: You may need to adjust the return type or use a different approach (e.g., conditional take via a helper) to avoid boxing. Alternatively:

- let limit: usize = max_num_results.try_into().map_err(|_| { - ClientError::InvalidSearchJobConfig( - "cannot convert `max_num_results` from `u32` to `usize`".to_owned(), - ) - })?; - Ok(stream.take(limit)) + if max_num_results > 0 { + let limit: usize = max_num_results.try_into().map_err(|_| { + ClientError::InvalidSearchJobConfig( + "cannot convert `max_num_results` from `u32` to `usize`".to_owned(), + ) + })?; + Ok(stream.take(limit)) + } else { + Ok(stream.take(usize::MAX)) + }

Using take(usize::MAX) preserves the return type while effectively being unlimited.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/api-server/src/client.rs` around lines 334 - 339, The current code always applies stream.take(limit) which causes stream.take(0) to yield no items when max_num_results == 0 (meaning unlimited). Fix by branching on max_num_results: convert max_num_results to usize only when > 0 and call stream.take(limit) in that branch, otherwise use a take that effectively means unlimited (e.g., stream.take(usize::MAX)) or return the unbounded stream variant so the return type still matches; update the block around the conversion/return (the code using max_num_results, limit, and stream.take(limit)) to implement this conditional behavior.

coderabbitai · 2026-04-15T18:23:30Z

+        let limit: usize = max_num_results.try_into().map_err(|_| {
+            ClientError::InvalidSearchJobConfig(
+                "cannot convert `max_num_results` from `u32` to `usize`".to_owned(),
+            )
+        })?;
+        Ok(stream.take(limit))


⚠️ Potential issue | 🔴 Critical

Critical: Same stream.take(0) issue as the file path.

Identical bug — when max_num_results == 0, this returns an empty stream instead of unlimited results.

🐛 Proposed fix

- let limit: usize = max_num_results.try_into().map_err(|_| { - ClientError::InvalidSearchJobConfig( - "cannot convert `max_num_results` from `u32` to `usize`".to_owned(), - ) - })?; - Ok(stream.take(limit)) + if max_num_results > 0 { + let limit: usize = max_num_results.try_into().map_err(|_| { + ClientError::InvalidSearchJobConfig( + "cannot convert `max_num_results` from `u32` to `usize`".to_owned(), + ) + })?; + Ok(stream.take(limit)) + } else { + Ok(stream.take(usize::MAX)) + }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/api-server/src/client.rs` around lines 440 - 445, The current code always applies stream.take(limit) which treats max_num_results == 0 as "take 0" (empty stream); change the logic to treat 0 as "no limit": if max_num_results == 0 return Ok(stream) directly, otherwise convert max_num_results to usize (preserving the existing ClientError::InvalidSearchJobConfig conversion on failure) and return Ok(stream.take(limit)); update the branch around max_num_results, stream.take, and the conversion to implement this conditional behavior.

goynam · 2026-04-15T18:32:20Z

LGTM. Two minor issues left:

The CI workflow doesn't pass. Can you run task lint:fix-rust to fix lint?

How about the following title, which follows our convention?
fix(api-server): Enforce `max_num_results` across all result output paths (fixes #1858).

Hi @hoophalab ,

The task runner needs GNU tar (gtar) which isn't installed on macOS (where i have done these changes) by default. have run these commands :

rustup run nightly cargo fmt --all
rustup run nightly cargo clippy --all-targets --all-features --fix --allow-dirty

for lint fix.

goynam · 2026-04-15T18:45:49Z

LGTM. Two minor issues left:

The CI workflow doesn't pass. Can you run task lint:fix-rust to fix lint?

How about the following title, which follows our convention?
fix(api-server): Enforce `max_num_results` across all result output paths (fixes #1858).
Hi @hoophalab ,

The task runner needs GNU tar (gtar) which isn't installed on macOS (where i have done these changes) by default. have run these commands :

rustup run nightly cargo fmt --all rustup run nightly cargo clippy --all-targets --all-features --fix --allow-dirty

for lint fix.

infact ran now in my linux env . Lint passed with no changes needed — code was formatted correctly.

kirkrodrigues

Deferring to @hoophalab's review.

…aths (fixes y-scope#1858). (y-scope#2180) Co-authored-by: Naman Goyal <namangg@amazon.com> Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>

goynam requested a review from a team as a code owner April 7, 2026 12:03

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

junhaoliao added this to the Mid-April 2026 milestone Apr 7, 2026

junhaoliao linked an issue Apr 7, 2026 that may be closed by this pull request

max_num_results in SearchJobConfig is not strictly enforced. #1858

Closed

hoophalab requested changes Apr 9, 2026

View reviewed changes

goynam force-pushed the fix/enforce-max-num-results-in-api-server branch from b25963a to 9b2ab87 Compare April 14, 2026 18:03

coderabbitai Bot reviewed Apr 14, 2026

View reviewed changes

hoophalab reviewed Apr 15, 2026

View reviewed changes

goynam force-pushed the fix/enforce-max-num-results-in-api-server branch from 9b2ab87 to 6cf4687 Compare April 15, 2026 11:51

hoophalab reviewed Apr 15, 2026

View reviewed changes

fix(api-server): Enforce max_num_results across all result output p…

076c135

…aths (fixes y-scope#1858).

goynam force-pushed the fix/enforce-max-num-results-in-api-server branch from 44b852e to 076c135 Compare April 15, 2026 18:17

coderabbitai Bot reviewed Apr 15, 2026

View reviewed changes

goynam changed the title ~~fix(api-server): enforce max_num_results across all result output paths~~ fix(api-server): Enforce max_num_results across all result output paths (fixes #1858). Apr 15, 2026

Merge branch 'main' into fix/enforce-max-num-results-in-api-server

ae2853f

hoophalab approved these changes Apr 15, 2026

View reviewed changes

kirkrodrigues approved these changes Apr 15, 2026

View reviewed changes

hoophalab merged commit b54df7b into y-scope:main Apr 16, 2026
22 checks passed

	self.fetch_results_from_mongo(search_job_id, max_num_results)
	self.fetch_results_from_mongo(search_job_id, job_config.max_num_results)

Conversation

goynam commented Apr 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

goynam commented Apr 7, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

hoophalab commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

goynam commented Apr 7, 2026

Uh oh!

hoophalab left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hoophalab Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

hoophalab Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

hoophalab Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hoophalab Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

hoophalab Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

hoophalab left a comment

Choose a reason for hiding this comment

Uh oh!

hoophalab Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hoophalab left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

goynam commented Apr 15, 2026

Uh oh!

goynam commented Apr 15, 2026

Uh oh!

kirkrodrigues left a comment

Choose a reason for hiding this comment

goynam commented Apr 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading

hoophalab commented Apr 7, 2026 •

edited

Loading

hoophalab left a comment •

edited

Loading

hoophalab Apr 9, 2026 •

edited

Loading

hoophalab Apr 15, 2026 •

edited

Loading