
fix/inference timeout error #7598

Merged

generall merged 7 commits into dev from fix/inference-timeout-error on Nov 25, 2025

Conversation

@dancixx
Contributor

@dancixx dancixx commented Nov 24, 2025

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using the cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using the cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@dancixx dancixx requested a review from generall November 24, 2025 21:16
@qdrant qdrant deleted a comment from coderabbitai bot Nov 25, 2025

@qdrant qdrant deleted a comment from coderabbitai bot Nov 25, 2025
@generall generall requested a review from timvisee November 25, 2025 13:20
Comment on lines +209 to +213
let request = if let Some(timeout) = timeout {
    request.timeout(timeout)
} else {
    request
};
Member

We must also set default timeout here

Member

this request uses the pre-configured service; it already has a timeout by default

Member

I see, missed that. Thanks!
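The pattern in the quoted snippet can be sketched in isolation. The `Request` type below is a stand-in for illustration, not the real reqwest builder; it only exists so the shape of the conditional override (apply an explicit timeout when one is given, otherwise rely on the pre-configured client's default) is self-contained and testable:

```rust
use std::time::Duration;

// Stand-in builder type; the real code operates on an HTTP request builder
// from a pre-configured client that already carries a default timeout.
#[derive(Debug, Clone, PartialEq)]
struct Request {
    timeout: Option<Duration>,
}

impl Request {
    fn timeout(mut self, t: Duration) -> Self {
        self.timeout = Some(t);
        self
    }
}

// Same shape as the reviewed snippet: override only when a per-request
// timeout was supplied; otherwise leave the builder untouched.
fn apply_timeout(request: Request, timeout: Option<Duration>) -> Request {
    if let Some(timeout) = timeout {
        request.timeout(timeout)
    } else {
        request
    }
}

fn main() {
    let base = Request { timeout: None };
    assert_eq!(apply_timeout(base.clone(), None).timeout, None);
    assert_eq!(
        apply_timeout(base, Some(Duration::from_secs(5))).timeout,
        Some(Duration::from_secs(5))
    );
}
```

As the thread resolves, leaving the `else` branch as a no-op is safe precisely because the underlying client is constructed with a default timeout.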

@generall generall merged commit ec6cce8 into dev Nov 25, 2025
13 checks passed
@generall generall deleted the fix/inference-timeout-error branch November 25, 2025 13:25
timvisee pushed a commit that referenced this pull request Nov 25, 2025
* fix: inference timeout error

* chore: cleanup timeout from inference config

* cleanup timeout from InferenceConfig

* chore: revert config.toml changes

* inference_token -> inference_params for timeout propagation

* always use at least some timeout

---------

Co-authored-by: generall <andrey@vasnetsov.com>
@timvisee timvisee mentioned this pull request Nov 25, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 25, 2025

📝 Walkthrough

This pull request refactors the inference subsystem to introduce an intermediate InferenceParams struct that wraps InferenceToken with an optional timeout value. The changes systematically replace direct InferenceToken usage with InferenceParams throughout the codebase. Key modifications include: (1) introducing new modules params.rs and token.rs under src/common/inference/, (2) moving InferenceToken to its dedicated token.rs module, (3) updating InferenceConfig to use Option<u64> for timeout instead of u64, and (4) propagating InferenceParams through all query and update API call chains across both Actix-Web and Tonic gRPC handlers. All internal references, function signatures, and imports are updated to use InferenceParams where previously InferenceToken was passed directly.
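A minimal sketch of the wrapper described above, assuming the field shapes quoted elsewhere in this review (the real definitions live in src/common/inference/params.rs and token.rs; note the review comments disagree on the exact generic bound of `new`, so the `impl Into<InferenceToken>` signature here is an assumption):

```rust
use std::time::Duration;

// Sketch of the InferenceToken newtype and the InferenceParams wrapper
// that replaces direct InferenceToken usage throughout the call chains.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct InferenceToken(pub Option<String>);

#[derive(Debug, Clone, PartialEq, Default)]
pub struct InferenceParams {
    pub token: InferenceToken,
    pub timeout: Option<Duration>, // per-request override; None = client default
}

impl InferenceParams {
    pub fn new(token: impl Into<InferenceToken>, timeout: Option<Duration>) -> Self {
        Self {
            token: token.into(),
            timeout,
        }
    }
}

fn main() {
    // Update paths that have no per-request timeout yet pass None.
    let params = InferenceParams::new(InferenceToken(None), None);
    assert_eq!(params, InferenceParams::default());
    assert!(params.timeout.is_none());
}
```

Because the struct is just a token plus an optional `Duration`, cloning it per batch item (as several handlers do) is cheap.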

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

  • Breadth: Changes span ~20 files across API layers, inference core, and handler implementations
  • Pattern consistency: Systematic refactoring following a uniform replacement pattern (InferenceToken → InferenceParams), reducing cognitive load for reviewing similar edits
  • Public API impact: Multiple public function signatures updated across src/common/inference/, src/actix/api/, and src/tonic/api/ requiring verification of all call sites
  • Logic preservation: No new complex business logic introduced; existing control flow and error handling remain intact

Areas requiring extra attention:

  • src/common/inference/params.rs — new struct definition and its integration as the primary inference parameter carrier
  • src/common/inference/service.rs — addition of DEFAULT_INFERENCE_TIMEOUT_SECS constant and timeout resolution logic for HTTP client construction
  • Public function signature changes in convert_query_request_from_rest, convert_query_groups_request_from_rest, and all update/query common handlers
  • Import path changes across the codebase pointing to the newly separated token and params modules

Suggested reviewers

  • timvisee

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 22.22%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title 'fix/inference timeout error' clearly relates to the core changes in this PR, which systematically introduce InferenceParams to wrap inference tokens with timeout configuration throughout the codebase.
  • Description check (✅ Passed): the PR description is a standard contributor checklist confirming adherence to guidelines, code formatting, linting, and test validation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/inference-timeout-error



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (9)
src/tonic/auth.rs (1)

27-27: Consider adding an import for consistency.

Using the full path crate::common::inference::token::InferenceToken(None) inline works, but adding an import at the top of the file would improve readability and align with typical Rust conventions.

 use crate::common::auth::{AuthError, AuthKeys};
+use crate::common::inference::token::InferenceToken;
 
 type Request = tonic::codegen::http::Request<tonic::transport::Body>;

Then simplify the inline usage:

-        let inference_token = crate::common::inference::token::InferenceToken(None);
+        let inference_token = InferenceToken(None);
src/common/inference/params.rs (1)

3-7: Consider adding a use statement for Duration.

Using the fully qualified std::time::Duration inline is valid but verbose. A use statement would improve readability.

 use crate::common::inference::token::InferenceToken;
+use std::time::Duration;
 
 #[derive(Debug, Clone, PartialEq, Default)]
 pub struct InferenceParams {
     pub token: InferenceToken,
-    pub timeout: Option<std::time::Duration>,
+    pub timeout: Option<Duration>,
 }
src/actix/api/update_api.rs (1)

344-346: Unnecessary .clone() on inference_token.

The inference_token is not used after this point, so it can be moved directly into InferenceParams::new without cloning.

     // Update operation doesn't have timeout yet
-    let inference_params = InferenceParams::new(inference_token.clone(), None);
+    let inference_params = InferenceParams::new(inference_token, None);
src/actix/api/query_api.rs (1)

192-226: Slight asymmetry between groups and non-groups inference_params usage

query_points_groups takes InferenceParams by value into convert_query_groups_request_from_rest, whereas the non-groups path passes a shared &InferenceParams. This is correct but causes an extra clone inside from_batch_accum.

If you want symmetry and slightly less cloning, you could change convert_query_groups_request_from_rest to accept &InferenceParams (mirroring convert_query_request_from_rest) and pass &inference_params here.

src/common/inference/infer_processing.rs (1)

11-70: InferenceParams ownership in BatchAccumInferred is correct; clone overhead is minor

Passing InferenceParams by value into from_objects and cloning it in from_batch_accum is functionally fine and keeps the infer API simple. Given InferenceParams is small (token + optional timeout), clone cost is negligible.

If you want to tighten the API, you could also make from_batch_accum take InferenceParams by value and avoid the internal clone(), updating call sites accordingly.

src/tonic/api/update_common.rs (1)

806-854: sync reuses convert_point_struct with InferenceParams; assumptions about no inference should be validated

sync now passes InferenceParams into convert_point_struct with InferenceType::Update. As the comment notes, this path assumes that syncing uses already-persisted points (no Document/Image/Object fields), so no actual inference calls should be triggered even though the parameters are available.

It’s worth double-checking that current SyncPoints producers indeed only send persisted vector forms so that convert_point_struct’s inference branch is never taken here; otherwise, inference timeouts and external calls could unexpectedly affect sync behavior.

src/common/inference/token.rs (1)

1-43: InferenceToken abstraction and extraction helpers look correct and consistent

The InferenceToken newtype, Actix FromRequest impl (using req.extensions().get::<InferenceToken>().cloned().unwrap_or_default()), and Tonic extract_token helper all default to InferenceToken(None) when no token is present, giving a uniform, failure-free extraction story across transports. This matches how the gRPC and Actix handlers now consume tokens.

As a tiny polish, you might rename the local api_key variable in from_request to inference_token to better reflect its type.
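The extract-or-default behaviour described here can be sketched transport-agnostically. The `extract_token` below is a simplified stand-in that takes an `Option` directly; the real helpers instead read Actix request extensions (`req.extensions().get::<InferenceToken>()`) and Tonic request metadata:

```rust
// Sketch of the uniform, failure-free extraction story: whatever storage
// the transport provides, a missing token becomes InferenceToken(None).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct InferenceToken(pub Option<String>);

fn extract_token(stored: Option<&InferenceToken>) -> InferenceToken {
    // Mirrors `.get::<InferenceToken>().cloned().unwrap_or_default()`.
    stored.cloned().unwrap_or_default()
}

fn main() {
    // No token attached to the request: extraction still succeeds.
    assert_eq!(extract_token(None), InferenceToken(None));

    // Token present: it is cloned out unchanged.
    let t = InferenceToken(Some("secret".to_string()));
    assert_eq!(extract_token(Some(&t)), t);
}
```

Defaulting instead of erroring is what lets both the Actix and Tonic handlers treat token extraction as infallible.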

src/tonic/api/points_internal_api.rs (2)

459-464: Unnecessary clone of inference_token.

The inference_token.clone() is unnecessary since InferenceParams::new takes impl Into<Option<String>> which works with the token directly.

         let inference_token = extract_token(&request);
-        let inference_params = InferenceParams::new(inference_token.clone(), None);
+        let inference_params = InferenceParams::new(inference_token, None);

481-486: Unnecessary clone of inference_token.

Same as above - the clone is not needed here.

         let inference_token = extract_token(&request);
-        let inference_params = InferenceParams::new(inference_token.clone(), None);
+        let inference_params = InferenceParams::new(inference_token, None);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a92c69 and 5ee768f.

📒 Files selected for processing (18)
  • src/actix/api/query_api.rs (5 hunks)
  • src/actix/api/update_api.rs (5 hunks)
  • src/common/auth/mod.rs (1 hunks)
  • src/common/inference/config.rs (1 hunks)
  • src/common/inference/infer_processing.rs (4 hunks)
  • src/common/inference/mod.rs (1 hunks)
  • src/common/inference/params.rs (1 hunks)
  • src/common/inference/query_requests_grpc.rs (4 hunks)
  • src/common/inference/query_requests_rest.rs (4 hunks)
  • src/common/inference/service.rs (10 hunks)
  • src/common/inference/token.rs (1 hunks)
  • src/common/inference/update_requests.rs (8 hunks)
  • src/common/update.rs (9 hunks)
  • src/tonic/api/points_api.rs (13 hunks)
  • src/tonic/api/points_internal_api.rs (12 hunks)
  • src/tonic/api/query_common.rs (7 hunks)
  • src/tonic/api/update_common.rs (10 hunks)
  • src/tonic/auth.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

  • src/common/inference/query_requests_rest.rs
  • src/common/inference/params.rs
  • src/tonic/api/update_common.rs
  • src/tonic/auth.rs
  • src/common/inference/query_requests_grpc.rs
  • src/tonic/api/points_api.rs
  • src/common/inference/update_requests.rs
  • src/common/inference/token.rs
  • src/tonic/api/query_common.rs
  • src/actix/api/query_api.rs
  • src/common/auth/mod.rs
  • src/actix/api/update_api.rs
  • src/common/update.rs
  • src/tonic/api/points_internal_api.rs
  • src/common/inference/mod.rs
  • src/common/inference/config.rs
  • src/common/inference/service.rs
  • src/common/inference/infer_processing.rs
🧠 Learnings (4)
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

  • src/tonic/api/update_common.rs
  • src/common/update.rs
📚 Learning: 2025-09-16T19:14:17.614Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7183
File: lib/api/src/grpc/qdrant.rs:4263-4273
Timestamp: 2025-09-16T19:14:17.614Z
Learning: In qdrant, lib/api/src/grpc/qdrant.rs is auto-generated by prost-build; do not edit it directly. Make changes in lib/api/src/grpc/proto/points.proto (e.g., add [deprecated=true], doc comments, or encoding options), then regenerate the Rust code.

Applied to files:

  • src/common/inference/query_requests_grpc.rs
  • src/tonic/api/points_api.rs
📚 Learning: 2025-08-10T18:26:12.443Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.443Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."

Applied to files:

  • src/actix/api/update_api.rs
📚 Learning: 2025-08-10T18:31:56.855Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/point_ops.rs:501-528
Timestamp: 2025-08-10T18:31:56.855Z
Learning: In Qdrant, batch operations validate that `ids`, `vectors`, and `payloads` (if present) have matching lengths at the REST API level in `lib/api/src/rest/validate.rs` through the `Validate` trait implementation for `Batch`. This validation happens before data is converted to internal structures like `PointInsertOperationsInternal`, so methods operating on these internal structures can safely assume the lengths match.

Applied to files:

  • src/common/update.rs
🧬 Code graph analysis (12)
src/common/inference/query_requests_rest.rs (2)
src/common/inference/infer_processing.rs (1)
  • from_batch_accum (63-70)
src/common/inference/batch_processing.rs (1)
  • collect_query_request (168-196)
src/common/inference/params.rs (3)
src/common/inference/service.rs (1)
  • new (85-102)
src/common/inference/config.rs (1)
  • new (11-17)
src/common/inference/token.rs (1)
  • new (10-12)
src/tonic/api/update_common.rs (1)
src/common/inference/update_requests.rs (1)
  • convert_point_struct (17-156)
src/common/inference/query_requests_grpc.rs (1)
src/common/inference/infer_processing.rs (1)
  • from_objects (24-61)
src/tonic/api/points_api.rs (1)
src/common/inference/token.rs (1)
  • extract_token (38-43)
src/common/inference/update_requests.rs (1)
src/common/inference/infer_processing.rs (1)
  • from_batch_accum (63-70)
src/common/inference/token.rs (2)
src/common/inference/config.rs (1)
  • new (11-17)
src/common/inference/params.rs (1)
  • new (10-15)
src/tonic/api/query_common.rs (2)
src/common/inference/query_requests_grpc.rs (2)
  • convert_query_points_from_grpc (108-181)
  • convert_query_point_groups_from_grpc (27-105)
src/actix/api/query_api.rs (1)
  • query_points (28-103)
src/actix/api/query_api.rs (1)
src/common/inference/query_requests_rest.rs (2)
  • convert_query_groups_request_from_rest (33-92)
  • convert_query_request_from_rest (94-148)
src/common/update.rs (1)
src/common/inference/update_requests.rs (3)
  • convert_batch (158-207)
  • convert_point_struct (17-156)
  • convert_point_vectors (209-294)
src/common/inference/config.rs (2)
src/common/inference/service.rs (1)
  • new (85-102)
src/common/inference/params.rs (1)
  • new (10-15)
src/common/inference/service.rs (1)
src/common/inference/config.rs (1)
  • new (11-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: lint
  • GitHub Check: e2e-tests
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-consistency
  • GitHub Check: Build Qdrant Edge Python bindings
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
🔇 Additional comments (40)
src/common/inference/query_requests_grpc.rs (2)

23-23: LGTM!

The import and parameter changes correctly adopt InferenceParams for propagating inference configuration. Usage aligns with the BatchAccumInferred::from_objects signature shown in the relevant snippets.

Also applies to: 29-30, 63-66


108-111: LGTM!

Consistent adoption of InferenceParams in convert_query_points_from_grpc, matching the pattern established in the grouped queries function.

Also applies to: 142-145

src/common/auth/mod.rs (1)

14-14: LGTM!

Import path correctly updated to reference InferenceToken from its new dedicated module location.

src/actix/api/update_api.rs (3)

20-21: LGTM!

Clean separation of imports for InferenceParams and InferenceToken from their respective modules.


52-54: LGTM!

Good use of InferenceParams wrapper with a clarifying comment explaining why timeout is None. The pattern correctly passes inference configuration downstream.

Also applies to: 62-62


133-134: LGTM!

Consistent application of the InferenceParams pattern in update_vectors.

Also applies to: 142-142

src/common/inference/params.rs (1)

9-15: The original review comment is incorrect and should be disregarded.

The coding guideline to prefer SomeType::from(x) over x.into() applies to concrete types, not opaque trait parameters. The function parameter token: impl Into<InferenceToken> has an unknown concrete type at compile time. The .into() method is the only correct pattern here—it uses return type context to infer the target type. Using InferenceToken::from(token) would fail to compile because the compiler cannot resolve the concrete input type.

The current code is idiomatic Rust and correct as-is.

Likely an incorrect or invalid review comment.

src/actix/api/query_api.rs (2)

35-62: InferenceParams construction and reuse in query_points looks consistent

inference_token is injected via Actix FromRequest, wrapped into InferenceParams using params.timeout(), and the same params are reused for inference conversion and dispatcher calls. Ownership/borrows are sound and avoid repeated construction per query.


105-141: Batch handler reuses a single InferenceParams instance across all searches

In query_points_batch, constructing inference_params once and passing &inference_params into convert_query_request_from_rest for each item is efficient and keeps token/timeout consistent for the whole batch. No lifetime or aliasing issues here.

src/common/inference/config.rs (1)

3-16: Timeout as Option<u64> aligns with per-request overrides and keeps defaults intact

Switching timeout to Option<u64> plus deriving Default and setting timeout: None in new works cleanly with InferenceService::new, which already does timeout.unwrap_or(DEFAULT_INFERENCE_TIMEOUT_SECS). This preserves existing defaults while allowing explicit config-level overrides and coexisting with per-request timeouts in InferenceParams.
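The resolution order (per-request timeout first, then the config value, then the default) can be sketched as a small helper. `effective_timeout` is a hypothetical function that collapses the client-level and request-level steps into one place for illustration; the 600-second value matches the 10-minute default mentioned in this review:

```rust
use std::time::Duration;

// Matches the constant named in the PR (10-minute default).
const DEFAULT_INFERENCE_TIMEOUT_SECS: u64 = 600;

// Hypothetical helper: in the real code, the client is built with
// `config.timeout.unwrap_or(DEFAULT_INFERENCE_TIMEOUT_SECS)` and the
// per-request override is applied on the request builder separately.
fn effective_timeout(config_timeout: Option<u64>, per_request: Option<Duration>) -> Duration {
    per_request.unwrap_or_else(|| {
        Duration::from_secs(config_timeout.unwrap_or(DEFAULT_INFERENCE_TIMEOUT_SECS))
    })
}

fn main() {
    // No config, no per-request value: the 10-minute safety net applies.
    assert_eq!(effective_timeout(None, None), Duration::from_secs(600));

    // Config-level override.
    assert_eq!(effective_timeout(Some(30), None), Duration::from_secs(30));

    // Per-request value wins over both.
    assert_eq!(
        effective_timeout(Some(30), Some(Duration::from_secs(5))),
        Duration::from_secs(5)
    );
}
```

This is why switching the config field to `Option<u64>` is safe: a `None` there never means "no timeout", only "use the default".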

src/common/inference/mod.rs (1)

3-15: New params and token modules are wired correctly into the inference namespace

Exporting params and token here aligns with the new crate::common::inference::params::InferenceParams and crate::common::inference::token::InferenceToken usages across Actix and Tonic paths. No issues seen.

src/common/inference/query_requests_rest.rs (2)

33-53: Groups REST conversion now uses InferenceParams; behavior remains consistent

convert_query_groups_request_from_rest collecting a batch, then calling BatchAccumInferred::from_batch_accum(batch, InferenceType::Search, &inference_params) correctly threads token + optional timeout into inference. The rest of the function remains a pure mapping from inferred vectors to CollectionQueryGroupsRequest, so behavior is unchanged aside from the new timeout-aware inference context.


94-101: Regular REST query conversion correctly delegates to BatchAccumInferred with InferenceParams

convert_query_request_from_rest now accepts &InferenceParams and forwards it directly into BatchAccumInferred::from_batch_accum. This matches the updated signature in infer_processing.rs and keeps the function free of token/timeout details beyond passing the struct through.

src/tonic/api/points_api.rs (2)

64-88: Update operations now consistently pass InferenceParams into update_common

For upsert, update_vectors, and update_batch, extracting the token via extract_token(&request) and wrapping it as InferenceParams::new(inference_token, None) cleanly passes inference context into update_common. Sharing a single InferenceParams per RPC and cloning it in update_batch is reasonable given the struct size and keeps per-operation behavior consistent.

Also applies to: 133-159, 277-300


591-612: gRPC query methods now propagate the per-request timeout into inference

In query, query_batch, and query_groups, InferenceParams::new is built from the extracted token plus request.get_ref().timeout.map(Duration::from_secs), so the same timeout value drives both core query execution and inference. Access extraction, token extraction, and into_inner() ordering are all safe.

One nuance: query_batch still separately maps the destructured timeout into Duration for core search while also embedding the original timeout into InferenceParams. That duplication is intentional but worth keeping in mind if you later change the units or semantics of the RPC timeout field.

Also applies to: 618-649, 655-676

src/tonic/api/update_common.rs (1)

40-89: All call sites properly updated; no verification issues found

Verification confirms that the refactor from InferenceToken to InferenceParams is complete across all call sites:

  • tonic upsert/update_vectors: Extract token → construct InferenceParams → pass to functions ✓
  • tonic update_batch: Passes InferenceParams correctly ✓
  • actix update_batch: Receives InferenceToken, immediately converts to InferenceParams (line 345), passes converted value ✓
  • sync function: Accepts and uses inference_params consistently with its comment that "no actual inference" occurs in sync itself (consistency only) ✓

The concern about outdated call sites is not applicable; the migration has been completed successfully.

src/tonic/api/query_common.rs (4)

36-36: LGTM!

Import updated correctly to use the new InferenceParams type from the params module.


776-830: LGTM!

The query function signature and implementation are correctly updated to use InferenceParams. The parameter is properly passed to convert_query_points_from_grpc.


832-894: LGTM!

The query_batch function correctly uses InferenceParams and appropriately clones it for each iteration in the loop, which is necessary since it's consumed by each convert_query_points_from_grpc call.


896-950: LGTM!

The query_groups function correctly updated to use InferenceParams and passes it to convert_query_point_groups_from_grpc.

src/tonic/api/points_internal_api.rs (6)

35-36: LGTM!

Imports correctly updated to reference the new module structure with InferenceParams from params and extract_token from token.


57-82: LGTM!

The sync_internal method correctly updated to accept InferenceParams and pass it to the sync function.


84-110: LGTM!

The upsert_internal method correctly uses InferenceParams. The clone on line 106 is appropriate since the params may be reused.


138-164: LGTM!

The update_vectors_internal method correctly updated to use InferenceParams.


557-580: LGTM!

The update_batch implementation correctly creates InferenceParams once and clones it for each operation that requires inference. The comment explains the design choice.


797-804: LGTM!

The sync method correctly creates InferenceParams with None timeout for internal operations and passes it through. No unnecessary clone here.

src/common/inference/update_requests.rs (5)

14-14: LGTM!

Import correctly updated to use the new InferenceParams type.


17-52: LGTM!

The convert_point_struct function correctly updated to use InferenceParams. The reference passed to from_batch_accum aligns with its signature that takes &InferenceParams.


158-207: LGTM!

The convert_batch function correctly uses InferenceParams and clones it appropriately when iterating over named vectors, since each convert_vectors call consumes the params.


209-294: LGTM!

The convert_point_vectors function correctly updated to use InferenceParams with proper reference passing to from_batch_accum.


373-418: LGTM!

The convert_vectors function correctly updated to use InferenceParams with proper reference passing to from_batch_accum.

src/common/inference/service.rs (5)

19-19: LGTM!

Import correctly added for the new InferenceParams type.


79-102: Default timeout of 10 minutes correctly implemented.

This addresses the concern from previous review discussions about always having a timeout. The client is now built with a base timeout (from config or the 10-minute default), ensuring requests never wait indefinitely.


132-176: LGTM!

The infer method correctly updated to accept InferenceParams and propagate it to infer_remote.


178-242: Per-request timeout override logic looks correct.

The implementation properly destructures InferenceParams and applies the per-request timeout on top of the base client timeout when provided. This allows callers to specify shorter timeouts for specific requests while still having the default 10-minute safety net.


545-564: LGTM!

Test correctly updated to use the new InferenceParams::new constructor with None timeout.

src/common/update.rs (4)

29-29: LGTM!

Import correctly updated to use the new InferenceParams type from the params module.


277-340: LGTM!

The do_upsert_points function correctly updated to use InferenceParams. The parameter is properly passed to both convert_batch and convert_point_struct depending on the operation type.


379-423: LGTM!

The do_update_vectors function correctly updated to use InferenceParams and passes it to convert_point_vectors.


656-800: LGTM!

The do_batch_update_points function correctly uses InferenceParams and appropriately clones it for each Upsert and UpdateVectors operation in the batch loop.

@coderabbitai coderabbitai bot mentioned this pull request Dec 12, 2025
9 tasks
@coderabbitai coderabbitai bot mentioned this pull request Jan 20, 2026
9 tasks
@coderabbitai coderabbitai bot mentioned this pull request Feb 7, 2026
3 participants