Skip to content

Fix missing Query API validations#6499

Merged
agourlay merged 1 commit intodevfrom
fix-missing-API-validations
May 7, 2025
Merged

Fix missing Query API validations#6499
agourlay merged 1 commit intodevfrom
fix-missing-API-validations

Conversation

@agourlay
Copy link
Member

@agourlay agourlay commented May 6, 2025

This PR adds missing Query API validation.

The following existing validation are reused for all types of queries.

  • sparse vector
  • multivector
  • filter condition

@agourlay agourlay changed the title Fix missing API validations Fix missing Query API validations May 7, 2025
match self {
Expression::Constant(_) => Ok(()),
Expression::Variable(_) => Ok(()),
Expression::Condition(condition) => condition.validate(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure the Expression use a valid Condition

Query::Discover(discover) => discover.discover.validate(),
Query::Context(context) => context.context.validate(),
Query::Fusion(fusion) => fusion.fusion.validate(),
Query::Nearest(vector) => vector.validate(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those lines make sure to use the validation at the struct level to make it fool proof when adding new fields

| grpc::vector_input::Variant::Document(_)
| grpc::vector_input::Variant::Image(_)
| grpc::vector_input::Variant::Object(_) => Ok(()),
grpc::vector_input::Variant::Sparse(sparse_vector) => sparse_vector.validate(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leverage existing sparse and multivector validation

@agourlay agourlay marked this pull request as ready for review May 7, 2025 09:22
@agourlay agourlay requested review from coszio, generall and timvisee May 7, 2025 09:22
@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 7, 2025

📝 Walkthrough

Walkthrough

This change introduces comprehensive validation support for various data structures used in both gRPC and REST APIs. It adds validator::Validate derives and #[validate(nested)] annotations to numerous structs and fields, enabling recursive validation of nested message types and complex query/expression structures. Additionally, custom implementations of the Validate trait are provided for several enums to ensure that validation is properly delegated to their nested components. The validation logic is also updated in relevant places to invoke these new capabilities, ensuring that all parts of queries and expressions are validated at runtime.

Suggested reviewers

  • timvisee
  • coszio
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
lib/api/build.rs (1)

273-301: Consider deduplicating and auto-generating massive “empty-constraint” lists

The block adds ~30 more ("", "") entries whose only purpose is to attach #[validate(nested)].
While functionally correct, hand-maintaining hundreds of string paths is error-prone (easy to mistype or forget on proto evolution) and bloats the build script.

Suggested direction (no immediate change required, but worth a backlog ticket):

-        .validates(&[
-            ("VectorInput.variant", ""),
-            ("RecommendInput.positive", ""),
-
-        ], &[])
+        // Auto-collect every `(Message, field)` where `field` is a oneof or a nested message
+        // and apply `#[validate(nested)]` in a loop.
+        .validates(generate_nested_field_list()?, &[])

where generate_nested_field_list() uses a small introspection table (or a JSON file produced by prost-build) instead of hard-coded strings.

This keeps the script readable, guarantees coverage, and eliminates future maintenance headaches.

lib/api/src/rest/validate.rs (1)

250-273: Expression validation duplicates gRPC logic – consider a shared helper

The implementation is correct but duplicates almost identical code in grpc/validate.rs.
A small helper like fn validate_expression<E: Into<ValidationErrors>>(expr: &Expression)… (or a macro) could remove duplication and keep REST & gRPC layers in sync automatically.

lib/api/src/rest/schema.rs (2)

517-523: Consider adding Validate derive to FormulaQuery.

Unlike other query types, FormulaQuery is missing the Validate derive macro despite having a custom validation implementation in lib/api/src/rest/validate.rs. Adding this derive would maintain consistency with how other query types are handled.

-#[derive(Debug, Serialize, Deserialize, JsonSchema)]
+#[derive(Debug, Serialize, Deserialize, JsonSchema, Validate)]
pub struct FormulaQuery {
    pub formula: Expression,

    #[serde(default)]
    pub defaults: HashMap<String, Value>,
}

789-801: Consider adding validation constraints for numeric parameters.

The scale and midpoint fields in DecayParamsExpression might benefit from validation constraints, especially since the comment mentions that scale "Must be a non-zero positive number."

/// The scale factor of the decay, in terms of `x`. Defaults to 1.0. Must be a non-zero positive number.
+#[validate(range(min = "f32::EPSILON", message = "scale must be positive and non-zero"))]
pub scale: Option<f32>,
/// The midpoint of the decay. Defaults to 0.5. Output will be this value when `|x - target| == scale`.
+#[validate(range(min = 0.0, max = 1.0, message = "midpoint must be between 0.0 and 1.0"))]
pub midpoint: Option<f32>,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e0c072 and b10c87f.

📒 Files selected for processing (5)
  • lib/api/build.rs (2 hunks)
  • lib/api/src/grpc/qdrant.rs (9 hunks)
  • lib/api/src/grpc/validate.rs (1 hunks)
  • lib/api/src/rest/schema.rs (3 hunks)
  • lib/api/src/rest/validate.rs (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/api/src/grpc/validate.rs (1)
lib/api/src/rest/validate.rs (13)
  • validate (15-21)
  • validate (25-30)
  • validate (34-45)
  • validate (49-59)
  • validate (63-83)
  • validate (87-93)
  • validate (97-101)
  • validate (105-133)
  • validate (137-142)
  • validate (146-150)
  • validate (154-170)
  • validate (174-232)
  • validate (236-247)
lib/api/src/rest/schema.rs (1)
lib/api/src/rest/validate.rs (13)
  • validate (15-21)
  • validate (25-30)
  • validate (34-45)
  • validate (49-59)
  • validate (63-83)
  • validate (87-93)
  • validate (97-101)
  • validate (105-133)
  • validate (137-142)
  • validate (146-150)
  • validate (154-170)
  • validate (174-232)
  • validate (236-247)
🔇 Additional comments (14)
lib/api/src/grpc/validate.rs (2)

337-349: query::Variant validation looks complete – just ensure new variants fail to compile

Good pattern: by enumerating every variant, the compiler enforces updates when proto changes. 👍


368-398: Expression variant validation mirrors REST – looks good

The exhaustive match properly cascades validation into nested expressions and will fail to compile on new variants. Nicely done.

lib/api/src/rest/validate.rs (1)

106-110: Great catch: formula expression now validated first

Adding formula.validate()? ensures invalid expressions are rejected before inspecting defaults. ✔️

lib/api/src/grpc/qdrant.rs (7)

4126-4133: LGTM: Adding validation for VectorInput

The addition of validator::Validate derive and validate(nested) attribute is a good approach for ensuring the validation of vector inputs in queries, which is a core part of this PR's objective.


5128-5193: Well-structured validation for query input types

The systematic addition of validation to the recommendation, context, and discovery input types ensures proper validation across all the query input components. This is especially important for complex nested structures like these, where validation errors could otherwise be missed.


5203-5204: Good catch: Adding nested validation for Expression variants

Adding validation for expression variants ensures that all types of expressions in queries will be properly validated, which is crucial for the correctness of complex filter conditions.


5279-5342: Comprehensive validation for expression operators

The validation has been thoroughly applied to all mathematical expression types (MultExpression, SumExpression, DivExpression, PowExpression, DecayParamsExpression). This systematic approach ensures that complex expressions used in queries will be properly validated.


5343-5350: Core improvement: Query validation

Adding validation to the top-level Query struct with nested validation for its variants is a central component of this PR. This ensures that all query types will be properly validated regardless of which variant is used.


5395-5396: Consistent validation across different request types

Adding nested validation to query fields in different request structures ensures consistent validation behavior across the API, which aligns perfectly with the PR objective.

Also applies to: 5434-5435, 5511-5512


9786-9789: Extended validation to QueryShardPoints

Adding validation to the nested message types within QueryShardPoints completes the comprehensive validation coverage across all parts of the Query API.

lib/api/src/rest/schema.rs (4)

475-530: Well-structured validation added to query types.

The addition of Validate derive macros and #[validate(nested)] annotations to the query wrapper structs ensures proper validation of nested components in complex queries. This is a good approach to ensure data integrity throughout the validation chain.


695-769: Comprehensive validation for expression types.

The addition of validation to all expression types ensures consistent validation across the entire expression tree. This is particularly important for complex mathematical expressions where nested validation helps catch errors at all levels of the expression hierarchy.


771-787: Good use of nested validation for decay expressions.

Adding validation to the different decay expression types provides consistent validation for these specialized functions. The nested validation will ensure that all components of these expressions are properly validated.


475-787:

Details

✅ Verification successful

Verify comprehensive validation for all query paths.

This change adds validation to numerous structs, but let's ensure all code paths in the validate method for Query enum are covered by these changes.


🏁 Script executed:

#!/bin/bash
# Verify all query variants are properly validated

# Check if all validation functions are implemented for query types
echo "Checking Query validation implementation in validate.rs:"
rg -A 10 "impl Validate for Query" lib/api/src/rest/validate.rs

# Check which query types have the Validate derive
echo -e "\nChecking Query types with Validate derive in schema.rs:"
rg "#\[derive.*Validate.*\]" lib/api/src/rest/schema.rs | grep -E "struct (.*Query|.*Expression)"

Length of output: 1276


🏁 Script executed:

#!/bin/bash
# List all variants of the Query enum in schema.rs
echo "=== schema.rs: enum Query variants ==="
rg -A 20 "enum Query" lib/api/src/rest/schema.rs

# Show the full impl Validate for Query in validate.rs
echo -e "\n=== validate.rs: impl Validate for Query ==="
rg -A 20 "impl Validate for Query" lib/api/src/rest/validate.rs

Length of output: 2233


🏁 Script executed:

#!/bin/bash
# Print the full Query enum definition to verify all variants
rg -n "pub enum Query" -A50 lib/api/src/rest/schema.rs

Length of output: 1845


All Query variants are properly validated
The Validate derive on each *Query struct and the impl Validate for Query in validate.rs cover every enum variant (Nearest, Recommend, Discover, Context, OrderBy, Fusion, Formula, and Sample). No validation paths are missing.

Comment on lines +352 to +366
impl Validate for super::qdrant::vector_input::Variant {
fn validate(&self) -> Result<(), ValidationErrors> {
match self {
grpc::vector_input::Variant::Id(_)
| grpc::vector_input::Variant::Dense(_)
| grpc::vector_input::Variant::Document(_)
| grpc::vector_input::Variant::Image(_)
| grpc::vector_input::Variant::Object(_) => Ok(()),
grpc::vector_input::Variant::Sparse(sparse_vector) => sparse_vector.validate(),
grpc::vector_input::Variant::MultiDense(multi_dense_vector) => {
multi_dense_vector.validate()
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Skip-validation for several variants may hide future bugs

Dense, Document, Image, and Object variants currently bypass validation even though their inner types may implement Validate (REST side already calls it).
For consistency and to avoid silent acceptance of invalid payloads, delegate validation to the nested messages just like you do for Sparse and MultiDense.

-            grpc::vector_input::Variant::Id(_)
-            | grpc::vector_input::Variant::Dense(_)
-            | grpc::vector_input::Variant::Document(_)
-            | grpc::vector_input::Variant::Image(_)
-            | grpc::vector_input::Variant::Object(_) => Ok(()),
+            grpc::vector_input::Variant::Id(_) => Ok(()),
+            grpc::vector_input::Variant::Dense(dense) => dense.validate(),
+            grpc::vector_input::Variant::Document(doc) => doc.validate(),
+            grpc::vector_input::Variant::Image(image) => image.validate(),
+            grpc::vector_input::Variant::Object(obj) => obj.validate(),

This mirrors the REST implementation and guarantees that any future validation added to those structs is actually executed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
impl Validate for super::qdrant::vector_input::Variant {
fn validate(&self) -> Result<(), ValidationErrors> {
match self {
grpc::vector_input::Variant::Id(_)
| grpc::vector_input::Variant::Dense(_)
| grpc::vector_input::Variant::Document(_)
| grpc::vector_input::Variant::Image(_)
| grpc::vector_input::Variant::Object(_) => Ok(()),
grpc::vector_input::Variant::Sparse(sparse_vector) => sparse_vector.validate(),
grpc::vector_input::Variant::MultiDense(multi_dense_vector) => {
multi_dense_vector.validate()
}
}
}
}
impl Validate for super::qdrant::vector_input::Variant {
fn validate(&self) -> Result<(), ValidationErrors> {
match self {
grpc::vector_input::Variant::Id(_) => Ok(()),
grpc::vector_input::Variant::Dense(dense) => dense.validate(),
grpc::vector_input::Variant::Document(doc) => doc.validate(),
grpc::vector_input::Variant::Image(image) => image.validate(),
grpc::vector_input::Variant::Object(obj) => obj.validate(),
grpc::vector_input::Variant::Sparse(sparse_vector) => sparse_vector.validate(),
grpc::vector_input::Variant::MultiDense(multi_dense_vector) => {
multi_dense_vector.validate()
}
}
}
}

@agourlay agourlay merged commit 7f0b48a into dev May 7, 2025
17 checks passed
@agourlay agourlay deleted the fix-missing-API-validations branch May 7, 2025 17:10
generall pushed a commit that referenced this pull request May 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants