feat(oceanbase): add advanced ANN query options (#3649) #3758
- Add inner_product and negative_inner_product distance metrics
- Extend _distance_func_map with new distance functions
- Add where_clause parameter to query() for SQLAlchemy filter support
- Update _convert_distance_to_similarity() with sigmoid for IP metrics
- Improve docstrings with detailed parameter documentation

Part of camel-ai#3649

- Add test_inner_product_distance_metrics for IP similarity conversion
- Add test_query_with_where_clause to verify filter passthrough
- Add test_query_without_where_clause for default behavior
- Add parametrized test_all_distance_metrics_initialization

All tests pass with pytest --fast-test-mode

Part of camel-ai#3649

- Add example_filtered_query() demonstrating where_clause usage
- Add example_distance_metrics() showing all four distance metrics
- Import sqlalchemy.text for filter expressions
- Show filtered ANN queries with category and price filters
- Demonstrate inner_product and negative_inner_product distance

Part of camel-ai#3649

- Simplify default similarity conversion logic for clarity
- Use consistent header format (=== Title ===) matching other examples
- Add expected output documentation for new examples
  - example_filtered_query() and example_distance_metrics() outputs

Part of camel-ai#3649
fengju0213
left a comment
cool! thanks @YixinZ-NUS! will review it asap
fengju0213
left a comment
thanks @YixinZ-NUS, great work! left some comments
elif self.distance == "inner_product":
    # Inner product can be negative (opposite directions)
    # Use sigmoid to map (-inf, +inf) to (0, 1)
    # Higher IP -> higher similarity
    return 1.0 / (1.0 + math.exp(-distance))
elif self.distance == "negative_inner_product":
    # Negative inner product: neg_ip = -IP
    # Use sigmoid: similarity = sigmoid(-neg_ip) = sigmoid(IP)
    # Lower neg_ip (higher IP) -> higher similarity
    return 1.0 / (1.0 + math.exp(distance))
the similarity conversion for inner_product and negative_inner_product directly calls math.exp().
large dot-product magnitudes can trigger OverflowError.
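For illustration, here is a minimal sketch of one way to avoid the overflow: branch on the sign of the input so that math.exp() only ever receives a non-positive argument. The names stable_sigmoid and similarity_from_distance are hypothetical and are not taken from the PR; the follow-up commit mentions a _stable_sigmoid() helper, but its actual implementation is not shown in this thread.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Sigmoid that never passes a positive argument to math.exp()."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it can underflow to 0.0 but not overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is below 1, so the quotient is safe as well.
    z = math.exp(x)
    return z / (1.0 + z)


def similarity_from_distance(distance: float, metric: str) -> float:
    """Hypothetical stand-in for the IP branches of the conversion under review."""
    if metric == "inner_product":
        # Higher inner product -> higher similarity.
        return stable_sigmoid(distance)
    if metric == "negative_inner_product":
        # distance is -IP, so negate it to recover sigmoid(IP).
        return stable_sigmoid(-distance)
    raise ValueError(f"unsupported metric: {metric}")
```

With this shape, a very large magnitude such as 1000.0 saturates to 1.0 or 0.0 instead of raising, whereas the naive form 1.0 / (1.0 + math.exp(1000.0)) raises OverflowError.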
results = self._client.ann_search(
    table_name=self.table_name,
    vec_data=query.query_vector,
    vec_column_name="embedding",
    distance_func=distance_func,
    with_dist=True,
    topk=query.top_k,
    output_column_names=["id", "embedding", "metadata"],
    where_clause=where_clause,
pyobvector’s ann_search orders results by distance_func in ascending order by default (order_by).
When using inner_product, this causes results to be sorted from low to high, meaning top_k returns the least similar vectors.
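The ordering problem can be reproduced without a database. The sketch below is plain Python and does not call pyobvector; top_k_ascending is a hypothetical helper that mimics an ascending sort on the score followed by a top_k cut, under the assumption that this is how the search results are ranked.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_ascending(query, vectors, score, k):
    """Rank by score in ascending order, then keep the first k names."""
    ranked = sorted(vectors.items(), key=lambda item: score(query, item[1]))
    return [name for name, _ in ranked[:k]]


vectors = {
    "aligned": [1.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}
query = [1.0, 0.0]

# Ascending inner product puts the least similar vectors first.
by_ip = top_k_ascending(query, vectors, dot, 2)
print(by_ip)  # ['opposite', 'orthogonal']

# Ascending negative inner product puts the most similar vectors first.
by_neg_ip = top_k_ascending(query, vectors, lambda q, v: -dot(q, v), 2)
print(by_neg_ip)  # ['aligned', 'orthogonal']
```

Sorting on the negated inner product keeps the ascending order and still returns the most similar vectors first.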
waleedalzarooni
left a comment
LGTM, only things to amend are @fengju0213's comments
… conversion
- Use negative_inner_product function for inner_product search queries so ascending order returns most similar vectors first
- Add _stable_sigmoid() to prevent OverflowError with large dot products
- Update tests and example output accordingly

Thanks for the thorough review, @fengju0213 and @waleedalzarooni! I should have verified the output ordering more carefully during initial testing. Both issues are now fixed.

thanks! @YixinZ-NUS looks good now!
Description
Implements feature request from #3649 - adds advanced ANN query options for OceanBase storage.
Changes made:
- inner_product, negative_inner_product distance metrics
- where_clause parameter for SQLAlchemy filtering in query()

Fixes #3649
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- … Fixes #issue-number in the PR description (required)
- … pyproject.toml and uv lock

If you are unsure about any of these, don't hesitate to ask. We are here to help!