
feat(oceanbase): add advanced ANN query options (#3649) #3758

Merged
fengju0213 merged 8 commits into camel-ai:master from yixin-zh:feat/issue#3649 on Feb 26, 2026

Conversation

@yixin-zh

Contributor

Description

Implements the feature request from #3649: adds advanced ANN query options for OceanBase storage.
Changes made:

  • Extended distance metrics: inner_product, negative_inner_product
  • Added where_clause parameter for SQLAlchemy filtering in query()
  • Added 11 tests and ran with OceanBase
  • Added examples demonstrating new features

Fixes #3649

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and uv lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed:
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

- Add inner_product and negative_inner_product distance metrics
- Extend _distance_func_map with new distance functions
- Add where_clause parameter to query() for SQLAlchemy filter support
- Update _convert_distance_to_similarity() with sigmoid for IP metrics
- Improve docstrings with detailed parameter documentation

Part of camel-ai#3649
- Add test_inner_product_distance_metrics for IP similarity conversion
- Add test_query_with_where_clause to verify filter passthrough
- Add test_query_without_where_clause for default behavior
- Add parametrized test_all_distance_metrics_initialization

All tests pass with pytest --fast-test-mode

Part of camel-ai#3649
- Add example_filtered_query() demonstrating where_clause usage
- Add example_distance_metrics() showing all four distance metrics
- Import sqlalchemy.text for filter expressions
- Show filtered ANN queries with category and price filters
- Demonstrate inner_product and negative_inner_product distance

Part of camel-ai#3649
- Simplify default similarity conversion logic for clarity
- Use consistent header format (=== Title ===) matching other examples
- Add expected output documentation for new examples
- example_filtered_query() and example_distance_metrics() outputs

Part of camel-ai#3649
@coderabbitai

coderabbitai Bot commented Jan 28, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@fengju0213 fengju0213 left a comment

Collaborator


Cool, thanks! @YixinZ-NUS, will review it ASAP.

@fengju0213 fengju0213 left a comment

Collaborator


Thanks @YixinZ-NUS, great work! Left some comments.

Comment on lines +465 to +474
elif self.distance == "inner_product":
    # Inner product can be negative (opposite directions)
    # Use sigmoid to map (-inf, +inf) to (0, 1)
    # Higher IP -> higher similarity
    return 1.0 / (1.0 + math.exp(-distance))
elif self.distance == "negative_inner_product":
    # Negative inner product: neg_ip = -IP
    # Use sigmoid: similarity = sigmoid(-neg_ip) = sigmoid(IP)
    # Lower neg_ip (higher IP) -> higher similarity
    return 1.0 / (1.0 + math.exp(distance))

Collaborator


The similarity conversion for inner_product and negative_inner_product calls math.exp() directly.
Large dot-product magnitudes can trigger an OverflowError.
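
A minimal reproduction of this concern, as a sketch: `naive_sigmoid` is a hypothetical standalone copy of the expression in the inner_product branch above, and the input values are illustrative rather than taken from a real query.

```python
import math


def naive_sigmoid(distance: float) -> float:
    # Same expression as the inner_product branch under review
    # (hypothetical standalone copy, not the project's method).
    return 1.0 / (1.0 + math.exp(-distance))


# Moderate magnitudes behave as expected.
moderate = naive_sigmoid(2.0)

# A large positive input underflows harmlessly: exp(-1000) is 0.0.
saturated = naive_sigmoid(1000.0)

# A large negative input needs exp(1000), which exceeds the float range.
try:
    naive_sigmoid(-1000.0)
    overflowed = False
except OverflowError:
    overflowed = True

print(moderate, saturated, overflowed)
```

The failure only appears for one sign of the input, so a test with small or positive dot products would not catch it.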

Comment on lines 336 to +344
results = self._client.ann_search(
    table_name=self.table_name,
    vec_data=query.query_vector,
    vec_column_name="embedding",
    distance_func=distance_func,
    with_dist=True,
    topk=query.top_k,
    output_column_names=["id", "embedding", "metadata"],
    where_clause=where_clause,

Collaborator


pyobvector’s ann_search orders results by distance_func in ascending order by default (order_by).
When using inner_product, this causes results to be sorted from low to high, meaning top_k returns the least similar vectors.
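
The ordering problem can be illustrated without a database. The sketch below is pure Python and does not call pyobvector; the vectors, names, and the `top_k_ascending` helper are made up for illustration. It ranks candidates in ascending order of a score, once with the raw inner product and once with its negation.

```python
def inner_product(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
vectors = {
    "aligned": [0.9, 0.1],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}


def top_k_ascending(score, k):
    # Mimics an "ORDER BY score ASC LIMIT k" style ranking.
    ranked = sorted(vectors, key=lambda name: score(vectors[name]))
    return ranked[:k]


# Ascending raw inner product puts the least similar vector first.
by_ip = top_k_ascending(lambda v: inner_product(query, v), 1)

# Ascending negated inner product puts the most similar vector first.
by_neg_ip = top_k_ascending(lambda v: -inner_product(query, v), 1)

print(by_ip, by_neg_ip)  # ['opposite'] ['aligned']
```

Negating the score keeps the ascending sort unchanged while reversing which end of the ranking is returned.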

@waleedalzarooni waleedalzarooni left a comment

Collaborator


LGTM. The only things to amend are @fengju0213's comments.

… conversion

- Use negative_inner_product function for inner_product search queries
  so ascending order returns most similar vectors first
- Add _stable_sigmoid() to prevent OverflowError with large dot products
- Update tests and example output accordingly
@yixin-zh

Contributor Author

Thanks for the thorough review, @fengju0213 and @waleedalzarooni! I should have verified the output ordering more carefully during initial testing. Both issues are now fixed:

  1. OverflowError with math.exp():
    Added _stable_sigmoid() helper using the identity sigmoid(x) = exp(x)/(1+exp(x)) for x < 0 to prevent overflow with large dot-product magnitudes.
    Added test_stable_sigmoid_overflow() for this.

  2. Wrong ordering for inner_product:
    Since pyobvector's ann_search orders by distance ascending, I now use negative_inner_product function when searching with inner_product metric.
    Also added a new example_arbitrary_distance_metrics() example demonstrating correct ordering with all 4 distance metrics.
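
For readers following along, the identity in point 1 can be sketched as below. This is an illustrative version, not necessarily the code merged in this PR: `stable_sigmoid_sketch` and `similarity_from_neg_ip` are hypothetical names, and the second function assumes the returned distance is the negative inner product, as in point 2.

```python
import math


def stable_sigmoid_sketch(x: float) -> float:
    # Hypothetical helper; the PR's _stable_sigmoid() may differ in detail.
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0 use sigmoid(x) = exp(x) / (1 + exp(x)); exp(x) is below 1.
    z = math.exp(x)
    return z / (1.0 + z)


def similarity_from_neg_ip(neg_ip: float) -> float:
    # Assumption: the search returns neg_ip = -IP, so similarity = sigmoid(IP).
    return stable_sigmoid_sketch(-neg_ip)


# Extreme magnitudes saturate instead of raising OverflowError.
print(stable_sigmoid_sketch(-1000.0), stable_sigmoid_sketch(1000.0))

# A lower negative inner product (higher IP) gives a higher similarity.
print(similarity_from_neg_ip(-3.0) > similarity_from_neg_ip(3.0))
```

Both branches only ever exponentiate a non-positive number, which is why neither can exceed the float range.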

@fengju0213

Collaborator


Thanks @YixinZ-NUS, looks good now!

@fengju0213 fengju0213 added this to the Sprint 48 milestone Feb 1, 2026
@fengju0213 fengju0213 merged commit 273c1bf into camel-ai:master Feb 26, 2026
7 of 12 checks passed
@fengju0213 fengju0213 modified the milestones: Sprint 48, Sprint 49 Mar 2, 2026


Development

Successfully merging this pull request may close these issues.

[Feature Request] Add advanced ANN query options in oceanbase

3 participants