Add filtered vector search support to LSMVectorIndex#3072

Merged
lvca merged 2 commits into main from copilot/add-filtered-search-support
Dec 24, 2025

Copilot AI commented Dec 24, 2025

What does this PR do?

Adds filtered vector similarity search to LSMVectorIndex, enabling restriction of search space to a specific subset of records via RID filtering during graph traversal.

Motivation

Users need to filter vector searches by domain criteria (user ID, category, permissions) without post-processing results. The current implementation passes Bits.ALL to JVector, ignoring its built-in filtering capability.

Related issues

Feature Request: Add Filtered Search Support to LSMVectorIndex

Additional Notes

Implementation:

  • RIDBitsFilter: Inner class implementing JVector's Bits interface

    • Maps graph ordinals → vector IDs → RIDs
    • Returns true only if RID in allowed set
    • Uses snapshots for thread safety
  • New API: findNeighborsFromVector(float[] queryVector, int k, Set<RID> allowedRIDs)

    • null or empty set → no filtering (backward compatible)
    • Non-empty set → creates RIDBitsFilter passed to GraphSearcher.search()
  • Original method delegates to new overload with null filter
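
The mapping described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `OrdinalFilter` is a hypothetical stand-in for JVector's Bits interface, `Rid` is a hypothetical stand-in for ArcadeDB's RID, and the two lookup structures (`ordinalToVectorId`, `vectorIdToRid`) are assumed shapes for the index's internal mappings.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for JVector's Bits: true means the ordinal may appear in results.
interface OrdinalFilter {
  OrdinalFilter ALL = ordinal -> true;

  boolean get(int ordinal);
}

// Hypothetical stand-in for ArcadeDB's RID (bucket id + position).
record Rid(int bucketId, long position) {}

final class RidFilterSketch {
  // Builds a filter for the allowed set. A null or empty set means "no filtering",
  // which mirrors the backward-compatible behavior described in the PR.
  static OrdinalFilter filterFor(Set<Rid> allowed, int[] ordinalToVectorId,
      Map<Integer, Rid> vectorIdToRid) {
    if (allowed == null || allowed.isEmpty())
      return OrdinalFilter.ALL;

    // Snapshot the inputs so later changes by the caller cannot affect a running search.
    final Set<Rid> allowedSnapshot = Set.copyOf(allowed);
    final int[] ordinals = ordinalToVectorId.clone();
    final Map<Integer, Rid> rids = Map.copyOf(vectorIdToRid);

    // graph ordinal -> vector ID -> RID -> membership test
    return ordinal -> {
      if (ordinal < 0 || ordinal >= ordinals.length)
        return false;
      final Rid rid = rids.get(ordinals[ordinal]);
      return rid != null && allowedSnapshot.contains(rid);
    };
  }

  public static void main(String[] args) {
    int[] ordinalToVectorId = {100, 101, 102};
    Map<Integer, Rid> vectorIdToRid =
        Map.of(100, new Rid(10, 0), 101, new Rid(10, 1), 102, new Rid(10, 2));
    OrdinalFilter filter =
        filterFor(Set.of(new Rid(10, 1)), ordinalToVectorId, vectorIdToRid);
    for (int ordinal = 0; ordinal < ordinalToVectorId.length; ordinal++)
      System.out.println(ordinal + " accepted: " + filter.get(ordinal));
  }
}
```

In this sketch, unknown ordinals and ordinals without a RID are rejected rather than raising an error; whether the merged implementation makes the same choice is not stated in the PR description.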

Example:

LSMVectorIndex index = ...;
Set<RID> userDocuments = Set.of(rid1, rid2, rid3);
List<Pair<RID, Float>> results = 
    index.findNeighborsFromVector(queryVector, 10, userDocuments);

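Performance: The filter is applied during graph traversal, so the search returns up to k allowed results directly. Post-search filtering would instead have to fetch a larger result set and discard the disallowed entries (characterized here as O(k) vs. O(total_results)).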
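
The difference between the two approaches can be illustrated with a small brute-force sketch. It does not model JVector's graph traversal; it is a hypothetical example (all names are illustrative) that only shows why filtering after an unfiltered top-k search can return fewer than k results, while filtering during the search keeps disallowed entries from taking up result slots.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

final class FilterTimingSketch {
  // Brute-force top-k over 1-D points; `accept` plays the role of the Bits filter.
  static List<Integer> topK(float[] points, float query, int k, IntPredicate accept) {
    final List<Integer> candidates = new ArrayList<>();
    for (int i = 0; i < points.length; i++)
      if (accept.test(i))
        candidates.add(i);
    // Order the accepted candidates by distance to the query, closest first.
    candidates.sort(Comparator.comparingDouble((Integer i) -> Math.abs(points[i] - query)));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  // Post-search filtering: run an unfiltered search, then drop disallowed results.
  static List<Integer> postFiltered(float[] points, float query, int k, Set<Integer> allowed) {
    final List<Integer> results = topK(points, query, k, i -> true);
    results.removeIf(i -> !allowed.contains(i));
    return results;
  }

  // Filtering during the search: disallowed entries never occupy a result slot.
  static List<Integer> filteredDuringSearch(float[] points, float query, int k,
      Set<Integer> allowed) {
    return topK(points, query, k, i -> allowed.contains(i));
  }

  public static void main(String[] args) {
    float[] points = {0.0f, 0.1f, 0.2f, 0.3f, 5.0f, 6.0f};
    Set<Integer> allowed = Set.of(3, 4, 5);
    // The three nearest points (indices 0, 1, 2) are all disallowed.
    System.out.println(postFiltered(points, 0.0f, 3, allowed));         // prints []
    System.out.println(filteredDuringSearch(points, 0.0f, 3, allowed)); // prints [3, 4, 5]
  }
}
```

With post-filtering, a caller would typically compensate by requesting more than k results and retrying when too few survive, which is the extra work the in-traversal filter avoids.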

Test: filteredSearchByRID validates filtering correctness across multiple scenarios. All 23 existing tests pass.

Checklist

  • I have run the build using mvn clean package command
  • My unit tests cover both failure and success scenarios
Original prompt

This section details the original issue you should resolve

<issue_title>Feature Request: Add Filtered Search Support to LSMVectorIndex</issue_title>
<issue_description># Feature Request: Add Filtered Search Support to LSMVectorIndex

Description

The LSMVectorIndex class currently uses Bits.ALL (no filter) for all vector searches. This prevents users from efficiently restricting the search space to a specific subset of records (e.g., filtering by user ID or category) during the graph traversal.

While JVector supports passing a Bits object to filter nodes during the search, this capability is not exposed in the LSMVectorIndex API.

Proposed Change

Add a new method to LSMVectorIndex that accepts a Set<RID> (or a generic filter) and maps it to a JVector Bits implementation.

public List<Pair<RID, Float>> findNeighborsFromVector(float[] queryVector, int k, Set<RID> allowedRIDs);

Implementation Details

The implementation should:

  1. Accept a Set<RID> of allowed records.
  2. Create a custom Bits implementation that checks if a node's RID is contained in the allowed set.
  3. Pass this Bits filter to GraphSearcher.search(...) instead of Bits.ALL.

Example Usage (Java)

LSMVectorIndex index = ...;
float[] vector = ...;
Set<RID> allowed = Set.of(new RID(10, 0), new RID(10, 1));

// Perform search restricted to only the RIDs in 'allowed'
List<Pair<RID, Float>> results = index.findNeighborsFromVector(vector, 10, allowed);
</issue_description>

<agent_instructions>In ArcadeDB LSMVectorIndex we search in JVector with: 
  final SearchResult searchResult = GraphSearcher.search(
      queryVectorFloat,
      k,
      vectors,
      metadata.similarityFunction,
      graphIndex,
      Bits.ALL
  );

Explore whether the last parameter can take something other than Bits.ALL, as suggested in the issue.</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>



Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
Copilot AI changed the title [WIP] Add filtered search support to LSMVectorIndex Add filtered vector search support to LSMVectorIndex Dec 24, 2025
Copilot AI requested a review from lvca December 24, 2025 03:09
@lvca lvca marked this pull request as ready for review December 24, 2025 06:07
mergify bot commented Dec 24, 2025

🧪 CI Insights

Here's what we observed from your CI run for fd49a53.

🟢 All jobs passed!

But CI Insights is watching 👀

@lvca lvca merged commit 792e70d into main Dec 24, 2025
16 of 19 checks passed
@lvca lvca deleted the copilot/add-filtered-search-support branch December 24, 2025 06:09
@lvca lvca added the enhancement New feature or request label Dec 24, 2025
@lvca lvca added this to the 25.12.1 milestone Dec 24, 2025
mergify bot added a commit to robfrank/linklift that referenced this pull request Jan 9, 2026
….1 [skip ci]

Bumps [com.arcadedb:arcadedb-network](https://github.com/ArcadeData/arcadedb) from 25.11.1 to 25.12.1.
Release notes

*Sourced from [com.arcadedb:arcadedb-network's releases](https://github.com/ArcadeData/arcadedb/releases).*

> 25.12.1
> -------
>
> ArcadeDB 25.12.1 Release Notes
> ==============================
>
> We're excited to announce the release of ArcadeDB v25.12.1! This release includes significant bug fixes, new features, performance improvements, and dependency updates.
>
> Highlights
> ----------
>
> ### Vector Search Enhancements
>
> * **Fixed critical vector quantization bug** ([#3052](https://redirect.github.com/ArcadeData/arcadedb/issues/3052), [#3053](https://redirect.github.com/ArcadeData/arcadedb/issues/3053)) - INT8 and BINARY vector quantization now works correctly across all dimensions
> * **New filtered vector search** ([#3071](https://redirect.github.com/ArcadeData/arcadedb/issues/3071), [#3072](https://redirect.github.com/ArcadeData/arcadedb/issues/3072)) - LSMVectorIndex now supports filtered searches for more precise queries
> * **Better vector type support** ([#3090](https://redirect.github.com/ArcadeData/arcadedb/issues/3090)) - Added support for `List<Float>` in vector indexes
> * **Improved compression** ([#2911](https://redirect.github.com/ArcadeData/arcadedb/issues/2911)) - Enhanced compression for LSM vector indexes
> * **Fixed HNSW graph persistence** ([#2916](https://redirect.github.com/ArcadeData/arcadedb/issues/2916)) - Ensures JVector HNSW graph file is properly closed and flushed to disk
>
> ### SQL and Query Improvements
>
> * **Fixed IF statement execution** ([#2775](https://redirect.github.com/ArcadeData/arcadedb/issues/2775)) - SQL scripts with IF statements now execute correctly from console
> * **Fixed index creation with IF NOT EXISTS** ([#1819](https://redirect.github.com/ArcadeData/arcadedb/issues/1819)) - Console no longer errors when creating existing indexes with IF NOT EXISTS clause
> * **Custom function parameter binding** ([#3046](https://redirect.github.com/ArcadeData/arcadedb/issues/3046), [#3049](https://redirect.github.com/ArcadeData/arcadedb/issues/3049)) - Fixed parameter binding for SQL and JavaScript custom functions
> * **SQL method consistency** ([#2964](https://redirect.github.com/ArcadeData/arcadedb/issues/2964), [#2967](https://redirect.github.com/ArcadeData/arcadedb/issues/2967)) - `values()` method now behaves consistently with `keys()` method
> * **CONTAINSANY index fix** ([#3051](https://redirect.github.com/ArcadeData/arcadedb/issues/3051)) - Fixed index usage for lists of embedded documents with CONTAINSANY
>
> ### Transaction Management
>
> * **Revised transaction logic** ([#3074](https://redirect.github.com/ArcadeData/arcadedb/issues/3074)) - Improved transaction handling and consistency
> * **Fixed edge index invalidation** ([#3091](https://redirect.github.com/ArcadeData/arcadedb/issues/3091)) - Edge indexes now remain valid in edge-case scenarios
>
> ### New Features
>
> * **Database size API** ([#3045](https://redirect.github.com/ArcadeData/arcadedb/issues/3045)) - Added new `database.getSize()` API method
> * **Version display enhancement** ([#2905](https://redirect.github.com/ArcadeData/arcadedb/issues/2905)) - Server log version number now displayed consistently
>
> What's Changed
> --------------
>
> ### Bug Fixes
>
> * Fix INT8 and BINARY vector quantization offset bug in LSMVectorIndex page loading by [`@​Copilot`](https://github.com/Copilot) in [ArcadeData/arcadedb#3053](https://redirect.github.com/ArcadeData/arcadedb/pull/3053)
> * fix: revert SQL grammar changes and disable deep level JSON insert tests by [`@​robfrank`](https://github.com/robfrank) in [ArcadeData/arcadedb#2961](https://redirect.github.com/ArcadeData/arcadedb/pull/2961)
> * [#2915](https://redirect.github.com/ArcadeData/arcadedb/issues/2915) fix: ensure Jvector HNSW graph file is closed and flushed to disk on database close by [`@​robfrank`](https://github.com/robfrank) in [ArcadeData/arcadedb#2916](https://redirect.github.com/ArcadeData/arcadedb/pull/2916)
> * fix: make values method behave like keys method by [`@​gramian`](https://github.com/gramian) in [ArcadeData/arcadedb#2967](https://redirect.github.com/ArcadeData/arcadedb/pull/2967)
> * Fix custom function parameter binding for SQL and JavaScript functions by [`@​Copilot`](https://github.com/Copilot) in [ArcadeData/arcadedb#3049](https://redirect.github.com/ArcadeData/arcadedb/pull/3049)
> * fix CONTAINSANY index use for lists of embedded documents by [`@​gramian`](https://github.com/gramian) in [ArcadeData/arcadedb#3051](https://redirect.github.com/ArcadeData/arcadedb/pull/3051)
> * fix: support List in vector index by [`@​szekelyszabi`](https://github.com/szekelyszabi) in [ArcadeData/arcadedb#3090](https://redirect.github.com/ArcadeData/arcadedb/pull/3090)
>
> ### Features
>
> * Show version number same as in server log by [`@​gramian`](https://github.com/gramian) in [ArcadeData/arcadedb#2905](https://redirect.github.com/ArcadeData/arcadedb/pull/2905)
> * feat: added new `database.getSize()` api by [`@​lvca`](https://github.com/lvca) in [ArcadeData/arcadedb#3045](https://redirect.github.com/ArcadeData/arcadedb/pull/3045)
> * Add filtered vector search support to LSMVectorIndex by [`@​Copilot`](https://github.com/Copilot) in [ArcadeData/arcadedb#3072](https://redirect.github.com/ArcadeData/arcadedb/pull/3072)
> * add stars chart by [`@​robfrank`](https://github.com/robfrank) in [ArcadeData/arcadedb#3084](https://redirect.github.com/ArcadeData/arcadedb/pull/3084)
>
> ### Performance Improvements
>
> * Lsm vector fix by [`@​lvca`](https://github.com/lvca) in [ArcadeData/arcadedb#2907](https://redirect.github.com/ArcadeData/arcadedb/pull/2907)
> * perf: improved compression with lsm vectors by [`@​lvca`](https://github.com/lvca) in [ArcadeData/arcadedb#2911](https://redirect.github.com/ArcadeData/arcadedb/pull/2911)

... (truncated)


Commits

* [`6290454`](ArcadeData/arcadedb@6290454) Set release version to 25.12.1
* [`5bdbdfa`](ArcadeData/arcadedb@5bdbdfa) chore: removed system.out
* [`5764b95`](ArcadeData/arcadedb@5764b95) fix: deletion of light edge after last fix
* [`a81163a`](ArcadeData/arcadedb@a81163a) fix: avoid reuse of deleted record in same tx
* [`a42ae5e`](ArcadeData/arcadedb@a42ae5e) perf: avoid conversion of float[] into List<Float> in SQL engine
* [`c8fb3e5`](ArcadeData/arcadedb@c8fb3e5) chore: refactoring conversion functions to float[] in a centralized method
* [`de9bfcf`](ArcadeData/arcadedb@de9bfcf) fix: support List<Float> in vector index ([#3090](https://redirect.github.com/ArcadeData/arcadedb/issues/3090))
* [`9e964ef`](ArcadeData/arcadedb@9e964ef) Merge branch 'main' of <https://github.com/ArcadeData/arcadedb>
* [`07c7d3e`](ArcadeData/arcadedb@07c7d3e) Fixed failing test using java
* [`51a058b`](ArcadeData/arcadedb@51a058b) fix CONTAINSANY index use for lists of embedded documents ([#3051](https://redirect.github.com/ArcadeData/arcadedb/issues/3051))
* Additional commits viewable in [compare view](ArcadeData/arcadedb@25.11.1...25.12.1)
  
robfrank pushed a commit that referenced this pull request Feb 11, 2026
* Initial plan

* Add filtered search support to LSMVectorIndex with RID-based filtering

Co-authored-by: lvca <312606+lvca@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lvca <312606+lvca@users.noreply.github.com>
(cherry picked from commit 792e70d)
