MB-65473: Refactor and Optimize Pre-Filtered Vector Search #41

Merged
CascadingRadium merged 9 commits into master from preFilterOpt on Apr 1, 2025

CascadingRadium commented on Mar 25, 2025

  • Add ObtainClusterVectorCountsFromIVFIndex API to return cluster vector counts for given vector IDs.
  • Refactor the SearchClustersFromIVFIndex API to remove the unused nvecs value and the Nvecs attribute from defaultSearchParamsIVF.
  • Remove the NewSearchParamsIVF API and refactor NewSearchParams to accept defaultSearchParamsIVF instead.
  • Requires blevesearch/faiss#49 (MB-65473: Batch converter for vector to cluster IDs).

CascadingRadium changed the title from "MB-65473: Refactor pre-filtered vector search to enhance performance and reduce memory footprint" to "MB-65473: Refactor and Optimize Pre-Filtered Vector Search" on Mar 25, 2025
CascadingRadium requested a review from Copilot on March 27, 2025 11:01

Copilot AI left a comment

Pull Request Overview

This PR refactors and optimizes the pre-filtered vector search implementation, consolidating IVF handling logic and updating related index interface methods. Key changes include:

  • Replacing the separate NewSearchParamsIVF function with a unified NewSearchParams that accepts an optional default parameters pointer.
  • Introducing the IsIVFIndex method and refactoring cluster-related methods to better reflect IVF-specific operations.
  • Adjusting error handling and resource clean-up patterns by enforcing the deletion of allocated resources even on errors.
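
The first and third points above can be sketched together in plain Go. Everything here is hypothetical: `ivfDefaults` stands in for `defaultSearchParamsIVF`, `searchParams` stands in for the cgo-backed handle, and the `nprobe` validation rule is an assumption added only to produce an error path. The sketch shows one constructor that accepts an optional defaults pointer and releases its allocation whenever it returns an error.

```go
package main

import (
	"errors"
	"fmt"
)

// liveParams counts simulated allocations that have not been released yet.
var liveParams int

// ivfDefaults is a hypothetical stand-in for defaultSearchParamsIVF.
type ivfDefaults struct{ Nprobe int }

// searchParams is a hypothetical stand-in for the native search params handle.
type searchParams struct {
	nprobe  int
	deleted bool
}

func allocSearchParams() *searchParams {
	liveParams++
	return &searchParams{}
}

// Delete releases the simulated resource; it is safe to call more than once.
func (s *searchParams) Delete() {
	if s == nil || s.deleted {
		return
	}
	s.deleted = true
	liveParams--
}

// newSearchParams serves both IVF and non-IVF indexes. A nil defaults
// pointer means "no IVF-specific defaults were supplied".
func newSearchParams(isIVF bool, nprobe int, defaults *ivfDefaults) (rv *searchParams, err error) {
	rv = allocSearchParams()
	// Release the allocation on every error path so nothing leaks.
	defer func() {
		if err != nil {
			rv.Delete()
			rv = nil
		}
	}()
	if !isIVF {
		return rv, nil
	}
	if defaults != nil {
		rv.nprobe = defaults.Nprobe
	}
	if nprobe > 0 {
		rv.nprobe = nprobe // an explicit value overrides the default
	}
	if rv.nprobe <= 0 {
		// Assumed validation rule, for illustration only.
		return rv, errors.New("nprobe must be positive for an IVF index")
	}
	return rv, nil
}

func main() {
	sp, err := newSearchParams(true, 0, &ivfDefaults{Nprobe: 8})
	fmt.Println(sp.nprobe, err) // 8 <nil>
	sp.Delete()

	sp, err = newSearchParams(true, 0, nil)
	fmt.Println(sp == nil, err, liveParams)
}
```

Putting the cleanup in a single deferred function keeps every early return covered, which is one way to enforce deletion even on errors.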

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| search_params.go | Refactored search parameters construction, unified IVF/non-IVF handling. |
| index.go | Updated IVF index interface methods and adjusted search function calls. |

Comments suppressed due to low confidence (2)

search_params.go:64

  • Consider including the returned error code 'c' in the error message to aid debugging (e.g., 'failed to create faiss search params, code: %d').
if c := C.faiss_SearchParameters_new(&rv.sp, sel); c != 0 {
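
A minimal sketch of this suggestion, without cgo: `newParams` is a hypothetical stand-in for the `C.faiss_SearchParameters_new` call and simply returns a faiss-style status code, where 0 means success. The value -2 is arbitrary.

```go
package main

import "fmt"

// newParams is a hypothetical stand-in for the cgo call; it returns an
// integer status code where 0 means success.
func newParams(fail bool) int {
	if fail {
		return -2 // arbitrary non-zero code for the sketch
	}
	return 0
}

// createSearchParams includes the status code in the error message.
func createSearchParams(fail bool) error {
	if c := newParams(fail); c != 0 {
		return fmt.Errorf("failed to create faiss search params, code: %d", c)
	}
	return nil
}

func main() {
	fmt.Println(createSearchParams(true))
	// prints: failed to create faiss search params, code: -2
}
```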

index.go:158

  • [nitpick] Consider using a more descriptive variable name than 'rv', such as 'clusterCounts', to improve code readability.
rv := make(map[int64]int64, len(vecIDs))

CascadingRadium merged commit 371fb38 into master on Apr 1, 2025
CascadingRadium deleted the preFilterOpt branch on April 1, 2025 07:38
abhinavdangeti added a commit to blevesearch/zapx that referenced this pull request Apr 1, 2025
- Refactor pre-filtered vector search to enhance performance and reduce
memory footprint.
- Replace the current bitmap-based cluster selection mechanism with a
simpler approach that uses the DirectMap in the IVF index. The IVF
index's DirectMap directly maps the vector ID to the cluster it belongs
to.
- Make `github.com/bits-and-blooms/bitset` a direct dependency of `zapx`
and upgrade it to the latest version.
- Requires blevesearch/go-faiss#41

---------

Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>
abhinavdangeti added a commit to blevesearch/bleve that referenced this pull request Apr 2, 2025
- Refactor pre-filtered vector search to enhance performance and reduce
memory footprint.
- Replace the current bitmap-based approach for calculating segment
local document numbers with a more direct method, where the local
document numbers are mapped directly to the segment ID during the
execution of the eligible collector.
- Requires: 
    - blevesearch/bleve_index_api#63
    - blevesearch/bleve_index_api#66
    - blevesearch/zapx#317
    - blevesearch/go-faiss#41
    - blevesearch/faiss#49

---------

Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>
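
One way to read the "more direct method" in the commit message above is sketched below. This is an interpretation, not bleve's collector code: `groupBySegment` and the `segmentOffsets` layout (the first global document number of each segment, in ascending order) are assumptions made for illustration. Each eligible document number is resolved to its segment and local document number as it is collected, instead of being recorded in a bitmap and translated afterwards.

```go
package main

import (
	"fmt"
	"sort"
)

// groupBySegment resolves each eligible global document number to
// (segment index, local document number) and groups the results per segment.
// segmentOffsets[i] is assumed to be the first global doc number of segment i.
func groupBySegment(segmentOffsets []uint64, eligible []uint64) map[int][]uint64 {
	perSegment := make(map[int][]uint64)
	for _, docNum := range eligible {
		// Index of the first segment that starts after docNum, minus one.
		seg := sort.Search(len(segmentOffsets), func(i int) bool {
			return segmentOffsets[i] > docNum
		}) - 1
		if seg < 0 {
			continue // docNum precedes the first segment; ignored here
		}
		perSegment[seg] = append(perSegment[seg], docNum-segmentOffsets[seg])
	}
	return perSegment
}

func main() {
	offsets := []uint64{0, 100, 250}
	fmt.Println(groupBySegment(offsets, []uint64{5, 120, 249, 250}))
	// map[0:[5] 1:[20 149] 2:[0]]
}
```
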
CascadingRadium added a commit to blevesearch/zapx that referenced this pull request Apr 7, 2025
CascadingRadium added a commit to blevesearch/bleve that referenced this pull request Apr 7, 2025
abhinavdangeti added a commit to blevesearch/zapx that referenced this pull request Apr 7, 2025
#320)

- Refactor pre-filtered vector search to enhance performance and reduce
memory footprint.
- Replace the current bitmap-based cluster selection mechanism with a
simpler approach that uses the DirectMap in the IVF index. The IVF
index's DirectMap directly maps the vector ID to the cluster it belongs
to.
- Make `github.com/bits-and-blooms/bitset` a direct dependency of `zapx`
and upgrade it to the latest version.
- Requires blevesearch/go-faiss#41

---------

Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>
abhinavdangeti added a commit to blevesearch/bleve that referenced this pull request Apr 8, 2025
… (#2175)

- Refactor pre-filtered vector search to enhance performance and reduce
memory footprint.
- Replace the current bitmap-based approach for calculating segment
local document numbers with a more direct method, where the local
document numbers are mapped directly to the segment ID during the
execution of the eligible collector.
- Requires:
    - blevesearch/bleve_index_api#67
    - blevesearch/zapx#320
    - blevesearch/go-faiss#41
    - blevesearch/faiss#49

---------

Co-authored-by: Abhinav Dangeti <abhinav@couchbase.com>