This repository was archived by the owner on Jan 2, 2025. It is now read-only.

@rmuller-ml rmuller-ml commented Nov 27, 2023

Hybrid search replaces semantic search with a mix of semantic and lexical search results. The results are merged using reciprocal rank fusion (RRF).
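
For readers unfamiliar with RRF, here is a minimal sketch of the merge step, not the code from this PR. It assumes each search returns an ordered list of snippet ids, and it uses the commonly cited constant `k = 60`; the function name `reciprocal_rank_fusion` and the ids are hypothetical.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each document scores the sum, over all rankings
/// that contain it, of 1 / (k + rank), where rank is its 1-based position.
/// `k` is a smoothing constant (60.0 is an assumed, commonly used value).
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            // `i` is 0-based, so the 1-based rank is i + 1.
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical snippet ids returned by each search, best match first.
    let semantic = vec!["a", "b", "c"];
    let lexical = vec!["b", "d", "a"];
    for (id, score) in reciprocal_rank_fusion(&[semantic, lexical], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Snippets that appear in both lists (`a` and `b` here) end up ahead of snippets found by only one search, which is why both searches need to operate on the same documents.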

To merge results, both searches (semantic and lexical) have to operate on the same documents. Here the documents are code snippets and, to make things easier, we use Qdrant as the vector and snippet DB/index manager.

Qdrant lexical search does not rank lexical results based on known algorithms (tf-idf, bm25). So we rerank lexical results using word counts and the number of unique query words matched.
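
One way such a rerank could look is sketched below. The exact scoring used in the PR is not shown in this conversation, so this is an assumption: snippets are ordered first by the number of unique query words they contain, then by the total count of query-word occurrences. The names `lexical_score` and `rerank` and the tokenization rule are hypothetical.

```rust
use std::collections::HashSet;

/// Returns (unique query words matched, total occurrences of query words).
/// Tokenization is an assumption: split on anything that is not
/// alphanumeric or '_', then lowercase.
fn lexical_score(query: &str, snippet: &str) -> (usize, usize) {
    let query_words: HashSet<String> =
        query.split_whitespace().map(|w| w.to_lowercase()).collect();
    let snippet_words: Vec<String> = snippet
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect();
    let unique = query_words.iter().filter(|q| snippet_words.contains(*q)).count();
    let total = snippet_words.iter().filter(|w| query_words.contains(*w)).count();
    (unique, total)
}

/// Sorts snippets by descending score; the sort is stable, so snippets with
/// equal scores keep their original relative order.
fn rerank<'a>(query: &str, snippets: &[&'a str]) -> Vec<&'a str> {
    let mut scored: Vec<((usize, usize), &'a str)> =
        snippets.iter().map(|s| (lexical_score(query, s), *s)).collect();
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.into_iter().map(|(_, s)| s).collect()
}

fn main() {
    let snippets = [
        "fn main() {}",
        "let token = next_token(); token.kind(); token.span()",
        "fn parse(token: Token) { parse_inner(token) }",
    ];
    for s in rerank("parse token", &snippets) {
        println!("{:?} {s}", lexical_score("parse token", s));
    }
}
```

The reranked order can then be fed into the rank fusion step alongside the semantic ranking.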

This PR:

@rmuller-ml rmuller-ml marked this pull request as ready for review November 27, 2023 20:10
Comment on lines +557 to +559
.iter()
.take(limit.try_into().unwrap())
.cloned()

Can we consume here instead of cloning?

Suggested change
- .iter()
- .take(limit.try_into().unwrap())
- .cloned()
+ .into_iter()
+ .take(limit.try_into().unwrap())
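
To illustrate the difference the reviewer is pointing at, here is a small standalone sketch. The surrounding code from the PR is not shown, so the element type (`String`), the type of `limit` (`u64`), and the function names are assumptions.

```rust
// In the prelude from the 2021 edition onwards; imported for older editions.
use std::convert::TryInto;

/// Borrowing version: the input is kept, so each returned element is cloned.
fn top_cloned(results: &[String], limit: u64) -> Vec<String> {
    results
        .iter()
        .take(limit.try_into().unwrap())
        .cloned()
        .collect()
}

/// Consuming version: takes ownership of the vector and moves the first
/// `limit` elements out without cloning; the rest are dropped.
fn top_consumed(results: Vec<String>, limit: u64) -> Vec<String> {
    results
        .into_iter()
        .take(limit.try_into().unwrap())
        .collect()
}

fn main() {
    let results = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    println!("{:?}", top_cloned(&results, 2));
    // `results` is moved here and cannot be used afterwards.
    println!("{:?}", top_consumed(results, 2));
}
```

The consuming form only works when the collection is not needed after this point, which is the condition the review question is checking.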

@ggordonhall ggordonhall merged commit 2b74f3f into main Dec 1, 2023
@ggordonhall ggordonhall deleted the qdrant_word_tokenizer branch December 1, 2023 09:28