SOLR-17736: introducing support for KNN search on nested vectors (block join)#3316
Conversation
| vectorField, queryVector, acceptedChildren, topK, allParentsBitSet); | ||
| return new ToParentBlockJoinQuery( | ||
| knnChildren, allParentsBitSet, ScoreModeParser.parse(scoreMode)); | ||
| } else if (isFloatKnnQuery(childrenClauses)) { |
There was a problem hiding this comment.
is the branching on type of Knn query necessary? As far as I can see, the majority of the code blocks are equivalent and the different instance variables of KnnByteVectorQuery/KnnFloatVectorQuery are being used like its generic parent class
There was a problem hiding this comment.
Hi @liangkaiwen, the parent class org.apache.lucene.search.AbstractKnnVectorQuery is not public and also when building the DiversifyingChildrenFloatKnnVectorQuery/DiversifyingChildrenByteKnnVectorQuery I am afraid the if branching can not be avoided?
I agree there's boring repeated code, not sure we can avoid it
There was a problem hiding this comment.
Makes sense. I did not realize the parent class was package private. Maybe there's an argument to be made to make it public so as not to introduce this kind of branching everywhere, but that's a separate lucene question. This is fine
| } catch (IOException e) { | ||
| throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e); | ||
| } |
There was a problem hiding this comment.
curious about the addition of a try/catch block here, perhaps we could have a comment w.r.t. what it might catch?
There was a problem hiding this comment.
I just double checked and it's related to the searcher.rewrite and the searcher.getDocSet, they throw IOException, not sure how much we can comment here
| private boolean isFloatKnnQuery(List<BooleanClause> childrenClauses) { | ||
| return childrenClauses.size() == 1 | ||
| && childrenClauses.get(0).getQuery().getClass().equals(KnnFloatVectorQuery.class); | ||
| } | ||
|
|
||
| private boolean isByteKnnQuery(List<BooleanClause> childrenClauses) { | ||
| return childrenClauses.size() == 1 | ||
| && childrenClauses.get(0).getQuery().getClass().equals(KnnByteVectorQuery.class); | ||
| } |
There was a problem hiding this comment.
wondering if childrenClauses.get(0).getQuery() instances KnnFloatVectorQuery and childrenClauses.get(0).getQuery() instances KnnByteVectorQuery would work too?
There was a problem hiding this comment.
if true is returned then later the caller does (repeat) childrenClauses.get(0).getQuery() and type cast instead of check -- perhaps this could be encapsulated
private KnnFloatVectorQuery getFloatKnnQuery(List<BooleanClause> childrenClauses) {
if (childrenClauses.size() == 1)
{
Query query = childrenClauses.get(0).getQuery();
if (query instanceof KnnFloatVectorQuery)
{
return (KnnFloatVectorQuery) query;
}
}
return null;
}
There was a problem hiding this comment.
done, take a look if you like it, it may be a little bit less readable but I think it's ok!
solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java
Outdated
Show resolved
Hide resolved
…QParser.java Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
…QParser.java Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
|
This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the dev@solr.apache.org mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution! |
|
I'll resume this soon, some sponsorship should arrive and my agenda looks more relaxed starting from July |
|
This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the dev@solr.apache.org mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution! |
|
This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again. |
|
I'll resume shortly the work on this! |
|
|
||
| This may happen, among other use cases, because you chunked the original text into paragraphs, each of them modeled as a nested document with the paragraph text and the vector representation. | ||
|
|
||
| Solr doesn't need to have denormalised nested documents, you can still retrieve the children paragraphs by knn vector search and prefilter them using parent level metadata. |
There was a problem hiding this comment.
Solr doesn't need to have denormalised nested documents
That sentence fragment confuses me. What does "denomalized nested documents" mean? Do you mean simply that Solr doesn't support multiValued vector fields? If so, just say that please.
There was a problem hiding this comment.
I agree 100%, I wrote it a few months ago but I don't understand it now, a clear sign that a rephrase is necessary.
Possibly I meant to do it as the work in this PR needed to be finished, anyway, I'll spend some time on it very soon, and update the PR (and comments)
|
|
||
| [source,text] | ||
| ---- | ||
| $ curl 'http://localhost:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'fq={!child of=$block_mask filters=$parentsFilter}&q={!knn f=childVectorField topK=5}[1.0,2.5,3.0...]' --data-urlencode 'block_mask=(*:* -{!prefix f="_nest_path_" v="/skus/" parentsFilter="name_s:pen"})' |
There was a problem hiding this comment.
A new line per param (thus end the line with \) would be helpful to make this more readable. But mostly I think this complicated query needs an english description preceding it.
There was a problem hiding this comment.
+1 I'll sort this out and update the PR soon
|
I'm resuming the work on this briefly tomorrow, and then at full capacity the week of the 1st of December. Thanks, David, for the pointers. I'll carefully read them and respond soon. |
# Conflicts: # solr/core/src/test/org/apache/solr/search/vector/KnnQParserChildTest.java
|
A first revised draft is fully green after the refactor following @hossman suggestions. I'll keep it open for a few days, refine it a bit and then merge it unless critical observations! |
…ck join) (#3316) * first draft * Only Nested Vectors changes * first tests draft, parent filter and children filter missing as a test * tests cleaned * code cleanup * draft documentation * tidy * tidy * add best child per document transformer * add best child per document transformer * add best child per document transformer * minor refinement to avoid some instructions * minor refactor * Update solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java Co-authored-by: Christine Poerschke <cpoerschke@apache.org> * Update solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java Co-authored-by: Christine Poerschke <cpoerschke@apache.org> * new approach following feedback * tests fixed for the new approach * tidy + documentation * tidy + documentation * tidy + documentation * tidy + documentation * minor * renaming after feedback --------- Co-authored-by: Christine Poerschke <cpoerschke@apache.org> (cherry picked from commit 26457c3)
https://issues.apache.org/jira/browse/SOLR-17736
Description
Introducing support for proper KNN search on children/parents block joins.
Solution
Surface Lucene: apache/lucene#12434
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.