Ensure kNN search respects authorization#79693
Conversation
05ddcfc to
7bf35ed
Compare
|
Pinging @elastic/es-search (Team:Search) |
| } | ||
|
|
||
| public void testKnnSearch() throws Exception { | ||
| Settings indexSettings = Settings.builder() |
There was a problem hiding this comment.
While developing this, I found a bug in KnnVectorQuery where the query can crash if the index reader has no live docs: apache/lucene#413. This is why I set the number of shards to 1 here, and make sure that every user can see at least some number of documents. We could loosen this test once the issue is addressed.
There was a problem hiding this comment.
This is outside of my area of expertise, but could this be a real issue in production? Or will the lucene bug be fixed before 8.0 GA?
There was a problem hiding this comment.
Yes this could be a real issue in production. We'll make sure the Lucene problem is fixed before ES 8.0 GA.
mayya-sharipova
left a comment
There was a problem hiding this comment.
Thanks @jtibshirani , LGTM! Love comments in tests!
|
|
||
| client().admin().indices().prepareRefresh("test").get(); | ||
|
|
||
| // Since there's no kNN search action at the transport layer, we just emulate |
There was a problem hiding this comment.
nice descriptions in tests!
ywangd
left a comment
There was a problem hiding this comment.
LGTM
I mostly checked the tests. Changes to FieldSubsetReader are really in the search and lucene areas. They look right to me. But I cannot claim to be an expert for this domain.
| .addFetchField("field1") | ||
| .setSize(5) | ||
| .get(); | ||
| assertTrue(response.getHits().getTotalHits().value <= 10L); |
There was a problem hiding this comment.
Why can't this be an exact equals check, e.g. assertTrue(response.getHits().getTotalHits().value == 10L);? Also could you please help me understand why we are using a size (5) that is less than the total hits? Theoretical we are not sure whether the other 5 matched docs have the field field1 or not? I don't think this is really the case. Just trying to understand the benefit of having a smaller size vs asserting the complete result set.
There was a problem hiding this comment.
This is a great question. The issue is that kNN queries are not guaranteed to find all possible results (https://issues.apache.org/jira/browse/LUCENE-10069). In certain circumstances, even when the index has 'k' documents, they can return slightly fewer than 'k' results. That's why I made these checks loose.
I just tried switching the test to retrieve all matching docs. I ran it 100 times and didn't run into the issue I linked. It seems like the issue doesn't occur with very small indices like the one in this test. So I'll update the approach to retrieve all matching docs.
There was a problem hiding this comment.
Thanks Julie for the great context. 👍
| } | ||
|
|
||
| public void testKnnSearch() throws Exception { | ||
| Settings indexSettings = Settings.builder() |
There was a problem hiding this comment.
This is outside of my area of expertise, but could this be a real issue in production? Or will the lucene bug be fixed before 8.0 GA?
| .setQuery(query) | ||
| .setSize(5) | ||
| .get(); | ||
| assertTrue(response.getHits().getTotalHits().value > 10L); |
There was a problem hiding this comment.
Similarly, why cannot it be an "equals" check?
Also Nit: assertThat(response.getHits().getTotalHits().value, greaterThan(10L)) will generate a better error message in case of CI failure.
...rity/src/internalClusterTest/java/org/elasticsearch/integration/FieldLevelSecurityTests.java
Show resolved
Hide resolved
...rity/src/internalClusterTest/java/org/elasticsearch/integration/FieldLevelSecurityTests.java
Show resolved
Hide resolved
* Retrieve all matching docs in DocumentLevelSecurityTests * Also check field loading in FieldLevelSecurityTests
|
Thank you both for reviewing! |
This PR ensures the
_knn_searchendpoint handles both FLS and DLS:FieldSubsetReaderto handle FLS for the vectors formatRelates to #78473.