Skip to content

Ensure kNN search respects authorization#79693

Merged
jtibshirani merged 2 commits intoelastic:masterfrom
jtibshirani:knn-auth
Oct 26, 2021
Merged

Ensure kNN search respects authorization#79693
jtibshirani merged 2 commits intoelastic:masterfrom
jtibshirani:knn-auth

Conversation

@jtibshirani
Copy link
Copy Markdown
Contributor

This PR ensures the _knn_search endpoint handles both FLS and DLS:

  • Updates FieldSubsetReader to handle FLS for the vectors format
  • Adds tests to check both DLS and FLS work

Relates to #78473.

@jtibshirani jtibshirani added :Search/Search Search-related issues that do not fall into other categories v8.0.0 labels Oct 25, 2021
@jtibshirani jtibshirani force-pushed the knn-auth branch 2 times, most recently from 05ddcfc to 7bf35ed Compare October 25, 2021 04:49
@jtibshirani jtibshirani mentioned this pull request Oct 25, 2021
17 tasks
@jtibshirani jtibshirani marked this pull request as ready for review October 25, 2021 19:30
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 25, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (Team:Search)

}

public void testKnnSearch() throws Exception {
Settings indexSettings = Settings.builder()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While developing this, I found a bug in KnnVectorQuery where the query can crash if the index reader has no live docs: apache/lucene#413. This is why I set the number of shards to 1 here, and make sure that every user can see at least some number of documents. We could loosen this test once the issue is addressed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside of my area of expertise, but could this be a real issue in production? Or will the lucene bug be fixed before 8.0 GA?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this could be a real issue in production. We'll make sure the Lucene problem is fixed before ES 8.0 GA.

Copy link
Copy Markdown
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jtibshirani , LGTM! Love comments in tests!


client().admin().indices().prepareRefresh("test").get();

// Since there's no kNN search action at the transport layer, we just emulate
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice descriptions in tests!

Copy link
Copy Markdown
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I mostly checked the tests. Changes to FieldSubsetReader are really in the search and lucene areas. They look right to me. But I cannot claim to be an expert for this domain.

.addFetchField("field1")
.setSize(5)
.get();
assertTrue(response.getHits().getTotalHits().value <= 10L);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this be an exact equals check, e.g. assertTrue(response.getHits().getTotalHits().value == 10L);? Also could you please help me understand why we are using a size (5) that is less than the total hits? Theoretical we are not sure whether the other 5 matched docs have the field field1 or not? I don't think this is really the case. Just trying to understand the benefit of having a smaller size vs asserting the complete result set.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question. The issue is that kNN queries are not guaranteed to find all possible results (https://issues.apache.org/jira/browse/LUCENE-10069). In certain circumstances, even when the index has 'k' documents, they can return slightly fewer than 'k' results. That's why I made these checks loose.

I just tried switching the test to retrieve all matching docs. I ran it 100 times and didn't run into the issue I linked. It seems like the issue doesn't occur with very small indices like the one in this test. So I'll update the approach to retrieve all matching docs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Julie for the great context. 👍

}

public void testKnnSearch() throws Exception {
Settings indexSettings = Settings.builder()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside of my area of expertise, but could this be a real issue in production? Or will the lucene bug be fixed before 8.0 GA?

.setQuery(query)
.setSize(5)
.get();
assertTrue(response.getHits().getTotalHits().value > 10L);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, why cannot it be an "equals" check?
Also Nit: assertThat(response.getHits().getTotalHits().value, greaterThan(10L)) will generate a better error message in case of CI failure.

* Retrieve all matching docs in DocumentLevelSecurityTests
* Also check field loading in FieldLevelSecurityTests
@jtibshirani
Copy link
Copy Markdown
Contributor Author

Thank you both for reviewing!

@jtibshirani jtibshirani merged commit 07428a1 into elastic:master Oct 26, 2021
@jtibshirani jtibshirani deleted the knn-auth branch October 26, 2021 17:48
@jtibshirani jtibshirani added :Search Relevance/Vectors Vector search and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Search Relevance/Vectors Vector search Team:Search Meta label for search team v8.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants