Add docs for kNN search endpoint by jtibshirani · Pull Request #80378 · elastic/elasticsearch

jtibshirani · 2021-11-04T21:06:23Z

This commit adds docs for the new _knn_search endpoint.

It focuses on being an API reference and is light on details in terms of how
exactly the kNN search works, and how the endpoint contrasts with
script_score queries. We plan to add a high-level guide on kNN search that
will explain this in depth.

Relates to #78473.

jtibshirani · 2021-11-04T21:08:35Z

This is a "rough draft" and isn't ready for a full review. I opened it to get feedback on one point: there is a lot of overlap with the _search endpoint in terms of request and response sections. I started to go down the path of tagging several sections in search.asciidoc so I could include them in knn-search.asciidoc with no copying. Does this seem like a good direction? There are still a few left to tag (took, _shards, hits...)

jtibshirani · 2021-11-04T21:09:24Z

docs/reference/search/search.asciidoc

 `docvalue_fields`::
 (Optional, string) A comma-separated list of fields to return as the docvalue
-representation of a field for each hit.
+representation of a field for each hit. See <<docvalue-fields>>.


Some of these changes aren't closely related to kNN, but I saw the opportunity to improve the field retrieval docs a bit.

These look great. I particularly like the use of "field pattern."

jrodewig

This looks good so far! I'd take the same approach you have.

Tag + reuse is probably a good fit for the query + request body params, but I think it's overkill for the response. We should just link users to the existing search API response body docs and concisely point out any differences.

docs/reference/mapping/types/dense-vector.asciidoc

docs/reference/search/knn-search.asciidoc

elasticmachine · 2021-11-04T22:10:27Z

Pinging @elastic/es-docs (Team:Docs)

elasticmachine · 2021-11-04T22:10:27Z

Pinging @elastic/es-search (Team:Search)

jrodewig

This looks great. I left some comments and suggestions, but feel free to disregard as wanted. Nothing is blocking, aside from a syntax error in the stored_fields def. Thanks!

jrodewig · 2021-11-05T14:16:31Z

docs/reference/search/knn-search.asciidoc

+
+experimental::[]
+
+Performs a k-nearest neighbor search and returns the matching documents.


I'd include the acronym whenever we spell out k-nearest neighbor. I also don't think we need returns matching documents, but it's fine if you want to keep it.

Suggested change

-Performs a k-nearest neighbor search and returns the matching documents.
+Performs a k-nearest neighbor (kNN) search.

Maybe a better way to phrase it would be something like: "returns top K documents as found by k-nearest search".

If it's okay with you, I'm going to go with "Performs a k-nearest neighbor (kNN) search and returns the matching documents." I think the wording is clearer and it helps clarify what the API does: kNN search just finds vectors, but the API actually returns documents.

jrodewig · 2021-11-05T14:32:49Z

docs/reference/search/knn-search.asciidoc

+[source,console]
+----
+PUT my-vector-index
+{
+  "mappings": {
+    "properties": {
+      "image_vector": {
+        "type": "dense_vector",
+        "dims": 3,
+        "index": true,
+        "similarity": "l2_norm"
+      }
+    }
+  }
+}
+
+GET /my-vector-index/_knn_search
+{
+  "knn": {
+    "field": "image_vector",
+    "query_vector": [0.3, 0.1, 1.2],
+    "k": 10,
+    "num_candidates": 100
+  },
+  "_source": ["name", "date"]
+}
+----
+// TEST[setup:my_index]


The // TEST[setup:my_index] comment isn't doing anything. It just creates some extra work for the cluster that runs the snippet tests.

I'd also hide the index creation/mapping setup since this reference is for the API. However, it's not a huge deal if you want to leave it in. I see benefits both ways. If we keep it, I'd index some data so users can test the API quickly.

Minor nit: We're trying to move away from a leading slash in our API/console examples. It's supposed to be easier if a user needs to copy/paste their own endpoint. However, that's really, really minor. We still have examples of it everywhere.

Is there a reason we went with _source instead of fields? Not a huge deal, but I figured fields would be preferred.

Suggested change

-[source,console]
-----
-PUT my-vector-index
-{
-  "mappings": {
-    "properties": {
-      "image_vector": {
-        "type": "dense_vector",
-        "dims": 3,
-        "index": true,
-        "similarity": "l2_norm"
-      }
-    }
-  }
-}
-
-GET /my-vector-index/_knn_search
-{
-  "knn": {
-    "field": "image_vector",
-    "query_vector": [0.3, 0.1, 1.2],
-    "k": 10,
-    "num_candidates": 100
-  },
-  "_source": ["name", "date"]
-}
-----
-// TEST[setup:my_index]
+////
+[source,console]
+----
+PUT my-vector-index
+{
+  "mappings": {
+    "properties": {
+      "image_vector": {
+        "type": "dense_vector",
+        "dims": 3,
+        "index": true,
+        "similarity": "l2_norm"
+      }
+    }
+  }
+}
+----
+// TESTSETUP
+////
+
+[source,console]
+----
+GET my-vector-index/_knn_search
+{
+  "knn": {
+    "field": "image_vector",
+    "query_vector": [ 0.3, 0.1, 1.2 ],
+    "k": 10,
+    "num_candidates": 100
+  },
+  "_source": [ "name", "date" ]
+}
+----

Thanks for catching these issues! I ended up hiding the mapping set-up.

Is there a reason we went with _source instead of fields? Not a huge deal, but I figured fields would be preferred.

I initially had fields in some API examples, and one of the readers got very confused. Approaching the API for the first time, they thought fields referred to the vector fields for kNN. Unfortunately fields is not a widely-used parameter yet, so users would be learning two different concepts at the same time.

jrodewig · 2021-11-05T15:24:59Z

docs/reference/search/search.asciidoc

 `docvalue_fields`::
 (Optional, string) A comma-separated list of fields to return as the docvalue
-representation of a field for each hit.
+representation of a field for each hit. See <<docvalue-fields>>.


These look great. I particularly like the use of "field pattern."

jrodewig · 2021-11-05T15:37:48Z

docs/reference/search/knn-search.asciidoc

+include::{es-repo-dir}/search/search.asciidoc[tag=fields-param-def]
+include::{es-repo-dir}/search/search.asciidoc[tag=docvalue-fields-def]
+include::{es-repo-dir}/search/search.asciidoc[tag=stored-fields-def]
+include::{es-repo-dir}/search/search.asciidoc[tag=source-filtering-def]


We typically try to order parameters alphabetically, but I kinda like this sort.

I can reorder these to be consistent. I was trying to emphasize that fields is what you usually want, so people don't accidentally start using docvalue_fields. But there are probably better ways to do this, like through examples and "how to" guides.

Oh I just realized that if they were truly alphabetical, then fields and docvalue_fields would float above knn! I think that would really be unfortunate, as the knn section contains information that's important to understanding the API.

mayya-sharipova

@jtibshirani Thanks, a great addition! I've left a couple of small comments.

mayya-sharipova · 2021-11-08T12:48:53Z

docs/reference/mapping/types/dense-vector.asciidoc

+* In <<query-dsl-script-score-query,`script_score`>> queries, to score
+documents matching a filter
+* In the <<knn-search, kNN search API>>, to find the _k_ most similar vectors
+to a query vector


Missing "." at the end of each item.

I don't usually put periods at the end of bulleted items unless they're complete sentences. It looks like most of our docs take this approach too (although we're not totally consistent).

mayya-sharipova · 2021-11-08T12:55:15Z

docs/reference/search/knn-search.asciidoc

+
+experimental::[]
+
+Performs a k-nearest neighbor search and returns the matching documents.


Maybe a better way to phrase it would be something like: "returns top K documents as found by k-nearest search".

mayya-sharipova · 2021-11-08T12:58:24Z

docs/reference/search/knn-search.asciidoc

+  },
+  "_source": ["name", "date"]
+}
+----


Adding to @jrodewig's comment about // TEST[setup:my_index]: it also looks like the index name is different.

mayya-sharipova · 2021-11-08T13:02:35Z

docs/reference/search/knn-search.asciidoc

+(Required, string) The name of the vector field to search against.
+
+`query_vector`::
+(Required, array of floats) The query vector.


Maybe we can add that query_vector must have the same number of dimensions (dims) as the indexed field it searches against. It's also kind of obvious, though, so I'm not sure it's worth adding.
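
To make the constraint concrete, here is a minimal Python sketch of the check being described. The function name `validate_query_vector` and its error message are hypothetical and only illustrate the rule; they are not part of Elasticsearch or its clients.

```python
def validate_query_vector(query_vector, dims):
    """Check that a query vector has as many entries as the field's `dims`.

    `dims` is the value from the dense_vector mapping (3 in the example
    mapping in this PR). Hypothetical helper, for illustration only.
    """
    if len(query_vector) != dims:
        raise ValueError(
            f"query_vector has {len(query_vector)} dimensions, "
            f"but the indexed field has {dims}"
        )
    # Normalize entries to floats, since the parameter is an array of floats.
    return [float(x) for x in query_vector]


# Matches the 3-dimensional `image_vector` field from the example mapping.
validated = validate_query_vector([0.3, 0.1, 1.2], dims=3)
```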

mayya-sharipova · 2021-11-08T13:08:02Z

docs/reference/search/knn-search.asciidoc

+(Required, integer) The number of nearest neighbor candidates to consider per
+shard. Increasing `num_candidates` tends to improve the accuracy of the final
+`k` results. This value cannot exceed 10,000.


I agree with @jrodewig it would be nice to add more details here on how num_candidates works, for example: "{es} collects the top num_candidates results from each shard, and then merges the results collected from the shards to get the top k results. Increasing num_candidates tends to improve the accuracy of the final k results..."
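
The collect-then-merge flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each shard is modeled as a dict of document ID to vector, and each shard is searched exactly by brute force with `l2_norm` distance. The real per-shard search is approximate, which is why a larger `num_candidates` tends to improve accuracy there. The names `shard_candidates` and `knn_search` are hypothetical.

```python
import heapq
import math

MAX_NUM_CANDIDATES = 10_000  # limit stated in the num_candidates docs


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_candidates(shard_docs, query_vector, num_candidates):
    """Return up to num_candidates (distance, doc_id) pairs from one shard."""
    scored = (
        (l2_distance(vector, query_vector), doc_id)
        for doc_id, vector in shard_docs.items()
    )
    return heapq.nsmallest(num_candidates, scored)


def knn_search(shards, query_vector, k, num_candidates):
    """Collect candidates per shard, then merge them into the top k doc IDs."""
    if num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates cannot exceed 10,000")
    merged = []
    for shard_docs in shards:
        merged.extend(shard_candidates(shard_docs, query_vector, num_candidates))
    # Smallest distance first, so the nearest neighbors come out on top.
    return [doc_id for _, doc_id in heapq.nsmallest(k, merged)]


# Two toy shards holding 3-dimensional vectors (made-up data).
shards = [
    {"a": [0.3, 0.1, 1.2], "b": [5.0, 5.0, 5.0]},
    {"c": [0.3, 0.1, 1.0], "d": [9.0, 9.0, 9.0]},
]
top_hits = knn_search(shards, [0.3, 0.1, 1.2], k=2, num_candidates=100)
```

Because the sketch searches each shard exactly, it shows the merge step but not the accuracy trade-off that comes with approximate search.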

jtibshirani · 2021-11-08T19:48:50Z

@jrodewig @mayya-sharipova thanks for the great comments. I tried to either respond or address them in the latest commits. I plan to merge later today unless you have more feedback.

jimczi · 2021-11-08T22:07:27Z

docs/reference/search/knn-search.asciidoc

+to search. Supports wildcards (`*`). To search all data streams and indices,
+use `*` or `_all`.
+
+NOTE: The kNN search API does not support {ccs}.


Is it something that we forbid explicitly? I didn't test, but it should work transparently since we use the search action internally.

It indeed should already work transparently. I wrote this because I didn't want to set a precedent that it supports CCS while we're still figuring out the execution strategy -- especially around how to combine kNN results with term-based results. However it's not something we forbid explicitly (given our strategy of translating at the REST layer, it's not very simple to do). What are your thoughts?

IMO we're protected by the experimental status. CCS works out of the box and there's nothing that prevents us from removing the support later on. I don't see why we would, though, so I'm not sure we need the NOTE at the moment.

That reasoning makes sense to me, I'll remove the note to keep things simple.

Add docs for kNN search endpoint

2813715

jtibshirani added the >docs, :Search/Search, v8.0.0, and v8.1.0 labels Nov 4, 2021

jtibshirani requested a review from jrodewig November 4, 2021 21:08

jtibshirani mentioned this pull request Nov 4, 2021

Integrate ANN search #78473

Closed

17 tasks

jrodewig reviewed Nov 4, 2021

jtibshirani added 3 commits November 4, 2021 14:48

Simplify response section

4651db5

More small fixes

451bbca

Make sure to say kNN search

1dacb54

jtibshirani marked this pull request as ready for review November 4, 2021 22:10

elasticmachine added the Team:Docs and Team:Search labels Nov 4, 2021

jtibshirani requested a review from mayya-sharipova November 4, 2021 22:10

jrodewig approved these changes Nov 5, 2021

mayya-sharipova approved these changes Nov 8, 2021

jtibshirani added 2 commits November 8, 2021 11:29

Address review comments

41836b4

Add a warning about index alias filters

46ffdca

jimczi reviewed Nov 8, 2021

Remove note that CCS doesn't work

7bc4186

jtibshirani merged commit 8ca693b into elastic:master Nov 9, 2021

jtibshirani deleted the knn-search-docs branch November 9, 2021 17:28

mark-vieira removed the v8.0.0 label Jan 12, 2022

mark-vieira added the v8.0.0-rc1 label Jan 12, 2022

jtibshirani added the :Search Relevance/Vectors label and removed the :Search/Search label Jul 21, 2022

