Update dense_vector docs with kNN indexing options#80306
Update dense_vector docs with kNN indexing options#80306jtibshirani merged 8 commits intoelastic:masterfrom
dense_vector docs with kNN indexing options#80306Conversation
|
Pinging @elastic/es-docs (Team:Docs) |
|
Pinging @elastic/es-search (Team:Search) |
|
|
||
| [discrete] | ||
| [[dense-vector-params]] | ||
| ==== Parameters for dense vector fields |
There was a problem hiding this comment.
I struggled with the formatting a bit here -- any corrections/ suggestions are appreciated. The other field types use the [horizontal] styling, but I couldn't get this to work with the sublists and code snippets.
There was a problem hiding this comment.
I think we should update the other field types to remove the [horizontal] attribute and use formatting similar to the parameter definitions in our API docs.
I left a few comments to:
- Add collapsible sections for nested value/properties.
- Add required and data type for each parameter.
There was a problem hiding this comment.
Thanks! I went through and accepted all of these (batched into a new commit).
| only be accessed in scripts through the dedicated <<vector-functions,vector functions>>. | ||
| You can use `dense_vector` fields in | ||
| <<query-dsl-script-score-query,`script_score`>> queries to score documents. | ||
| They can also be indexed to support efficient k-nearest neighbor search. Dense |
There was a problem hiding this comment.
k-nearest neighbor search
Note from @jtibshirani: Link this to the knn search API docs when they're up (in a follow-up PR).
Sorry for accidentally deleting this. I was deleting some of my own comments from a pending review and accidentally clicked the wrong one. 🤦
There was a problem hiding this comment.
efficient k-nearest neighbor search
The link will help, but this feels a little cryptic to me. Should we mention the kNN search API directly?
There was a problem hiding this comment.
I agree it's unclear right now. I'll improve this in the follow-up PR where I document the kNN search API.
mayya-sharipova
left a comment
There was a problem hiding this comment.
@jtibshirani Thanks, great addition!
| The type of kNN algorithm to use. Currently only `hnsw` is supported. | ||
|
|
||
| `m`::: | ||
| Controls how many of the nearest neighbor candidates are connected to each new |
There was a problem hiding this comment.
may be better say, the maximum number of neighbour connections each node can have in the HNSW graph?
There was a problem hiding this comment.
👍 I reworded this to make it clearer. But I didn't say "maximum" because this is the actual number of connections that the algorithm tries to create (not an upper bound).
There was a problem hiding this comment.
My interpretation was that it is an upper bound, in the sense that we don't allow to have more neighbours than m, and will prune the worst to have the number of neighbours always under m.
There was a problem hiding this comment.
Oh I see, I interpreted "maximum" to mean that it's possible many nodes have fewer than m neighbors. But the algorithm attempts to have each node have exactly m neighbors.
In any case, hopefully the current wording is clear :) It matches the description from hnswlib: "M - the number of bi-directional links created for every new element during construction".
jrodewig
left a comment
There was a problem hiding this comment.
Thanks @jtibshirani. I mostly left nits and formatting fixes.
My only larger feedback was around how we talk about script scoring vs. the kNN search API. I wonder if we can be more direct in contrasting these methods. However, I don't think that's blocking. I can address it more in the high-level guide if wanted.
|
|
||
| [discrete] | ||
| [[dense-vector-params]] | ||
| ==== Parameters for dense vector fields |
There was a problem hiding this comment.
I think we should update the other field types to remove the [horizontal] attribute and use formatting similar to the parameter definitions in our API docs.
I left a few comments to:
- Add collapsible sections for nested value/properties.
- Add required and data type for each parameter.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
|
@mayya-sharipova @jrodewig thank you both for the reviews! It looks much more readable now. |
This commit updates the `dense_vector` docs to include information on the new `index`, `similarity`, and `index_options` parameters. It also tries to clarify the difference between `similarity` and `index_options` with the existing parameters that have the same name. Relates to #78473.
This commit updates the
dense_vectordocs to include information on the newindex,similarity, andindex_optionsparameters. It also tries to clarifythe difference between
similarityandindex_optionswith the existingparameters that have the same name.
Relates to #78473.