Add notes on indexing to kNN search guide #83188
This change adds a new 'indexing considerations' section that explains why index calls can be slow and how force merge can help search latency.
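For readers unfamiliar with force merge, here is a minimal sketch of what issuing one could look like from a client. It only builds the HTTP request and does not send it. The host `http://localhost:9200`, the index name `my-vectors`, and the helper name `build_force_merge_request` are assumptions for illustration and are not part of this PR. The reasoning is that each segment carries its own vector index structure, so merging down to fewer segments leaves fewer structures to search.

```python
import urllib.parse
import urllib.request


def build_force_merge_request(
    base_url: str, index: str, max_num_segments: int = 1
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the force merge API.

    base_url and index are hypothetical values supplied by the caller.
    """
    query = urllib.parse.urlencode({"max_num_segments": max_num_segments})
    index_path = urllib.parse.quote(index, safe="")
    url = f"{base_url.rstrip('/')}/{index_path}/_forcemerge?{query}"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    # Assumed local cluster and index name, for illustration only.
    request = build_force_merge_request("http://localhost:9200", "my-vectors")
    print(request.get_method(), request.full_url)
```

Force merge is itself an expensive operation, so a sketch like this is probably best suited to indices that are no longer being written to.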
[discrete]
[[knn-indexing-considerations]]
==== Indexing considerations
I'm very open to suggestions for a better name :)
Why are we putting this section in the kNN search guide? Wouldn't the indexing section of the dense-vector documentation be a better place for it?
Or do we provide these kinds of suggestions only in guides such as this one?
I was wondering this myself and would appreciate @jrodewig's opinion here. I was thinking of this reference as a combined "how to" for kNN search, where we offer guidance about what method to choose, other best practices, etc. I think of the API and field type docs more as specific references that don't give high-level guidance like this.
> I'm very open to suggestions for a better name :)

I think "Indexing considerations" is fine. However, you could use "Indexing speed" if you prefer.

> I was thinking of this reference as a combined "how to" for kNN search, where we offer guidance about what method to choose, other best practices, etc. I think of the API and field type docs more as specific references that don't give high-level guidance like this.

Yep. That's pretty much my model of these docs. I'm okay with linking from the dense-vector docs to this section, but I think this content is a little too dense and specific to approximate kNN search for the dense-vector reference docs.
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-docs (Team:Docs)
mayya-sharipova left a comment
@jtibshirani Thanks Julie, the changes LGTM. I am just not sure what's the best place for this next section.
jrodewig left a comment
LGTM overall with @mayya-sharipova's edits.
I left some suggestions that swap the ANN reference for HNSW graphs, but I'd like you to take a look before accepting them. The other suggestions are non-blocking, so feel free to ignore them.
Thanks @jtibshirani!
[discrete]
[[knn-indexing-considerations]]
==== Indexing considerations
Indexing vectors for approximate kNN search can take substantial time because
of how expensive it is to build the ANN index structures. You may need to
increase the client request timeout for index and bulk requests.
Instead of ANN, we may want to mention HNSW graphs here. We talk about them in the next paragraph without much of an introduction.
I took a stab at a suggestion, but feel free to edit or ignore if wanted.
- Indexing vectors for approximate kNN search can take substantial time because
- of how expensive it is to build the ANN index structures. You may need to
- increase the client request timeout for index and bulk requests.
+ {es} shards are composed of segments, which are internal storage elements in the
+ index. For approximate kNN search, {es} stores the dense vector values of each
+ segment as an https://arxiv.org/abs/1603.09320[HNSW graph]. Indexing vectors for
+ approximate kNN search can take substantial time because of how expensive it is
+ to build these graphs. You may need to increase the client request timeout for
+ index and bulk requests.
This reorganization works nicely, I'll adopt it.
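As a rough illustration of the timeout advice in the paragraph above, here is a sketch that builds a bulk request body and sends it with an explicit client-side timeout, using only the Python standard library. The host, the index name `my-vectors`, the field name `image-vector`, the 120-second default, and both helper names are assumptions for illustration, not values recommended by this PR.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_bulk_body(index: str, docs: List[Dict[str, Any]]) -> bytes:
    """Serialize documents as newline-delimited JSON for the bulk API.

    Each document becomes an action line followed by a source line,
    and the body ends with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(base_url: str, body: bytes, timeout_seconds: float = 120.0) -> Dict[str, Any]:
    """POST the bulk body, waiting up to timeout_seconds on the client side.

    The default timeout is an arbitrary placeholder; tune it for your data.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical index and vector field; nothing is sent here.
    body = build_bulk_body("my-vectors", [{"image-vector": [0.1, 0.2, 0.3]}])
    print(body.decode("utf-8"), end="")
```

Official clients expose their own timeout settings, so in practice the equivalent option on whichever client you use is the one to raise.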
Thanks for the helpful comments!
* upstream/master: (100 commits)
  Avoid duplicate _type fields in v7 compat layer (elastic#83239)
  Bump bundled JDK to 17.0.2+8 (elastic#83243)
  [DOCS] Correct header syntax (elastic#83275)
  Add unit tests for indices.recovery.max_bytes_per_sec default values (elastic#83261)
  [DOCS] Add note that write indices are not replicated (elastic#82997)
  Add notes on indexing to kNN search guide (elastic#83188)
  Fix get-snapshot-api :docs:integTest (elastic#83273)
  FilterPathBasedFilter support match fieldname with dot (elastic#83178)
  Fix compilation issues in example-plugins (elastic#83258)
  fix ClusterStateListener javadoc (elastic#83246)
  Speed up Building Indices Lookup in Metadata (elastic#83241)
  Mute whole suite for elastic#82502 (elastic#83252)
  Make PeerFinder log messages happier (elastic#83222)
  [Docs] Add supported _terms_enum field types (elastic#83244)
  Add an aggregator for IPv4 and IPv6 subnets (elastic#82410)
  [CI] Fix 70_time_series/default sort yaml test failures (elastic#83217)
  Update test-failure Issue Template to include "needs:triage" label elastic#83226
  Add an index->step cache to the PolicyStepsRegistry (elastic#82316)
  Improve support for joda datetime to java datetime transition in Painless (elastic#83099)
  Fix joda migration for week based methods in Painless (elastic#83232)
  ...

# Conflicts:
# x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java