Conversation
Elasticsearch 8.0 introduces indexed vectors and a _knn_search operation on them. This track is for benchmarking indexing, force merging, and _knn_search operations for indexed vectors.
TODO:
- add deep1b dataset to rally-tracks.elastic.co
- add _knn_search operation to the challenge, once elastic/rally#1380 is implemented.
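For reference, an 8.0 _knn_search request looks roughly like the sketch below. The index and field names are made up for illustration; k and num_candidates are chosen to match the knn-search-10-500 operation naming used later in this track:

```json
GET deep1b/_knn_search
{
  "knn": {
    "field": "vector",
    "query_vector": [0.12, 0.45, 0.87],
    "k": 10,
    "num_candidates": 500
  }
}
```

In a real query, query_vector would carry the full dimensionality of the indexed vectors.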
dliappis
left a comment
Thanks for raising this!
I took a first look, starting at generating the dataset and left a suggestion for making the convert script a bit more user friendly.
As next I'll test the track itself.
@dliappis Thanks for the feedback on the script, great suggestions! I've pushed your suggestions in 48e9487. I also want to report results from my laptop while running:

./rally race --track-path=.../rally-tracks/vectors_deep1b/ --car="4gheap" --pipeline=from-sources --revision=current --user-tag="intention:vectors1" --track-params="bulk_indexing_clients:1,ingest_percentage:10"

Results:

| Metric | Task | Value | Unit |
|---------------------------------------------------------------:|-------------:|------------:|-------:|
| Cumulative indexing time of primary shards | | 1.11228 | min |
| Min cumulative indexing time across primary shards | | 0.0116833 | min |
| Median cumulative indexing time across primary shards | | 0.556142 | min |
| Max cumulative indexing time across primary shards | | 1.1006 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 9.07122 | min |
| Cumulative merge count of primary shards | | 1 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 4.53561 | min |
| Max cumulative merge time across primary shards | | 9.07122 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 4.45042 | min |
| Cumulative refresh count of primary shards | | 25 | |
| Min cumulative refresh time across primary shards | | 0.0253167 | min |
| Median cumulative refresh time across primary shards | | 2.22521 | min |
| Max cumulative refresh time across primary shards | | 4.4251 | min |
| Cumulative flush time of primary shards | | 5.44108 | min |
| Cumulative flush count of primary shards | | 7 | |
| Min cumulative flush time across primary shards | | 0.0421167 | min |
| Median cumulative flush time across primary shards | | 2.72054 | min |
| Max cumulative flush time across primary shards | | 5.39897 | min |
| Total Young Gen GC time | | 2.202 | s |
| Total Young Gen GC count | | 118 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 1.56353 | GB |
| Translog size | | 1.02445e-07 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 8 | |
| Min Throughput | index-append | 6753.77 | docs/s |
| Mean Throughput | index-append | 10894.2 | docs/s |
| Median Throughput | index-append | 11498.8 | docs/s |
| Max Throughput | index-append | 11829.6 | docs/s |
| 50th percentile latency | index-append | 391.876 | ms |
| 90th percentile latency | index-append | 451.808 | ms |
| 99th percentile latency | index-append | 624.93 | ms |
| 100th percentile latency | index-append | 709.739 | ms |
| 50th percentile service time | index-append | 391.876 | ms |
| 90th percentile service time | index-append | 451.808 | ms |
| 99th percentile service time | index-append | 624.93 | ms |
| 100th percentile service time | index-append | 709.739 | ms |
| error rate | index-append | 0 | % |
---------------------------------
[INFO] SUCCESS (took 952 seconds)
---------------------------------
dliappis
left a comment
Thanks for iterating, @mayya-sharipova! I've left a few questions; the main one is whether we should have a warmup time for indexing, as per benchmarking best practices.
@dliappis Thank you for another round of review. I've added some comments to clarify and asked a few more questions.
@dliappis Thanks for another round of review. I am going on vacation and will be away for some time. @jtibshirani will be the best point of contact. @jtibshirani, if you have time, feel free to add modifications to this PR.
I pushed some changes:
Notes:
dense_vector/challenges/default.json (outdated):

    "operation": "knn-search-10-500",
    "warmup-iterations": 100,
    "iterations": 100,
    "clients": 4
I wasn't sure what values to select here, feedback is very welcome.
To answer this, first let's clarify if you deliberately want to run this as fast as possible; I don't see target-throughput specified, so it will do that. If this is the intended behavior, what value to choose here is a matter of what we want to benchmark. In this case it makes sense to keep this to 1 to avoid saturating the target.
On the other hand, if you'd rather use a target-throughput, we should run some tests on the target hardware to calculate the optimal target throughput and probably also specify a different schedule. In that case we could also use >1 clients; each client will then run requests at a rate of target-throughput/clients.
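As an illustration of the second option, the task could look roughly like this (the throughput and client values here are hypothetical, not a recommendation):

```json
{
  "operation": "knn-search-10-500",
  "warmup-iterations": 100,
  "iterations": 100,
  "clients": 2,
  "target-throughput": 100
}
```

With such a schedule, each of the 2 clients would issue requests at roughly 50 requests/s (target-throughput divided by clients).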
We discussed offline and agreed it would make sense to set clients: 1 with no specific target-throughput. This lets us run the track as fast as possible and avoids unnecessary complexity; we don't have a specific reason to test request parallelism either. The Rally team is thinking of moving most "standard" tracks to this simple set-up.
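A sketch of the agreed setup (the same task as in the diff above, with clients reduced to 1 and no target-throughput, so the single client issues queries back to back):

```json
{
  "operation": "knn-search-10-500",
  "warmup-iterations": 100,
  "iterations": 100,
  "clients": 1
}
```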
dliappis
left a comment
Thank you for iterating and improving the track so much!
On a local test, indexing and force-merging one shard (5 million out of 10 million total vectors) took ~2 hours. So the benchmark will take quite a while to complete!
2 hrs is quite a lot of time in nightly benchmarks. Would it make sense, for the nightlies, to reduce the amount of data ingested via the track parameter ingest_percentage?
If we go down that path, it may make sense to tweak the warmup and query iterations based on that.
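For context, Rally's bulk operation supports an ingest-percentage parameter, which tracks typically expose as a template variable. A sketch of how the bulk task might look (the parameter names follow the earlier command line, but the default values here are assumptions):

```json
{
  "operation": {
    "operation-type": "bulk",
    "bulk-size": {{bulk_size | default(500)}},
    "ingest-percentage": {{ingest_percentage | default(100)}}
  },
  "clients": {{bulk_indexing_clients | default(1)}}
}
```

It would then be overridden at run time as in the earlier test run, e.g. --track-params="ingest_percentage:20".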
| "name": "knn-search-10-500", | ||
| "operation": "knn-search-10-500", | ||
| "warmup-iterations": 100, | ||
| "iterations": 100, |
This may be an obvious question, but could you explain why the iterations would scale with ingest percentage? It's not clear to me that there'd be a relationship.
> This may be an obvious question, but could you explain why the iterations would scale with ingest percentage? It's not clear to me that there'd be a relationship.
I don't think it's an obvious question, so it's good we are discussing it. As I mentioned earlier, it could (but doesn't have to). As you mentioned in your comment:
> I would suggest ingest_percentage: 20, corresponding to 2 million total docs, or roughly 1 million per shard. I will test this locally and update the number of iterations.
So if we update the amount of data we ingest, it might make sense to update the number of iterations (and warmup iterations?), and maybe there is a simple formula we could use.
However, I'd caution that evaluating the right amount of warmup and normal iterations depends on the hardware. Since we intend to run these on the nightly hardware, I suggest we make some measurements on similar hardware. Let's sync offline on how to do that.
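If we did want a mechanical starting point before those measurements, one purely illustrative rule would be to scale the iteration count linearly with the ingest percentage, floored so that latency percentiles stay meaningful. This is a hypothetical sketch, not a tuned recommendation; the actual counts should come from the hardware measurements discussed above:

```python
def scaled_iterations(base_iterations: int, base_pct: int, new_pct: int,
                      floor: int = 50) -> int:
    """Scale an iteration count linearly with the ingest percentage,
    but never drop below `floor` so percentile statistics remain stable.

    Purely illustrative -- real values should come from measurements
    on the target (nightly) hardware.
    """
    scaled = round(base_iterations * new_pct / base_pct)
    return max(floor, scaled)

# 100 iterations tuned at 100% ingest, benchmark run at 20% ingest:
# the linear value would be 20, so the floor of 50 applies.
print(scaled_iterations(100, 100, 20))  # -> 50
```

The floor matters because with very few iterations the 99th-percentile latency is dominated by a single sample.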
dense_vector/challenges/default.json (outdated):

    "operation": "knn-search-10-500",
    "warmup-iterations": 100,
    "iterations": 100,
    "clients": 4
We can definitely do this. I would suggest … Update: I tested with …
It looks like this is the final issue holding up ANN support in Elastic? elastic/elasticsearch#78473
Hello @rjurney, we already merged support for ANN in Elasticsearch. You can actually try it out as part of an 8.0 early access build (https://www.elastic.co/downloads/past-releases/elasticsearch-8-0-0-beta1). This PR adds a set of benchmarks (to a separate repo) to help us identify performance issues/improvements. The results look pretty good so far, and we are on track to ship what we've merged.
I ran some experiments on the nightly benchmark hardware to select better warm-up periods. Here are some screenshots showing the warm-up cutoffs. I ran into some difficulties with timeouts; these timeouts really need to be investigated, in case there is a problem with dense vector indexing. I would prefer to merge the track with this working configuration, then follow up with an investigation and fix in Elasticsearch.
dliappis
left a comment
Thank you for iterating here and tuning warmup iterations and indexing clients.
We are basically ready to merge. I left a comment with a question about the need to explicitly disable refreshes. I also wanted to clarify: we intend to run this on the nightly setup with a bulk ingest percentage of 20, right? This is defined elsewhere, so there's no need to do anything here; I just wanted to understand what we plan to do later for the nightly runs.
dliappis
left a comment
LGTM! Thank you both @jtibshirani and @mayya-sharipova for the work here.
That's correct. Here are the notable parameters I used in test runs:
That's perfect. We'll deal with passing the ingest_percentage parameter and car in another repo, as it's specific to the nightlies. For now, feel free to merge this (squash, please)! I don't know if you've considered backporting this to any branch so that it can be run against released versions of ES. In short, the most recent branch in the 7 series in this repo is


Elasticsearch 8.0 introduces indexed vectors and a _knn_search operation
on them. This track is for benchmarking indexing, force merging,
and _knn_search operations for indexed vectors.
TODO:
- add deep1b dataset to rally-tracks.elastic.co
- add _knn_search operation to the challenge, once
  rally#1380 (Add support for _knn_search) is implemented.