
Add vectors search track #217

Merged
jtibshirani merged 13 commits into elastic:master from mayya-sharipova:vectors_deep1b
Dec 15, 2021

Conversation

@mayya-sharipova
Contributor

Elasticsearch 8.0 introduces indexed vectors and the _knn_search operation
on them. This track is for benchmarking indexing, force merging,
and _knn_search operations on indexed vectors.

TODO:

- add deep1b dataset to rally-tracks.elastic.co

- add _knn_search operation to the challenge, once
elastic/rally#1380 is implemented.
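For context, the kind of _knn_search request this track will eventually benchmark can be sketched as follows. This is a hedged illustration, not taken from the track source: the field name "vector" and the truncated query vector are assumptions, and the k / num_candidates values are inferred from the operation names that appear later in this PR (e.g. knn-search-10-500).

```python
import json

# Illustrative sketch of an 8.0 _knn_search request body. The field name
# "vector" and the 3-element query vector are assumptions; real Deep1B
# vectors have 96 dimensions.
knn_body = {
    "knn": {
        "field": "vector",                    # dense_vector field under test
        "query_vector": [0.12, -0.45, 0.33],  # truncated example vector
        "k": 10,                              # nearest neighbors to return
        "num_candidates": 500,                # candidates considered per shard
    }
}
print(json.dumps(knn_body, indent=2))
```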
Contributor

@dliappis dliappis left a comment


Thanks for raising this!

I took a first look, starting at generating the dataset and left a suggestion for making the convert script a bit more user friendly.

Next, I'll test the track itself.

@mayya-sharipova
Contributor Author

@dliappis Thanks for the feedback on the script, great suggestions! I've pushed your suggestions in 48e9487.

Also want to report results from my laptop while running:

./rally race --track-path=.../rally-tracks/vectors_deep1b/ --car="4gheap" --pipeline=from-sources --revision=current --user-tag="intention:vectors1"  --track-params="bulk_indexing_clients:1,ingest_percentage:10"
Results
|                                                         Metric |         Task |       Value |   Unit |
|---------------------------------------------------------------:|-------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |              |     1.11228 |    min |
|             Min cumulative indexing time across primary shards |              |   0.0116833 |    min |
|          Median cumulative indexing time across primary shards |              |    0.556142 |    min |
|             Max cumulative indexing time across primary shards |              |      1.1006 |    min |
|            Cumulative indexing throttle time of primary shards |              |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |              |           0 |    min |
| Median cumulative indexing throttle time across primary shards |              |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |              |           0 |    min |
|                        Cumulative merge time of primary shards |              |     9.07122 |    min |
|                       Cumulative merge count of primary shards |              |           1 |        |
|                Min cumulative merge time across primary shards |              |           0 |    min |
|             Median cumulative merge time across primary shards |              |     4.53561 |    min |
|                Max cumulative merge time across primary shards |              |     9.07122 |    min |
|               Cumulative merge throttle time of primary shards |              |           0 |    min |
|       Min cumulative merge throttle time across primary shards |              |           0 |    min |
|    Median cumulative merge throttle time across primary shards |              |           0 |    min |
|       Max cumulative merge throttle time across primary shards |              |           0 |    min |
|                      Cumulative refresh time of primary shards |              |     4.45042 |    min |
|                     Cumulative refresh count of primary shards |              |          25 |        |
|              Min cumulative refresh time across primary shards |              |   0.0253167 |    min |
|           Median cumulative refresh time across primary shards |              |     2.22521 |    min |
|              Max cumulative refresh time across primary shards |              |      4.4251 |    min |
|                        Cumulative flush time of primary shards |              |     5.44108 |    min |
|                       Cumulative flush count of primary shards |              |           7 |        |
|                Min cumulative flush time across primary shards |              |   0.0421167 |    min |
|             Median cumulative flush time across primary shards |              |     2.72054 |    min |
|                Max cumulative flush time across primary shards |              |     5.39897 |    min |
|                                        Total Young Gen GC time |              |       2.202 |      s |
|                                       Total Young Gen GC count |              |         118 |        |
|                                          Total Old Gen GC time |              |           0 |      s |
|                                         Total Old Gen GC count |              |           0 |        |
|                                                     Store size |              |     1.56353 |     GB |
|                                                  Translog size |              | 1.02445e-07 |     GB |
|                                         Heap used for segments |              |           0 |     MB |
|                                       Heap used for doc values |              |           0 |     MB |
|                                            Heap used for terms |              |           0 |     MB |
|                                            Heap used for norms |              |           0 |     MB |
|                                           Heap used for points |              |           0 |     MB |
|                                    Heap used for stored fields |              |           0 |     MB |
|                                                  Segment count |              |           8 |        |
|                                                 Min Throughput | index-append |     6753.77 | docs/s |
|                                                Mean Throughput | index-append |     10894.2 | docs/s |
|                                              Median Throughput | index-append |     11498.8 | docs/s |
|                                                 Max Throughput | index-append |     11829.6 | docs/s |
|                                        50th percentile latency | index-append |     391.876 |     ms |
|                                        90th percentile latency | index-append |     451.808 |     ms |
|                                        99th percentile latency | index-append |      624.93 |     ms |
|                                       100th percentile latency | index-append |     709.739 |     ms |
|                                   50th percentile service time | index-append |     391.876 |     ms |
|                                   90th percentile service time | index-append |     451.808 |     ms |
|                                   99th percentile service time | index-append |      624.93 |     ms |
|                                  100th percentile service time | index-append |     709.739 |     ms |
|                                                     error rate | index-append |           0 |      % |


---------------------------------
[INFO] SUCCESS (took 952 seconds)
---------------------------------

Contributor

@dliappis dliappis left a comment


Thanks for iterating, @mayya-sharipova! I've left a few questions; the main one is whether we should have a warmup period for indexing, as per benchmarking best practices.

@mayya-sharipova
Contributor Author

mayya-sharipova commented Nov 22, 2021

@dliappis Thank you for another round of review. I've added some comments to clarify and more questions to ask.

@mayya-sharipova
Contributor Author

@dliappis Thanks for another round of review. I am going on vacation and will be away for some time. @jtibshirani will be the best point of contact.

@jtibshirani If you have time, feel free to add modifications to this PR.

@jtibshirani
Contributor

I pushed some changes:

  • Update the README to use a different link (still for the same dataset). This link is explicit about the license (CC-BY-4.0). I parsed the dataset and uploaded it to the rally-tracks GCS bucket.
  • Add _knn_search operations and a script_score query that uses vector functions.
  • Update the index settings to use 2 shards. This lets us exercise the result merging and fetching logic.

Notes:

  • On a local test, indexing and force-merging one shard (5 million out of 10 million total vectors) took ~2 hours. So the benchmark will take quite a while to complete!
  • For now I just randomly chose 3 vectors from the dataset to use as query vectors. Ideally we would use a small "test set" containing many different vectors. This would be a good follow-up, but I wanted to keep the initial PR simple.
  • Since kNN search uses an approximate algorithm, it's important to measure result accuracy in addition to latency/throughput. This helps ensure we're using realistic parameters in benchmarks. We don't measure or report accuracy right now, but we should look into that in a follow-up.
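To sketch what that follow-up accuracy measurement could look like: a common metric is recall@k, the fraction of the true top-k neighbors that the approximate search returns. The function below is a hypothetical helper, not part of this track; exact_ids would come from an exhaustive search (e.g. the script_score query) and approx_ids from _knn_search.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors the approximate search found.

    approx_ids: document ids returned by the approximate kNN search.
    exact_ids: ids from an exhaustive (brute-force) search over the same data.
    """
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k
```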

"operation": "knn-search-10-500",
"warmup-iterations": 100,
"iterations": 100,
"clients": 4
Contributor


I wasn't sure what values to select here, feedback is very welcome.

Contributor


To answer this, let's first clarify whether you deliberately want to run this as fast as possible; since no target-throughput is specified, that is what will happen. If this is the intended behavior, the value to choose here is a matter of what we want to benchmark, and it makes sense to keep clients at 1 to avoid saturating the target.

On the other hand, if you'd rather use a target-throughput, we should run some tests on the target hardware to calculate the optimal target throughput, and probably also specify a different schedule. In that case we could also use more than one client; each client will then run requests at a rate of target-throughput/clients.
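To make the schedule arithmetic above concrete, here is a trivial sketch (not Rally code) of how a target-throughput setting is divided across clients:

```python
def per_client_rate(target_throughput, clients):
    # Rally splits the overall target throughput evenly across clients,
    # so each client issues requests at target_throughput / clients ops/s.
    return target_throughput / clients

# e.g. a target of 100 ops/s spread over 4 clients gives each client 25 ops/s.
```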

Contributor


We discussed offline and agreed it would make sense to set clients: 1 with no specific target-throughput. This lets us run the track as fast as possible, avoiding unnecessary complexity; we don't have a specific reason to test request parallelism either. The Rally team is thinking of moving most "standard" tracks to this simple setup.

Contributor

@dliappis dliappis left a comment


@jtibshirani

Thank you for iterating and improving the track so much!

On a local test, indexing and force-merging one shard (5 million out of 10 million total vectors) took ~2 hours. So the benchmark will take quite a while to complete!

2 hours is quite a lot of time in nightly benchmarks. Would it make sense, for the nightlies, to reduce the amount of data ingested via the track parameter ingest_percentage?

If we go down that path, it may make sense to tweak the warmup and query iterations accordingly.

"name": "knn-search-10-500",
"operation": "knn-search-10-500",
"warmup-iterations": 100,
"iterations": 100,
Contributor


Both iterations and warmup-iterations could be a function of the ingest percentage. We'd need to do some measurements, of course. For an example, see

"warmup-iterations": {{ scale_iterations(50) }},

and

{% macro scale_iterations(iterations) -%}
{{ (iterations * (query_percentage | default(100)) / 100) | int }}
{%- endmacro %}
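In plain Python, the Jinja macro above is equivalent to the following sketch, with query_percentage defaulting to 100 just as the Jinja default filter does:

```python
def scale_iterations(iterations, query_percentage=100):
    # Mirrors the Jinja macro: scale the iteration count by the
    # query_percentage track parameter and truncate to an integer.
    return int(iterations * query_percentage / 100)

# scale_iterations(50) keeps the full 50 iterations;
# scale_iterations(50, query_percentage=20) scales down to 10.
```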

Contributor


This may be an obvious question, but could you explain why the iterations would scale with ingest percentage? It's not clear to me that there'd be a relationship.

Contributor


This may be an obvious question, but could you explain why the iterations would scale with ingest percentage? It's not clear to me that there'd be a relationship.

I don't think it's an obvious question, so it's good we are discussing it. As I mentioned earlier it could (but doesn't have to). As you mentioned in your comment:

I would suggest ingest_percentage: 20, corresponding to 2 million total docs, or roughly 1 million per shard. I will test this locally and update the number of iterations.

So if we update the amount of data we ingest, it might make sense to update the number of iterations (and warmup iterations?), and maybe there is a simple formula we could use.
However, I'd caution that evaluating the right number of warmup and measurement iterations depends on the hardware. Since we intend to run these on the nightly hardware, I suggest we make some measurements on similar hardware. Let's sync offline on how to do that.

@jtibshirani
Contributor

jtibshirani commented Dec 8, 2021

Would it make sense, for the nightlies, to reduce the amount of data ingested via the track parameter ingest_percentage?

We can definitely do this. I would suggest ingest_percentage: 20, corresponding to 2 million total docs, or roughly 1 million per shard. I will test this locally and update the number of iterations.

Update: I tested with ingest_percentage: 20, and the current number of iterations provides pretty tight results.

|                                   50th percentile service time |   knn-search-10-500 |     5.26591 |     ms |
|                                   90th percentile service time |   knn-search-10-500 |     6.01791 |     ms |
|                                   99th percentile service time |   knn-search-10-500 |     6.31891 |     ms |
|                                  100th percentile service time |   knn-search-10-500 |     6.36262 |     ms |

|                                   50th percentile service time |  knn-search-10-1000 |      6.0672 |     ms |
|                                   90th percentile service time |  knn-search-10-1000 |     7.26231 |     ms |
|                                   99th percentile service time |  knn-search-10-1000 |     8.08956 |     ms |
|                                  100th percentile service time |  knn-search-10-1000 |     40.5459 |     ms |

|                                   50th percentile service time | knn-search-100-1000 |     9.00785 |     ms |
|                                   90th percentile service time | knn-search-100-1000 |     10.1576 |     ms |
|                                   99th percentile service time | knn-search-100-1000 |     10.7948 |     ms |
|                                  100th percentile service time | knn-search-100-1000 |     10.8415 |     ms |

|                                   50th percentile service time |  script-score-query |     217.926 |     ms |
|                                   90th percentile service time |  script-score-query |      222.94 |     ms |
|                                   99th percentile service time |  script-score-query |      225.98 |     ms |
|                                  100th percentile service time |  script-score-query |     226.436 |     ms |

@rjurney

rjurney commented Dec 13, 2021

It looks like this is the final issue holding up ANN support in Elastic? elastic/elasticsearch#78473

@jtibshirani
Contributor

Hello @rjurney, we already merged support for ANN in Elasticsearch. You can actually try it out as part of an 8.0 early access build (https://www.elastic.co/downloads/past-releases/elasticsearch-8-0-0-beta1). This PR adds a set of benchmarks (to a separate repo) to help us identify performance issues/improvements. The results look pretty good so far, and we are on track to ship what we've merged.

@jtibshirani
Contributor

I ran some experiments on the nightly benchmark hardware to select better warm-up periods. Here are some screenshots showing the warm-up cutoffs for index-append and knn-search-10-500 (the other search operations are very similar):

[Two screenshots: warm-up cutoff charts for index-append and knn-search-10-500]

I ran into some difficulties with the index-append operation. Setting a generous client timeout like 240s and using --on-error=abort showed that the bulk index operations sometimes stall and time out. I managed to find a very stable configuration, using bulk_indexing_clients: 1 and a 4GB heap, that does not trigger timeouts.

These timeouts really need to be investigated, in case there is a problem with dense vector indexing. I would prefer to merge the track with this working configuration, then follow up with an investigation and fix in Elasticsearch.

@dliappis dliappis self-requested a review December 15, 2021 08:04
Contributor

@dliappis dliappis left a comment


Thank you for iterating here and tuning warmup iterations and indexing clients.

We are basically ready to merge. I left a comment with a question about the need to explicitly disable refreshes. I also wanted to clarify: we intend to run this on the nightly setup with a 20% bulk ingest percentage, right? This is defined elsewhere, so no need to do anything here; I just wanted to understand what we want to do later on for nightly runs.

@dliappis dliappis added the new workload label (Any work related to adding a new track or functionality within a track) and removed the enhancement label on Dec 15, 2021
@dliappis dliappis self-requested a review December 15, 2021 16:21
Contributor

@dliappis dliappis left a comment


LGTM! Thank you both @jtibshirani and @mayya-sharipova for the work here.

@jtibshirani
Contributor

I also wanted to clarify: we intent to run this on the nightly setup with 20 bulk ingest percentage right?

That's correct. Here are the notable parameters I used in test runs: --car=4gheap --client-options="timeout:240" --track-params="ingest_percentage:20".

@dliappis
Contributor

That's correct. Here are the notable parameters I used in test runs: --car=4gheap --client-options="timeout:240" --track-params="ingest_percentage:20".

That's perfect. We'll deal with passing the ingest_percentage parameter and car in another repo as it's specific to nightlies.

For now, feel free to merge this (please squash)!

I don't know if you've considered backporting this to any branch so that it can be run against released versions of ES.
There is branch logic in Rally (described in depth here: https://esrally.readthedocs.io/en/latest/track.html?highlight=cherry%20pick#custom-track-repositories) that determines which branch of rally-tracks will be used depending on the ES version being benchmarked.

In short, the most recent branch in the 7 series in this repo is 7.14, so if you backport to 7.14, that branch would be picked when the target ES version is >=7.14.0 and <8.

@jtibshirani jtibshirani merged commit ad80ba9 into elastic:master Dec 15, 2021