Add sort by field with and without can match challenge to http-logs and geonames tracks #357
Conversation
… and geonames tracks. We recently found a regression that affected searches sorted by a keyword field (elastic/elasticsearch#92026). Given that we had no benchmarks for sorting by keyword, this commit adds the relevant operations to the http-logs and geonames tracks. Geonames is a good base, but it would also be good to make the new challenges part of the many-shards benchmarks, as the differences become apparent when a high number of shards is involved in a query. This commit also adds specific challenges to verify the effect of elastic/elasticsearch#51852 when a search is sorted by a numeric or timestamp field.
"warmup-iterations": 200,
"iterations": 100,
"target-throughput": 2
},
In order to make these part of many-shards (as they are particularly significant when run against many shards), I am not sure whether I need to add anything to some of the existing files under elastic/logs/challenges. Also, is it common to have some of these challenges not be part of the ordinary http-logs benchmarks but only of the many-shards ones? That would be good too, but I am not sure whether that is an existing pattern.
I think the example for adding queries to many shards is this:
https://github.com/elastic/rally-tracks/blob/master/elastic/logs/tasks/field-caps.json
I am not sure how to parse that :) I am adding my challenges to existing files. Does that mean I also have to create an additional separate file for many-shards?
I think you can put all the new operations and schedules in one file and then include it, similar to field-caps. So under tasks create a new file, for example sort_by_field.json, and add all the new operations:
{
  "name": "sort_country_code_can_match_shortcut",
  "operation": {
    "operation-type": "search",
    "body": {
      "track_total_hits": false,
      "query": {
        "match": {
          "timezone": "America"
        }
      },
      "sort": [
        { "country_code.raw": "asc" }
      ]
    }
  },
  "warmup-iterations": 200,
  "iterations": 100,
  "clients": 2
},
...
and then include it in the challenge file:
{% include "tasks/sort_by_field.json" %}
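For context, here is a sketch of how such an include typically sits inside a challenge definition's schedule array (the challenge name below is illustrative, not taken from this PR):

```json
{
  "name": "example-many-shards-challenge",
  "schedule": [
    {% include "tasks/sort_by_field.json" %}
  ]
}
```

The Jinja include is expanded at track compile time, so the task objects in sort_by_field.json end up inline in the schedule alongside any other tasks.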
The only thing is that we don't seem to be using target throughput; instead we drive the queries as hard as we can. So maybe don't specify the target throughput and instead specify the number of clients you want running the queries?
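In other words (an illustrative fragment, not the exact task from this PR), dropping `target-throughput` turns a rate-capped task into one where the configured clients issue requests unthrottled:

```json
{
  "operation": "sort_country_code_can_match_shortcut",
  "warmup-iterations": 200,
  "iterations": 100,
  "clients": 2
}
```

With `"target-throughput": 2` present, Rally would instead throttle the clients to roughly two requests per second in total; without it, latency measurements reflect the cluster being pushed as hard as the two clients can manage.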
Thanks, I've fallen silent because I am trying to gather context. I am catching up with Armin on how the many-shards benchmarks are run, and it seems that we currently have no search charts for the many-shards challenge, which I was expecting to be able to add to. I will update my PR once I've figured out how to proceed.
@ebadyano I updated the description of the PR. It seems like more work is needed to add search challenges to the existing many-shards benchmarks. I will look into that as a follow-up. For now, adding these benchmarks to geonames and http-logs is enough.

@ebadyano could you review this please? It is hopefully ready.
ebadyano
left a comment
@javanna Sorry for taking so long to review, I was away. LGTM! Thank you!
I tested it on a bare-metal env with:
rally --skip-update race --telemetry="gc" --target-host="192.168.14.13:9200" --track-path=/var/lib/jenkins/tracks/rally-tracks/http_logs --challenge="append-no-conflicts" --car="4gheap,trial-license,x-pack-security" --on-error="abort" --client-options="timeout:240,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --runtime-jdk="bundled" --pipeline="from-sources" --kill-running-processes
rally --skip-update race --telemetry="gc" --target-host="192.168.14.13:9200" --track-path=/var/lib/jenkins/tracks/rally-tracks/geonames --challenge="append-no-conflicts" --car="defaults,trial-license,x-pack-security" --on-error="abort" --client-options="timeout:240,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --runtime-jdk="bundled" --track-params="{\"max_num_segments\": 1}" --pipeline="from-sources" --kill-running-processes
no worries, thanks @ebadyano
The `runtime-fields` challenge requires the track parameter `runtime_fields` to be set in order to execute the runtime fields operations. We missed an issue with #357 during CI checks because this parameter was missing.
We recently found a regression that affected searches sorted by a keyword field (elastic/elasticsearch#92026).
Given that we had no benchmarks for sorting by keyword, this commit adds the relevant operations to the http-logs and geonames tracks. We will also want to add similar challenges to the many-shards benchmarks, as the regressions we found become visible when more than a couple of shards are involved. This commit also adds specific challenges to verify the effect of elastic/elasticsearch#51852 when a search is sorted by a numeric or timestamp field.
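For reference, the numeric/timestamp sort case exercised by the new challenges boils down to a request body along these lines (a sketch; `@timestamp` matches the http-logs mapping, and the exact task definitions live in the track files):

```json
{
  "track_total_hits": false,
  "query": { "match_all": {} },
  "sort": [
    { "@timestamp": "desc" }
  ]
}
```

Disabling `track_total_hits` matters here: it lets Elasticsearch terminate early once enough sorted hits are collected, which is exactly the path elastic/elasticsearch#51852 optimizes.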