Add sort by field with and without can match challenge to http-logs and geonames tracks #357

Merged
javanna merged 2 commits into elastic:master from javanna:sort_by_keyword
Jan 23, 2023

Conversation

@javanna javanna commented Dec 19, 2022

We recently found a regression that affected searches sorted by a keyword field (elastic/elasticsearch#92026).

Given that we had no benchmarks for sorting by keyword, this commit adds the relevant operations to the http-logs and geonames tracks. We will also want to add similar challenges to the many-shards benchmarks, as the regressions we found can be seen with more than a couple of shards. This commit also adds specific challenges to verify the effect of elastic/elasticsearch#51852 when a search is sorted by a numeric or timestamp field.

… and geonames tracks

We recently found a regression that affected searches sorted by a keyword field (elastic/elasticsearch#92026).

Given that we had no benchmarks for sorting by keyword, this commit adds the relevant operations to the http-logs and geonames tracks.

Geonames is a good base, but it would also be good to make the new challenges part of the many-shards benchmarks, as the differences are most visible
when a large number of shards is involved in a query. This commit also adds specific challenges to verify the effect of elastic/elasticsearch#51852
when a search is sorted by a numeric or timestamp field.
"warmup-iterations": 200,
"iterations": 100,
"target-throughput": 2
},
Contributor Author

To make these part of many-shards (they are particularly significant when run against many shards), I am not sure whether I need to add anything to some of the existing files under elastic/logs/challenges. Also, is it common to have some of these challenges be part of only the many-shards benchmarks and not the ordinary http-logs ones? That would be good too, but I am not sure if it's an existing pattern.

Contributor

I think the example for adding queries to many-shards is this:

{% include "tasks/field-caps.json" %}

and https://github.com/elastic/rally-tracks/blob/master/elastic/logs/tasks/field-caps.json

Contributor Author

I am not sure how to parse that :) I am adding my challenges to existing files. Does it mean that I also have to create an additional separate file for many-shards?

Contributor

I think you can put all the new operations and schedules in one file and then include it, similar to field-caps. So under tasks, create a new file, for example sort_by_field.json, and add all the new operations:

{
  "name": "sort_country_code_can_match_shortcut",
  "operation": {
    "operation-type": "search",
    "body": {
      "track_total_hits": false,
      "query": {
        "match": {
          "timezone": "America"
        }
      },
      "sort": [
        {"country_code.raw": "asc"}
      ]
    }
  },
  "warmup-iterations": 200,
  "iterations": 100,
  "clients": 2
},
...

and then include it:

{% include "tasks/sort_by_field.json" %}

The only thing is that we don't seem to be using target throughput and are instead driving the load as hard as we can. So maybe don't specify the target throughput and instead specify the number of clients you want to run the queries with?
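
To make the suggested change concrete, here is a minimal Python sketch of turning a throttled schedule entry into an unthrottled one. The helper `drive_at_full_speed` is hypothetical and not part of Rally, and the sample entry is illustrative rather than copied from the PR. It only assumes that a task without `target-throughput` is run as fast as the load driver allows, with concurrency set by `clients`.

```python
import copy
import json


def drive_at_full_speed(task: dict, clients: int) -> dict:
    """Return a copy of a schedule entry without a throughput cap.

    Hypothetical helper for illustration only; it is not part of Rally.
    """
    if clients < 1:
        raise ValueError("clients must be a positive integer")
    updated = copy.deepcopy(task)
    # Dropping "target-throughput" removes the requests-per-second cap.
    updated.pop("target-throughput", None)
    # Concurrency is then controlled by the number of clients.
    updated["clients"] = clients
    return updated


# Illustrative schedule entry (assumed shape, not taken verbatim from the PR).
throttled = {
    "name": "sort_country_code_can_match_shortcut",
    "warmup-iterations": 200,
    "iterations": 100,
    "target-throughput": 2,
}

unthrottled = drive_at_full_speed(throttled, clients=2)
print(json.dumps(unthrottled, indent=2))
```

The original entry is left untouched, so both variants can be compared side by side before deciding which one goes into the track.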

Contributor Author

Thanks, I've fallen silent because I am trying to gather context. I am catching up with Armin on how the many-shards benchmarks are run, and it seems that we currently have no search charts for the many-shards challenge, which I was expecting to be able to add to. I will update my PR once I have figured out how to proceed.

javanna commented Dec 23, 2022

@ebadyano I updated the description of the PR. It seems like more work is needed to add search challenges to the existing many-shards benchmarks. I will look into that as a follow-up. For now, adding these benchmarks to geonames and http-logs is enough.

javanna commented Jan 10, 2023

@ebadyano could you review this please? It is hopefully ready.

@ebadyano ebadyano left a comment

@javanna Sorry for taking so long to review, I was away. LGTM! Thank you!

I tested it in a bare-metal environment with:

rally --skip-update race --telemetry="gc" --target-host="192.168.14.13:9200" --track-path=/var/lib/jenkins/tracks/rally-tracks/http_logs --challenge="append-no-conflicts" --car="4gheap,trial-license,x-pack-security" --on-error="abort" --client-options="timeout:240,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --runtime-jdk="bundled" --pipeline="from-sources" --kill-running-processes
rally --skip-update race --telemetry="gc" --target-host="192.168.14.13:9200" --track-path=/var/lib/jenkins/tracks/rally-tracks/geonames --challenge="append-no-conflicts" --car="defaults,trial-license,x-pack-security" --on-error="abort" --client-options="timeout:240,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --runtime-jdk="bundled" --track-params="{\"max_num_segments\": 1}" --pipeline="from-sources" --kill-running-processes

javanna commented Jan 23, 2023

no worries, thanks @ebadyano

@javanna javanna merged commit 145b0c0 into elastic:master Jan 23, 2023
ebadyano added a commit that referenced this pull request Jan 30, 2023
`runtime-fields` challenge requires track parameter `runtime_fields` set to execute the runtime fields. We missed an issue with #357 during ci checks since it was missing.