Changes for runtime fields by nik9000 · Pull Request #132 · elastic/rally-tracks

nik9000 · 2020-09-29T16:04:50Z

Adds an option to load the unparsed data in the http_logs track and
create all fields as runtime fields that reference the raw message
field.

nik9000 · 2020-09-29T16:06:12Z

http_logs/index.json

 {
  "settings": {
-    "index.number_of_shards": {{number_of_shards | default(5)}},
+    "index.number_of_shards": {{number_of_shards | default(1)}},


This seemed useful for me when I was testing the amount of space that fields take up. I'm happy to remove it from this PR if we'd like.

Yeah this will be a surprise to users relying on defaults, I think we should try to keep this default value.
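To make the concern concrete, here is a rough Python sketch of how a Jinja `default(...)` filter behaves for users who never pass `number_of_shards`. The function name and the dict of track parameters are illustrative assumptions, not Rally code.

```python
def number_of_shards(track_params: dict, fallback: int = 5) -> int:
    """Mimic `{{number_of_shards | default(5)}}`: use the fallback only when unset."""
    return track_params.get("number_of_shards", fallback)

# A user who sets the parameter is unaffected by the template default.
print(number_of_shards({"number_of_shards": 3}))
# A user relying on the default silently gets whatever the template says.
print(number_of_shards({}, fallback=5))
print(number_of_shards({}, fallback=1))
```

Only the last two calls differ, which is why changing the default in the template changes results for everyone who relies on it.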


nik9000 · 2020-09-29T16:07:40Z

http_logs/index.json

-          "raw": {
-            "ignore_above": 256,
-            "type": "keyword"
+      {%- if runtime_grok is defined %}


This implements using my grok prototype. I'm 10000% sure we aren't going to keep this API, but it is something.


nik9000 · 2020-09-29T16:09:12Z

http_logs/index.json

+          "script": "String m = doc[\"message\"].value; int start = m.lastIndexOf(\" \") + 1; emit(Long.parseLong(m.substring(start)));"
+        }
+      {%- else %}
+        "message": {


And this is how things used to be.
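For readers less familiar with Painless, the script in the hunk above takes whatever follows the last space in `message` and parses it as a long. A rough Python equivalent is sketched below; the function name and the sample log line are assumptions made for illustration, and the real corpus format may differ.

```python
def extract_last_number(message: str) -> int:
    """Parse the token after the last space as an integer."""
    start = message.rfind(" ") + 1   # plays the role of m.lastIndexOf(" ") + 1
    return int(message[start:])      # plays the role of Long.parseLong(m.substring(start))

# Assumed sample line, not taken from the http_logs corpus.
sample = '10.0.0.1 - - [1998-06-21T15:00:01-05:00] "GET /index.html HTTP/1.0" 200 1024'
print(extract_last_number(sample))
```

As in the Painless version, a message with no space at all is parsed in full, because the "not found" index of -1 plus 1 gives a start of 0.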

http_logs/index.json

http_logs/operations/default.json

noaa/index.json

nik9000 · 2020-09-29T16:12:03Z

I've opened this more as a place to share my work than because I expect we'll want to merge it. I think we'll likely want to pick parts of this that we'd like to merge, but I'm not clear which parts and when.

I moved it to a separate change.

dliappis

I left some comments, mainly regarding necessary changes to allow us to create the necessary visualizations.

dliappis · 2020-11-17T13:21:15Z

http_logs/challenges/default.json

-          "warmup-time-period": 240,
-          "clients": {{bulk_indexing_clients | default(8)}}
-        },
+        {%- if runtime_script_grok is defined %}


@nik9000 In order to visualize the queries in https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d we will actually need a second challenge. The query visualizations and the chart generator will not currently visualize more than one operation in the same chart (plus it'd get very noisy given that they already visualize 4 percentiles).

So I suggest we create a new challenge e.g. append-no-conflicts-with-runtime-fields or something like that, where we have all the content of the append-no-conflicts challenge plus the operations in the conditionals here. We can skip the if in the new challenge.

I have this condition that I call copy-and-paste blindness. If I see a bunch of stuff that looks copy and pasted I can never find the small bits that differ. Is there any way I can write this as a function and call it twice to get two named challenges? That way the if statement stays? Otherwise I'll never be able to find the difference.

I hear you. You can leverage the {{ rally.collect(parts="") }} helper to reference common sections.

Example approach:

Create a common dir under challenges, e.g. like:

├── challenges
│   ├── common
│   │   └── default-schedule.json
│   └── default.json

Move the schedule from the append-no-conflicts challenge to common/default-schedule.json.

Change the append-no-conflicts challenge to be:

{
  "name": "append-no-conflicts",
  "description": "Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas as we benchmark a single node cluster and Rally will only start the benchmark if the cluster turns green. Document ids are unique so all index operations are append only. After that a couple of queries are run.",
  "default": true,
  "schedule": [
    {{ rally.collect(parts="common/default-schedule.json") }}
  ]
},

Create a second challenge with a different name + description, same content, that relies on the user specifying the runtime_script_grok track parameter.

This is just an idea, one could split sections from the schedule of the default challenge to more files and include as required.
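The idea of two named challenges sharing one schedule can be sketched in plain Python. This is only an illustration of the structure, not how Rally renders tracks: `DEFAULT_SCHEDULE` stands in for the contents of common/default-schedule.json, `render_challenge` is a hypothetical helper, and the first description is shortened.

```python
import json

# Stand-in for common/default-schedule.json: a comma-separated fragment of
# schedule entries without the enclosing brackets (assumed content).
DEFAULT_SCHEDULE = """
{"operation": "delete-index"},
{"operation": {"operation-type": "create-index", "settings": {}}}
"""

def render_challenge(name: str, description: str, default: bool = False) -> dict:
    """Wrap the shared schedule fragment in a challenge object and parse it."""
    text = (
        "{"
        f'"name": {json.dumps(name)}, '
        f'"description": {json.dumps(description)}, '
        f'"default": {json.dumps(default)}, '
        f'"schedule": [{DEFAULT_SCHEDULE}]'
        "}"
    )
    return json.loads(text)

challenges = [
    render_challenge("append-no-conflicts", "Indexes the whole document corpus.", default=True),
    render_challenge("append-runtime-script-grok",
                     "Indexes the whole document corpus using scripts to extract fields."),
]
for challenge in challenges:
    print(challenge["name"], len(challenge["schedule"]))
```

Because both challenges read the same fragment, the only differences a reviewer has to look for are the name, description, and default flag.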

dliappis · 2020-11-17T13:21:48Z

http_logs/index.json

 {
  "settings": {
-    "index.number_of_shards": {{number_of_shards | default(5)}},
+    "index.number_of_shards": {{number_of_shards | default(1)}},


Yeah this will be a surprise to users relying on defaults, I think we should try to keep this default value.

dliappis · 2020-11-17T13:22:55Z

http_logs/index.json

    "_source": {
      "enabled": {{ source_enabled | default(true) | tojson }}
    },
+    {%- if runtime_script_grok is defined %}


Similarly I think we should limit the number of files that contain conditionals. I suggest we have a separate index-runtimefields.json and apply it conditionally in the central track.json, where we have some conditionals anyway.

I hadn't realized that was a thing. I'm happy to do that.
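A small Python sketch of the selection logic being proposed: keep the conditional in one place and pick the index body from there. The function name and the dict of track parameters are assumptions for illustration; the two file names come from the discussion above.

```python
def select_index_body(track_params: dict) -> str:
    """Pick the index definition file, mirroring `{% if runtime_script_grok is defined %}`."""
    # Jinja's `is defined` test checks presence, not truthiness, so key
    # membership is the closest match here.
    if "runtime_script_grok" in track_params:
        return "index-runtimefields.json"
    return "index.json"

print(select_index_body({}))
print(select_index_body({"runtime_script_grok": True}))
```

One consequence of testing for presence is that passing `runtime_script_grok` with a false value would still select the runtime fields index.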

dliappis

Thanks for iterating and changing things based on comments!

The last thing we still need, to allow us to create separate charts, is a new challenge for the runtime fields. I've left a comment suggesting an approach to reduce copy/pasta; lemme know if you need assistance with it.

nik9000 · 2020-11-19T15:06:44Z

@dliappis I've pushed the dry-stuff. I feel bad that the two challenges are the same and you only get a different thing with a track param. We could totally set variables before importing the template to get some of the changes, but the corpora change is the big one that can't do that.

dliappis · 2020-11-19T17:09:08Z

> @dliappis I've pushed the dry-stuff. I feel bad that the two challenges are the same and you only get a different thing with a track param. We could totally set variables before importing the template to get some of the changes, but the corpora change is the big one that can't do that.

@nik9000 I must be doing something wrong, I don't see any additional commits (or signs of a force push) since I reviewed yesterday and don't see two separate challenges. Is there something I am missing?

nik9000 · 2020-11-19T17:17:55Z

> @nik9000 I must be doing something wrong, I don't see any additional commits (or signs of a force push) since I reviewed yesterday and don't see two separate challenges. Is there something I am missing?

I suppose it'd help if I'd actually pushed..... I've pushed now though!

dliappis

Thanks for iterating! This looks very close to ready, left a few small comments.

dliappis · 2020-11-19T17:20:39Z

http_logs/challenges/default.json

+    },
+    {
+      "name": "append-runtime-script-grok",
+      "description": "Indexes the whole document corpus using scripts to extract fields.",


Maybe we can add in the description that this relies on setting the track param runtime_script_grok to true.

dliappis · 2020-11-19T17:21:22Z

http_logs/challenges/common/default-schedule.json

+  "warmup-iterations": 10,
+  "iterations": 100,
+  "target-throughput": 0.5
+}


nit: add newline

dliappis · 2020-11-19T17:31:55Z

http_logs/challenges/common/default-schedule.json

@@ -0,0 +1,183 @@
+{


With this level of indentation the rendered track by Rally looks like:

"challenges": [ { "name": "append-no-conflicts", "description": "Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas as we benchmark a single node cluster and Rally will only start the benchmark if the cluster turns green. Document ids are unique so all index operations are append only. After that a couple of queries are run.", "default": true, "schedule": [ { "operation": "delete-index" }, { "operation": { "operation-type": "create-index", "settings": {} } },

To make things look proper when rendered internally by Rally (you can check those rendered files in your $TMPDIR, they are useful for debugging), we should indent by 8 characters.
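The effect of indenting an included fragment can be previewed with a few lines of Python. This is a sketch using the standard library only; the fragment content is an assumed one-entry schedule, and it does not reproduce how Rally itself inserts the file.

```python
import json
import textwrap

# Assumed fragment, written flush-left as it might appear in a shared file.
fragment = '{\n  "operation": "delete-index"\n}'

# Indent every line by 8 spaces so it lines up inside the enclosing "schedule" array.
indented = textwrap.indent(fragment, " " * 8)
rendered = '"schedule": [\n' + indented + "\n]"
print(rendered)

# The result is still valid JSON once wrapped in an object.
parsed = json.loads("{" + rendered + "}")
```

Indentation does not change the parsed result; it only makes the rendered file easier to read when debugging.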

dliappis

This is basically LGTM.

@nik9000 would it be possible to address the comments in https://github.com/elastic/rally-tracks/pull/132/files#r527070658 , https://github.com/elastic/rally-tracks/pull/132/files#r527063535 and https://github.com/elastic/rally-tracks/pull/132/files#r527063075 before we merge?

nik9000 · 2020-11-24T14:41:14Z

Sure! Sorry, I was off yesterday and have test triage today and sdh tomorrow and fixing a cache bug with runtime fields has become my priority. I'll make the change you asked for, but it might take some time. Just for scheduling reasons.


dliappis · 2020-11-24T15:49:45Z

Flagged as backport pending; will do this after a few days.

Add a new challenge `append-runtime-script-grok` that loads the unparsed data in the http_logs track and creates all fields as runtime fields that reference the raw message field.

Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>

dliappis · 2020-11-25T10:43:37Z

Backported to 7 in 98a30ce

will not backport further to 7.1/7.0.

nik9000 added 3 commits September 24, 2020 08:28

WIP

76613f5

Jinja!

c05869c

Drop extra section

be7b97d

nik9000 commented Sep 29, 2020


nik9000 added 7 commits October 13, 2020 14:26

Fixup

5d150b6

Rework mappings after change to runtime

f8a2b41

Extact timestamp

e50fc37

Merge branch 'master' into index_less

ad19c42

Remove bug fix

de59bcc

I moved it to a separate change.

Drop noaa changes

c0a2add

Remove runtime_grok

3dbd99b

nik9000 marked this pull request as ready for review November 16, 2020 19:39

nik9000 added 3 commits November 16, 2020 14:40

Explain

b350f12

Blast size stuff

eb177c0

Undo more

9988e26

dliappis self-requested a review November 17, 2020 11:28

dliappis added the enhancement label Nov 17, 2020

dliappis suggested changes Nov 17, 2020


nik9000 added 2 commits November 17, 2020 08:31

Default number of shards

35c3547

Break out index.josn

be0864a

dliappis self-requested a review November 18, 2020 07:09

dliappis suggested changes Nov 18, 2020


Moar jinja

0e3a3b8

dliappis reviewed Nov 19, 2020


dliappis reviewed Nov 24, 2020


Address PR comments

0d4fbfe

dliappis merged commit 5083c46 into elastic:master Nov 24, 2020

dliappis added the backport pending label (Awaiting backport to stable release branch) Nov 24, 2020

dliappis removed the backport pending label (Awaiting backport to stable release branch) Nov 25, 2020

danielmitterdorfer mentioned this pull request Dec 10, 2020

WIP: Add runtime field operations to http_logs #133

Closed
