Skip to content

Changes for runtime fields#132

Merged
dliappis merged 17 commits intoelastic:masterfrom
nik9000:index_less
Nov 24, 2020
Merged

Changes for runtime fields#132
dliappis merged 17 commits intoelastic:masterfrom
nik9000:index_less

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Sep 29, 2020

Adds an option to load the unparsed data in the http_logs track and
create all fields as runtime fields that reference the raw message
field.

{
"settings": {
"index.number_of_shards": {{number_of_shards | default(5)}},
"index.number_of_shards": {{number_of_shards | default(1)}},
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed useful for me when I was testing the amount of space that fields take up. I'm happy to remove it from this PR if we'd like.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this will be a surprise to users relying on defaults, I think we should try to keep this default value.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

"raw": {
"ignore_above": 256,
"type": "keyword"
{%- if runtime_grok is defined %}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implements using my grok prototype. I'm 10000% sure we aren't going to keep this API, but it is something.

"script": "String m = doc[\"message\"].value; int start = m.lastIndexOf(\" \") + 1; emit(Long.parseLong(m.substring(start)));"
}
{%- else %}
"message": {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is how things used to be.

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Sep 29, 2020

I've opened this more as a place to share my work then because I expect we'll want to merge it. I think we'll likely want to pick parts of this that we'd like to merge, but I'm not clear which parts and when.

@nik9000 nik9000 marked this pull request as ready for review November 16, 2020 19:39
@dliappis dliappis self-requested a review November 17, 2020 11:28
Copy link
Copy Markdown
Contributor

@dliappis dliappis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, mainly regarding necessary changes to allow us to create the necessary visualizations.

"warmup-time-period": 240,
"clients": {{bulk_indexing_clients | default(8)}}
},
{%- if runtime_script_grok is defined %}
Copy link
Copy Markdown
Contributor

@dliappis dliappis Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 In order to visualize the queries in https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d we will actually need a second challenge. The query visualizations and the chart generator will not currently visualize more than one operations in the same chart (plus it'd get very noisy given that they already visualize 4 percentiles).

So I suggest we create a new challenge e.g. append-no-conflicts-with-runtime-fields or something like that, where we have all the content of the append-no-conflicts challenge plus the operations in the conditionals here. We can skip the if in the new challenge.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this condition that I call copy-and-paste blindness. If I see a bunch of stuff that looks copy and pasted I can never find the small bits that differ. Is there any way I can write this as a function and call it twice to get two named challenges? That way the if statement stay? Otherwise I'll never be able to find the difference.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you. You can leverage the {{ rally.collect(parts="") }} helper to reference common sections.

Example approach:

  1. Create a common dir under challenges, e.g. like:
├── challenges
│   ├── common
│   │   └── default-schedule.json
│   └── default.json
  1. Move the the schedule from the append-no-conflicts challenge to common/default-schedule.json.

  2. Change the append-no-conflicts challenge to be:

      {
        "name": "append-no-conflicts",
        "description": "Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas as we benchmark a single node cluster and Rally will only start the benchmark if the cluster turns green. Document ids are unique so all index operations are append only. After that a couple of queries are run.",
        "default": true,
        "schedule": [
          {{ rally.collect(parts="common/default-schedule.json") }}
        ]
      },
    
  3. Create a second challenge with a different name + description, same content, that relies on use specifying the runtime_script_grok track parameter.

This is just an idea, one could split sections from the schedule of the default challenge to more files and include as required.

{
"settings": {
"index.number_of_shards": {{number_of_shards | default(5)}},
"index.number_of_shards": {{number_of_shards | default(1)}},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this will be a surprise to users relying on defaults, I think we should try to keep this default value.

"_source": {
"enabled": {{ source_enabled | default(true) | tojson }}
},
{%- if runtime_script_grok is defined %}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly I think we should limit the amount of files that contain conditionals. I suggest we have a separate index-runtimefields.json and apply it conditional in the central track.json where we anyway have some conditionals.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't realized that was a thing. I'm happy to do that.

@dliappis dliappis self-requested a review November 18, 2020 07:09
Copy link
Copy Markdown
Contributor

@dliappis dliappis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating and changing things based on comments!

The last thing we need to still, to allow us to create separate charts, is to create a new challenge for the runtime fields. I've left a comment suggesting an approach to reduce copy/pasta lemme know if you need assistance with it.

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Nov 19, 2020

@dliappis I've pushed the dry-stuff. I feel bad that the two challenges are the same and you only get a different thing with a track param. We could totally set variables before importing the template to get some of the changes, but the corpa change is the big one that can't do that.

@dliappis
Copy link
Copy Markdown
Contributor

@dliappis I've pushed the dry-stuff. I feel bad that the two challenges are the same and you only get a different thing with a track param. We could totally set variables before importing the template to get some of the changes, but the corpa change is the big one that can't do that.

@nik9000 I must be doing something wrong, I don't see any additional commits (or signs of a force push) since I reviewed yesterday and don't see two separate challenges. Is there something I am missing?

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Nov 19, 2020

@nik9000 I must be doing something wrong, I don't see any additional commits (or signs of a force push) since I reviewed yesterday and don't see two separate challenges. Is there something I am missing?

I suppose it'd help if I'd actually pushed..... I've pushed now though!

Copy link
Copy Markdown
Contributor

@dliappis dliappis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating! This looks very close to ready, left a few small comments.

},
{
"name": "append-runtime-script-grok",
"description": "Indexes the whole document corpus using scripts to extract fields.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add in the description that this relies on setting the track param runtime_script_grok to true.

"warmup-iterations": 10,
"iterations": 100,
"target-throughput": 0.5
} No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add newline

@@ -0,0 +1,183 @@
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this level of indentation the rendered track by Rally looks like:

  "challenges": [
        {
      "name": "append-no-conflicts",
      "description": "Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas as we benchmark a single node cluster and Rally will only start the benchmark if the cluster turns green. Document ids are unique so all index operations are append only. After that a couple of queries are run.",
      "default": true,
      "schedule": [
        {
  "operation": "delete-index"
},
{
  "operation": {
    "operation-type": "create-index",
    "settings": {}
  }
},

to make things look proper when rendered internally by Rally (you can check those rendered files in your $TMPDIR, they are useful for debugging) we should indent by 8 characters

Copy link
Copy Markdown
Contributor

@dliappis dliappis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Nov 24, 2020 via email

@dliappis dliappis merged commit 5083c46 into elastic:master Nov 24, 2020
@dliappis dliappis added the backport pending Awaiting backport to stable release branch label Nov 24, 2020
@dliappis
Copy link
Copy Markdown
Contributor

Flagged as backport pending; will do this after a few days.

dliappis added a commit that referenced this pull request Nov 25, 2020
Add a new challenge `append-runtime-script-grok` loading the
unparsed data in the http_logs track and create all fields as
runtime fields that reference the raw message field.

Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>
@dliappis
Copy link
Copy Markdown
Contributor

Backported to 7 in 98a30ce

will not backport further to 7.1/7.0.

@dliappis dliappis removed the backport pending Awaiting backport to stable release branch label Nov 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants