Conversation
http_logs/index.json
Outdated
| { | ||
| "settings": { | ||
| "index.number_of_shards": {{number_of_shards | default(5)}}, | ||
| "index.number_of_shards": {{number_of_shards | default(1)}}, |
There was a problem hiding this comment.
This seemed useful for me when I was testing the amount of space that fields take up. I'm happy to remove it from this PR if we'd like.
There was a problem hiding this comment.
Yeah this will be a surprise to users relying on defaults, I think we should try to keep this default value.
http_logs/index.json
Outdated
| "raw": { | ||
| "ignore_above": 256, | ||
| "type": "keyword" | ||
| {%- if runtime_grok is defined %} |
There was a problem hiding this comment.
This implements using my grok prototype. I'm 10000% sure we aren't going to keep this API, but it is something.
http_logs/index.json
Outdated
| "script": "String m = doc[\"message\"].value; int start = m.lastIndexOf(\" \") + 1; emit(Long.parseLong(m.substring(start)));" | ||
| } | ||
| {%- else %} | ||
| "message": { |
There was a problem hiding this comment.
And this is how things used to be.
|
I've opened this more as a place to share my work then because I expect we'll want to merge it. I think we'll likely want to pick parts of this that we'd like to merge, but I'm not clear which parts and when. |
dliappis
left a comment
There was a problem hiding this comment.
I left some comments, mainly regarding necessary changes to allow us to create the necessary visualizations.
http_logs/challenges/default.json
Outdated
| "warmup-time-period": 240, | ||
| "clients": {{bulk_indexing_clients | default(8)}} | ||
| }, | ||
| {%- if runtime_script_grok is defined %} |
There was a problem hiding this comment.
@nik9000 In order to visualize the queries in https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d we will actually need a second challenge. The query visualizations and the chart generator will not currently visualize more than one operations in the same chart (plus it'd get very noisy given that they already visualize 4 percentiles).
So I suggest we create a new challenge e.g. append-no-conflicts-with-runtime-fields or something like that, where we have all the content of the append-no-conflicts challenge plus the operations in the conditionals here. We can skip the if in the new challenge.
There was a problem hiding this comment.
I have this condition that I call copy-and-paste blindness. If I see a bunch of stuff that looks copy and pasted I can never find the small bits that differ. Is there any way I can write this as a function and call it twice to get two named challenges? That way the if statement stay? Otherwise I'll never be able to find the difference.
There was a problem hiding this comment.
I hear you. You can leverage the {{ rally.collect(parts="") }} helper to reference common sections.
Example approach:
- Create a
commondir under challenges, e.g. like:
├── challenges
│ ├── common
│ │ └── default-schedule.json
│ └── default.json
-
Move the the schedule from the
append-no-conflictschallenge tocommon/default-schedule.json. -
Change the
append-no-conflictschallenge to be:{ "name": "append-no-conflicts", "description": "Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas as we benchmark a single node cluster and Rally will only start the benchmark if the cluster turns green. Document ids are unique so all index operations are append only. After that a couple of queries are run.", "default": true, "schedule": [ {{ rally.collect(parts="common/default-schedule.json") }} ] }, -
Create a second challenge with a different name + description, same content, that relies on use specifying the
runtime_script_groktrack parameter.
This is just an idea, one could split sections from the schedule of the default challenge to more files and include as required.
http_logs/index.json
Outdated
| { | ||
| "settings": { | ||
| "index.number_of_shards": {{number_of_shards | default(5)}}, | ||
| "index.number_of_shards": {{number_of_shards | default(1)}}, |
There was a problem hiding this comment.
Yeah this will be a surprise to users relying on defaults, I think we should try to keep this default value.
http_logs/index.json
Outdated
| "_source": { | ||
| "enabled": {{ source_enabled | default(true) | tojson }} | ||
| }, | ||
| {%- if runtime_script_grok is defined %} |
There was a problem hiding this comment.
Similarly I think we should limit the amount of files that contain conditionals. I suggest we have a separate index-runtimefields.json and apply it conditional in the central track.json where we anyway have some conditionals.
There was a problem hiding this comment.
I hadn't realized that was a thing. I'm happy to do that.
dliappis
left a comment
There was a problem hiding this comment.
Thanks for iterating and changing things based on comments!
The last thing we need to still, to allow us to create separate charts, is to create a new challenge for the runtime fields. I've left a comment suggesting an approach to reduce copy/pasta lemme know if you need assistance with it.
|
@dliappis I've pushed the dry-stuff. I feel bad that the two challenges are the same and you only get a different thing with a track param. We could totally set variables before importing the template to get some of the changes, but the corpa change is the big one that can't do that. |
@nik9000 I must be doing something wrong, I don't see any additional commits (or signs of a force push) since I reviewed yesterday and don't see two separate challenges. Is there something I am missing? |
I suppose it'd help if I'd actually pushed..... I've pushed now though! |
dliappis
left a comment
There was a problem hiding this comment.
Thanks for iterating! This looks very close to ready, left a few small comments.
http_logs/challenges/default.json
Outdated
| }, | ||
| { | ||
| "name": "append-runtime-script-grok", | ||
| "description": "Indexes the whole document corpus using scripts to extract fields.", |
There was a problem hiding this comment.
Maybe we can add in the description that this relies on setting the track param runtime_script_grok to true.
| "warmup-iterations": 10, | ||
| "iterations": 100, | ||
| "target-throughput": 0.5 | ||
| } No newline at end of file |
| @@ -0,0 +1,183 @@ | |||
| { | |||
There was a problem hiding this comment.
With this level of indentation the rendered track by Rally looks like:
"challenges": [
{
"name": "append-no-conflicts",
"description": "Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas as we benchmark a single node cluster and Rally will only start the benchmark if the cluster turns green. Document ids are unique so all index operations are append only. After that a couple of queries are run.",
"default": true,
"schedule": [
{
"operation": "delete-index"
},
{
"operation": {
"operation-type": "create-index",
"settings": {}
}
},
to make things look proper when rendered internally by Rally (you can check those rendered files in your $TMPDIR, they are useful for debugging) we should indent by 8 characters
dliappis
left a comment
There was a problem hiding this comment.
This is basically LGTM.
@nik9000 would it be possible to address the comments in https://github.com/elastic/rally-tracks/pull/132/files#r527070658 , https://github.com/elastic/rally-tracks/pull/132/files#r527063535 and https://github.com/elastic/rally-tracks/pull/132/files#r527063075 before we merge?
|
Sure! Sorry, I was off yesterday and have test triage today and sdh
tomorrow and fixing a cache bug with runtime fields has become my
priority. I'll make the change you asked for, but it might take some time.
Just for scceduling reasons.
…On Tue, Nov 24, 2020, 09:32 Dimitrios Liappis ***@***.***> wrote:
***@***.**** commented on this pull request.
This is basically LGTM.
@nik9000 <https://github.com/nik9000> would it be possible to address the
comments in
https://github.com/elastic/rally-tracks/pull/132/files#r527070658 ,
https://github.com/elastic/rally-tracks/pull/132/files#r527063535 and
https://github.com/elastic/rally-tracks/pull/132/files#r527063075 before
we merge?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#132 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABUXIWOLVWOQP62AN4MXOTSRO7W7ANCNFSM4R57Z6GA>
.
|
|
Flagged as backport pending; will do this after a few days. |
Add a new challenge `append-runtime-script-grok` loading the unparsed data in the http_logs track and create all fields as runtime fields that reference the raw message field. Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>
|
Backported to 7 in 98a30ce will not backport further to 7.1/7.0. |
Adds an option to load the
unparseddata in thehttp_logstrack andcreate all fields as
runtimefields that reference the rawmessagefield.