Rally benchmark aws.ec2_metrics#8442

Merged
francescayeye merged 5 commits into elastic:main from francescayeye:rally_benchmark_aws.ec2_metrics
Nov 23, 2023

Conversation


@francescayeye francescayeye commented Nov 9, 2023

Enhancement

Proposed commit message

Add artifacts for elastic-package rally benchmark

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • [ ] I have added an entry to my package's changelog.yml file.
  • [ ] I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

From the aws package root (remember to bring up the elastic-package stack first):
./elastic-package benchmark rally --benchmark ec2_metrics-benchmark -v

Related issues

Screenshots

--- Benchmark results for package: aws - START ---
╭─────────────────────────────────────────────────────────────────────────────────────────────────╮
│ info                                                                                            │
├────────────────────────┬────────────────────────────────────────────────────────────────────────┤
│ benchmark              │                                                  ec2_metrics-benchmark │
│ description            │                        Benchmark 20000 aws.ec2_metrics events ingested │
│ run ID                 │                                   a8bdbffd-fb3c-4994-aa35-1b88818ad312 │
│ package                │                                                                    aws │
│ start ts (s)           │                                                             1699521510 │
│ end ts (s)             │                                                             1699521543 │
│ duration               │                                                                    33s │
│ generated corpora file │ /Users/andreaspacca/.elastic-package/tmp/rally_corpus/corpus-713170205 │
╰────────────────────────┴────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────╮
│ parameters                                                                │
├─────────────────────────────────┬─────────────────────────────────────────┤
│ package version                 │                                   2.8.5 │
│ data_stream.name                │                             ec2_metrics │
│ corpora.generator.total_events  │                                   20000 │
│ corpora.generator.template.path │ ./ec2_metrics-benchmark/template.ndjson │
│ corpora.generator.template.raw  │                                         │
│ corpora.generator.template.type │                                  gotext │
│ corpora.generator.config.path   │      ./ec2_metrics-benchmark/config.yml │
│ corpora.generator.config.raw    │                                   map[] │
│ corpora.generator.fields.path   │      ./ec2_metrics-benchmark/fields.yml │
│ corpora.generator.fields.raw    │                                   map[] │
╰─────────────────────────────────┴─────────────────────────────────────────╯
╭───────────────────────╮
│ cluster info          │
├───────┬───────────────┤
│ name  │ elasticsearch │
│ nodes │             1 │
╰───────┴───────────────╯
╭─────────────────────────────────────────────────────────╮
│ data stream stats                                       │
├────────────────────────────┬────────────────────────────┤
│ data stream                │ metrics-aws.ec2_metrics-ep │
│ approx total docs ingested │                      20000 │
│ backing indices            │                          1 │
│ store size bytes           │                   14742775 │
│ maximum ts (ms)            │              1699525110490 │
╰────────────────────────────┴────────────────────────────╯
╭───────────────────────────────────────╮
│ disk usage for index .ds-metrics-aws. │
│ ec2_metrics-ep-2023.11.09-000001 (for │
│ all fields)                           │
├──────────────────────────────┬────────┤
│ total                        │  14 MB │
│ inverted_index.total         │ 2.2 MB │
│ inverted_index.stored_fields │ 7.6 MB │
│ inverted_index.doc_values    │ 4.0 MB │
│ inverted_index.points        │ 794 kB │
│ inverted_index.norms         │    0 B │
│ inverted_index.term_vectors  │    0 B │
│ inverted_index.knn_vectors   │    0 B │
╰──────────────────────────────┴────────╯
╭───────────────────────────────────────────────────────────────────────────────────╮
│ pipeline metrics-aws.ec2_metrics-2.8.5 stats in node -u0THNwnRLeH1Qslb_aclw       │
├───────────────────────────────────────────┬───────────────────────────────────────┤
│ Totals                                    │ Count: 20000 | Failed: 0 | Time: 11ms │
│ pipeline (metrics-aws.ec2_metrics@custom) │  Count: 20000 | Failed: 0 | Time: 3ms │
╰───────────────────────────────────────────┴───────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────────────────────────────────╮
│ rally stats                                                                                │
├────────────────────────────────────────────────────────────────┬───────────────────────────┤
│ Cumulative indexing time of primary shards                     │   0.14781666666666665 min │
│ Min cumulative indexing time across primary shards             │                     0 min │
│ Median cumulative indexing time across primary shards          │ 0.0006166666666666666 min │
│ Max cumulative indexing time across primary shards             │   0.06888333333333334 min │
│ Cumulative indexing throttle time of primary shards            │                     0 min │
│ Min cumulative indexing throttle time across primary shards    │                     0 min │
│ Median cumulative indexing throttle time across primary shards │                     0 min │
│ Max cumulative indexing throttle time across primary shards    │                     0 min │
│ Cumulative merge time of primary shards                        │ 0.0015666666666666667 min │
│ Cumulative merge count of primary shards                       │                         3 │
│ Min cumulative merge time across primary shards                │                     0 min │
│ Median cumulative merge time across primary shards             │                     0 min │
│ Max cumulative merge time across primary shards                │               0.00055 min │
│ Cumulative merge throttle time of primary shards               │                     0 min │
│ Min cumulative merge throttle time across primary shards       │                     0 min │
│ Median cumulative merge throttle time across primary shards    │                     0 min │
│ Max cumulative merge throttle time across primary shards       │                     0 min │
│ Cumulative refresh time of primary shards                      │  0.028833333333333332 min │
│ Cumulative refresh count of primary shards                     │                       227 │
│ Min cumulative refresh time across primary shards              │                     0 min │
│ Median cumulative refresh time across primary shards           │               0.00025 min │
│ Max cumulative refresh time across primary shards              │  0.014616666666666667 min │
│ Cumulative flush time of primary shards                        │                0.0722 min │
│ Cumulative flush count of primary shards                       │                       120 │
│ Min cumulative flush time across primary shards                │                     0 min │
│ Median cumulative flush time across primary shards             │  0.002816666666666667 min │
│ Max cumulative flush time across primary shards                │  0.012016666666666667 min │
│ Total Young Gen GC time                                        │                    0.02 s │
│ Total Young Gen GC count                                       │                         5 │
│ Total Old Gen GC time                                          │                       0 s │
│ Total Old Gen GC count                                         │                         0 │
│ Store size                                                     │    0.03179096523672342 GB │
│ Translog size                                                  │    0.00132650975137949 GB │
│ Heap used for segments                                         │                      0 MB │
│ Heap used for doc values                                       │                      0 MB │
│ Heap used for terms                                            │                      0 MB │
│ Heap used for norms                                            │                      0 MB │
│ Heap used for points                                           │                      0 MB │
│ Heap used for stored fields                                    │                      0 MB │
│ Segment count                                                  │                       347 │
│ Total Ingest Pipeline count                                    │                     20032 │
│ Total Ingest Pipeline time                                     │                   0.642 s │
│ Total Ingest Pipeline failed                                   │                         0 │
│ Min Throughput                                                 │           24767.24 docs/s │
│ Mean Throughput                                                │           24767.24 docs/s │
│ Median Throughput                                              │           24767.24 docs/s │
│ Max Throughput                                                 │           24767.24 docs/s │
│ 50th percentile latency                                        │      717.6847919999982 ms │
│ 100th percentile latency                                       │      731.2604169999979 ms │
│ 50th percentile service time                                   │      717.6847919999982 ms │
│ 100th percentile service time                                  │      731.2604169999979 ms │
│ error rate                                                     │                    0.00 % │
╰────────────────────────────────────────────────────────────────┴───────────────────────────╯

--- Benchmark results for package: aws - END   ---
Done
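
A few derived figures may make the tables above easier to read. This is only a sketch in Python, not part of the PR: the inputs are copied from the output above, and the variable names are illustrative.

```python
# Values copied from the benchmark tables above.
start_ts_s = 1699521510
end_ts_s = 1699521543
total_docs = 20000
store_size_bytes = 14742775  # "store size bytes" in the data stream stats table
cumulative_indexing_min = 0.14781666666666665  # rally stats, in minutes

# Derived values.
duration_s = end_ts_s - start_ts_s
bytes_per_doc = store_size_bytes / total_docs
cumulative_indexing_s = cumulative_indexing_min * 60

print(duration_s)                       # 33
print(round(bytes_per_doc))             # 737
print(round(cumulative_indexing_s, 2))  # 8.87
```

The computed duration matches the 33s reported in the info table.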

@francescayeye francescayeye self-assigned this Nov 9, 2023
@francescayeye francescayeye requested review from a team as code owners November 9, 2023 11:23

elasticmachine commented Nov 9, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-11-15T05:09:19.031+0000

  • Duration: 75 min 15 sec

Test stats 🧪

Test Results
Failed 0
Passed 223
Skipped 3
Total 226

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.


elasticmachine commented Nov 10, 2023

🌐 Coverage report

| Name | Metrics % (covered/total) | Diff |
| --- | --- | --- |
| Packages | 100.0% (17/17) | 💚 |
| Files | 94.444% (17/18) | 👎 -1.723 |
| Classes | 94.444% (17/18) | 👎 -1.723 |
| Methods | 89.701% (270/301) | 👎 -2.556 |
| Lines | 86.083% (7571/8795) | 👎 -2.522 |
| Conditionals | 100.0% (0/0) | 💚 |

# no dimension: 2.5%, AutoScalingGroupName: 10%, ImageId: 5%, InstanceType: 2.5%, InstanceId: 80%
enum: ["", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "ImageId", "ImageId", "InstanceType", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId"]
cardinality: 600
# we want every single different "dimension identifier", regardless of its type, to have always the same generated fixed "metadata" once the cardinality kicks in

Contributor

i'm a little confused about this - isn't the cardinality of the enum fields equal to the number of unique values in the list? what does a cardinality of 600 mean for a field with 15 possible values?

Author

I wrote it in the comment :)
please let me know if it's clear:
we want every distinct "dimension identifier", regardless of its type, to always get the same fixed generated "metadata" once the cardinality kicks in. for this we take the enum lengths, ordered from highest to lowest, and append one by one the ones whose modulo with each other is not 0.
we start from the first two, multiply their values, and exclude from the ordered list the ones whose modulo against the result of the multiplication is 0. we end up with the list of enum lengths whose values, multiplied together, define the least common multiple: this is the value we must use for the cardinality of all fields.
in this case two enums remain: dimensionType (40) and region (15), resulting in cardinality 600

Contributor

sorry i'm still not following this...

it's 40 for dimensionType because it has 40 values in the list, even though there are only 5 possible values?

i think the way the cardinality applies across multiple fields is quite confusing, and there seem to be lots of 'hidden' things that make it work the way it does. is there a way to make it more explicit?

Contributor

@aspacca it is not clear. The LCM of 40 and 15 is 120.
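
For reference, the arithmetic can be checked with a minimal Python sketch (not part of the PR; the two lengths are the ones quoted in this thread). Multiplying the enum lengths gives 600, while their least common multiple is 120.

```python
import math

dimension_type_len = 40  # length of the dimensionType enum list, duplicates included
region_len = 15          # length of the region enum list, as stated in the thread

# Plain product of the two lengths.
product = dimension_type_len * region_len

# LCM(a, b) = a * b / GCD(a, b)
lcm = product // math.gcd(dimension_type_len, region_len)

print(product)  # 600
print(lcm)      # 120
```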

Author

> it's 40 for dimensionType because it has 40 values in the list, even though there are only 5 possible values?

indeed, the "size" is the length of the enum, even if some values are duplicated

> @aspacca it is not clear. The LCM of 40 and 15 is 120.

@MichaelKatsoulis , doh!, you're right! 👍
this might simplify a lot

so: how does it work?

when you have cardinality: N for a field, the generator generates N different values, using whatever random logic applies to each value according to the type of the field (i.e. int: rand, or rand in a range if present; enum: enumList[rand of enumSize]; etc.).

It generates the N values in order, one after the other: so event 1 will have value 1, event 2 value 2, etc.
once we arrive at event N+1, we start over with value 1, event N+2 will have value 2, etc.

if you have multiple fields with cardinality, and you want those fields to be linked (meaning that for field1/valueX you always want field2/valueY), you have to set the cardinality of each linked field to the LCM of their sizes.

what is their size? for an enum it is the length of the enum list, for an integer it is the range, etc.

where you don't have a fixed length (like text or an integer with a range), you don't consider it in the calculation of the LCM: you just decide how many different values you want to have, as long as the number fits the LCM of the fields with a size, and you set this as the cardinality.
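
The cycling behaviour described above can be illustrated with a simplified Python model. This is a sketch under assumptions, not the generator's actual code: the helper `partners` is hypothetical, and values are represented only by their position in each field's cycle (event `i` gets value `i % cardinality`).

```python
from collections import defaultdict

def partners(card_a, card_b, events):
    """For each value slot of field A, collect the value slots of
    field B that appear in the same event (hypothetical helper)."""
    seen = defaultdict(set)
    for event in range(events):
        # Each field cycles through its own values in order.
        seen[event % card_a].add(event % card_b)
    return seen

# Same cardinality on both fields: every value of A always meets
# the same value of B, so the fields are linked.
linked = partners(120, 120, 1000)

# Different cardinalities (40 and 15): the pairing drifts.
unlinked = partners(40, 15, 1000)

print(all(len(s) == 1 for s in linked.values()))  # True
print(max(len(s) for s in unlinked.values()))     # 3
```

With different cardinalities each value of the first field ends up paired with three different values of the second one, which is the drift that a shared cardinality avoids.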

I hope it is clearer now. :)

once you find my explanation clear enough I will update the docs :)

period: 60m # one hour
- name: dimensionType
# no dimension: 2.5%, AutoScalingGroupName: 10%, ImageId: 5%, InstanceType: 2.5%, InstanceId: 80%
enum: ["", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "ImageId", "ImageId", "InstanceType", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId"]

Contributor

it would be cool if we could weight enum lists without having to do repetition like this
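
As a rough illustration of that idea, a weighted mapping could be expanded into the repeated list programmatically. This is a Python sketch only: `expand_weighted_enum` is a hypothetical helper, not a feature of the generator, and the weights are chosen to match the percentages in the comment above, assuming the 40-entry list length stated in this thread.

```python
from collections import Counter

def expand_weighted_enum(weights):
    """Repeat each value `weight` times, mimicking the hand-written
    repetition in the enum list (hypothetical helper)."""
    values = []
    for value, weight in weights.items():
        values.extend([value] * weight)
    return values

# Weights out of 40, derived from the percentages in the comment.
weights = {
    "": 1,                      # no dimension: 2.5%
    "AutoScalingGroupName": 4,  # 10%
    "ImageId": 2,               # 5%
    "InstanceType": 1,          # 2.5%
    "InstanceId": 32,           # 80%
}

enum = expand_weighted_enum(weights)
shares = {value: 100 * count / len(enum) for value, count in Counter(enum).items()}

print(len(enum))             # 40
print(shares["InstanceId"])  # 80.0
```

Note that the length of the expanded list is still what would count as the "size" of the field in the LCM calculation discussed in this thread.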

@@ -0,0 +1,75 @@
- name: timestamp

Contributor

is it possible to just use the mappings defined in the package?

Author

no, mappings in the package are schema-c, we are dealing here with schema-b

@francescayeye
Author

@tommyers-elastic, @MichaelKatsoulis all good here? :)

I've applied the latest change related to LCM cardinality

@MichaelKatsoulis
Contributor

> @tommyers-elastic, @MichaelKatsoulis all good here? :)

All good for me. It is clearer now. Thank you !

@francescayeye
Author

@elastic/ecosystem I'd need your CR

@jsoriano
Member

> @elastic/ecosystem I'd need your CR

It should not be required as you are only modifying files in the AWS package, I think ecosystem only appears there because there were changes in go.mod before. You probably need a review from the owners of the AWS package, I don't have much to say about the rally files added 🙂

@jsoriano jsoriano requested review from a team and removed request for a team November 23, 2023 13:08
Contributor

@tommyers-elastic tommyers-elastic left a comment

approving - but i still find the implicit field linking via cardinality somewhat confusing - and the calculations change of course if we implement having weighted enum lists instead of repetition.

i wonder if some kind of field nesting for 'linked' fields would make this clearer?

@francescayeye francescayeye merged commit c209241 into elastic:main Nov 23, 2023
@francescayeye
Author

> i wonder if some kind of field nesting for 'linked' fields would make this clearer?

the problem is that the link might not be just a parent/child relation, and it would be hard to express anything else with a nested structure.

I will think about some alternatives; if you have any suggestions, they're welcome :)

6 participants