Rally benchmark aws.ec2_metrics#8442
| # no dimension: 2.5%, AutoScalingGroupName: 10%, ImageId: 5%, InstanceType: 2.5%, InstanceId: 80% | ||
| enum: ["", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "ImageId", "ImageId", "InstanceType", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId"] | ||
| cardinality: 600 | ||
| # we want every single different "dimension identifier", regardless of its type, to have always the same generated fixed "metadata" once the cardinality kicks in |
i'm a little confused about this - isn't the cardinality of the enum fields equal to the number of unique values in the list? what does a cardinality of 600 mean for a field with 15 possible values?
I wrote in the comment :)
please let me know if it's clear:
we want every distinct "dimension identifier", regardless of its type, to always have the same fixed generated "metadata" once the cardinality kicks in. for this we must take the enum lengths ordered from highest to lowest, appending one by one the ones that do not have a modulo of 0 with each other.
we start from the first two, multiply their values, and exclude from the ordered list the ones that have a modulo of 0 against the result of the multiplication. we end up with the list of enum lengths whose values, multiplied, define the least common multiple: this is the value we must use for the cardinality of all fields.
in this case the remaining enums are two: dimensionType (40) and region (15), resulting in cardinality 600
sorry i'm still not following this...
it's 40 for dimensionType because it has 40 values in the list, even though there are only 5 possible values?
i think the way the cardinality applies across multiple fields is quite confusing, and there seem to be lots of 'hidden' things that make it work the way it does. is there a way to make it more explicit?
@aspacca it is not clear. The LCM of 40 and 15 is 120.
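For reference, the two numbers in this thread relate as follows: 600 is the plain product of the two enum lengths, while 120 is that product divided by their greatest common divisor. A quick check in Python (the variable names are illustrative only):

```python
import math

dimension_type_len = 40  # length of the dimensionType enum list, duplicates included
region_len = 15          # length of the region enum list, as stated in the thread

product = dimension_type_len * region_len       # plain multiplication of the lengths
gcd = math.gcd(dimension_type_len, region_len)  # greatest common divisor
lcm = product // gcd                            # least common multiple

print(product, gcd, lcm)  # prints: 600 5 120
```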
> it's 40 for dimensionType because it has 40 values in the list, even though there are only 5 possible values?
indeed, the "size" is the length of the enum, even if some values are duplicated
> @aspacca it is not clear. The LCM of 40 and 15 is 120.
@MichaelKatsoulis , doh!, you're right! 👍
this might simplify a lot
so: how does it work?
when you have cardinality: N for a field, the generator generates N different values, using whatever random logic applies to each value according to the type of the field (e.g. int: rand, or rand in a range if present; enum: enumList[rand of enumSize]; etc).
It generates the N values in order, one after the other: so event 1 will have value 1, event 2 value 2, etc.
once we arrive at event N+1, we start over with value 1; event N+2 will have value 2, etc.
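The cycling behaviour described above can be sketched as follows. This is a toy model in Python, not the generator's actual code; `cycling_field`, `make_value` and `seed` are hypothetical names chosen for the illustration.

```python
import random
from itertools import islice


def cycling_field(make_value, cardinality, seed=0):
    """Yield values for one field: N pre-generated values, repeated in order."""
    rng = random.Random(seed)
    # Generate the N values up front, using whatever random logic fits the field type.
    pool = [make_value(rng) for _ in range(cardinality)]
    event = 0
    while True:
        # Event N+1 wraps around to the first value, event N+2 to the second, and so on.
        yield pool[event % cardinality]
        event += 1


# An integer field with cardinality 3, observed over 7 events.
values = list(islice(cycling_field(lambda rng: rng.randint(0, 10**9), 3), 7))
print(values)
```

With a cardinality of 3, events 1, 4 and 7 carry the same value, as do events 2 and 5.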
if you have multiple fields with cardinality, and you want those fields to be linked (meaning that for field1/valueX you always want field2/valueY), you have to set the cardinality of each linked field to the LCM of their sizes.
what is their size? for an enum it is the length of the enum list, for an integer it is the range, etc.
where you don't have a fixed length (like text, or an integer without a range), you don't consider the field in the LCM calculation: you just decide how many different values you want to have, as long as the number fits the LCM of the fields with a size, and you set this as the cardinality.
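A minimal sketch of the linking rule, assuming the simplified model from the explanation above (each field pre-generates N values and cycles through them). It is not the generator's implementation. The region values are placeholders, since the real region list is not shown in this thread, and `make_pool` and `event` are hypothetical helpers. `math.lcm` requires Python 3.9 or later.

```python
import math
import random

# dimensionType enum as in the diff: 40 entries, duplicates included.
dimension_types = (
    [""]
    + ["AutoScalingGroupName"] * 4
    + ["ImageId"] * 2
    + ["InstanceType"]
    + ["InstanceId"] * 32
)
# Placeholder region values (assumption): only the list length, 15, comes from the thread.
regions = [f"region-{i}" for i in range(15)]

# Shared cardinality for the linked fields: LCM of their sizes.
cardinality = math.lcm(len(dimension_types), len(regions))


def make_pool(choices, n, rng):
    """Pre-generate n values for an enum field by random picks from its list."""
    return [rng.choice(choices) for _ in range(n)]


rng = random.Random(42)
dim_pool = make_pool(dimension_types, cardinality, rng)
region_pool = make_pool(regions, cardinality, rng)


def event(i):
    """Return the (dimensionType, region) pair for event number i (0-based)."""
    slot = i % cardinality
    return dim_pool[slot], region_pool[slot]


print(cardinality)                        # prints: 120
print(event(5) == event(5 + cardinality)) # prints: True
```

Because both fields share the same cardinality, slot k of one field always co-occurs with slot k of the other, which is the "linked" behaviour described above.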
I hope it is clearer now. :)
once you find my explanation clear enough, I will update the docs :)
| period: 60m # one hour | ||
| - name: dimensionType | ||
| # no dimension: 2.5%, AutoScalingGroupName: 10%, ImageId: 5%, InstanceType: 2.5%, InstanceId: 80% | ||
| enum: ["", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "ImageId", "ImageId", "InstanceType", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId"] |
it would be cool if we could weight enum lists without having to do repetition like this
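To illustrate the idea, here is a sketch of how a weights mapping could stand in for the repeated list. This is hypothetical: nothing in this thread says the generator supports weighted enums, and `weights` and `expand` are made-up names. The weights mirror the repetition counts in the diff (1, 4, 2, 1 and 32 out of 40).

```python
import random

# Hypothetical weighted form of the dimensionType enum (counts out of 40).
weights = {
    "": 1,                      # no dimension: 2.5%
    "AutoScalingGroupName": 4,  # 10%
    "ImageId": 2,               # 5%
    "InstanceType": 1,          # 2.5%
    "InstanceId": 32,           # 80%
}


def expand(weights):
    """Expand a weights mapping into the equivalent repeated enum list."""
    return [value for value, count in weights.items() for _ in range(count)]


expanded = expand(weights)
print(len(expanded))  # prints: 40

# Alternatively, draw directly with weights instead of expanding the list.
rng = random.Random(0)
sample = rng.choices(list(weights), weights=list(weights.values()), k=5)
print(sample)
```

Note that with weights, the "size" used in the LCM calculation would need a new definition, since it currently relies on the length of the repeated list.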
@@ -0,0 +1,75 @@
- name: timestamp
is it possible to just use the mappings defined in the package?
no, mappings in the package are schema-c, we are dealing here with schema-b
@tommyers-elastic, @MichaelKatsoulis all good here? :) I've applied the latest change related to LCM cardinality
All good for me. It is clearer now. Thank you!
@elastic/ecosystem I'd need your CR
It should not be required as you are only modifying files in the AWS package, I think ecosystem only appears there because there were changes in go.mod before. You probably need a review from the owners of the AWS package, I don't have much to say about the rally files added 🙂
tommyers-elastic left a comment
approving - but i still find the implicit field linking via cardinality somewhat confusing - and the calculations change of course if we implement having weighted enum lists instead of repetition.
i wonder if some kind of field nesting for 'linked' fields would make this clearer?
the problem is that the link might not be just a parent/child relation, and it would be hard to express anything else with a nested structure. I will think about some alternatives; if you have any suggestions, you're welcome :)
Enhancement
Proposed commit message
Add artifacts for elastic-package rally benchmark
Checklist
- [ ] I have added an entry to my package's changelog.yml file.
- [ ] I have verified that Kibana version constraints are current according to guidelines.
Author's Checklist
How to test this PR locally
From aws package root (remember to bring up the elastic-package stack before):
./elastic-package benchmark rally --benchmark ec2metrics-benchmark -v
Related issues
Screenshots