[aws] [cloudwatch_metrics] Map aws.dimensions field as object#11883
[aws] [cloudwatch_metrics] Map aws.dimensions field as object#11883zmoog merged 0 commit intoelastic:mainfrom
Conversation
🚀 Benchmarks reportTo see the full report comment with |
|
|
All For example, here is the # packages/aws/data_stream/ec2_metrics/fields/fields.yml
- name: aws
type: group
fields:
- name: dimensions
type: group
fields:
- name: AutoScalingGroupName
type: keyword
dimension: true
description: An Auto Scaling group is a collection of instances you define if you're using Auto Scaling.
- name: ImageId
type: keyword
dimension: true
description: This dimension filters the data you request for all instances running this Amazon EC2 Amazon Machine Image (AMI)
- name: InstanceId
type: keyword
dimension: true
description: Amazon EC2 instance ID
- name: InstanceType
type: keyword
dimension: true
description: This dimension filters the data you request for all instances running with this specified instance type.Since ALL metrics data streams use this mapping, it probably makes sense to use the |
tetianakravchenko
left a comment
There was a problem hiding this comment.
please add a change log and the version change
There was a problem hiding this comment.
should be added explicitly subobjects: false ? for custom dimension keys
There was a problem hiding this comment.
Yeah, but this requires a package-spec update, which needs several changes. We need to make these changes happen in a dedicated PR.
There was a problem hiding this comment.
ok, maybe add a comment with this information in the field definition as a reminder
There was a problem hiding this comment.
I added a task to upgrade the package-spec to (at least) 3.1.0.
There was a problem hiding this comment.
(it's simpler than expected: there were a lot of errors, but they are instances of the same class of error)
There was a problem hiding this comment.
The PR to bump the package-spec version to 3.3.0 is #11893
@tetianakravchenko, are you okay to bump package-spec to 3.3.0 and introduce subobjects: false in dedicated PRs?
|
Looks good to me. Could you add a section in the description on how to test this locally. A use case that would prove that this uniformity solves the problem of data correlation? |
There was a problem hiding this comment.
Isn't * the default? If so, this can be removed.
There was a problem hiding this comment.
If I am not looking in the wrong place, the object_type_mapping_type definition does not have a default value.
There was a problem hiding this comment.
What I meant is that I think you can remove object_type_mapping_type completely which will also match all mapping types, as no explicit subset of mapping types have been specified.
There was a problem hiding this comment.
I think we could remove the fingerprint in favor of mapping aws.dimensions.* as dimensions. If we decide to do that, we should apply some of the learnings from the compatibility issues introduced in the prometheus integration when we removed the fingerprint and also perform dedicated upgrade tests.
There was a problem hiding this comment.
The part that wasn't reverted is that labels are still mapped as dimensions and the fingerprint is not mapped as a dimension.
There was a problem hiding this comment.
There is a meta issue - #9910, I think such change should be part of this meta issue
Yeah, I'm also in favour of leaving this change to the meta issue #9910 and keep the PR focus to have the same aws.dimensions mapping across the AWS integrations.
WDYT @felixbarny ?
There was a problem hiding this comment.
I'd split this into two parts. What I think we should do in this PR is to do the same what we ended up doing with the Prometheus integration - map dimensions.* as TSDB dimensions, keep the fingerprint processing and field but not map it as a dimension. As a second step, in the context of #9910, we can then work on removing the fingerprinting entirely.
keep the PR focus to have the same aws.dimensions mapping across the AWS integrations.
Sounds like what I'm proposing would be aligned with that? However, I think we shouldn't add dimensions_fingerprint to the more specific aws integrations that previously didn't have that field.
There was a problem hiding this comment.
Ouch.
Thanks for the heads up, I'm restoring the fingerprint then.
There was a problem hiding this comment.
dimensions_fingerprint field restored!
There was a problem hiding this comment.
But you can remove mapping the fingerprint field as a dimension.
Since the new mapping will take place after the look-ahead time has passed, don't we also need to keep the dimensions_fingerprint as as well?
There was a problem hiding this comment.
Yes, we need to keep mapping the field. But we don't need to map it as a dimension, as all dimension.* fields are mapped as a dimension.
There was a problem hiding this comment.
- The current index will continue to map field using the "old" mappings
- when ES performs the rollover we'll get the "new" mappings.
- We need the pipeline to keep the old mapping run while look-ahead time passes.
Got it.
packages/aws/data_stream/cloudwatch_metrics/fields/package-fields.yml
Outdated
Show resolved
Hide resolved
9641abe to
7da35b0
Compare
I guess the main issue for users is (quoting @felixbarny) "[they] can’t build visualizations with flattened fields in Lens, because Kibana doesn’t recognize fields within flattened objects." @MichaelKatsoulis, I updated the "how to test this locally" section accordingly. |
felixbarny
left a comment
There was a problem hiding this comment.
LGTM!
Please create follow-ups for removing dimensions_fingerprint (#9910), and using subobjects: false for dimensions.*.
|
I tested the upgrade from AWS integration 2.36.2 to 2.37.0 (the unreleased changes from this PR) with the following steps:
More details on selected steps. Started sending 1 document every 5 secsI used the following shell script: sequence=0
while true
do
cat > metrics.json <<EOF
{
"@timestamp": "$(date '+%Y-%m-%dT%H:%M:%S%z')",
"aws": {
"dimensions": {
"name": "Maurizio Branca",
"AutoScalingGroupName": "whatever"
},
"metric": {
"cpu": 10,
"sequence": $sequence
}
}
}
EOF
((sequence++))
cat metrics.json | jq -c | es docs bulk -f - -i metrics-aws.cloudwatch_metrics-sdh5390
sleep 5
doneThe scripts sends a document like the following every 5 secs: {
"@timestamp": "2024-12-31T00:14:58+0100",
"aws": {
"dimensions": {
"name": "Maurizio Branca",
"AutoScalingGroupName": "whatever"
},
"metric": {
"cpu": 10,
"sequence": 270
}
}
}Waited for the rollout to take effectRight after the upgrade, Fleet/ES creates a new Old index {
".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000001": {
"settings": {
"index": {
"mapping": {
"total_fields": {
"limit": "1000",
"ignore_dynamic_beyond_limit": "true"
}
},
"hidden": "true",
"time_series": {
"end_time": "2024-12-30T23:05:37.000Z",
"start_time": "2024-12-30T20:35:37.000Z"
},New index {
".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000002": {
"settings": {
"index": {
"mapping": {
"total_fields": {
"limit": "1000",
"ignore_dynamic_beyond_limit": "true"
}
},
"hidden": "true",
"time_series": {
"end_time": "2024-12-30T23:35:37.000Z",
"start_time": "2024-12-30T23:05:37.000Z"
},Checked that the the data stream didn't lose any sequence numberAt |
|
The testing methodology looks great! Longer-term, I wish we had something similar to this for all integrations as automated upgrade tests. The only thing that should be required in an integration is to provide a templated sample document. The rest could be a generic infrastructure. WDYT @jsoriano? |
We have an open issue about testing integration upgrades: elastic/elastic-package#1831
Some integrations start dummy services with docker compose to generate events. But for this case we'd be missing the validation on the sequence numbers. |
tetianakravchenko
left a comment
There was a problem hiding this comment.
@zmoog please add changelog entry and change the version of the integration
|
@zmoog great approach to test this PR - #11883 (comment)! And thank you for sharing! |
ritalwar
left a comment
There was a problem hiding this comment.
LGTM! Just ensure the changelog entry and version update as suggested already.
|
@tetianakravchenko, changelog updated! |
|
💚 Build Succeeded
History
cc @zmoog |
|
Package aws - 2.37.0 containing this change is available at https://epr.elastic.co/package/aws/2.37.0/ |
…c#11883) Change the mapping type for the `aws.dimensions` field from `flattened` to `object`. Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead. We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration. If all data streams use the exact mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams. # Conflicts: # packages/aws/changelog.yml # packages/aws/manifest.yml
…rt of #11883) (#12237) Change the mapping type for the `aws.dimensions` field from `flattened` to `object`. Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead. We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration. If all data streams use the exact mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams. --------- Co-authored-by: muthu-mps <101238137+muthu-mps@users.noreply.github.com> Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
…c#11883) Change the mapping type for the `aws.dimensions` field from `flattened` to `object`. Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead. We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration. If all data streams use the exact mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams.
…rt of #11883) (#12237) Change the mapping type for the `aws.dimensions` field from `flattened` to `object`. Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead. We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration. If all data streams use the exact mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams. --------- Co-authored-by: muthu-mps <101238137+muthu-mps@users.noreply.github.com> Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
…c#11883) Change the mapping type for the `aws.dimensions` field from `flattened` to `object`. Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead. We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration. If all data streams use the exact mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams.





Proposed commit message
Change the mapping type for the
aws.dimensionsfield fromflattenedtoobject.Currently, all
*_metricsdata streams but one use theobjectmapping. Thecloudwatch_metricsdata stream uses theflattenedtype instead.We need to unify the mapping of
aws.dimensionsacross all metrics-related data streams in the AWS integration.If all data streams use the exact mapping for
aws.dimensions, users will be able to query and build a dashboard that correlates data across different data streams.Checklist
changelog.ymlfile.I have verified that any added dashboard complies with Kibana's Dashboard good practicesAuthor's Checklist
subobjects: falsesupport)How to test this PR locally
Check the component template
With 2.36.2, the component template should be something similar to:
Post the test document
Using the Dev Tools, post the following test document:
Try to visualize the metric in Kibana
If I try to visualize the CPU metric in Kibana, the dimension subfields in
aws.dimensions.*are not available:Upgrade the integration to this PR version
Bump the integration version (for example, 2.37.0), and build and update the integration:
elastic-package build && elastic-package stack up -d -v --services package-registryDelete the data stream and re-index the same test document
If we check the mapping, we should now see a dynamic template:
Re-try to visualize the metric in Kibana using the new mapping
With the updated mapping, we can now use
aws.dimensions.*fields in Kibana to break down values by a dimension:Related issues
aws.dimensions.*fields mapping #11806