Relevant telegraf.conf
```toml
# Not specific to this config: the bug is in the plugin code itself.
[[inputs.cloudwatch]]
  region = "eu-central-1"
  access_key = "<redacted>"
  secret_key = "<redacted>"
  period = "1m"
  delay = "30m"
  interval = "5m"
  namespaces = ["AWS/NATGateway"]
  statistic_include = ["sum"]
  [[inputs.cloudwatch.metrics]]
    names = ["BytesInFromDestination"]
    [[inputs.cloudwatch.metrics.dimensions]]
      name = "NatGatewayId"
      value = "*"

[[outputs.file]]
  files = ["stdout"]
  data_format = "json"
```
System info
macOS 12.0.1; go version go1.17.2 darwin/amd64; telegraf 1.20.4 (and master)
Docker
No response
Steps to reproduce
- With multiple NAT Gateways deployed on AWS, run telegraf using the config above:

```sh
telegraf --config telegraf.conf --input-filter cloudwatch
```
Expected behavior
I can see the BytesInFromDestination metric, with its own value, for each NAT Gateway.
Actual behavior
The BytesInFromDestination metric has the same value for every NAT Gateway.
Additional info
I did some digging and found that this bug is in the getDataQueries method of the CloudWatch struct. It iterates over all filteredMetrics and stores the address of each metric (the range loop variable) in the dataQueries map. However, Go (before 1.22) reuses a single loop variable for every iteration of a range loop, so every pointer taken this way refers to the same memory location. As a result, once the loop has finished, the Metric field of every query holds the same pointer, and therefore the same metric: the last one iterated. The fix is simple, and I will submit it once this issue is filed: copy the metric into a new variable inside the loop body so that each iteration gets its own allocation, then take the address of that copy.
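To illustrate the mechanism, here is a minimal standalone Go sketch. The Metric type and the values nat-a and nat-b are hypothetical stand-ins, not the AWS SDK types. collectAliased declares one variable outside the loop, which is effectively what `for _, metric := range ...` does before Go 1.22; collectCopied declares a fresh variable on every iteration, which is the kind of copy the fix needs. Writing the shared variable out explicitly makes the behavior the same on every Go version.

```go
package main

import "fmt"

// Metric is a hypothetical stand-in for the AWS SDK's types.Metric.
type Metric struct {
	Name  string
	Value string // e.g. the NatGatewayId dimension value
}

// collectAliased spells out the pre-Go 1.22 range-loop semantics:
// a single variable m is declared once and overwritten on every iteration,
// so every stored pointer refers to the same memory.
func collectAliased(src []Metric) []*Metric {
	var ptrs []*Metric
	var m Metric // one variable shared by all iterations
	for i := range src {
		m = src[i]
		ptrs = append(ptrs, &m)
	}
	return ptrs
}

// collectCopied declares a new variable inside the loop body (the same idea
// as the common `metric := metric` idiom), so each pointer is distinct.
func collectCopied(src []Metric) []*Metric {
	var ptrs []*Metric
	for i := range src {
		m := src[i] // fresh variable per iteration
		ptrs = append(ptrs, &m)
	}
	return ptrs
}

func main() {
	src := []Metric{
		{Name: "BytesInFromDestination", Value: "nat-a"},
		{Name: "BytesInFromDestination", Value: "nat-b"},
	}
	for _, p := range collectAliased(src) {
		fmt.Println("aliased:", p.Value)
	}
	for _, p := range collectCopied(src) {
		fmt.Println("copied:", p.Value)
	}
}
```

Running it prints nat-b twice for the aliased version (every pointer sees the last value) and nat-a, then nat-b for the copied version, which matches the symptom described above.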
This is one of the places where the address is taken (line 480):
```go
dataQueries[*metric.Namespace] = append(dataQueries[*metric.Namespace], types.MetricDataQuery{
	Id:    aws.String("average_" + id),
	Label: aws.String(snakeCase(*metric.MetricName + "_average")),
	MetricStat: &types.MetricStat{
		// Address of the range loop variable: before Go 1.22 this is the same pointer on every iteration.
		Metric: &metric,
		Period: aws.Int32(int32(time.Duration(c.Period).Seconds())),
		Stat:   aws.String(StatisticAverage),
	},
})
```
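A sketch of what the fixed loop could look like, under simplifying assumptions: the Metric, MetricStat and MetricDataQuery types below are hypothetical reductions of types.Metric, types.MetricStat and types.MetricDataQuery from aws-sdk-go-v2, buildQueries is a hypothetical name, and only the sum statistic from the config above is handled. The `metric := metric` line shadows the loop variable with a per-iteration copy, so each query's MetricStat.Metric points at its own metric. This is an illustration of the approach, not the actual telegraf patch.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical, simplified stand-ins for the aws-sdk-go-v2 CloudWatch types.
type Metric struct {
	Namespace  string
	MetricName string
	Dimension  string // simplified: a single NatGatewayId value
}

type MetricStat struct {
	Metric *Metric
	Period int32
	Stat   string
}

type MetricDataQuery struct {
	Id         string
	MetricStat *MetricStat
}

// buildQueries mirrors the shape of getDataQueries for the "sum" statistic only.
func buildQueries(metrics []Metric, period int32) map[string][]MetricDataQuery {
	queries := make(map[string][]MetricDataQuery)
	for j, metric := range metrics {
		metric := metric // per-iteration copy, so &metric is unique (needed before Go 1.22)
		id := "sum_" + strconv.Itoa(j)
		queries[metric.Namespace] = append(queries[metric.Namespace], MetricDataQuery{
			Id: id,
			MetricStat: &MetricStat{
				Metric: &metric,
				Period: period,
				Stat:   "Sum",
			},
		})
	}
	return queries
}

func main() {
	metrics := []Metric{
		{Namespace: "AWS/NATGateway", MetricName: "BytesInFromDestination", Dimension: "nat-a"},
		{Namespace: "AWS/NATGateway", MetricName: "BytesInFromDestination", Dimension: "nat-b"},
	}
	for _, q := range buildQueries(metrics, 60)["AWS/NATGateway"] {
		fmt.Println(q.Id, q.MetricStat.Metric.Dimension)
	}
}
```

An equivalent alternative is to take the address of the slice element, `&metrics[j]`, which avoids the copy but ties the pointer to the slice's backing array. Since Go 1.22 each iteration gets its own loop variable, so the explicit copy becomes redundant there, though still harmless.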
The weird thing is that this should affect everyone who uses the plugin with more than one dimension per metric. Why didn't this show up earlier?
Code reference: telegraf/plugins/inputs/cloudwatch/cloudwatch.go, lines 476 to 484 in commit 34ad5aa.