Background
Starting in 8.6, the default global processors for Beats run by agent are configured in code instead of being read from the default Beat configuration file. Beats managed by agent no longer read a configuration file at startup; instead, they wait for agent to send their initial configuration. This change was made in #34149.
The implementation from #34149 makes the processors global by configuring them for each input run by Elastic Agent. For example, the beat-rendered-config.yml file available in the agent diagnostics shows the processors at the input level.
This is in contrast to the default configuration file, beats/x-pack/filebeat/filebeat.yml (lines 167 to 172 in 91906c9), which defines a single instance of each processor for the whole process by declaring them at the top level of the configuration (see Where are processors valid for details).
Problem
Similarly, in 8.6 the aws-s3 input was changed in #33658 to create a new beat.Client for each new SQS worker to improve performance. This results in a new input pipeline being constructed for each SQS worker, and as of 8.6 each of those pipelines gets its own instance of the per-input processors.
This was not a problem until 8.7, when it was discovered that every beat input pipeline was referencing an accidentally global instance of the per-input processors. This was fixed in #34761, and as a result each input pipeline now constructs a new instance of the global processors.
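To make the scaling concrete, here is a minimal sketch of one client per SQS worker, where each new pipeline builds its own copy of the global processors. Everything here is hypothetical: processor, pipeline, connect and newGlobalProcessors are illustrative stand-ins, not libbeat APIs, and the two processor names are an illustrative subset rather than the exact default list. The point it shows is that construction work grows as workers × global processors.

```go
package main

import "fmt"

// processor is a hypothetical minimal interface, not libbeat's real one.
type processor interface {
	Name() string
}

type namedProcessor struct{ name string }

func (p *namedProcessor) Name() string { return p.name }

// constructed counts processor constructions across all pipelines.
var constructed int

// newGlobalProcessors is a hypothetical factory for the default global
// processors (an illustrative subset, not the exact default list).
func newGlobalProcessors() []processor {
	names := []string{"add_host_metadata", "add_cloud_metadata"}
	ps := make([]processor, 0, len(names))
	for _, n := range names {
		constructed++ // every call pays the full construction cost again
		ps = append(ps, &namedProcessor{name: n})
	}
	return ps
}

// pipeline is a hypothetical stand-in for the input pipeline behind a client.
type pipeline struct {
	processors []processor
}

// connect mimics one SQS worker creating its own client: each call builds a
// fresh pipeline, and with it fresh copies of the global processors.
func connect() *pipeline {
	return &pipeline{processors: newGlobalProcessors()}
}

func main() {
	const sqsWorkers = 5
	for i := 0; i < sqsWorkers; i++ {
		connect()
	}
	fmt.Println("processor instances constructed:", constructed) // 10
}
```

With 5 workers and 2 global processors the sketch constructs 10 processor instances; adding workers or inputs multiplies the expensive setup rather than amortizing it.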
Each of these global processors is expensive to create and includes code that tries to perform expensive work only once, at initialization time. This only works if there is a single instance of the processor; otherwise each instance repeats the initialization, often performing the exact same requests multiple times. For example, in add_cloud_metadata there was an inadvertent change that introduces a 3s worst-case construction cost ([libbeat] add_cloud_metadata - startup blocked by AWS IMSDv2 token fetch #33058).
Impact
As of 8.7 we are observing extremely high CPU usage for Beats run under agent, and for the agent itself, in situations where inputs are frequently created. For example, in the case of the add_cloud_metadata processor we are observing the agent itself being spammed by repeated log messages from the add_cloud_metadata initialization sequence:
{"log.level":"info","@timestamp":"2023-04-01T00:14:59.605Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:14:59.715Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:14:59.716Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:14:59.824Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:15:00.062Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:15:00.194Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:15:00.282Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:15:00.394Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:15:00.477Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:15:00.542Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-01T00:15:00.629Z","message":"add_cloud_metadata: hosting provider type detected as aws, metadata={\"cloud\":{\"account\":{\"id\":\"REDACTED\"},\"availability_zone\":\"us-west-2d\",\"image\":{\"id\":\"ami-REDACTED\"},\"instance\":{\"id\":\"i-REDACTED\"},\"machine\":{\"type\":\"r6g.xlarge\"},\"provider\":\"aws\",\"region\":\"us-west-2\",\"service\":{\"name\":\"EC2\"}}}","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"aws-s3-default","type":"aws-s3"},"log":{"source":"aws-s3-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"add_cloud_metadata","log.origin":{"file.line":106,"file.name":"add_cloud_metadata/add_cloud_metadata.go"},"ecs.version":"1.6.0"}
This comes from the initialization code in beats/libbeat/processors/add_cloud_metadata/add_cloud_metadata.go (lines 98 to 112 in 91906c9), which includes a sync.Once block that is being defeated by a new instance of the processor being created for each individual input.
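The reason sync.Once does not help can be sketched in a few lines: when the Once is a field of the processor, it only deduplicates work within that one instance. The type below is a hypothetical, heavily simplified stand-in, not the real add_cloud_metadata code; cloudMetadata, initialize and the detections counter are illustrative names, and triggering initialization on the first event is a simplification.

```go
package main

import (
	"fmt"
	"sync"
)

// detections counts how often the expensive detection really ran.
// A plain int is enough because this demo runs on a single goroutine.
var detections int

// cloudMetadata is a hypothetical stand-in for a processor like
// add_cloud_metadata. The sync.Once is a field of the instance, so it
// only deduplicates work within that one instance.
type cloudMetadata struct {
	initOnce sync.Once
	provider string
}

func newCloudMetadata() *cloudMetadata { return &cloudMetadata{} }

// initialize runs the expensive detection at most once per instance.
func (p *cloudMetadata) initialize() {
	p.initOnce.Do(func() {
		detections++
		p.provider = "aws" // pretend a metadata endpoint was queried here
	})
}

// Run is the per-event hook; the first call on an instance triggers detection.
func (p *cloudMetadata) Run(event map[string]string) map[string]string {
	p.initialize()
	event["cloud.provider"] = p.provider
	return event
}

func main() {
	// One shared instance used by three inputs: detection runs once.
	shared := newCloudMetadata()
	for i := 0; i < 3; i++ {
		shared.Run(map[string]string{})
	}
	fmt.Println("shared instance detections:", detections) // 1

	// One instance per input: each new Once is fresh, so detection runs three times.
	detections = 0
	for i := 0; i < 3; i++ {
		newCloudMetadata().Run(map[string]string{})
	}
	fmt.Println("per-input instance detections:", detections) // 3
}
```

Every fresh instance carries a fresh sync.Once, so "initialize once" silently becomes "initialize once per input", which matches the repeated detection log lines above.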
Solution
When Beats run under agent, we need to create the default global processors once at the Beat process level, instead of at the input level, to match what is done in the global configuration file.
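A minimal sketch of that direction, assuming a hypothetical connect function that receives already-built global processors instead of constructing them: the global processors are created once at process start, and every input pipeline holds the same instances alongside its own per-input processors. None of these names are libbeat APIs, and the ordering (per-input first, then global) is arbitrary here.

```go
package main

import "fmt"

// processor is a hypothetical minimal interface, not libbeat's real one.
type processor interface {
	Name() string
}

type namedProcessor struct{ name string }

func (p *namedProcessor) Name() string { return p.name }

// constructed counts how many processor instances were built.
var constructed int

func newProcessor(name string) processor {
	constructed++
	return &namedProcessor{name: name}
}

// pipeline is a hypothetical stand-in for one input pipeline.
type pipeline struct {
	processors []processor
}

// connect builds a pipeline for one input. The global processors are passed
// in instead of being constructed here, so all pipelines share the same
// instances; only the input's own processors are new. The ordering
// (per-input first, then global) is arbitrary for this illustration.
func connect(global, perInput []processor) *pipeline {
	ps := make([]processor, 0, len(perInput)+len(global))
	ps = append(ps, perInput...)
	ps = append(ps, global...)
	return &pipeline{processors: ps}
}

func main() {
	// Built once, when the Beat process starts, not once per input.
	global := []processor{
		newProcessor("add_host_metadata"),
		newProcessor("add_cloud_metadata"),
	}

	var pipelines []*pipeline
	for i := 0; i < 5; i++ {
		pipelines = append(pipelines, connect(global, nil))
	}

	fmt.Println("global processor instances constructed:", constructed) // 2
	fmt.Println("first and last pipeline share add_cloud_metadata:",
		pipelines[0].processors[1] == pipelines[4].processors[1]) // true
}
```

Construction cost stays constant no matter how many inputs or SQS workers are created. Sharing one instance across pipelines assumes the global processors are safe for concurrent use, which a single top-level instance serving every input would require as well.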