Problem
Instead of logging a config validation error (no paths setting) and exiting, Filebeat emits log messages like these, which should not appear at all for an invalid config:
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.logger": "crawler",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/beater.(*crawler).Start",
    "file.name": "beater/crawler.go",
    "file.line": 71
  },
  "message": "Loading Inputs: 1",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.logger": "crawler",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/beater.(*crawler).startInput",
    "file.name": "beater/crawler.go",
    "file.line": 117
  },
  "message": "starting input, keys present on the config: [filebeat.inputs.0.id filebeat.inputs.0.type]",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/beater.(*crawler).Stop",
    "file.name": "beater/crawler.go",
    "file.line": 155
  },
  "message": "Stopping Crawler",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/beater.(*crawler).Stop",
    "file.name": "beater/crawler.go",
    "file.line": 165
  },
  "message": "Stopping 0 inputs",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/beater.(*crawler).Stop",
    "file.name": "beater/crawler.go",
    "file.line": 185
  },
  "message": "Crawler stopped",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.logger": "registrar",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/registrar.(*Registrar).Stop",
    "file.name": "registrar/registrar.go",
    "file.line": 130
  },
  "message": "Stopping Registrar",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.logger": "registrar",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/registrar.(*Registrar).Run",
    "file.name": "registrar/registrar.go",
    "file.line": 138
  },
  "message": "Starting Registrar",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.logger": "registrar",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/registrar.(*Registrar).Run",
    "file.name": "registrar/registrar.go",
    "file.line": 166
  },
  "message": "Ending Registrar",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.logger": "registrar",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/registrar.(*Registrar).Run",
    "file.name": "registrar/registrar.go",
    "file.line": 167
  },
  "message": "Stopping Registrar",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2025-03-07T18:27:48.880+0100",
  "log.logger": "registrar",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/registrar.(*Registrar).Stop",
    "file.name": "registrar/registrar.go",
    "file.line": 135
  },
  "message": "Registrar stopped",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
Likely there is a race condition while shutting down Filebeat after an error during input creation.
Steps to reproduce
Because this is a concurrency issue, it is not trivial to reproduce.
I managed to reproduce it by running the Filebeat binary with an invalid config in a loop, using this script:
run.sh
#!/bin/bash
set -e

FILEBEAT_ROOT=/path/to/filebeat/folder
CONFIG_PATH=/path/to/config/file
ITERATIONS=$1

echo "Compiling..."
cd $FILEBEAT_ROOT
rm -f ./filebeat
go build
cd -

echo "Running $ITERATIONS times..."
for i in $(seq 1 $ITERATIONS);
do
  echo "Iteration $i>>>"
  $FILEBEAT_ROOT/filebeat run -e -c $CONFIG_PATH 2> output.json || echo "Failed as expected"
  echo "Iteration $i<<<"
  echo
done
echo "Test successfully finished"
with this configuration file (no paths setting in the input, which makes it invalid on purpose):
filebeat.yml
filebeat.inputs:
  - type: filestream
    id: my-filestream-id
path.data: "/path/to/data/directory"
output.console:
  enabled: true
I ran it for 10000 iterations using:
./run.sh 10000
Sometimes I hit a Filebeat process stuck in what looks like a deadlock: it cannot be interrupted, and only kill -9 stops it.
To collect the logs of the stuck process, first run kill -9 on the bash script and then on the stuck Filebeat. The logs of the most recent Filebeat process can then be found in output.json.
If the bug cannot be caught in 10000 iterations, increase the count.
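With a high iteration count it helps to flag a stuck run automatically instead of watching the loop. A sketch of an alternative loop body for run.sh, assuming GNU coreutils timeout is available (the 30-second limit and the stuck-$i.json name are arbitrary choices, not part of the original script):

```shell
# Replacement for the filebeat line in run.sh: a normal invalid-config run
# exits almost immediately, so an iteration still alive after 30 seconds is
# very likely the deadlock. timeout kills it with SIGKILL and exits 137.
if timeout --signal=KILL 30 \
    "$FILEBEAT_ROOT/filebeat" run -e -c "$CONFIG_PATH" 2> output.json; then
  echo "Iteration $i: unexpected clean exit"
elif [ $? -eq 137 ]; then
  cp output.json "stuck-$i.json"   # preserve the stuck run's logs
  echo "Iteration $i: STUCK, logs saved to stuck-$i.json"
else
  echo "Iteration $i: failed as expected"
fi
```

Because the failing invocation sits in an if condition, this also plays nicely with the script's set -e.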
After the issue is found and fixed, PR #42780, which skips the test that detected this issue, needs to be reverted.