Relevant telegraf.conf
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
logfile = "/var/log/telegraf/telegraf.log"
hostname = "SECRET"
omit_hostname = false
[[outputs.amqp]]
brokers = ["amqp://SECRET:5672/"]
exchange = "SECRET"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.suricata]]
source = "/tmp/stats.sock"
delimiter = "."
Logs from Telegraf
Feb 08 07:57:49 SECRET systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Feb 08 07:57:49 SECRET telegraf[2600218]: 2022-02-08T07:57:49Z I! Starting Telegraf 1.21.3
Feb 10 18:17:45 SECRET telegraf[2600218]: panic: runtime error: invalid memory address or nil pointer dereference
Feb 10 18:17:45 SECRET telegraf[2600218]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x30d880f]
Feb 10 18:17:45 SECRET telegraf[2600218]: goroutine 1008013 [running]:
Feb 10 18:17:45 SECRET telegraf[2600218]: github.com/influxdata/telegraf/plugins/outputs/amqp.(*AMQP).Write(0xc000889200, {0xc00541a000, 0x3e8, 0x3e8})
Feb 10 18:17:45 SECRET telegraf[2600218]: /go/src/github.com/influxdata/telegraf/plugins/outputs/amqp/amqp.go:263 +0x34f
Feb 10 18:17:45 SECRET telegraf[2600218]: github.com/influxdata/telegraf/models.(*RunningOutput).write(0xc0003d4c80, {0xc00541a000, 0x3e8, 0x3e8})
Feb 10 18:17:45 SECRET telegraf[2600218]: /go/src/github.com/influxdata/telegraf/models/running_output.go:244 +0x118
Feb 10 18:17:45 SECRET telegraf[2600218]: github.com/influxdata/telegraf/models.(*RunningOutput).WriteBatch(0xc0003d4c80)
Feb 10 18:17:45 SECRET telegraf[2600218]: /go/src/github.com/influxdata/telegraf/models/running_output.go:218 +0x58
Feb 10 18:17:45 SECRET telegraf[2600218]: github.com/influxdata/telegraf/agent.(*Agent).flushOnce.func1()
Feb 10 18:17:45 SECRET telegraf[2600218]: /go/src/github.com/influxdata/telegraf/agent/agent.go:829 +0x29
Feb 10 18:17:45 SECRET telegraf[2600218]: created by github.com/influxdata/telegraf/agent.(*Agent).flushOnce
Feb 10 18:17:45 SECRET telegraf[2600218]: /go/src/github.com/influxdata/telegraf/agent/agent.go:828 +0xb8
Feb 10 18:17:45 SECRET systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 10 18:17:45 SECRET systemd[1]: telegraf.service: Failed with result 'exit-code'.
Feb 10 18:17:45 SECRET systemd[1]: telegraf.service: Consumed 1h 42min 35.123s CPU time.
Feb 10 18:17:45 SECRET systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 1.
Feb 10 18:17:45 SECRET systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Feb 10 18:17:45 SECRET systemd[1]: telegraf.service: Consumed 1h 42min 35.123s CPU time.
Feb 10 18:17:45 SECRET systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Feb 10 18:17:45 SECRET telegraf[4149180]: 2022-02-10T18:17:45Z I! Starting Telegraf 1.21.3
Feb 10 18:18:00 SECRET systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
Feb 10 18:18:00 SECRET systemd[1]: telegraf.service: Failed with result 'exit-code'.
System info
Telegraf 1.21.3, Debian 11 Bullseye
Docker
No response
Steps to reproduce
- Update Telegraf from 1.21.2 to 1.21.3 via the official InfluxData repository
- Wait; the crash happens at random intervals
- Telegraf fails to restart, and the log says /tmp/stats.sock is already in use
...
Expected behavior
I would expect neither the crash (step 2) nor the failed restart (step 3) to happen.
Actual behavior
Telegraf crashes with a segmentation fault, and afterwards /tmp/stats.sock also seems to be broken.
If I remove the socket and restart Telegraf, it works again for some time, sometimes for days.
Additional info
Based on the crash output, I think PR #10360 is involved: we never saw this issue with 1.21.2, and it started appearing with 1.21.3. Line 263 of amqp.go is mentioned in the goroutine trace, and this matches #10360.