regression: Telegraf 1.19.3 fails to start with couchbase 6.5.1 #9764
Closed
Description
Relevant telegraf.conf:
```toml
# Global tags can be specified here in key="value" format.
[global_tags]
  zone = "eu-central-1a"
  id = "couchbase-staging-1"
  environment = "staging"
  couchbase = "true"

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 15000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  logfile = "/var/log//telegraf/telegraf.log"
  quiet = false
  hostname = "couchbase-staging-1"
  omit_hostname = false

[[outputs.prometheus_client]]
  listen = ":9009"

[[inputs.couchbase]]
  servers = ["http://abc:xxx@127.0.0.1:8091/"]
```
System info:
- Debian GNU/Linux 9.13 (stretch)
- telegraf 1.19.3-1
Steps to reproduce:
- Deploy Couchbase 6.5.1 and create a monitoring user
- Deploy telegraf 1.19.3 with the above config
Expected behavior:
Telegraf collects metrics from Couchbase.
Actual behavior:
Telegraf enters a restart loop. With the couchbase input removed from the config, everything works.
No useful logs are written. When the process is traced with strace, it does not crash.
Under gdb, the crash looks like this:
```
Starting program: /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd0d0b700 (LWP 19560)]
[New Thread 0x7fffd050a700 (LWP 19561)]
[New Thread 0x7fffcfd09700 (LWP 19562)]
[New Thread 0x7fffcf508700 (LWP 19563)]
[New Thread 0x7fffced07700 (LWP 19564)]
[New Thread 0x7fffce506700 (LWP 19565)]
[New Thread 0x7fffcdb25700 (LWP 19566)]
[New Thread 0x7fffccfff700 (LWP 19567)]
2021-09-15T19:28:18Z I! Starting Telegraf 1.19.3
panic: runtime error: index out of range [-1]

goroutine 25 [running]:
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).addBucketFieldChecked(0xc0001ea080, 0xc000ca1c50, 0x4f29c9e, 0x9, 0x7d676b8, 0x0, 0x0, 0xffffffffffffffff)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:362 +0x116
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherDetailedBucketStats(0xc0001ea080, 0xc0000dc681, 0x32, 0xc0001fa5d0, 0x15, 0xc000ca1c50, 0x16, 0x0)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:285 +0x3cba
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherServer(0xc0001ea080, 0x596ec78, 0xc000634460, 0xc0000dc681, 0x32, 0xc000cdbcf8, 0x589c501, 0xc000b98140)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:111 +0xa18
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather.func1(0xc000afe030, 0x596ec78, 0xc000634460, 0xc0001ea080, 0xc0000dc681, 0x32)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:64 +0x91
created by github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather
	/go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:62 +0x10d
[Thread 0x7fffce506700 (LWP 19565) exited]
[Thread 0x7fffced07700 (LWP 19564) exited]
[Thread 0x7fffcf508700 (LWP 19563) exited]
[Thread 0x7fffcfd09700 (LWP 19562) exited]
[Thread 0x7fffd050a700 (LWP 19561) exited]
[Thread 0x7fffd0d0b700 (LWP 19560) exited]
[Thread 0x7ffff7fed700 (LWP 19557) exited]
[Thread 0x7fffcdb25700 (LWP 19566) exited]
[Inferior 1 (process 19557) exited with code 02]
(gdb)
```
Additional info:
Possibly related to #9416 and #9495, since version 1.19.x has already had several Couchbase-related problems.
Reverting to Telegraf 1.18.2 makes everything work again.
Output of `journalctl -u telegraf -f`:
```
set 15 19:13:10 couchbase-staging-1 telegraf[18332]: 2021-09-15T19:13:10Z I! Starting Telegraf 1.19.3
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Unit entered failed state.
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.
set 15 19:13:20 couchbase-staging-1 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:20 couchbase-staging-1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: time="2021-09-15T19:13:20Z" level=error msg="failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: time="2021-09-15T19:13:20Z" level=error msg="failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: 2021-09-15T19:13:20Z I! Starting Telegraf 1.19.3
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Unit entered failed state.
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.
set 15 19:13:30 couchbase-staging-1 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:30 couchbase-staging-1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: time="2021-09-15T19:13:30Z" level=error msg="failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: time="2021-09-15T19:13:30Z" level=error msg="failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: 2021-09-15T19:13:30Z I! Starting Telegraf 1.19.3
```
telegraf.log:
```
2021-09-15T19:22:20Z D! [agent] Successfully connected to outputs.prometheus_client
2021-09-15T19:22:20Z D! [agent] Starting service inputs
2021-09-15T19:22:30Z I! Loaded inputs: couchbase
2021-09-15T19:22:30Z I! Loaded aggregators:
2021-09-15T19:22:30Z I! Loaded processors:
2021-09-15T19:22:30Z I! Loaded outputs: prometheus_client
2021-09-15T19:22:30Z I! Tags enabled: couchbase=true environment=staging host=couchbase-staging-1 id=couchbase-staging-1 zone=eu-central-1a
2021-09-15T19:22:30Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"couchbase-staging-1", Flush Interval:10s
2021-09-15T19:22:30Z D! [agent] Initializing plugins
2021-09-15T19:22:30Z D! [agent] Connecting outputs
2021-09-15T19:22:30Z D! [agent] Attempting connection to [outputs.prometheus_client]
2021-09-15T19:22:30Z I! [outputs.prometheus_client] Listening on http://[::]:9009/metrics
2021-09-15T19:22:30Z D! [agent] Successfully connected to outputs.prometheus_client
2021-09-15T19:22:30Z D! [agent] Starting service inputs
```