regression: Telegraf 1.19.3 fails to start with couchbase 6.5.1 #9764

@danielmotaleite

Relevant telegraf.conf:

# Global tags can be specified here in key="value" format.
[global_tags]
  zone = "eu-central-1a"
  id = "couchbase-staging-1"
  environment = "staging"
  couchbase = "true"

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 15000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  logfile = "/var/log//telegraf/telegraf.log"
  quiet = false
  hostname = "couchbase-staging-1"
  omit_hostname = false

[[outputs.prometheus_client]]
  listen = ":9009"

[[inputs.couchbase]]
  servers = ["http://abc:xxx@127.0.0.1:8091/"]

System info:

Debian GNU/Linux 9.13 (stretch)
telegraf 1.19.3-1

Steps to reproduce:

  • Deploy Couchbase 6.5.1 and create a monitoring user
  • Deploy Telegraf 1.19.3 with the config above

Expected behavior:

Telegraf collects metrics from Couchbase.

Actual behavior:

Telegraf enters a restart loop. With the couchbase input removed from the Telegraf config, everything works.
No useful logs show up, and attaching strace to the process keeps it from crashing.

gdb shows this:

Starting program: /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd0d0b700 (LWP 19560)]
[New Thread 0x7fffd050a700 (LWP 19561)]
[New Thread 0x7fffcfd09700 (LWP 19562)]
[New Thread 0x7fffcf508700 (LWP 19563)]
[New Thread 0x7fffced07700 (LWP 19564)]
[New Thread 0x7fffce506700 (LWP 19565)]
[New Thread 0x7fffcdb25700 (LWP 19566)]
[New Thread 0x7fffccfff700 (LWP 19567)]
2021-09-15T19:28:18Z I! Starting Telegraf 1.19.3
panic: runtime error: index out of range [-1]

goroutine 25 [running]:
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).addBucketFieldChecked(0xc0001ea080, 0xc000ca1c50, 0x4f29c9e, 0x9, 0x7d676b8, 0x0, 0x0, 0xffffffffffffffff)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:362 +0x116
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherDetailedBucketStats(0xc0001ea080, 0xc0000dc681, 0x32, 0xc0001fa5d0, 0x15, 0xc000ca1c50, 0x16, 0x0)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:285 +0x3cba
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherServer(0xc0001ea080, 0x596ec78, 0xc000634460, 0xc0000dc681, 0x32, 0xc000cdbcf8, 0x589c501, 0xc000b98140)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:111 +0xa18
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather.func1(0xc000afe030, 0x596ec78, 0xc000634460, 0xc0001ea080, 0xc0000dc681, 0x32)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:64 +0x91
created by github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:62 +0x10d
[Thread 0x7fffce506700 (LWP 19565) exited]
[Thread 0x7fffced07700 (LWP 19564) exited]
[Thread 0x7fffcf508700 (LWP 19563) exited]
[Thread 0x7fffcfd09700 (LWP 19562) exited]
[Thread 0x7fffd050a700 (LWP 19561) exited]
[Thread 0x7fffd0d0b700 (LWP 19560) exited]
[Thread 0x7ffff7fed700 (LWP 19557) exited]
[Thread 0x7fffcdb25700 (LWP 19566) exited]
[Inferior 1 (process 19557) exited with code 02]
(gdb) 
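
The panic message reports index -1, and the slice arguments passed to addBucketFieldChecked in the trace appear to be a nil pointer with length 0, which is what taking the last element of an empty slice via len(values)-1 would produce. The following is a minimal sketch of that failure mode and one possible guard, not the plugin's actual code: lastSample, addFieldChecked, and the stat names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// lastSample returns the most recent sample in values and true,
// or 0 and false when the slice is empty. Indexing values[len(values)-1]
// without this check panics with "index out of range [-1]" on an empty slice.
func lastSample(values []float64) (float64, bool) {
	if len(values) == 0 {
		return 0, false
	}
	return values[len(values)-1], true
}

// addFieldChecked is a hypothetical stand-in for a helper that copies the
// latest sample of a stat into a fields map. It skips stats with no samples.
func addFieldChecked(fields map[string]interface{}, key string, values []float64) {
	if v, ok := lastSample(values); ok {
		fields[key] = v
	}
}

func main() {
	fields := map[string]interface{}{}
	addFieldChecked(fields, "ops", []float64{1, 2, 3}) // stat with samples
	addFieldChecked(fields, "missing_stat", nil)       // stat the server did not return
	fmt.Println(fields)                                // prints: map[ops:3]
}
```

If this reading is right, a stat that Couchbase 6.5.1 does not return would be skipped instead of crashing the whole agent.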

Additional info:

Possibly related to the similar issues #9416 and #9495, since the 1.19.x releases have already had several problems with Couchbase.
After reverting to Telegraf 1.18.2, everything works.

journalctl -u telegraf -f

set 15 19:13:10 couchbase-staging-1 telegraf[18332]: 2021-09-15T19:13:10Z I! Starting Telegraf 1.19.3



set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Unit entered failed state.
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.
set 15 19:13:20 couchbase-staging-1 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:20 couchbase-staging-1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: time="2021-09-15T19:13:20Z" level=error msg="failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: time="2021-09-15T19:13:20Z" level=error msg="failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: 2021-09-15T19:13:20Z I! Starting Telegraf 1.19.3




set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Unit entered failed state.
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.
set 15 19:13:30 couchbase-staging-1 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:30 couchbase-staging-1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: time="2021-09-15T19:13:30Z" level=error msg="failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: time="2021-09-15T19:13:30Z" level=error msg="failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: 2021-09-15T19:13:30Z I! Starting Telegraf 1.19.3

telegraf.log

2021-09-15T19:22:20Z D! [agent] Successfully connected to outputs.prometheus_client
2021-09-15T19:22:20Z D! [agent] Starting service inputs




2021-09-15T19:22:30Z I! Loaded inputs: couchbase
2021-09-15T19:22:30Z I! Loaded aggregators: 
2021-09-15T19:22:30Z I! Loaded processors: 
2021-09-15T19:22:30Z I! Loaded outputs: prometheus_client
2021-09-15T19:22:30Z I! Tags enabled: couchbase=true environment=staging host=couchbase-staging-1 id=couchbase-staging-1 zone=eu-central-1a
2021-09-15T19:22:30Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"couchbase-staging-1", Flush Interval:10s
2021-09-15T19:22:30Z D! [agent] Initializing plugins
2021-09-15T19:22:30Z D! [agent] Connecting outputs
2021-09-15T19:22:30Z D! [agent] Attempting connection to [outputs.prometheus_client]
2021-09-15T19:22:30Z I! [outputs.prometheus_client] Listening on http://[::]:9009/metrics
2021-09-15T19:22:30Z D! [agent] Successfully connected to outputs.prometheus_client
2021-09-15T19:22:30Z D! [agent] Starting service inputs


Labels: area/couchbase, bug (unexpected problem or unintended behavior)