fix(inputs.mongodb): add an option to bypass connection errors on start by papey · Pull Request #11629 · influxdata/telegraf

papey · 2022-08-06T16:18:53Z

Required for all PRs

Updated associated README.md.
Wrote appropriate unit tests.
Pull request title or commits are in conventional commit format

resolves #10078

This PR replaces #10086.

Summary of changes :

Added an option ignore_unreachable_hosts to not return an error on init if a ping fails

Hipska

My comment about moving the MongoDB connection from Init to the Start method still stands.

papey · 2022-08-08T17:30:09Z

My comment about moving the MongoDB connection from Init to the Start method still stands.

Oh yes, sorry. Will fix.

Edit: fixed

papey · 2022-08-08T19:57:10Z

Here is a refactor. I could not find an example of the Start function, so I am not sure it is correct. If it is not, could you provide an example? Thanks a lot for the feedback.

Hipska · 2022-08-09T07:53:37Z

This is the interface: input.go

And these are some sample implementations:

telegraf/plugins/inputs/sql/sql.go

Lines 354 to 392 in b07e94b

    func (s *SQL) Start(_ telegraf.Accumulator) error {
    	var err error

    	// Connect to the database server
    	s.Log.Debugf("Connecting to %q...", s.Dsn)
    	s.db, err = dbsql.Open(s.driverName, s.Dsn)
    	if err != nil {
    		return err
    	}

    	// Set the connection limits
    	// s.db.SetConnMaxIdleTime(time.Duration(s.MaxIdleTime)) // Requires go >= 1.15
    	s.db.SetConnMaxLifetime(time.Duration(s.MaxLifetime))
    	s.db.SetMaxOpenConns(s.MaxOpenConnections)
    	s.db.SetMaxIdleConns(s.MaxIdleConnections)

    	// Test if the connection can be established
    	s.Log.Debugf("Testing connectivity...")
    	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(s.Timeout))
    	err = s.db.PingContext(ctx)
    	cancel()
    	if err != nil {
    		return fmt.Errorf("connecting to database failed: %v", err)
    	}

    	// Prepare the statements
    	for i, q := range s.Queries {
    		s.Log.Debugf("Preparing statement %q...", q.Query)
    		ctx, cancel := context.WithTimeout(context.Background(), time.Duration(s.Timeout))
    		stmt, err := s.db.PrepareContext(ctx, q.Query) //nolint:sqlclosecheck // Closed in Stop()
    		cancel()
    		if err != nil {
    			return fmt.Errorf("preparing query %q failed: %v", q.Query, err)
    		}
    		s.Queries[i].statement = stmt
    	}

    	return nil
    }

telegraf/plugins/inputs/mqtt_consumer/mqtt_consumer.go

Lines 151 to 166 in b07e94b

    func (m *MQTTConsumer) Start(acc telegraf.Accumulator) error {
    	m.state = Disconnected
    	m.acc = acc.WithTracking(m.MaxUndeliveredMessages)
    	m.sem = make(semaphore, m.MaxUndeliveredMessages)
    	m.ctx, m.cancel = context.WithCancel(context.Background())

    	m.client = m.clientFactory(m.opts)

    	// AddRoute sets up the function for handling messages.  These need to be
    	// added in case we find a persistent session containing subscriptions so we
    	// know where to dispatch persisted and new messages to.  In the alternate
    	// case that we need to create the subscriptions these will be replaced.
    	for _, topic := range m.Topics {
    		m.client.AddRoute(topic, m.recvMessage)
    	}

    	m.state = Connecting
    	return m.connect()
    }

Hipska

Looks good overall, but shouldn't the ping also be done at the start of gathering for each server? That would reduce the number of errors logged every gather interval.

plugins/inputs/mongodb/mongodb.go

plugins/inputs/mongodb/mongodb_server_test.go

plugins/inputs/mongodb/README.md

papey · 2022-08-11T04:24:53Z

Looks good overall, but shouldn't the ping also be done at the start of gathering for each server? That would reduce the number of errors logged every gather interval.

Sounds good to me, but how do we want it to work?

In the current implementation on master, if one of the Mongo servers becomes unreachable after init/start, errors are written to the logs.

IMO it's OK to tie the behavior you describe to the new option. WDYT?

Hipska · 2022-08-12T08:02:48Z

Yeah, so I was thinking: if this is enabled, first ping the server and only try to gather all the metrics if it responds OK. Does that sound reasonable?

papey · 2022-08-12T16:43:04Z

LGTM, I will implement it ASAP!

papey · 2022-08-23T17:47:48Z

There is just one behavior left that we may want to discuss:

When Gather is called on an unreachable server, it waits a very long time before returning an error. This is the current implementation, and this PR does not change it. We may want to change it in another PR. WDYT?

Sorry for the review request mess; my internet connection was lagging and GitHub went crazy.

Hipska

Do you mean this is the current default behaviour? I think that's good; users can change it to retry when they don't want that. But thinking about it, "retry" might give the wrong impression: I would expect Telegraf to retry within the same poll interval. Maybe the value for this option should be "skip"?

Also, I don't get the new logic in Gather when the behaviour is "retry". It seems to return and skip collection for the server if there is no error? That sounds a bit strange.

plugins/inputs/mongodb/mongodb.go

reimda · 2022-08-24T19:21:52Z

When Gather is called on an unreachable server, it waits a very long time before returning an error. This is the current implementation, and this PR does not change it. We may want to change it in another PR. WDYT?

Sounds good to me to work on that in a future PR.

plugins/inputs/mongodb/README.md

papey · 2022-08-30T15:09:06Z

@Hipska I changed the wording from "retry" to "skip"; I also find it clearer. Thanks.

papey · 2022-08-30T15:11:59Z

Do you mean this is the current default behaviour? I think that's good; users can change it to retry when they don't want that. But thinking about it, "retry" might give the wrong impression: I would expect Telegraf to retry within the same poll interval. Maybe the value for this option should be "skip"?

Also, I don't get the new logic in Gather when the behaviour is "retry". It seems to return and skip collection for the server if there is no error? That sounds a bit strange.

Sorry, I didn't get your point. If the behavior is skip, we ping; if the ping fails, we skip fetching from the current server:

			if m.DisconnectedServersBehavior == "skip" {
				if err := srv.ping(); err != nil {
					return
				}
			}

Hipska · 2022-08-31T08:05:03Z

Sorry, I didn't get your point. If the behavior is skip, we ping; if the ping fails, we skip fetching from the current server:

Never mind, I was indeed misreading it. Maybe we could add a debug log message with the content of the error before we return?

telegraf-tiger · 2022-09-07T19:24:15Z

Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

🥳 This pull request decreases the Telegraf binary size by -1.04 % for linux amd64 (new size: 150.2 MB, nightly size 151.7 MB)

Artifact URLs

DEB	RPM	TAR GZ	ZIP
amd64.deb	aarch64.rpm	darwin_amd64.tar.gz	windows_amd64.zip
arm64.deb	armel.rpm	darwin_arm64.tar.gz	windows_i386.zip
armel.deb	armv6hl.rpm	freebsd_amd64.tar.gz
armhf.deb	i386.rpm	freebsd_armv7.tar.gz
i386.deb	ppc64le.rpm	freebsd_i386.tar.gz
mips.deb	riscv64.rpm	linux_amd64.tar.gz
mipsel.deb	s390x.rpm	linux_arm64.tar.gz
ppc64el.deb	x86_64.rpm	linux_armel.tar.gz
riscv64.deb		linux_armhf.tar.gz
s390x.deb		linux_i386.tar.gz
		linux_mips.tar.gz
		linux_mipsel.tar.gz
		linux_ppc64le.tar.gz
		linux_riscv64.tar.gz
		linux_s390x.tar.gz
		static_linux_amd64.tar.gz

The start method handler did not match the interface, nor was there a stop function. As a result, start was never called and the plugin was never setting up the servers to connect to and collect from correctly. This was introduced in influxdata#11629. fixes: influxdata#11830
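
The failure mode described above can be sketched as follows. The interface below is a local approximation of Telegraf's service input contract (`Start(Accumulator) error` plus `Stop()`), written out here as an assumption rather than copied from the repository; `goodPlugin`, `brokenPlugin`, and `startIfService` are hypothetical names. A compile-time assertion is one way to catch a mismatched signature early.

```go
package main

import "fmt"

// Accumulator is a placeholder for telegraf.Accumulator.
type Accumulator interface{}

// ServiceInput approximates the contract a plugin must satisfy for its
// Start method to be called (assumed shape, see the lead-in).
type ServiceInput interface {
	Gather(acc Accumulator) error
	Start(acc Accumulator) error
	Stop()
}

// goodPlugin implements the full contract.
type goodPlugin struct{ started bool }

func (p *goodPlugin) Gather(Accumulator) error { return nil }
func (p *goodPlugin) Start(Accumulator) error  { p.started = true; return nil }
func (p *goodPlugin) Stop()                    { p.started = false }

// brokenPlugin has a Start method with the wrong signature and no Stop,
// so it does not satisfy ServiceInput.
type brokenPlugin struct{ started bool }

func (p *brokenPlugin) Gather(Accumulator) error { return nil }
func (p *brokenPlugin) Start() error             { p.started = true; return nil }

// Compile-time check: this line fails to build if goodPlugin drifts away
// from the interface. The same line for brokenPlugin would not compile.
var _ ServiceInput = (*goodPlugin)(nil)

// startIfService calls Start only on plugins that satisfy ServiceInput and
// reports whether it did so.
func startIfService(p interface{}) (bool, error) {
	si, ok := p.(ServiceInput)
	if !ok {
		return false, nil
	}
	return true, si.Start(nil)
}

func main() {
	good, broken := &goodPlugin{}, &brokenPlugin{}

	called, _ := startIfService(good)
	fmt.Println("good:", called, good.started)

	called, _ = startIfService(broken)
	fmt.Println("broken:", called, broken.started)
}
```

The broken plugin compiles and loads without complaint, but its setup code silently never runs.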

telegraf-tiger bot added the fix label (pr to fix corresponding bug) Aug 6, 2022

papey mentioned this pull request Aug 6, 2022

Fix(inputs/mongodb): better handling of conn errors #10086

Closed

Hipska requested review from Hipska and sspaink August 8, 2022 07:38

Hipska suggested changes Aug 8, 2022

sspaink changed the title ~~Fix(inputs/mongodb): add an option to by pass connexions error on init~~ fix(inputs.mongodb): add an option to by pass connexions error on init Aug 8, 2022

Hipska added the area/mongodb and plugin/input labels Aug 9, 2022

Hipska suggested changes Aug 9, 2022

papey added 10 commits August 10, 2022 21:38

fix(inputs/mongodb): add an option to by pass connexions error on init

b623edb

Move connection logic in Start func

f54547f

Prefer %w when formatting errors

3d45ac5

Change log level from info to error

534a95f

Chore: replace init with start in docs

9981b91

Fix: add start to test

7d94083

Add test that assert for error when option is not set

b3830f8

Group together internal vars

664d451

Migrate tests to testutil.Container

f1f0206

Fix: ensure start on integration tests

35f16ca

papey added 5 commits August 14, 2022 01:42

Remove useless containers in tests

4998099

Add ping wrapper to mongo db server struct

e7b050c

Add ignore unreachable hosts to Gather() func

dc1ed4e

Chore: remove ugly line return

601f484

Fix: add missing err check

97e94d5

papey requested review from Hipska and reimda and removed request for Hipska and reimda August 23, 2022 16:59

Hipska reviewed Aug 24, 2022

reimda approved these changes Aug 24, 2022

reimda reviewed Aug 24, 2022

papey added 2 commits August 30, 2022 16:55

Refacto: prefer single liner

35335eb

Replace retry with skip for clarity

29f46b1

Fix: docs

6c490d8

papey requested a review from Hipska August 30, 2022 15:13

Hipska changed the title ~~fix(inputs.mongodb): add an option to by pass connexions error on init~~ fix(inputs.mongodb): add an option to bypass connection errors on start Aug 31, 2022

Hipska approved these changes Aug 31, 2022

papey added 2 commits August 31, 2022 23:56

Add debug log message in case of ping error

22b0dd5

Chore: harmonize error logs

3f2bc9f

sspaink merged commit e46f90e into influxdata:master Sep 7, 2022

sjwang90 mentioned this pull request Sep 7, 2022

Telegraf 1.24 influxdata/docs-v2#4422

Closed

This was referenced Sep 20, 2022

fix(inputs.mongodb): actually start plugin correctly #11849

Merged

Telegraf MongoDB plugin doesn't work. #11830

Closed

dba-leshop pushed a commit to dba-leshop/telegraf that referenced this pull request Oct 30, 2022

fix(inputs.mongodb): add an option to bypass connection errors on start (influxdata#11629)

2f61762