Fix(inputs/mongodb): better handling of conn errors by papey · Pull Request #10086 · influxdata/telegraf

papey · 2021-11-10T08:34:39Z

Required for all PRs:

Updated associated README.md.
Wrote appropriate unit tests.
Pull request title or commits are in conventional commit format

resolves #10078

Here is a draft for improvements in the MongoDB input plugin.
The current issue is described and tracked in #10078.
For now, Telegraf exits if a connection to a configured MongoDB server fails on startup.
This PR tries to add a more reliable "retry after" mechanism.

sspaink

Thank you for taking the time to write this pull request. I've added some comments for you to review. I think adding a retry mechanism to this plugin would require some sort of configuration setting for users to opt into it; otherwise this new behavior could break existing setups where people expect it to fail early. Also, including a test for the retry logic would be great.

sspaink · 2022-04-27T15:09:25Z

plugins/inputs/mongodb/mongodb.go

 		client, err := mongo.Connect(ctx, opts)
 		if err != nil {
-			return fmt.Errorf("unable to connect to MongoDB: %q", err)
+			m.Log.Errorf("unable to create MongoDB client: %q", err)


I think we still want to return here in case of an error, otherwise client.Ping will get called and this could lead to a panic because client could be nil if there was an error.

I think this is what causes the behavior I wanted to avoid with this PR. If you return an error in an init, Telegraf exits.

sspaink · 2022-04-27T15:11:51Z

plugins/inputs/mongodb/mongodb.go

+				// is not reachable try a reconnect
+				disconnectCtx, disconnectCancel := context.WithTimeout(context.Background(), 1*time.Second)
+				defer disconnectCancel()
+				err := srv.client.Disconnect(disconnectCtx)


client needs to be checked for nil before calling Disconnect, because srv.reachable could be set to false when mongo.Connect fails, resulting in a nil client.

sspaink · 2022-04-27T15:17:11Z

plugins/inputs/mongodb/mongodb.go

+				err := srv.client.Disconnect(disconnectCtx)
+				if err != nil {
+					m.Log.Errorf("unable to reconnect to MongoDB: %q", err)
+					return


Because we aren't returning an error, this would cause a retry to happen every time Gather is called, right? This could cause a lot of "unable to reconnect to MongoDB" log messages to show up. Maybe adding a configuration option for the user to set how often to retry would help? Or how many times to retry before either exiting or no longer retrying for this instance?

Yes, totally. This is the first behavior we wanted on our side (as this is a draft, I did not invest too much time in it because I wanted to be sure it would be of interest to Telegraf upstream). Can you share an example of this kind of configuration? I think we want it plugin-scoped and not instance-scoped?

sspaink · 2022-04-27T15:23:22Z

plugins/inputs/mongodb/mongodb.go

+				connectCtx, connectCancel := context.WithTimeout(context.Background(), 1*time.Second)
+				defer connectCancel()


Reusing the connect context from init might be a good idea. Is there a reason you picked 1*time.Second for the timeout?

No reason for 1 second; it just looked like a simple default.

papey · 2022-05-06T07:34:28Z

Hi @sspaink, thanks for taking the time to review. Here is some quick feedback based on what I remember of the initial idea behind this PR. I will take another look at the code ASAP with fresh eyes.

Hipska

Actually, the attempt to connect to MongoDB should happen in the Start() method and not in the Init() method. Would you be able to refactor this?

papey · 2022-06-18T14:58:16Z

Thanks for the feedback @Hipska, I'll look at it ASAP!

Just one question/detail: I am no longer a member of the @bearstech org (where the fork is), so I think we should close this PR and open a new one with the changes you requested. What do you think?

Hipska · 2022-06-18T21:04:45Z

Yeah, sure, go ahead. Leave a note to reference the new PR.

papey · 2022-08-06T16:20:12Z

Replaced by #11629, thx for the feedback.

Wilfried OLLIVIER added 2 commits November 10, 2021 09:24

Add conn retry mechanism in Gather func

93160fe

Fix: set reachable state in Init func

4ebf02a

telegraf-tiger bot added the "fix" label (pr to fix corresponding bug) Nov 10, 2021

papey mentioned this pull request Nov 10, 2021

[inputs.mongodb] Telegraf crash on init if conn to MongoDB fails #10078

Closed

papey marked this pull request as ready for review November 17, 2021 14:39

sspaink suggested changes Apr 27, 2022

sspaink added the "waiting for response" label (waiting for response from contributor) Apr 27, 2022

telegraf-tiger bot removed the "waiting for response" label (waiting for response from contributor) May 6, 2022

Hipska mentioned this pull request Jun 15, 2022

feat(agent): add ignore_error_inputs option for inputs #11304

Closed

Hipska added the "area/mongodb" and "plugin/input" labels (1. Request for new input plugins 2. Issues/PRs that are related to input plugins) Jun 15, 2022

Hipska reviewed Jun 15, 2022

papey mentioned this pull request Aug 6, 2022

fix(inputs.mongodb): add an option to bypass connection errors on start #11629

Merged

papey closed this Aug 6, 2022
