[Heartbeat] Log error on dupe monitor ID instead of strict req by andrewvc · Pull Request #29041 · elastic/beats

andrewvc · 2021-11-18T22:53:52Z

Currently Heartbeat will not start more than one monitor with the same ID. This can create unintuitive scenarios when a user changes a monitor by adding a new entry via some reload mechanism (say k8s autodiscover) before the old one is deleted.

This PR changes the behavior to only log errors in this situation. Only one monitor will be active, but it will be the newest monitor.

There may be other fixes we should make in other areas, but this is probably a better failure mode than what we currently have.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding changes to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

elasticmachine · 2021-11-18T22:53:54Z

Pinging @elastic/uptime (Team:Uptime)

elasticmachine · 2021-11-18T22:59:24Z

💚 Build Succeeded

Build stats

Start Time: 2021-11-20T01:56:31.605+0000
Duration: 70 min 45 sec
Commit: fad0683

Test stats 🧪

Test	Results
Failed	0
Passed	3589
Skipped	80
Total	3669

💚 Flaky test report

Tests succeeded.

justinkambic

One thing I'm noticing in my smoke testing of this patch is that, while it does let me run a config with multiple monitors sharing an ID, it doesn't seem to respect the usual CTRL+C I'd use to kill the process.

When attempting to run the same multi-shared-ID config on master I got the error message/no-op that is typical of HB up until now.

In the case of different monitors with the same ID it did seem to log them, per the included screenshot:

vigneshshanmugam

Code changes LGTM. Added small comments.

vigneshshanmugam · 2021-11-19T16:25:34Z

heartbeat/monitors/dedup.go

+
+	closed := um.stopUnsafe(m)
+	if closed {
+		logp.Warn("monitor ID %s is configured for multiple monitors! IDs should be unique values, last seen config will win", m.stdFields.ID)


Should we log something like "stopping previous monitor with same monitor ID"?

I don't know if that adds anything, because it's hard to tell them apart. Should I change the log message maybe?
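
For readers following this thread, the "last seen config will win" behavior can be sketched roughly as below. This is a minimal illustration, not the code in dedup.go: the `monitor` and `dedup` types, their fields, and the `unregister` method are simplified, hypothetical stand-ins, and the real implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// monitor is a hypothetical stand-in for Heartbeat's Monitor type.
type monitor struct {
	id      string
	name    string
	stopped bool
}

func (m *monitor) stop() { m.stopped = true }

// dedup tracks the single active monitor for each ID.
type dedup struct {
	mtx  sync.Mutex
	byID map[string]*monitor
}

func newDedup() *dedup {
	return &dedup{byID: map[string]*monitor{}}
}

// register makes m the active monitor for its ID. If a different monitor
// already holds that ID, it is stopped and replaced, and register returns
// true so the caller can log a warning.
func (d *dedup) register(m *monitor) (replaced bool) {
	d.mtx.Lock()
	defer d.mtx.Unlock()

	if old, ok := d.byID[m.id]; ok && old != m {
		old.stop()
		replaced = true
	}
	d.byID[m.id] = m
	return replaced
}

// unregister frees the ID only if m is still its active monitor, so a late
// stop of a replaced monitor cannot evict its replacement.
func (d *dedup) unregister(m *monitor) {
	d.mtx.Lock()
	defer d.mtx.Unlock()
	if d.byID[m.id] == m {
		delete(d.byID, m.id)
	}
}

func main() {
	d := newDedup()
	first := &monitor{id: "my-monitor", name: "first"}
	second := &monitor{id: "my-monitor", name: "second"}

	d.register(first)
	if d.register(second) {
		fmt.Printf("monitor ID %s is configured for multiple monitors, last seen config will win\n", second.id)
	}
	fmt.Println("first stopped:", first.stopped, "second stopped:", second.stopped)
}
```

Returning a boolean from `register` keeps the logging decision with the caller, which mirrors the `closed := um.stopUnsafe(m)` check in the diff above.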

vigneshshanmugam · 2021-11-19T16:38:06Z

heartbeat/monitors/monitor_test.go

+	m2.stdFields.Name = "MON2!!!"
+	// This used to trigger an error, but shouldn't any longer, we just log
+	// the error, and ensure the last monitor wins
+	require.NoError(t, m2Err)


Can we also check that the previous monitor is stopped?

We do check for this at the end of the test by checking the number of stops invoked.

jsoriano

Added some comments, mainly about avoiding the global objects if possible.

heartbeat/monitors/dedup.go

jsoriano · 2021-11-19T17:54:18Z

heartbeat/monitors/monitor.go

-func (e ErrDuplicateMonitorID) Error() string {
-	return fmt.Sprintf("monitor ID %s is configured for multiple monitors! IDs must be unique values.", e.ID)
-}
+var globalDedup = newDedup()


Could this be part of the RunnerFactory to avoid needing a global object?

Great idea, done!

jsoriano · 2021-11-19T17:55:13Z

heartbeat/monitors/monitor.go


+	// De-duplicate monitors with identical IDs
+	// last write wins
+	globalDedup.register(m)


newMonitor is used by CheckConfig. Checking configs shouldn't stop existing monitors.

Fixed this; newMonitor is now side-effect free.
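
To illustrate the two review points above (state owned by the factory rather than a global, and a side-effect-free constructor), here is a rough sketch. Everything in it is a simplified assumption: the `config`, `monitor`, and `factory` types are hypothetical, the method signatures are not the libbeat `RunnerFactory` ones, and the empty-ID check is only a placeholder for real validation. A later commit title in this PR mentions moving the uniqueness check to start, so the merged code may register monitors at a different point than `Create`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// config is a hypothetical, simplified monitor config.
type config struct {
	ID string
}

// monitor is a hypothetical stand-in for Heartbeat's Monitor type.
type monitor struct {
	id      string
	stopped bool
}

// factory owns the ID registry, so no package-level global is needed.
type factory struct {
	mtx  sync.Mutex
	byID map[string]*monitor
}

func newFactory() *factory {
	return &factory{byID: map[string]*monitor{}}
}

// newMonitor only validates and builds; it touches no shared state.
func newMonitor(cfg config) (*monitor, error) {
	if cfg.ID == "" { // placeholder validation for this sketch only
		return nil, errors.New("monitor ID must not be empty")
	}
	return &monitor{id: cfg.ID}, nil
}

// CheckConfig validates a config without affecting running monitors.
func (f *factory) CheckConfig(cfg config) error {
	_, err := newMonitor(cfg)
	return err
}

// Create builds a monitor and registers it, stopping any older monitor
// with the same ID (last write wins).
func (f *factory) Create(cfg config) (*monitor, error) {
	m, err := newMonitor(cfg)
	if err != nil {
		return nil, err
	}
	f.mtx.Lock()
	defer f.mtx.Unlock()
	if old, ok := f.byID[m.id]; ok {
		old.stopped = true
	}
	f.byID[m.id] = m
	return m, nil
}

func main() {
	f := newFactory()
	running, _ := f.Create(config{ID: "my-monitor"})

	_ = f.CheckConfig(config{ID: "my-monitor"})
	fmt.Println("stopped after CheckConfig:", running.stopped)

	_, _ = f.Create(config{ID: "my-monitor"})
	fmt.Println("stopped after second Create:", running.stopped)
}
```

Because only `Create` touches the registry, validating a config that reuses a live ID leaves the running monitor alone.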

andrewvc · 2021-11-20T01:22:25Z

@justinkambic there was a deadlock that blocked clean shutdowns; I've now fixed it, so Ctrl+C should work again. That's important!
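
As background on this class of bug: Go's `sync.Mutex` is not reentrant, so a method that holds the lock and then calls another method that takes the same lock blocks forever. The sketch below shows one common way to avoid that, a locked public method delegating to an unlocked helper, in the spirit of the `stopUnsafe` name seen earlier in the diff. The `registry` type and its methods are hypothetical, and this is not necessarily the exact fix made in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical holder of active monitor IDs.
type registry struct {
	mtx    sync.Mutex
	active map[string]bool
}

// stopUnsafe must only be called with mtx already held.
func (r *registry) stopUnsafe(id string) bool {
	if !r.active[id] {
		return false
	}
	delete(r.active, id)
	return true
}

// Stop is the public entry point; it takes the lock exactly once.
func (r *registry) Stop(id string) bool {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	return r.stopUnsafe(id)
}

// Replace also needs to stop the old entry. Calling r.Stop here would lock
// mtx a second time on the same goroutine and block forever, so it calls
// the unlocked helper instead.
func (r *registry) Replace(id string) (replaced bool) {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	replaced = r.stopUnsafe(id)
	r.active[id] = true
	return replaced
}

func main() {
	r := &registry{active: map[string]bool{}}
	fmt.Println("replaced:", r.Replace("my-monitor")) // first registration
	fmt.Println("replaced:", r.Replace("my-monitor")) // duplicate ID
	fmt.Println("stopped:", r.Stop("my-monitor"))
}
```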

andrewvc · 2021-11-20T01:23:04Z

heartbeat/beater/heartbeat.go

 	var runners []cfgfile.Runner
 	for _, cfg := range bt.config.Monitors {
-		created, err := factory.Create(b.Publisher, cfg)
+		created, err := bt.dynamicFactory.Create(b.Publisher, cfg)


Cleans everything up so that we only use a single factory instance in heartbeat

andrewvc · 2021-11-20T01:53:01Z

heartbeat/monitors/monitor.go

 		internalsMtx:      sync.Mutex{},
 		config:            config,
 		stats:             pluginFactory.Stats,
+		state:             MON_INIT,


Prevents weird situations with dual stops, makes stop() idempotent.
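
A state field like this can make stop idempotent along the following lines. This is only a sketch under assumptions: the state names, the `stopCalls` counter, and the method bodies are hypothetical, and the real Monitor does far more work on start and stop.

```go
package main

import (
	"fmt"
	"sync"
)

type monitorState int

const (
	monInit monitorState = iota
	monStarted
	monStopped
)

// monitor is a hypothetical stand-in for Heartbeat's Monitor type.
type monitor struct {
	mtx       sync.Mutex
	state     monitorState
	stopCalls int // counts how often the real teardown ran
}

// Start moves the monitor from its initial state to started.
// A monitor that was already started or stopped is left untouched.
func (m *monitor) Start() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	if m.state == monInit {
		m.state = monStarted
	}
}

// Stop tears the monitor down at most once; later calls are no-ops.
// It is also safe to call on a monitor that was never started.
func (m *monitor) Stop() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	if m.state == monStopped {
		return
	}
	m.state = monStopped
	m.stopCalls++ // stand-in for closing jobs, pipelines, and so on
}

func main() {
	m := &monitor{}
	m.Start()
	m.Stop()
	m.Stop() // second call does nothing
	fmt.Println("teardown ran", m.stopCalls, "time(s)")
}
```

Guarding the transition under the same mutex means a stop triggered by deduplication and a stop triggered by shutdown cannot both run the teardown.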

andrewvc · 2021-11-20T02:01:08Z

OK, this should be ready for review again. It's largely rewritten: the factory now holds all state, and the lifecycle handling is cleaned up in general. Thanks for the great input @jsoriano @justinkambic @vigneshshanmugam

Races and deadlocks should all be gone.

jsoriano

LGTM if it looks good to uptime 🙂

heartbeat/monitors/factory.go

justinkambic

LGTM!

justinkambic · 2021-11-22T19:33:26Z

heartbeat/monitors/monitor.go


-// Stop stops the Monitor's execution in its configured scheduler.
-// This is safe to call even if the Monitor was never started.
+// Stop stops the monitor without freeing it in global dedup


Suggested change

// Stop stops the monitor without freeing it in global dedup

// Stop the monitor without freeing it in global dedup

I know the function name is a proper noun here but it feels kinda redundant.

awahab07 · 2021-11-23T14:33:11Z

Post FF Testing

LGTM

Before the merge, Heartbeat would error out when adding a monitor whose ID duplicated that of an existing monitor.
After the merge and rebuild, only a warning was logged, and the new monitor with the duplicate ID replaced the previous one.

Before:

After:

{"log.level":"warn","@timestamp":"2021-11-23T15:17:56.412+0100","log.logger":"monitor-factory","log.origin":{"file.name":"monitors/factory.go","file.line":125},"message":"monitor ID alibaba-monitor is configured for multiple monitors! IDs should be unique values, last seen config will win","service.name":"heartbeat","ecs.version":"1.6.0"}

Previous monitor list	List after duplicate monitor is configured

andrewvc added the bug, Heartbeat, Team:obs-ds-hosted-services, v7.16.0, backport-v8.0.0, and backport-v7.16.0 labels Nov 18, 2021

andrewvc requested review from jsoriano and justinkambic November 18, 2021 22:53

andrewvc self-assigned this Nov 18, 2021

andrewvc requested a review from a team as a code owner November 18, 2021 22:53

botelastic bot added and removed the needs_team label Nov 18, 2021

Tweaks |+ changelog

d9f64c6

andrewvc added 2 commits November 19, 2021 09:53

Fix threadsafety

ec7d4be

Stop not close

6d4a78d

justinkambic reviewed Nov 19, 2021

vigneshshanmugam approved these changes Nov 19, 2021

andrewvc added 3 commits November 19, 2021 11:10

Check IDs

f361348

Fix reentrant lock issue causing deadlock

8dd4878

Fix tests

6f2df91

jsoriano reviewed Nov 19, 2021

andrewvc added 4 commits November 19, 2021 15:24

Move uniqueness test to start

665cf1f

It all works

9685ba5

Cleanup factory

cd10b27

Cleanup logging

637d439

andrewvc commented Nov 20, 2021

Refactors

0f81f53

andrewvc commented Nov 20, 2021

andrewvc added 2 commits November 19, 2021 19:54

Cleanup

1074805

Cleanup

fad0683

jsoriano reviewed Nov 22, 2021

justinkambic approved these changes Nov 22, 2021

andrewvc merged commit dbca099 into elastic:master Nov 22, 2021

andrewvc deleted the warn-on-dup branch November 22, 2021 21:46

mergify bot mentioned this pull request Nov 22, 2021

[7.16](backport #29041) [Heartbeat] Log error on dupe monitor ID instead of strict req #29082

Merged

mergify bot pushed a commit that referenced this pull request Nov 22, 2021

[Heartbeat] Log error on dupe monitor ID instead of strict req (#29041)

c39855a

(cherry picked from commit dbca099)

mergify bot mentioned this pull request Nov 22, 2021

[8.0](backport #29041) [Heartbeat] Log error on dupe monitor ID instead of strict req #29083

Merged

andrewvc added a commit that referenced this pull request Nov 22, 2021

[Heartbeat] Log error on dupe monitor ID instead of strict req (#29041)

96cbf50

andrewvc added a commit that referenced this pull request Nov 23, 2021

[Heartbeat] Log error on dupe monitor ID instead of strict req (#29041)…

03b3796

… (#29083) (cherry picked from commit dbca099) Co-authored-by: Andrew Cholakian <andrew@andrewvc.com>

awahab07 assigned awahab07 and unassigned andrewvc Nov 23, 2021

awahab07 removed their assignment Nov 23, 2021

andrewvc mentioned this pull request Jan 11, 2022

[Heartbeat] Edited Synthetics Integration policies sometimes create duplicate monitor errors #28518

Closed
