Skip to content

fix(discovery): make discovery manager notify consumers of dropped targets for still defined jobs#13147

Merged
machine424 merged 1 commit intoprometheus:mainfrom
machine424:drop-targets
Aug 28, 2024
Merged

fix(discovery): make discovery manager notify consumers of dropped targets for still defined jobs#13147
machine424 merged 1 commit intoprometheus:mainfrom
machine424:drop-targets

Conversation

@machine424
Copy link
Member

@machine424 machine424 commented Nov 15, 2023

See #12858 (comment) for details (will add that to the commit message)

I added a test that reproduces: #12858 (comment) and it's failing with the code in main: https://github.com/prometheus/prometheus/actions/runs/6876250042/job/18701456148?pr=13147#step:6:378

I pushed a draft fix (see comments in the code)

fixes #12858


>>>>>>> NOTE:

This fix concerns the SD manager behind the flag new-service-discovery-manager which is the manager used by most third parties but not the one used by default by Prometheus itself, as of now, Prometheus is still using the legacymanager

package legacymanager
by default. The plan is to switch to the new manager in 3.0 #13609


@machine424
Copy link
Member Author

@machine424 machine424 changed the title scrape/manager_test.go: add a test to check that the manager gets notified for targets that got dropped by discovery. make discovery manager notify consumers of dropped targets for a still existing job Nov 15, 2023
@roidelapluie
Copy link
Member

Thanks a lot for picking up this!

@machine424
Copy link
Member Author

I don't think this will solve all cases as the scrape manager e.g. only reloads every reloadIntervalDuration (which is customizable) and during that time the targetsgroup with the empty targets we send here in cleaner maybe superseded by another one sent by another stale provider through cleaner or another new provider through sender...

So I'll need to think about another solution (that doesn't require any acking from scrape manager to keep things simple)

}

func (m *Manager) reloader() {
// TODO: OK?
Copy link
Member Author

@machine424 machine424 Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be useful to speed up tests, WDYT? (I don't see why we silently limit it currently)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useful, we want to throttle that by 5 seconds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t catch your comments earlier.
It seems that the default is already set to 5s

a.Flag("scrape.discovery-reload-interval", "Interval used by scrape manager to throttle target groups updates.").
Hidden().Default("5s").SetValue(&cfg.scrape.DiscoveryReloadInterval)
.
I’m not sure if there’s a specific reason for capping the value in this manner. If it could potentially cause issues, it might be best to explicitly mention this.
Users are generally allowed to customize other parameters, so it would make sense to allow them to adjust this one as well, provided they understand the implications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this conversation resolved? If so, please remove the TODO.

@machine424
Copy link
Member Author

I'll need to deal with the races in tests.

@machine424 machine424 marked this pull request as draft November 24, 2023 16:31
@machine424 machine424 changed the title make discovery manager notify consumers of dropped targets for a still existing job make discovery manager notify consumers of dropped targets for still defined jobs Nov 25, 2023
@machine424 machine424 marked this pull request as ready for review November 25, 2023 15:17
@machine424
Copy link
Member Author

machine424 commented Nov 25, 2023

I'll need to deal with the races in tests.

Done, you can take a look now.

cc @roidelapluie

Still getting some weird yaml loading issues on windows, will take a look.
Tell me if you prefer putting the config files in testdata/, I'll also see if we can merge the tests.

@machine424 machine424 marked this pull request as draft January 4, 2024 19:42
@machine424
Copy link
Member Author

Done.
The remaining failing test TestOnlyProviderStaleTargetsAreDropped is due to #13312.

@machine424 machine424 marked this pull request as ready for review January 5, 2024 12:29
@machine424
Copy link
Member Author

#13312 was fixed in main, now the CI is green again.

@beorn7
Copy link
Member

beorn7 commented Feb 27, 2024

@roidelapluie are you able to finish the review on this?

@machine424 machine424 force-pushed the drop-targets branch 3 times, most recently from f7fd9a0 to 916a441 Compare July 10, 2024 16:00
roidelapluie
roidelapluie previously approved these changes Aug 9, 2024
@roidelapluie
Copy link
Member

LGTM

Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks fine to me; one question about the TODO.

}

func (m *Manager) reloader() {
// TODO: OK?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this conversation resolved? If so, please remove the TODO.

@roidelapluie
Copy link
Member

cc @machine424 have you seen Bryan's comment?

@machine424 machine424 changed the title make discovery manager notify consumers of dropped targets for still defined jobs fix(discovery): make discovery manager notify consumers of dropped targets for still defined jobs Aug 20, 2024
@machine424
Copy link
Member Author

cc @machine424 have you seen Bryan's comment?

I haven't looked at this in a while and need a refresher. I'll revisit it this week.

…rgets for still defined jobs

scrape/manager_test.go: add a test to check that the manager gets notified
for targets that got dropped by discovery to reproduce: prometheus#12858 (comment)

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
@machine424
Copy link
Member Author

Ready now.

@machine424
Copy link
Member Author

Once this is merged, I'll revive #13622

Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kubernetes_sd does not remove existing targets when an existing config is modified to produce an empty result

4 participants