fix(discovery): make discovery manager notify consumers of dropped targets for still defined jobs#13147
Conversation
|
See the current code failing https://github.com/prometheus/prometheus/actions/runs/6876250042/job/18701456148?pr=13147#step:6:378 |
|
Thanks a lot for picking up this! |
|
I don't think this will solve all cases as the scrape manager e.g. only reloads every So I'll need to think about another solution (that doesn't require any acking from scrape manager to keep things simple) |
af94b7b to
76f1cd1
Compare
scrape/manager.go
Outdated
| } | ||
|
|
||
| func (m *Manager) reloader() { | ||
| // TODO: OK? |
There was a problem hiding this comment.
This may be useful to speed up tests, WDYT? (I don't see why we silently limit it currently)
There was a problem hiding this comment.
This is useful, we want to throttle that by 5 seconds
There was a problem hiding this comment.
I didn’t catch your comments earlier.
It seems that the default is already set to 5s
prometheus/cmd/prometheus/main.go
Lines 447 to 448 in 8eb9228
I’m not sure if there’s a specific reason for capping the value in this manner. If it could potentially cause issues, it might be best to explicitly mention this.
Users are generally allowed to customize other parameters, so it would make sense to allow them to adjust this one as well, provided they understand the implications.
There was a problem hiding this comment.
Is this conversation resolved? If so, please remove the TODO.
|
I'll need to deal with the races in tests. |
5fdb66a to
06ca3c3
Compare
06ca3c3 to
9e19ec5
Compare
Done, you can take a look now. Still getting some weird yaml loading issues on windows, will take a look. |
9e19ec5 to
ecd7826
Compare
ecd7826 to
4fc8514
Compare
|
Done. |
4fc8514 to
fd4ff8f
Compare
fd4ff8f to
93211d5
Compare
|
#13312 was fixed in main, now the CI is green again. |
93211d5 to
e88cca5
Compare
|
@roidelapluie are you able to finish the review on this? |
f7fd9a0 to
916a441
Compare
|
LGTM |
bboreham
left a comment
There was a problem hiding this comment.
PR looks fine to me; one question about the TODO.
scrape/manager.go
Outdated
| } | ||
|
|
||
| func (m *Manager) reloader() { | ||
| // TODO: OK? |
There was a problem hiding this comment.
Is this conversation resolved? If so, please remove the TODO.
|
cc @machine424 have you seen Bryan's comment? |
I haven't looked at this in a while and need a refresher. I'll revisit it this week. |
…rgets for still defined jobs scrape/manager_test.go: add a test to check that the manager gets notified for targets that got dropped by discovery to reproduce: prometheus#12858 (comment) Signed-off-by: machine424 <ayoubmrini424@gmail.com>
916a441 to
2a8d3c1
Compare
|
Ready now. |
|
Once this is merged, I'll revive #13622 |
See #12858 (comment) for details (will add that to the commit message)
I added a test that reproduces: #12858 (comment) and it's failing with the code in main: https://github.com/prometheus/prometheus/actions/runs/6876250042/job/18701456148?pr=13147#step:6:378
I pushed a draft fix (see comments in the code)
fixes #12858
>>>>>>> NOTE:
This fix concerns the SD manager behind the flag
new-service-discovery-managerwhich is the manager used by most third parties but not the one used by default by Prometheus itself, as of now, Prometheus is still using the legacymanagerprometheus/discovery/legacymanager/manager.go
Line 14 in d63f5b3