monitoring: 'flatten' observables (reimplement #11985) #17600
|
Notifying subscribers in CODENOTIFY files for diff 1246425...8aa90f3.
|
Codecov Report
@@            Coverage Diff             @@
##             main   #17600      +/-   ##
==========================================
- Coverage   51.58%   51.58%   -0.01%
==========================================
  Files        1717     1717
  Lines       85285    85285
  Branches     7822     7761      -61
==========================================
- Hits        43992    43991       -1
- Misses      37419    37420       +1
  Partials     3874     3874
|
|
I would honestly prefer to remove the In any case, I would merge to fix the issues until we figure that out. (not approving as I believe there is a typo which I mentioned) |
Co-authored-by: Gonzalo Peci <pecigonzalo@users.noreply.github.com>
I think they're fine to add, though not fantastic. I think we are in the end achieving the same thing: alerting on the first occurrence of a value exceeding a threshold.
This describes it well I think: https://github.com/sourcegraph/sourcegraph/issues/11571#issuecomment-654571953 |
…ring/flatten-observables
|
Thanks for that link, yeah, but we should not have those collisions, or it is rather rare, so I wonder if we have the wrong query or the wrong collection rule or something else altogether. Or maybe this is just more normal than I expect it to be :D |
|
Our alert delivery, formatting, and entire model do not support per-series alerting anyway, and there's a whole monitoring pillars section about why high-cardinality graphs are not ideal. If we want individual series alerting, I think they should just be separate panels. Instead of surfacing information through the alert itself, IMO we should push to have useful information about the panel be surfaced (e.g. a link to it, where more context can be derived), so I think this change is in line with that. |
Re-introduce "flattening" of observables by wrapping queries in a min or max where appropriate (first added in https://github.com/sourcegraph/sourcegraph/pull/11985) to fix a regression introduced in https://github.com/sourcegraph/sourcegraph/issues/17599. Also add monitoring to ensure we have some visibility into rule failures, and reorganize Prometheus panels for uniformity.
See docstring and linked issues/PRs in https://github.com/sourcegraph/sourcegraph/issues/17599 for more context.
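To illustrate the idea of "flattening", here is a minimal sketch in Python (the actual change lives in the repository's monitoring generator, so this is not the PR's code). The helper name `flatten_query` and the comparator strings are hypothetical; the only assumption taken from the description is that a query is wrapped in a PromQL `max` or `min` aggregation so that a multi-series observable collapses into the single worst-case series the alert threshold is compared against.

```python
def flatten_query(query: str, comparator: str) -> str:
    """Wrap a PromQL query so it yields a single series.

    Hypothetical helper, not the PR's implementation. For "greater than"
    thresholds the worst case is the largest value, so the query is wrapped
    in max(); for "less than" thresholds it is the smallest, so min() is used.
    """
    if comparator in (">", ">="):
        aggregator = "max"
    elif comparator in ("<", "<="):
        aggregator = "min"
    else:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    # max(...) and min(...) without a "by" clause aggregate across all
    # label sets, leaving one series regardless of input cardinality.
    return f"{aggregator}({query})"
```

For example, `flatten_query("sum by (instance) (rate(errors_total[5m]))", ">=")` would produce `max(sum by (instance) (rate(errors_total[5m])))`, where the metric name is made up for illustration. The trade-off discussed in the thread applies: the alert fires once for the worst series and does not say which series triggered it.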
Main diffs to look at:
Closes https://github.com/sourcegraph/sourcegraph/issues/17599