As a simpler alternative to the meta alerts issue, we need a pragmatic solution in place to notify administrators when the alerting framework is failing.
Some scenarios:
- The failure rate on a connector has increased over the past X time (or has stayed consistently high?)
- An action saved object can't be decrypted
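The first scenario could be detected by comparing a connector's failure rate in the most recent time window against the window before it. Below is a minimal sketch, assuming we have a list of per-execution results. `ExecutionEvent`, `checkConnectorFailureRate`, and both thresholds (`maxIncrease`, `ceiling`) are hypothetical names and values, not existing Kibana APIs. The `ceiling` check covers the "consistently high" variant of the scenario.

```typescript
// Hypothetical shape of one connector execution result (not a Kibana type).
interface ExecutionEvent {
  connectorId: string;
  timestamp: number; // epoch milliseconds
  success: boolean;
}

interface FailureRateCheck {
  recentRate: number;
  baselineRate: number;
  degraded: boolean;
}

// Fraction of failed executions; 0 when there were no executions.
function failureRate(events: ExecutionEvent[]): number {
  if (events.length === 0) return 0;
  const failures = events.filter((e) => !e.success).length;
  return failures / events.length;
}

// Compares the window (now - windowMs, now] against the preceding window of equal length.
// Flags the connector when the rate jumped by more than maxIncrease, or when the recent
// rate alone is at or above ceiling (the "consistently failing" case).
function checkConnectorFailureRate(
  events: ExecutionEvent[],
  connectorId: string,
  now: number,
  windowMs: number,
  maxIncrease = 0.2, // hypothetical threshold
  ceiling = 0.5, // hypothetical threshold
): FailureRateCheck {
  const forConnector = events.filter((e) => e.connectorId === connectorId);
  const recent = forConnector.filter(
    (e) => e.timestamp > now - windowMs && e.timestamp <= now,
  );
  const baseline = forConnector.filter(
    (e) => e.timestamp > now - 2 * windowMs && e.timestamp <= now - windowMs,
  );
  const recentRate = failureRate(recent);
  const baselineRate = failureRate(baseline);
  const degraded = recentRate - baselineRate > maxIncrease || recentRate >= ceiling;
  return { recentRate, baselineRate, degraded };
}
```

A scheduled task could run this per connector and hand any `degraded` result to whichever notification channel is chosen.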
Some of the ideas bounced around in a team discussion:
- Create an always-firing alert. Users can use it to send emails (e.g. daily), and they'll know something is wrong when the email isn't sent.
- Building on the first point, the email could include a summary of failures, activity, etc.
- Add a configuration in kibana.yml (a preconfigured connector?) that the framework can use to communicate externally (e.g. send emails on failures).
- A health API and a health status bar on the connectors management page (showing failures over the past X time).
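The health API, the status bar, and the summary email could all share one per-connector health computation. Below is a minimal sketch, assuming aggregated counts per connector are available. `ConnectorStats`, `connectorHealth`, `overallStatus`, `summaryEmailBody`, the `"ok" | "warn" | "error"` levels, and the thresholds are all hypothetical, not existing Kibana APIs. Any decryption error marks the connector as `error`, since that connector cannot run at all.

```typescript
type HealthStatus = "ok" | "warn" | "error";

// Hypothetical aggregated counts for one connector over the reporting period.
interface ConnectorStats {
  connectorId: string;
  executions: number;
  failures: number;
  decryptionErrors: number;
}

interface ConnectorHealth {
  connectorId: string;
  status: HealthStatus;
  failureRate: number;
}

// Maps counts to a status; warnRate and errorRate are hypothetical thresholds.
function connectorHealth(s: ConnectorStats, warnRate = 0.1, errorRate = 0.5): ConnectorHealth {
  const failureRate = s.executions === 0 ? 0 : s.failures / s.executions;
  let status: HealthStatus = "ok";
  if (s.decryptionErrors > 0 || failureRate >= errorRate) status = "error";
  else if (failureRate >= warnRate) status = "warn";
  return { connectorId: s.connectorId, status, failureRate };
}

// The worst connector status becomes the framework-wide status (e.g. for a status bar).
function overallStatus(healths: ConnectorHealth[]): HealthStatus {
  if (healths.some((h) => h.status === "error")) return "error";
  if (healths.some((h) => h.status === "warn")) return "warn";
  return "ok";
}

// Plain-text body for the always-firing summary email.
function summaryEmailBody(healths: ConnectorHealth[]): string {
  const lines = healths.map(
    (h) => `${h.connectorId}: ${h.status} (${(h.failureRate * 100).toFixed(1)}% failures)`,
  );
  return [`Alerting framework status: ${overallStatus(healths)}`, ...lines].join("\n");
}
```

Keeping the status logic in one place means the API response, the UI indicator, and the email cannot disagree about what "failing" means.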