Metrics cluster working state and last update #7670
I'm not sure if this is a conventional way of doing things? Maybe it's better to only report the current one?
I suggest to rename this to |
It probably isn't the conventional way, but it seemed more suitable for plotting to me. Checking with ChatGPT, it seems to confirm my approach: An alternative could be to include the state in the metric name: I know ChatGPT could be wrong here, so please let me know if you want me to look deeper into the conventions! Edit: On second thought, maybe we shouldn't include the error itself, for the same reason. I assume the error, if one exists, can be read somewhere else too. |
Yeah, that sounds reasonable! I'm fine with that ChatGPT explanation. Let's exclude the error message and just denote the successful and error states. That would be enough to detect that an error occurred, and the user is then responsible for looking into the cluster to see the actual error. |
timvisee
left a comment
One minor change I just noticed. Other than that all good, thanks!
* cluster working state and last update in metrics
* Rename metric
* Remove error string
* Use timestamp instead
* Fix prometheus help text
* Change metric type to counter
Depends on #7479
Adds the following two new metric families to the metrics API:
cluster_working_state always has two metrics, one with state=working and one with state=stopped. The active state exports the value 1.
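As a rough illustration of the one-series-per-state pattern described for cluster_working_state, here is a minimal Python sketch that renders the two sample lines in Prometheus text format. It is not the code from this PR: the helper name render_cluster_working_state is hypothetical, and the value 0 for the inactive state is an assumption, since the description only says that the active state exports 1.

```python
# Hypothetical helper, not taken from the PR: renders one sample line per
# state in Prometheus text exposition format.

# State names as given in the PR description.
STATES = ("working", "stopped")


def render_cluster_working_state(active: str) -> list[str]:
    """Return one sample line per state; the active state gets the value 1.

    Assumption: the inactive state is exported as 0. The PR description
    only states that the active state exports 1.
    """
    if active not in STATES:
        raise ValueError(f"unknown state: {active!r}")
    return [
        f'cluster_working_state{{state="{state}"}} {1 if state == active else 0}'
        for state in STATES
    ]


if __name__ == "__main__":
    print("\n".join(render_cluster_working_state("working")))
```

Keeping both series present at all times means a dashboard can plot each state as its own line instead of parsing a state name out of a single value, which matches the plotting motivation mentioned in the conversation.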