fix: Ignore extra HealthStatus fields in service monitor#3132
fix: Ignore extra HealthStatus fields in service monitor#3132Bravo555 merged 2 commits intothin-edge:mainfrom
Conversation
Features were added because without them it was impossible to build and check `tedge-watchdog` by itself using a `-p` flag, i.e: `cargo build -p tedge-watchdog` (`hide_env_values` requires `env` and `default_value` variant that uses `OsString` required `string`) Note that adding these features in workspace Cargo.toml should not add more code to the binary as these features were already used because other crate required them, so the only clap instance we're pulling have these enabled. Only now we're making these features available to all the workspace crates (including tedge-watchdog). Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅ Additional details and impacted files📢 Thoughts on this report? Let us know! |
| #[derive(Debug, Serialize, Deserialize)] | ||
| pub struct HealthStatusExt { |
There was a problem hiding this comment.
I would consider to move this struct in tedge_api.
Also, I'm not really convinced by having two public structs with similar purpose. Wouldn't be better to keep the HealthStatus struct unchanged with a deserialize method that fallbacks on a minimal private struct in case some fields cannot be parsed?
There was a problem hiding this comment.
I did talk about this with @albinsuresh because I started out with the same approach but concluded that having separate structs with the one in tedge_watchdog extending the one in tedge_api would be better.
Also, I'm not really convinced by having two public structs with similar purpose.
My bad, the struct wasn't supposed to be public and the aim was for it to be very much private to the tedge_watchdog crate. pub was a leftover from when I had it in a nested module, which is now removed.
Wouldn't be better to keep the HealthStatus struct unchanged with a deserialize method that fallbacks on a minimal private struct in case some fields cannot be parsed?
I understand that pros of the "one struct" approach to be the following:
- simpler interface (only one type of
HealthStatus) at the cost of a more complex implementation (customDeserializeimpl that tries to deserialize fields to an expected format but falls back to some other representation when it's not possible) - all possible fields in one place so one has to only look at
tedge_apito see thatHealthStatuscan have these fields only
But IMHO it would make sense to keep HealthStatus in tedge_api as small as possible and not to put in fields that are not really required and erect more structure than necessary:
My view is that the payload under te/device/device001/service/service001/status/health is from an open set: it needs to have status property for its status to be synced with c8y, but it may also have pid and time when it's using tedge_watchdog. Perhaps some other component would require other fields to be present, but not care about status, pid, or time.
So we can't really deserialize to a struct that has all these fields defined because then if there's a problem with one of the fields that we're not using, we can't use other fields that we need and that are fine.
The easiest would be to just deserialize to HashMap<String, JsonValue>, but then the callers have to look up the fields they need and validate their types.
So we can do one additional step - make a struct for these properties so that serde will fetch the fields and validate their types and values for us. Because serde by default omits unknown fields, the struct doesn't say "Health status payload has only these fields", it says "it has at least these fields"
deserialize method that fallbacks on a minimal private struct in case some fields cannot be parsed
This is the somewhat problematic part. Deserialization either succeeds or fails, so it's not clear to me how to handle failures on individual fields while keeping other fields in a good way. If we want to deserialize e.g. pid: Option<u32>, how to handle "pid": "invalid" case? If we put None, we lose the error. If we deserialize to Result<u32, DeserializeError> that's just clumsy. Maybe we can put it into extra: HashMap<String, JsonValue>, but the outcome is not much different from just skipping the complex deserialize impl and using the extra hashmap directly, or even better defining another struct that has fields we require so that we don't have to extract from the hashmap manually.
IMO the default serde behaviour of "I want these fields to be present and valid and I want to ignore other fields" is conducive to the approach of defining multiple structs, even for things that are only slightly different.
TL;DR: IMO health status payload is a dynamic type, we shouldn't try to enumerate all possible fields for a deserialize impl, and coping with different components having different failure modes when some of the fields are missing. Instead it makes sense to only require fields we require in each caller, but be upfront in the documentation that the JSON payload may also contain other fields.
There was a problem hiding this comment.
Perhaps some other component would require other fields to be present, but not care about
status,pid, ortime.
Here, you are going a bit too far. If there no relationship with an HealthStatus, what's the point to use that struct?
The easiest would be to just deserialize to
HashMap<String, JsonValue>, but then the callers have to look up the fields they need and validate their types.
This definitely not what I'm asking for.
Deserialization either succeeds or fails, so it's not clear to me how to handle failures on individual fields while keeping other fields in a good way. If we want to deserialize e.g.
pid: Option<u32>, how to handle"pid": "invalid"case? If we putNone, we lose the error. If we deserialize toResult<u32, DeserializeError>that's just clumsy.
I would put None in that case (I agree this is not always appropriate), meaning there is no pid information that can be used.
IMO the default serde behaviour of "I want these fields to be present and valid and I want to ignore other fields" is conducive to the approach of defining multiple structs, even for things that are only slightly different.
It can be indeed appropriate to define multiple independent structs. But here you propose something a bit complicated for nothing: why an extension of tedge_api::HealthStatus and not a fully independent tedge_watchdog::HealthStatus.
This is even a surprise to me that tedge_watchdog doesn't use the status field! Did I miss something?
There was a problem hiding this comment.
It can be indeed appropriate to define multiple independent structs. But here you propose something a bit complicated for nothing: why an extension of tedge_api::HealthStatus and not a fully independent tedge_watchdog::HealthStatus.
Right, if there were more fields in base HealthStatus that tedge_watchdog would be using then it would make sense to extend, right now they can be made independent with not much difference.
This is even a surprise to me that tedge_watchdog doesn't use the status field! Did I miss something?
You're correct, tedge_watchdog seems to not inspect status field before sending notifications to systemd, but it probably should, right?
In any case, I'll provide an alternative implementation more in line with your requirements - I had it working at one point but decided the current approach to be simpler, so I'll just find it in my git reflog and we can compare.
There was a problem hiding this comment.
In any case, I'll provide an alternative implementation more in line with your requirements - I had it working at one point but decided the current approach to be simpler, so I'll just find it in my
git reflogand we can compare.
As tedge_watchdog is not using the status field, the simpler is to go your way but with two fully independent structs.
There was a problem hiding this comment.
Alternative approach here:
There was a problem hiding this comment.
As tedge_watchdog is not using the status field, the simpler is to go your way but with two fully independent structs.
The watchdog should have been using that field, which is why I proposed the watchdog to use a HealthStatusExt struct, which is an extension of the HealthStatus struct, but with those additional fields relevant only to it. I believe this extended setup would make more sense when the watchdog is made to care about the status, which it should always have. I'm not quite sure if I missed it in the initial impl itself or got lost during some of those earlier refactoring exercises.
Looking at the alternate impl, I am also in favour of the simplified impl in this PR.
Robot Results
|
74f6ac5 to
33ca589
Compare
| if let Ok(message) = message.payload_str() { | ||
| debug!("Health response received: {message}"); | ||
| if let Ok(health_status) = serde_json::from_str::<HealthStatus>(message) { | ||
| if let Ok(health_status) = serde_json::from_str::<HealthStatusExt>(message) { |
There was a problem hiding this comment.
I feel a bit odd that the watchdog doesn't seem caring if the status is up or not. As long as pid and time are included in the health status message, the health status is considered "up"?
There was a problem hiding this comment.
Yes, if it contains time and pid it sends systemd-notify notification without checking status, and this definitely seems like an overlook to me. I would expect notifications to be sent only when status is up.
There was a problem hiding this comment.
Okay, so that's why you didn't include the HealthStatus struct as a field in the HealthStatusExt struct, as it was ignored. That needs to be fixed.
There was a problem hiding this comment.
Yes, if it contains
timeandpidit sends systemd-notify notification without checkingstatus, and this definitely seems like an overlook to me. I would expect notifications to be sent only whenstatusisup.
I also expect in this way. However, it's out of scope to fix it in this PR?
| /// | ||
| /// https://thin-edge.github.io/thin-edge.io/operate/troubleshooting/monitoring-service-health/ | ||
| #[derive(Debug, Serialize, Deserialize)] | ||
| struct HealthStatusExt { |
There was a problem hiding this comment.
One can keep the former name.
| struct HealthStatusExt { | |
| struct HealthStatus { |
There was a problem hiding this comment.
I was expecting this struct to wrap the HealthStatus struct as a flattened field, hence the name.
There was a problem hiding this comment.
I was expecting this struct to wrap the
HealthStatusstruct as a flattened field, hence the name.
Yeah, but the wrapped value is not used.
There was a problem hiding this comment.
Yeah, but the wrapped value is not used.
I think the value should have used. It's the implementation oversight that we are not using the status in watchdog. See this thread.
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
1753f46 to
de6e408
Compare
Proposed changes
tedge-mapper will now not only try to parse
statusfield and will ignore other extra fields.Types of changes
Paste Link to the issue
Checklist
cargo fmtas mentioned in CODING_GUIDELINEScargo clippyas mentioned in CODING_GUIDELINESFurther comments