feat: implement dora inspect top command for real-time node resource monitoring (#1210)#1251
Conversation
7b3d367 to
583bda8
Compare
phil-opp
left a comment
There was a problem hiding this comment.
Thanks for the PR! Unfortunately, the code as written won't work as intended.
In addition to my inline comments, see my general concern in #1210 (comment).
binaries/coordinator/src/lib.rs
Outdated
| // Find which daemon this node is running on | ||
| for daemon_id in &dataflow.daemons { | ||
| node_infos.push(NodeInfo { | ||
| dataflow_id: dataflow.uuid, | ||
| dataflow_name: dataflow.name.clone(), | ||
| node_id: node_id.clone(), | ||
| daemon_id: daemon_id.clone(), | ||
| }); | ||
| } |
There was a problem hiding this comment.
A node is running only on one daemon. This adds the node ID for all daemons to the list, as if the node was running on all the daemons.
| // Get node info from coordinator | ||
| let mut session = connect_to_coordinator((coordinator_addr, coordinator_port).into()) | ||
| .wrap_err("Failed to connect to coordinator")?; |
There was a problem hiding this comment.
This reconnects to the coordinater in every loop iteration. Why not reuse the connection?
| let request = ControlRequest::GetNodeInfo; | ||
| let reply_raw = session | ||
| .request(&serde_json::to_vec(&request).unwrap()) | ||
| .wrap_err("failed to send request to coordinator")?; |
There was a problem hiding this comment.
Is the node info expected to change during calls? It looks like this will always return the same result, so querying it again and again in every loop iteration is just uneccessary overhead.
phil-opp
left a comment
There was a problem hiding this comment.
Thanks for the updates!
phil-opp
left a comment
There was a problem hiding this comment.
Thanks for the updates!
|
Thanks again! |
|
Thank you for the merge and the great feedback throughout the review process! It was a pleasure working on this feature. |
|
I think the inspect function providing metrics in this PR is not working when the node is spawned using |
This PR implements the
dora inspect topcommand (#1210) for real-time monitoring of node resource usage. The command provides an interactive TUI built withratatuithat displays CPU usage, memory consumption, and process IDs for all running nodes across active dataflows, with configurable refresh intervals (default 2s) and keyboard-driven sorting by node name, CPU, or memory.The implementation extends the message protocol with
GetNodeInforequest/reply types, adds a coordinator handler to collect node information, and usessysinfofor process metrics collection. All changes are backward compatible with no breaking changes to existing functionality.