Skip to content

feat: implement dora inspect top command for real-time node resource monitoring (#1210)#1251

Merged
phil-opp merged 8 commits intodora-rs:mainfrom
guptapratykshh:feature/inspect-top-command
Dec 12, 2025
Merged

feat: implement dora inspect top command for real-time node resource monitoring (#1210)#1251
phil-opp merged 8 commits intodora-rs:mainfrom
guptapratykshh:feature/inspect-top-command

Conversation

@guptapratykshh
Copy link
Copy Markdown
Contributor

This PR implements the dora inspect top command (#1210) for real-time monitoring of node resource usage. The command provides an interactive TUI built with ratatui that displays CPU usage, memory consumption, and process IDs for all running nodes across active dataflows, with configurable refresh intervals (default 2s) and keyboard-driven sorting by node name, CPU, or memory.

The implementation extends the message protocol with GetNodeInfo request/reply types, adds a coordinator handler to collect node information, and uses sysinfo for process metrics collection. All changes are backward compatible with no breaking changes to existing functionality.

@guptapratykshh guptapratykshh force-pushed the feature/inspect-top-command branch from 7b3d367 to 583bda8 Compare December 5, 2025 09:32
Copy link
Copy Markdown
Collaborator

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Unfortunately, the code as written won't work as intended.

In addition to my inline comments, see my general concern in #1210 (comment).

Comment on lines +782 to +790
// Find which daemon this node is running on
for daemon_id in &dataflow.daemons {
node_infos.push(NodeInfo {
dataflow_id: dataflow.uuid,
dataflow_name: dataflow.name.clone(),
node_id: node_id.clone(),
daemon_id: daemon_id.clone(),
});
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A node is running only on one daemon. This adds the node ID for all daemons to the list, as if the node was running on all the daemons.

Comment on lines +296 to +298
// Get node info from coordinator
let mut session = connect_to_coordinator((coordinator_addr, coordinator_port).into())
.wrap_err("Failed to connect to coordinator")?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reconnects to the coordinater in every loop iteration. Why not reuse the connection?

Comment on lines +300 to +303
let request = ControlRequest::GetNodeInfo;
let reply_raw = session
.request(&serde_json::to_vec(&request).unwrap())
.wrap_err("failed to send request to coordinator")?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the node info expected to change during calls? It looks like this will always return the same result, so querying it again and again in every loop iteration is just uneccessary overhead.

Copy link
Copy Markdown
Collaborator

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

Copy link
Copy Markdown
Collaborator

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

@phil-opp phil-opp merged commit e7f2959 into dora-rs:main Dec 12, 2025
66 of 73 checks passed
@phil-opp
Copy link
Copy Markdown
Collaborator

Thanks again!

@guptapratykshh
Copy link
Copy Markdown
Contributor Author

Thank you for the merge and the great feedback throughout the review process! It was a pleasure working on this feature.

@haixuanTao
Copy link
Copy Markdown
Collaborator

haixuanTao commented Jan 8, 2026

I think the inspect function providing metrics in this PR is not working when the node is spawned using uv. Only the parent process is taken into account and does not take child process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants