MilkClouds/smon


smon

A terminal user interface (TUI) for monitoring Slurm clusters, built with Textual and aimed at DGX H100 clusters.

⚠️ Development Notice: This project was implemented mainly by LLMs (Sonnet 4/GPT-4). It is incomplete and has bugs. Contributions are welcome, including major changes.

Features

  • Job monitoring with live updates
  • Node status display with GPU availability
  • GPU count display with partition info
  • Script viewer with syntax highlighting
  • Output tracking (stdout/stderr)
  • Search and filtering
  • Tabbed interface
  • Keyboard shortcuts
  • gpustat-web integration for real-time GPU monitoring

Installation

Using uvx/uv tool (recommended)

# 1. Use uvx
$ uvx --from git+https://github.com/MilkClouds/smon.git smon

# 2. Use uv tool
$ uv tool install git+https://github.com/MilkClouds/smon.git
$ smon

Using pip

$ pip install git+https://github.com/MilkClouds/smon.git
$ smon

Usage

Basic Usage

smon

Command Line Options

smon --help                    # Show help
smon --refresh 10              # Set refresh interval to 10 seconds
smon --user alice              # Filter jobs by user
smon --partition gpu           # Filter jobs by partition
smon --gpustat-web URL         # Enable gpustat-web integration
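
As an illustration of how the options above fit together, here is a minimal argparse sketch that accepts the same flags. It is not taken from smon's source: the function name `build_parser`, the help strings, and the default refresh interval of 5 seconds are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags listed in this README."""
    parser = argparse.ArgumentParser(
        prog="smon", description="TUI for monitoring Slurm clusters"
    )
    # The default of 5 seconds is an assumption, not smon's documented default.
    parser.add_argument("--refresh", type=float, default=5.0,
                        help="refresh interval in seconds")
    parser.add_argument("--user", default=None, help="filter jobs by user")
    parser.add_argument("--partition", default=None,
                        help="filter jobs by partition")
    # argparse stores this value as args.gpustat_web (dash becomes underscore).
    parser.add_argument("--gpustat-web", default=None, metavar="URL",
                        help="enable gpustat-web integration")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```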

Keyboard Shortcuts

Key      Action
q        Quit application
r        Refresh data
/        Focus search input
f        Show filter status
s        Open script modal for selected job
o        Open output modal for selected job
t        Toggle real-time output refresh
Ctrl+R   Refresh output in current tab

TUI Interface

Jobs Tab

  • Job information: JobID, User, State, Partition, Resources
  • GPU/CPU/memory usage and timing
  • Select job to view details, script, and output

Script Tab

  • Shows script for selected job
  • Bash syntax highlighting
  • Modal view with s key

Output Tab

  • stdout/stderr for selected jobs
  • Real-time refresh toggle (t)
  • Manual refresh (Ctrl+R)
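
One plausible way to implement the real-time refresh described above is to remember how many bytes of the output file have been read and fetch only what was appended since. The sketch below assumes that approach; the function name `read_new_output` and the offset-based design are illustrative and not confirmed by smon's source.

```python
from pathlib import Path


def read_new_output(path: str, offset: int) -> tuple[str, int]:
    """Return text appended to `path` since byte `offset`, plus the new offset."""
    file = Path(path)
    try:
        size = file.stat().st_size
    except FileNotFoundError:
        # The job may not have produced any output yet.
        return "", 0
    if size < offset:
        # The file shrank (truncated or rewritten), so start over.
        offset = 0
    with file.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Replace undecodable bytes rather than failing on partial writes.
    return data.decode("utf-8", errors="replace"), offset + len(data)
```

A periodic timer could call this with the last returned offset and append the returned text to the output view.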

Nodes Tab

  • Node status and availability
  • GPU/CPU/memory per node
  • gpustat-web integration (side-by-side view)
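
Per-node GPU counts can be derived from the GRES column that sinfo reports (for example via `sinfo -N -h -o "%N|%T|%G"`). The sketch below shows one way to parse such output. The function names and the tuple layout are assumptions for this example, and it only handles the common `gpu:<count>` and `gpu:<type>:<count>` forms.

```python
import re


def gpu_count(gres: str) -> int:
    """Sum GPU counts in a GRES string such as "gpu:h100:8(S:0-1)"."""
    # Drop parenthesised suffixes like "(S:0-1)" and the "(null)" placeholder.
    cleaned = re.sub(r"\([^)]*\)", "", gres)
    total = 0
    for entry in cleaned.split(","):
        fields = entry.strip().split(":")
        if fields[0] == "gpu" and fields[-1].isdigit():
            total += int(fields[-1])
    return total


def parse_sinfo(text: str) -> list[tuple[str, str, int]]:
    """Parse "node|state|gres" lines into (node, state, gpu_count) tuples."""
    nodes = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue
        name, state, gres = parts
        nodes.append((name, state, gpu_count(gres)))
    return nodes
```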

gpustat-web Integration

smon can display real-time GPU status from gpustat-web alongside the Slurm node information.

Setup

  1. Make sure gpustat-web is running on your cluster (e.g., http://10.50.0.111:48109/)

  2. Run smon with the --gpustat-web option:

    smon --gpustat-web http://10.50.0.111:48109/
  3. Alternatively, set the URL in the config file (~/.config/smon/config.json):

    {
      "gpustat_web_url": "http://10.50.0.111:48109/"
    }

The Nodes tab will show the Slurm node table on the left and live GPU status from gpustat-web on the right.

Requirements

  • Python ≥ 3.11
  • Slurm cluster with squeue, sinfo, and scontrol commands
  • Terminal with color support
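
To check the requirements above before launching, a small preflight script along these lines may help. It is an illustrative sketch, not part of smon: the function names are hypothetical, and it only verifies the Python version and that the three Slurm commands are on PATH.

```python
import shutil
import sys

REQUIRED_COMMANDS = ("squeue", "sinfo", "scontrol")


def missing_commands(which=shutil.which) -> list[str]:
    """Return the required Slurm commands that cannot be found on PATH."""
    return [cmd for cmd in REQUIRED_COMMANDS if which(cmd) is None]


def python_ok(version=sys.version_info) -> bool:
    """Check that the interpreter is Python 3.11 or newer."""
    return tuple(version[:2]) >= (3, 11)


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.11 or newer is required")
    for cmd in missing_commands():
        print(f"Missing Slurm command: {cmd}")
```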

Dependencies

Related projects

Contributing

This TUI project is primarily implemented using LLM assistance (Sonnet 4/GPT-4) and is incomplete with known bugs. Contributions are welcome:

  • Bug fixes
  • Feature improvements
  • Code refactoring
  • Documentation
  • Major changes
  • Testing

Feel free to open issues or submit pull requests.

About

Real-time Slurm cluster monitoring tool with an interactive TUI built with Textual. Visualizes GPU/CPU/memory allocation across nodes with job-level drill-down.
