Problem Statement
Summary
Add percentage-based CPU/RAM resource profiles to nemoclaw onboard and a new nemoclaw resources command for hardware inventory. Resource limits are applied to sandbox pods via OpenShell's --cpu-request/--cpu-limit/--memory-request/--memory-limit flags (OpenShell PR #1063).
Motivation
Currently, sandbox pods run with default K8s resource allocations (BestEffort QoS). On shared machines (DGX Spark, laptops running other workloads), this leads to:
- Sandbox consuming all available CPU/RAM, starving host-side processes (Ollama, IDE)
- No predictable resource budgeting for multi-sandbox scenarios
- No visibility into hardware capacity vs. sandbox allocation
Proposed Design
Proposed Changes
1. nemoclaw resources command (hardware inventory)
- Reports CPU cores, RAM, GPU VRAM, and K8s allocatable capacity
- Supports --json output for scripting
- Uses K8s allocatable (not host totals) when available
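As a rough illustration of the "allocatable when available" rule, the sketch below parses Kubernetes-style quantities and falls back to the host core count. The function names and the shape of the allocatable dict are hypothetical, not taken from the NemoClaw code, and only binary memory suffixes (Ki, Mi, Gi, Ti) are handled.

```python
import os
from typing import Optional

# Binary suffixes used by Kubernetes memory quantities.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_cpu(quantity: str) -> float:
    """Convert a K8s CPU quantity ("22" or "21500m") to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a K8s memory quantity ("32780404Ki", "15Gi", "1024") to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def effective_cpu(allocatable: Optional[dict]) -> float:
    """Prefer K8s allocatable CPU; fall back to the host core count."""
    if allocatable and "cpu" in allocatable:
        return parse_cpu(allocatable["cpu"])
    return float(os.cpu_count() or 1)
```

For example, effective_cpu({"cpu": "22"}) yields 22.0, while effective_cpu(None) reports whatever the host exposes.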
2. Resource profile integration in nemoclaw onboard
- Interactive picker with pre-defined profiles (creator, gamer, developer, etc.)
- Custom profile option with percentage or absolute values
- Environment variable overrides: NEMOCLAW_RESOURCE_PROFILE, NEMOCLAW_CPU_LIMIT, etc.
- Percentage resolution against K8s allocatable capacity (e.g., "25%" of 22 cores = 5.5, resolved to 5 whole cores)
- Graceful degradation: when OpenShell CLI lacks resource flags, displays resolved values but skips enforcement
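The percentage resolution and environment override described in this list could look roughly like the following. This is a minimal sketch: the function names are hypothetical, rounding down is inferred from the 22-core example, and the one-core floor and the idea that NEMOCLAW_CPU_LIMIT accepts the same value syntax as a profile are assumptions.

```python
import math
import os


def resolve_cpu(value: str, allocatable_cores: float) -> int:
    """Resolve "25%" or an absolute value such as "4" to whole cores."""
    if value.endswith("%"):
        percent = int(value[:-1])
        if not 1 <= percent <= 100:
            raise ValueError(f"percentage out of range: {value}")
        # Assumption: round down, but never below one core.
        return max(1, math.floor(allocatable_cores * percent / 100))
    return int(value)


def cpu_limit_setting(profile: dict, env=os.environ) -> str:
    """Assumed precedence: NEMOCLAW_CPU_LIMIT overrides the profile value."""
    return env.get("NEMOCLAW_CPU_LIMIT", profile["cpu_limit"])
```

With 22 allocatable cores, resolve_cpu("25%", 22) gives 5 and resolve_cpu("50%", 22) gives 11, which matches the CPU request and limit reported under Testing.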
3. Blueprint schema update
- resource_profiles section in blueprint.yaml with percentage-based patterns (e.g., cpu_limit: "25%")
- JSON schema validation for 1%–100% whole-number percentages
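One way to express the 1%–100% whole-number constraint is a regular expression of the kind a JSON schema pattern keyword would carry. The pattern below is an illustrative guess, not the one in the actual schema, and it also rejects leading zeros such as "05%".

```python
import re

# Hypothetical pattern: "1%" through "100%", whole numbers, no leading zeros.
PERCENT_PATTERN = re.compile(r"^(100|[1-9][0-9]?)%$")


def is_valid_percentage(value: str) -> bool:
    """Return True when value is a whole-number percentage from 1% to 100%."""
    return PERCENT_PATTERN.fullmatch(value) is not None
```

Under this pattern "25%" and "100%" pass, while "0%", "101%", "2.5%", and a bare "25" are rejected.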
Testing
Verified end-to-end on cgroup v2:
- Pod spec: resources.requests.cpu=5, resources.limits.cpu=11, resources.requests.memory=7Gi, resources.limits.memory=15Gi
- cgroup enforcement: cpu.max=1100000/100000 (11 cores), memory.max=16106127360 (15Gi)
- CPU burn test: 100 throttle events in 10s (22 threads capped at 11 cores)
- Memory OOM test: exit 137 when exceeding 15Gi limit
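The arithmetic behind these figures can be checked with a few helpers. This is a sketch with hypothetical function names; it assumes the usual cgroup v2 layout of cpu.max as a quota and a period in microseconds, and it accepts the quota/period notation used above as well as the space-separated form.

```python
def cores_from_cpu_max(cpu_max: str) -> float:
    """Convert a cpu.max value (quota and period in microseconds) to cores."""
    quota, period = cpu_max.replace("/", " ").split()
    if quota == "max":
        return float("inf")  # no CPU limit set
    return int(quota) / int(period)


def gib_from_bytes(num_bytes: int) -> float:
    """Convert a memory.max byte count to GiB."""
    return num_bytes / 1024**3


def enforcement_periods(seconds: int, period_us: int) -> int:
    """Number of CPU enforcement periods that fit in a time window."""
    return seconds * 1_000_000 // period_us
```

A 10-second window with a 100000 µs period contains 100 enforcement periods, so 100 throttle events is consistent with the 22-thread burn being throttled in every period, assuming the count refers to throttled periods.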
Dependencies
- OpenShell PR #1063: adds --cpu-request/--cpu-limit/--memory-request/--memory-limit flags to openshell sandbox create
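The sketch below shows how the four flags might be attached to the create command, and skipped when the installed CLI does not support them. Detecting support by scanning help text is an assumption about how the check could work, the helper names and the resources dict keys are hypothetical, and any other arguments that openshell sandbox create requires are omitted.

```python
RESOURCE_FLAGS = ("--cpu-request", "--cpu-limit", "--memory-request", "--memory-limit")


def supports_resource_flags(help_text: str) -> bool:
    """Assumed check: every resource flag is mentioned in the CLI help text."""
    return all(flag in help_text for flag in RESOURCE_FLAGS)


def build_create_args(resources: dict, help_text: str) -> list:
    """Build the argument list, omitting resource flags when unsupported."""
    args = ["openshell", "sandbox", "create"]
    if not supports_resource_flags(help_text):
        return args  # resolved values can still be displayed, just not enforced
    for flag in RESOURCE_FLAGS:
        key = flag.lstrip("-").replace("-", "_")  # "--cpu-limit" -> "cpu_limit"
        if key in resources:
            args += [flag, str(resources[key])]
    return args
```

Passing the help text in as a string keeps the sketch free of subprocess calls; a real implementation would need to obtain it from the installed CLI.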
Scope
This is PR 1/2. A follow-up PR will add:
- nemoclaw <sandbox> resize (live resource adjustment)
- nemoclaw <sandbox> verify (cgroup validation)
Alternatives Considered
No response
Category
enhancement: feature
Checklist