Add per-model memoryBudget and memoryFraction CRD fields #206

Merged
Defilan merged 1 commit into main from feat/crd-memory-budget-fields on Mar 4, 2026


Defilan (Member) commented Mar 4, 2026

Summary

Closes #187

  • Adds memoryBudget (absolute byte limit, e.g., "24Gi") and memoryFraction (fraction of system RAM, e.g., 0.8) fields to HardwareSpec on the Model CRD
  • The Metal agent resolves the effective memory budget via a precedence chain: CRD absolute > CRD fraction > --memory-fraction agent flag > auto-detect
  • Adds --memory-fraction and --memory-budget CLI flags to llmkube deploy

Precedence chain

  1. model.Spec.Hardware.MemoryBudget — absolute byte limit from CRD
  2. model.Spec.Hardware.MemoryFraction — fraction of system RAM from CRD
  3. --memory-fraction agent flag — global default
  4. Built-in adaptive default (0.67 for ≤36GB, 0.75 for >36GB)
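The precedence chain can be pictured with a minimal Go sketch. This is not the PR's `ResolveMemoryBudget()`: the `hardwareSpec` type, the `resolveBudget` signature, the source labels, and the choice to return an error on an invalid fraction (rather than fall through to the next level) are all assumptions made for illustration. The sketch also assumes the CRD quantity string has already been parsed into bytes, and that "36GB" means 36 GiB.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

const gib = uint64(1) << 30

// hardwareSpec is a hypothetical stand-in for the CRD's HardwareSpec.
// MemoryBudgetBytes is assumed to be the already-parsed "24Gi"-style
// quantity; 0 means the field was not set.
type hardwareSpec struct {
	MemoryBudgetBytes uint64
	MemoryFraction    *float64
}

var errBadFraction = errors.New("memory fraction must be finite, > 0 and <= 1")

// validFraction rejects NaN, Inf, zero, negative, and >1 values.
// Treating 1.0 as valid is an assumption.
func validFraction(f float64) bool {
	return !math.IsNaN(f) && !math.IsInf(f, 0) && f > 0 && f <= 1
}

// adaptiveFraction mirrors the built-in default from the description:
// 0.67 at or below 36 GiB of system RAM, 0.75 above.
func adaptiveFraction(totalBytes uint64) float64 {
	if totalBytes <= 36*gib {
		return 0.67
	}
	return 0.75
}

func scale(totalBytes uint64, f float64) uint64 {
	return uint64(float64(totalBytes) * f)
}

// resolveBudget walks the chain: CRD absolute > CRD fraction >
// agent flag > auto-detect. agentFraction == 0 means "flag not set".
func resolveBudget(hw hardwareSpec, agentFraction float64, totalBytes uint64) (uint64, string, error) {
	if hw.MemoryBudgetBytes > 0 {
		return hw.MemoryBudgetBytes, "crd-absolute", nil
	}
	if hw.MemoryFraction != nil {
		if !validFraction(*hw.MemoryFraction) {
			return 0, "", errBadFraction
		}
		return scale(totalBytes, *hw.MemoryFraction), "crd-fraction", nil
	}
	if agentFraction != 0 {
		if !validFraction(agentFraction) {
			return 0, "", errBadFraction
		}
		return scale(totalBytes, agentFraction), "agent-flag", nil
	}
	return scale(totalBytes, adaptiveFraction(totalBytes)), "auto-detect", nil
}

func main() {
	frac := 0.5
	hw := hardwareSpec{MemoryBudgetBytes: 24 * gib, MemoryFraction: &frac}
	budget, source, err := resolveBudget(hw, 0.8, 64*gib)
	fmt.Println(budget/gib, source, err) // prints: 24 crd-absolute <nil>
}
```

Returning the winning source alongside the budget is a design choice of the sketch; it makes it easy to log which level of the chain applied.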

Security hardening

  • Validates MemoryFraction is within the 0.0–1.0 range (rejects NaN, Inf, negative, and zero values)
  • Validates MemoryBudget is positive (rejects negative quantities that would wrap to huge uint64)
  • Adds overflow guards in KV cache estimation to prevent uint64 wraparound with corrupt GGUF metadata
  • Validates --memory-fraction CLI flag range
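The checks listed above can be sketched in Go as follows. The function names are hypothetical, and the KV cache formula used here (2 × layers × context length × KV heads × head dimension × bytes per element) is a generic estimate, not necessarily the one in `pkg/agent/memory.go`. The point is the pattern: validate floats explicitly because NaN fails every ordered comparison, reject non-positive signed values before converting to `uint64`, and use `math/bits.Mul64` to detect multiplication overflow.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"math/bits"
)

var errOverflow = errors.New("estimate overflows uint64")

// validateFraction rejects NaN, Inf, zero, negative, and >1 values.
// Accepting exactly 1.0 is an assumption of this sketch.
func validateFraction(f float64) error {
	if math.IsNaN(f) || math.IsInf(f, 0) || f <= 0 || f > 1 {
		return fmt.Errorf("memory fraction %v is out of range", f)
	}
	return nil
}

// budgetFromSigned converts a signed byte count to uint64. Without the
// check, a value such as -1 would wrap to 18446744073709551615.
func budgetFromSigned(n int64) (uint64, error) {
	if n <= 0 {
		return 0, fmt.Errorf("memory budget must be positive, got %d", n)
	}
	return uint64(n), nil
}

// mulChecked multiplies two uint64 values and reports overflow by
// inspecting the high 64 bits of the 128-bit product.
func mulChecked(a, b uint64) (uint64, error) {
	hi, lo := bits.Mul64(a, b)
	if hi != 0 {
		return 0, errOverflow
	}
	return lo, nil
}

// estimateKVCacheBytes multiplies the factors one at a time so that
// absurd metadata values produce an error instead of a wrapped result.
func estimateKVCacheBytes(layers, contextLen, kvHeads, headDim, bytesPerElem uint64) (uint64, error) {
	total := uint64(2) // one cache each for keys and values
	for _, factor := range []uint64{layers, contextLen, kvHeads, headDim, bytesPerElem} {
		var err error
		if total, err = mulChecked(total, factor); err != nil {
			return 0, err
		}
	}
	return total, nil
}

func main() {
	fmt.Println(validateFraction(math.NaN()) != nil)
	fmt.Println(budgetFromSigned(-1))
	fmt.Println(estimateKVCacheBytes(32, 4096, 8, 128, 2))
	fmt.Println(estimateKVCacheBytes(1<<32, 1<<32, 1, 1, 1))
}
```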

Files changed

  • api/v1alpha1/model_types.go — New CRD fields on HardwareSpec
  • api/v1alpha1/zz_generated.deepcopy.go — Regenerated
  • config/crd/bases/inference.llmkube.dev_models.yaml — Regenerated
  • Makefile — Added crd:allowDangerousTypes=true for *float64 support
  • pkg/agent/memory.go — ResolveMemoryBudget(), CheckMemoryBudgetAbsolute(), overflow-safe estimation
  • pkg/agent/agent.go — Updated ensureProcess() to use resolved budget
  • pkg/cli/deploy.go — New --memory-fraction and --memory-budget flags
  • pkg/agent/memory_test.go — 16 new tests
  • pkg/cli/deploy_test.go — 3 new tests

Test plan

  • make test — all existing + 19 new tests pass
  • make build — controller and metal-agent build
  • golangci-lint run ./... — 0 issues
  • Security audit — 6 findings fixed (input validation, overflow, bounds checking)
  • Apply a Model with memoryBudget: "1Gi" → agent rejects large model with InsufficientMemory (validated on local minikube cluster)
  • Apply a Model with memoryFraction: 0.9 → agent uses 90% instead of default (validated on local minikube cluster)
  • Deploy with llmkube deploy ... --memory-fraction 0.8 → CRD has correct field (validated on local minikube cluster)
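The first manual check in the test plan (a `1Gi` budget rejecting a large model) corresponds to a fixed-byte comparison. A minimal sketch of such a check is shown here; the function name, its parameters, and the error type are hypothetical and are not taken from `CheckMemoryBudgetAbsolute()`. Only the `InsufficientMemory` outcome comes from the description above.

```go
package main

import "fmt"

// insufficientMemoryError is a hypothetical error type standing in for
// the InsufficientMemory outcome reported by the agent.
type insufficientMemoryError struct {
	RequiredBytes uint64
	BudgetBytes   uint64
}

func (e *insufficientMemoryError) Error() string {
	return fmt.Sprintf("InsufficientMemory: need %d bytes, budget is %d bytes",
		e.RequiredBytes, e.BudgetBytes)
}

// checkAbsoluteBudget compares the estimated footprint (weights plus
// KV cache) against a fixed byte budget.
func checkAbsoluteBudget(weightsBytes, kvCacheBytes, budgetBytes uint64) error {
	required := weightsBytes + kvCacheBytes
	if required < weightsBytes {
		// The sum wrapped around; treat it as exceeding any budget.
		return &insufficientMemoryError{RequiredBytes: ^uint64(0), BudgetBytes: budgetBytes}
	}
	if required > budgetBytes {
		return &insufficientMemoryError{RequiredBytes: required, BudgetBytes: budgetBytes}
	}
	return nil
}

func main() {
	const gib = uint64(1) << 30
	// A model needing roughly 5 GiB against a 1 GiB budget is rejected.
	fmt.Println(checkAbsoluteBudget(4*gib, 1*gib, 1*gib))
	// A small model fits.
	fmt.Println(checkAbsoluteBudget(gib/2, gib/8, 1*gib))
}
```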

Add MemoryBudget (absolute byte limit) and MemoryFraction (fraction of
system RAM) fields to HardwareSpec, allowing per-model memory budget
overrides on the Model CRD. The agent resolves the effective budget via
a precedence chain: CRD absolute > CRD fraction > agent flag > auto-detect.

- Add MemoryBudget string and MemoryFraction *float64 to HardwareSpec
- Add ResolveMemoryBudget() with precedence chain and input validation
- Add CheckMemoryBudgetAbsolute() for fixed-byte budget checks
- Add --memory-fraction and --memory-budget CLI flags to deploy command
- Add overflow guards in KV cache estimation to prevent uint64 wraparound
- Validate MemoryFraction bounds (0.0-1.0, reject NaN/Inf)
- Validate MemoryBudget is positive (reject negative quantities)
- Regenerate CRD manifests and deepcopy with allowDangerousTypes=true
- Add 19 new tests covering resolution, edge cases, and CLI wiring

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan merged commit e632369 into main on Mar 4, 2026
15 checks passed
Defilan deleted the feat/crd-memory-budget-fields branch on March 4, 2026 09:29
Development

Successfully merging this pull request may close these issues.

Add memoryFraction and memoryBudget fields to CRD for unified memory control