Add per-model memoryBudget and memoryFraction CRD fields #206
Merged
Add MemoryBudget (absolute byte limit) and MemoryFraction (fraction of system RAM) fields to HardwareSpec, allowing per-model memory budget overrides on the Model CRD. The agent resolves the effective budget via a precedence chain: CRD absolute > CRD fraction > agent flag > auto-detect.

- Add MemoryBudget string and MemoryFraction *float64 to HardwareSpec
- Add ResolveMemoryBudget() with precedence chain and input validation
- Add CheckMemoryBudgetAbsolute() for fixed-byte budget checks
- Add --memory-fraction and --memory-budget CLI flags to deploy command
- Add overflow guards in KV cache estimation to prevent uint64 wraparound
- Validate MemoryFraction bounds (0.0-1.0, reject NaN/Inf)
- Validate MemoryBudget is positive (reject negative quantities)
- Regenerate CRD manifests and deepcopy with allowDangerousTypes=true
- Add 19 new tests covering resolution, edge cases, and CLI wiring

Signed-off-by: Christopher Maher <chris@mahercode.io>
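The precedence chain in the commit message (CRD absolute > CRD fraction > agent flag > auto-detect) can be sketched roughly as follows. This is a minimal illustration, not the project's ResolveMemoryBudget(): the budgetInputs type, the resolveBudget, validFraction and fractionOf helpers, the pre-parsed byte value for memoryBudget, and the AutoFraction field standing in for auto-detection are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// budgetInputs is a hypothetical, simplified stand-in for the values the
// agent consults. It is not the project's real type.
type budgetInputs struct {
	CRDBudgetBytes int64    // memoryBudget from the CRD, already parsed; 0 = unset
	CRDFraction    *float64 // memoryFraction from the CRD; nil = unset
	AgentFraction  float64  // --memory-fraction agent flag; 0 = unset
	AutoFraction   float64  // assumed stand-in for the auto-detected default
	SystemRAMBytes uint64   // total system RAM
}

// validFraction accepts finite values in (0, 1]. Treating 1.0 as valid is an
// assumption; the PR only states the 0.0-1.0 range and the rejected cases.
func validFraction(f float64) bool {
	return !math.IsNaN(f) && !math.IsInf(f, 0) && f > 0 && f <= 1
}

// fractionOf returns f of total, truncated to whole bytes.
func fractionOf(total uint64, f float64) uint64 {
	return uint64(float64(total) * f)
}

// resolveBudget walks the chain from highest to lowest precedence and
// returns the budget in bytes plus a label naming the source that won.
func resolveBudget(in budgetInputs) (uint64, string, error) {
	if in.CRDBudgetBytes != 0 {
		// Reject negatives before converting, so they cannot wrap to a
		// huge uint64.
		if in.CRDBudgetBytes < 0 {
			return 0, "", errors.New("memoryBudget must be positive")
		}
		return uint64(in.CRDBudgetBytes), "crd-absolute", nil
	}
	if in.CRDFraction != nil {
		if !validFraction(*in.CRDFraction) {
			return 0, "", errors.New("memoryFraction out of range")
		}
		return fractionOf(in.SystemRAMBytes, *in.CRDFraction), "crd-fraction", nil
	}
	if in.AgentFraction != 0 {
		if !validFraction(in.AgentFraction) {
			return 0, "", errors.New("--memory-fraction out of range")
		}
		return fractionOf(in.SystemRAMBytes, in.AgentFraction), "agent-flag", nil
	}
	return fractionOf(in.SystemRAMBytes, in.AutoFraction), "auto-detect", nil
}

func main() {
	frac := 0.5
	budget, source, err := resolveBudget(budgetInputs{
		CRDFraction:    &frac,
		AgentFraction:  0.25,
		AutoFraction:   0.75,
		SystemRAMBytes: 32 << 30,
	})
	fmt.Println(budget, source, err) // prints: 17179869184 crd-fraction <nil>
}
```

In this sketch the CRD fraction wins over the agent flag because no absolute budget is set. Returning the winning source alongside the value is a design choice of the example that makes the decision easy to log.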
This was referenced Mar 4, 2026
Summary
Closes #187
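- Add `memoryBudget` (absolute byte limit, e.g., `"24Gi"`) and `memoryFraction` (fraction of system RAM, e.g., `0.8`) fields to `HardwareSpec` on the Model CRD
- Resolve the effective budget via a precedence chain: CRD absolute > CRD fraction > `--memory-fraction` agent flag > auto-detect
- Add `--memory-fraction` and `--memory-budget` CLI flags to `llmkube deploy`

Precedence chain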
1. `model.Spec.Hardware.MemoryBudget`: absolute byte limit from CRD
2. `model.Spec.Hardware.MemoryFraction`: fraction of system RAM from CRD
3. `--memory-fraction` agent flag: global default
4. Auto-detect: fallback when none of the above is set

Security hardening
- Validate `MemoryFraction` bounds (0.0–1.0, rejects NaN/Inf/negative/zero)
- Validate that `MemoryBudget` is positive (rejects negative quantities that would wrap to huge uint64)
- Validate the `--memory-fraction` CLI flag range

Files changed
- `api/v1alpha1/model_types.go`: New CRD fields on HardwareSpec
- `api/v1alpha1/zz_generated.deepcopy.go`: Regenerated
- `config/crd/bases/inference.llmkube.dev_models.yaml`: Regenerated
- `Makefile`: Added `crd:allowDangerousTypes=true` for *float64 support
- `pkg/agent/memory.go`: `ResolveMemoryBudget()`, `CheckMemoryBudgetAbsolute()`, overflow-safe estimation
- `pkg/agent/agent.go`: Updated `ensureProcess()` to use resolved budget
- `pkg/cli/deploy.go`: New `--memory-fraction` and `--memory-budget` flags
- `pkg/agent/memory_test.go`: 16 new tests
- `pkg/cli/deploy_test.go`: 3 new tests

Test plan
- `make test`: all existing + 19 new tests pass
- `make build`: controller and metal-agent build
- `golangci-lint run ./...`: 0 issues
- `memoryBudget: "1Gi"` → agent rejects large model with InsufficientMemory (validated on local minikube cluster)
- `memoryFraction: 0.9` → agent uses 90% instead of default (validated on local minikube cluster)
- `llmkube deploy ... --memory-fraction 0.8` → CRD has correct field (validated on local minikube cluster)
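To illustrate how an overflow-safe KV cache estimate can feed a fixed-byte budget check like the `memoryBudget: "1Gi"` case above, here is a small sketch. It is not the code in pkg/agent/memory.go: the mulChecked, addChecked, estimateKVCacheBytes and fitsBudget helpers are hypothetical, and the KV cache formula (2 x layers x context length x KV heads x head dimension x bytes per element) is a common approximation assumed for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

var errOverflow = errors.New("memory estimate overflows uint64")

// mulChecked multiplies a and b, returning an error instead of wrapping.
func mulChecked(a, b uint64) (uint64, error) {
	hi, lo := bits.Mul64(a, b)
	if hi != 0 {
		return 0, errOverflow
	}
	return lo, nil
}

// addChecked adds a and b, returning an error instead of wrapping.
func addChecked(a, b uint64) (uint64, error) {
	sum, carry := bits.Add64(a, b, 0)
	if carry != 0 {
		return 0, errOverflow
	}
	return sum, nil
}

// estimateKVCacheBytes computes
// 2 (K and V) * layers * contextLen * kvHeads * headDim * bytesPerElem.
// The formula and parameter names are assumptions for illustration.
func estimateKVCacheBytes(layers, contextLen, kvHeads, headDim, bytesPerElem uint64) (uint64, error) {
	total := uint64(2)
	for _, f := range []uint64{layers, contextLen, kvHeads, headDim, bytesPerElem} {
		var err error
		if total, err = mulChecked(total, f); err != nil {
			return 0, err
		}
	}
	return total, nil
}

// fitsBudget reports whether model weights plus KV cache fit in the budget.
func fitsBudget(modelBytes, kvBytes, budgetBytes uint64) (bool, error) {
	need, err := addChecked(modelBytes, kvBytes)
	if err != nil {
		return false, err
	}
	return need <= budgetBytes, nil
}

func main() {
	kv, err := estimateKVCacheBytes(32, 4096, 8, 128, 2)
	fmt.Println(kv, err) // prints: 536870912 <nil>

	// A 4 GiB model plus 512 MiB of KV cache does not fit a 1 GiB budget.
	ok, err := fitsBudget(4<<30, kv, 1<<30)
	fmt.Println(ok, err) // prints: false <nil>
}
```

Failing closed on overflow means an absurdly large context length or layer count yields an error rather than a small wrapped value that would pass the budget check.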