fix(agent): read contextSize from InferenceService CRD #160

Merged: Defilan merged 1 commit into main from fix/metal-agent-context-size on Feb 20, 2026

Conversation

Defilan (Member) commented Feb 20, 2026

Summary

  • Metal Agent now reads contextSize from the InferenceService CRD spec instead of hardcoding 2048
  • Falls back to 2048 only when the field is not set in the CRD
  • Resolves context overflow errors for prompts exceeding 2048 tokens (e.g., financial analysis prompts where the system prompt alone is ~800 tokens)
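
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the code from the merged commit: `InferenceServiceSpec` here is a trimmed-down stand-in for the real CRD type, `resolveContextSize` is a hypothetical helper name, and both the pointer field type and the choice to treat non-positive values as "not set" are assumptions.

```go
package main

import "fmt"

// defaultContextSize mirrors the 2048 fallback described in this PR.
const defaultContextSize int32 = 2048

// InferenceServiceSpec is a hypothetical stand-in for the real CRD spec.
// The pointer type is an assumption that lets "unset" be told apart from
// an explicit value.
type InferenceServiceSpec struct {
	ContextSize *int32
}

// resolveContextSize (hypothetical helper) returns the CRD value when it is
// set and positive, and the default otherwise.
func resolveContextSize(spec InferenceServiceSpec) int32 {
	if spec.ContextSize != nil && *spec.ContextSize > 0 {
		return *spec.ContextSize
	}
	return defaultContextSize
}

func main() {
	explicit := int32(8192)
	fmt.Println(resolveContextSize(InferenceServiceSpec{ContextSize: &explicit})) // prints 8192
	fmt.Println(resolveContextSize(InferenceServiceSpec{}))                        // prints 2048
}
```

Keeping the default in one place means the agent never passes a zero or missing context size to llama-server.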

Fixes #159

Test plan

  • Deploy with InferenceService that has contextSize: 8192 and verify llama-server starts with -c 8192
  • Deploy with InferenceService without contextSize set and verify fallback to 2048
  • Run prompts that exceed 2048 tokens and verify no context overflow errors
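
The first two checks in the test plan could also be automated by inspecting the argument list the agent builds for llama-server. The sketch below assumes the agent exposes that list as a `[]string`. `contextSizeFromArgs` is a hypothetical helper and the model path is a placeholder; only the `-c` flag comes from the test plan itself.

```go
package main

import (
	"fmt"
	"strconv"
)

// contextSizeFromArgs (hypothetical helper) scans a llama-server argument
// list for the -c flag and returns the integer value that follows it.
func contextSizeFromArgs(args []string) (int, bool) {
	for i := 0; i+1 < len(args); i++ {
		if args[i] == "-c" {
			n, err := strconv.Atoi(args[i+1])
			if err != nil {
				return 0, false
			}
			return n, true
		}
	}
	return 0, false
}

func main() {
	// Placeholder arguments; the real agent may pass many more flags.
	args := []string{"--model", "/models/example.gguf", "-c", "8192"}
	if n, ok := contextSizeFromArgs(args); ok {
		fmt.Println("context size:", n)
	} else {
		fmt.Println("no -c flag found")
	}
}
```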

The Metal Agent was hardcoding ContextSize to 2048 when starting
llama-server processes, ignoring the contextSize field in the
InferenceService spec. This caused context overflow errors for
prompts exceeding 2048 tokens.

Now reads contextSize from isvc.Spec.ContextSize and falls back
to 2048 only if not specified in the CRD.

Fixes #159

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan force-pushed the fix/metal-agent-context-size branch from 4b185f9 to c37a63c on February 20, 2026 19:01
Defilan merged commit 17f58d4 into main on Feb 20, 2026
15 checks passed
Defilan deleted the fix/metal-agent-context-size branch on February 20, 2026 19:08
github-actions bot mentioned this pull request on Feb 20, 2026
github-actions bot mentioned this pull request on Mar 4, 2026

Development

Successfully merging this pull request may close these issues.

Metal Agent ignores InferenceService contextSize, hardcodes 2048
