
Metal Agent ignores InferenceService contextSize, hardcodes 2048 #159

@Defilan

Description


Problem

The Metal Agent hardcodes ContextSize: 2048 when starting llama-server processes (pkg/agent/agent.go line 157), ignoring the contextSize field from the InferenceService CRD spec. There's even a TODO comment acknowledging this:

ContextSize: 2048, // TODO: Get from model spec

This causes context overflow errors whenever a prompt exceeds 2048 tokens. That happens often with financial analysis prompts, where the system prompt alone is ~800 tokens. Models like Qwen3-32B support a 32K context, and the CRD may specify 4096 or 8192, but the agent never reads the field.
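Here is a rough sketch of the token budget, written in Go to match the agent. It assumes the context window has to hold the system prompt, the symbol's data, and the generated answer together. The ~800-token system prompt comes from this issue. The 512-token answer reserve and the `remainingBudget` helper are illustrative assumptions, not values taken from the agent.

```go
package main

import "fmt"

// remainingBudget returns how many tokens are left for the symbol's data
// after the system prompt and a reserve for the generated answer.
// A negative result means the request cannot fit in the context window.
func remainingBudget(ctxSize, systemPromptTokens, maxNewTokens int) int {
	return ctxSize - systemPromptTokens - maxNewTokens
}

func main() {
	// ~800-token system prompt (from the issue); the 512-token answer reserve is assumed.
	for _, ctx := range []int{2048, 8192, 32768} {
		fmt.Printf("ctx=%d -> %d tokens left for data\n", ctx, remainingBudget(ctx, 800, 512))
	}
}
```

With a 2048-token window, these assumptions leave roughly 736 tokens for the symbol's data, so a data-heavy prompt overflows quickly. At 8192, the same prompt has 6880 tokens to work with.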

Expected Behavior

The Metal Agent should read contextSize from isvc.Spec.ContextSize and only fall back to 2048 if the field is not set in the CRD.
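Below is a minimal Go sketch of that fallback. It assumes `ContextSize` is an optional `*int32` on the spec, which is nil when omitted. The issue does not show the real field type, so if it is a plain `int32`, treat 0 as unset instead. `inferenceServiceSpec`, `resolveContextSize`, and `llamaServerArgs` are hypothetical names, not the agent's code. The only external interface used is llama-server's `--model`, `--port`, and `--ctx-size` flags.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultContextSize is the fallback used only when the CRD omits contextSize.
const defaultContextSize = 2048

// inferenceServiceSpec is a hypothetical, trimmed-down mirror of the CRD spec.
// Assumption: ContextSize is an optional *int32 that is nil when omitted.
type inferenceServiceSpec struct {
	ContextSize *int32
}

// resolveContextSize prefers the spec value and falls back to
// defaultContextSize when the field is missing or non-positive.
func resolveContextSize(spec inferenceServiceSpec) int {
	if spec.ContextSize != nil && *spec.ContextSize > 0 {
		return int(*spec.ContextSize)
	}
	return defaultContextSize
}

// llamaServerArgs is a hypothetical helper that forwards the resolved size
// to llama-server through its --ctx-size flag.
func llamaServerArgs(modelPath string, port int, spec inferenceServiceSpec) []string {
	return []string{
		"--model", modelPath,
		"--port", strconv.Itoa(port),
		"--ctx-size", strconv.Itoa(resolveContextSize(spec)),
	}
}

func main() {
	size := int32(8192)
	fmt.Println(llamaServerArgs("model.gguf", 8080, inferenceServiceSpec{ContextSize: &size}))
	fmt.Println(llamaServerArgs("model.gguf", 8080, inferenceServiceSpec{}))
}
```

Keeping the fallback in one helper means the TODO line could call it instead of using a literal. A zero or negative value is treated as unset rather than passed through to llama-server.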

Impact

  • LLM analysis fails for data-heavy symbols (e.g., GOOGL, META) due to context overflow
  • Users cannot control context size through the CRD as intended

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
