Problem
The Metal Agent hardcodes `ContextSize: 2048` when starting llama-server processes (pkg/agent/agent.go line 157), ignoring the `contextSize` field from the InferenceService CRD spec. There is even a TODO comment acknowledging this: `ContextSize: 2048, // TODO: Get from model spec`.

This causes context overflow errors whenever a prompt exceeds 2048 tokens, which is common for financial analysis prompts, where the system prompt alone is ~800 tokens. Models like Qwen3-32B support 32K context, and the CRD may specify 4096 or 8192, but the agent never reads the field.
Expected Behavior
The Metal Agent should read the context size from `isvc.Spec.ContextSize` and fall back to 2048 only when the field is not set in the CRD.
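A minimal sketch of the proposed fallback logic. The type and field names below (`InferenceServiceSpec`, a pointer-valued `ContextSize`) are assumptions based on the issue text, not the actual agent or CRD code:

```go
package main

import "fmt"

// defaultContextSize is the current hardcoded value, kept only as a fallback.
const defaultContextSize = 2048

// InferenceServiceSpec mirrors the relevant CRD field (assumed shape);
// ContextSize is a pointer so an unset field can be distinguished from zero.
type InferenceServiceSpec struct {
	ContextSize *int32
}

// resolveContextSize returns the spec's context size when set and positive,
// otherwise the 2048 default.
func resolveContextSize(spec InferenceServiceSpec) int32 {
	if spec.ContextSize != nil && *spec.ContextSize > 0 {
		return *spec.ContextSize
	}
	return defaultContextSize
}

func main() {
	size := int32(8192)
	fmt.Println(resolveContextSize(InferenceServiceSpec{ContextSize: &size})) // 8192
	fmt.Println(resolveContextSize(InferenceServiceSpec{}))                   // 2048
}
```

The agent would then pass the resolved value into its llama-server launch config instead of the literal 2048.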
Impact
- LLM analysis fails for data-heavy symbols (e.g., GOOGL, META) due to context overflow
- Users cannot control context size through the CRD as intended