perf: apply non-inferable-only principle to system prompts and memory injection #188

@Aureliolo

Context

ETH Zurich research (arXiv:2602.11988, AGENTbench 138 tasks) found that repository context files (AGENTS.md, CLAUDE.md-style) often hurt AI agent performance:

  • LLM-generated context: -3% success rate vs no context
  • Human-written context: +4% success rate, but +19-20% inference cost
  • Agents follow ALL instructions literally — including unnecessary ones (excessive testing, quality checks beyond scope)
  • Generic architecture overviews don't reduce exploration — agents read code anyway

The Non-Inferable Principle

System prompts and injected memories should include only information the agent cannot discover by reading the codebase or environment:

| Include (non-inferable) | Exclude (inferable) |
| --- | --- |
| Role constraints, authority | Architecture overviews |
| Custom build commands | File structure descriptions |
| Project-specific conventions | General coding best practices |
| Organizational policies (specific) | Generic quality rules |
| Prior decisions, historical outcomes | What's already in the code |
| Interpersonal context | Standard library usage |

Current State Audit

The current prompt template (engine/prompt_template.py v1.2.0) is mostly good:

  • Identity, Personality, Skills, Authority, Autonomy sections: all non-inferable ✅
  • Task section: essential ✅
  • Company Context: low-value generic info (department listing) — already correctly first-to-trim ✅
  • Tools section: potentially redundant if tools are also passed via the API's tool_use mechanism ⚠️
  • org_policies: content-dependent — no guidance on what makes a good policy ⚠️

Tasks

Prompt Builder (engine/prompt.py, engine/prompt_template.py)

  • Add docstring/comment documenting the non-inferable-only principle
  • Evaluate whether the Tools section is redundant when tools are passed via the LLM API's native tool mechanism — if so, skip the section or make it opt-in
  • Consider making Company Context opt-in rather than default (currently always included if company is provided)
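The builder tasks above could look roughly like the following. This is a minimal sketch, not the actual code in engine/prompt.py: `PromptOptions`, `build_system_prompt`, and their parameters are hypothetical names chosen for illustration, and the section headings are assumed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PromptOptions:
    """Hypothetical knobs; not the real engine/prompt.py API."""

    # True when tool definitions already reach the model through the
    # LLM API's native tool mechanism, making a Tools section redundant.
    tools_via_api: bool = True
    # Company Context is opt-in in this sketch instead of on by default.
    include_company_context: bool = False


def build_system_prompt(
    identity: str,
    task: str,
    tools: list[str] | None = None,
    company: str | None = None,
    options: PromptOptions | None = None,
) -> str:
    """Assemble prompt sections under the non-inferable-only principle.

    Only information the agent cannot discover from the codebase or
    environment should be rendered into the prompt.
    """
    opts = options or PromptOptions()
    sections = [f"## Identity\n{identity}", f"## Task\n{task}"]

    # Render the Tools section only when tools are NOT passed via the API.
    if tools and not opts.tools_via_api:
        sections.append("## Tools\n" + "\n".join(f"- {t}" for t in tools))

    # Render Company Context only when the caller explicitly opts in.
    if company and opts.include_company_context:
        sections.append(f"## Company Context\n{company}")

    return "\n\n".join(sections)
```

With the defaults, a call that supplies both `tools` and `company` still produces a prompt containing only Identity and Task, so the cheaper behavior is the one callers get without extra configuration.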

Org Policies Validation

  • Add guidance/validation for org_policies content: policies should be actionable + non-inferable
  • Document what makes a good org policy (specific conventions, custom tooling) vs bad (generic "maintain quality")
  • Consider a validate_policy_quality() helper or at minimum a docstring contract
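One possible shape for the `validate_policy_quality()` helper is sketched below. The generic-phrase patterns and the minimum word count are assumptions made for illustration, not an agreed rule set, and a simple heuristic like this cannot judge whether a policy is truly non-inferable.

```python
import re

# Assumed examples of generic phrasing that adds no non-inferable information.
_GENERIC_PATTERNS = [
    r"\bmaintain (high )?quality\b",
    r"\bfollow best practices\b",
    r"\bwrite clean code\b",
]

# Assumed threshold: very short policies are rarely actionable.
_MIN_WORDS = 4


def validate_policy_quality(policy: str) -> list[str]:
    """Return a list of issues; an empty list means the heuristics passed."""
    issues = []
    text = policy.strip()

    if len(text.split()) < _MIN_WORDS:
        issues.append("too short to be actionable")

    for pattern in _GENERIC_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            issues.append(f"generic phrasing matches {pattern!r}")

    return issues
```

Under these heuristics a policy such as "Maintain quality" is flagged twice (too short, generic phrasing), while a specific instruction naming a custom command or convention passes. Returning issues instead of raising lets callers decide whether to warn or reject.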

Memory Injection (memory/retrieval/)

  • Add a non-inferable filter stage to the context injection pipeline (Strategy 1)
  • Filter should exclude memories that restate what's discoverable from code
  • Consider relevance-score penalty for generic/inferable memories
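The relevance-score penalty could be applied as a small post-retrieval stage. The sketch below assumes memories carry a relevance score and a set of tags; `ScoredMemory`, `apply_inferable_penalty`, and the default `penalty` and `min_score` values are hypothetical and would need tuning against real retrieval data.

```python
from dataclasses import dataclass

NON_INFERABLE_TAG = "non-inferable"


@dataclass(frozen=True)
class ScoredMemory:
    """Stand-in for a retrieved memory; not the real retrieval model."""

    content: str
    score: float
    tags: frozenset = frozenset()


def apply_inferable_penalty(memories, penalty=0.5, min_score=0.3):
    """Down-weight memories lacking the non-inferable tag, then re-rank.

    Untagged memories have their score multiplied by `penalty`; anything
    whose adjusted score falls below `min_score` is dropped.
    """
    ranked = []
    for memory in memories:
        if NON_INFERABLE_TAG in memory.tags:
            adjusted = memory.score
        else:
            adjusted = memory.score * penalty
        if adjusted >= min_score:
            ranked.append(ScoredMemory(memory.content, adjusted, memory.tags))
    # Highest adjusted score first.
    return sorted(ranked, key=lambda m: m.score, reverse=True)
```

A penalty keeps strongly relevant but untagged memories available, whereas a hard filter would drop them outright; which trade-off is right depends on how reliably memories are tagged.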

Cost-Aware Context Budgeting

  • Factor prompt token overhead into auto-loop selection cost estimation
  • Track and log prompt tokens as a percentage of total run cost
  • Consider a prompt_cost_ratio metric in TaskCompletionMetrics
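A `prompt_cost_ratio` metric could be computed as the share of run cost attributable to prompt tokens. The sketch below assumes per-token prices for prompt and completion tokens are known; `TokenUsage` and its field names are hypothetical and are not taken from `TaskCompletionMetrics`.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Hypothetical usage record for one run."""

    prompt_tokens: int
    completion_tokens: int
    cost_per_prompt_token: float
    cost_per_completion_token: float


def prompt_cost_ratio(usage: TokenUsage) -> float:
    """Return prompt cost divided by total cost, or 0.0 for a zero-cost run."""
    prompt_cost = usage.prompt_tokens * usage.cost_per_prompt_token
    completion_cost = usage.completion_tokens * usage.cost_per_completion_token
    total = prompt_cost + completion_cost
    return prompt_cost / total if total > 0 else 0.0
```

For example, with made-up prices where completion tokens cost three times as much as prompt tokens, a run with 3000 prompt tokens and 1000 completion tokens has a ratio of about 0.5, meaning half the spend went to context.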

Labels

type:perf, scope:engine, scope:memory


Design Decisions Finalized

  • D22 (Remove Tools Section): Do NOT list tools in the system prompt; the API's tools parameter already injects richer definitions with schemas. This saves 200-400+ tokens per call, a 20%+ cost reduction. Behavioral guidance ("when to use") may be added later.
  • D23 (Memory Filter): Pluggable MemoryFilterStrategy protocol. The initial implementation is tag-based and runs at write time: the non-inferable tag convention is enforced at the MemoryBackend.store() boundary. It uses the existing MemoryMetadata.tags and MemoryQuery.tags, so no new models are needed.

Common pattern: All strategies use pluggable protocol interfaces with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future use.
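The pluggable-protocol pattern for D23 might be sketched as follows. The names `MemoryFilterStrategy`, `MemoryMetadata.tags`, and `store()` come from the decision text; everything else (`MemoryEntry`, `TagBasedFilter`, `InMemoryBackend`, the `accept` method, and the choice to reject untagged entries rather than raise) is an assumption for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol

NON_INFERABLE_TAG = "non-inferable"


@dataclass
class MemoryMetadata:
    """Stand-in for the existing model; only the tags field is sketched."""

    tags: set[str] = field(default_factory=set)


@dataclass
class MemoryEntry:
    """Hypothetical container pairing content with its metadata."""

    content: str
    metadata: MemoryMetadata = field(default_factory=MemoryMetadata)


class MemoryFilterStrategy(Protocol):
    """Pluggable interface; `accept` is an assumed method name."""

    def accept(self, entry: MemoryEntry) -> bool: ...


class TagBasedFilter:
    """Initial strategy: accept only entries tagged as non-inferable."""

    def accept(self, entry: MemoryEntry) -> bool:
        return NON_INFERABLE_TAG in entry.metadata.tags


class InMemoryBackend:
    """Toy backend that applies the filter at the store() boundary."""

    def __init__(self, write_filter: MemoryFilterStrategy) -> None:
        self._filter = write_filter
        self._entries: list[MemoryEntry] = []

    def store(self, entry: MemoryEntry) -> bool:
        # Entries the strategy rejects are not persisted.
        if not self._filter.accept(entry):
            return False
        self._entries.append(entry)
        return True

    def query(self, tags: set[str]) -> list[MemoryEntry]:
        # Return entries carrying every requested tag.
        return [e for e in self._entries if tags <= e.metadata.tags]
```

Because `MemoryFilterStrategy` is a structural protocol, an alternative strategy only needs an `accept` method with the same signature and can be swapped in without touching the backend.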

    Labels

    • prio:high (Important, should be prioritized)
    • scope:medium (1-3 days of work)
    • spec:agent-system (DESIGN_SPEC Section 3 - Agent System)
    • spec:memory (DESIGN_SPEC Section 7 - Memory & Persistence)
    • type:feature (New feature implementation)
    • type:perf (Performance optimization)
