perf: apply non-inferable-only principle to system prompts and memory injection #188
Closed
Labels
prio:high, scope:medium, spec:agent-system, spec:memory, type:feature, type:perf
Description
Context
ETH Zurich research (arXiv:2602.11988, AGENTbench 138 tasks) found that repository context files (AGENTS.md, CLAUDE.md-style) often hurt AI agent performance:
- LLM-generated context: -3% success rate vs no context
- Human-written context: +4% success rate, but +19-20% inference cost
- Agents follow ALL instructions literally — including unnecessary ones (excessive testing, quality checks beyond scope)
- Generic architecture overviews don't reduce exploration — agents read code anyway
The Non-Inferable Principle
System prompts and injected memories should include only information the agent cannot discover by reading the codebase or environment:
| Include (non-inferable) | Exclude (inferable) |
|---|---|
| Role constraints, authority | Architecture overviews |
| Custom build commands | File structure descriptions |
| Project-specific conventions | General coding best practices |
| Organizational policies (specific) | Generic quality rules |
| Prior decisions, historical outcomes | What's already in the code |
| Interpersonal context | Standard library usage |
Current State Audit
The current prompt template (engine/prompt_template.py v1.2.0) is mostly good:
- Identity, Personality, Skills, Authority, Autonomy sections: all non-inferable ✅
- Task section: essential ✅
- Company Context: low-value generic info (department listing) — already correctly first-to-trim ✅
- Tools section: potentially redundant if tools are also passed via the API's tool_use mechanism ⚠️
- org_policies: content-dependent, with no guidance on what makes a good policy ⚠️
Tasks
Prompt Builder (engine/prompt.py, engine/prompt_template.py)
- Add docstring/comment documenting the non-inferable-only principle
- Evaluate whether the Tools section is redundant when tools are passed via the LLM API's native tool mechanism — if so, skip the section or make it opt-in
- Consider making Company Context opt-in rather than default (currently always included if company is provided)
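The section-selection tasks above could be sketched roughly as follows. This is a minimal illustration, not the actual code in engine/prompt.py: `PromptOptions`, `build_sections`, and the section keys `"tools"` and `"company_context"` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptOptions:
    """Hypothetical options object; not the project's actual API."""

    # Company Context becomes opt-in instead of always-on.
    include_company_context: bool = False
    # True when tool definitions are already sent via the LLM API's
    # native tool mechanism, which makes a prose Tools section redundant.
    native_tools_enabled: bool = True


def build_sections(sections: dict[str, str], options: PromptOptions) -> list[str]:
    """Return the names of the sections to render, in their original order.

    Non-inferable-only principle: include only information the agent
    cannot discover by reading the codebase or environment.
    """
    selected = []
    for name in sections:
        if name == "tools" and options.native_tools_enabled:
            continue  # the API already carries the tool schemas
        if name == "company_context" and not options.include_company_context:
            continue  # low-value generic info unless explicitly requested
        selected.append(name)
    return selected
```

With default options, a template containing identity, tools, company_context, and task sections would render only identity and task.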
Org Policies Validation
- Add guidance/validation for `org_policies` content: policies should be actionable + non-inferable
- Document what makes a good org policy (specific conventions, custom tooling) vs bad (generic "maintain quality")
- Consider a `validate_policy_quality()` helper or at minimum a docstring contract
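One possible shape for such a helper is sketched below. Only the name `validate_policy_quality()` comes from this issue; the signature, the phrase list, and the word-count threshold are assumptions made for illustration, and a real check would likely need tuning.

```python
# Assumed examples of generic phrasing; extend or replace as needed.
GENERIC_PHRASES = ("maintain quality", "follow best practices", "write clean code")
# Assumed heuristic: very short policies are rarely actionable.
MIN_WORDS = 4


def validate_policy_quality(policies: list[str]) -> list[str]:
    """Return one warning per policy that looks generic or non-actionable.

    Contract (sketch): a good policy is specific and non-inferable,
    e.g. a custom command or a project-specific convention.
    """
    warnings = []
    for policy in policies:
        text = policy.strip().lower()
        if len(text.split()) < MIN_WORDS:
            warnings.append(f"too short to be actionable: {policy!r}")
        elif any(phrase in text for phrase in GENERIC_PHRASES):
            warnings.append(f"generic phrasing: {policy!r}")
    return warnings
```

Returning warnings rather than raising keeps the helper usable as a lint step; a stricter caller could treat any non-empty result as an error.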
Memory Injection (memory/retrieval/)
- Add a non-inferable filter stage to the context injection pipeline (Strategy 1)
- Filter should exclude memories that restate what's discoverable from code
- Consider relevance-score penalty for generic/inferable memories
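A filter stage with a relevance penalty might look like the sketch below. It assumes memories carry a `non-inferable` tag and a numeric relevance score; `ScoredMemory`, the penalty factor, and the threshold are hypothetical and not taken from memory/retrieval/.

```python
from dataclasses import dataclass

NON_INFERABLE_TAG = "non-inferable"
INFERABLE_PENALTY = 0.5  # assumed multiplier for untagged (likely inferable) memories


@dataclass(frozen=True)
class ScoredMemory:
    """Hypothetical retrieval result; not the project's model."""

    content: str
    tags: frozenset[str]
    relevance: float


def apply_non_inferable_stage(
    memories: list[ScoredMemory], min_relevance: float = 0.3
) -> list[ScoredMemory]:
    """Penalize untagged memories, drop those below the threshold, rank the rest."""
    kept = []
    for memory in memories:
        score = memory.relevance
        if NON_INFERABLE_TAG not in memory.tags:
            score *= INFERABLE_PENALTY
        if score >= min_relevance:
            kept.append(ScoredMemory(memory.content, memory.tags, score))
    return sorted(kept, key=lambda m: m.relevance, reverse=True)
```

A penalty keeps highly relevant untagged memories in play, whereas a hard filter would discard them outright; which behavior is preferable depends on how reliably memories are tagged.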
Cost-Aware Context Budgeting
- Factor prompt token overhead into auto-loop selection cost estimation
- Track and log prompt tokens as a percentage of total run cost
- Consider a `prompt_cost_ratio` metric in `TaskCompletionMetrics`
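The ratio could be computed roughly as shown below. The class here is a stand-in that borrows the name `TaskCompletionMetrics`; its fields are assumptions, as is the simplification that the system prompt is resent at full input price on every turn (prompt caching would lower the real figure).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCompletionMetrics:
    """Stand-in for illustration; field names are assumptions."""

    system_prompt_tokens: int         # tokens in the system prompt
    turns: int                        # number of LLM calls in the run
    input_price_per_token_usd: float  # price of one input token
    total_cost_usd: float             # total cost of the run

    @property
    def prompt_cost_ratio(self) -> float:
        """Share of total run cost attributable to the system prompt."""
        if self.total_cost_usd <= 0:
            return 0.0
        prompt_cost = (
            self.system_prompt_tokens * self.turns * self.input_price_per_token_usd
        )
        return prompt_cost / self.total_cost_usd
```

For example, a 1,000-token prompt over 10 turns at $0.000003 per token costs $0.03; against a $0.12 run, that is a ratio of 0.25.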
References
- Evaluating AGENTS.md (arXiv:2602.11988)
- Codified Context (arXiv:2602.20478) — complementary research; reconciled in DESIGN_SPEC
- DESIGN_SPEC §6.5 (step 3, updated), §7.7 (updated with non-inferable filter note)
Labels
type:perf, scope:engine, scope:memory
Design Decisions Finalized
- D22: Remove Tools Section. Do NOT list tools in system prompt; the API's `tools` parameter already injects richer definitions with schemas. Saves 200-400+ tokens per call, 20%+ cost reduction. Behavioral guidance ("when to use") may be added later.
- D23: Memory Filter. Pluggable `MemoryFilterStrategy` protocol. Initial: tag-based at write time. `non-inferable` tag convention enforced at `MemoryBackend.store()` boundary. Uses existing `MemoryMetadata.tags` + `MemoryQuery.tags`, so zero new models.
Common pattern: All strategies use pluggable protocol interfaces with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future.
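The pluggable-protocol pattern from D23 can be sketched as follows. The names `MemoryFilterStrategy`, `MemoryMetadata.tags`, and `store()` come from this issue; everything else (the `accept` method, `MemoryEntry`, `TagBasedFilter`, `InMemoryBackend`, and rejecting by returning `False`) is a hypothetical simplification rather than the project's real interface.

```python
from dataclasses import dataclass
from typing import Protocol

NON_INFERABLE_TAG = "non-inferable"


@dataclass(frozen=True)
class MemoryMetadata:
    tags: frozenset[str] = frozenset()


@dataclass(frozen=True)
class MemoryEntry:
    """Hypothetical entry type for this sketch."""

    content: str
    metadata: MemoryMetadata


class MemoryFilterStrategy(Protocol):
    """Pluggable interface; the method name `accept` is an assumption."""

    def accept(self, entry: MemoryEntry) -> bool: ...


class TagBasedFilter:
    """Initial strategy: accept only entries tagged as non-inferable."""

    def accept(self, entry: MemoryEntry) -> bool:
        return NON_INFERABLE_TAG in entry.metadata.tags


class InMemoryBackend:
    """Toy backend that enforces the filter at the store() boundary."""

    def __init__(self, filter_strategy: MemoryFilterStrategy) -> None:
        self._filter = filter_strategy
        self.entries: list[MemoryEntry] = []

    def store(self, entry: MemoryEntry) -> bool:
        """Store the entry if the strategy accepts it; report the outcome."""
        if not self._filter.accept(entry):
            return False
        self.entries.append(entry)
        return True
```

Because `MemoryFilterStrategy` is a structural protocol, an alternative strategy only needs a matching `accept` method and can be swapped in without changing the backend.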