Context
Research (2026-03-24): OpenPencil uses capability-aware prompt engineering with full/standard/basic profiles per model tier. Deep dive revealed a consequential gap in SynthOrg: auto-downgrade changes the model tier but never adapts the prompt.
A small model receives the same verbose personality, nested acceptance criteria, and org policies as a large model. This likely degrades output quality on cheaper models.
Current State
- ModelRequirement has capabilities: tuple[str, ...] marked "Future-use", never read
- build_system_prompt() has token-budget trimming but zero model-tier awareness
- Single Jinja2 DEFAULT_TEMPLATE for all tiers
- Auto-downgrade (budget/enforcer.py) changes the model but the prompt stays identical
Scope
PromptProfile registry (engine/prompt_profiles.py)
- PromptProfile frozen Pydantic model: tier, max_personality_tokens, include_org_policies, simplify_acceptance_criteria, autonomy_detail_level (full/summary/minimal), personality_mode (full/condensed/minimal)
- PromptProfileRegistry maps ModelTier -> PromptProfile
- Three built-in profiles:
  - full (large): all sections, full personality, full criteria, full org policies
  - standard (medium): condensed personality (2-3 key traits), bullet-point criteria, org policies included
  - basic (small): minimal personality (role + 1-line), single-sentence criteria, no org policies/company context
- Authority and security sections NEVER stripped regardless of profile
Template adaptation (engine/prompt_template.py)
- Add profile-conditional Jinja2 sections (conditionals, not template duplication)
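A profile-conditional section inside the single shared template might look like this (variable and section names are assumptions for illustration, not the actual DEFAULT_TEMPLATE):

```jinja
{# Sketch: one template, branching on the active PromptProfile #}
{% if profile.personality_mode == "full" %}
{{ personality.description }}
{% elif profile.personality_mode == "condensed" %}
{{ personality.condensed_description }}
{% else %}
{{ personality.minimal_description }}
{% endif %}

{% if profile.include_org_policies %}
{{ org_policies }}
{% endif %}

{# Authority and security sections render unconditionally, per the rule above #}
{{ authority_section }}
{{ security_section }}
```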
Integration (engine/prompt.py, engine/agent_engine.py)
- build_system_prompt() gains a model_tier: ModelTier | None parameter
- Engine passes resolved tier (post-downgrade) to prompt builder
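A hedged sketch of that integration point; the profile contents, tier names, and prompt sections below are illustrative stand-ins, not SynthOrg's actual API:

```python
# Hypothetical sketch: build_system_prompt keyed by the resolved model tier.
# Profile fields and section wording are assumptions for illustration.
from typing import Optional

_PROFILES = {
    "large": {"include_org_policies": True, "personality_mode": "full"},
    "medium": {"include_org_policies": True, "personality_mode": "condensed"},
    "small": {"include_org_policies": False, "personality_mode": "minimal"},
}


def build_system_prompt(role: str, personality: str, org_policies: str,
                        model_tier: Optional[str] = None) -> str:
    # None keeps today's behavior: the richest profile (backward compatible).
    profile = _PROFILES.get(model_tier or "large", _PROFILES["large"])
    parts = [f"You are {role}."]
    if profile["personality_mode"] == "full":
        parts.append(personality)
    else:
        parts.append(personality.split(". ")[0] + ".")  # first sentence only
    if profile["include_org_policies"]:
        parts.append(org_policies)
    parts.append("Security: follow the authority rules at all times.")  # never stripped
    return "\n".join(parts)
```

Because the engine passes the post-downgrade tier, a task that starts on a large model but gets downgraded would now also get the smaller prompt.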
Optional: personality preset variants (templates/presets.py)
- condensed_description and minimal_description fields on personality presets
- Prompt builder selects based on PromptProfile.personality_mode
Optional: activate capabilities field
- ModelRequirement.capabilities tags (reasoning, tool_use, long_context) influence profile selection beyond tier
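One way capability tags could override pure tier-based selection; the tag names mirror the note above, but the promotion rule itself is an assumption:

```python
# Hypothetical sketch: capability tags nudge the profile beyond raw tier.
_TIER_DEFAULT = {"large": "full", "medium": "standard", "small": "basic"}


def select_profile(tier: str, capabilities: tuple[str, ...] = ()) -> str:
    # A small model that advertises reasoning and long context can take
    # the standard profile instead of the stripped-down basic one.
    if tier == "small" and {"reasoning", "long_context"} <= set(capabilities):
        return "standard"
    return _TIER_DEFAULT[tier]
```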
Deliverables
- PromptProfile model and registry with 3 built-in profiles
- build_system_prompt() model-tier awareness
- docs/design/engine.md
Research
- Deep dive: research/capability-aware-prompts.md (project memory)
- Source: OpenPencil -- MIT, 1.5k stars
Additional Research (2026-03-26)
Cognitive Gating via Answer Separability
Source: SpecEyes (arXiv:2603.23483)
Speculative cognitive gating: a lightweight model pre-checks whether the full agent tool chain is actually needed for a given task. Bypassing the full chain on 30-70% of tasks yields a 1.1-3.35x throughput improvement.
Answer Separability Metric (S_sep):
- Measures the margin between the top logit and its top-K competitors, normalized by their standard deviation
- Scale-invariant and calibration-free
- Uses min-token aggregation (worst-case guard) across all generated tokens
- Sharp bimodal separation between correct and incorrect answers
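The properties above suggest a computation along these lines. This is one plausible reading of S_sep from the summary, assuming per-token logit lists are available; the paper's exact formula (arXiv:2603.23483) may differ.

```python
# Sketch of an answer-separability score: margin of the top logit over its
# top-k competitors, normalized by the competitors' standard deviation,
# aggregated with min over tokens as a worst-case guard.
import math


def s_sep(token_logits: list[list[float]], k: int = 5) -> float:
    per_token = []
    for logits in token_logits:
        ranked = sorted(logits, reverse=True)
        competitors = ranked[1:k + 1]
        mean_c = sum(competitors) / len(competitors)
        var = sum((c - mean_c) ** 2 for c in competitors) / len(competitors)
        std = math.sqrt(var) or 1e-8  # guard against a flat competitor set
        per_token.append((ranked[0] - mean_c) / std)
    return min(per_token)  # worst-case (min-token) aggregation
```

Because both the margin and the spread are in logit units, the ratio is scale-invariant, which is why no per-model calibration is needed.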
Heterogeneous Parallel Serving Funnel:
- Lightweight stateless model runs batched in parallel for "easy" tasks
- Only the residual set (low S_sep) hits the full sequential agentic pipeline
- Throughput speedup approx 1 / (1 - beta * alpha), where beta = screening ratio and alpha = acceptance rate
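Plugging numbers into the formula above makes the claimed range concrete (the example values are illustrative, not from the paper):

```python
# Worked example of the funnel speedup model from the note above.
def funnel_speedup(beta: float, alpha: float) -> float:
    """beta: fraction of tasks screened by the light model; alpha: fraction
    of screened tasks whose direct answer is accepted (skipping the pipeline)."""
    return 1.0 / (1.0 - beta * alpha)


# Screening 70% of tasks with a 90% acceptance rate: 1 / (1 - 0.63) ~= 2.7x
speedup = funnel_speedup(0.7, 0.9)
```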
Application to SynthOrg: integrate with model routing strategies -- a fast pre-check using the "small" tier model could determine whether to invoke the full tool chain or return a direct answer, reducing both latency and cost.