Overview
Meta Superintelligence Labs (with Princeton and Duke) recently published PAHF: Personalized Agents from Human Feedback (Feb 2026), introducing a three-step continual personalization loop that enables agents to learn user preferences from scratch, ground actions in stored preferences, and adapt when preferences change over time. The reference implementation is MIT-licensed.
PAHF addresses three core failure modes: cold start (no prior knowledge of user), static errors (not learning from corrections), and preference drift (not adapting when preferences change). Their empirical results show that combining pre-action clarification with post-action feedback achieves 68-70% success rates vs 27-45% for no-memory baselines, and recovers from preference shifts significantly faster than single-channel approaches.
This issue proposes integrating PAHF's personalization loop into Hermes Agent's behavioral layer: not as a new tool, but as an enhancement to how the agent uses its existing memory, clarify, and session_search tools to systematically learn and adapt to each user. It complements #346 (Structured Memory System), which provides the storage infrastructure; this issue focuses on the behavioral strategy that uses that infrastructure for personalization.
Research Findings
How PAHF Works
PAHF implements a three-step interactive loop:
Step 1: Pre-Action Clarification
When a task is ambiguous AND the agent's memory lacks relevant preferences, the agent proactively asks a clarifying question before acting. The key insight: once memory contains the preference, the agent should act directly without asking. This prevents the "annoying assistant" pattern of asking every time.
In the reference implementation, this is controlled by prompt engineering — the agent's ReAct prompt includes "Ask human" as a valid action with explicit rules:
You MUST ask for clarification when:
1. The task contains ambiguous references
2. The task involves subjective preferences (e.g., "favorite", "preferred")
3. You have NO memory context relevant to this preference
But also: "If memory already provides context, act directly without asking."
When the agent asks:
1. Generate a clarification question (using a question-generation prompt)
2. Receive the user's answer
3. Summarize the Q&A into a preference statement and store it in memory
4. Re-attempt the task with the new preference context
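The gating rule in Step 1 can be sketched as follows. This is a minimal illustration, not the reference implementation: `PreferenceMemory`, `needs_clarification`, `handle_task`, and the word-overlap lookup are hypothetical names and stand-ins, and the cue list contains only the two examples quoted from the prompt. In PAHF itself the decision is made by the LLM following the prompt rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative cue words only; the quoted prompt names "favorite" and "preferred".
SUBJECTIVE_CUES = ("favorite", "preferred")


@dataclass
class PreferenceMemory:
    """Toy store (assumption): entries are plain preference statements."""
    entries: List[str] = field(default_factory=list)

    def relevant(self, task: str) -> List[str]:
        # Naive word overlap standing in for real semantic retrieval.
        words = set(task.lower().split())
        return [e for e in self.entries if words & set(e.lower().split())]


def needs_clarification(task: str, memory: PreferenceMemory) -> bool:
    """Ask only when the task looks subjective AND memory has nothing relevant."""
    subjective = any(cue in task.lower() for cue in SUBJECTIVE_CUES)
    return subjective and not memory.relevant(task)


def handle_task(
    task: str,
    memory: PreferenceMemory,
    ask_user: Callable[[str], str],
    summarize: Callable[[str, str], str],
) -> List[str]:
    """Clarify once if needed, store the summarized answer, return grounding context."""
    if needs_clarification(task, memory):
        answer = ask_user(f"Before I proceed with '{task}': what do you prefer?")
        memory.entries.append(summarize(task, answer))
    return memory.relevant(task)
```

Calling `handle_task` twice with the same task should trigger only one question, because the second call finds the stored preference and acts directly.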
Step 2: Preference-Grounded Action
Before executing any task, the agent retrieves relevant memories (top-k by embedding similarity), summarizes them into a concise context, and includes this as grounding context. The action is then generated with awareness of stored preferences.
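A rough sketch of the retrieval step, assuming preferences are already embedded as vectors. The toy vectors, `top_k`, and `build_grounding_context` are hypothetical; the embedding model is out of scope here, and the summarization step is reduced to simple concatenation.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the k stored preference texts most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_grounding_context(preferences: List[str]) -> str:
    """Format retrieved preferences as a short block to prepend to the task prompt."""
    if not preferences:
        return ""
    return "Known user preferences:\n" + "\n".join(f"- {p}" for p in preferences)
```

The returned context string would be placed ahead of the task in the prompt, so the action is generated with the stored preferences in view.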
Step 3: Post-Action Feedback Integration (Detect-Summarize-Integrate)
After completing a task, if the user provides corrective feedback, the agent processes it through a three-stage pipeline:
Detect: LLM judges whether feedback contains actionable preference information (filters out "thanks" / "ok")
Summarize: Extract the preference into a brief, reusable statement (e.g., "User prefers herbal tea when tired")
Integrate:
Check if this updates an existing preference (detect phrases like "actually, I changed my mind", "I no longer like...")
If update: find the most similar existing memory → merge old + new into a coherent statement (priority to newer preference)
If new: add as a fresh memory entry
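The three stages can be wired together roughly as below. This is a sketch under stated assumptions: `integrate_feedback` and its callable parameters are hypothetical, the LLM calls are injected as plain functions, and the keyword check for updates is a stand-in for the LLM judgment the paper describes.

```python
from typing import Callable, List, Optional

# Illustrative update markers, taken from the example phrases above.
UPDATE_MARKERS = ("actually", "changed my mind", "no longer")


def integrate_feedback(
    feedback: str,
    memory: List[str],
    is_salient: Callable[[str], bool],
    summarize: Callable[[str], str],
    find_similar: Callable[[str, List[str]], Optional[int]],
    merge: Callable[[str, str], str],
) -> List[str]:
    """Return a new memory list after Detect -> Summarize -> Integrate."""
    # Detect: drop feedback with no preference signal ("thanks", "ok").
    if not is_salient(feedback):
        return memory
    # Summarize: turn raw feedback into a short, reusable statement.
    preference = summarize(feedback)
    # Integrate: merge into the closest entry on an update, else append.
    is_update = any(marker in feedback.lower() for marker in UPDATE_MARKERS)
    idx = find_similar(preference, memory) if is_update else None
    if idx is None:
        return memory + [preference]
    updated = list(memory)
    updated[idx] = merge(memory[idx], preference)
    return updated
```

Passing the judge, summarizer, similarity lookup, and merger as parameters keeps the control flow testable without a model; in practice each would be an LLM or embedding call.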
This is the critical mechanism for preference drift — the paper proves (Proposition 1) that without post-action feedback, agents accumulate Ω(T) mistakes when preferences change, because they remain "confidently wrong" with stale memories.
Key Design Decisions in PAHF
LLM-as-judge for feedback detection — No classifier; the LLM decides if feedback is salient. Lightweight but flexible.
Summarize before storing — Raw feedback is noisy. Summarizing into a preference statement produces cleaner, more retrievable memories.
Merge over replace — When preferences change, the integration prompt merges old and new information rather than deleting old entries, preserving context (e.g., "Used to prefer coffee, now prefers herbal tea when tired").
Memory gates clarification — The agent only asks questions when memory is empty/insufficient. This naturally reduces question frequency over time as the agent learns.
Embedding-based memory retrieval — Uses DRAGON+ dual-encoder for semantic search (query vs. context encoders). Critical for finding relevant preferences even when the task phrasing differs from the stored preference.
Per-user memory isolation — Each user gets their own memory bank, keyed by person_id.
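Per-user isolation, the last item above, amounts to keying every read and write by the user identifier. A minimal sketch, with `MemoryBanks` as a hypothetical class name and only `person_id` taken from the description:

```python
from collections import defaultdict
from typing import DefaultDict, List


class MemoryBanks:
    """One preference list per user, keyed by person_id."""

    def __init__(self) -> None:
        self._banks: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, person_id: str, preference: str) -> None:
        self._banks[person_id].append(preference)

    def get(self, person_id: str) -> List[str]:
        # Return a copy so callers cannot mutate another user's bank by accident.
        return list(self._banks.get(person_id, []))
```

Retrieval and merging would then operate only on the list returned for the current user.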
Theoretical Backing
Proposition 1: Post-action feedback is necessary under preference drift (without it, Ω(T) cumulative mistakes)
Proposition 2: Pre-action clarification is necessary under partial observability (reduces error from constant ε₀ to O(m⁻ᵏ))
Theorem 1 (Complementarity): PAHF combining both channels achieves dynamic regret O(K + γ) where K = preference switches, γ = ambiguity rate
Current State in Hermes Agent
What We Have
| Component | Status | Gap |
| --- | --- | --- |
| clarify tool | ✅ Exists | Not used systematically for preference learning; no guidance to summarize answers into preferences |
| memory tool | ✅ Exists | Flat text, no preference typing, no merge/update logic, no salience detection |
| session_search | ✅ Exists | Keyword-only (FTS5), no semantic/embedding search |
| honcho integration | ✅ Optional | External service for user context queries; complementary but requires Honcho setup |
| System prompt guidance | ✅ Exists | Tells agent to save preferences to memory, but no structured loop |
| Post-action feedback | ❌ Missing | No systematic mechanism to learn from corrections/mistakes |
| Preference drift detection | ❌ Missing | No mechanism to detect and handle changing preferences |
| Detect-Summarize-Integrate pipeline | ❌ Missing | Memory updates are ad-hoc; no salience filtering or preference merging |
What We'd Need
Post-action feedback pattern — Guidance for the agent to proactively ask "did that work?" after preference-sensitive tasks
Detect-Summarize-Integrate logic — Either in the memory tool or as a prompt pattern the agent follows
Preference retrieval before action — Guidance to check memory/session_search for relevant preferences before making choices
Phased Rollout
Phase 1: Behavioral Guidance (System Prompt Enhancement)
No code changes to tools — purely prompt engineering.
Add PAHF-aware guidance to prompt_builder.py:
"Before tasks involving subjective choices, check your memory for relevant user preferences"
"When a task is ambiguous and you lack preference context, use clarify to ask BEFORE acting"
"After completing a preference-sensitive task, if the user corrects you, summarize the correction as a preference and save to memory (user target)"
"When saving preferences, check if an existing preference entry covers the same topic — if so, replace it with a merged version that preserves context"
"As you learn more preferences, reduce how often you ask — act on stored knowledge"
Add a PERSONALIZATION_GUIDANCE constant alongside existing MEMORY_GUIDANCE
Deliverable: Agent follows the PAHF loop via instructions, using existing tools
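As a sketch of what Phase 1 could look like in code, the five instructions above can be collected into the proposed constant. Only the name `PERSONALIZATION_GUIDANCE` comes from this proposal; `append_guidance` and its `enabled` flag are hypothetical, and how prompt_builder.py actually assembles the system prompt is not shown here.

```python
PERSONALIZATION_GUIDANCE = "\n".join(
    [
        "Before tasks involving subjective choices, check your memory for relevant user preferences.",
        "When a task is ambiguous and you lack preference context, use clarify to ask BEFORE acting.",
        "After completing a preference-sensitive task, if the user corrects you, summarize the "
        "correction as a preference and save to memory (user target).",
        "When saving preferences, check if an existing preference entry covers the same topic; "
        "if so, replace it with a merged version that preserves context.",
        "As you learn more preferences, reduce how often you ask; act on stored knowledge.",
    ]
)


def append_guidance(base_prompt: str, enabled: bool = True) -> str:
    """Hypothetical helper: append the guidance block when personalization is on."""
    if not enabled:
        return base_prompt
    return f"{base_prompt}\n\n{PERSONALIZATION_GUIDANCE}"
```

An `enabled` switch of this kind is one way to support the opt-in versus default-on choice raised under Open Questions.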
Implement preference drift detection: when the agent retrieves a preference but the user's correction contradicts it, flag the old preference for update
Add periodic preference consolidation: during memory maintenance, merge duplicate/overlapping preferences
Pros & Cons
Pros
Solves the cold-start problem — Agent can learn any user's preferences from scratch through natural interaction, not just manual memory entries
Handles preference drift — Agent adapts when users change their minds, avoiding the "confidently wrong" failure mode
Leverages existing tools — Phase 1 requires zero new tools; just better instructions for using clarify and memory
Theoretically grounded — PAHF provides formal guarantees (Propositions 1-2, Theorem 1) that the dual-channel approach is optimal
Progressive enhancement — Each phase adds value independently; Phase 1 alone improves personalization
MIT-licensed reference code — Can study and adapt patterns from github.com/facebookresearch/PAHF
Complements Honcho — Users with Honcho get external dialectic reasoning; users without it get built-in preference learning
Multi-platform compatible — Works across CLI, Telegram, Discord since it uses existing tools
Cons / Risks
Over-asking risk — If Phase 1 prompt guidance is poorly calibrated, the agent might ask too many questions. Need careful prompt engineering to gate clarification on memory state.
Memory pollution — Without Phase 2's salience detection, the agent might save non-preference information as preferences, filling memory with noise.
Increased system prompt size — Adding PAHF guidance increases the system prompt, consuming context window. Must keep instructions concise.
Evaluation difficulty — Hard to objectively measure personalization quality in production (PAHF's benchmarks are synthetic). May need to rely on user feedback.
LLM reasoning overhead — The Detect-Summarize-Integrate pipeline adds extra LLM calls for processing feedback. Phase 3 must be careful about latency/cost.
Open Questions
Should Phase 1's behavioral guidance be opt-in (config flag) or default-on? Default-on risks over-asking for users who don't want heavy personalization.
How should PAHF interact with Honcho? If both are enabled, should PAHF's built-in preference memory defer to Honcho's dialectic reasoning, or complement it?
What's the right frequency for post-action feedback solicitation? PAHF's paper uses simulated users who always provide feedback — real users may find "did that work?" annoying if asked too often.
Should preferences be scoped to domains (coding style, communication, tool choices) or flat? Domain scoping improves retrieval but adds complexity.
For Phase 2's merge logic, should the agent autonomously merge similar preferences, or ask the user to confirm before merging?
Relevant Existing Issues
Relevant Files
agent/prompt_builder.py: System prompt assembly; MEMORY_GUIDANCE and SESSION_SEARCH_GUIDANCE would gain PAHF-specific behavioral instructions
tools/memory_tool.py: MemoryStore class; would gain preference-type support and merge/update logic
tools/clarify_tool.py: Clarify tool; no code changes needed, but system prompt guidance changes how it's used
agent/context_compressor.py: Context compression; could extract preferences during summarization (also proposed in #346, "Feature: Structured Memory System — Typed Nodes, Graph Edges, and Hybrid Search")
tools/honcho_tools.py: Honcho user context queries; complementary to PAHF's built-in memory
Implementation Plan
Skill vs. Tool Classification
This should be a core codebase enhancement (not a skill or tool) because:
… prompt_builder.py) to instruct the agent on the PAHF pattern
… memory tool with preference-aware merge logic
Phase 2: Preference Memory Infrastructure (requires #346 Phase 1)
… Preference type with importance 0.7
… merge logic to the memory tool: when adding a preference that's similar to an existing one, offer to merge instead of duplicating
… Preference first when grounding actions
Phase 3: Automated Detect-Summarize-Integrate Pipeline
References
tools/memory_tool.py: Current memory implementation
tools/clarify_tool.py: Current clarification tool
agent/prompt_builder.py: System prompt assembly