Feature: PAHF Personalization Loop — Pre-Action Clarification, Preference Grounding & Post-Action Feedback Integration #362

@teknium1

Overview

Meta Superintelligence Labs (with Princeton and Duke) recently published PAHF: Personalized Agents from Human Feedback (Feb 2026), introducing a three-step continual personalization loop that enables agents to learn user preferences from scratch, ground actions in stored preferences, and adapt when preferences change over time. The reference implementation is MIT-licensed.

PAHF addresses three core failure modes: cold start (no prior knowledge of user), static errors (not learning from corrections), and preference drift (not adapting when preferences change). Their empirical results show that combining pre-action clarification with post-action feedback achieves 68-70% success rates vs 27-45% for no-memory baselines, and recovers from preference shifts significantly faster than single-channel approaches.

This feature proposes integrating PAHF's personalization loop into Hermes Agent's behavioral layer: not as a new tool, but as an enhancement to how the agent uses its existing memory, clarify, and session_search tools to systematically learn and adapt to each user. This complements #346 (Structured Memory System), which provides the storage infrastructure; this issue focuses on the behavioral strategy that uses that infrastructure for personalization.


Research Findings

How PAHF Works

PAHF implements a three-step interactive loop:

Step 1: Pre-Action Clarification
When a task is ambiguous AND the agent's memory lacks relevant preferences, the agent proactively asks a clarifying question before acting. The key insight: once memory contains the preference, the agent should act directly without asking. This prevents the "annoying assistant" pattern of asking every time.

In the reference implementation, this is controlled by prompt engineering — the agent's ReAct prompt includes "Ask human" as a valid action with explicit rules:

You MUST ask for clarification when:
1. The task contains ambiguous references
2. The task involves subjective preferences (e.g., "favorite", "preferred")
3. You have NO memory context relevant to this preference

But also: "If memory already provides context, act directly without asking."

When the agent asks:

  1. Generate a clarification question (using a question-generation prompt)
  2. Receive the user's answer
  3. Summarize the Q&A into a preference statement and store in memory
  4. Re-attempt the task with the new preference context
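
The gating rule and the four-step ask flow above can be sketched as follows. This is a minimal illustration, not the reference implementation: `needs_clarification`, `run_task`, the keyword list, and the templated question are all hypothetical stand-ins (PAHF leaves the ambiguity judgment and question generation to LLM prompts), and the retrieval, user I/O, summarization, and action steps are injected as plain callables.

```python
from typing import Callable, List

# Hypothetical keyword heuristic; the reference implementation leaves this
# judgment to the LLM via its ReAct prompt.
SUBJECTIVE_MARKERS = ("favorite", "preferred", "usual")


def needs_clarification(task: str, memory_hits: List[str]) -> bool:
    """Ask only when the task looks subjective AND memory has nothing relevant."""
    subjective = any(marker in task.lower() for marker in SUBJECTIVE_MARKERS)
    return subjective and not memory_hits


def run_task(
    task: str,
    memory: List[str],
    retrieve: Callable[[str, List[str]], List[str]],
    ask_user: Callable[[str], str],
    summarize: Callable[[str, str], str],
    act: Callable[[str, List[str]], str],
) -> str:
    """Run one task, asking a clarifying question only if memory is insufficient."""
    hits = retrieve(task, memory)
    if needs_clarification(task, hits):
        # 1. Generate a question (templated here; an LLM prompt in PAHF).
        question = f"Before I proceed with '{task}': what do you prefer?"
        # 2. Receive the user's answer.
        answer = ask_user(question)
        # 3. Summarize the Q&A into a preference statement and store it.
        memory.append(summarize(question, answer))
        # 4. Re-attempt with the new preference context.
        hits = retrieve(task, memory)
    return act(task, hits)
```

Because the gate depends on `memory_hits`, a second call with the same task skips the question once the first answer has been stored, which is the behavior that avoids the "annoying assistant" pattern.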

Step 2: Preference-Grounded Action
Before executing any task, the agent retrieves relevant memories (top-k by embedding similarity), summarizes them into a concise context, and includes this as grounding context. The action is then generated with awareness of stored preferences.
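
A rough sketch of the retrieve-then-ground step described above, under loud assumptions: the bag-of-words `embed` function is a toy stand-in for a real encoder (PAHF uses DRAGON+), and `grounding_context` simply joins the hits instead of asking an LLM to summarize them. All function names here are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, memories: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return up to k memories with non-zero similarity, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def grounding_context(query: str, memories: List[str], k: int = 2) -> str:
    """Build the text that would be prepended to the action prompt."""
    hits = top_k(query, memories, k)
    if not hits:
        return "No stored preferences are relevant."
    return "Known preferences: " + "; ".join(m for _, m in hits)
```

A word-overlap vector cannot match a paraphrased preference to a differently phrased task, which is exactly the gap a semantic encoder is meant to close.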

Step 3: Post-Action Feedback Integration (Detect-Summarize-Integrate)
After completing a task, if the user provides corrective feedback, the agent processes it through a three-stage pipeline:

  1. Detect: LLM judges whether feedback contains actionable preference information (filters out "thanks" / "ok")
  2. Summarize: Extract the preference into a brief, reusable statement (e.g., "User prefers herbal tea when tired")
  3. Integrate:
    • Check if this updates an existing preference (detect phrases like "actually, I changed my mind", "I no longer like...")
    • If update: find the most similar existing memory → merge old + new into a coherent statement (priority to newer preference)
    • If new: add as a fresh memory entry

This is the critical mechanism for preference drift — the paper proves (Proposition 1) that without post-action feedback, agents accumulate Ω(T) mistakes when preferences change, because they remain "confidently wrong" with stale memories.
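
The three stages can be sketched as a small pipeline. This is an assumption-laden illustration rather than the reference code: the `LLM` callable type, the prompt wording, and the use of `difflib.SequenceMatcher` in place of embedding similarity are all hypothetical choices made to keep the example self-contained.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def _yes(reply: str) -> bool:
    return reply.strip().lower().startswith("yes")


def detect(feedback: str, llm: LLM) -> bool:
    """Stage 1: does the feedback carry actionable preference information?"""
    return _yes(llm(f"DETECT: does this state a user preference? yes/no\n{feedback}"))


def summarize(feedback: str, llm: LLM) -> str:
    """Stage 2: compress the feedback into one reusable statement."""
    return llm(f"SUMMARIZE: one reusable preference statement\n{feedback}").strip()


def most_similar(statement: str, memory: List[str]) -> Optional[int]:
    """Index of the closest stored entry (string similarity as a stand-in)."""
    if not memory:
        return None
    scores = [SequenceMatcher(None, statement.lower(), m.lower()).ratio() for m in memory]
    return max(range(len(memory)), key=scores.__getitem__)


def process_feedback(feedback: str, memory: List[str], llm: LLM) -> bool:
    """Run Detect-Summarize-Integrate; return True if memory changed."""
    if not detect(feedback, llm):
        return False
    statement = summarize(feedback, llm)
    # Stage 3: decide between updating an existing entry and adding a new one.
    is_update = _yes(llm(f"UPDATE: does this revise an earlier preference? yes/no\n{feedback}"))
    idx = most_similar(statement, memory) if is_update else None
    if idx is None:
        memory.append(statement)
    else:
        # Merge old and new, giving priority to the newer preference.
        memory[idx] = llm(
            f"MERGE: combine, newer preference wins\nOld: {memory[idx]}\nNew: {statement}"
        ).strip()
    return True
```

Each piece of salient feedback costs up to four LLM calls in this layout (detect, summarize, update check, merge), which is the latency and cost concern raised under Cons / Risks.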

Key Design Decisions in PAHF

  1. LLM-as-judge for feedback detection — No classifier; the LLM decides if feedback is salient. Lightweight but flexible.
  2. Summarize before storing — Raw feedback is noisy. Summarizing into a preference statement produces cleaner, more retrievable memories.
  3. Merge over replace — When preferences change, the integration prompt merges old and new information rather than deleting old entries, preserving context (e.g., "Used to prefer coffee, now prefers herbal tea when tired").
  4. Memory gates clarification — The agent only asks questions when memory is empty/insufficient. This naturally reduces question frequency over time as the agent learns.
  5. Embedding-based memory retrieval — Uses DRAGON+ dual-encoder for semantic search (query vs. context encoders). Critical for finding relevant preferences even when the task phrasing differs from the stored preference.
  6. Per-user memory isolation — Each user gets their own memory bank, keyed by person_id.
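
Per-user isolation (decision 6) amounts to keying every read and write by `person_id`. A minimal sketch, assuming an in-memory store; the `MemoryBanks` class and its method names are hypothetical and not taken from the PAHF code:

```python
from collections import defaultdict
from typing import DefaultDict, List


class MemoryBanks:
    """One independent list of preference statements per person_id."""

    def __init__(self) -> None:
        self._banks: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, person_id: str, statement: str) -> None:
        self._banks[person_id].append(statement)

    def recall(self, person_id: str) -> List[str]:
        # Return a copy so callers cannot mutate another code path's view,
        # and use .get() so that reads never create empty banks.
        return list(self._banks.get(person_id, []))
```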

Theoretical Backing

  • Proposition 1: Post-action feedback is necessary under preference drift (without it, Ω(T) cumulative mistakes)
  • Proposition 2: Pre-action clarification is necessary under partial observability (reduces error from constant ε₀ to O(m⁻ᵏ))
  • Theorem 1 (Complementarity): PAHF combining both channels achieves dynamic regret O(K + γ) where K = preference switches, γ = ambiguity rate
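
The intuition behind Proposition 1 can be seen in a toy simulation. This is only an illustration under simplifying assumptions (a single unambiguous preference, and a correction that always fixes the belief); it is not the paper's formal setting or proof, and `count_mistakes` is a hypothetical helper.

```python
from typing import List


def count_mistakes(true_prefs: List[str], use_feedback: bool) -> int:
    """Count wrong actions when true_prefs[t] is the user's preference at step t."""
    belief = true_prefs[0]  # assume the initial preference was learned correctly
    mistakes = 0
    for truth in true_prefs:
        if belief != truth:
            mistakes += 1
            if use_feedback:
                belief = truth  # the user's correction updates memory
    return mistakes


# One preference switch halfway through T = 100 steps.
prefs = ["coffee"] * 50 + ["tea"] * 50
print(count_mistakes(prefs, use_feedback=False))  # stale memory: errs on every later step
print(count_mistakes(prefs, use_feedback=True))   # errs once per switch
```

Without feedback the mistake count grows with the number of steps after the switch; with feedback it grows with the number of switches, which mirrors the K term in the regret bound.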

Current State in Hermes Agent

What We Have

| Component | Status | Gap |
| --- | --- | --- |
| clarify tool | ✅ Exists | Not used systematically for preference learning; no guidance to summarize answers into preferences |
| memory tool | ✅ Exists | Flat text, no preference typing, no merge/update logic, no salience detection |
| session_search | ✅ Exists | Keyword-only (FTS5), no semantic/embedding search |
| honcho integration | ✅ Optional | External service for user context queries; complementary but requires Honcho setup |
| System prompt guidance | ✅ Exists | Tells agent to save preferences to memory, but no structured loop |
| Post-action feedback | ❌ Missing | No systematic mechanism to learn from corrections/mistakes |
| Preference drift detection | ❌ Missing | No mechanism to detect and handle changing preferences |
| Detect-Summarize-Integrate pipeline | ❌ Missing | Memory updates are ad-hoc; no salience filtering or preference merging |

Relevant Existing Issues

Relevant Files

  • agent/prompt_builder.py — System prompt assembly; MEMORY_GUIDANCE and SESSION_SEARCH_GUIDANCE would gain PAHF-specific behavioral instructions
  • tools/memory_tool.py — MemoryStore class; would gain preference-type support and merge/update logic
  • tools/clarify_tool.py — Clarify tool; no code changes needed, but system prompt guidance changes how it's used
  • agent/context_compressor.py — Context compression; could extract preferences during summarization (also proposed in #346, Structured Memory System)
  • tools/honcho_tools.py — Honcho user context queries; complementary to PAHF's built-in memory

Implementation Plan

Skill vs. Tool Classification

This should be a core codebase enhancement (not a skill or tool) because:

  • It changes the agent's fundamental behavioral loop rather than just adding a capability
  • It modifies system prompt guidance (prompt_builder.py) to instruct the agent on the PAHF pattern
  • It extends the existing memory tool with preference-aware merge logic
  • It integrates with context compression for automated preference extraction
  • Skills cannot modify agent behavior or tool implementations

What We'd Need

  1. PAHF behavioral guidance in system prompt — New prompt sections instructing the agent on the three-step loop
  2. Memory tool upgrade — Preference-type entries with merge/update logic (builds on #346, Structured Memory System)
  3. Post-action feedback pattern — Guidance for the agent to proactively ask "did that work?" after preference-sensitive tasks
  4. Detect-Summarize-Integrate logic — Either in the memory tool or as a prompt pattern the agent follows
  5. Preference retrieval before action — Guidance to check memory/session_search for relevant preferences before making choices

Phased Rollout

Phase 1: Behavioral Guidance (System Prompt Enhancement)
No code changes to tools — purely prompt engineering.

  • Add PAHF-aware guidance to prompt_builder.py:
    • "Before tasks involving subjective choices, check your memory for relevant user preferences"
    • "When a task is ambiguous and you lack preference context, use clarify to ask BEFORE acting"
    • "After completing a preference-sensitive task, if the user corrects you, summarize the correction as a preference and save to memory (user target)"
    • "When saving preferences, check if an existing preference entry covers the same topic — if so, replace it with a merged version that preserves context"
    • "As you learn more preferences, reduce how often you ask — act on stored knowledge"
  • Add a PERSONALIZATION_GUIDANCE constant alongside existing MEMORY_GUIDANCE
  • Deliverable: Agent follows the PAHF loop via instructions, using existing tools
  • Effort: Small (~50 lines in prompt_builder.py)
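
As a starting point for discussion, the Phase 1 constant might look like the sketch below. The guidance wording is lifted from the bullets above; `assemble_guidance` is a hypothetical helper used only to show an opt-in flag, and it does not reflect how prompt_builder.py actually assembles the system prompt.

```python
from typing import List

# Hypothetical constant for agent/prompt_builder.py, alongside MEMORY_GUIDANCE.
PERSONALIZATION_GUIDANCE = (
    "Personalization:\n"
    "- Before tasks involving subjective choices, check your memory for "
    "relevant user preferences.\n"
    "- When a task is ambiguous and you lack preference context, use clarify "
    "to ask BEFORE acting.\n"
    "- After a preference-sensitive task, if the user corrects you, summarize "
    "the correction as a preference and save it to memory (user target).\n"
    "- When saving a preference, check whether an existing entry covers the "
    "same topic; if so, replace it with a merged version that preserves context.\n"
    "- As you learn more preferences, ask less often and act on stored knowledge."
)


def assemble_guidance(sections: List[str], personalization: bool = True) -> str:
    """Hypothetical helper: join guidance blocks, optionally adding PAHF guidance."""
    parts = list(sections)
    if personalization:
        parts.append(PERSONALIZATION_GUIDANCE)
    return "\n\n".join(parts)
```

Keeping the section behind a boolean makes it straightforward to ship either default-on or opt-in, whichever way that open question is resolved.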

Phase 2: Preference Memory Infrastructure (requires #346 Phase 1)

  • Leverage the typed memory from #346 (Structured Memory System) to add a Preference type with importance 0.7
  • Add merge logic to the memory tool: when adding a preference that's similar to an existing one, offer to merge instead of duplicating it
  • Implement a lightweight salience detector: before saving to memory, check if the information is actually a preference vs. transient observation
  • Add preference retrieval helper: search memories of type Preference first when grounding actions
  • Deliverable: Structured preference storage with intelligent merge/deduplication
  • Effort: Medium (~200-300 lines across memory_tool.py and new helpers)
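
A possible shape for the Phase 2 merge check, sketched under assumptions: the `Preference` dataclass, the 0.6 threshold, and the use of `difflib.SequenceMatcher` (a string-similarity stand-in for the semantic search that #346 would provide) are all hypothetical. The salience detector is omitted here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Preference:
    text: str
    importance: float = 0.7  # default proposed for the Preference type


def find_merge_candidate(
    new: Preference, existing: List[Preference], threshold: float = 0.6
) -> Optional[Preference]:
    """Return the most similar stored preference at or above the threshold."""
    best, best_score = None, threshold
    for pref in existing:
        score = SequenceMatcher(None, new.text.lower(), pref.text.lower()).ratio()
        if score >= best_score:
            best, best_score = pref, score
    return best


def add_preference(new: Preference, existing: List[Preference]) -> str:
    """Append a new preference, or suggest a merge when a near-duplicate exists."""
    candidate = find_merge_candidate(new, existing)
    if candidate is None:
        existing.append(new)
        return "added"
    # Offer the merge rather than applying it, leaving the decision to the caller.
    return f"merge suggested with: {candidate.text}"
```

Returning a suggestion instead of merging automatically leaves room for either answer to the open question about autonomous versus user-confirmed merges.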

Phase 3: Automated Detect-Summarize-Integrate Pipeline


Pros & Cons

Pros

  • Solves the cold-start problem — Agent can learn any user's preferences from scratch through natural interaction, not just manual memory entries
  • Handles preference drift — Agent adapts when users change their minds, avoiding the "confidently wrong" failure mode
  • Leverages existing tools — Phase 1 requires zero new tools; just better instructions for using clarify and memory
  • Theoretically grounded — PAHF provides formal results (Propositions 1-2, Theorem 1) showing that each feedback channel is necessary and that combining them achieves dynamic regret O(K + γ)
  • Progressive enhancement — Each phase adds value independently; Phase 1 alone improves personalization
  • MIT-licensed reference code — Can study and adapt patterns from github.com/facebookresearch/PAHF
  • Complements Honcho — Users with Honcho get external dialectic reasoning; users without it get built-in preference learning
  • Multi-platform compatible — Works across CLI, Telegram, Discord since it uses existing tools

Cons / Risks

  • Over-asking risk — If Phase 1 prompt guidance is poorly calibrated, the agent might ask too many questions. Need careful prompt engineering to gate clarification on memory state.
  • Memory pollution — Without Phase 2's salience detection, the agent might save non-preference information as preferences, filling memory with noise.
  • Increased system prompt size — Adding PAHF guidance increases the system prompt, consuming context window. Must keep instructions concise.
  • Dependency on #346 (Structured Memory System) — Phases 2-3 require structured memory infrastructure. The full design can't be delivered without typed memories and semantic search.
  • Evaluation difficulty — Hard to objectively measure personalization quality in production (PAHF's benchmarks are synthetic). May need to rely on user feedback.
  • LLM reasoning overhead — The Detect-Summarize-Integrate pipeline adds extra LLM calls for processing feedback. Phase 3 must be careful about latency/cost.

Open Questions

  • Should Phase 1's behavioral guidance be opt-in (config flag) or default-on? Default-on risks over-asking for users who don't want heavy personalization.
  • How should PAHF interact with Honcho? If both are enabled, should PAHF's built-in preference memory defer to Honcho's dialectic reasoning, or complement it?
  • What's the right frequency for post-action feedback solicitation? PAHF's paper uses simulated users who always provide feedback — real users may find "did that work?" annoying if asked too often.
  • Should preferences be scoped to domains (coding style, communication, tool choices) or flat? Domain scoping improves retrieval but adds complexity.
  • For Phase 2's merge logic, should the agent autonomously merge similar preferences, or ask the user to confirm before merging?

References
