Feature: PAHF Personalization Loop — Pre-Action Clarification, Preference Grounding & Post-Action Feedback Integration

## Overview

Meta Superintelligence Labs (with Princeton and Duke) recently published [PAHF: Personalized Agents from Human Feedback](https://arxiv.org/abs/2602.16173) (Feb 2026), introducing a three-step continual personalization loop that enables agents to learn user preferences from scratch, ground actions in stored preferences, and adapt when preferences change over time. The [reference implementation](https://github.com/facebookresearch/PAHF) is MIT-licensed.

PAHF addresses three core failure modes: **cold start** (no prior knowledge of user), **static errors** (not learning from corrections), and **preference drift** (not adapting when preferences change). Their empirical results show that combining pre-action clarification with post-action feedback achieves 68-70% success rates vs 27-45% for no-memory baselines, and recovers from preference shifts significantly faster than single-channel approaches.

This feature proposes integrating PAHF's personalization loop into Hermes Agent's behavioral layer — not as a new tool, but as an enhancement to how the agent uses its existing `memory`, `clarify`, and `session_search` tools to systematically learn and adapt to each user. It complements #346 (Structured Memory System), which provides the storage infrastructure; this issue focuses on the **behavioral strategy** that uses that infrastructure for personalization.

---

## Research Findings

### How PAHF Works

PAHF implements a three-step interactive loop:

**Step 1: Pre-Action Clarification**
When a task is ambiguous AND the agent's memory lacks relevant preferences, the agent proactively asks a clarifying question before acting. The key insight: once memory contains the preference, the agent should act directly without asking. This prevents the "annoying assistant" pattern of asking every time.

In the reference implementation, this is controlled by prompt engineering — the agent's ReAct prompt includes "Ask human" as a valid action with explicit rules:
```
You MUST ask for clarification when:
1. The task contains ambiguous references
2. The task involves subjective preferences (e.g., "favorite", "preferred")
3. You have NO memory context relevant to this preference
```
But also: "If memory already provides context, act directly without asking."

When the agent asks:
1. Generate a clarification question (using a question-generation prompt)
2. Receive the user's answer
3. **Summarize the Q&A into a preference statement** and store in memory
4. Re-attempt the task with the new preference context
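
The gating rule and the four steps above can be sketched as follows. This is a minimal illustration, not the reference implementation: `needs_clarification`, `clarify_then_act`, and every `demo_*` stub are hypothetical names, the fixed question template stands in for PAHF's question-generation prompt, and the keyword-overlap retriever stands in for embedding retrieval.

```python
from typing import Callable, List


def needs_clarification(task_is_ambiguous: bool, relevant: List[str]) -> bool:
    """Ask only when the task is ambiguous AND memory holds nothing relevant."""
    return task_is_ambiguous and not relevant


def clarify_then_act(
    task: str,
    task_is_ambiguous: bool,
    memory: List[str],
    retrieve: Callable[[str, List[str]], List[str]],
    ask_user: Callable[[str], str],
    summarize: Callable[[str, str], str],
    act: Callable[[str, List[str]], str],
) -> str:
    """Run at most one clarification round, then act with grounded context."""
    context = retrieve(task, memory)
    if needs_clarification(task_is_ambiguous, context):
        question = f"Before I start: what do you prefer for '{task}'?"
        answer = ask_user(question)
        # Store a summarized preference statement, not the raw Q&A.
        memory.append(summarize(question, answer))
        context = retrieve(task, memory)  # re-attempt with the new preference
    return act(task, context)


# --- Usage with stub callables (all hypothetical) ---
def demo_retrieve(task: str, memory: List[str]) -> List[str]:
    words = set(task.lower().split())
    return [m for m in memory if words & set(m.lower().split())]


asked: List[str] = []


def demo_ask(question: str) -> str:
    asked.append(question)
    return "herbal tea"


def demo_summarize(question: str, answer: str) -> str:
    return f"User prefers {answer} as a drink"


def demo_act(task: str, context: List[str]) -> str:
    return f"{task} using {context}"


memory: List[str] = []
first = clarify_then_act("make a drink", True, memory, demo_retrieve,
                         demo_ask, demo_summarize, demo_act)
second = clarify_then_act("make a drink", True, memory, demo_retrieve,
                          demo_ask, demo_summarize, demo_act)
```

In this sketch the second call finds the stored preference and acts without asking again, which is the behavior the memory gate is meant to produce.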

**Step 2: Preference-Grounded Action**
Before executing any task, the agent retrieves relevant memories (top-k by embedding similarity), summarizes them into a concise context, and includes this as grounding context. The action is then generated with awareness of stored preferences.
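
A rough sketch of top-k retrieval by embedding similarity is shown here. It assumes embeddings are already computed (the two-dimensional vectors are made-up toy values), uses plain cosine similarity, and replaces the LLM summarization step with a simple bulleted join. The names `cosine`, `top_k`, and `grounding_context` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[str]:
    """Return the k memory texts most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def grounding_context(retrieved: List[str]) -> str:
    """Stand-in for the summarization step: join memories into one context."""
    if not retrieved:
        return "No stored preferences are relevant."
    return "Known user preferences:\n" + "\n".join(f"- {t}" for t in retrieved)


# Toy embeddings (assumed values, for illustration only).
memories = [
    ("Prefers herbal tea when tired", [1.0, 0.0]),
    ("Uses tabs for indentation", [0.0, 1.0]),
    ("Likes oat milk", [0.6, 0.4]),
]
retrieved = top_k([1.0, 0.1], memories, k=2)
context = grounding_context(retrieved)
```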

**Step 3: Post-Action Feedback Integration (Detect-Summarize-Integrate)**
After completing a task, if the user provides corrective feedback, the agent processes it through a three-stage pipeline:

1. **Detect**: LLM judges whether feedback contains actionable preference information (filters out "thanks" / "ok")
2. **Summarize**: Extract the preference into a brief, reusable statement (e.g., "User prefers herbal tea when tired")
3. **Integrate**: 
   - Check if this **updates** an existing preference (detect phrases like "actually, I changed my mind", "I no longer like...")
   - If update: find the most similar existing memory → merge old + new into a coherent statement (priority to newer preference)
   - If new: add as a fresh memory entry
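
The three stages can be sketched as one function with the LLM calls injected as callables. Everything here is an assumption made for illustration: the function names, the `stub_*` helpers that replace real LLM prompts, and the use of `difflib.SequenceMatcher` as a stand-in for embedding similarity when finding the closest existing memory.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional


def most_similar(statement: str, memory: List[str]) -> Optional[int]:
    """Index of the closest existing memory, or None if memory is empty."""
    if not memory:
        return None
    scores = [SequenceMatcher(None, statement.lower(), m.lower()).ratio()
              for m in memory]
    return max(range(len(memory)), key=scores.__getitem__)


def integrate_feedback(
    feedback: str,
    memory: List[str],
    is_salient: Callable[[str], bool],
    summarize: Callable[[str], str],
    is_update: Callable[[str], bool],
    merge: Callable[[str, str], str],
) -> str:
    """Detect, summarize, then merge into or append to memory."""
    if not is_salient(feedback):          # Detect
        return "ignored"
    statement = summarize(feedback)       # Summarize
    idx = most_similar(statement, memory) if is_update(feedback) else None
    if idx is not None:                   # Integrate: update an existing entry
        memory[idx] = merge(memory[idx], statement)
        return "merged"
    memory.append(statement)              # Integrate: new entry
    return "added"


# --- Stubs standing in for LLM judgments (hypothetical) ---
def stub_is_salient(fb: str) -> bool:
    return fb.strip().lower().rstrip(".!") not in {"thanks", "ok"}


def stub_summarize(fb: str) -> str:
    return "User prefers herbal tea when tired"


def stub_is_update(fb: str) -> bool:
    return "changed my mind" in fb.lower() or "no longer" in fb.lower()


def stub_merge(old: str, new: str) -> str:
    return f"{new} (previously: {old})"  # newer preference takes priority


memory = ["User prefers coffee when tired", "User prefers dark mode"]
r1 = integrate_feedback("thanks", memory, stub_is_salient, stub_summarize,
                        stub_is_update, stub_merge)
r2 = integrate_feedback("Actually, I changed my mind, herbal tea please",
                        memory, stub_is_salient, stub_summarize,
                        stub_is_update, stub_merge)
```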

This pipeline is the critical mechanism for handling **preference drift** — the paper proves (Proposition 1) that without post-action feedback, agents accumulate Ω(T) mistakes when preferences change, because they remain "confidently wrong" with stale memories.

### Key Design Decisions in PAHF

1. **LLM-as-judge for feedback detection** — No classifier; the LLM decides if feedback is salient. Lightweight but flexible.
2. **Summarize before storing** — Raw feedback is noisy. Summarizing into a preference statement produces cleaner, more retrievable memories.
3. **Merge over replace** — When preferences change, the integration prompt merges old and new information rather than deleting old entries, preserving context (e.g., "Used to prefer coffee, now prefers herbal tea when tired").
4. **Memory gates clarification** — The agent only asks questions when memory is empty/insufficient. This naturally reduces question frequency over time as the agent learns.
5. **Embedding-based memory retrieval** — Uses the DRAGON+ dual encoder (separate query and context encoders) for semantic search. This is critical for finding relevant preferences even when the task phrasing differs from the stored preference.
6. **Per-user memory isolation** — Each user gets their own memory bank, keyed by person_id.
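
Per-user isolation (decision 6) can be as simple as one list of statements per `person_id`. The class below is a hypothetical sketch of that idea, not the reference implementation's data structure.

```python
from collections import defaultdict
from typing import DefaultDict, List


class MemoryBanks:
    """One independent memory bank per person_id."""

    def __init__(self) -> None:
        self._banks: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, person_id: str, statement: str) -> None:
        self._banks[person_id].append(statement)

    def recall(self, person_id: str) -> List[str]:
        # Return a copy and avoid creating empty banks for unknown users.
        return list(self._banks.get(person_id, []))


banks = MemoryBanks()
banks.add("alice", "User prefers herbal tea when tired")
```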

### Theoretical Backing

- **Proposition 1**: Post-action feedback is necessary under preference drift (without it, Ω(T) cumulative mistakes)
- **Proposition 2**: Pre-action clarification is necessary under partial observability (reduces error from constant ε₀ to O(m⁻ᵏ))
- **Theorem 1 (Complementarity)**: PAHF combining both channels achieves dynamic regret O(K + γ) where K = preference switches, γ = ambiguity rate
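
To build intuition for Proposition 1, here is a toy simulation. It is not the paper's proof or benchmark: it assumes a single preference per round, an agent that asks once when memory is empty, and a hypothetical `count_mistakes` helper. Without post-action feedback the agent keeps acting on the stale memory after a switch, so mistakes grow with the number of remaining rounds; with feedback it makes one mistake per switch.

```python
from typing import List, Optional


def count_mistakes(true_prefs: List[str], use_feedback: bool) -> int:
    """Count rounds in which the stored preference differs from the true one."""
    memory: Optional[str] = None
    mistakes = 0
    for pref in true_prefs:
        if memory is None:
            # Pre-action clarification: memory is empty, so ask and act correctly.
            memory = pref
        elif memory != pref:
            # Memory is non-empty, so the agent acts on it and is wrong.
            mistakes += 1
            if use_feedback:
                memory = pref  # post-action feedback corrects the memory
    return mistakes


# One preference switch after round 10, out of 100 rounds.
prefs = ["coffee"] * 10 + ["tea"] * 90
without_feedback = count_mistakes(prefs, use_feedback=False)  # 90
with_feedback = count_mistakes(prefs, use_feedback=True)      # 1
```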

---

## Current State in Hermes Agent

### What We Have

| Component | Status | Gap |
|:---|:---|:---|
| **`clarify` tool** | ✅ Exists | Not used systematically for preference learning; no guidance to summarize answers into preferences |
| **`memory` tool** | ✅ Exists | Flat text, no preference typing, no merge/update logic, no salience detection |
| **`session_search`** | ✅ Exists | Keyword-only (FTS5), no semantic/embedding search |
| **`honcho` integration** | ✅ Optional | External service for user context queries — complementary but requires Honcho setup |
| **System prompt guidance** | ✅ Exists | Tells agent to save preferences to memory, but no structured loop |
| **Post-action feedback** | ❌ Missing | No systematic mechanism to learn from corrections/mistakes |
| **Preference drift detection** | ❌ Missing | No mechanism to detect and handle changing preferences |
| **Detect-Summarize-Integrate pipeline** | ❌ Missing | Memory updates are ad-hoc; no salience filtering or preference merging |

### Relevant Existing Issues

- **#346 — Structured Memory System** (typed nodes, graph edges, hybrid search): Provides the storage infrastructure this feature builds on. PAHF's preference grounding requires semantic search and typed memories — Phase 1 of #346 (typed memories with importance) is a prerequisite for the full PAHF integration. However, Phase 1 of THIS issue (behavioral guidance) can work with the current flat memory system.

### Relevant Files

- `agent/prompt_builder.py` — System prompt assembly; MEMORY_GUIDANCE and SESSION_SEARCH_GUIDANCE would gain PAHF-specific behavioral instructions
- `tools/memory_tool.py` — MemoryStore class; would gain preference-type support and merge/update logic
- `tools/clarify_tool.py` — Clarify tool; no code changes needed, but system prompt guidance changes how it's used
- `agent/context_compressor.py` — Context compression; could extract preferences during summarization (also proposed in #346)
- `tools/honcho_tools.py` — Honcho user context queries; complementary to PAHF's built-in memory

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **core codebase enhancement** (not a skill or tool) because:
- It changes the agent's fundamental behavioral loop rather than just adding a capability
- It modifies system prompt guidance (`prompt_builder.py`) to instruct the agent on the PAHF pattern
- It extends the existing `memory` tool with preference-aware merge logic
- It integrates with context compression for automated preference extraction
- Skills cannot modify agent behavior or tool implementations

### What We'd Need

1. **PAHF behavioral guidance in system prompt** — New prompt sections instructing the agent on the three-step loop
2. **Memory tool upgrade** — Preference-type entries with merge/update logic (builds on #346)
3. **Post-action feedback pattern** — Guidance for the agent to proactively ask "did that work?" after preference-sensitive tasks
4. **Detect-Summarize-Integrate logic** — Either in the memory tool or as a prompt pattern the agent follows
5. **Preference retrieval before action** — Guidance to check memory/session_search for relevant preferences before making choices

### Phased Rollout

**Phase 1: Behavioral Guidance (System Prompt Enhancement)**
No code changes to tools — purely prompt engineering.
- Add PAHF-aware guidance to `prompt_builder.py`:
  - "Before tasks involving subjective choices, check your memory for relevant user preferences"
  - "When a task is ambiguous and you lack preference context, use clarify to ask BEFORE acting"
  - "After completing a preference-sensitive task, if the user corrects you, summarize the correction as a preference and save to memory (user target)"
  - "When saving preferences, check if an existing preference entry covers the same topic — if so, replace it with a merged version that preserves context"
  - "As you learn more preferences, reduce how often you ask — act on stored knowledge"
- Add a `PERSONALIZATION_GUIDANCE` constant alongside existing `MEMORY_GUIDANCE`
- **Deliverable**: Agent follows the PAHF loop via instructions, using existing tools
- **Effort**: Small (~50 lines in prompt_builder.py)
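
One possible shape for the Phase 1 constant is sketched below. The wording is condensed from the bullets above; `build_system_prompt` and the `personalization_enabled` flag are hypothetical, since how `prompt_builder.py` actually assembles sections is not shown in this issue.

```python
from typing import Iterable

# Hypothetical constant; the real layout of agent/prompt_builder.py may differ.
PERSONALIZATION_GUIDANCE = "\n".join([
    "## Personalization",
    "- Before tasks involving subjective choices, check your memory for "
    "relevant user preferences.",
    "- When a task is ambiguous and you lack preference context, use clarify "
    "to ask BEFORE acting.",
    "- If the user corrects you after a preference-sensitive task, summarize "
    "the correction as a preference and save it to memory (user target).",
    "- When saving a preference, check whether an existing entry covers the "
    "same topic; if so, replace it with a merged version that preserves context.",
    "- As you learn more preferences, ask less often and act on stored knowledge.",
])


def build_system_prompt(sections: Iterable[str],
                        personalization_enabled: bool = True) -> str:
    """Join prompt sections, appending the guidance only when enabled."""
    parts = list(sections)
    if personalization_enabled:
        parts.append(PERSONALIZATION_GUIDANCE)
    return "\n\n".join(parts)


prompt = build_system_prompt(["You are Hermes Agent."])
```

Keeping the guidance behind a flag would also leave room for the opt-in versus default-on decision raised under Open Questions.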

**Phase 2: Preference Memory Infrastructure (requires #346 Phase 1)**
- Leverage #346's typed memory to add a `Preference` type with importance 0.7
- Add `merge` logic to the memory tool: when a new preference is similar to an existing one, offer to merge the two instead of storing a duplicate
- Implement a lightweight salience detector: before saving to memory, check whether the information is an actual preference rather than a transient observation
- Add preference retrieval helper: search memories of type `Preference` first when grounding actions
- **Deliverable**: Structured preference storage with intelligent merge/deduplication
- **Effort**: Medium (~200-300 lines across memory_tool.py and new helpers)
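
A minimal sketch of the Phase 2 storage idea follows. It does not reflect #346's actual schema or the current `MemoryStore` class: `Preference`, `PreferenceStore`, the 0.6 threshold, and the string-ratio similarity from `difflib` are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Preference:
    text: str
    importance: float = 0.7  # default importance proposed in this issue


class PreferenceStore:
    def __init__(self, merge_threshold: float = 0.6) -> None:
        self.items: List[Preference] = []
        self.merge_threshold = merge_threshold  # assumed value, needs tuning

    def find_similar(self, text: str) -> Optional[int]:
        """Index of the most similar entry above the threshold, else None."""
        best_idx, best_score = None, self.merge_threshold
        for i, item in enumerate(self.items):
            score = SequenceMatcher(None, text.lower(), item.text.lower()).ratio()
            if score > best_score:
                best_idx, best_score = i, score
        return best_idx

    def add(self, text: str, confirm_merge: bool = False) -> str:
        """Add a preference; suggest a merge instead of storing a near-duplicate."""
        idx = self.find_similar(text)
        if idx is None:
            self.items.append(Preference(text))
            return "added"
        if not confirm_merge:
            return "merge_suggested"
        old = self.items[idx].text
        self.items[idx] = Preference(f"{text} (previously: {old})")
        return "merged"


store = PreferenceStore()
s1 = store.add("User prefers coffee when tired")
s2 = store.add("User prefers herbal tea when tired")
s3 = store.add("User prefers herbal tea when tired", confirm_merge=True)
```

Returning `merge_suggested` rather than merging immediately leaves the autonomous-versus-confirmed merge question open for the caller to decide.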

**Phase 3: Automated Detect-Summarize-Integrate Pipeline**
- Add post-action preference extraction to context compression: when summarizing conversation turns, detect preference-revealing exchanges and auto-save them as Preference memories (mirrors #346's compaction-extraction idea)
- Implement preference drift detection: when the agent retrieves a preference but the user's correction contradicts it, flag the old preference for update
- Add periodic preference consolidation: during memory maintenance, merge duplicate/overlapping preferences
- Optional: embedding-based preference retrieval (builds on #346 Phase 3's vector search)
- **Deliverable**: Self-maintaining preference system that learns passively from conversation flow
- **Effort**: Large (~500+ lines, depends on #346 progress)
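
The drift-detection bullet in Phase 3 could look roughly like this. The `contradicts` callable stands in for an LLM judgment, and `StoredPreference`, its `stale` flag, and `flag_drift` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StoredPreference:
    text: str
    stale: bool = False  # set when a user correction contradicts this entry


def flag_drift(
    retrieved: List[StoredPreference],
    correction: str,
    contradicts: Callable[[str, str], bool],
) -> List[StoredPreference]:
    """Mark retrieved preferences that the user's correction contradicts."""
    flagged = []
    for pref in retrieved:
        if contradicts(pref.text, correction):
            pref.stale = True
            flagged.append(pref)
    return flagged


# Stub judge (hypothetical): a real one would be an LLM call.
def stub_contradicts(old: str, correction: str) -> bool:
    return "coffee" in old.lower() and "tea" in correction.lower()


prefs = [
    StoredPreference("User prefers coffee when tired"),
    StoredPreference("User prefers dark mode"),
]
flagged = flag_drift(prefs, "No, I drink tea now", stub_contradicts)
```

Flagged entries could then be passed to the merge step instead of being deleted, in line with the merge-over-replace decision.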

---

## Pros & Cons

### Pros
- **Solves the cold-start problem** — Agent can learn any user's preferences from scratch through natural interaction, not just manual memory entries
- **Handles preference drift** — Agent adapts when users change their minds, avoiding the "confidently wrong" failure mode
- **Leverages existing tools** — Phase 1 requires zero new tools; just better instructions for using `clarify` and `memory`
- **Theoretically grounded** — PAHF provides formal results (Propositions 1-2, Theorem 1) showing that each feedback channel is necessary and that combining them bounds dynamic regret
- **Progressive enhancement** — Each phase adds value independently; Phase 1 alone improves personalization
- **MIT-licensed reference code** — Can study and adapt patterns from github.com/facebookresearch/PAHF
- **Complements Honcho** — Users with Honcho get external dialectic reasoning; users without it get built-in preference learning
- **Multi-platform compatible** — Works across CLI, Telegram, Discord since it uses existing tools

### Cons / Risks
- **Over-asking risk** — If Phase 1 prompt guidance is poorly calibrated, the agent might ask too many questions. Need careful prompt engineering to gate clarification on memory state.
- **Memory pollution** — Without Phase 2's salience detection, the agent might save non-preference information as preferences, filling memory with noise.
- **Increased system prompt size** — Adding PAHF guidance increases the system prompt, consuming context window. Must keep instructions concise.
- **Dependency on #346** — Phases 2-3 require structured memory infrastructure. Can't fully deliver without typed memories and semantic search.
- **Evaluation difficulty** — Hard to objectively measure personalization quality in production (PAHF's benchmarks are synthetic). May need to rely on user feedback.
- **LLM reasoning overhead** — The Detect-Summarize-Integrate pipeline adds extra LLM calls for processing feedback. Phase 3 must be careful about latency/cost.

---

## Open Questions

- Should Phase 1's behavioral guidance be opt-in (config flag) or default-on? Default-on risks over-asking for users who don't want heavy personalization.
- How should PAHF interact with Honcho? If both are enabled, should PAHF's built-in preference memory defer to Honcho's dialectic reasoning, or complement it?
- What's the right frequency for post-action feedback solicitation? PAHF's paper uses simulated users who always provide feedback — real users may find "did that work?" annoying if asked too often.
- Should preferences be scoped to domains (coding style, communication, tool choices) or flat? Domain scoping improves retrieval but adds complexity.
- For Phase 2's merge logic, should the agent autonomously merge similar preferences, or ask the user to confirm before merging?

---

## References

- [PAHF Paper — arXiv:2602.16173](https://arxiv.org/abs/2602.16173) (Liang, Kruk, Qian et al., Feb 2026)
- [PAHF Reference Implementation](https://github.com/facebookresearch/PAHF) (MIT license, Meta / facebookresearch)
- [PAHF Project Page](https://personalized-ai.github.io/)
- Hermes Agent #346 — Structured Memory System (prerequisite for Phases 2-3)
- Hermes Agent `tools/memory_tool.py` — Current memory implementation
- Hermes Agent `tools/clarify_tool.py` — Current clarification tool
- Hermes Agent `agent/prompt_builder.py` — System prompt assembly
