Summary
Phase 2 of graph memory (#1222): LLM extraction pipeline for entities and relationships from conversation messages.
Depends on: #1224 (schema & types)
Tasks
1. Extraction Types (graph/extractor.rs)
ExtractionResult, ExtractedEntity, ExtractedEdge — all derive JsonSchema for structured LLM output. GraphExtractor struct holds &AnyProvider reference.
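As a rough illustration of the shape these types might take, here is a minimal sketch. The field names (`entity_type`, `relation`, `fact`, `temporal_hint`) are assumptions inferred from the prompt rules in this issue, not the final schema. The real types would additionally derive `schemars::JsonSchema` (and presumably the serde traits), which is omitted here to keep the sketch dependency-free.

```rust
// Sketch only: field names are assumptions, not the final schema.
// The real types would also derive `schemars::JsonSchema` so the provider
// can request structured output.

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ExtractedEntity {
    pub name: String,
    /// Kept as a raw string here; validation decides the final type.
    pub entity_type: String,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ExtractedEdge {
    pub source: String,
    pub target: String,
    /// Relation verb, e.g. "uses".
    pub relation: String,
    /// Full fact sentence, e.g. "Alice uses Rust.".
    pub fact: String,
    /// Present only when the message implies time ("last week").
    pub temporal_hint: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ExtractionResult {
    pub entities: Vec<ExtractedEntity>,
    pub edges: Vec<ExtractedEdge>,
}

impl ExtractionResult {
    /// True when the model found nothing extractable (both arrays empty).
    pub fn is_empty(&self) -> bool {
        self.entities.is_empty() && self.edges.is_empty()
    }
}
```

An empty `ExtractionResult` corresponds to the "output empty arrays" rule, so callers can cheaply skip storage work for such messages.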
2. Extraction Prompt
System prompt for entity/relation extraction with rules:
- Extract named entities (people, tools, concepts, projects, languages, files, configs, organizations)
- Extract relationships as (source, target, relation_verb, fact_sentence)
- Use context window of last 4 messages for coreference resolution
- Normalize entity names (capitalize proper nouns, use canonical tool names)
- Output empty arrays for messages with no extractable content
- Cap the number of entities and edges at the configured limits
- Skip greetings, acknowledgments, short conversational messages
- Do not extract PII (emails, phone numbers, addresses)
- Always output in English regardless of conversation language
- Temporal hints: if message implies time ("last week", "since January"), include temporal_hint
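Two of the rules above (the 4-message context window and skipping trivial messages) can be handled before the LLM is called. The sketch below is one possible way to do that. The helper names, the list of trivial phrases, and the prompt layout are all hypothetical, and messages are modelled as plain `String`s rather than the real `Message` type.

```rust
/// Number of preceding messages kept for coreference resolution (from the rules above).
const CONTEXT_WINDOW: usize = 4;

/// Returns the last `CONTEXT_WINDOW` messages, or all of them if fewer exist.
pub fn context_window(history: &[String]) -> &[String] {
    let start = history.len().saturating_sub(CONTEXT_WINDOW);
    &history[start..]
}

/// Hypothetical pre-filter: true for greetings and acknowledgments that
/// carry nothing worth extracting. The phrase list is illustrative only.
pub fn should_skip(message: &str) -> bool {
    const TRIVIAL: [&str; 6] = ["hi", "hello", "thanks", "thank you", "ok", "okay"];
    let normalized = message
        .trim()
        .trim_end_matches(|c: char| c.is_ascii_punctuation())
        .to_lowercase();
    normalized.is_empty() || TRIVIAL.iter().any(|t| *t == normalized)
}

/// Assembles the user part of the prompt: context lines first, then the
/// message to extract from. The layout is an assumption, not the real template.
pub fn build_user_prompt(message: &str, history: &[String]) -> String {
    let mut prompt = String::from("Context:\n");
    for m in context_window(history) {
        prompt.push_str("- ");
        prompt.push_str(m);
        prompt.push('\n');
    }
    prompt.push_str("Message:\n");
    prompt.push_str(message);
    prompt
}
```

Filtering in code as well as in the prompt would save a provider call for obviously empty messages. Whether that trade-off is wanted is left open here.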
3. Entity Resolver (graph/resolver.rs)
EntityResolver with methods:
resolve_entity(extracted: &ExtractedEntity, store: &GraphStore) — exact name+type match (MVP). Returns existing entity ID or creates new.
resolve_edge(extracted: &ExtractedEdge, source_id: i64, target_id: i64, store: &GraphStore) — check for semantically similar existing edges between same entity pair. If contradictory, invalidate old edge.
scrub_content integration: when redact_credentials is enabled, entity names pass through scrub before storage.
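The MVP exact-match behaviour of resolve_entity can be sketched against an in-memory stand-in. `InMemoryStore` below is hypothetical and only mimics the lookup-or-create contract; the real `GraphStore` API, its ID allocation, and the scrub step are not shown.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for GraphStore: an index from (name, type) to entity ID.
#[derive(Default)]
pub struct InMemoryStore {
    by_key: HashMap<(String, String), i64>,
    next_id: i64,
}

impl InMemoryStore {
    /// Exact name+type match. Returns the entity ID and whether it was
    /// newly created. No fuzzy or embedding-based matching is attempted.
    pub fn resolve_entity(&mut self, name: &str, entity_type: &str) -> (i64, bool) {
        let key = (name.to_string(), entity_type.to_string());
        if let Some(&id) = self.by_key.get(&key) {
            return (id, false);
        }
        // No match: allocate a fresh ID and remember it.
        self.next_id += 1;
        self.by_key.insert(key, self.next_id);
        (self.next_id, true)
    }
}
```

Because matching is exact, "Rust" as a language and "Rust" as a project resolve to different entities, which is why the prompt's name normalization rule matters for deduplication. When redact_credentials is enabled, the scrubbed name would be the one used as the key.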
4. Extraction Pipeline
GraphExtractor::extract(&self, message: &str, context: &[Message]) -> Result<ExtractionResult>:
- Build prompt with context window
- Call provider.chat_typed_erased::<ExtractionResult>()
- Validate: coerce entities with unknown types to Concept, truncate to the configured limits
- Return structured result
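The validation step could look roughly like the sketch below. The `EntityType` variants are an assumption derived from the entity categories listed in the prompt rules, and `RawEntity`, `Entity`, and `max_entities` are hypothetical names. Only the coerce-to-Concept and truncate behaviour come from this issue.

```rust
/// Assumed set of entity types, mirroring the categories in the prompt rules.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EntityType {
    Person, Tool, Concept, Project, Language, File, Config, Organization,
}

impl EntityType {
    /// Case-insensitive parse; anything unrecognized is coerced to Concept.
    pub fn parse_or_concept(raw: &str) -> Self {
        match raw.trim().to_lowercase().as_str() {
            "person" => Self::Person,
            "tool" => Self::Tool,
            "project" => Self::Project,
            "language" => Self::Language,
            "file" => Self::File,
            "config" => Self::Config,
            "organization" => Self::Organization,
            _ => Self::Concept,
        }
    }
}

/// Entity as returned by the model (type still a free-form string).
pub struct RawEntity { pub name: String, pub entity_type: String }

/// Entity after validation.
#[derive(Debug, PartialEq)]
pub struct Entity { pub name: String, pub entity_type: EntityType }

/// Keeps at most `max_entities` items and coerces unknown types to Concept.
pub fn validate_entities(raw: Vec<RawEntity>, max_entities: usize) -> Vec<Entity> {
    raw.into_iter()
        .take(max_entities)
        .map(|r| Entity {
            entity_type: EntityType::parse_or_concept(&r.entity_type),
            name: r.name,
        })
        .collect()
}
```

Coercing rather than dropping keeps entities the model labelled with an unexpected type, at the cost of a less precise type on those nodes.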
GraphExtractor::process(&self, result: ExtractionResult, store: &GraphStore, resolver: &EntityResolver, episode_id: MessageId):
- For each entity: resolve → upsert
- For each edge: resolve source+target entities → check duplicates → insert or invalidate+insert
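The "check duplicates → insert or invalidate+insert" step for edges can be illustrated as follows. This is a sketch under a strong simplifying assumption: two edges are treated as occupying the same slot when they share source, target, and relation verb, and a differing fact sentence is treated as a contradiction. The real resolver is meant to use semantic similarity, which is not modelled here. `StoredEdge`, `EdgeOutcome`, and `upsert_edge` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct StoredEdge {
    pub source_id: i64,
    pub target_id: i64,
    pub relation: String,
    pub fact: String,
    /// False once a newer, contradicting edge has replaced this one.
    pub valid: bool,
}

#[derive(Debug, PartialEq)]
pub enum EdgeOutcome {
    /// No live edge existed for this slot; a new one was added.
    Inserted,
    /// An identical live edge already exists; nothing changed.
    Duplicate,
    /// A live edge with a different fact was invalidated, then the new one added.
    Superseded,
}

pub fn upsert_edge(
    edges: &mut Vec<StoredEdge>,
    source_id: i64,
    target_id: i64,
    relation: &str,
    fact: &str,
) -> EdgeOutcome {
    // Crude stand-in for "semantically similar": same pair and same relation verb.
    let same_slot = |e: &StoredEdge| {
        e.valid && e.source_id == source_id && e.target_id == target_id && e.relation == relation
    };
    if edges.iter().any(|e| same_slot(e) && e.fact == fact) {
        return EdgeOutcome::Duplicate;
    }
    let mut outcome = EdgeOutcome::Inserted;
    for e in edges.iter_mut() {
        if same_slot(&*e) {
            // Keep the old edge for history, but mark it as no longer valid.
            e.valid = false;
            outcome = EdgeOutcome::Superseded;
        }
    }
    edges.push(StoredEdge {
        source_id,
        target_id,
        relation: relation.to_string(),
        fact: fact.to_string(),
        valid: true,
    });
    outcome
}
```

Invalidating instead of deleting preserves the older fact, which seems to fit the temporal_hint rule in the prompt, though the actual retention policy is defined elsewhere.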
Architecture Reference
See .local/plan/graph-memory-architecture.md Section 4 for prompt template, resolution algorithm, and PII scrubbing details.
Acceptance Criteria