Signal topic clustering for the digest
Context
A reference AI-inbox product groups awareness items into a handful of named topic
clusters (one per life-area, each holding several related messages) so the user
reads by theme instead of one flat list. SkyTwin has the raw materials — a
life-domain extractor and a Lifebook that tags signals to domains — but it has no
step that clusters the current window of signals into presentable topic groups for
the briefing. Today the "Signals" section is a flat importance-ranked list. Cluster
quality is also where SkyTwin can beat the reference product: the reference digest
visibly mis-files items (it put a developer-infra shutdown under personal finance),
and SkyTwin's twin model + domain extractor can do better.
Current State
Verified 2026-06-06.
packages/policy-prompts/prompts/domain-extraction/v1.md — extracts 3-10 stable
life domains for a user (e.g. "Software Development", "Personal Finance"). This is
a profile-level operation (the user's domains), not a per-window clustering of
current signals.
packages/db/src/repositories/lifebook-repository.ts — signals are tagged to
domains (manually or by inference) and stored. Tagging ≠ clustering: there is no
step that takes "the last N signals" and produces topic groups for display.
packages/twin-model/src/analyzers/cross-domain-analyzer.ts — detects behavioral
traits (e.g. cautious_spender, quick_responder), not topic groups.
packages/policy-prompts/prompts/briefing-prose/v1.md:49 — briefing sections are
fixed (Meetings / Tasks / Signals); Signals is a flat list with no sub-grouping.
Proposed Change
Add a clustering step that takes the briefing's input window of awareness signals and
groups them into named topic clusters, each scoped to a life domain, for the Topics
section produced by spec 01. Two-tier strategy (LLM + deterministic fallback), aligned
with the existing domain-extraction approach.
A cluster is a named group of related signals sharing a topic within a domain.
Synthetic example output:
[
{ "domain": "finance", "title": "Subscriptions & billing", "signalRefs": ["s1","s4","s9"] },
{ "domain": "work", "title": "Vendor onboarding", "signalRefs": ["s2","s7"] }
]
Implementation Details
- New module packages/decision-engine/src/topic-clusterer.ts (or a twin-model analyzer, placed beside cross-domain-analyzer.ts, if it needs profile context):
export interface ClusterInput {
  signals: Array<{ ref: string; domain: string | null; subject: string; summary: string }>;
  knownDomains: string[]; // from domain-extraction, to anchor cluster domains
  maxClusters: number; // default 8 (matches the reference product's shape)
}
export interface TopicCluster {
  domain: string;
  title: string; // short human label
  signalRefs: string[];
  confidence: number;
}
export function clusterSignals(input: ClusterInput): TopicCluster[];
- Anchor to known domains — clusters must map to a domain from
domain-extraction output when one fits; only mint an "Other / Misc" bucket for
genuine no-fit signals. This is the precision lever that beats the reference
product's mis-filing: a signal's domain is decided against the user's actual
domain set, not a generic taxonomy.
- Strategy:
  - LLM: versioned prompt at packages/policy-prompts/prompts/topic-clustering/v1.md with JSON-schema output (array of TopicCluster); input is the signal window + known domains.
  - Deterministic fallback: group by the domain already tagged on each signal (from Lifebook / situation-interpreter extractDomain, situation-interpreter.ts:306-334); title = domain name. Lower quality, zero cost, always available.
- Bounded output: at most maxClusters; overflow merges either into the lowest-confidence cluster or into a "More updates" cluster (the reference product caps similarly). Log when merging happens (no silent truncation).
- Wiring: the clusterer runs before the briefing prompt; its TopicCluster[] populates spec 01's topics array. Each cluster's signalRefs preserve citations.
- Stability — same signals on a re-render should produce the same clusters
(titles may vary with LLM temperature; pin clustering temperature low and key
dedup on domain+signalRefs, not on title text).
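The deterministic fallback and the maxClusters cap described above can be sketched as follows. This is a minimal illustration, not the planned implementation. The types are restated locally so the block is self-contained. The function names (clusterByTaggedDomain, capClusters), the confidence constants, the injected log callback, and the choice to label the overflow bucket's domain "More updates" are all assumptions. The spec allows merging overflow into either the lowest-confidence cluster or a "More updates" cluster; this sketch takes the second option.

```typescript
// Types restated from the spec so the sketch is self-contained.
interface SignalInput { ref: string; domain: string | null; subject: string; summary: string }
interface ClusterInput { signals: SignalInput[]; knownDomains: string[]; maxClusters: number }
interface TopicCluster { domain: string; title: string; signalRefs: string[]; confidence: number }

// Assumptions (not in the spec): bucket labels and fixed fallback confidences.
const OTHER_DOMAIN = "Other";
const OVERFLOW_LABEL = "More updates";
const TAGGED_CONFIDENCE = 0.5;
const OTHER_CONFIDENCE = 0.1;

type Logger = (message: string) => void;

// Enforce the cap: keep the highest-confidence clusters and fold the rest
// into one "More updates" cluster, logging the merge (no silent truncation).
function capClusters(clusters: TopicCluster[], maxClusters: number, log: Logger): TopicCluster[] {
  const limit = Math.max(1, maxClusters);
  if (clusters.length <= limit) return clusters;

  // Array.prototype.sort is stable, so ties keep first-seen order.
  const ranked = [...clusters].sort((a, b) => b.confidence - a.confidence);
  const kept = ranked.slice(0, limit - 1);
  const overflow = ranked.slice(limit - 1);

  const mergedRefs: string[] = [];
  let lowest = Infinity;
  for (const cluster of overflow) {
    mergedRefs.push(...cluster.signalRefs);
    lowest = Math.min(lowest, cluster.confidence);
  }
  log(`topic-clusterer: merged ${overflow.length} clusters into "${OVERFLOW_LABEL}"`);
  return [
    ...kept,
    { domain: OVERFLOW_LABEL, title: OVERFLOW_LABEL, signalRefs: mergedRefs, confidence: lowest },
  ];
}

// Deterministic fallback: one cluster per tagged domain, title = domain name.
// Signals with no tag, or a tag outside knownDomains, go to the "Other" bucket.
function clusterByTaggedDomain(input: ClusterInput, log: Logger = () => {}): TopicCluster[] {
  const known = new Set(input.knownDomains);
  const groups = new Map<string, string[]>(); // Map keeps insertion order

  for (const signal of input.signals) {
    const domain =
      signal.domain !== null && known.has(signal.domain) ? signal.domain : OTHER_DOMAIN;
    const refs = groups.get(domain) ?? [];
    refs.push(signal.ref);
    groups.set(domain, refs);
  }

  const clusters: TopicCluster[] = [];
  groups.forEach((signalRefs, domain) => {
    clusters.push({
      domain,
      title: domain,
      signalRefs,
      confidence: domain === OTHER_DOMAIN ? OTHER_CONFIDENCE : TAGGED_CONFIDENCE,
    });
  });
  return capClusters(clusters, input.maxClusters, log);
}
```

Because grouping follows input order and the sort is stable, the same signal window yields the same clusters on a re-render, which is the property the Stability item asks for. Passing the logger in as a parameter keeps the merge-logging requirement easy to assert in unit tests.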
Acceptance Criteria
- Given 10 awareness signals spanning 3 known domains, clusterSignals returns ≤8 clusters, each mapped to a known domain or an explicit "Other" bucket.
- No signal appears in two clusters; every input signal appears in exactly one.
- A signal whose tagged domain matches a known domain is never placed in "Other".
- Output never exceeds maxClusters; when input would exceed it, a merge occurs and is logged.
- With no LLM, the deterministic fallback groups by tagged domain and still returns valid clusters.
- Cluster signalRefs round-trip to the original signals (citations preserved).
- Tests written and passing. No degradation of existing functionality.
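The partition and round-trip criteria above (every input signal in exactly one cluster, every signalRef resolving to an input signal) lend themselves to a single reusable check. The helper below is a hypothetical sketch for use in unit tests. checkPartition and PartitionReport are illustrative names, not part of the spec.

```typescript
// Restated from the spec so the sketch is self-contained.
interface TopicCluster { domain: string; title: string; signalRefs: string[]; confidence: number }

// Hypothetical report shape: all three arrays are empty for a valid partition.
interface PartitionReport {
  missing: string[];    // input refs that appear in no cluster
  duplicated: string[]; // refs that appear more than once across clusters
  unknown: string[];    // refs in clusters that were never in the input
}

function checkPartition(inputRefs: string[], clusters: TopicCluster[]): PartitionReport {
  const expected = new Set(inputRefs);
  const seen = new Set<string>();
  const duplicated = new Set<string>();
  const unknown = new Set<string>();

  for (const cluster of clusters) {
    for (const ref of cluster.signalRefs) {
      if (!expected.has(ref)) unknown.add(ref);
      if (seen.has(ref)) duplicated.add(ref);
      seen.add(ref);
    }
  }

  return {
    missing: inputRefs.filter((ref) => !seen.has(ref)),
    duplicated: Array.from(duplicated),
    unknown: Array.from(unknown),
  };
}
```

Returning the offending refs, rather than a bare boolean, makes a failing test name the exact signals that were dropped or double-filed.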
Testing Plan
| Layer | What | Count |
|---|---|---|
| Unit | Partition completeness + no-overlap invariants | +3 |
| Unit | Known-domain anchoring; "Other" only for no-fit | +3 |
| Unit | maxClusters cap + merge logging | +2 |
| Unit | Deterministic fallback grouping by tagged domain | +2 |
| Integration | clusterer → spec 01 topics payload with citations intact | +2 |
Rollback Plan
Additive and flagged (TOPIC_CLUSTERING=off). With it off, spec 01's Topics section
falls back to a flat domain-tagged list (the deterministic path), which is the
current behavior. No schema or data changes to reverse.
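A flag check for the rollback path might look like the sketch below. Only the flag name and the "off" value come from this spec. The helper name, the case-insensitive comparison, and the choice to treat an unset flag as enabled are assumptions. The environment is passed in as a plain record, rather than read globally, so the function stays testable.

```typescript
// Hypothetical helper: returns false only when TOPIC_CLUSTERING is set to "off".
// Assumption: an unset flag means clustering is enabled.
type Env = Record<string, string | undefined>;

function topicClusteringEnabled(env: Env): boolean {
  const value = env["TOPIC_CLUSTERING"] ?? "on";
  return value.trim().toLowerCase() !== "off";
}
```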
Effort Estimate
- Clusterer module + types: ~3h
- LLM prompt + schema: ~3h
- Deterministic fallback + cap/merge: ~2h
- Wiring into briefing: ~2h
- Tests: ~3h
Total: ~1.5-2 days.
Files Reference
| File | Change |
|---|---|
| packages/decision-engine/src/topic-clusterer.ts | New: clustering + types |
| packages/policy-prompts/prompts/topic-clustering/v1.md | New: LLM template + schema |
| packages/policy-prompts/prompts/domain-extraction/v1.md | Reference (supplies known domains) |
| packages/decision-engine/src/situation-interpreter.ts:306-334 | Reference (per-signal domain tag) |
| briefing generator | Feed TopicCluster[] into spec 01 topics |
Out of Scope
- Persisting clusters as durable entities (clusters are per-briefing presentation).
- Cross-window topic continuity ("this topic continued from yesterday").
- Entity-level linking inside a cluster — spec 05.
Related
- Consumes domains from domain-extraction.
- Produces the topics array consumed by spec 01.
- Spec 05 dedups entities that recur across clusters.