Entity cross-linking across signals
Context
A reference AI-inbox product cross-references the same entity wherever it appears —
one person or one vendor shows up in a to-do, in a topic cluster, and in a related
thread, and the digest treats them as the same thing. It also sometimes repeats the
same entity in two places because its linking is imperfect, which is the failure mode
to avoid. SkyTwin deduplicates signals only by message identity; it has no notion that
two different signals refer to the same person, organization, or thing. Without entity
linking, the briefing can list the same underlying matter three times under three
clusters, and the twin can't reason about "everything touching entity X."
Current State
Verified 2026-06-06.
apps/worker/src/signal-dedupe.ts:40-54 — SignalDeduper keys on ${signal.source}:${signal.id} with a TTL (DEFAULT_TTL_MS = 24h, signal-dedupe.ts:37) and per-user capacity (DEFAULT_MAX_PER_USER = 5000, signal-dedupe.ts:38). This prevents re-ingesting the same message; it does
nothing about two distinct messages referencing the same entity.
No entity extraction (people / orgs / things) exists in the signal pipeline.
Substrate that could hold entity links already exists — and more of it than first
assumed (confirmed during review):
@skytwin/memory-port already defines MemoryPort.recordEntity(KnowledgeEntity)
(packages/memory-port/src/port.ts:59) and a KnowledgeEntity interface
(id, userId, name, entityType, attributes, firstSeenAt, lastSeenAt). So entity
WRITE is already a contract method — this spec REUSES it, it does not invent it.
@skytwin/memory-gbrain (default) — vector + tsvector RRF over brain_* tables.
@skytwin/memory-mempalace — knowledge graph with temporal triples + episodic.
Missing: nothing extracts entities from signals to feed recordEntity, AND
there is no READ-by-entity method (getSignalsForEntity) on MemoryPort yet.
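To make the gap concrete, here is a minimal sketch of message-identity keying. It is not the actual SignalDeduper: the `Signal` shape and the `dedupeKey` and `dedupeByMessage` helpers are assumptions for illustration, and TTL and per-user capacity are left out. Only the key format `${signal.source}:${signal.id}` comes from the description above.

```typescript
// Hypothetical signal shape; only `source` and `id` are named in the spec.
interface Signal {
  source: string;
  id: string;
  senderAddress?: string;
}

// Same key format the spec attributes to SignalDeduper.
function dedupeKey(signal: Signal): string {
  return `${signal.source}:${signal.id}`;
}

// Keeps the first occurrence of each message key (no TTL or capacity here).
function dedupeByMessage(signals: Signal[]): Signal[] {
  const seen = new Set<string>();
  const kept: Signal[] = [];
  for (const s of signals) {
    const key = dedupeKey(s);
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(s);
  }
  return kept;
}

const sample: Signal[] = [
  { source: 'gmail', id: 'm1', senderAddress: 'ana@acme.example' },
  { source: 'gmail', id: 'm1', senderAddress: 'ana@acme.example' }, // re-ingested copy
  { source: 'gmail', id: 'm2', senderAddress: 'ana@acme.example' }, // same person, new message
];
const survivors = dedupeByMessage(sample);
```

Two signals survive: the re-ingested copy of `m1` is dropped, but `m1` and `m2` are both kept even though they come from the same sender, because nothing in the key refers to an entity.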
Proposed Change
Add an entity extraction + resolution step that pulls named entities (person, org,
thing/topic) from signals, resolves each to a stable entityId (linking mentions
across signals), and writes them to the memory backend's graph so retrieval and the
briefing can group by entity and avoid repeating one matter across clusters.
This is the heaviest spec — it introduces an entity store and a resolution problem
(when are two mentions the same entity?). Recommend landing it last.
Implementation Details
New module packages/decision-engine/src/entity-extractor.ts:

    export type EntityKind = 'person' | 'org' | 'thing';

    export interface ExtractedEntity {
      kind: EntityKind;
      surface: string;    // text as it appeared ("the vendor", "Acme")
      normalized: string; // canonical key for matching (lowercased, stripped)
      signalRef: string;
      confidence: number;
    }

    export function extractEntities(signal: {
      ref: string;
      subject: string;
      body: string;
      senderAddress?: string;
    }): ExtractedEntity[];
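One way the declared signature could be filled in is sketched below, purely as an illustration. The heuristics are assumptions, not part of the spec: the sender address is treated as a person mention, organizations are found with a naive company-suffix regex, the confidence values are arbitrary placeholders, and the `normalize` helper is hypothetical. A real extractor would need something sturdier.

```typescript
type EntityKind = 'person' | 'org' | 'thing';

interface ExtractedEntity {
  kind: EntityKind;
  surface: string;
  normalized: string;
  signalRef: string;
  confidence: number;
}

// Hypothetical helper: lowercase, replace punctuation with spaces, collapse whitespace.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').replace(/\s+/g, ' ').trim();
}

function extractEntities(signal: {
  ref: string;
  subject: string;
  body: string;
  senderAddress?: string;
}): ExtractedEntity[] {
  const found: ExtractedEntity[] = [];

  // Assumption: the sender address is the strongest person signal available.
  if (signal.senderAddress) {
    found.push({
      kind: 'person',
      surface: signal.senderAddress,
      normalized: signal.senderAddress.trim().toLowerCase(),
      signalRef: signal.ref,
      confidence: 0.9, // placeholder value
    });
  }

  // Naive org heuristic: capitalized words followed by a company suffix.
  const orgPattern = /\b((?:[A-Z][A-Za-z0-9&]*\s)+(?:Inc|LLC|Ltd|Corp|GmbH))\b/g;
  const text = `${signal.subject}\n${signal.body}`;
  const seen = new Set<string>();
  let match: RegExpExecArray | null = orgPattern.exec(text);
  while (match !== null) {
    const surface = match[1] ?? '';
    const normalized = normalize(surface);
    if (normalized !== '' && !seen.has(normalized)) {
      seen.add(normalized);
      found.push({ kind: 'org', surface, normalized, signalRef: signal.ref, confidence: 0.6 });
    }
    match = orgPattern.exec(text);
  }
  return found;
}

const entities = extractEntities({
  ref: 'gmail:m2',
  subject: 'Overdue invoice',
  body: 'Payment to Acme Widgets Inc is overdue.',
  senderAddress: 'Ana@Acme.example',
});
```

For this input the sketch yields one person mention keyed on the lowercased address and one org mention with the normalized key `acme widgets inc`.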
Resolution — packages/decision-engine/src/entity-resolver.ts maps an ExtractedEntity to a stable entityId:
People: prefer email address as the strong key (exact, no fuzzy needed);
fall back to normalized display name only within a thread.
Orgs/things: normalized-string exact match first; fuzzy match (token
overlap above a threshold) gated behind a floorRatio-style confidence bar so
weak matches don't merge unrelated entities. Conservative: when unsure, mint a
NEW entityId rather than wrongly merge (a false merge is worse than a false
split — it corrupts the graph).
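The resolution rules above can be sketched as a small in-memory resolver. This is an illustration under stated assumptions, not the planned implementation: the `Mention` shape (including `emailAddress` and `threadId`), the `ent_N` id format, Jaccard overlap as the token-overlap measure, and the default bar of 0.8 are all hypothetical choices.

```typescript
type EntityKind = 'person' | 'org' | 'thing';

// Hypothetical input shape; emailAddress and threadId are assumed fields.
interface Mention {
  kind: EntityKind;
  normalized: string;
  emailAddress?: string;
  threadId?: string;
}

// Jaccard overlap of whitespace-separated tokens, in [0, 1].
function tokenOverlap(a: string, b: string): number {
  const ta = new Set(a.split(' ').filter((t) => t !== ''));
  const tb = new Set(b.split(' ').filter((t) => t !== ''));
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => { if (tb.has(t)) shared += 1; });
  return shared / (ta.size + tb.size - shared);
}

class EntityResolver {
  private nextId = 1;
  private readonly exact = new Map<string, string>(); // exact key -> entityId
  private readonly pool: { kind: EntityKind; normalized: string; entityId: string }[] = [];
  private readonly floorRatio: number;

  constructor(floorRatio = 0.8) {
    this.floorRatio = floorRatio;
  }

  private mint(): string {
    const id = `ent_${this.nextId}`;
    this.nextId += 1;
    return id;
  }

  private lookupOrMint(key: string): string {
    const known = this.exact.get(key);
    if (known !== undefined) return known;
    const id = this.mint();
    this.exact.set(key, id);
    return id;
  }

  resolve(m: Mention): string {
    if (m.kind === 'person') {
      // Email is the strong key: exact match, no fuzzy logic.
      if (m.emailAddress) return this.lookupOrMint(`person:email:${m.emailAddress.trim().toLowerCase()}`);
      // Display name is only trusted within one thread.
      if (m.threadId) return this.lookupOrMint(`person:thread:${m.threadId}:${m.normalized}`);
      return this.mint(); // no key and no thread: mint rather than merge by name
    }
    const key = `${m.kind}:${m.normalized}`;
    const known = this.exact.get(key);
    if (known !== undefined) return known;
    // Fuzzy pass: the best same-kind candidate wins only if it clears the bar.
    let best: { entityId: string; score: number } | undefined;
    for (const candidate of this.pool) {
      if (candidate.kind !== m.kind) continue;
      const score = tokenOverlap(candidate.normalized, m.normalized);
      if (score >= this.floorRatio && (best === undefined || score > best.score)) {
        best = { entityId: candidate.entityId, score };
      }
    }
    const entityId = best !== undefined ? best.entityId : this.mint();
    this.exact.set(key, entityId);
    if (best === undefined) this.pool.push({ kind: m.kind, normalized: m.normalized, entityId });
    return entityId;
  }
}
```

With the default bar, `acme widgets` and `acme widgets inc` overlap at 2/3 and stay separate entities; lowering the bar to 0.6 would merge them. The bar is the tuning knob for the false-merge versus false-split trade-off.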
Storage — reuse the existing MemoryPort.recordEntity(KnowledgeEntity)
(packages/memory-port/src/port.ts:59) to persist resolved entities; carry entityId, kind→entityType, surface/signalRef/provenance in KnowledgeEntity.attributes (or extend the interface if a first-class field reads
cleaner — decide in the spike below). Works against gbrain + mempalace via the port;
do NOT bind to one backend.
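A sketch of the attributes-based mapping, assuming the spike settles on stashing provenance in `attributes`. The `KnowledgeEntity` field names come from the spec, but every field type below is an assumption, and `ResolvedMention` and `toKnowledgeEntity` are hypothetical names.

```typescript
// Field names per the spec; all types here are assumed.
interface KnowledgeEntity {
  id: string;
  userId: string;
  name: string;
  entityType: string;
  attributes: Record<string, unknown>;
  firstSeenAt: Date;
  lastSeenAt: Date;
}

// Hypothetical resolver output.
interface ResolvedMention {
  entityId: string;
  kind: 'person' | 'org' | 'thing';
  surface: string;
  signalRef: string;
  provenance: string;
}

// Maps a resolved mention onto the existing entity contract:
// entityId -> id, kind -> entityType, the rest into attributes.
function toKnowledgeEntity(userId: string, m: ResolvedMention, seenAt: Date): KnowledgeEntity {
  return {
    id: m.entityId,
    userId,
    name: m.surface,
    entityType: m.kind,
    attributes: {
      surface: m.surface,
      signalRef: m.signalRef,
      provenance: m.provenance,
    },
    firstSeenAt: seenAt,
    lastSeenAt: seenAt,
  };
}

const record = toKnowledgeEntity(
  'user_1',
  { entityId: 'ent_1', kind: 'org', surface: 'Acme', signalRef: 'gmail:m2', provenance: 'untrusted_external' },
  new Date(0),
);
```

Because the mapping only produces a `KnowledgeEntity`, it stays backend-agnostic: the same record can be handed to whichever adapter sits behind the port.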
Briefing dedup — when spec 04 produces clusters, collapse signals that share a
primary entityId into one cluster line with multiple citations, instead of
repeating the matter across clusters. This is the concrete win: the reference
product's "same thing listed twice" bug does not happen.
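The collapse step might look like the following sketch. The `ClusteredSignal` and `BriefingLine` shapes are hypothetical (spec 04's real cluster output is not shown here), and keeping the collapsed line in the cluster where the entity first appears is an assumed tie-break.

```typescript
// Hypothetical shapes; spec 04's actual cluster output may differ.
interface ClusteredSignal {
  ref: string;
  clusterId: string;
  summary: string;
  primaryEntityId?: string;
}

interface BriefingLine {
  clusterId: string;
  summary: string;
  citations: string[];
}

// Signals sharing a primary entityId become one line with several citations.
// Signals without a primary entity keep their own line.
function collapseByEntity(signals: ClusteredSignal[]): BriefingLine[] {
  const lines: BriefingLine[] = [];
  const byEntity = new Map<string, BriefingLine>();
  for (const s of signals) {
    if (s.primaryEntityId === undefined) {
      lines.push({ clusterId: s.clusterId, summary: s.summary, citations: [s.ref] });
      continue;
    }
    const existing = byEntity.get(s.primaryEntityId);
    if (existing !== undefined) {
      existing.citations.push(s.ref); // same matter: add a citation, not a line
      continue;
    }
    const line: BriefingLine = { clusterId: s.clusterId, summary: s.summary, citations: [s.ref] };
    byEntity.set(s.primaryEntityId, line);
    lines.push(line);
  }
  return lines;
}

const briefing = collapseByEntity([
  { ref: 'm1', clusterId: 'billing', summary: 'Acme invoice overdue', primaryEntityId: 'ent_1' },
  { ref: 'm2', clusterId: 'vendors', summary: 'Acme contract renewal', primaryEntityId: 'ent_1' },
  { ref: 'm3', clusterId: 'billing', summary: 'Acme payment reminder', primaryEntityId: 'ent_1' },
  { ref: 'm4', clusterId: 'travel', summary: 'Flight change' },
]);
```

Here one matter spanning three signals across two clusters renders as a single line with three citations, and the unrelated signal is left alone.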
Query surface — getSignalsForEntity(entityId) does NOT exist on MemoryPort
today; this spec ADDS it to the contract (packages/memory-port/src/port.ts) and
implements it in both the gbrain and mempalace adapters. Read-only; no auto-actions.
Pre-work spike (1-2h): confirm whether getSignalsForEntity belongs on MemoryPort vs. a
separate entity-query service, and whether KnowledgeEntity needs a provenance field vs.
stashing it in attributes. Lock both before coding.
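For the spike, a throwaway in-memory stand-in can pin down the expected read semantics before either adapter is touched. Everything here is an assumption: the trimmed `EntityWrite` shape stands in for the full `KnowledgeEntity`, the methods are synchronous for brevity (the real port may well be async), and the return type of `getSignalsForEntity` is a guess.

```typescript
// Trimmed, hypothetical write shape; the real method takes a KnowledgeEntity.
interface EntityWrite {
  id: string;
  signalRef: string;
}

class InMemoryEntityIndex {
  private readonly signalsByEntity = new Map<string, Set<string>>();

  // Records one mention; repeated writes for the same signal are idempotent.
  recordEntity(entity: EntityWrite): void {
    let refs = this.signalsByEntity.get(entity.id);
    if (refs === undefined) {
      refs = new Set<string>();
      this.signalsByEntity.set(entity.id, refs);
    }
    refs.add(entity.signalRef);
  }

  // Returns all and only the signals linked to the entity, in first-seen order.
  getSignalsForEntity(entityId: string): string[] {
    const refs = this.signalsByEntity.get(entityId);
    const out: string[] = [];
    if (refs !== undefined) refs.forEach((ref) => out.push(ref));
    return out;
  }
}

const index = new InMemoryEntityIndex();
index.recordEntity({ id: 'ent_1', signalRef: 'm1' });
index.recordEntity({ id: 'ent_1', signalRef: 'm2' });
index.recordEntity({ id: 'ent_1', signalRef: 'm1' }); // duplicate write
index.recordEntity({ id: 'ent_2', signalRef: 'm3' });
```

The same write-then-read sequence could serve as the shared parity check run against both the gbrain and mempalace adapters.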
Provenance preserved — entities extracted from untrusted_external signals
are tagged as such; the graph records origin so downstream consumers never treat
an inbound-asserted entity claim as trusted (safety invariant #8).
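A downstream consumer could enforce the provenance rule with a fail-closed check along these lines. This is a sketch: `untrusted_external` is the only provenance value named in the spec, so the trusted-origin list and the `user_authored` value used in the example are hypothetical.

```typescript
// Fail closed: a claim is trusted only when its provenance is explicitly
// present and on the caller's allow-list. Missing or unknown provenance,
// and 'untrusted_external' in particular, is never trusted.
function canTreatAsTrusted(
  attributes: Record<string, unknown>,
  trustedOrigins: string[],
): boolean {
  const provenance = attributes['provenance'];
  if (typeof provenance !== 'string') return false;
  if (provenance === 'untrusted_external') return false;
  return trustedOrigins.indexOf(provenance) !== -1;
}

// 'user_authored' is a made-up origin for illustration only.
const allowList = ['user_authored'];
const inboundClaim = canTreatAsTrusted({ provenance: 'untrusted_external' }, allowList);
const untaggedClaim = canTreatAsTrusted({}, allowList);
const ownClaim = canTreatAsTrusted({ provenance: 'user_authored' }, allowList);
```

Treating a missing tag the same as an untrusted one means a bug that drops provenance on write degrades to over-caution rather than to misplaced trust.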
Acceptance Criteria
Two distinct signals mentioning the same person (same email address) resolve to
one entityId.
Two signals mentioning different people with similar display names but different
addresses resolve to two distinct entityIds (no false merge).
An org name below the fuzzy-match confidence bar mints a new entityId rather
than merging into a near-match.
getSignalsForEntity(entityId) returns all and only the signals linked to that
entity.
In a briefing window where one matter spans 3 signals across 2 clusters, the
matter renders once with 3 citations (no cross-cluster repetition).
Entity records carry the originating signal's provenance.
Works against both gbrain and mempalace via MemoryPort (no backend-specific
code in the extractor/resolver).
Tests written and passing. No degradation of existing functionality.
Testing Plan
| Layer | What | Count |
| --- | --- | --- |
| Unit | Extraction: person/org/thing from synthetic bodies | |
| Integration | Write to MemoryPort → getSignalsForEntity round-trip (gbrain) | +2 |
| Integration | Same against mempalace backend (port parity) | +2 |
| Integration | Briefing collapses cross-cluster repeated matter to one line | +2 |
Rollback Plan
Gated behind a flag (ENTITY_LINKING=off disables it). With it off, no entities are written and the briefing
keeps spec 04's behavior (possible cross-cluster repetition, i.e. parity with the
reference product). Entity rows are additive in the memory backend; orphaned rows are
harmless and can be left or swept. Resolution false-merges are the main risk — the
conservative "mint-on-doubt" policy bounds blast radius; a bad merge affects only the
two entities involved and is reversible by re-running extraction after tuning the bar.
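The flag gate could be as small as the sketch below. Only the `ENTITY_LINKING=off` setting is named in the spec; treating every other value (including unset) as enabled is an assumption, as are the `entityLinkingEnabled` and `maybeLinkEntities` names and the convention of returning the number of entities written.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: linking is enabled unless the flag is explicitly 'off'.
function entityLinkingEnabled(env: Env): boolean {
  return env['ENTITY_LINKING'] !== 'off';
}

// Runs the linking step only when the flag allows it.
// Returns how many entities were written (0 when the flag is off).
function maybeLinkEntities<T>(
  env: Env,
  signals: T[],
  link: (signals: T[]) => number,
): number {
  if (!entityLinkingEnabled(env)) return 0; // rollback path: nothing is written
  return link(signals);
}

// Stand-in linker that pretends to write one entity per signal.
const fakeLink = (signals: string[]): number => signals.length;
const writtenWhenOff = maybeLinkEntities({ ENTITY_LINKING: 'off' }, ['m1', 'm2'], fakeLink);
const writtenWhenOn = maybeLinkEntities({ ENTITY_LINKING: 'on' }, ['m1', 'm2'], fakeLink);
```

Passing the environment in as a plain record keeps the gate testable without touching process state.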
Effort Estimate
Entity extractor: ~4h
Resolver (keys + fuzzy bar + conservative policy): ~6h
MemoryPort write/read surface: ~4h
Briefing collapse integration: ~3h
Tests (incl. dual-backend): ~6h
Total: ~3 days. Largest spec in the set; sequence last.
Files Reference
| File | Change |
| --- | --- |
| packages/decision-engine/src/entity-extractor.ts | New: entity extraction |
| packages/decision-engine/src/entity-resolver.ts | New: mention → stable entityId |
| packages/memory-port/* | Add entity write/read to the MemoryPort contract |
| apps/worker/src/signal-dedupe.ts | Reference (this is message-dedup; entity-link is separate) |
| briefing generator + spec 04 clusterer | Collapse by primary entityId |
Out of Scope
A full relationship graph between entities ("X works at Y"). Mention-linking only.
Cross-user entity sharing (entities are per-user).
Coreference resolution beyond thread scope for pronouns/aliases.
Related
Builds on the memory backends (@skytwin/memory-gbrain, @skytwin/memory-mempalace).
Dedups across spec 04 clusters; sequenced after 01-04.