feat(privacy): redact PII from LLM context when privacy.redactPII is enabled by kraocode · Pull Request #47959 · openclaw/openclaw

kraocode · 2026-03-16T06:31:09Z

Summary

Add privacy.redactPII config option (boolean, default false). When enabled, redacts personally identifiable information from the prompt context sent to the LLM provider.

What's redacted

| Field | Treatment |
| --- | --- |
| `SenderE164` (phone number) | Stripped entirely |
| `SenderId` (platform user ID) | Deterministic hash (`user_` + 12-char sha256) |
| `chat_id` | ID portion hashed, channel prefix preserved |
| `SenderName` | Not affected (user-chosen, publicly visible) |
| `SenderUsername` | Not affected (user-chosen public handle) |
| `SenderTag` | Not affected (platform identifier) |

Why

Phone numbers (SenderE164) and user IDs (SenderId, chat_id) are PII that the LLM has no functional need for. Auth and routing happen at the gateway layer before the LLM call. Signal and WhatsApp messages always include a phone number, and in Telegram private chats the chat_id equals the user ID.

Config

{
  "privacy": {
    "redactPII": true
  }
}
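
As a minimal sketch of how the opt-in default could be read: the `AppConfig` and `PrivacyConfig` types and the `isRedactionEnabled` helper are hypothetical names for illustration, and only the `privacy.redactPII` key comes from this PR.

```typescript
// Hypothetical config shape; only `privacy.redactPII` comes from this PR.
interface PrivacyConfig {
  redactPII?: boolean;
}

interface AppConfig {
  privacy?: PrivacyConfig;
}

// Redaction is opt-in: a missing section or a missing field both mean "off".
function isRedactionEnabled(config: AppConfig): boolean {
  return config.privacy?.redactPII ?? false;
}

console.log(isRedactionEnabled({})); // false
console.log(isRedactionEnabled({ privacy: { redactPII: true } })); // true
```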

Design decisions

- Deterministic hash (no salt): the same user always maps to the same pseudonym, consistent across sessions. Aligns with the existing formatOwnerDisplayId (12-char sha256).
- Strip vs hash: SenderE164 is stripped (redundant field). SenderId and chat_id are hashed (needed for speaker differentiation and context identification).
- Public fields preserved: SenderName, SenderUsername, and SenderTag are user-chosen and publicly visible, so they are not treated as PII.
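
The hashing decision above can be sketched as follows. `hashId` mirrors the one-line helper visible in the diff; the `hashSenderId` signature is an assumption based on the "`user_` + 12-char sha256" description, not a copy of the PR's code.

```typescript
import { createHash } from "node:crypto";

// Mirrors the helper in the diff: first 12 hex chars of an unsalted SHA-256.
function hashId(value: string): string {
  return createHash("sha256").update(value).digest("hex").slice(0, 12);
}

// Assumed shape: "user_" + 12-char hash, per the "What's redacted" table.
function hashSenderId(senderId: string): string {
  return `user_${hashId(senderId)}`;
}

// No salt and no per-session state, so the same input yields the same pseudonym.
const firstCall = hashSenderId("123456789");
const secondCall = hashSenderId("123456789");
console.log(firstCall === secondCall); // true
```

Because the pseudonym depends only on the input, the model can still tell speakers apart across sessions without seeing the raw ID.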

Changes

src/config/zod-schema.ts — new privacy.redactPII field
src/auto-reply/reply/inbound-meta.ts — hash/strip PII in both builder functions
src/auto-reply/reply/get-reply-run.ts — reads config, passes to builders
src/auto-reply/reply/inbound-meta.test.ts — 7 new test cases (30 total, all passing)

Closes #47958

greptile-apps · 2026-03-16T06:36:23Z

Greptile Summary

This PR adds a privacy.redactPII config option that strips SenderE164 and pseudonymises SenderId and chat_id before they are injected into the LLM prompt context. The implementation is well-scoped, the hashing helpers are correct, and the test suite covers the key permutations.

Key observations:

- The senderId pre-computation variable is declared after conversationInfo is built, causing the same hashSenderId logic to be duplicated inline in both conversationInfo.sender_id and conversationInfo.sender. Moving the variable declaration above conversationInfo would eliminate this duplication and reduce the risk of the two fields diverging if the hashing strategy changes.
- followupRun.run.senderE164 and followupRun.run.senderId in get-reply-run.ts continue to carry raw PII regardless of the setting. If these are ever serialised to a queue file or other durable storage, raw phone numbers and user IDs would persist despite the privacy opt-in. A comment clarifying the intended in-memory-only scope would help future contributors.
- The SHA-256 hash is unsalted and truncated to 12 hex characters. Where a hashed identifier is itself a phone number, this is reversible via a pre-computed lookup table over the finite E.164 space (SenderE164 itself is stripped rather than hashed). The PR acknowledges the unsalted hash as an intentional trade-off for cross-session pseudonym consistency; operators should be aware the feature provides pseudonymisation rather than true anonymisation.

Confidence Score: 4/5

Safe to merge; the core redaction logic is correct and well-tested, with only code-quality and documentation gaps to address.
The implementation correctly redacts PII from both builder functions, all major code paths are covered by new tests, and the schema addition is minimal and consistent with existing patterns. The two deductions: (1) duplicated hash logic in conversationInfo creates a future consistency risk but is not currently a bug, and (2) raw PII in followupRun may persist to storage if the queue layer serialises these fields — the intent is unclear without a comment.
Files flagged: src/auto-reply/reply/inbound-meta.ts (duplicated hash logic) and src/auto-reply/reply/get-reply-run.ts (undocumented raw PII in followupRun)

Comments Outside Diff (2)

src/auto-reply/reply/inbound-meta.ts, line 127-160

Duplicated hashSenderId logic in conversationInfo

The senderId variable (lines 173-174) centralises the hash-or-passthrough logic for SenderId, but it is declared after conversationInfo is built, so the same logic is duplicated inline twice:

1. conversationInfo.sender_id (lines 131-133) – inline hash
2. conversationInfo.sender (lines 139-141) – inline hash again

If the hashing strategy changes (e.g. a different prefix or algo), only line 174 would typically be updated, leaving the two inline copies stale and causing conversationInfo.sender_id / conversationInfo.sender to diverge from senderInfo.id.

Moving the three pre-computed variables (senderE164, rawSenderId, senderId) above the conversationInfo object would eliminate this duplication:

// Compute the redacted values once, before conversationInfo is built.
const senderE164 = options?.redactPII ? undefined : safeTrim(ctx.SenderE164);
const rawSenderId = safeTrim(ctx.SenderId);
const senderId = options?.redactPII && rawSenderId ? hashSenderId(rawSenderId) : rawSenderId;

const conversationInfo = {
  // ...
  sender_id: shouldIncludeConversationInfo ? senderId : undefined,
  // ...
  sender: shouldIncludeConversationInfo
    ? (safeTrim(ctx.SenderName) ??
       senderE164 ?? // already undefined when redactPII is set
       senderId ??
       safeTrim(ctx.SenderUsername))
    : undefined,
  // ...
};

This also removes the repeated safeTrim(ctx.SenderId) calls in the condition vs body throughout conversationInfo.
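
A self-contained sketch of the suggested ordering, under stated assumptions: `InboundCtx`, `BuildOptions`, `buildSenderFields`, and the stand-in `safeTrim` / `hashSenderId` bodies are hypothetical simplifications, and the real builder has more fields and conditions than shown here.

```typescript
import { createHash } from "node:crypto";

// Minimal stand-ins for the real context and helpers (assumptions).
interface InboundCtx {
  SenderName?: string;
  SenderUsername?: string;
  SenderId?: string;
  SenderE164?: string;
}

interface BuildOptions {
  redactPII?: boolean;
}

function safeTrim(value?: string): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function hashSenderId(id: string): string {
  return "user_" + createHash("sha256").update(id).digest("hex").slice(0, 12);
}

function buildSenderFields(ctx: InboundCtx, options?: BuildOptions) {
  // Single source of truth: computed once, reused by every field below.
  const senderE164 = options?.redactPII ? undefined : safeTrim(ctx.SenderE164);
  const rawSenderId = safeTrim(ctx.SenderId);
  const senderId = options?.redactPII && rawSenderId ? hashSenderId(rawSenderId) : rawSenderId;

  const conversationInfo = {
    sender_id: senderId,
    sender: safeTrim(ctx.SenderName) ?? senderE164 ?? senderId ?? safeTrim(ctx.SenderUsername),
  };
  const senderInfo = { id: senderId, e164: senderE164 };
  return { conversationInfo, senderInfo };
}

const example = buildSenderFields({ SenderId: "42", SenderE164: "+15551234567" }, { redactPII: true });
console.log(example.conversationInfo.sender_id === example.senderInfo.id); // true
```

With one shared `senderId`, a later change to the prefix or algorithm touches a single line and the two structures cannot drift apart.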

src/auto-reply/reply/get-reply-run.ts, line 500-505

Raw PII still stored in followupRun when redactPII is enabled

The redaction only covers the strings injected into the LLM prompt. However, followupRun.run still carries the verbatim senderId and senderE164 from sessionCtx:

senderId: sessionCtx.SenderId?.trim() || undefined,   // raw
senderE164: sessionCtx.SenderE164?.trim() || undefined, // raw phone number

If runReplyAgent enqueues or persists this object (e.g. to a queue file for deferred/steer runs), the original phone number and user ID will be written to disk even when the operator has opted in to PII redaction.

If the followupRun is never persisted — i.e. it is purely in-memory and only used for same-process routing — this is fine and the current scope ("redact from LLM context") is correctly maintained. But if it is serialised, these fields should either be omitted or replaced with their pseudonymised equivalents when redactPIIOpts is active. It would be worth a brief code comment here clarifying that the raw values are intentional for routing/auth and are never written to durable storage.
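
If the object does turn out to be serialised, one possible shape for the fix is sketched below. `toPersistableIdentity`, `SessionCtx`, and `RunIdentity` are hypothetical names, and whether `followupRun` is actually persisted is not established in this thread.

```typescript
import { createHash } from "node:crypto";

interface SessionCtx {
  SenderId?: string;
  SenderE164?: string;
}

interface RunIdentity {
  senderId?: string;
  senderE164?: string;
}

function hashSenderId(id: string): string {
  return "user_" + createHash("sha256").update(id).digest("hex").slice(0, 12);
}

// Hypothetical helper: decides what a persisted copy of the run may contain.
function toPersistableIdentity(ctx: SessionCtx, redactPII: boolean): RunIdentity {
  const senderId = ctx.SenderId?.trim() || undefined;
  const senderE164 = ctx.SenderE164?.trim() || undefined;
  if (!redactPII) {
    return { senderId, senderE164 };
  }
  // Redaction on: drop the phone number and store only the pseudonym.
  return { senderId: senderId ? hashSenderId(senderId) : undefined, senderE164: undefined };
}

// JSON.stringify omits undefined fields, so the phone number never reaches disk.
console.log(JSON.stringify(toPersistableIdentity({ SenderId: "42", SenderE164: "+15551234567" }, true)));
```

Routing and auth code that needs the raw values would keep using the in-memory context; only the copy handed to storage would go through a helper like this.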

Last reviewed commit: 0cec448

greptile-apps · 2026-03-16T06:36:33Z

+function hashId(value: string): string {
+  return createHash("sha256").update(value).digest("hex").slice(0, 12);


Unsalted hash makes phone-number pseudonyms reversible

return createHash("sha256").update(value).digest("hex").slice(0, 12);

For phone-number identifiers, the input space is small and well-structured (at most 15 decimal digits, ITU-T E.164). SenderE164 itself is stripped rather than hashed, so the concern applies wherever a hashed SenderId or chat_id is itself a phone number. An attacker who obtains the hashed prompt logs can reverse pseudonymised phone numbers by pre-computing hashes for candidate numbers; enumerating the valid numbers of a country or region is practical on commodity hardware. The truncation to 12 hex chars (48 bits) makes collisions more likely but does not protect against this reversal.

The PR description explicitly calls this out as an intentional design trade-off ("Deterministic hash (no salt): same user always maps to same pseudonym, consistent across sessions"), which is reasonable for cross-session correlation. Just worth making sure operators are aware that redactPII: true provides pseudonymisation (unlinkability to an outside observer without the original data) rather than true anonymisation (irreversibility).
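
To make the reversal concrete, here is a sketch that enumerates a deliberately tiny block of candidate numbers. `recoverNumber` and the example number are illustrative only. `keyedHashId` shows a keyed HMAC as one alternative that stays deterministic; it is not something this PR proposes, and the secret handling is left as an assumption.

```typescript
import { createHash, createHmac } from "node:crypto";

// Same unsalted, truncated hash as in the diff.
function hashId(value: string): string {
  return createHash("sha256").update(value).digest("hex").slice(0, 12);
}

// Enumerate every number with the given prefix and compare truncated hashes.
function recoverNumber(target: string, prefix: string, digits: number): string | undefined {
  const total = 10 ** digits;
  for (let n = 0; n < total; n++) {
    const candidate = prefix + String(n).padStart(digits, "0");
    if (hashId(candidate) === target) return candidate;
  }
  return undefined;
}

// Keyed alternative (not part of the PR): deterministic for a fixed secret,
// but an attacker without the secret cannot precompute the table.
function keyedHashId(value: string, secret: string): string {
  return createHmac("sha256", secret).update(value).digest("hex").slice(0, 12);
}

const leaked = hashId("+15550001234");
// Searches the 10,000 numbers starting with "+1555000".
console.log(recoverNumber(leaked, "+1555000", 4));
console.log(keyedHashId("+15550001234", "example-secret"));
```

The loop here covers only four unknown digits; the same approach scales to larger ranges at the cost of more hashing, which is why the distinction between pseudonymisation and anonymisation matters for operators.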


…enabled

Add privacy.redactPII config option (boolean, default false). When enabled:
- SenderE164 (phone number) is stripped from user context
- SenderId is hashed (user_ + 12-char sha256) in both conversation info and sender block
- chat_id is hashed (preserving channel prefix) in system prompt

Public fields (SenderName, SenderUsername, SenderTag) are preserved.
No functional side effects: auth/routing happens at gateway layer before LLM call.

Closes openclaw#47958

kraocode · 2026-03-16T06:47:38Z

Closing: re-submitting with cleaner branch name and commit history.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 01d8d4033e


chatgpt-codex-connector · 2026-03-16T06:51:55Z

+  const colonIdx = value.indexOf(":");
+  if (colonIdx > 0) {
+    const prefix = value.slice(0, colonIdx);
+    const id = value.slice(colonIdx + 1);
+    return `${prefix}:${hashId(id)}`;


Hash full topic IDs before preserving chat_id prefixes

hashChatId assumes the segment before the first : is always a safe channel prefix, but valid routing IDs in this repo can start with the real chat identifier and append :topic:<id> (for example Telegram topic IDs in extensions/telegram/src/targets.ts). With privacy.redactPII=true, an input like -1001234567890:topic:42 becomes -1001234567890:<hash>, which still exposes the raw chat ID to the LLM. Redaction should only preserve known non-PII prefixes (like telegram:/channel:) and otherwise hash the entire identifier.
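
A sketch of the allow-list approach this comment describes. The `SAFE_PREFIXES` set contains only the two prefixes named above as examples; the real list of non-PII prefixes in the repo is an assumption here.

```typescript
import { createHash } from "node:crypto";

function hashId(value: string): string {
  return createHash("sha256").update(value).digest("hex").slice(0, 12);
}

// Assumed allow-list; the review names `telegram:` and `channel:` as examples.
const SAFE_PREFIXES = new Set(["telegram", "channel"]);

function hashChatId(value: string): string {
  const colonIdx = value.indexOf(":");
  if (colonIdx > 0) {
    const prefix = value.slice(0, colonIdx);
    if (SAFE_PREFIXES.has(prefix)) {
      // Known channel prefix: keep it and hash everything after the colon.
      return `${prefix}:${hashId(value.slice(colonIdx + 1))}`;
    }
  }
  // Unknown or missing prefix: hash the whole identifier.
  return hashId(value);
}

console.log(hashChatId("telegram:12345"));
console.log(hashChatId("-1001234567890:topic:42"));
```

With this shape, an ID such as `-1001234567890:topic:42` is hashed as a whole, so the raw chat ID no longer survives as a "prefix".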


chatgpt-codex-connector · 2026-03-16T06:51:55Z

+  const senderE164 = options?.redactPII ? undefined : safeTrim(ctx.SenderE164);
+  const rawSenderId = safeTrim(ctx.SenderId);
+  const senderId = options?.redactPII && rawSenderId ? hashSenderId(rawSenderId) : rawSenderId;


Apply redactPII to inbound history sender labels

The new redaction path only sanitizes the current sender fields (SenderE164/SenderId), but this function still emits InboundHistory[].sender unchanged later, so phone numbers can still reach the model when redaction is enabled. This is reachable on current channel code paths (for example WhatsApp builds history senders as name (e164) in extensions/whatsapp/src/auto-reply/monitor/group-gating.ts:53-56), so privacy.redactPII currently does not satisfy its privacy contract for grouped history context.
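
One way the history labels could be covered is sketched below. The `name (e164)` label format is taken from the description above; `HistoryEntry`, `redactHistorySender`, `redactHistory`, and the `"unknown"` placeholder are hypothetical, and other channels may format sender labels differently.

```typescript
interface HistoryEntry {
  sender: string;
  body: string;
}

// Matches a trailing "(+15551234567)"-style suffix on a sender label.
const E164_SUFFIX = /\s*\(\+?\d{7,15}\)\s*$/;
// Matches a label that is nothing but a phone number.
const BARE_E164 = /^\+?\d{7,15}$/;

function redactHistorySender(sender: string): string {
  const stripped = sender.replace(E164_SUFFIX, "").trim();
  // If nothing but a phone number was present, fall back to a placeholder.
  if (stripped === "" || BARE_E164.test(stripped)) return "unknown";
  return stripped;
}

function redactHistory(history: HistoryEntry[], redactPII: boolean): HistoryEntry[] {
  if (!redactPII) return history;
  return history.map((entry) => ({ ...entry, sender: redactHistorySender(entry.sender) }));
}

console.log(redactHistorySender("Alice (+15551234567)")); // "Alice"
```

A pattern-based pass like this only catches phone numbers in the label; numbers that appear in message bodies are outside its scope.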


openclaw-barnacle Bot added the size: S label Mar 16, 2026


kraocode force-pushed the feature/strip-sender-e164 branch from 0cec448 to 01d8d40 Compare March 16, 2026 06:42

kraocode closed this Mar 16, 2026

kraocode deleted the feature/strip-sender-e164 branch March 16, 2026 06:51


teknium1 mentioned this pull request Mar 16, 2026

feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled NousResearch/hermes-agent#1542

Merged

kraocode wants to merge 1 commit into openclaw:main from kraocode:feature/strip-sender-e164
