Skip to content

fix(prompt): identity guard against foreign agent-platform workspaces#555

Merged
esengine merged 1 commit into
mainfrom
fix/identity-guard-issue-550
May 10, 2026
Merged

fix(prompt): identity guard against foreign agent-platform workspaces#555
esengine merged 1 commit into
mainfrom
fix/identity-guard-issue-550

Conversation

@esengine

Copy link
Copy Markdown
Owner

A user reported the model claiming a false architectural relationship
to whatever AI platform's data dir happened to sit at the workspace
root — e.g. "the underlying runtime is Hermes Agent" when launched
against a directory containing `SOUL.md`, `skills/`, `memories/`,
or a foreign `REASONIX.md`. The assistant's identity should come from
the prompt, not from `ls`.

Root cause

Two contamination vectors:

  1. No identity guard in CODE_SYSTEM_PROMPT. The base prompt opens
    with "You are Reasonix Code, a coding assistant" but never says
    the workspace is the user's project, not a spec for what
    Reasonix is
    . When the model gets asked "who are you?" and runs
    `directory_tree`, it pattern-matches whatever it finds ("oh,
    there's a SOUL.md and a skills/ — I must be a sub-profile of this
    platform") and confidently invents a relationship.
  2. No workspace sanity check at launch. Reasonix happily roots
    itself at any directory the user passes, including data dirs that
    were clearly never meant to be code projects.

Fix

Two layers — (1) is the load-bearing one, (2) is a UX nudge:

  • Top-of-prompt identity section (`src/code/prompt.ts`): names
    the failure mode explicitly. Lists the marker files that describe
    somebody else's runtime, not yours
    : `config.yaml` with agent /
    persona keys, `SOUL.md`, `AGENT.md`, `PERSONA.md`, foreign
    `skills/` / `memories/` trees, a foreign `REASONIX.md`. Tells
    the model to answer identity questions from the prompt — don't run
    `ls` to figure out what it is.
  • Foreign-platform detector (`src/memory/project.ts`):
    `detectForeignAgentPlatform(rootDir)` returns the marker(s) that
    flag a foreign workspace. `code.tsx`'s startup banner adds a one-
    line warning suggesting `--dir ` when any fire.
    Conservative signal: lone `skills/` doesn't trigger (it's common
    in coding repos); a `skills/` + `memories/` pair does.

Test plan

  • `npx tsc --noEmit` clean
  • `npx vitest run` — 2413 passed (added 7: 1 in
    `code-prompt.test.ts`, 6 in `project-memory.test.ts`)
  • Smoke: launch `reasonix code --dir ` and
    confirm the warning fires, then ask "who are you?" — expect a
    Reasonix Code answer, no Hermes / SOUL references

Closes #550

…#550)

When the sandbox root contained another product's data dir (Hermes
Agent's SOUL.md, skills/, memories/, etc.), the model would browse
those files and infer it was a sub-profile or runtime layered on top of
the host product. The system prompt defines the assistant as a
standalone coding assistant; the workspace contents shouldn't override
that.

Two layers:

- Top-of-prompt identity section names the failure mode explicitly:
  workspace files describe the user's project, never what Reasonix is.
  Calls out SOUL.md / AGENT.md / PERSONA.md / foreign skills+memories
  trees / a foreign REASONIX.md as not-the-spec, and tells the model to
  answer "who are you?" from the prompt instead of running ls.
- detectForeignAgentPlatform(rootDir) flags the workspace at launch if
  one of those markers sits at the root, and the code-mode banner adds
  a one-line warning suggesting --dir <real-project>. Conservative
  signal: lone skills/ doesn't trigger (common in coding repos);
  skills/ + memories/ pair does.

Closes #550
@esengine esengine merged commit 6761a31 into main May 10, 2026
3 checks passed
@esengine esengine deleted the fix/identity-guard-issue-550 branch May 10, 2026 00:36
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
…esengine#550) (esengine#555)

When the sandbox root contained another product's data dir (Hermes
Agent's SOUL.md, skills/, memories/, etc.), the model would browse
those files and infer it was a sub-profile or runtime layered on top of
the host product. The system prompt defines the assistant as a
standalone coding assistant; the workspace contents shouldn't override
that.

Two layers:

- Top-of-prompt identity section names the failure mode explicitly:
  workspace files describe the user's project, never what Reasonix is.
  Calls out SOUL.md / AGENT.md / PERSONA.md / foreign skills+memories
  trees / a foreign REASONIX.md as not-the-spec, and tells the model to
  answer "who are you?" from the prompt instead of running ls.
- detectForeignAgentPlatform(rootDir) flags the workspace at launch if
  one of those markers sits at the root, and the code-mode banner adds
  a one-line warning suggesting --dir <real-project>. Conservative
  signal: lone skills/ doesn't trigger (common in coding repos);
  skills/ + memories/ pair does.

Closes esengine#550
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Identity contamination when sandbox root contains another product's config files

1 participant