Identity contamination when sandbox root contains another product's config files #550

@esengine

Summary

When Reasonix Code is launched against a sandbox/workspace root that
happens to contain another product's configuration (e.g. a Hermes Agent
directory with config.yaml, SOUL.md, REASONIX.md, skills/,
memories/), the model can browse those files and infer a false
identity — claiming it's a sub-profile, persona, or runtime layered on
top of that host product. The system prompt defines the assistant as a
standalone coding assistant; the workspace contents should not override
that.

Reported against 0.36.1 on Linux/WSL2.

Reproduce

  1. Point the workspace root at a directory containing another agent
    platform's config files (config.yaml with agent settings,
    SOUL.md, REASONIX.md with cross-product references, etc.).
  2. Ask the model "Who are you?" or "What is your underlying runtime?"
  3. The model reads the workspace, finds the foreign config files, and
    may assert a false architectural relationship — e.g. "the underlying
    runtime is Hermes Agent" — with no basis in its system prompt.
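
To make step 1 repeatable, a throwaway fixture directory can be generated
with a short script. This is a minimal sketch in Python, not part of
Reasonix: the function name make_contaminated_root and the file contents
are hypothetical placeholders, since the issue does not describe the real
Hermes Agent config schema. Only the file and directory names come from
the report above.

```python
import tempfile
from pathlib import Path

# Placeholder contents (assumption): the real Hermes Agent schema is not
# described in this issue, so these only mimic the shape of such files.
FOREIGN_FILES = {
    "config.yaml": "agent:\n  name: hermes\n",
    "SOUL.md": "# Persona\n",
    "REASONIX.md": "Reasonix runs as a profile of the host agent.\n",
}
FOREIGN_DIRS = ("skills", "memories")


def make_contaminated_root(base: Path) -> Path:
    """Create a workspace root that mimics another agent platform's config directory."""
    root = base / "contaminated-root"
    root.mkdir(parents=True, exist_ok=True)
    for name, body in FOREIGN_FILES.items():
        (root / name).write_text(body, encoding="utf-8")
    for name in FOREIGN_DIRS:
        (root / name).mkdir(exist_ok=True)
    return root


if __name__ == "__main__":
    root = make_contaminated_root(Path(tempfile.mkdtemp()))
    print(f"Point the workspace root at: {root}")
```

Pointing the workspace root at the printed path and asking the questions
from step 2 should exercise the same code path as the original report,
assuming the placeholder contents are close enough to trigger it.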

Why this matters

  • Identity drift. The assistant's self-description should come from
    the prompt, not from ls. Reading the workspace to determine what
    it is, rather than what the user's project is, is the bug.
  • Confidently wrong claims. Users get false statements about
    architecture and provenance, which erodes trust.
  • Generalizes beyond Hermes. Any sandbox root containing another
    AI tool's configuration files can trigger the same failure mode.

Fix directions

Open to one or a combination of:

  1. Workspace selection — make the default sandbox root the user's
    actual project directory, not a host platform's config directory.
    These are separate concerns and shouldn't share a root.
  2. Prompt guard — explicit instruction to ignore platform/host
    configuration files at the workspace root when reasoning about
    identity, and to never describe itself based on filesystem layout.
  3. Detect and warn — if the workspace root looks like a platform
    config directory (e.g. contains a config.yaml with agent-platform
    keys or a SOUL.md), surface a warning before serving requests so
    the user can re-point at a real project.

(2) is the cheapest immediate mitigation; (1) is the right long-term
shape.
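
For direction (3), the check could be a small heuristic run once at
startup. The sketch below is in Python for brevity and is not Reasonix's
actual code: the function names, the list of config.yaml keys, and the
two-signal threshold are all assumptions chosen for illustration. The
marker names (SOUL.md, skills/, memories/, config.yaml) are the ones
mentioned in this issue.

```python
from pathlib import Path
from typing import List, Optional

# Marker names taken from the examples in this issue.
MARKER_FILES = ("SOUL.md",)
MARKER_DIRS = ("skills", "memories")
# Hypothetical top-level keys that would suggest an agent-platform config.yaml.
PLATFORM_KEYS = ("agent:", "persona:", "profiles:")


def platform_config_signals(root: Path) -> List[str]:
    """Return the reasons why `root` looks like a platform config directory."""
    signals = []
    for name in MARKER_FILES:
        if (root / name).is_file():
            signals.append(f"found {name}")
    for name in MARKER_DIRS:
        if (root / name).is_dir():
            signals.append(f"found {name}/ directory")
    config = root / "config.yaml"
    if config.is_file():
        # Plain line scan, not a YAML parse: only unindented keys are matched.
        lines = config.read_text(encoding="utf-8", errors="replace").splitlines()
        for key in PLATFORM_KEYS:
            if any(line.startswith(key) for line in lines):
                signals.append(f"config.yaml has top-level key {key.rstrip(':')}")
    return signals


def workspace_warning(root: Path, threshold: int = 2) -> Optional[str]:
    """Return a warning message if enough signals are present, else None."""
    signals = platform_config_signals(root)
    if len(signals) < threshold:
        return None
    return (
        f"Workspace root {root} looks like an agent platform config directory "
        f"({'; '.join(signals)}). Consider pointing at your project directory."
    )
```

Requiring at least two signals is a guess at a reasonable default: a lone
skills/ directory or a generic config.yaml is common in ordinary projects
and should not trigger the warning on its own.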

Labels: bug