feat(prompt): add audit-mode rails for review/critique tasks #611

Merged: esengine merged 3 commits into main from feat/audit-mode-rails on May 10, 2026

@esengine esengine commented May 10, 2026

Owner

Summary

Closes #610.

When users ask Reasonix Code to audit its own architecture, the
existing "Cite or shut up" rule catches absence-style claims but not
the more common audit-mode failure: confident proposals built on
factually wrong premises about runtime behavior, invented quantities,
or recommendations that contradict pinned memory.

Adds a six-bullet "# When auditing or reviewing this codebase"
section right after "Cite or shut up". Placed there because both are
about evidence integrity for evaluative claims, and audit-mode is
the most common context where a model produces evaluative output.

The six rails:

  1. Auto-preview is for locating, not auditing — long files come
    back as head + tail with the middle elided; re-read with
    range:"A-B" against the actual section before asserting what's
    there (covers runtime behavior, current architectural state, and
    doc freshness).
  2. Flag → consumer trace — reading a type field is not
    understanding behavior; grep the flag's consumer first. For
    inventory claims ("which tools have flag F?"), grep; don't
    enumerate from memory.
  3. No fabricated percentages — ground numbers in a cited
    measurement or hedge; never present unmeasured numbers as
    measured.
  4. Schema cost is real — new-tool proposals must cover (a) which
    existing-tool composition fails, (b) description-token cost, (c)
    why a prompt change can't reach the same end.
  5. MEMORY.md is part of the design space — recommendations
    contradicting loaded user feedback are wrong by construction.
  6. User-facing ≠ model-facing ≠ library-facing — four action
    surfaces: slash commands (user), tools (model), UI (user),
    library exports (src/index.ts). Promoting a user-level feature
    to a model tool breaks user-control invariants; treating a
    library export as "dead code" because the CLI doesn't register
    it misreads the design.
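
Rail 1 is easier to see with a toy model of the preview. The sketch below is illustrative only: headTailPreview, parseRange, and readRange are hypothetical names, and the head/tail sizes are assumptions rather than Reasonix's actual preview logic. It shows how a line in the elided middle stays invisible in the preview and only surfaces after a range re-read.

```typescript
// Hypothetical helpers for illustration; not part of the Reasonix codebase.

/** Build a head + tail preview, replacing the middle with an elision marker. */
export function headTailPreview(lines: string[], head = 3, tail = 3): string[] {
  if (lines.length <= head + tail) return [...lines];
  const elided = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `... ${elided} lines elided ...`,
    ...lines.slice(lines.length - tail),
  ];
}

/** Parse a 1-indexed, inclusive "A-B" range string. */
export function parseRange(range: string): { start: number; end: number } {
  const match = /^(\d+)-(\d+)$/.exec(range.trim());
  if (!match) throw new Error(`invalid range: ${range}`);
  const start = Number(match[1]);
  const end = Number(match[2]);
  if (start < 1 || end < start) throw new Error(`invalid range: ${range}`);
  return { start, end };
}

/** Return the lines covered by the range, clamped to the file length. */
export function readRange(lines: string[], range: string): string[] {
  const { start, end } = parseRange(range);
  return lines.slice(start - 1, Math.min(end, lines.length));
}
```

In this model, a "Status: done" line sitting in the middle of a long file is absent from headTailPreview(...) and appears only once readRange(...) is called on the section that contains it, which is the re-read the rail asks for.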

Tightening pass (5093c79)

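The first real audit session against the original six rails surfaced
two failure modes the original wording didn't catch:

  1. Inventory-claim hallucination (rail #2): asked which tools have
    stormExempt: true, the model enumerated 6 file-system tools as
    having it; only 2 actually do. The rail now carries an explicit
    inventory clause: grep the flag, don't enumerate from memory.
  2. Library export mistaken for dead code (rail #6): the model
    labeled registerSubagentTool "dead code from CLI perspective"
    after a clean grep in src/cli/, although it is a deliberate
    library export consumed by embedders via src/index.ts. The rail
    now lists library exports as a fourth surface.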

Broadening pass (cd60b80)

A second session against the tightened rails showed the same
head-only-then-conclude failure on a plan-doc file: the model read
the head of docs/plans/architecture-refactoring-roadmap.md, saw
"8 services still use singletons", and asserted the plan was stale
without checking the rest of the doc. Rail #1's "runtime behavior"
scope was inherited from the loop.ts dispatcher case in #610; it is
now broadened to cover doc freshness and architectural-state claims
too.
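
One way to make the broadened rail mechanical is to track which line ranges of a file were actually read and to hold back whole-file conclusions (such as "this plan is stale") until those ranges cover the file. This is a minimal sketch under that assumption; Span, mergeSpans, and fullyRead are hypothetical names and nothing like them is described in this PR.

```typescript
// Hypothetical sketch: 1-indexed, inclusive line spans that were actually read.
export type Span = { start: number; end: number };

/** Merge overlapping or adjacent spans into a sorted, minimal list. */
export function mergeSpans(spans: Span[]): Span[] {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  const merged: Span[] = [];
  for (const span of sorted) {
    const last = merged[merged.length - 1];
    if (last && span.start <= last.end + 1) {
      // Overlaps or touches the previous span: extend it.
      last.end = Math.max(last.end, span.end);
    } else {
      // Copy so the caller's objects are never mutated.
      merged.push({ ...span });
    }
  }
  return merged;
}

/** True only if every line 1..totalLines falls inside some span that was read. */
export function fullyRead(totalLines: number, spans: Span[]): boolean {
  const merged = mergeSpans(spans);
  const first = merged[0];
  return (
    merged.length === 1 &&
    first !== undefined &&
    first.start <= 1 &&
    first.end >= totalLines
  );
}
```

With a head + tail preview of a 200-line doc, the read spans would look like lines 1-40 and 161-200, so fullyRead reports false until the middle has been re-read with an explicit range.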

Test plan

  • tests/code-prompt.test.ts extended with audit-rail asserts
    (now 8 assertions across the 6 rails), anchored on stable
    phrases plus the concrete tool / param tokens (range:"A-B",
    parallelSafe?: boolean, 40-60% tokens, tighten prompt / existing
    tool, grep the flag, library exports (src/index.ts), current
    architectural state, whether a plan doc is still accurate).
  • npm run typecheck / npm run lint clean
  • npx vitest run tests/code-prompt.test.ts — 25 pass
  • npx vitest run tests/comment-policy.test.ts — 9 pass
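
The anchoring idea in the test plan can be sketched without the real test file. The snippet below is an assumption-laden illustration: missingAnchors is a hypothetical helper, samplePrompt is stand-in text rather than the actual Reasonix Code prompt, and only three of the anchor phrases listed above are used.

```typescript
// Hypothetical helper: report which anchor phrases are absent from a prompt.
export function missingAnchors(prompt: string, anchors: readonly string[]): string[] {
  return anchors.filter((anchor) => !prompt.includes(anchor));
}

// Stand-in prompt text for illustration; NOT the real system prompt.
export const samplePrompt: string = [
  "# When auditing or reviewing this codebase",
  'Re-read with range:"A-B" before asserting runtime behavior,',
  "current architectural state, or doc freshness.",
  "For inventory claims, grep the flag instead of enumerating from memory.",
].join("\n");

// A subset of the stable phrases named in the test plan.
export const auditAnchors = [
  'range:"A-B"',
  "grep the flag",
  "current architectural state",
] as const;
```

A test in the style of tests/code-prompt.test.ts could then assert that missingAnchors(prompt, auditAnchors) is empty, so deleting or rewording a rail fails loudly instead of regressing silently.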

esengine added 3 commits May 10, 2026 06:40

feat(prompt): add audit-mode rails for review/critique tasks

When the user asks Reasonix Code to audit its own architecture, the
existing "cite or shut up" rule covers absence claims but doesn't
catch the more common audit-mode failure: confident, well-structured
proposals built on factually wrong premises about runtime behavior,
fabricated quantities, or recommendations that contradict pinned
memory.

Adds a six-bullet section after "Cite or shut up": auto-preview is
for locating not auditing, flag→consumer trace before claiming
runtime behavior, no fabricated percentages, schema-cost accounting
for new-tool proposals, MEMORY.md as design constraint, and
user-facing ≠ model-facing as a category-error guardrail.

Closes #610.

prompt: tighten rails #2 and #6 from real audit-session failure modes

Audit session run against the original 6-rail section (#610) showed
two failures the wording didn't catch:

1. **Inventory-claim hallucination.** Asked which tools have
   `stormExempt: true`, the model enumerated 6 file-system tools as
   having it — only 2 actually do. The rail said "trace flag to
   consumer", which the model interpreted as "for one named tool",
   not "for an inventory claim covering many tools." Add an explicit
   inventory clause: grep the flag, don't enumerate from memory.
2. **Library API → dead-code mischaracterization.** The model
   labeled `registerSubagentTool` "dead code from CLI perspective"
   on the basis of a clean grep in `src/cli/`. It's a deliberate
   library export consumed by embedders via `src/index.ts`. The
   rail enumerated three surfaces (slash / tools / UI); add a fourth
   (library) so library exports aren't mistaken for unused code.
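
Both failures come down to answering from memory where a text search would have settled it. The sketch below is a rough illustration, not the project's tooling: filesWithFlag and isExportedFrom are hypothetical names, sources is an in-memory stand-in for files on disk, and the regex checks are deliberately crude (a real audit would grep the repository).

```typescript
/** Escape a string so it can be embedded literally in a RegExp. */
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Inventory claim: list the files whose source sets `<flag>: true`,
 * based on a text search rather than on recollection.
 */
export function filesWithFlag(sources: Record<string, string>, flag: string): string[] {
  const pattern = new RegExp(`\\b${escapeRegExp(flag)}\\s*:\\s*true\\b`);
  return Object.keys(sources)
    .filter((file) => pattern.test(sources[file] ?? ""))
    .sort();
}

/**
 * Dead-code claim: before calling a symbol unused, check whether the
 * library entry point mentions it in an export statement.
 */
export function isExportedFrom(indexSource: string, symbol: string): boolean {
  const pattern = new RegExp(`\\bexport\\b[^;]*\\b${escapeRegExp(symbol)}\\b`);
  return pattern.test(indexSource);
}
```

Run over the tool sources, filesWithFlag(sources, "stormExempt") would give the actual inventory; run over the text of src/index.ts, isExportedFrom(indexSource, "registerSubagentTool") would flag the symbol as a library export even when a grep of src/cli/ comes back clean.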

Two-test bump on tests/code-prompt.test.ts so the tightened wording
can't silently regress.

prompt: broaden rail #1 to cover doc / state claims, not just runtime

Second audit session run against the tightened rails (#611) showed
the same head-only-then-conclude failure mode again, this time on a
plan-doc file rather than runtime code: the model read the head of
docs/plans/architecture-refactoring-roadmap.md, saw "8 services
still use singletons", and asserted the plan was now stale —
without reading the rest of the doc to check for a "Status: done"
section that might have been there.

The original rail was scoped to "runtime behavior" because that was
the loop.ts dispatcher case from #610. The same blind spot applies
to any file: don't conclude what's in the elided middle off head +
tail. Broaden the wording to cover runtime behavior, current
architectural state, and doc freshness explicitly.

One test bump on tests/code-prompt.test.ts so the broader scope
can't silently regress to runtime-only.
@esengine esengine merged commit 56fcb2a into main May 10, 2026
3 checks passed
@esengine esengine deleted the feat/audit-mode-rails branch May 10, 2026 14:39
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
…e#611)


Development

Successfully merging this pull request may close these issues.

prompt: harden audit-mode reasoning against known failure modes
