
fix(context): preserve memory constraints across context folds (#1462) #1515

Merged
esengine merged 2 commits into esengine:main from paradoxSCH:fix/1462-memory-constraint-persistence
May 22, 2026

Conversation

@paradoxSCH paradoxSCH commented May 22, 2026

Contributor

Problem

When a long session triggers a context fold (history compression), the fold summary loses high-priority user constraints stored in the system prompt (e.g., "do NOT use X", "always Y"). After the fold, the model may violate these constraints because they are no longer in context.

Solution

Extract high-priority pinned constraints from the system prompt (User memory, Project memory, High Priority constraints) and append them verbatim to the fold summary message. This ensures they survive the fold.

Changes

  • `src/context-manager.ts`:
    • Add `extractPinnedConstraints()` to pull pinned blocks from the system prompt.
    • Append the extracted constraints to the summary message built by `fold()`.
    • Update the summarizer system prompt to explicitly remind it to preserve "do not" / "never" / "avoid" instructions.
  • `src/loop.ts`:
    • Wire `getSystemPrompt` into `ContextManagerDeps` so the fold can read the current system prompt.
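
The extract-and-append step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/context-manager.ts`: the function names, the exact heading strings (in particular `# HIGH PRIORITY`), and the rule that a block ends at the next top-level `# ` heading are all assumptions made for the example.

```typescript
// Hypothetical heading prefixes; the real markers come from the project's prompt builder.
const PINNED_PREFIXES = ["# HIGH PRIORITY", "# User memory", "# Project memory"];

/** Collect every top-level "# ..." section whose heading starts with a pinned prefix. */
function extractPinnedSketch(systemPrompt: string): string[] {
  const blocks: string[] = [];
  let current: string[] | null = null;

  for (const line of systemPrompt.split("\n")) {
    if (line.startsWith("# ")) {
      // A new top-level heading closes whatever pinned block was open.
      if (current) blocks.push(current.join("\n").trimEnd());
      current = PINNED_PREFIXES.some((p) => line.startsWith(p)) ? [line] : null;
    } else if (current) {
      current.push(line);
    }
  }
  if (current) blocks.push(current.join("\n").trimEnd());
  return blocks;
}

/** Append the pinned blocks verbatim to the summary text produced by a fold. */
function appendPinnedSketch(summary: string, systemPrompt: string): string {
  const pinned = extractPinnedSketch(systemPrompt);
  if (pinned.length === 0) return summary;
  return `${summary}\n\nPinned constraints (verbatim):\n\n${pinned.join("\n\n")}`;
}

// Made-up system prompt with two pinned sections and two unpinned ones.
const samplePrompt = [
  "# Role",
  "You are a coding agent.",
  "# HIGH PRIORITY",
  "- do NOT use lodash",
  "# User memory — global",
  "- always answer in English",
  "# Tools",
  "(tool descriptions)",
].join("\n");

console.log(appendPinnedSketch("Summary of the folded turns.", samplePrompt));
```

Copying the blocks verbatim, rather than asking the summarizer to restate them, means a negative instruction cannot be paraphrased away during compression.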

Alternative considered

A runtime ToolInterceptor (option C) that checks every outgoing tool call against pinned constraints was evaluated. It is more robust but adds per-call latency and complexity; we can revisit it if prompt-level preservation proves insufficient.

Verification

  • Existing tests pass (3333 passed).
  • The fold now carries constraint text in the synthetic assistant message.
  • Added behavior-stability regression test:
    npx tsx benchmarks/behavior-stability/runner.ts --local
    This deterministic test injects 40 turns of synthetic history, triggers a fold, and asserts that all 4 pinned constraints (2 HIGH PRIORITY + 1 User memory + 1 Project memory) are preserved in the summary.

Two layers of defense for esengine#1462:

1. Summary prompt strengthening (summarizeForFold): The summarizer's
   system prompt now explicitly asks to preserve "the user's ORIGINAL
   OBJECTIVE" and "all 'do not' / 'never' / 'avoid' instructions".
   Previously, negative constraints were the most likely to be dropped
   during summarization.

2. Verbatim constraint pinning (fold): After generating the summary,
   high-priority memory, user memory, and project memory blocks are
   extracted from the system prompt and appended verbatim to the fold
   summary message. This ensures constraints survive the fold and
   remain visible in the conversation tail.

Also adds getSystemPrompt to ContextManagerDeps so the fold logic can
access the current system prompt content.

Refs: esengine#1462
paradoxSCH added a commit to paradoxSCH/DeepSeek-Reasonix that referenced this pull request May 22, 2026


Add benchmarks/behavior-stability/ with a single local scenario:

- constraint-persistence: verifies that ContextManager.fold()
  preserves pinned constraints (HIGH PRIORITY, User memory, Project
  memory) from the system prompt into the fold summary.

This is a deterministic, fast test that exercises the core fix in
PR esengine#1515 without requiring API calls.

Usage:
  npx tsx benchmarks/behavior-stability/runner.ts --local

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@paradoxSCH paradoxSCH force-pushed the fix/1462-memory-constraint-persistence branch from 69f9da4 to d2aba94 Compare May 22, 2026 06:10


@paradoxSCH paradoxSCH force-pushed the fix/1462-memory-constraint-persistence branch from d2aba94 to 34fc823 Compare May 22, 2026 06:40
@esengine esengine merged commit 736deb1 into esengine:main May 22, 2026
4 checks passed
esengine added a commit that referenced this pull request May 22, 2026

Small follow-up to #1515.

`extractPinnedConstraints` used `.match()` without the `g` flag, which returns only the first match. The system prompt can carry multiple blocks under the same prefix:

- `# User memory — global` and `# User memory — this project` both exist when a user has both (`src/memory/user.ts` emits them as separate sections).
- Multiple `# Project memory (filename)` sections can show up when a project carries per-subdir memory.

Because only the first match was returned, every later block under the same prefix was silently dropped from the fold-tail constraint reinforcement.

Combined the three prefixes into one regex and switched to `matchAll` so every pinned block makes it into the fold summary.
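
To make the difference concrete, here is a small sketch comparing the two calls. The regex matches heading lines only and is not the pattern used in the repository; the `# HIGH PRIORITY` prefix and the `docs/NOTES.md` filename are assumptions for the example.

```typescript
// Sample prompt with three pinned headings, two of which share the "User memory" prefix.
const pinnedPrompt = [
  "# User memory — global",
  "- never commit secrets",
  "# User memory — this project",
  "- avoid default exports",
  "# Project memory (docs/NOTES.md)",
  "- do not edit generated files",
].join("\n");

// One combined pattern for the three prefixes; "m" makes ^ and $ match per line.
const headingOnce = /^# (HIGH PRIORITY|User memory|Project memory).*$/m;
const headingAll = /^# (HIGH PRIORITY|User memory|Project memory).*$/gm;

// Without "g", match() stops at the first hit (returned together with its capture group).
const firstHit = pinnedPrompt.match(headingOnce);
console.log(firstHit?.[0]);

// matchAll() requires the "g" flag and yields every match in order.
const allHits = Array.from(pinnedPrompt.matchAll(headingAll), (m) => m[0]);
console.log(allHits);
```

With the single-match form only the global user-memory heading is found; `matchAll` returns all three headings.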
esengine added a commit that referenced this pull request May 22, 2026
…se (#1565)

* chore(release): 0.49.0 — static-history TUI, queued steers, Bing default, lifecycle plans

Headline themes:
- TUI: Static-history renderer is the only path; virtual-viewport layers removed (#1529 stages 1-4)
- Chat: queued mid-turn steer handling so input mid-render doesn't drop or fight the live frame (#1501)
- Web search: default switches to Bing; dashboard engine switcher; Mojeek dropped (#1558)
- Plans: lifecycle evidence summaries surface why a plan is ready to accept (#1500)
- Desktop: native OS notifications for approvals + completion (#1519)
- i18n: CLI command output (/mcp /sessions /prune /theme) + approval-prompt labels translated (#1524, #1560)
- Security: SSRF block in web_fetch (#1544), edit-snapshot path containment (#1454), shell redirect sandbox (#1457), Task integrity guardrail (#1516)
- Tools: per-turn dispatch-rate limit (#1356); run_command discourages shell-based edits (#1514)
- Client: DeepSeek 429 → concurrency-limit hint (#1526); timeoutMs honored with AbortSignal (#1535); --no-proxy opt-out for direct route (#1507)
- Files: read/edit/restore preserves source encoding (GB18030 / UTF-8 BOM) (#1518)
- Context: pinned constraints survive folds + full tail capture (#1515, #1552)
- Refactor: lifecycle risk policy extracted into its own module (#1557)

See CHANGELOG for the full list.

* fix(context): align fold summary prefix with main agent for cache reuse

The summarizer call was sending a bespoke "You compress conversation
history" system prompt and no tools, guaranteeing a 0% cache hit
against the main agent's just-cached prefix. Reshape the request so
system + tools + head bytes mirror the live agent's last call — the
only novel bytes are the trailing summarize instruction.

Skill-pin handling now collects bodies read-only instead of stubbing
mid-head, so the cache prefix stays unbroken. The summarize
instruction names pinned skills so the model knows not to paraphrase
their bodies (which we append verbatim regardless).

Measured on a real session at 48.7K prompt tokens:
  OLD shape: 0.0% cache hit  → $0.145 per fold
  NEW shape: 99.6% cache hit → $0.015 per fold
  saving: 89.6% per fold
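
As a quick check of the arithmetic, the saving is one minus the ratio of the new cost to the old cost. The sketch below uses only the rounded per-fold figures quoted above, so it can differ from the reported percentage in the last digit; the unrounded costs are not given here.

```typescript
// Per-fold costs as reported above (USD, already rounded to three decimals).
const oldCostPerFold = 0.145;
const newCostPerFold = 0.015;

// Relative saving = 1 - new / old, expressed as a percentage.
const savingPct = (1 - newCostPerFold / oldCostPerFold) * 100;

console.log(`saving is roughly ${savingPct.toFixed(1)}% per fold`);
```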

* tools: add fold-cache shape + live benchmarks

bench-fold-cache-shape.mjs replays real session jsonls, simulates
OLD vs NEW summary-call shapes at the fold point, and reports
byte-level shared-prefix with the main agent's preceding request.
Pure local — no API required.
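
The byte-level shared-prefix idea can be illustrated with a toy comparison. This is a sketch under simplifying assumptions, not the logic of bench-fold-cache-shape.mjs: requests are modeled as JSON strings, and the field names, the `read_file` tool name, and the message contents are made up for the example.

```typescript
/** Length in bytes of the longest common prefix of two strings, compared as UTF-8. */
function sharedPrefixBytes(a: string, b: string): number {
  const ea = new TextEncoder().encode(a);
  const eb = new TextEncoder().encode(b);
  const limit = Math.min(ea.length, eb.length);
  let i = 0;
  while (i < limit && ea[i] === eb[i]) i++;
  return i;
}

const agentSystem = "You are the main agent.";
const headTurns = "user: refactor the parser\nassistant: done";

// The main agent's preceding request (toy serialization).
const mainRequest = JSON.stringify({
  system: agentSystem,
  tools: ["read_file"],
  messages: [headTurns],
});

// OLD shape: bespoke system prompt and no tools, so the bytes diverge almost immediately.
const oldSummaryRequest = JSON.stringify({
  system: "You compress conversation history",
  tools: [],
  messages: [headTurns],
});

// NEW shape: same system, tools, and head; only the trailing instruction is novel.
const newSummaryRequest = JSON.stringify({
  system: agentSystem,
  tools: ["read_file"],
  messages: [headTurns, "Summarize the conversation so far."],
});

const mainBytes = new TextEncoder().encode(mainRequest).length;
const oldShared = sharedPrefixBytes(mainRequest, oldSummaryRequest);
const newShared = sharedPrefixBytes(mainRequest, newSummaryRequest);

console.log(`OLD shape shares ${oldShared} of ${mainBytes} bytes`);
console.log(`NEW shape shares ${newShared} of ${mainBytes} bytes`);
```

A long shared prefix is only a local proxy for a cache hit; whether the provider actually serves the prefix from cache still has to be confirmed against the API.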

bench-fold-cache-live.mjs sends one priming + two summary calls to
DeepSeek and reports prompt_cache_hit_tokens / cost for each shape.
Used to confirm the shape change actually translates to API-side
cache hits.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>