fix paper link#1

Merged
jalehman merged 1 commit into
Martian-Engineering:main from
nick1udwig:patch-1
Feb 27, 2026

Conversation

@nick1udwig

Contributor

https://voltropy.com/LCM isn't a real link: the paper actually lives at https://papers.voltropy.com/LCM

@jalehman merged commit e568e0c into Martian-Engineering:main Feb 27, 2026
@jalehman

Contributor

ty!

billzhuang6569 pushed a commit to billzhuang6569/lossless-claw that referenced this pull request Mar 21, 2026
100yenadmin referenced this pull request in 100yenadmin/lossless-claw May 7, 2026
Opus subagent analysis of v4.1 baseline (333 blocks) vs v4.2 stubs (689
blocks) at the same 258K-token budget recommended four mitigations to
address moderate-risk findings:

1. Recency cue [t-NNm] on turn headers
2. Semantic stub wrapping <lcm-stub> XML tags
3. Empty-assistant collapsing
4. Resolution markers at completion boundaries

Applied first-principles-architectural-decision skill (research,
run-the-system, where-it-lives diagrams, adversarial debate) before
building any of them. Verdict: REJECT ALL FOUR. Each fails on a
specific load-bearing constraint:

- #1 fails on prefix-cache stability (clock-based tag changes the
  rendered string on every assemble, invalidating the cache that v4.2's
  whole value proposition relies on). User timestamps already exist
  inline.

- #2 fails on "novelty has cost, format already works" — the existing
  [LCM Tool Output: file_xxx | …] bracket form is correctly parsed by
  Opus in live tests (drilldown via lcm_describe works on Option F
  format). Replacing a working v4.1-trained format with a novel XML
  form is unjustified churn.

- #3 fails on Anthropic/OpenAI wire contract. The "empty assistants"
  contain tool_use blocks (required to live in assistant turns; paired
  with tool_results by toolCallId). Dropping them would break
  pairing — providers reject orphan tool_results.

- #4 fails on detection signal. No reliable way to mark "work
  completed" — user phrases like "go ahead" / "yes" / "keep digging"
  oscillate. False positives are strictly worse than no marker
  (license premature stubbing).
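
The prefix-cache objection to mitigation #1 can be illustrated with a minimal sketch. Both render functions and the header format below are hypothetical stand-ins, not LCM's actual header code. The only point is that a clock-relative tag yields a different string for the same turn once the minute counter advances, while an absolute timestamp does not.

```typescript
// Hypothetical header renderers (illustrative only, not LCM's real format).

// Relative recency cue: depends on the wall clock at assemble time.
function renderWithRecencyCue(turnEpochMs: number, nowMs: number): string {
  const minutesAgo = Math.floor((nowMs - turnEpochMs) / 60000);
  return `[user t-${minutesAgo}m]`;
}

// Absolute timestamp: depends only on the turn itself.
function renderWithAbsoluteTime(turnEpochMs: number): string {
  return `[user ${new Date(turnEpochMs).toISOString()}]`;
}

const turnAt = Date.UTC(2026, 4, 7, 12, 0, 0);
const firstAssemble = turnAt + 5 * 60000; // assembled 5 minutes after the turn
const secondAssemble = turnAt + 6 * 60000; // assembled again 1 minute later

// The same turn renders differently across the two assembles,
// so any cached prefix containing it no longer matches.
const cueChanged =
  renderWithRecencyCue(turnAt, firstAssemble) !==
  renderWithRecencyCue(turnAt, secondAssemble);

// The absolute form is byte-identical no matter when it is assembled.
const absoluteChanged =
  renderWithAbsoluteTime(turnAt) !== renderWithAbsoluteTime(turnAt);
```

Because prefix caches match on exact leading content, a single changed header early in the context would be expected to invalidate everything after it.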

Adversarial debate at ≥95% confidence target on each. AGAINST won on
all four. Decision record committed for future operators who hit
similar moderate-risk findings and reach for similar mitigations.
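
The wire-contract objection to mitigation #3 can also be made concrete. The sketch below uses a deliberately simplified, hypothetical message shape (real provider payloads differ in field names and roles) and a hypothetical `findOrphanToolResults` helper. It shows that collapsing assistant turns that carry no text leaves the following tool_result without its matching tool_use.

```typescript
// Simplified, hypothetical transcript shape (not a provider's real schema).
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; toolCallId: string }
  | { type: "tool_result"; toolCallId: string; content: string };

interface Turn {
  role: "user" | "assistant";
  blocks: Block[];
}

// Returns the toolCallIds of tool_results with no earlier matching tool_use.
function findOrphanToolResults(turns: Turn[]): string[] {
  const seenToolUse = new Set<string>();
  const orphans: string[] = [];
  for (const turn of turns) {
    for (const block of turn.blocks) {
      if (block.type === "tool_use") {
        seenToolUse.add(block.toolCallId);
      } else if (block.type === "tool_result" && !seenToolUse.has(block.toolCallId)) {
        orphans.push(block.toolCallId);
      }
    }
  }
  return orphans;
}

const transcript: Turn[] = [
  { role: "user", blocks: [{ type: "text", text: "read the file" }] },
  // An "empty" assistant turn: no text, but it carries the tool_use.
  { role: "assistant", blocks: [{ type: "tool_use", toolCallId: "call_1" }] },
  { role: "user", blocks: [{ type: "tool_result", toolCallId: "call_1", content: "..." }] },
];

// Naive collapsing: drop assistant turns that contain no text block.
const collapsed = transcript.filter(
  (turn) => !(turn.role === "assistant" && turn.blocks.every((b) => b.type !== "text")),
);

const orphansBefore = findOrphanToolResults(transcript);
const orphansAfter = findOrphanToolResults(collapsed);
```

Here `orphansBefore` is empty and `orphansAfter` contains `call_1`, which is the orphaned state the commit message says providers reject.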

Final v4.2 shipping shape: Options C + D + F at commit e309bed.
Architecturally additive, reversible, default-off. Empirically:
333→689 items at same budget; Opus drills down correctly; no
confabulation observed.
100yenadmin referenced this pull request in 100yenadmin/lossless-claw May 7, 2026
Wire #1 of 3 for the agent context-management architecture (Wave-14).

# What this lands

Subscribes to the openclaw `llm_output` hook to maintain a per-session
cache of (currentTokenCount, tokenBudget). Anchors via llm_output;
extends via per-tool additive self-updates between LLM calls so
parallel-tool-call sequences see accurate cumulative state, not stale
ground-truth from the previous LLM call.

Replaces lcm_compact's previously-vapor `getRuntimeContext` callback
with the real cache. Floor-check (currentRatio < reserveFraction)
finally works against live data instead of always-undefined.
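
A minimal sketch of the floor-check, assuming the runtime context exposes optional `currentTokenCount` and `tokenBudget` fields. The `RuntimeContext` interface and the `isBelowReserveFloor` name are illustrative, not the tool's actual code, and what lcm_compact does with the result is not shown in this message. The comparison direction (`currentRatio < reserveFraction`) is taken from the text above.

```typescript
// Assumed shape: both fields may be undefined before the first llm_output.
interface RuntimeContext {
  currentTokenCount?: number;
  tokenBudget?: number;
}

// Returns true/false when live data is available, or undefined when the
// check cannot be evaluated (the "always-undefined" situation before wiring).
function isBelowReserveFloor(
  ctx: RuntimeContext | undefined,
  reserveFraction: number,
): boolean | undefined {
  if (ctx === undefined) return undefined;
  const { currentTokenCount, tokenBudget } = ctx;
  if (currentTokenCount === undefined || tokenBudget === undefined || tokenBudget <= 0) {
    return undefined;
  }
  const currentRatio = currentTokenCount / tokenBudget;
  return currentRatio < reserveFraction;
}

const lowUsage = isBelowReserveFloor({ currentTokenCount: 20000, tokenBudget: 200000 }, 0.25);
const highUsage = isBelowReserveFloor({ currentTokenCount: 150000, tokenBudget: 200000 }, 0.25);
const noData = isBelowReserveFloor({}, 0.25);
```

Returning `undefined` rather than a default boolean keeps "no data yet" distinguishable from a real answer, which matches the tolerance for undefined fields described in the docstring change below.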

# Files

NEW:
- `src/plugin/token-state.ts` (~165 LOC) — recordLlmOutput +
  accumulateToolResultTokens + getRuntimeContext + inferTokenBudget
- `test/v41-token-state.test.ts` (~110 LOC) — 16 tests covering
  anchoring, accumulation, drift reset, session isolation, budget
  inference

EDITED:
- `src/plugin/index.ts` — `api.on("llm_output", ...)` handler that
  records tokens; lcm_compact registration passes
  `getRuntimeContext: () => getTokenStateRuntimeContext(ctx.sessionKey)`
- `src/tools/lcm-compact-tool.ts` — docstring on getRuntimeContext
  documents the wiring + tolerance for undefined fields

# How it works

Round 1 (anchor):
  LLM call N → llm_output fires → recordLlmOutput stores
  { currentTokenCount, tokenBudget, lastUpdateSource: "llm_output" }
  keyed by sessionKey

Round 2 (parallel-tool-call protection):
  Tools fire sequentially between LLM calls. Each tool's execute()
  ends with accumulateToolResultTokens(sessionKey, resultText) which
  adds Math.ceil(resultText.length / 4) to currentTokenCount. This
  way a 5-tool batch from one LLM response sees accurate cumulative
  state at each tool, not the same stale value.

Round 3 (drift reset):
  Next LLM call → llm_output snaps cache back to ground truth.
  Any per-tool estimation drift bounded by one iteration's batch.
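
The three rounds can be sketched as follows. This is an illustration under stated assumptions, not the contents of `src/plugin/token-state.ts`: the function names and the `Math.ceil(resultText.length / 4)` estimate come from the description above, while the parameter lists, the `TokenState` shape, and the `"tool_result"` source label are assumed.

```typescript
// Assumed state shape; "tool_result" as a source label is an assumption.
interface TokenState {
  currentTokenCount: number;
  tokenBudget: number;
  lastUpdateSource: "llm_output" | "tool_result";
}

// Per-session cache, keyed by sessionKey.
const tokenStateCache = new Map<string, TokenState>();

// Round 1 and Round 3: anchor (or re-anchor) on ground truth from the LLM call.
function recordLlmOutput(sessionKey: string, currentTokenCount: number, tokenBudget: number): void {
  tokenStateCache.set(sessionKey, { currentTokenCount, tokenBudget, lastUpdateSource: "llm_output" });
}

// Round 2: extend the anchor with a rough chars/4 estimate per tool result.
function accumulateToolResultTokens(sessionKey: string, resultText: string): void {
  const state = tokenStateCache.get(sessionKey);
  if (state === undefined) return; // no anchor yet, nothing to extend
  state.currentTokenCount += Math.ceil(resultText.length / 4);
  state.lastUpdateSource = "tool_result";
}

// Returns a copy; fields are absent for sessions that were never anchored.
function getRuntimeContext(sessionKey: string): Partial<TokenState> {
  const state = tokenStateCache.get(sessionKey);
  return state === undefined ? {} : { ...state };
}

// One iteration: anchor, two tool results, then the next LLM call resets drift.
recordLlmOutput("session-a", 100000, 258000);
accumulateToolResultTokens("session-a", "x".repeat(4001)); // adds ceil(4001 / 4) = 1001
accumulateToolResultTokens("session-a", "x".repeat(400)); // adds 100
const afterTools = getRuntimeContext("session-a").currentTokenCount;

recordLlmOutput("session-a", 101050, 258000); // ground truth replaces the estimate
const afterReset = getRuntimeContext("session-a").currentTokenCount;
```

In this run `afterTools` is 101101 and `afterReset` is 101050, so the estimate's drift lasts only until the next `llm_output`.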

# Why this layer (vs. waiting for openclaw SDK addition)

`OpenClawPluginToolContext` does not expose token state today. Wave-14
research confirmed (lossless-claw#472, openclaw#68930 closed
NOT_PLANNED). The proper fix is an openclaw PR adding
`getTokenState?: () => TokenSnapshot` to the factory context. That PR
will be filed separately.

This module is the LCM-side bridge that makes the architecture work
TODAY without openclaw changes. Once openclaw lands the official
accessor, this hook handler becomes legacy / fallback for older
versions and the per-tool accumulator stays as a within-iteration
lag-protection layer.
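
One way the fallback could look once an official accessor exists, as a sketch only. The `getTokenState?: () => TokenSnapshot` signature is the one proposed above. The fields of `TokenSnapshot`, the `ToolContextLike` interface, and the `resolveTokenState` helper are assumptions made for illustration.

```typescript
// Assumed snapshot shape; the proposed openclaw PR may define it differently.
interface TokenSnapshot {
  currentTokenCount: number;
  tokenBudget: number;
}

// Hypothetical stand-in for the tool context, with the proposed optional accessor.
interface ToolContextLike {
  sessionKey: string;
  getTokenState?: () => TokenSnapshot;
}

// Prefer the official accessor when the host provides it; otherwise fall
// back to the hook-fed cache lookup supplied by the plugin.
function resolveTokenState(
  ctx: ToolContextLike,
  cachedLookup: (sessionKey: string) => TokenSnapshot | undefined,
): TokenSnapshot | undefined {
  return ctx.getTokenState?.() ?? cachedLookup(ctx.sessionKey);
}

const cachedLookup = (sessionKey: string): TokenSnapshot | undefined =>
  sessionKey === "session-a" ? { currentTokenCount: 90000, tokenBudget: 258000 } : undefined;

// Older host: no accessor, so the cached value is used.
const fromCache = resolveTokenState({ sessionKey: "session-a" }, cachedLookup);

// Newer host: the accessor wins over the cache.
const fromAccessor = resolveTokenState(
  { sessionKey: "session-a", getTokenState: () => ({ currentTokenCount: 95000, tokenBudget: 258000 }) },
  cachedLookup,
);
```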

# Verification

- 1573/1573 tests passing (1557 baseline + 16 new)
- 7/7 release-readiness preflight checks pass
- 330 TS errors (under 700 baseline; PR introduced none)