fix(openclaw-plugin): enforce token budget and reduce context bloat (#730) by chethanuk · Pull Request #796 · volcengine/OpenViking

chethanuk · 2026-03-20T00:00:46Z

Problem

The OpenClaw plugin's memory injection pipeline has 5 compounding issues that inject 16K+ tokens of memory context per LLM call with no budget enforcement — inflating cost and degrading response quality.

Before: Unbounded Injection Pipeline

flowchart TD
    subgraph "❌ Current Pipeline — No Guards"
        A["🔍 client.find()\nReturns all matching memories"] --> B

        B{"⚠️ recallScoreThreshold = 0.01\n<i>Effectively no filter</i>"}
        B -->|"~70% irrelevant memories\npass through"| C

        C{"⚠️ isLeafLikeMemory()\nBoosts .md URIs + level-2"}
        C -->|"False positives\nranked artificially high"| D

        D{"⚠️ client.read(uri)\nAlways fetches full content"}
        D -->|"2K+ chars per memory\nfull .md files loaded"| E

        E{"⚠️ No truncation\nper memory item"}
        E -->|"Unbounded content\npassed through"| F

        F{"⚠️ No token budget\nin injection loop"}
        F -->|"All memories\nconcatenated"| G

        G["💥 16,384+ tokens\ninjected per LLM call"]
    end

    style A fill:#4a90d9,color:#fff,stroke:#2c5f8a
    style B fill:#e74c3c,color:#fff,stroke:#c0392b
    style C fill:#e74c3c,color:#fff,stroke:#c0392b
    style D fill:#e74c3c,color:#fff,stroke:#c0392b
    style E fill:#e74c3c,color:#fff,stroke:#c0392b
    style F fill:#e74c3c,color:#fff,stroke:#c0392b
    style G fill:#8b0000,color:#fff,stroke:#5c0000

After: Budget-Enforced Pipeline

flowchart TD
    subgraph "✅ Fixed Pipeline — 5 Defense Layers"
        A["🔍 client.find()\nReturns matching memories"] --> B

        B{"✅ Slice A\nrecallScoreThreshold ≥ 0.15"}
        B -->|"~70% irrelevant\nfiltered out"| C

        C{"✅ Slice C\nisLeafLikeMemory: level-2 only"}
        C -->|"No false .md\nURI boosting"| D

        D{"✅ Slice B\nPrefer item.abstract"}
        D -->|"100-300 chars\nvs full file fetch"| E

        E{"✅ Slice D\nrecallMaxContentChars ≤ 500"}
        E -->|"Per-memory\ntruncation"| F

        F{"✅ Slice E\nrecallTokenBudget ≤ 2000"}
        F -->|"Decrement loop\nstops at limit"| G

        G["✨ < 2,000 tokens\nbudget-enforced injection"]
    end

    style A fill:#4a90d9,color:#fff,stroke:#2c5f8a
    style B fill:#27ae60,color:#fff,stroke:#1e8449
    style C fill:#27ae60,color:#fff,stroke:#1e8449
    style D fill:#27ae60,color:#fff,stroke:#1e8449
    style E fill:#27ae60,color:#fff,stroke:#1e8449
    style F fill:#27ae60,color:#fff,stroke:#1e8449
    style G fill:#1a5e2f,color:#fff,stroke:#0d3b1a

Fix — 5 Independent Slices

Each slice is an atomic commit that can be reverted independently without affecting the others.

Slice	Commit	What Changed	Why
A	`raise recallScoreThreshold 0.01→0.15`	Default score filter from 0.01 to 0.15	Eliminates ~70% of low-relevance memories that add noise
C	`narrow isLeafLikeMemory to level-2`	Remove `.md` URI extension check from boost logic	Prevents container/index documents from getting artificial relevance boosts
B	`prefer abstract over full content`	Use `item.abstract` when available, skip `client.read()`	Reduces per-memory payload from 2K+ chars to 100-300 chars
D	`add recallMaxContentChars`	Truncate any single memory to 500 chars (configurable)	Hard cap on per-memory size prevents outlier content from dominating
E	`enforce tokenBudget with decrement loop`	Stop injecting when cumulative tokens hit budget (default: 2000)	Guarantees bounded total injection regardless of memory count

Additional Improvements (Review Feedback)

Fix	Description
DRY extraction	`resolveMemoryContent()` helper eliminates duplicate content-resolution logic between `buildMemoryLines` and `buildMemoryLinesWithBudget`
Empty abstract fallback	Changed `??` to `
Budget overshoot documented	JSDoc + inline comment: first memory always included even if it exceeds budget (spec §6.2 — bounded by `recallMaxContentChars`)

Testing

10 regression tests (vitest) — one or more per slice, all passing:

flowchart LR
    subgraph "Test Coverage Map"
        direction TB

        subgraph SliceA["Slice A — Score Filter"]
            T1["① Default threshold\nfilters scores < 0.15"]
            T2["② Backward compat\nexplicit 0.01 preserved"]
        end

        subgraph SliceC["Slice C — Ranking"]
            T3["③ Level-2 only boost\nno .md URI false positives"]
        end

        subgraph SliceB["Slice B — Abstract-First"]
            T4["④ client.read() skipped\nwhen abstract available"]
        end

        subgraph SliceD["Slice D — Truncation"]
            T5["⑤ Content truncated\nat recallMaxContentChars"]
            T6["⑥ Config defaults\nrecallMaxContentChars=500\nrecallPreferAbstract=true"]
        end

        subgraph SliceE["Slice E — Budget"]
            T7["⑦ Budget enforcement\ndecrement loop stops"]
            T8["⑧ First-line overshoot\n≤2 lines, ≤106 tokens"]
            T9["⑨ estimateTokenCount\nceil(chars/4) heuristic"]
            T10["⑩ Config default\nrecallTokenBudget=2000"]
        end
    end

    style SliceA fill:#e8f5e9,stroke:#27ae60
    style SliceC fill:#e8f5e9,stroke:#27ae60
    style SliceB fill:#e8f5e9,stroke:#27ae60
    style SliceD fill:#e8f5e9,stroke:#27ae60
    style SliceE fill:#e8f5e9,stroke:#27ae60

#	Test	Validates	Slice
1	Score threshold filtering with default config	Memories with scores [0.05, 0.10, 0.20, 0.50] → only ≥ 0.15 pass	A
2	Backward compatibility	Explicit `recallScoreThreshold: 0.01` config is preserved and respected	A
3	`isLeafLikeMemory` ranking	Level-2 items get boost; `.md` URI with `level≠2` does NOT	C
4	Abstract-first resolution	`client.read()` not called when `item.abstract` is populated	B
5	Content truncation	2000-char content truncated to 500 + `"..."` (503 total)	D
6	Config defaults	`recallMaxContentChars=500`, `recallPreferAbstract=true`	D
7	Token budget enforcement	10 memories × ~53 tokens each, budget=100 → only 1-2 injected	E
8	First-line overshoot bounds	With budget=100, result is ≤ 2 lines and ≤ 106 estimated tokens	E
9	`estimateTokenCount` accuracy	`""→0`, `"abcd"→1`, `"abcde"→2`, `"A"×100→25`	E
10	Config default	`recallTokenBudget=2000`	E

$ cd examples/openclaw-plugin && npx vitest run

 ✓ context-bloat-730.test.ts (10 tests) 148ms

 Test Files  1 passed (1)
      Tests  10 passed (10)
   Duration  320ms

Impact

graph LR
    subgraph "Before"
        B1["16K+ tokens/call"]
        B2["~$13.50/day"]
        B3["No budget control"]
    end

    subgraph "After"
        A1["< 2K tokens/call"]
        A2["< $1.50/day"]
        A3["3 config knobs"]
    end

    B1 -.->|"87% reduction"| A1
    B2 -.->|"89% savings"| A2
    B3 -.->|"user-tunable"| A3

    style B1 fill:#e74c3c,color:#fff
    style B2 fill:#e74c3c,color:#fff
    style B3 fill:#e74c3c,color:#fff
    style A1 fill:#27ae60,color:#fff
    style A2 fill:#27ae60,color:#fff
    style A3 fill:#27ae60,color:#fff

Metric	Before	After	Change
Context per call	16,384+ tokens (unbounded)	< 2,000 tokens (budget-enforced)	↓ 87%
Estimated daily cost (200 memories, 100 turns)	~$13.50	< $1.50	↓ 89%
Breaking changes	—	None	—
Observability	`injecting N memories`	`injecting N memories (~T tokens, budget=B)`	Token + budget logging

New Configuration Options

All options have backward-compatible defaults — zero config changes required for existing users.

Option	Type	Default	Description
`recallScoreThreshold`	`number`	`0.15` (was 0.01)	Minimum relevance score for memory injection
`recallPreferAbstract`	`boolean`	`true`	Use `item.abstract` instead of fetching full content via `client.read()`
`recallMaxContentChars`	`number`	`500`	Maximum characters per memory item (truncated with `...`)
`recallTokenBudget`	`number`	`2000`	Maximum total estimated tokens for injected memory context

Note: Users who previously set recallScoreThreshold: 0.01 explicitly will retain that behavior — the default change only affects unset configurations.

Files Changed

File	Changes
`examples/openclaw-plugin/config.ts`	New config options + raised default threshold
`examples/openclaw-plugin/index.ts`	`resolveMemoryContent()` helper, budget-enforced injection loop, abstract-first resolution
`examples/openclaw-plugin/memory-ranking.ts`	`isLeafLikeMemory` narrowed to `level === 2` only
`examples/openclaw-plugin/openclaw.plugin.json`	Schema for new config options
`examples/openclaw-plugin/__tests__/context-bloat-730.test.ts`	10 regression tests
`examples/openclaw-plugin/vitest.config.ts`	Test runner configuration
`examples/openclaw-plugin/package.json`	vitest dev dependency

Closes #730

… 0.15 (volcengine#730)

…olcengine#730)

…ry injection (volcengine#730)

…ract config (volcengine#730)

…t loop (volcengine#730)

…h new default (volcengine#730)

…get behavior (volcengine#730) Extract resolveMemoryContent() helper to eliminate duplicate content-resolution logic between buildMemoryLines and buildMemoryLinesWithBudget. Add JSDoc and inline comment documenting intentional first-line budget overshoot (spec §6.2). Tighten test assertion from <=120 to <=106 tokens.

Resolve conflict in index.ts: keep buildMemoryLinesWithBudget approach inside main's timeout wrapper (AUTO_RECALL_TIMEOUT_MS).

…olcengine#730) Change nullish coalescing (??) to truthy fallback (||) in resolveMemoryContent() so empty-string abstracts fall back to item.uri instead of producing empty content lines.

qin-ctx · 2026-03-20T06:55:47Z

cc @chenjw @Mijamind719

qin-ctx · 2026-03-20T10:19:54Z

Hey @chethanuk, thanks for this excellent contribution! The 5-slice decomposition is really well thought out — each slice being independently revertable is a great design choice, and the before/after pipeline diagrams make the problem and solution crystal clear. The test coverage is thorough too.

One request: could you retarget this PR from main to the refactor/openclaw-memory branch? We're currently doing a broader refactor of the memory pipeline on that branch, and your budget enforcement work fits right into it. Merging there first will let us integrate your changes with the other memory-related improvements and avoid conflicts down the line.

You can do this via:

gh pr edit 796 --base refactor/openclaw-memory

Or just click Edit next to the base branch on the PR page.

Thanks again for the detailed analysis and clean implementation!

chethanuk · 2026-03-20T14:49:16Z

gh pr edit 796 --base refactor/openclaw-memory

@qin-ctx Do you still want me to do this?

Mijamind719 · 2026-03-21T01:28:14Z

gh pr edit 796 --base refactor/openclaw-memory

@qin-ctx Do you still want me to do this?

Hi，@chethanuk，thank you for your pull request. We are improving the memory functionality in the context engine and will continue to evolve based on your modifications. We will be working directly on the openclaw-memory branch and expect to submit a version to main next Tuesday. We hope you can participate in the evolution of the solution.

chethanuk · 2026-03-21T12:22:07Z

We will be working directly on the openclaw-memory branch and expect to submit a version to main next Tuesday. We hope you can participate in the evolution of the solution.

Sure, let me know. thank you :)

…budget and reduce context bloat (#891) * fix(openclaw-plugin): enforce token budget and reduce context bloat (#730) (#796) * test(openclaw-plugin): add vitest test infrastructure for #730 * fix(openclaw-plugin): raise recallScoreThreshold default from 0.01 to 0.15 (#730) * fix(openclaw-plugin): narrow isLeafLikeMemory boost to level-2 only (#730) * fix(openclaw-plugin): prefer abstract over full content fetch in memory injection (#730) * feat(openclaw-plugin): add recallMaxContentChars and recallPreferAbstract config (#730) * feat(openclaw-plugin): enforce tokenBudget in injection with decrement loop (#730) * fix(openclaw-plugin): update recallScoreThreshold placeholder to match new default (#730) * fix(openclaw-plugin): deduplicate content resolution and document budget behavior (#730) Extract resolveMemoryContent() helper to eliminate duplicate content-resolution logic between buildMemoryLines and buildMemoryLinesWithBudget. Add JSDoc and inline comment documenting intentional first-line budget overshoot (spec §6.2). Tighten test assertion from <=120 to <=106 tokens. * fix(openclaw-plugin): use truthy fallback for empty abstract strings (#730) Change nullish coalescing (??) to truthy fallback (||) in resolveMemoryContent() so empty-string abstracts fall back to item.uri instead of producing empty content lines. * docs: add openclaw context engine refactor design Co-authored-by: Mijamind <mijamind@163.com> Co-authored-by: GPT-5.4 <noreply@openai.com> * add afterTurn refactor in design_doc * add afterTurn compact in design_doc --------- Co-authored-by: chethanuk <chethanuk@outlook.com> Co-authored-by: GPT-5.4 <noreply@openai.com> Co-authored-by: wlff123 <wulf234@163.com> Co-authored-by: xuwengui <huangxun375@gmail.com>

chethanuk added 10 commits March 19, 2026 21:17

test(openclaw-plugin): add vitest test infrastructure for volcengine#730

973c597

fix(openclaw-plugin): raise recallScoreThreshold default from 0.01 to…

164c4fe

… 0.15 (volcengine#730)

fix(openclaw-plugin): narrow isLeafLikeMemory boost to level-2 only (v…

54473bc

…olcengine#730)

fix(openclaw-plugin): prefer abstract over full content fetch in memo…

70de4ab

…ry injection (volcengine#730)

feat(openclaw-plugin): add recallMaxContentChars and recallPreferAbst…

7b781ec

…ract config (volcengine#730)

feat(openclaw-plugin): enforce tokenBudget in injection with decremen…

8c52d70

…t loop (volcengine#730)

fix(openclaw-plugin): update recallScoreThreshold placeholder to matc…

4d0ecec

…h new default (volcengine#730)

merge: integrate main into fix/730-context-bloat

5510127

Resolve conflict in index.ts: keep buildMemoryLinesWithBudget approach inside main's timeout wrapper (AUTO_RECALL_TIMEOUT_MS).

fix(openclaw-plugin): use truthy fallback for empty abstract strings (v…

43d217f

…olcengine#730) Change nullish coalescing (??) to truthy fallback (||) in resolveMemoryContent() so empty-string abstracts fall back to item.uri instead of producing empty content lines.

github-project-automation bot added this to OpenViking project Mar 20, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 20, 2026

qin-ctx assigned chenjw Mar 20, 2026

qin-ctx changed the base branch from main to refactor/openclaw-memory March 20, 2026 10:20

qin-ctx merged commit 6a089a5 into volcengine:refactor/openclaw-memory Mar 20, 2026
6 checks passed

github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(openclaw-plugin): enforce token budget and reduce context bloat (#730)#796

fix(openclaw-plugin): enforce token budget and reduce context bloat (#730)#796
qin-ctx merged 10 commits intovolcengine:refactor/openclaw-memoryfrom
chethanuk:fix/730-context-bloat

chethanuk commented Mar 20, 2026 •

edited

Loading

Uh oh!

qin-ctx commented Mar 20, 2026

Uh oh!

qin-ctx commented Mar 20, 2026

Uh oh!

Uh oh!

chethanuk commented Mar 20, 2026

Uh oh!

Mijamind719 commented Mar 21, 2026

Uh oh!

chethanuk commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

chethanuk commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Before: Unbounded Injection Pipeline

After: Budget-Enforced Pipeline

Fix — 5 Independent Slices

Additional Improvements (Review Feedback)

Testing

Impact

New Configuration Options

Files Changed

Uh oh!

qin-ctx commented Mar 20, 2026

Uh oh!

qin-ctx commented Mar 20, 2026

Uh oh!

Uh oh!

chethanuk commented Mar 20, 2026

Uh oh!

Mijamind719 commented Mar 21, 2026

Uh oh!

chethanuk commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

chethanuk commented Mar 20, 2026 •

edited

Loading