fix(openclaw-plugin): enforce token budget and reduce context bloat (#730)#6
Conversation
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
Adds budget-aware memory injection: new config options (recallMaxContentChars, recallPreferAbstract, recallTokenBudget), token-budgeted sequential memory-line construction with truncation and abstract preference, leaf detection tightened to level === 2, plus Vitest tests and test tooling.
Changes
sequenceDiagram
participant Plugin as Memory Plugin
participant Builder as buildMemoryLinesWithBudget
participant Config as Config
participant Reader as readFn
participant TokenEst as estimateTokenCount
Plugin->>Builder: call(memories[], readFn, options)
Builder->>Config: read recallPreferAbstract, recallTokenBudget, recallMaxContentChars
loop per memory (until budget)
Builder->>Config: check preferAbstract & item.level
alt preferAbstract and item.abstract non-empty
Builder->>Builder: use item.abstract
else item.level == 2
Builder->>Reader: read(item.uri)
Reader-->>Builder: content or error
alt success & non-blank
Builder->>Builder: use content
else
Builder->>Builder: fallback to item.abstract or uri
end
end
Builder->>Builder: truncate to recallMaxContentChars + "..."
Builder->>TokenEst: estimateTokenCount(text)
TokenEst-->>Builder: tokens
alt current + tokens <= budget or first item
Builder->>Builder: add line, update totals
else
Builder->>Builder: stop iteration
end
end
Builder-->>Plugin: { lines, estimatedTokens }
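To make the diagram concrete, here is a hedged usage sketch. The function and option names come from this PR's diff; the import path, sample memories, and readFn are illustrative assumptions (real FindResultItem values may carry more fields).

import { buildMemoryLinesWithBudget, type FindResultItem } from "./index"; // assumed path

async function demo(): Promise<void> {
  const memories = [
    { uri: "memory://notes/a.md", level: 2, category: "note", abstract: "Short summary of note A" },
    { uri: "memory://notes/b.md", level: 1, category: "note", abstract: "Short summary of note B" },
  ] as FindResultItem[];

  // Only consulted for level-2 items whose abstract is empty or missing.
  const readFn = async (uri: string): Promise<string> => `full content for ${uri}`;

  const { lines, estimatedTokens } = await buildMemoryLinesWithBudget(memories, readFn, {
    recallPreferAbstract: true,
    recallMaxContentChars: 500,
    recallTokenBudget: 2000,
  });

  console.log(lines.join("\n"), `~${estimatedTokens} estimated tokens`);
}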
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Review Summary by Qodo
Enforce token budget and reduce context bloat in memory injection
Walkthroughs
Description
• Enforce token budget (default 2000) with decrement loop to stop injection when exhausted
• Raise recallScoreThreshold default from 0.01 to 0.15 to filter ~70% irrelevant memories
• Prefer the memory abstract (100-300 chars) over a full content fetch to reduce token usage
• Add three new config options: recallMaxContentChars (500), recallPreferAbstract (true), recallTokenBudget (2000)
• Narrow isLeafLikeMemory boost to level-2 only, removing the false-positive .md URI boost
• Add comprehensive vitest test suite (10 tests) covering all slices and backward compatibility
Diagram
flowchart LR
A["Memory Ranking"] -->|"Filter by score >= 0.15"| B["Post-Process"]
B -->|"Prefer abstract"| C["Build Memory Lines"]
C -->|"Truncate to 500 chars"| D["Content Prep"]
D -->|"Estimate tokens"| E["Budget Loop"]
E -->|"Stop at 2000 tokens"| F["Inject Context"]
File Changes
1. examples/openclaw-plugin/__tests__/context-bloat-730.test.ts
Code Review by Qodo
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue of excessive memory context injection in the OpenClaw plugin, which led to high token usage and increased operational costs. The changes introduce several strategies to enforce a token budget, reduce the size of injected memories, and improve the relevance filtering of recalled information. This significantly optimizes the plugin's performance and cost-efficiency without introducing breaking changes.
Highlights
Failed to generate code suggestions for PR
🧹 Nitpick comments (3)
examples/openclaw-plugin/__tests__/context-bloat-730.test.ts (1)
148-153: Consider tightening the token budget assertion.
The test allows estimatedTokens to be up to 120 with a budget of 100, which is a 20% overshoot. This is because the implementation allows the first line to exceed the budget. While functionally correct, consider documenting this behavior in the test comment or adjusting the assertion to be more precise (e.g., expect(estimatedTokens).toBeLessThanOrEqual(106) for one line at ~53 tokens).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/openclaw-plugin/__tests__/context-bloat-730.test.ts` around lines 148 - 153, Tighten the token-budget assertion in the test by reducing the allowed overshoot or document the acceptable overshoot behavior: change the assertion on estimatedTokens in the test (the variable estimatedTokens used alongside lines) to a stricter bound (e.g., expect(estimatedTokens).toBeLessThanOrEqual(106)) to reflect one-line ~53-token rounding, or add a clarifying comment above the assertions explaining why a small overshoot above the 100-token budget is allowed due to the first-line rounding behavior.
examples/openclaw-plugin/index.ts (2)
701-733: Consider extracting shared content-resolution logic to reduce duplication.
buildMemoryLines and buildMemoryLinesWithBudget share nearly identical logic for resolving content (lines 708-724 vs 753-769) and truncation (lines 726-728 vs 771-773). Consider extracting a helper function like resolveMemoryContent(item, readFn, options) to reduce duplication.
♻️ Proposed refactor to extract shared logic
+async function resolveMemoryContent(
+  item: FindResultItem,
+  readFn: (uri: string) => Promise<string>,
+  options: BuildMemoryLinesOptions,
+): Promise<string> {
+  let content: string;
+
+  if (options.recallPreferAbstract && item.abstract?.trim()) {
+    content = item.abstract.trim();
+  } else if (item.level === 2) {
+    try {
+      const fullContent = await readFn(item.uri);
+      content =
+        fullContent && typeof fullContent === "string" && fullContent.trim()
+          ? fullContent.trim()
+          : (item.abstract ?? item.uri);
+    } catch {
+      content = item.abstract ?? item.uri;
+    }
+  } else {
+    content = item.abstract ?? item.uri;
+  }
+
+  if (content.length > options.recallMaxContentChars) {
+    content = content.slice(0, options.recallMaxContentChars) + "...";
+  }
+
+  return content;
+}
 export async function buildMemoryLines(
   memories: FindResultItem[],
   readFn: (uri: string) => Promise<string>,
   options: BuildMemoryLinesOptions,
 ): Promise<string[]> {
   const lines: string[] = [];
   for (const item of memories) {
-    let content: string;
-
-    if (options.recallPreferAbstract && item.abstract?.trim()) {
-      content = item.abstract.trim();
-    } else if (item.level === 2) {
-      try {
-        const fullContent = await readFn(item.uri);
-        content =
-          fullContent && typeof fullContent === "string" && fullContent.trim()
-            ? fullContent.trim()
-            : (item.abstract ?? item.uri);
-      } catch {
-        content = item.abstract ?? item.uri;
-      }
-    } else {
-      content = item.abstract ?? item.uri;
-    }
-
-    if (content.length > options.recallMaxContentChars) {
-      content = content.slice(0, options.recallMaxContentChars) + "...";
-    }
-
+    const content = await resolveMemoryContent(item, readFn, options);
     lines.push(`- [${item.category ?? "memory"}] ${content}`);
   }
   return lines;
 }

Apply similar changes to buildMemoryLinesWithBudget.
Also applies to: 739-788
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/openclaw-plugin/index.ts` around lines 701 - 733, buildMemoryLines and buildMemoryLinesWithBudget duplicate the logic that resolves content and truncates it; extract that into a shared helper (e.g., resolveMemoryContent(item: FindResultItem, readFn: (uri:string)=>Promise<string>, options: BuildMemoryLinesOptions): string) and call it from both buildMemoryLines and buildMemoryLinesWithBudget to centralize: implement the preference for abstract (options.recallPreferAbstract), the level===2 read-with-try/catch and fallback to item.abstract or item.uri, and the truncation to options.recallMaxContentChars with ellipsis so both functions simply format lines from the returned content.
776-784: Clarify the budget overshoot behavior.
The condition lineTokens > budgetRemaining && lines.length > 0 allows the first memory to exceed the budget, ensuring at least one memory is always injected. This is reasonable behavior but worth documenting in the function's JSDoc to set expectations.
📝 Proposed documentation addition
+/**
+ * Build memory lines with token budget enforcement.
+ * At least one memory is always included if the input is non-empty,
+ * even if that single memory exceeds the budget.
+ */
 export async function buildMemoryLinesWithBudget(

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/openclaw-plugin/index.ts` around lines 776 - 784, Update the function's JSDoc that contains this token-budget loop to explicitly document the overshoot behavior: note that the check (lineTokens > budgetRemaining && lines.length > 0) intentionally allows the first memory (when lines.length === 0) to exceed the budget so at least one memory is injected; describe the roles of lineTokens, budgetRemaining, and lines and state that subsequent lines will not be added once they would push over the remaining budget. Reference the variables lineTokens, budgetRemaining, and lines in the JSDoc so callers understand the expected behavior and trade-off.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@examples/openclaw-plugin/__tests__/context-bloat-730.test.ts`:
- Around line 148-153: Tighten the token-budget assertion in the test by
reducing the allowed overshoot or document the acceptable overshoot behavior:
change the assertion on estimatedTokens in the test (the variable
estimatedTokens used alongside lines) to a stricter bound (e.g.,
expect(estimatedTokens).toBeLessThanOrEqual(106)) to reflect one-line ~53-token
rounding, or add a clarifying comment above the assertions explaining why a
small overshoot above the 100-token budget is allowed due to the first-line
rounding behavior.
In `@examples/openclaw-plugin/index.ts`:
- Around line 701-733: buildMemoryLines and buildMemoryLinesWithBudget duplicate
the logic that resolves content and truncates it; extract that into a shared
helper (e.g., resolveMemoryContent(item: FindResultItem, readFn:
(uri:string)=>Promise<string>, options: BuildMemoryLinesOptions): string) and
call it from both buildMemoryLines and buildMemoryLinesWithBudget to centralize:
implement the preference for abstract (options.recallPreferAbstract), the
level===2 read-with-try/catch and fallback to item.abstract or item.uri, and the
truncation to options.recallMaxContentChars with ellipsis so both functions
simply format lines from the returned content.
- Around line 776-784: Update the function's JSDoc that contains this
token-budget loop to explicitly document the overshoot behavior: note that the
check (lineTokens > budgetRemaining && lines.length > 0) intentionally allows
the first memory (when lines.length === 0) to exceed the budget so at least one
memory is injected; describe the roles of lineTokens, budgetRemaining, and lines
and state that subsequent lines will not be added once they would push over the
remaining budget. Reference the variables lineTokens, budgetRemaining, and lines
in the JSDoc so callers understand the expected behavior and trade-off.
⛔ Files ignored due to path filters (1)
examples/openclaw-plugin/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (7)
examples/openclaw-plugin/__tests__/context-bloat-730.test.ts
examples/openclaw-plugin/config.ts
examples/openclaw-plugin/index.ts
examples/openclaw-plugin/memory-ranking.ts
examples/openclaw-plugin/openclaw.plugin.json
examples/openclaw-plugin/package.json
examples/openclaw-plugin/vitest.config.ts
Code Review
This pull request is a solid improvement that effectively addresses the token bloat issue by introducing a token budget and other optimizations for memory recall. The changes are well-organized and include a comprehensive suite of new regression tests, which is excellent for ensuring stability. I have one suggestion to improve maintainability by reducing some code duplication in the newly added functions.
export async function buildMemoryLines(
  memories: FindResultItem[],
  readFn: (uri: string) => Promise<string>,
  options: BuildMemoryLinesOptions,
): Promise<string[]> {
  const lines: string[] = [];
  for (const item of memories) {
    let content: string;

    if (options.recallPreferAbstract && item.abstract?.trim()) {
      content = item.abstract.trim();
    } else if (item.level === 2) {
      try {
        const fullContent = await readFn(item.uri);
        content =
          fullContent && typeof fullContent === "string" && fullContent.trim()
            ? fullContent.trim()
            : (item.abstract ?? item.uri);
      } catch {
        content = item.abstract ?? item.uri;
      }
    } else {
      content = item.abstract ?? item.uri;
    }

    if (content.length > options.recallMaxContentChars) {
      content = content.slice(0, options.recallMaxContentChars) + "...";
    }

    lines.push(`- [${item.category ?? "memory"}] ${content}`);
  }
  return lines;
}

export type BuildMemoryLinesWithBudgetOptions = BuildMemoryLinesOptions & {
  recallTokenBudget: number;
};

export async function buildMemoryLinesWithBudget(
  memories: FindResultItem[],
  readFn: (uri: string) => Promise<string>,
  options: BuildMemoryLinesWithBudgetOptions,
): Promise<{ lines: string[]; estimatedTokens: number }> {
  let budgetRemaining = options.recallTokenBudget;
  const lines: string[] = [];
  let totalTokens = 0;

  for (const item of memories) {
    if (budgetRemaining <= 0) {
      break;
    }

    let content: string;

    if (options.recallPreferAbstract && item.abstract?.trim()) {
      content = item.abstract.trim();
    } else if (item.level === 2) {
      try {
        const fullContent = await readFn(item.uri);
        content =
          fullContent && typeof fullContent === "string" && fullContent.trim()
            ? fullContent.trim()
            : (item.abstract ?? item.uri);
      } catch {
        content = item.abstract ?? item.uri;
      }
    } else {
      content = item.abstract ?? item.uri;
    }

    if (content.length > options.recallMaxContentChars) {
      content = content.slice(0, options.recallMaxContentChars) + "...";
    }

    const line = `- [${item.category ?? "memory"}] ${content}`;
    const lineTokens = estimateTokenCount(line);

    if (lineTokens > budgetRemaining && lines.length > 0) {
      break;
    }

    lines.push(line);
    totalTokens += lineTokens;
    budgetRemaining -= lineTokens;
  }

  return { lines, estimatedTokens: totalTokens };
}
There's significant code duplication between buildMemoryLines and buildMemoryLinesWithBudget. The logic for retrieving and truncating memory content is identical in both functions. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, this common logic can be extracted into a private helper function.
async function getMemoryContent(
item: FindResultItem,
readFn: (uri: string) => Promise<string>,
options: BuildMemoryLinesOptions,
): Promise<string> {
let content: string;
if (options.recallPreferAbstract && item.abstract?.trim()) {
content = item.abstract.trim();
} else if (item.level === 2) {
try {
const fullContent = await readFn(item.uri);
content =
fullContent && typeof fullContent === "string" && fullContent.trim()
? fullContent.trim()
: (item.abstract ?? item.uri);
} catch {
content = item.abstract ?? item.uri;
}
} else {
content = item.abstract ?? item.uri;
}
if (content.length > options.recallMaxContentChars) {
content = content.slice(0, options.recallMaxContentChars) + "...";
}
return content;
}
export async function buildMemoryLines(
memories: FindResultItem[],
readFn: (uri: string) => Promise<string>,
options: BuildMemoryLinesOptions,
): Promise<string[]> {
const lines: string[] = [];
for (const item of memories) {
const content = await getMemoryContent(item, readFn, options);
lines.push(`- [${item.category ?? "memory"}] ${content}`);
}
return lines;
}
export type BuildMemoryLinesWithBudgetOptions = BuildMemoryLinesOptions & {
recallTokenBudget: number;
};
export async function buildMemoryLinesWithBudget(
memories: FindResultItem[],
readFn: (uri: string) => Promise<string>,
options: BuildMemoryLinesWithBudgetOptions,
): Promise<{ lines: string[]; estimatedTokens: number }> {
let budgetRemaining = options.recallTokenBudget;
const lines: string[] = [];
let totalTokens = 0;
for (const item of memories) {
if (budgetRemaining <= 0) {
break;
}
const content = await getMemoryContent(item, readFn, options);
const line = `- [${item.category ?? "memory"}] ${content}`;
const lineTokens = estimateTokenCount(line);
if (lineTokens > budgetRemaining && lines.length > 0) {
break;
}
lines.push(line);
totalTokens += lineTokens;
budgetRemaining -= lineTokens;
}
return { lines, estimatedTokens: totalTokens };
}

…get behavior (volcengine#730)
Extract resolveMemoryContent() helper to eliminate duplicate content-resolution logic between buildMemoryLines and buildMemoryLinesWithBudget. Add JSDoc and inline comment documenting intentional first-line budget overshoot (spec §6.2). Tighten test assertion from <=120 to <=106 tokens.
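For illustration only, a hedged sketch of the first-line overshoot that this commit documents; the fixture, import path, and option values are assumptions, and the real context-bloat-730 test may be written differently.

import { expect, it } from "vitest";
import { buildMemoryLinesWithBudget } from "../index"; // assumed path

it("keeps exactly one line when the first memory alone exceeds the budget", async () => {
  // ~480-char abstracts -> roughly 120 tokens per line at the chars/4 heuristic.
  const memories = [
    { uri: "memory://a", level: 2, category: "note", abstract: "x".repeat(480) },
    { uri: "memory://b", level: 2, category: "note", abstract: "x".repeat(480) },
  ] as any[];

  const { lines, estimatedTokens } = await buildMemoryLinesWithBudget(
    memories,
    async () => "", // never consulted: abstracts are preferred and non-empty
    { recallPreferAbstract: true, recallMaxContentChars: 500, recallTokenBudget: 100 },
  );

  expect(lines).toHaveLength(1);                // first line is admitted despite overshoot
  expect(estimatedTokens).toBeGreaterThan(100); // budget is intentionally exceeded once
});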
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/openclaw-plugin/index.ts`:
- Around line 714-721: The code uses nullish coalescing (item.abstract ??
item.uri) which does not fall back when abstract is an empty string, so change
these assignments to use a truthy fallback (e.g., item.abstract?.trim() ||
item.uri or item.abstract || item.uri) wherever content is set (inside the try
block assignment, the catch block assignment, and the else branch) so empty
strings fall back to item.uri; update the occurrences in the functions/blocks
that set content (the try block that checks fullContent and the subsequent
catch/else assignments) accordingly.
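A tiny sketch of the difference the prompt is pointing at, using only standard TypeScript/JavaScript semantics:

// ?? only falls back on null/undefined, so an empty-string abstract slips through;
// || treats "" as falsy and falls back to the URI as intended.
const abstract = "";
const uri = "memory://notes/a.md";

const nullishResult = abstract ?? uri; // "" (an empty content line would be produced)
const truthyResult = abstract || uri;  // "memory://notes/a.md"

console.log(JSON.stringify(nullishResult), truthyResult);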
📒 Files selected for processing (2)
examples/openclaw-plugin/__tests__/context-bloat-730.test.ts
examples/openclaw-plugin/index.ts
Resolve conflict in index.ts: keep buildMemoryLinesWithBudget approach inside main's timeout wrapper (AUTO_RECALL_TIMEOUT_MS).
…olcengine#730) Change nullish coalescing (??) to truthy fallback (||) in resolveMemoryContent() so empty-string abstracts fall back to item.uri instead of producing empty content lines.
Why 0.15 over 0.10?
Short answer: 0.15 is the better choice, but the margin is small. Neither has hard empirical data in the codebase.
Evidence from research
(comparison table not recoverable from the page extraction)
Codebase-specific factors
Recommendation
0.15 is correct for now. It's the conservative edge of the industry-standard noise floor range (0.15-0.25). Going to 0.10 would admit [...]. If you want to be data-driven later, you could add telemetry to log score distributions and empirically find the optimal cutoff for recallScoreThreshold.
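If that telemetry route is taken, here is a minimal sketch; logScoreDistribution, the ScoredMemory shape, and the console.debug sink are all hypothetical, not existing plugin APIs.

// Hypothetical helper: log the recall score distribution per query so the
// recallScoreThreshold cutoff can later be tuned from real data.
type ScoredMemory = { uri: string; score: number };

function logScoreDistribution(memories: ScoredMemory[], threshold: number): void {
  const scores = memories.map((m) => m.score).sort((a, b) => a - b);
  const kept = scores.filter((s) => s >= threshold).length;
  const percentile = (p: number): number =>
    scores.length === 0 ? 0 : scores[Math.min(scores.length - 1, Math.floor(p * scores.length))];

  console.debug("[recall] score distribution", {
    total: scores.length,
    kept,
    dropped: scores.length - kept,
    p50: percentile(0.5),
    p90: percentile(0.9),
    threshold,
  });
}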
Problem
5 compounding causes inject 16K+ tokens of memory context per LLM call with no budget.
See:
docs/744/category-1-openclaw-plugin/730-token-not-reduced.md
Fix (7 commits, each independently revertible)
• recallScoreThreshold default 0.01→0.15 (filters ~70% irrelevant memories)
• isLeafLikeMemory boost narrowed to level-2 only (reduces false-positive relevance)
• buildMemoryLines(): prefer item.abstract over client.read(uri) (100-300 chars vs full file)
• recallMaxContentChars (default: 500) and recallPreferAbstract (default: true) config options
• recallTokenBudget (default: 2000) with decrement loop — hard stop when budget exhausted
New Config Options
• recallMaxContentChars
• recallPreferAbstract
• recallTokenBudget
All options added to: TypeScript type, assertAllowedKeys(), openclaw.plugin.json schema + uiHints.
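As a rough sketch, this is how the new options and the raised threshold might sit together in a plugin config object; only the key names and defaults come from this PR, the surrounding shape is an assumption.

// Illustrative config object; only the key names and defaults are from this PR.
const memoryRecallConfig = {
  recallScoreThreshold: 0.15, // new default; set 0.01 explicitly to keep the old behavior
  recallMaxContentChars: 500, // truncate each injected memory line to 500 chars + "..."
  recallPreferAbstract: true, // use item.abstract instead of reading the full file
  recallTokenBudget: 2000,    // hard stop for injected memory context
};

export default memoryRecallConfig;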
Testing
10 regression tests (vitest), covering:
• backward compatibility (recallScoreThreshold: 0.01 preserved)
• isLeafLikeMemory ranking (level-2 only, no .md URI boost)
• abstract preference (client.read() skipped when abstract available)
• content truncation (recallMaxContentChars)
• estimateTokenCount (chars/4 heuristic; see the sketch after this list)
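A minimal sketch of that chars/4 heuristic; the function body and rounding choice are assumptions, only the name estimateTokenCount appears in the diff.

// chars/4 token estimate used by the budget loop; exact rounding is an assumption.
export function estimateTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 212-char memory line estimates to 53 tokens.
console.log(estimateTokenCount("x".repeat(212))); // 53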
Impact
Test Plan
Closes volcengine#730
Summary by CodeRabbit
New Features
Improvements
Tests
Chores