[Tracking]: Token Consumption & Cost Optimization Tracker #744

@qin-ctx

Description

Background

The community has reported multiple issues related to token consumption, API call costs, and embedding processing. This tracking issue consolidates and categorizes them, with status updates and resolution directions for each category.

Note: the “Current Status” column below reflects each issue's state as of March 25, 2026.


Category 1: OpenClaw Plugin Related

Status: Plugin 2.0 is under active development and will address these issues

| Issue | Title | Core Problem | Current Status |
|---|---|---|---|
| #730 | Token usage does not drop after openclaw is configured with openviking | All sessions load every *.md file, so context immediately reaches 16k+ tokens | OPEN |
| #455 | OpenClaw plugin tool calls always return "extract returned 0 memories" | Memory extraction fails; auto-capture has no effect | OPEN |
| #630 | OpenClaw + OpenViking Memory Extraction Issue | Cross-server deployment returns 0 memories | OPEN |
| #680 | openclaw plugin keeps logging "debug cron: timer armed" after startup | Plugin fails to start properly | OPEN |
| #551 | openclaw cannot integrate with openviking | "Field not in allowlist" error | OPEN |

Resolution Direction:
Plugin 2.0 will overhaul the context-injection mechanism and optimize the memory-loading strategy to avoid the token waste caused by loading every *.md file. Memory extraction and plugin startup issues will also be fixed in 2.0.
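For illustration, the "don't load every *.md" direction could take the shape of a relevance-gated loading pass: score memory files against the current request and inject only the top few that fit a context budget. This is a minimal sketch under stated assumptions, not the plugin's actual code; `select_memory_files`, the keyword scoring, and the character budget are all hypothetical.

```python
from pathlib import Path

def select_memory_files(memory_dir, query_terms, top_k=3, budget_chars=4000):
    """Rank memory files by crude keyword overlap and load only the
    top-k that fit a character budget, instead of injecting every *.md."""
    scored = []
    for path in Path(memory_dir).glob("*.md"):
        text = path.read_text(encoding="utf-8")
        score = sum(text.lower().count(term.lower()) for term in query_terms)
        if score > 0:  # skip files with no overlap at all
            scored.append((score, path, text))
    scored.sort(key=lambda item: -item[0])  # highest relevance first
    selected, used = [], 0
    for score, path, text in scored[:top_k]:
        if used + len(text) > budget_chars:  # respect the context budget
            continue
        selected.append((str(path), text))
        used += len(text)
    return selected
```

A real implementation would likely use the memory index's embeddings for scoring rather than keyword counts, but the budget-capped selection loop is the part that prevents the 16k+ context blowup described in #730.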


Category 2: Abnormal API Call Costs

Status: Under testing and optimization

| Issue | Title | Core Problem | Current Status |
|---|---|---|---|
| #729 | Abnormal VLM usage: retry storm causes 5405 calls in 5 seconds | No circuit breaker after a 403 billing error, triggering a retry storm | CLOSED |
| #505 | Memory extraction triggers O(n²) semantic reprocessing | Every memory write reprocesses all files, so cost grows quadratically | OPEN |
| #769 | Repeated parent-directory semantic recomputation on each new memory write | A single write triggers full parent-directory semantic recomputation, so cost scales with directory size rather than change size | OPEN |
| #907 | Batch multiple file summaries per VLM call to reduce RPM pressure | One-file-per-request summary generation wastes RPM and per-request quota | OPEN |
| #922 | Unify config-driven retry across VLM and embedding | Retry behavior is inconsistent across the VLM and embedding paths, so rate limits and transient failures are handled unevenly | OPEN |

Resolution Direction:
Following the issues above: add a circuit breaker for billing/auth failures (#729), make semantic reprocessing incremental instead of full-directory (#505, #769), batch multiple file summaries per VLM call (#907), and unify config-driven retry across the VLM and embedding paths (#922).

Category 3: Embedding Processing & Chunking Strategy

Status: Under testing and optimization

| Issue | Title | Core Problem | Current Status |
|---|---|---|---|
| #731 | Input sequence length exceeds max input length of embedding model | Input exceeds the model's max tokens (e.g. 512), causing a 500 error | CLOSED |
| #531 | Unclear division of responsibility between embedding truncation and chunking | Truncation vs. chunking lacks a unified design | OPEN |
| #530 | Long memory indexing should use chunked vectorization | Long memories need chunked vectorization instead of a single-record embedding | CLOSED |
| #857 | Make text file vectorization strategy configurable to avoid embedding oversize failures | Text-file vectorization needs a configurable strategy to avoid oversized-input failures | OPEN |
| #931 | Large code files fail embedding: no input truncation before embedding API call | Large code files fail embedding due to missing truncation/chunking guardrails | CLOSED |

Resolution Direction:
Unify the chunking strategy: perform length detection and intelligent chunking before vectorization, define distinct chunking policies per level (memory/file/directory), and ensure that no embedding model ever receives an oversized input.


Category 4: Infrastructure Optimization

Status: Under testing and optimization

| Issue | Title | Core Problem | Current Status |
|---|---|---|---|
| #613 | Persistent queue backend for semantic/embedding processing | The in-memory queue is lost on restart and is unreliable for bulk imports | CLOSED |
| #864 | Memory semantic queue stalls on context_type=memory jobs | The memory semantic queue stalls: pending keeps growing while processed does not advance | OPEN |

Resolution Direction:
Introduce a persistent queue backend so that processing progress survives service restarts, and continue investigating the memory semantic queue stalls and self-reprocessing issues.
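As a sketch of the persistent-queue idea, a minimal SQLite-backed job queue might look like the following. The class name, schema, and claim/ack protocol are hypothetical, not the project's actual design; the point is that jobs are only deleted after an explicit acknowledgment, so a crash mid-job returns the job to pending on restart.

```python
import sqlite3

class PersistentQueue:
    """Minimal SQLite-backed job queue: enqueued jobs survive process
    restarts, and a job is only deleted after it is acknowledged."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " state TEXT NOT NULL DEFAULT 'pending')"
        )
        # Recover jobs that were mid-flight when the process died.
        self.db.execute("UPDATE jobs SET state = 'pending' WHERE state = 'processing'")
        self.db.commit()

    def enqueue(self, payload):
        self.db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
        self.db.commit()

    def claim(self):
        """Take the oldest pending job, or return None if the queue is empty."""
        row = self.db.execute(
            "SELECT id, payload FROM jobs WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE jobs SET state = 'processing' WHERE id = ?", (row[0],))
        self.db.commit()
        return row

    def ack(self, job_id):
        """Delete a job only once it has been fully processed."""
        self.db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
        self.db.commit()
```

The crash-recovery step in the constructor is what makes bulk imports restartable, addressing the reliability gap described in #613; the explicit `processing` state would also make stalls like #864 observable by inspecting the table.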


Summary

| Category | Status | Issues |
|---|---|---|
| OpenClaw plugin | Plugin 2.0 in progress | #730 #455 #630 #680 #551 |
| API cost anomalies | Under testing and optimization | #729 #505 #769 #907 #922 |
| Embedding strategy | Under testing and optimization | #731 #531 #530 #857 #931 |
| Infrastructure | Under testing and optimization | #613 #864 |

We will post progress updates in each sub-issue, and community feedback remains welcome. For new token-consumption issues, please comment on this issue first so we can triage and categorize them centrally.
