✨ feat(agent-tracing): tool-result feedback quality analysis (tq command) by arvinxx · Pull Request #15508 · lobehub/lobehub

arvinxx · 2026-06-06T10:02:23Z

💻 Change Type

✨ feat

🔗 Related Issue

Related to LOBE-10057

🔀 Description of Change

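Introduces an objective, quantitative measure of how clean and LLM-friendly tool return content (environment feedback) is, as the first phase of agent harness evaluation.

Environment feedback is the only basis the model has for each decision in the agent loop. It is also the part whose format the harness controls most directly, yet the part most likely to get dirty. This PR provides a shared analysis library that needs no LLM, plus a CLI preview command, to quickly see which tools are pouring noise into the context.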

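src/analysis/toolFeedback.ts: a pure-function analysis library (the single core that the CLI, later DC persistence, and the judge can all reuse). It reads step.toolsResult[].output and unwraps the {"content":...} envelope. Metrics per tool result:
- tokens (gpt-tokenizer)
- selfRedundancy (80-char shingle dedup ratio; catches degenerate dumps and repeated errors)
- structuralNoiseRatio (share of xml/html tags; catches markup noise)
- isError + error size (errors should be small)
- format, estWasteTokens (token-weighted waste)
- plus op-level and corpus-level rollups

src/cli/tool-quality.ts: agent-tracing tq (alias tool-quality): token-size histogram, dirty leaderboard ranked by token-weighted waste, single-op drill-down, --json.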
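The per-result metrics listed above can be sketched roughly as follows. This is a minimal illustration, not the code in src/analysis/toolFeedback.ts: the function names `unwrapContent`, `selfRedundancy`, and `structuralNoiseRatio` and their exact formulas are assumptions. In particular, the sketch assumes non-overlapping 80-character shingles and a simple tag regex, and the real library may define both differently.

```typescript
// Hypothetical sketch of the per-result metrics; names and formulas are assumptions.

/** Unwrap a {"content": ...} envelope when the output is JSON of that shape. */
function unwrapContent(output: string): string {
  try {
    const parsed: unknown = JSON.parse(output);
    if (typeof parsed === 'object' && parsed !== null && 'content' in parsed) {
      const content = (parsed as { content: unknown }).content;
      return typeof content === 'string' ? content : JSON.stringify(content);
    }
  } catch {
    // Not JSON: fall through and treat the raw output as the content.
  }
  return output;
}

/**
 * Share of duplicated shingles, assuming non-overlapping windows of `size` chars.
 * 0 means every window is unique; values near 1 mean the text repeats itself.
 */
function selfRedundancy(text: string, size = 80): number {
  const shingles: string[] = [];
  for (let i = 0; i + size <= text.length; i += size) {
    shingles.push(text.slice(i, i + size));
  }
  if (shingles.length < 2) return 0;
  return 1 - new Set(shingles).size / shingles.length;
}

/** Share of characters that sit inside xml/html-like tags. */
function structuralNoiseRatio(text: string): number {
  if (text.length === 0) return 0;
  const tags = text.match(/<\/?[A-Za-z][^<>]*>/g) ?? [];
  const tagChars = tags.reduce((sum, tag) => sum + tag.length, 0);
  return tagChars / text.length;
}
```

Under these assumptions, a result made of the same 80-character line repeated four times scores a selfRedundancy of 0.75, and `<b>hi</b>` scores a structuralNoiseRatio of 7/9, because seven of its nine characters are markup.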

Purely additive: registers one subcommand in cli/index.ts and changes no existing logic.

🧪 How to Test

Run in any directory that has an .agent-tracing/_remote/ snapshot cache:

agent-tracing tq                 # corpus histogram + dirty leaderboard
agent-tracing tq <opId>          # per-tool-result drill-down for a single op
agent-tracing tq --json          # machine-readable output

Sample output measured on 98 ops / 770 results / 1.4M tokens:

Tool-result feedback quality  (98 ops · 770 results · 1.4M tokens)
  est. wasted ≈ 165.9k (12%)  of all tool-result tokens

  token-size distribution   bar = % of results · right = % of tokens
  <128   ████████████████████ 59%    1% tok
  <512   ████                 12%    2% tok
  <2048  ██████               16%    10% tok
  <8192  ███                  9%     20% tok
  <32768 █                    3%     18% tok
  ≥32768                      0%     49% tok

  dirty leaderboard  (ranked by token-weighted waste)
  tool                              calls p99    redund noise err%  waste
  lobe-agent-documents/readDocument 49    663.8k 0%     5%    20%   ≈45.1k
  lobe-web-browsing/search          39    2.4k   0%     59%   5%    ≈28.9k
  lobe-local-system/runCommand      205   3.8k   0%     0%    40%   ≈19.3k
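The two corpus-level views in the sample output can be approximated with the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: the `ToolResultStat` shape, the `histogram` and `leaderboard` helpers, and the precomputed `wasteTokens` field are hypothetical. Only the bucket edges (128, 512, 2048, 8192, 32768) are taken from the output above.

```typescript
// Hypothetical sketch of the corpus rollups; types and helper names are assumptions.

interface ToolResultStat {
  tool: string;
  tokens: number;
  wasteTokens: number; // assumed to be computed upstream (estWasteTokens)
  isError: boolean;
}

interface LeaderboardRow {
  tool: string;
  calls: number;
  errors: number;
  wasteTokens: number;
}

// Upper bounds of the buckets shown in the sample output; the last bucket is open-ended.
const BUCKET_EDGES = [128, 512, 2048, 8192, 32768];

function bucketIndex(tokens: number): number {
  const i = BUCKET_EDGES.findIndex((edge) => tokens < edge);
  return i === -1 ? BUCKET_EDGES.length : i;
}

/** Count results and sum tokens per size bucket. */
function histogram(results: ToolResultStat[]): { results: number; tokens: number }[] {
  const buckets = Array.from({ length: BUCKET_EDGES.length + 1 }, () => ({
    results: 0,
    tokens: 0,
  }));
  for (const r of results) {
    const bucket = buckets[bucketIndex(r.tokens)];
    if (!bucket) continue; // defensive guard against an out-of-range index
    bucket.results += 1;
    bucket.tokens += r.tokens;
  }
  return buckets;
}

/** Aggregate per tool and rank by total wasted tokens, largest first. */
function leaderboard(results: ToolResultStat[]): LeaderboardRow[] {
  const byTool = new Map<string, LeaderboardRow>();
  for (const r of results) {
    const row = byTool.get(r.tool) ?? { tool: r.tool, calls: 0, errors: 0, wasteTokens: 0 };
    row.calls += 1;
    if (r.isError) row.errors += 1;
    row.wasteTokens += r.wasteTokens;
    byTool.set(r.tool, row);
  }
  return Array.from(byTool.values()).sort((a, b) => b.wasteTokens - a.wasteTokens);
}
```

Tracking both the result count and the token sum per bucket is what lets the histogram show a bucket such as ≥32768 holding close to 0% of results but 49% of tokens: a handful of very large results can dominate the context budget.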

Tested locally
Added/updated tests
No tests needed

📝 Additional Information

CLI and analysis library only; no runtime, UI, or schema changes. Later phases (persisting rollups to agent_operations, LLM-as-judge semantic metrics) are tracked in LOBE-10057.

🤖 Generated with Claude Code

…command)

Adds a shared, no-LLM analyzer that scores how "clean / LLM-friendly" the environment feedback (tool return content) is, plus an `agent-tracing tq` CLI command to preview it over a snapshot corpus.

- src/analysis/toolFeedback.ts: pure analysis lib (reusable core): per tool-result metrics (tokens, self-redundancy, structural-noise ratio, error flag/size, format) + op-level and corpus-level rollups.
- src/cli/tool-quality.ts: `tq` (alias `tool-quality`): token-size histogram, dirty leaderboard ranked by token-weighted waste, single-op drill-down, and --json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-06-06T10:02:30Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lobehub	Ready	Preview, Comment	Jun 6, 2026 10:21am

sourcery-ai

We've reviewed this pull request using the Sourcery rules engine

codecov · 2026-06-06T10:07:27Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.64%. Comparing base (6f5a633) to head (684d1c1).
⚠️ Report is 1 commits behind head on canary.

Additional details and impacted files

@@            Coverage Diff             @@
##           canary   #15508      +/-   ##
==========================================
- Coverage   70.64%   70.64%   -0.01%     
==========================================
  Files        3274     3274              
  Lines      322959   322959              
  Branches    29419    29421       +2     
==========================================
- Hits       228155   228152       -3     
- Misses      94621    94624       +3     
  Partials      183      183

Flag	Coverage Δ
app	`61.30% <ø> (+<0.01%)`	⬆️
database	`92.54% <ø> (ø)`
packages/agent-manager-runtime	`49.69% <ø> (ø)`
packages/agent-runtime	`81.04% <ø> (ø)`
packages/builtin-tool-lobe-agent	`18.52% <ø> (ø)`
packages/context-engine	`84.19% <ø> (ø)`
packages/conversation-flow	`91.29% <ø> (ø)`
packages/device-gateway-client	`90.51% <ø> (ø)`
packages/eval-dataset-parser	`95.15% <ø> (ø)`
packages/eval-rubric	`76.11% <ø> (ø)`
packages/fetch-sse	`85.57% <ø> (-1.72%)`	⬇️
packages/file-loaders	`87.89% <ø> (ø)`
packages/memory-user-memory	`74.99% <ø> (ø)`
packages/model-bank	`99.99% <ø> (ø)`
packages/model-runtime	`84.22% <ø> (ø)`
packages/prompts	`72.51% <ø> (ø)`
packages/python-interpreter	`92.90% <ø> (ø)`
packages/ssrf-safe-fetch	`0.00% <ø> (ø)`
packages/types	`35.38% <ø> (ø)`
packages/utils	`84.98% <ø> (ø)`
packages/web-crawler	`88.08% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
Store	`68.39% <ø> (ø)`
Services	`54.77% <ø> (ø)`
Server	`71.82% <ø> (-0.01%)`	⬇️
Libs	`54.34% <ø> (+0.13%)`	⬆️
Utils	`81.71% <ø> (ø)`

@ONLY-yours

# 🚀 LobeHub Release (20260610)

**Release Date:** June 10, 2026
**Since v2.2.2:** 131 merged PRs · 13 contributors

> This weekly release strengthens agent collaboration across cloud, desktop, CLI, and workspace flows, with steadier runtime behavior and a broader foundation for workspace-scoped data.

---

## ✨ Highlights

- **Agent execution across devices**: Unifies per-device working directories, project skill discovery, and sub-agent suspend/resume behavior across server, QStash, and device RPC flows. (#15543, #15566, #15481, #15620, #15591)
- **Connector and sandbox platform**: Expands connector permissions, custom OAuth MCP connector onboarding, sandbox provider support, and user-uploaded file sync into cloud sandbox runs. (#15463, #15546, #15184, #15550)
- **Desktop and CLI reliability**: Fixes desktop cold-start, auto-update, Windows build, CLI skill discovery, and `lh connect` agent dispatch paths. (#15547, #15525, #15527, #15562, #15632, #15634)
- **Pages and sharing**: Refreshes topic sharing, improves Page Editor layout behavior, and routes Page Agent tool execution through the server-side editor path. (#15581, #15556, #15588, #15023, #15610)
- **Model availability and provider updates**: Adds user-scoped LobeHub model availability, Claude Fable 5, Qwen thinking preservation, and MiniMax M3 updates. (#15590, #15639, #13494, #15376)

---

## 🏗️ Core Product & Architecture

### Agent Runtime & Heterogeneous Agents

- Improves sub-agent lifecycle handling, including async suspend/resume, queue-mode QStash resume delivery, and blocking nested sub-agent calls. (#15481, #15620, #15575)
- Stabilizes heterogeneous agent ingestion and streaming with raw stream dumps, per-turn usage, image forwarding on regenerate, and duplicate-text fixes. (#15602, #15577, #15592, #15585)
- Adds execution-device and working-directory controls across device RPC, legacy defaults, and remote-spawned Claude Code sessions. (#15543, #15566, #15591, #15572)
- Improves runtime diagnostics and compatibility, including Gemini multimodal output capture, abort stream semantics, and trace quality analysis. (#15535, #13677, #15508)

---

## 📱 Platforms, Integrations & UX

### Connectors, Sandbox & Tools

- Ships API-level connector tool permissions, custom OAuth MCP connector onboarding, and connector-first runtime execution. (#15463, #15546)
- Adds sandbox provider support, cloud sandbox file sync, and safer external URL file input handling with SSRF validation. (#15184, #15550, #12657)
- Improves tool visibility and execution with pinned app-fixed tools, ANSI output rendering, gateway-tunneled MCP calls, and automatic headless tool runs. (#15509, #15516, #15469, #15492)

### Desktop, CLI & Web UX

- Restores desktop startup and reload behavior, preserves IPC error causes, and keeps the tab bar new-tab action visible across routes. (#15547, #15597, #15638)
- Fixes desktop update and build stability for browser quit guards, macOS update signing, and Windows Visual Studio detection. (#15525, #15527, #15562)
- Shows the plan-limit upgrade UI on desktop builds. (#15628)
- Adds the Agent Run delivery checker and fixes CLI device dispatch plus skill list/search output. (#15489, #15634, #15632)
- Refreshes onboarding, auth source preservation, topic UI states, referral/Fable campaign copy, and chat-input control bar behavior. (#15629, #15544, #15573, #15614, #15616, #15617, #15622, #15643)

---

## 🔒 Security, Reliability & Rollout Notes

- External URL file input now includes SSRF validation for safer Google file handling. (#12657)
- Database workspace-scope migrations are part of this release; self-hosted operators should run the normal migration path before serving the updated app. (#15446, #15465, #15468, #15472)
- The release branch was re-cut from `canary` and includes the latest `main` release-version commit so `v2.2.2` is the verified compare base.

---

## 👥 Contributors

@ONLY-yours, @sxjeru, @hardy-one, @xujingli, @hezhijie0327, @Coooolfan, @arvinxx, @tjx666, @Innei, @rivertwilight, @rdmclin2, @cy948, @AmAzing129

**Full Changelog**: v2.2.2...release/weekly-20260610-recut-3

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 6, 2026

dosubot Bot added the feature:agent Assistant/Agent configuration and behavior label Jun 6, 2026

sourcery-ai Bot reviewed Jun 6, 2026

vercel Bot deployed to Preview June 6, 2026 10:08 View deployment

🐛 fix(agent-tracing): guard against undefined histogram bucket in bui…

684d1c1

…ldCorpusReport Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview June 6, 2026 10:21 View deployment

arvinxx merged commit ad87e43 into canary Jun 6, 2026
34 of 35 checks passed

arvinxx deleted the arvinxx/feat/tool-feedback-quality branch June 6, 2026 10:31

This was referenced Jun 10, 2026

🚀 release: 20260610 #15645

Closed

🚀 release: 20260610 #15647

Merged

arvinxx merged 2 commits into canary from arvinxx/feat/tool-feedback-quality
