✨ feat(agent-tracing): tool-result feedback quality analysis (tq command)#15508
Merged
…command)

Adds a shared, no-LLM analyzer that scores how "clean / LLM-friendly" the environment feedback (tool return content) is, plus an `agent-tracing tq` CLI command to preview it over a snapshot corpus.

- src/analysis/toolFeedback.ts: pure analysis lib (reusable core); per tool-result metrics (tokens, self-redundancy, structural-noise ratio, error flag/size, format) + op-level and corpus-level rollups.
- src/cli/tool-quality.ts: `tq` (alias `tool-quality`); token-size histogram, dirty leaderboard ranked by token-weighted waste, single-op drill-down, and --json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## canary #15508 +/- ##
==========================================
- Coverage 70.64% 70.64% -0.01%
==========================================
Files 3274 3274
Lines 322959 322959
Branches 29419 29421 +2
==========================================
- Hits 228155 228152 -3
- Misses 94621 94624 +3
Partials 183 183
…ldCorpusReport

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 10, 2026
Closed
Merged
arvinxx added a commit that referenced this pull request on Jun 10, 2026
# 🚀 LobeHub Release (20260610)

**Release Date:** June 10, 2026
**Since v2.2.2:** 131 merged PRs · 13 contributors

> This weekly release strengthens agent collaboration across cloud, desktop, CLI, and workspace flows, with steadier runtime behavior and a broader foundation for workspace-scoped data.

---

## ✨ Highlights

- **Agent execution across devices**: Unifies per-device working directories, project skill discovery, and sub-agent suspend/resume behavior across server, QStash, and device RPC flows. (#15543, #15566, #15481, #15620, #15591)
- **Connector and sandbox platform**: Expands connector permissions, custom OAuth MCP connector onboarding, sandbox provider support, and user-uploaded file sync into cloud sandbox runs. (#15463, #15546, #15184, #15550)
- **Desktop and CLI reliability**: Fixes desktop cold-start, auto-update, Windows build, CLI skill discovery, and `lh connect` agent dispatch paths. (#15547, #15525, #15527, #15562, #15632, #15634)
- **Pages and sharing**: Refreshes topic sharing, improves Page Editor layout behavior, and routes Page Agent tool execution through the server-side editor path. (#15581, #15556, #15588, #15023, #15610)
- **Model availability and provider updates**: Adds user-scoped LobeHub model availability, Claude Fable 5, Qwen thinking preservation, and MiniMax M3 updates. (#15590, #15639, #13494, #15376)

---

## 🏗️ Core Product & Architecture

### Agent Runtime & Heterogeneous Agents

- Improves sub-agent lifecycle handling, including async suspend/resume, queue-mode QStash resume delivery, and blocking nested sub-agent calls. (#15481, #15620, #15575)
- Stabilizes heterogeneous agent ingestion and streaming with raw stream dumps, per-turn usage, image forwarding on regenerate, and duplicate-text fixes. (#15602, #15577, #15592, #15585)
- Adds execution-device and working-directory controls across device RPC, legacy defaults, and remote-spawned Claude Code sessions. (#15543, #15566, #15591, #15572)
- Improves runtime diagnostics and compatibility, including Gemini multimodal output capture, abort stream semantics, and trace quality analysis. (#15535, #13677, #15508)

---

## 📱 Platforms, Integrations & UX

### Connectors, Sandbox & Tools

- Ships API-level connector tool permissions, custom OAuth MCP connector onboarding, and connector-first runtime execution. (#15463, #15546)
- Adds sandbox provider support, cloud sandbox file sync, and safer external URL file input handling with SSRF validation. (#15184, #15550, #12657)
- Improves tool visibility and execution with pinned app-fixed tools, ANSI output rendering, gateway-tunneled MCP calls, and automatic headless tool runs. (#15509, #15516, #15469, #15492)

### Desktop, CLI & Web UX

- Restores desktop startup and reload behavior, preserves IPC error causes, and keeps the tab bar new-tab action visible across routes. (#15547, #15597, #15638)
- Fixes desktop update and build stability for browser quit guards, macOS update signing, and Windows Visual Studio detection. (#15525, #15527, #15562)
- Shows the plan-limit upgrade UI on desktop builds. (#15628)
- Adds the Agent Run delivery checker and fixes CLI device dispatch plus skill list/search output. (#15489, #15634, #15632)
- Refreshes onboarding, auth source preservation, topic UI states, referral/Fable campaign copy, and chat-input control bar behavior. (#15629, #15544, #15573, #15614, #15616, #15617, #15622, #15643)

---

## 🔒 Security, Reliability & Rollout Notes

- External URL file input now includes SSRF validation for safer Google file handling. (#12657)
- Database workspace-scope migrations are part of this release; self-hosted operators should run the normal migration path before serving the updated app. (#15446, #15465, #15468, #15472)
- The release branch was re-cut from `canary` and includes the latest `main` release-version commit so `v2.2.2` is the verified compare base.

---

## 👥 Contributors

@ONLY-yours, @sxjeru, @hardy-one, @xujingli, @hezhijie0327, @Coooolfan, @arvinxx, @tjx666, @Innei, @rivertwilight, @rdmclin2, @cy948, @AmAzing129

**Full Changelog**: v2.2.2...release/weekly-20260610-recut-3
💻 Change Type
🔗 Related Issue
Related to LOBE-10057
🔀 Description of Change
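Introduces an objective, quantitative measure of how clean / LLM-friendly tool return content (environment feedback) is, as the first phase of agent harness evaluation.

Environment feedback is the only basis the model has for each decision in the agent loop. It is also the part whose format the harness controls most directly, yet the part that gets dirty most easily. This PR provides a shared analysis library that needs no LLM, plus a CLI preview command, to show quickly which tools are pouring noise into the context.

`src/analysis/toolFeedback.ts`: pure-function analysis library (the single core to be reused by the CLI, later DC persistence, and a judge). It reads `step.toolsResult[].output` and unwraps the `{"content":...}` envelope. Metrics per tool result:

- `tokens` (gpt-tokenizer)
- `selfRedundancy` (80-char shingle dedup ratio; catches degenerate dumps / repeated errors)
- `structuralNoiseRatio` (share of xml/html tags; catches markup noise)
- `isError` + error size (errors should be small)
- `format`, `estWasteTokens` (token-weighted waste)

`src/cli/tool-quality.ts`: `agent-tracing tq` (alias `tool-quality`): token-size histogram, dirty leaderboard ranked by token-weighted waste, single-op drill-down, `--json`.

🧪 How to Test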
Run in any directory that has a `.agent-tracing/_remote/` snapshot cache:

Sample output, measured on 98 ops / 770 results / 1.4M tokens:
📝 Additional Information
CLI / analysis library only; no runtime / UI / schema changes. Later phases (persisting the rollup to `agent_operations`, LLM-as-judge semantic metrics) are tracked in LOBE-10057.

🤖 Generated with Claude Code
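
To make the per-result metrics in the description more concrete, here is a minimal TypeScript sketch of the envelope unwrap, an 80-char self-redundancy ratio, and a tag-based structural-noise ratio. It is not the code in `src/analysis/toolFeedback.ts`: every function name below is hypothetical, the shingles are taken as non-overlapping 80-character chunks, tokens are approximated as characters / 4 instead of calling gpt-tokenizer, and the waste formula (tokens times the worse of the two ratios) is an assumption made only for illustration.

```typescript
// Illustrative sketch only; all names and formulas here are assumptions,
// not the PR's actual implementation.

/** Unwrap a `{"content": ...}` envelope if present; otherwise return the raw text. */
function unwrapContent(output: string): string {
  try {
    const parsed: unknown = JSON.parse(output);
    if (typeof parsed === "object" && parsed !== null && "content" in parsed) {
      const content = (parsed as { content: unknown }).content;
      return typeof content === "string" ? content : JSON.stringify(content);
    }
  } catch {
    // Not JSON: treat the output as plain text.
  }
  return output;
}

/** Share of fixed-size chunks that duplicate another chunk (0 = nothing repeats). */
function selfRedundancy(text: string, size = 80): number {
  const seen = new Set<string>();
  let total = 0;
  for (let i = 0; i + size <= text.length; i += size) {
    seen.add(text.slice(i, i + size));
    total += 1;
  }
  return total === 0 ? 0 : 1 - seen.size / total;
}

/** Share of characters that sit inside xml/html-style tags. */
function structuralNoiseRatio(text: string): number {
  if (text.length === 0) return 0;
  const tags = text.match(/<\/?[A-Za-z][^<>]*>/g) ?? [];
  const tagChars = tags.reduce((sum, tag) => sum + tag.length, 0);
  return tagChars / text.length;
}

interface ToolResultMetrics {
  approxTokens: number; // stand-in for a real tokenizer count
  selfRedundancy: number;
  structuralNoiseRatio: number;
  estWasteTokens: number; // assumed formula, see analyzeToolResult
}

/** Compute the sketch metrics for one raw tool output string. */
function analyzeToolResult(output: string): ToolResultMetrics {
  const text = unwrapContent(output);
  const approxTokens = Math.ceil(text.length / 4);
  const redundancy = selfRedundancy(text);
  const noise = structuralNoiseRatio(text);
  return {
    approxTokens,
    selfRedundancy: redundancy,
    structuralNoiseRatio: noise,
    // Assumption: weight tokens by the worse of the two noise signals.
    estWasteTokens: Math.round(approxTokens * Math.max(redundancy, noise)),
  };
}
```

Under these assumptions, sorting results by `estWasteTokens` in descending order would give a rough "dirty leaderboard": a large result with moderate noise can outrank a small result that is almost pure markup, which is the point of weighting by tokens.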