
feat(bot):Add eval function(support locomo, skillsbench), open add-resource tool, add feishu progress notification capability #506

Merged
MaojiaSheng merged 35 commits into main from feature/eval on Mar 10, 2026

yeshion23333 (Collaborator) commented Mar 10, 2026

Description

✅ Added

  • A complete automated evaluation tool for SkillsBench, covering benchmark data preparation, batch task execution, automatic result verification, and pass-rate statistics
  • A Locomo evaluation statistics script that automatically generates a report on accuracy, time cost, and token consumption
  • Progress notifications for long-running tasks: the Feishu channel dynamically shows a "processing" emoji as feedback
  • VikingAddResourceTool, which supports adding both URL resources (images, Git repositories) and local files
  • Channel-level memory sharing mode configuration, allowing each channel to switch between isolated and shared memory
  • A --config/-c parameter for the CLI chat command to specify a custom configuration file path
  • Performance logging for memory search, to help troubleshoot performance issues
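
A minimal sketch of the kind of aggregation the Locomo statistics script performs, assuming each evaluated question is a record with hypothetical `correct`, `time_cost` (seconds), and `token_usage` fields; the script's real input format and field names are not described in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass
class EvalReport:
    total: int
    accuracy: float
    avg_time_s: float
    total_tokens: int
    avg_tokens: float


def summarize(records: Iterable[Mapping]) -> EvalReport:
    """Aggregate per-question results into an accuracy / time / token report.

    Assumes hypothetical fields: `correct` (bool), `time_cost` (seconds),
    `token_usage` (int). A missing field counts as False / 0.
    """
    rows = list(records)
    n = len(rows)
    if n == 0:
        # Avoid division by zero on an empty result set.
        return EvalReport(0, 0.0, 0.0, 0, 0.0)
    correct = sum(1 for r in rows if r.get("correct", False))
    time_total = sum(float(r.get("time_cost", 0.0)) for r in rows)
    tokens = sum(int(r.get("token_usage", 0)) for r in rows)
    return EvalReport(
        total=n,
        accuracy=correct / n,
        avg_time_s=time_total / n,
        total_tokens=tokens,
        avg_tokens=tokens / n,
    )
```

For example, two records with `time_cost` values of 2.0 and 4.0 and one correct answer yield an accuracy of 0.5 and an average time of 3.0 s.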

🔄 Changed

  • Locomo evaluation script improvements: supports resuming an interrupted evaluation, writes results to a separate output file, makes --token a required parameter, and updates the default judge model
  • Single-turn request timeout increased from 300 s to 3000 s to accommodate long-running evaluation tasks
  • Resource search and Grep tool logic updated to match the latest OV server API
  • Feishu channel message handling refactored to process generic metadata actions first
  • Configuration loading refactored to accept a custom configuration path
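
Resumable evaluation with a separate output file can be sketched as follows, assuming results are appended as JSON Lines keyed by a hypothetical `id` field. `run_with_resume` and `load_done_ids` are illustrative names, not functions from the repository.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Mapping, Set


def load_done_ids(output_path: Path) -> Set[str]:
    """Return the IDs already recorded in the output file (empty if it does not exist)."""
    done: Set[str] = set()
    if output_path.exists():
        with output_path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip a partially written line from an interrupted run
                done.add(record["id"])
    return done


def run_with_resume(
    items: Iterable[Mapping],
    evaluate: Callable[[Mapping], dict],
    output_path: Path,
) -> int:
    """Evaluate only items not yet recorded and append each result as one JSON line.

    The input items are never modified; results go to a separate output file.
    Returns the number of newly evaluated items.
    """
    done = load_done_ids(output_path)
    new = 0
    with output_path.open("a", encoding="utf-8") as out:
        for item in items:
            if item["id"] in done:
                continue  # already evaluated in a previous run
            result = evaluate(item)
            out.write(json.dumps({"id": item["id"], **result}, ensure_ascii=False) + "\n")
            out.flush()  # flush after each item so completed results survive an interruption
            done.add(item["id"])
            new += 1
    return new
```

Running the same item list twice evaluates only the items that are missing from the output file, which is what makes the second run a resume rather than a restart.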

🐛 Fixed

  • Fixed a crash when fieldnames is None during CSV reading
  • Fixed Feishu thread reply logic so that the thread ID is used only in thread mode
  • Fixed parsing failures on non-standard JSON by parsing with strict=False
  • Fixed the AddResource timeout exception: on timeout, the tool now still returns a message confirming that the task was submitted
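
The CSV and JSON fixes rely on standard-library behavior: `csv.DictReader.fieldnames` is `None` when the input has no header row, and `json.loads(..., strict=False)` accepts control characters such as literal newlines inside strings. The helper names below are hypothetical; this is a sketch of the technique, not the code in the PR.

```python
import csv
import io
import json
from typing import Any, Dict, List


def read_csv_rows(text: str) -> List[Dict[str, str]]:
    """Read CSV text into dicts, tolerating empty input.

    csv.DictReader.fieldnames is None when there is no header row,
    so check it before using the header instead of iterating over None.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None:
        return []
    return list(reader)


def parse_lenient_json(raw: str) -> Any:
    """Parse JSON that may contain raw control characters inside strings.

    With the default strict=True, a literal newline or tab inside a JSON
    string raises JSONDecodeError; strict=False lets the decoder accept it.
    """
    return json.loads(raw, strict=False)
```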

❌ Removed

  • Removed redundant custom JSON preprocessing logic from run_eval.py
  • Removed the built-in @mention parsing logic from the Feishu channel
  • Removed the deprecated target_path and wait parameters from the AddResource tool
  • Removed logic in judge.py that modified input files in place; results are now written to a separate output file

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

# Conflicts:
#	bot/vikingbot/agent/loop.py
#	bot/vikingbot/agent/tools/ov_file.py
#	bot/vikingbot/hooks/builtins/openviking_hooks.py
from loguru import logger
from vikingbot.config.schema import Config

CONFIG_PATH = None
Collaborator review comment:

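[Suggestion] Using the module-level mutable global CONFIG_PATH to pass the configuration path between ensure_config() and load_config() is a fragile pattern. Several call sites (such as the hooks) call load_config() directly without going through ensure_config(); in that case CONFIG_PATH is None, so loading falls back to the default path instead of the custom path passed on the CLI.

Consider a more explicit approach, such as caching the config object as a singleton, or keeping a config_path parameter on load_config().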

MaojiaSheng merged commit a75f1ac into main on Mar 10, 2026
5 of 6 checks passed
MaojiaSheng deleted the feature/eval branch on March 10, 2026 at 15:02
The github-project-automation bot moved this from Backlog to Done in the OpenViking project on Mar 10, 2026

Labels

None yet

Projects

Status: Done


3 participants