Closed
Labels
bug (Something isn't working)
Description
Bug Description
When importing a large document (e.g. a 100MB+ Word file), the prompt built by _generate_overview() in semantic_processor.py exceeds the model's context limit, causing:
Failed to generate overview for viking://resources/...
with vLLM reporting:
vllm.exceptions.VLLMValidationError: You passed 65537 input tokens and requested 0 output tokens.
Steps to Reproduce
Root Cause Analysis
The problem is caused by two defects compounding each other:
1. _generate_overview() concatenates the prompt without truncation (semantic_processor.py L560-573)
All file summaries and subdirectory abstracts are concatenated into the prompt without any limit:
```python
# No length limit whatsoever!
for idx, item in enumerate(file_summaries, 1):
    file_summaries_lines.append(f"[{idx}] {item['name']}: {item['summary']}")
file_summaries_str = "\n".join(file_summaries_lines)
children_abstracts_str = "\n".join(
    f"- {item['name']}/: {item['abstract']}" for item in children_abstracts
)
```
Taking a 100MB Word document as an example: slicing produces 500-1000 files and 50-100 subdirectories, and after concatenating their summaries the prompt can reach 20,000-55,000+ tokens, easily exceeding the model's context limit.
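To make the scale concrete, here is a back-of-the-envelope estimate. The average summary length and the chars-per-token ratio below are assumed values for illustration, not measurements from the real pipeline:

```python
# Rough illustration of unbounded prompt growth. AVG_SUMMARY_CHARS and
# CHARS_PER_TOKEN are assumptions, not measured from the actual pipeline.
AVG_SUMMARY_CHARS = 200   # assumed average length of one summary line
CHARS_PER_TOKEN = 3       # rough ratio for mixed Chinese/English text

def estimate_prompt_tokens(n_files: int, n_dirs: int) -> int:
    """Estimate the token count of a prompt built from every summary."""
    total_chars = (n_files + n_dirs) * AVG_SUMMARY_CHARS
    return total_chars // CHARS_PER_TOKEN

# ~1000 file slices and ~100 subdirectories from a 100MB Word document:
print(estimate_prompt_tokens(1000, 100))  # 73333, already past a 65536 limit
```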
2. VLM calls never pass max_tokens (openai_vlm.py L60-66; litellm_vlm.py has the same issue)
```python
kwargs = {
    "model": self.model or "gpt-4o-mini",
    "messages": [{"role": "user", "content": prompt}],
    "temperature": self.temperature,
    # max_tokens is never set!
}
response = client.chat.completions.create(**kwargs)
```
When max_tokens is not passed, vLLM allocates the entire context window to the input and 0 tokens to the output. Even if the prompt exceeds the limit by a single token, the request is rejected outright.
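A minimal sketch of the corresponding fix, always bounding the output budget. The build_completion_kwargs helper and its 4096 default are illustrative, not the project's actual API:

```python
def build_completion_kwargs(model, prompt, temperature, max_tokens=None):
    """Assemble chat.completions kwargs with a guaranteed output budget.

    Hypothetical helper: it always sets max_tokens so vLLM reserves room
    for the completion instead of handing the input the whole window.
    """
    return {
        "model": model or "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens or 4096,  # sensible default when unset
    }

kwargs = build_completion_kwargs("MiniMax-M2.5-AWQ", "Summarize ...", 0.2)
# kwargs["max_tokens"] == 4096
```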
Expected Behavior
Suggested Fix
Prompt truncation:
- Add a max_context_length option to VLMConfig, or fetch the model's context length automatically via the /v1/models endpoint (supported by both vLLM and OpenAI)
- Truncate file_summaries_str and children_abstracts_str in _generate_overview() so the total prompt stays within ~80% of the model context
- When truncating, prefer keeping a representative sample of summaries
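The truncation step could look like the sketch below. It keeps a simple prefix of the summary lines within a token budget; the real fix might instead sample representative summaries, and the chars-per-token ratio is an assumption:

```python
def truncate_summaries(lines, budget_tokens, chars_per_token=3):
    """Keep as many summary lines as fit in a rough token budget (sketch).

    Budget accounting uses an assumed chars/token ratio; dropped lines are
    replaced by a single marker so the model knows content was omitted.
    """
    budget_chars = budget_tokens * chars_per_token
    kept, used = [], 0
    for line in lines:
        if used + len(line) + 1 > budget_chars:
            kept.append(f"... ({len(lines) - len(kept)} summaries omitted)")
            break
        kept.append(line)
        used += len(line) + 1  # +1 for the joining newline
    return "\n".join(kept)
```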
max_tokens:
- Add a max_tokens option to VLMConfig
- Pass max_tokens in all VLM calls (get_completion, get_completion_async, get_vision_completion, etc.)
- Set a reasonable default (e.g. 4096)
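For the auto-detection option, the context length can be read from the /v1/models response. A sketch of parsing such a payload (vLLM includes a max_model_len field in its model cards; stock OpenAI does not, so the caller should fall back to the configured value when the field is absent):

```python
def context_length_from_models_payload(payload: dict, model: str):
    """Read max_model_len for `model` from a /v1/models response body.

    Returns None when the model is missing or the server (e.g. stock
    OpenAI) does not expose max_model_len, so callers can fall back
    to a configured max_context_length.
    """
    for card in payload.get("data", []):
        if card.get("id") == model and "max_model_len" in card:
            return card["max_model_len"]
    return None

sample = {"data": [{"id": "MiniMax-M2.5-AWQ", "max_model_len": 65536}]}
print(context_length_from_models_payload(sample, "MiniMax-M2.5-AWQ"))  # 65536
```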
Configuration example:
```json
{
  "vlm": {
    "provider": "openai",
    "model": "your-model",
    "api_key": "xxx",
    "max_tokens": 4096,
    "max_context_length": 65536
  }
}
```
Actual Behavior
Environment
- vLLM: 0.17.0
- Model: MiniMax-M2.5-AWQ (max_model_len=65536)
- Test document: 100MB+ Word file
Related Files
- openviking/storage/queuefs/semantic_processor.py — _generate_overview() (L536-599)
- openviking/storage/queuefs/semantic_dag.py — _overview_task() (L245-287)
- openviking/models/vlm/backends/openai_vlm.py — get_completion() (L57-68)
- openviking/models/vlm/backends/litellm_vlm.py — get_completion()
- openviking_cli/utils/config/vlm_config.py — VLMConfig class
- openviking/prompts/templates/semantic/overview_generation.yaml — prompt template
Minimal Reproducible Example
Error Logs
OpenViking Version
0.2.6
Python Version
3.11
Operating System
macOS
Model Backend
OpenAI
Additional Context
No response
Metadata
Assignees
Labels
bug (Something isn't working)
Type
Projects
Status
Done