[Bug]: VLM call overflow: _generate_overview() lacks prompt truncation and a max_tokens parameter #674

### Bug Description

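## Problem Description

  When importing a large document (for example a 100MB+ Word file), the prompt built by `_generate_overview()` in `semantic_processor.py` exceeds the model's context limit, which results in: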

  ```
  Failed to generate overview for viking://resources/...
  ```

  On the vLLM side, the error is:
  ```
  vllm.exceptions.VLLMValidationError: You passed 65537 input tokens and requested 0 output tokens.
  ```

 

### Steps to Reproduce

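 ## Root Cause Analysis

  The problem is caused by two defects that compound each other:

  ### 1. `_generate_overview()` concatenates the prompt without truncation (semantic_processor.py L560-573)

  All file summaries and subdirectory summaries are concatenated into the prompt **without any limit**: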

  ```python
  # No length limit of any kind!
  for idx, item in enumerate(file_summaries, 1):
      file_summaries_lines.append(f"[{idx}] {item['name']}: {item['summary']}")
  file_summaries_str = "\n".join(file_summaries_lines)

  children_abstracts_str = "\n".join(
      f"- {item['name']}/: {item['abstract']}" for item in children_abstracts
  )
  ```

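  Taking a 100MB Word document as an example: chunking produces 500-1000 files and 50-100 subdirectories, and once their summaries are concatenated the prompt can reach 20,000-55,000+ tokens, which easily exceeds the model's context limit.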

  ### 2. VLM calls do not pass `max_tokens` (openai_vlm.py L60-66; litellm_vlm.py likewise)

  ```python
  kwargs = {
      "model": self.model or "gpt-4o-mini",
      "messages": [{"role": "user", "content": prompt}],
      "temperature": self.temperature,
      # max_tokens is not set!
  }
  response = client.chat.completions.create(**kwargs)
  ```

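  When `max_tokens` is not passed, vLLM allocates the entire context window to the input and 0 tokens to the output. The request is rejected outright even if the prompt exceeds the limit by a single token, as in the error above (65537 input tokens against `max_model_len=65536`).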
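The arithmetic behind that rejection can be checked before the request is sent. The following is a minimal sketch, not code from the repository: `output_budget` and its parameters are hypothetical names, and the prompt token count is assumed to come from whatever tokenizer the backend uses.

```python
def output_budget(prompt_tokens: int, max_context_length: int, requested_max_tokens: int) -> int:
    """Return how many output tokens can be requested without overflowing the context.

    Hypothetical helper: raises if the prompt alone already fills the context window.
    """
    remaining = max_context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens but the context holds only "
            f"{max_context_length}; truncate the prompt first"
        )
    # Never ask for more output than the space left after the prompt.
    return min(requested_max_tokens, remaining)


# Numbers taken from this issue: 65537 input tokens, max_model_len=65536.
try:
    output_budget(65537, 65536, 4096)
except ValueError as exc:
    print(exc)
```

With the values from this report the check fails locally, so the caller could truncate the prompt instead of receiving a validation error from the server.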

### Expected Behavior

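## Suggested Fix

  ### Prompt truncation:
  - Add a `max_context_length` option to `VLMConfig`
  - Or fetch the model's context length automatically through the `/v1/models` endpoint (supported by both vLLM and OpenAI)
  - In `_generate_overview()`, truncate `file_summaries_str` and `children_abstracts_str` so that the total prompt stays within ~80% of the model's context
  - When truncating, prefer to keep a representative sample of the summaries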
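One way to read "representative sample" is evenly spaced sampling over the summary list. The sketch below illustrates that reading only; `estimate_tokens` and `fit_to_budget` are hypothetical names, and the 4-characters-per-token heuristic is an assumption that undercounts CJK text, so a real tokenizer should replace it.

```python
def estimate_tokens(text: str) -> int:
    """Crude size estimate (assumption): roughly one token per 4 characters."""
    return len(text) // 4 + 1


def fit_to_budget(lines: list[str], budget_tokens: int) -> list[str]:
    """Return an evenly spaced subset of `lines` whose estimated size fits the budget."""

    def cost(selected: list[str]) -> int:
        return sum(estimate_tokens(line) for line in selected)

    n = len(lines)
    k = n
    while k > 0:
        # Pick k entries spread evenly over the whole list, preserving order.
        picked = [lines[i * n // k] for i in range(k)]
        if cost(picked) <= budget_tokens:
            return picked
        k = k * 9 // 10  # shrink the sample by about 10% and retry
    return []


# Synthetic summaries standing in for file_summaries_lines.
summaries = [f"[{i}] file{i}.md: " + "x" * 40 for i in range(1000)]
kept = fit_to_budget(summaries, budget_tokens=2000)
print(len(kept), "of", len(summaries), "summaries kept")
```

Sampling across the whole list keeps entries from the start, middle, and end of the document, whereas cutting the joined string at a character offset would drop everything after the cut.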

  ### max_tokens:
  - Add a `max_tokens` option to `VLMConfig`
  - Pass `max_tokens` in every VLM call (`get_completion`, `get_completion_async`, `get_vision_completion`, and so on)
  - Set a reasonable default (for example 4096)
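A minimal sketch of the second change, assuming the request arguments are assembled the same way as in the `kwargs` excerpt quoted earlier. `build_completion_kwargs` and `DEFAULT_MAX_TOKENS` are hypothetical names, not existing code in the backends.

```python
from typing import Any, Optional

DEFAULT_MAX_TOKENS = 4096  # default value suggested in this issue


def build_completion_kwargs(
    prompt: str,
    model: Optional[str],
    temperature: float,
    max_tokens: Optional[int] = DEFAULT_MAX_TOKENS,
) -> dict[str, Any]:
    """Assemble chat-completion arguments, always including an output cap when one is set."""
    kwargs: dict[str, Any] = {
        "model": model or "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    if max_tokens is not None:
        # Explicit output cap so the server knows how much room to reserve for output.
        kwargs["max_tokens"] = max_tokens
    return kwargs


kwargs = build_completion_kwargs("Summarize this directory.", "your-model", 0.0)
print(kwargs["max_tokens"])
```

The resulting dict would then be passed as `client.chat.completions.create(**kwargs)`, as in the existing call.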

  ### Configuration example:
  ```json
  {
    "vlm": {
      "provider": "openai",
      "model": "your-model",
      "api_key": "xxx",
      "max_tokens": 4096,
      "max_context_length": 65536
    }
  }
  ```
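To show how the two proposed options could work together, here is a sketch that derives a prompt budget from the configuration above. `derive_budgets` is a hypothetical helper, and the 80% figure is the rough target suggested in this issue, not an existing setting.

```python
import json

CONFIG_TEXT = """
{
  "vlm": {
    "provider": "openai",
    "model": "your-model",
    "api_key": "xxx",
    "max_tokens": 4096,
    "max_context_length": 65536
  }
}
"""


def derive_budgets(config: dict, prompt_percent: int = 80) -> tuple[int, int]:
    """Return (prompt_budget, max_tokens) such that their sum fits in the context window."""
    vlm = config["vlm"]
    max_context = vlm["max_context_length"]
    max_tokens = vlm["max_tokens"]
    # Target roughly 80% of the context for the prompt (integer arithmetic).
    prompt_budget = max_context * prompt_percent // 100
    # If the output cap would not fit next to that, give the prompt what is left.
    if prompt_budget + max_tokens > max_context:
        prompt_budget = max_context - max_tokens
    return prompt_budget, max_tokens


prompt_budget, max_tokens = derive_budgets(json.loads(CONFIG_TEXT))
print(prompt_budget, max_tokens)
```

With the example values the prompt budget comes out at 52428 tokens, which leaves room for the 4096 output tokens inside the 65536-token window.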

### Actual Behavior

  ## Environment
  - vLLM: 0.17.0
  - Model: MiniMax-M2.5-AWQ (max_model_len=65536)
  - Test document: 100MB+ Word file

  ## Related Files
  - `openviking/storage/queuefs/semantic_processor.py`: `_generate_overview()` (L536-599)
  - `openviking/storage/queuefs/semantic_dag.py`: `_overview_task()` (L245-287)
  - `openviking/models/vlm/backends/openai_vlm.py`: `get_completion()` (L57-68)
  - `openviking/models/vlm/backends/litellm_vlm.py`: `get_completion()`
  - `openviking_cli/utils/config/vlm_config.py`: `VLMConfig` class
  - `openviking/prompts/templates/semantic/overview_generation.yaml`: prompt template

### Minimal Reproducible Example

```python

```

### Error Logs

```shell

```

### OpenViking Version

0.2.6

### Python Version

3.11

### Operating System

macOS

### Model Backend

OpenAI

### Additional Context

_No response_
