[Bug]: When adding a large resource file, it is not split effectively and meaningful .abstract.md and .overview.md are not generated #386

@qppq54s

Bug Description

When a large PDF file is added, for example a pharmacopoeia, the file is only split into chunks; no meaningful .abstract.md or .overview.md is generated.

Steps to Reproduce

  1. Configure ov.conf as follows
{
  "storage": {
    "workspace": "/Users/aaa/.openviking/workspace"
  },
  "log": {
    "level": "INFO",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "https://ark.cn-beijing.volces.com/api/v3",
      "api_key": "a5b*****f0177",
      "provider": "volcengine",
      "dimension": 1024,
      "model": "doubao-embedding-vision-250615"
    }
  },
  "vlm": {
    "provider" : "volcengine",
    "model"    : "doubao-seed-1-8-251228",
    "api_key"  : "a5b*****f0177",
    "api_base" : "https://ark.cn-beijing.volces.com/api/v3"
  }
}
  2. Run python -m openviking serve
  3. curl -X POST http://localhost:1933/api/v1/resources \
    -H "Content-Type: application/json" \
    -d '{"path": "https://www.ynxzy.com/temp/1686884123423.pdf"}'
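
For anyone who prefers to script the reproduction, the request in the last step can be sketched in Python using only the standard library. The endpoint, header, and payload are copied from the curl command above; the helper names `build_request` and `add_resource` and the 600-second timeout are illustrative assumptions, not part of OpenViking.

```python
import json
import urllib.request

# Endpoint and payload copied from the curl command in the report.
API_URL = "http://localhost:1933/api/v1/resources"
PDF_URL = "https://www.ynxzy.com/temp/1686884123423.pdf"


def build_request(api_url: str, resource_path: str) -> urllib.request.Request:
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps({"path": resource_path}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_resource(api_url: str, resource_path: str, timeout: float = 600.0) -> str:
    """Send the request and return the raw response body as text.

    The timeout is an assumed value; a large PDF may need more or less.
    """
    request = build_request(api_url, resource_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server from step 2 running, `print(add_resource(API_URL, PDF_URL))` should send the same request as the curl command. The response format is not shown in this report, so the sketch returns it unparsed.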

Expected Behavior

The file is split effectively, and meaningful abstracts and overviews are generated.

Actual Behavior

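The whole file is only split into chunks.
The content of both .abstract.md and .overview.md is just "Directory overview".
The corresponding content cannot be found through the search method.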
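
To gauge how many summaries are affected, a small script along these lines could list every .abstract.md and .overview.md whose entire content is the placeholder text. This is only a sketch: it assumes the generated files are stored as regular files somewhere under the configured `storage.workspace` directory, which this report does not confirm, and the function name `find_placeholder_summaries` is hypothetical.

```python
from pathlib import Path

# File names and placeholder text as described in this report.
SUMMARY_NAMES = (".abstract.md", ".overview.md")
PLACEHOLDER = "Directory overview"


def find_placeholder_summaries(root: Path) -> list[Path]:
    """Return summary files under root whose whole content is the placeholder."""
    hits = []
    for name in SUMMARY_NAMES:
        # Assumption: summaries live on disk under root with these exact names.
        for path in root.rglob(name):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="replace").strip()
            if text == PLACEHOLDER:
                hits.append(path)
    return sorted(hits)
```

For example, `find_placeholder_summaries(Path("/Users/aaa/.openviking/workspace"))` would return the affected paths if the layout assumption holds. The comparison is an exact match after trimming whitespace, so a summary that merely mentions the phrase is not flagged.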

Minimal Reproducible Example

Error Logs

OpenViking Version

0.1.18

Python Version

3.11.6

Operating System

macOS

Model Backend

Volcengine (Doubao)

Additional Context

No response

Metadata

Assignees

Labels

bug (Something isn't working)

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
