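RAG Assistant

A retrieval-augmented generation (RAG) system built on LangChain, with multi-source document loading, hybrid retrieval, and multi-turn conversation.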

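Features

Offline indexing, online retrieval: index building is decoupled from the chat service, so the chat service starts quickly
Multi-source data: supports txt, md, and pdf (including OCR for embedded images)
Hybrid retrieval: BM25 keyword search combined with vector semantic search
Incremental updates: only changed files are processed, so no full rebuild is needed
Source citations: answers cite the sources they draw on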
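
How the BM25 and vector result lists are merged is not stated here. One common approach is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the document IDs, and the constant `k=60` are illustrative and not taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still appears in the fused result.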

Project structure

rag_assistant/
├── build_index.py        # Offline index builder (run occasionally)
├── main.py               # Chat entry point (run frequently)
├── core/
│   ├── indexer.py        # Index builder
│   ├── retriever.py      # Retriever loading
│   └── agent.py          # Agent construction
├── loaders/              # Multi-source data loaders
│   ├── base.py           # Loader base class
│   ├── text_loader.py    # Text loader
│   └── pdf_loader.py     # PDF loader (with OCR)
├── tools/
│   └── knowledge_search.py
├── prompts/
│   └── templates.py
├── data/                 # Knowledge base documents (put files here)
├── chroma_db/            # Vector database (auto-generated)
└── index_meta.json       # Index metadata (auto-generated)

Quick start

1. Install dependencies

pip install -r requirements.txt

2. Configure environment variables

cp .env.example .env
# Edit .env and fill in your API key

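3. Prepare the knowledge base

Place your documents in the data/ directory. Supported formats:

.txt - plain text
.md - Markdown
.pdf - PDF (embedded images are OCR'd automatically)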
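
As a rough illustration of how files could be routed to a loader by extension, here is a minimal sketch. The names `load_text`, `load_pdf`, `LOADERS`, and `pick_loader` are hypothetical and are not the classes in loaders/. The PDF branch is a placeholder only.

```python
from pathlib import Path


def load_text(path: Path) -> str:
    """Read a plain-text or Markdown file as UTF-8."""
    return path.read_text(encoding="utf-8")


def load_pdf(path: Path) -> str:
    # Placeholder: a real loader would extract page text and OCR embedded images.
    raise NotImplementedError("PDF parsing is out of scope for this sketch")


# Hypothetical registry mapping a lowercase file suffix to a loader function.
LOADERS = {".txt": load_text, ".md": load_text, ".pdf": load_pdf}


def pick_loader(path: Path):
    """Return the loader for this file, or None if the format is unsupported."""
    return LOADERS.get(path.suffix.lower())


for name in ["notes.txt", "README.MD", "paper.pdf", "slides.pptx"]:
    loader = pick_loader(Path(name))
    print(name, "->", loader.__name__ if loader else "skipped")
```

Matching on the lowercased suffix means files such as README.MD are still picked up, and unsupported formats are skipped rather than raising an error.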

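4. Build the index (on first use or after documents change)

# Incremental update (recommended; only changed files are processed)
python build_index.py

# Full rebuild
python build_index.py --rebuild

# Show index status
python build_index.py --status

# Clear the index
python build_index.py --clear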
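
For readers who want to see how such a command line might be wired up, the sketch below declares the same flags with `argparse`, using the defaults listed under the configuration options. It is an assumption about the shape of the parser, not the actual contents of build_index.py. The help strings are illustrative.

```python
import argparse


def build_parser():
    """Declare the documented build_index.py flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="build_index.py")
    parser.add_argument("--rebuild", action="store_true", help="force a full rebuild")
    parser.add_argument("--status", action="store_true", help="show index status")
    parser.add_argument("--clear", action="store_true", help="clear the index")
    parser.add_argument("--data-dir", default="./data", help="data directory")
    parser.add_argument("--embedding-model", default="all-MiniLM-L6-v2")
    parser.add_argument("--chunk-size", type=int, default=500)
    parser.add_argument("--chunk-overlap", type=int, default=50)
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--rebuild", "--chunk-size", "800"])
print(args.rebuild, args.chunk_size, args.chunk_overlap, args.data_dir)
```

`argparse` converts dashes to underscores, so `--chunk-size` is read back as `args.chunk_size`.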

5. Start a conversation

# In-memory mode (default)
python main.py

# SQLite persistence mode
python main.py --sqlite

# Resume a previous session
python main.py --sqlite --thread-id <your-thread-id>
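
To make the idea of a persisted, resumable session concrete, the sketch below stores messages in SQLite keyed by a thread ID, using only the standard library. The project may well rely on a LangGraph checkpointer instead; the table layout and function names here are assumptions made for illustration.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) a message store; ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "thread_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def append_message(conn, thread_id, role, content):
    """Append one message to a thread and commit it."""
    with conn:
        conn.execute(
            "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
            (thread_id, role, content),
        )


def load_history(conn, thread_id):
    """Return the thread's messages in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    )
    return [(role, content) for role, content in rows]


conn = open_store()
thread_id = "demo-thread"  # hypothetical ID; any unique string would do
append_message(conn, thread_id, "user", "What formats are supported?")
append_message(conn, thread_id, "assistant", "txt, md, and pdf.")
print(load_history(conn, thread_id))
```

Passing a file path instead of `:memory:` would let a later run reload the same thread, which is the behavior the `--thread-id` flag describes.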

Usage flow

┌────────────────────────────────────────────────────────────────────────┐
│                               Usage flow                               │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  First use / after documents change:                                   │
│  ┌────────────────────┐     ┌───────────────────────┐                  │
│  │ Add files to data/ │ --> │ python build_index.py │                  │
│  └────────────────────┘     └───────────────────────┘                  │
│                                         │                              │
│                                         ▼                              │
│                                chroma_db/ (index)                      │
│                                         │                              │
│  Everyday use:                          ▼                              │
│                                ┌────────────────┐                      │
│                                │ python main.py │ (starts in seconds)  │
│                                └────────────────┘                      │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

Configuration options

build_index.py options

Option	Description	Default
`--rebuild`	Force a full rebuild	False
`--status`	Show index status	-
`--clear`	Clear the index	-
`--data-dir`	Data directory	./data
`--embedding-model`	Embedding model	all-MiniLM-L6-v2
`--chunk-size`	Chunk size	500
`--chunk-overlap`	Chunk overlap	50
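
To show how `--chunk-size` and `--chunk-overlap` interact, here is a minimal fixed-window splitter. It assumes both values are measured in characters, which is not stated here, and it ignores sentence or paragraph boundaries that a real text splitter would usually respect. `split_text` is a hypothetical helper, not the project's indexer.

```python
def split_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each new window
    starts chunk_size - chunk_overlap characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


small = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(small)

sizes = [len(c) for c in split_text("x" * 1200)]
print(sizes)
```

With the defaults, a 1200-character document yields three chunks, and each chunk begins with the last 50 characters of the one before it, so text that straddles a boundary is still retrievable from at least one chunk.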

main.py options

Option	Description
`--sqlite`	Persist conversations in SQLite
`--thread-id`	Specify the session ID

Tech stack

Component	Technology
Framework	LangChain + LangGraph
LLM	DeepSeek-R1 / others
Vector database	Chroma
Embedding	sentence-transformers
Retrieval strategy	BM25 + Vector
OCR	RapidOCR

Index metadata

index_meta.json records information about the index:

{
  "version": "1.0",
  "created_at": "2024-12-29T10:00:00",
  "updated_at": "2024-12-29T15:00:00",
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
  "chunk_size": 500,
  "total_chunks": 15,
  "files": [
    {
      "path": "data/doc.pdf",
      "hash": "a1b2c3...",
      "chunks": 5,
      "indexed_at": "2024-12-29T10:00:00"
    }
  ]
}
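
The per-file `hash` field suggests how incremental updates could decide what to reprocess. The sketch below compares current file hashes with the recorded ones. The choice of SHA-256 and the helper names `file_hash` and `diff_against_meta` are assumptions, since the hashing scheme is not specified here.

```python
import hashlib


def file_hash(path):
    """Hash a file's bytes in blocks (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def diff_against_meta(current, meta):
    """Compare {path: hash} for files on disk with a parsed index_meta.json.

    Returns (added, changed, removed) as sorted lists of paths.
    """
    indexed = {entry["path"]: entry["hash"] for entry in meta.get("files", [])}
    added = sorted(p for p in current if p not in indexed)
    changed = sorted(p for p in current if p in indexed and indexed[p] != current[p])
    removed = sorted(p for p in indexed if p not in current)
    return added, changed, removed


# Hypothetical metadata and on-disk state; hashes are shortened for readability.
meta = {"files": [{"path": "data/a.txt", "hash": "h1"},
                  {"path": "data/b.md", "hash": "h2"}]}
current = {"data/a.txt": "h1", "data/b.md": "h2-edited", "data/c.pdf": "h3"}
added, changed, removed = diff_against_meta(current, meta)
print(added, changed, removed)
```

Only the added and changed files would need to be re-chunked and re-embedded, and chunks belonging to removed files could be deleted from the vector store.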

Future directions

License

MIT
