feat: add test inclusion for vectorDB tests in vitest configuration #6358

Merged
c121914yu merged 2 commits into labring:ob-test from alswl:vectordb-tests
Jan 31, 2026
@alswl alswl commented Jan 31, 2026

Contributor

Title: feat(test): Add vector database integration tests (Issue #6194)

Description:

This PR adds integration tests for FastGPT's vector database layer. They validate the PGVector controller against a real PostgreSQL + pgvector instance to ensure the compatibility and stability of vector operations.

Changes:

  • New test/vectorDB/ directory

    • fixtures.ts: Shared test data (TEST_TEAM_ID, TEST_DATASET_ID, TEST_COLLECTION_ID, and 1536-dim vectors) for PG and future Oceanbase/Milvus tests.
    • pg.integration.test.ts: Integration tests for PgVectorCtrl covering init, insert, getVectorCount, embRecall, getVectorDataByTime, and delete; no mocks, uses a real PostgreSQL + pgvector database.
    • README.md: Documents the required env vars (e.g. PG_URL) and how to run the integration tests.
  • Vitest configuration

    • Added test/vectorDB/**/*.test.ts to test.include in vitest.config.mts.

How to run:

  • Without PG_URL, the vectorDB integration suite is skipped as a whole; existing unit tests are unaffected.
  • With PG_URL set: pnpm test test/vectorDB

Ref: Closes #6194

@gru-agent

gru-agent Bot commented Jan 31, 2026

Contributor

TestGru Assignment

Summary

Link    CommitId  Status      Reason
Detail  83203f5   🚫 Skipped  No files need to be tested

Files skipped (each with the reason "File path does not match include patterns."):

  • test/vectorDB/README.md
  • test/vectorDB/fixtures.ts
  • test/vectorDB/pg.integration.test.ts
  • vitest.config.mts


@cla-assistant

cla-assistant Bot commented Jan 31, 2026


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions

github-actions Bot commented Jan 31, 2026


Preview mcp_server Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_mcp_server_7deca0fd4e1dc71cbb31e9045de3bd9f92a802b5

@c121914yu

c121914yu commented Jan 31, 2026

Collaborator
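  1. The tests need environment variables for connecting to the database. Provide a .env.test.template under the test directory, and have setup.ts read .env.test.local as the environment variables for tests.
  2. Consider a factory pattern, so that one shared data set and test suite can be run against n vector databases.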
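
The two suggestions above can be sketched together as follows. This is a minimal illustration and not the code that was merged: the `VectorCtrl` interface, `DriverSpec`, `runForDrivers`, the in-memory controller, and the `OCEANBASE_URL` variable name are all hypothetical. Only `PG_URL` and the idea of skipping a driver whose variable is unset come from this PR.

```typescript
// Hypothetical minimal controller surface; the real VectorControllerType is richer.
interface VectorCtrl {
  init(): Promise<void>;
  insert(vectors: number[][]): Promise<number>; // resolves to the inserted count
  count(): Promise<number>;
}

// One registry entry per vector database: the env var that enables it and a constructor.
interface DriverSpec {
  name: string;
  envName: string;
  create(url: string): VectorCtrl;
}

type Outcome = { driver: string; status: 'passed' | 'failed' | 'skipped'; detail?: string };

// The shared suite: written once, executed against every registered driver.
async function sharedSuite(ctrl: VectorCtrl): Promise<void> {
  await ctrl.init();
  const inserted = await ctrl.insert([[0.1, 0.2], [0.3, 0.4]]);
  if (inserted !== 2) throw new Error(`expected 2 inserted, got ${inserted}`);
  const total = await ctrl.count();
  if (total !== 2) throw new Error(`expected count 2, got ${total}`);
}

// Factory runner: skip drivers whose env var is unset, run the same suite for the rest.
async function runForDrivers(
  drivers: DriverSpec[],
  env: Record<string, string | undefined>
): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const d of drivers) {
    const url = env[d.envName];
    if (!url) {
      outcomes.push({ driver: d.name, status: 'skipped', detail: `${d.envName} not set` });
      continue;
    }
    try {
      await sharedSuite(d.create(url));
      outcomes.push({ driver: d.name, status: 'passed' });
    } catch (err) {
      outcomes.push({ driver: d.name, status: 'failed', detail: String(err) });
    }
  }
  return outcomes;
}

// In-memory stand-in so the sketch runs without a database.
function memoryCtrl(): VectorCtrl {
  const rows: number[][] = [];
  return {
    init: async () => {},
    insert: async (vectors) => {
      rows.push(...vectors);
      return vectors.length;
    },
    count: async () => rows.length
  };
}

const drivers: DriverSpec[] = [
  { name: 'pg', envName: 'PG_URL', create: () => memoryCtrl() },
  { name: 'oceanbase', envName: 'OCEANBASE_URL', create: () => memoryCtrl() } // assumed variable name
];
```

In a real setup the runner would receive `process.env` (after test/.env.test.local has been loaded) and each `create` would build the actual controller for that database.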

@github-actions

github-actions Bot commented Jan 31, 2026


Preview sandbox Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_sandbox_7deca0fd4e1dc71cbb31e9045de3bd9f92a802b5

@github-actions

github-actions Bot commented Jan 31, 2026


Preview fastgpt Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_7deca0fd4e1dc71cbb31e9045de3bd9f92a802b5

- Enhanced README to clarify the use of factory pattern for vectorDB integration tests.
- Updated instructions for setting up environment variables from a local file.
- Removed obsolete PG integration test file and adjusted test execution instructions.
- Improved structure explanation for shared test data and factory functions.
@alswl

alswl commented Jan 31, 2026

Contributor Author

@c121914yu fixed.

@c121914yu c121914yu changed the base branch from main to ob-test January 31, 2026 06:55
@c121914yu

Collaborator

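Code review feedback

Thanks for submitting this PR with vector database integration tests! It is a valuable improvement that gives the vector database layer integration-test coverage against a real environment. Detailed review feedback follows:

✅ Strengths

  1. Well-designed architecture

    • factory.ts uses a factory pattern, so the same test cases can be reused across multiple vector database drivers
    • Clear separation of concerns: fixtures (test data), factory (test cases), integration (driver registration)
    • Easy to extend: adding a new vector database (Oceanbase, Milvus) only requires adding a driver configuration
  2. Disciplined environment variable management

    • A new .env.test.template provides a clear configuration template
    • Environment variables are read from .env.test.local, which avoids leaking sensitive information
    • Tests are skipped automatically when the environment variables are not configured, so existing unit tests are unaffected
  3. Comprehensive test coverage

    • Covers all core methods of the VectorControllerType interface
    • Validates the structure of returned data with Zod schemas, ensuring type safety
    • The afterEach cleanup prevents data from leaking between tests
  4. Thorough documentation

    • README.md gives clear run instructions and explains the structure
    • Comments in both Chinese and English make team collaboration easier

🔍 Suggested improvements

1. Test data generation

File: fixtures.ts

// Current implementation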
function makeVector(seed: number): number[] {
  const vec: number[] = [];
  for (let i = 0; i < VECTOR_DIM; i++) {
    vec.push((Math.sin(seed * 1000 + i * 0.1) * 0.5 + 0.5) * 0.01);
  }
  return vec;
}

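Suggestions:

  • The vector values are very small (every component falls in [0, 0.01]), which may cause normalization problems and inaccurate similarity scores
  • Prefer properly normalized vectors, or values that mimic real embeddings
  • Consider validating the vector data (an L2 norm check)

Improved example: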

function makeVector(seed: number): number[] {
  const vec: number[] = [];
  for (let i = 0; i < VECTOR_DIM; i++) {
    // Use a more realistic embedding value range
    vec.push(Math.sin(seed * 1000 + i * 0.1));
  }
  // Normalize to unit L2 length
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return vec.map(v => v / norm);
}
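
To make the concern concrete, the sketch below puts the two generators side by side and adds the suggested L2 norm check. The helper names (`makeVectorOriginal`, `makeVectorNormalized`, `assertUnitNorm`) are illustrative, not part of the PR. With the original generator every component is non-negative and at most 0.01, so the cosine similarity between any two fixture vectors can never be negative, and their inner product is bounded by 1536 × 0.01² ≈ 0.154.

```typescript
const DIM = 1536; // fixture dimension stated in the PR

function l2Norm(vec: number[]): number {
  return Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * (b[i] ?? 0), 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (l2Norm(a) * l2Norm(b));
}

// Original generator from fixtures.ts: components fall in [0, 0.01].
function makeVectorOriginal(seed: number): number[] {
  return Array.from(
    { length: DIM },
    (_, i) => (Math.sin(seed * 1000 + i * 0.1) * 0.5 + 0.5) * 0.01
  );
}

// Suggested generator: components in [-1, 1], then scaled to unit length.
function makeVectorNormalized(seed: number): number[] {
  const raw = Array.from({ length: DIM }, (_, i) => Math.sin(seed * 1000 + i * 0.1));
  const norm = l2Norm(raw);
  return raw.map((v) => v / norm);
}

// Hypothetical validation helper: throws unless the vector has (approximately) unit L2 norm.
function assertUnitNorm(vec: number[], tolerance: number = 1e-6): void {
  const norm = l2Norm(vec);
  if (Math.abs(norm - 1) > tolerance) {
    throw new Error(`expected unit L2 norm, got ${norm}`);
  }
}
```

Calling `assertUnitNorm(makeVectorNormalized(1))` passes, while the same check on `makeVectorOriginal(1)` throws, which is the kind of early signal the review asks for.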

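2. Test timeout configuration

File: factory.ts

The current timeout settings may not be flexible enough:

  • init(): 30000ms
  • Other tests: 15000ms

Suggestions:

  • The embRecall() test may need more time (especially on the first query)
  • Consider extracting the timeouts into constants so they can be adjusted in one place

const TIMEOUT = {
  INIT: 30000,
  QUERY: 20000,  // embRecall may be slower
  DEFAULT: 15000
};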

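3. Safer environment variable loading

File: test/setup.ts

The current environment variable parsing logic is fairly simple. Suggested enhancements:

// Suggested additions
// 1. Handle comments (including lines with a # partway through)
// 2. Handle empty values
// 3. Handle equals signs inside quotes
// 4. Use the dotenv library instead of manual parsing

Suggested change: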

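import { existsSync } from 'node:fs';
import path from 'node:path';
import { config } from 'dotenv';

// Load test/.env.test.local if it exists; override: false keeps values already set in the environment
const envTestLocalPath = path.resolve(process.cwd(), 'test', '.env.test.local');
if (existsSync(envTestLocalPath)) {
  config({ path: envTestLocalPath, override: false });
}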
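
If adding dotenv as a dependency is not desirable, the first three points could also be covered by a small hand-written parser. The following is only a sketch under simplified rules that are assumptions here, not dotenv's exact behavior: quoted values are taken literally up to the matching closing quote, and an unquoted `#` starts a comment only at the beginning of the value or after whitespace. `parseEnvFile` and `applyEnv` are hypothetical names.

```typescript
// Parse KEY=VALUE lines into a map (simplified rules, see the note above).
function parseEnvFile(content: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // blank line or full-line comment
    const eq = line.indexOf('='); // split at the FIRST '=', so later '=' stay in the value
    if (eq <= 0) continue; // no key present
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value[0];
    if (quote === '"' || quote === "'") {
      // Quoted: keep everything up to the closing quote, including '=' and '#'.
      const end = value.indexOf(quote, 1);
      value = end === -1 ? value.slice(1) : value.slice(1, end);
    } else {
      // Unquoted: drop an inline comment that starts the value or follows whitespace.
      const hash = value.search(/(^|\s)#/);
      if (hash !== -1) value = value.slice(0, hash).trim();
    }
    result[key] = value; // an empty value is stored as ''
  }
  return result;
}

// Copy parsed values into env without overriding existing entries
// (the same intent as override: false in the dotenv call).
function applyEnv(parsed: Record<string, string>, env: Record<string, string | undefined>): void {
  for (const [key, value] of Object.entries(parsed)) {
    if (env[key] === undefined) env[key] = value;
  }
}
```

A connection string such as `PG_URL="postgresql://user:pass@localhost:5432/db?sslmode=disable" # local` then keeps its embedded `=` and loses only the trailing comment.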

4. Test cleanup strategy

File: factory.ts

The current afterEach cleanup may fall short in some scenarios:

afterEach(async () => {
  if (!ctrl) return;
  try {
    await ctrl.delete({
      teamId: TEST_TEAM_ID,
      datasetIds: [TEST_DATASET_ID],
      collectionIds: [TEST_COLLECTION_ID]
    });
  } catch {
    // ignore cleanup errors
  }
});

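Suggestions:

  • Consider beforeAll + afterAll instead of afterEach (fewer database operations)
  • Or track test status and keep the data only when a test fails
  • Log cleanup failures to make debugging easier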
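
As one way to act on the last point, cleanup errors can be logged instead of silently swallowed. The `safeCleanup` wrapper below is a hypothetical helper, not something present in the PR; it never rethrows, so a failed cleanup still cannot fail the test, but the failure becomes visible.

```typescript
type CleanupResult = { ok: true } | { ok: false; error: string };

// Run a cleanup step, logging (rather than ignoring) any failure.
async function safeCleanup(
  label: string,
  cleanup: () => Promise<unknown>,
  log: (message: string) => void = console.warn
): Promise<CleanupResult> {
  try {
    await cleanup();
    return { ok: true };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    log(`[cleanup:${label}] failed: ${error}`);
    return { ok: false, error };
  }
}
```

Inside the existing hook this would read roughly `await safeCleanup('pg', () => ctrl.delete({ ... }))`, with the delete arguments unchanged from the snippet above.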

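5. Broader test coverage

File: factory.ts

The current test cases are fairly basic. Consider adding:

  1. Boundary conditions

    • Inserting an empty vector array
    • Large batch inserts (to test performance)
    • Queries with limit set to 0
  2. Error handling

    • Invalid vector dimensions
    • Duplicate inserts
    • Concurrent insert/delete
  3. Retrieval accuracy

    • Verify that results are ordered by score
    • Verify the forbidCollectionIdList and filterCollectionIdList logic

Example: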

it('embRecall() returns results sorted by score', async () => {
  await ctrl.insert({...});
  const res = await ctrl.embRecall({...});
  
  for (let i = 1; i < res.results.length; i++) {
    expect(res.results[i-1].score).toBeGreaterThanOrEqual(res.results[i].score);
  }
});

6. Missing concurrency tests

File: integration.test.ts

Consider adding tests for concurrent scenarios:

it.concurrent('handles concurrent insert operations', async () => {
  const promises = Array.from({ length: 10 }, (_, i) => 
    ctrl.insert({
      teamId: TEST_TEAM_ID,
      datasetId: TEST_DATASET_ID,
      collectionId: TEST_COLLECTION_ID,
      vectors: [makeVector(i)]
    })
  );
  
  await expect(Promise.all(promises)).resolves.not.toThrow();
});

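⚠️ Potential issues

  1. Hard-coded vector dimension

    • VECTOR_DIM = 1536 is fixed to the OpenAI embedding dimension
    • It needs to be parameterized if other embedding models (e.g. 512, 768, or 1024 dimensions) are to be supported
  2. Test data isolation

    • Fixed values such as TEST_TEAM_ID may cause conflicts when tests run concurrently
    • Consider random IDs or a timestamp suffix
  3. No performance baseline

    • Consider a simple performance assertion (e.g. inserting 1000 vectors should finish within X seconds)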
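
The first two points could be addressed together with a fixture builder that takes the dimension as a parameter and generates fresh IDs per call. This is a sketch only: `uniqueTestId`, `makeFixture`, and the `VectorFixture` shape are hypothetical, and it assumes the IDs are opaque strings. If the schema requires a specific format (for example a fixed-length hex string), the ID builder would need to change accordingly.

```typescript
let idCounter = 0;

// Build an ID from a timestamp, a per-process counter, and a random suffix.
// The counter guarantees uniqueness within one process; the other parts make
// collisions across parallel processes very unlikely.
function uniqueTestId(prefix: string): string {
  idCounter += 1;
  const time = Date.now().toString(36);
  const rand = Math.random().toString(36).slice(2, 8);
  return `${prefix}_${time}_${idCounter}_${rand}`;
}

interface VectorFixture {
  teamId: string;
  datasetId: string;
  collectionId: string;
  dim: number;
  vectors: number[][];
}

// Dimension and vector count are parameters instead of module-level constants.
function makeFixture(dim: number = 1536, count: number = 3): VectorFixture {
  const vectors = Array.from({ length: count }, (_, seed) =>
    Array.from({ length: dim }, (_, i) => Math.sin((seed + 1) * 1000 + i * 0.1))
  );
  return {
    teamId: uniqueTestId('team'),
    datasetId: uniqueTestId('dataset'),
    collectionId: uniqueTestId('collection'),
    dim,
    vectors
  };
}
```

Each test (or each driver run) would call `makeFixture()` once and clean up by its own IDs, so parallel runs no longer share rows.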

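📋 Other suggestions

  1. CI/CD integration

    • Consider provisioning a test PostgreSQL instance in CI (using the postgres service in GitHub Actions)
    • Or add an example GitHub Actions workflow
  2. Richer test reports

    • Consider labeling test results per vector database
    • Print connection info and basic configuration (with sensitive values masked)
  3. Optional mock mode

    • Consider offering a mock mode (using a MockVectorController) for quickly validating logic
    • The test suite could then run without a real database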
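
A mock controller along the lines of the last suggestion might look like the sketch below. The method names follow those listed in the PR description, and `results` / `score` follow the review's own example, but every argument and return shape here is an assumption rather than the real VectorControllerType signature. Scoring uses a brute-force inner product, which is only meant to validate test logic, not to mirror any database's index behavior.

```typescript
interface StoredVector {
  id: string;
  teamId: string;
  datasetId: string;
  collectionId: string;
  vector: number[];
}

interface RecallHit {
  id: string;
  collectionId: string;
  score: number;
}

// Hypothetical in-memory stand-in for a vector controller.
class MockVectorController {
  private rows: StoredVector[] = [];
  private nextId = 1;

  async insert(input: {
    teamId: string;
    datasetId: string;
    collectionId: string;
    vectors: number[][];
  }): Promise<{ insertIds: string[] }> {
    const { teamId, datasetId, collectionId } = input;
    const insertIds = input.vectors.map((vector) => {
      const id = String(this.nextId++);
      this.rows.push({ id, teamId, datasetId, collectionId, vector });
      return id;
    });
    return { insertIds };
  }

  async getVectorCount(filter: { teamId: string }): Promise<number> {
    return this.rows.filter((r) => r.teamId === filter.teamId).length;
  }

  // Brute-force recall: score each candidate by inner product, highest score first.
  async embRecall(input: {
    teamId: string;
    datasetIds: string[];
    vector: number[];
    limit: number;
    forbidCollectionIdList?: string[];
  }): Promise<{ results: RecallHit[] }> {
    const forbidden = new Set(input.forbidCollectionIdList ?? []);
    const results = this.rows
      .filter(
        (r) =>
          r.teamId === input.teamId &&
          input.datasetIds.includes(r.datasetId) &&
          !forbidden.has(r.collectionId)
      )
      .map((r) => ({
        id: r.id,
        collectionId: r.collectionId,
        score: r.vector.reduce((sum, v, i) => sum + v * (input.vector[i] ?? 0), 0)
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, input.limit);
    return { results };
  }

  // Remove a team's rows, optionally restricted to the given datasets.
  async delete(filter: { teamId: string; datasetIds?: string[] }): Promise<void> {
    this.rows = this.rows.filter(
      (r) =>
        r.teamId !== filter.teamId ||
        (filter.datasetIds !== undefined && !filter.datasetIds.includes(r.datasetId))
    );
  }
}
```

Registered as one more driver, such a mock would let the shared suite run in environments where no database URL is configured.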

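🎯 Summary

This is a clearly structured, extensible integration test implementation. The main areas to improve are:

  • More realistic test data and better vector values
  • More robust environment variable loading
  • More boundary-condition and error-handling tests
  • Concurrency and performance tests

Overall, this PR gives the vector database layer solid test coverage and is worth merging. The improvements above can be addressed incrementally in later iterations.

Review result: ✅ Approved (merge first, then continue optimizing)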

@c121914yu c121914yu merged commit c12402a into labring:ob-test Jan 31, 2026
3 of 4 checks passed
c121914yu added a commit that referenced this pull request Feb 2, 2026
* feat(vectordb): add OceanBase HNSW quantization (HNSW_SQ/HNSW_BQ) (#6348)

Support OceanBase vector index quantization via VECTOR_VQ_LEVEL:
- 32 (default): hnsw + inner_product
- 8: hnsw_sq + inner_product (2-3x memory savings)
- 1: hnsw_bq + cosine (~15x memory savings)

HNSW_BQ requires cosine distance per OceanBase docs.
Tested on OceanBase 4.3.5.5 (BP5).
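
The level-to-index mapping described above can be read as a small lookup. The sketch below is not the code from #6348; `resolveIndexConfig` and `IndexConfig` are hypothetical names, and throwing on an unsupported value is an assumption (the real implementation may fall back to the default instead).

```typescript
type IndexConfig = {
  indexType: 'hnsw' | 'hnsw_sq' | 'hnsw_bq';
  distance: 'inner_product' | 'cosine';
};

// Map a raw VECTOR_VQ_LEVEL value to an index type and distance function.
function resolveIndexConfig(rawLevel: string | undefined): IndexConfig {
  // Unset or empty means the default level, 32.
  const level = rawLevel === undefined || rawLevel === '' ? 32 : Number(rawLevel);
  switch (level) {
    case 32:
      return { indexType: 'hnsw', distance: 'inner_product' };
    case 8:
      return { indexType: 'hnsw_sq', distance: 'inner_product' };
    case 1:
      // HNSW_BQ requires cosine distance, per the commit message.
      return { indexType: 'hnsw_bq', distance: 'cosine' };
    default:
      // Assumption: reject anything else rather than guessing.
      throw new Error(`Unsupported VECTOR_VQ_LEVEL: ${rawLevel}`);
  }
}
```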

Closes #6202

* feat: add test inclusion for vectorDB tests in vitest configuration (#6358)

* feat: add test inclusion for vectorDB tests in vitest configuration

* refactor: update vectorDB README and setup for environment configuration

- Enhanced README to clarify the use of factory pattern for vectorDB integration tests.
- Updated instructions for setting up environment variables from a local file.
- Removed obsolete PG integration test file and adjusted test execution instructions.
- Improved structure explanation for shared test data and factory functions.

* perf: integrationTest

* feat: vector integration

---------

Co-authored-by: ZHANG Yixin <hi.yixinz@gmail.com>
Co-authored-by: Jingchao <alswlx@gmail.com>