) Support OceanBase vector index quantization via VECTOR_VQ_LEVEL:
- 32 (default): hnsw + inner_product
- 8: hnsw_sq + inner_product (2-3x memory savings)
- 1: hnsw_bq + cosine (~15x memory savings)

HNSW_BQ requires cosine distance per OceanBase docs. Tested on OceanBase 4.3.5.5 (BP5). Closes #6202
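The level-to-index mapping in this commit message can be sketched as a small lookup. This is a minimal illustration, not the PR's code: `IndexChoice`, `INDEX_BY_LEVEL`, and `resolveIndexChoice` are hypothetical names, and only the three level/index/distance pairs come from the message above.

```typescript
// Hypothetical types and names; only the three pairs below are taken from the commit message.
type VqLevel = 32 | 8 | 1;

interface IndexChoice {
  indexType: 'hnsw' | 'hnsw_sq' | 'hnsw_bq';
  distance: 'inner_product' | 'cosine';
}

const INDEX_BY_LEVEL: Record<VqLevel, IndexChoice> = {
  32: { indexType: 'hnsw', distance: 'inner_product' },
  8: { indexType: 'hnsw_sq', distance: 'inner_product' },
  // hnsw_bq is paired with cosine, as the commit message requires.
  1: { indexType: 'hnsw_bq', distance: 'cosine' }
};

// Resolve the raw VECTOR_VQ_LEVEL string; unset or empty falls back to 32.
function resolveIndexChoice(raw: string | undefined): IndexChoice {
  const value = raw?.trim() || '32';
  switch (value) {
    case '32':
      return INDEX_BY_LEVEL[32];
    case '8':
      return INDEX_BY_LEVEL[8];
    case '1':
      return INDEX_BY_LEVEL[1];
    default:
      // Rejecting unknown levels is an assumption of this sketch.
      throw new Error(`Unsupported VECTOR_VQ_LEVEL: ${value}`);
  }
}
```

Failing fast on an unknown level is a design choice of this sketch; the actual controller may instead fall back to the default.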
…6358)
* feat: add test inclusion for vectorDB tests in vitest configuration
* refactor: update vectorDB README and setup for environment configuration
  - Enhanced README to clarify the use of factory pattern for vectorDB integration tests.
  - Updated instructions for setting up environment variables from a local file.
  - Removed obsolete PG integration test file and adjusted test execution instructions.
  - Improved structure explanation for shared test data and factory functions.
There is too much information in the pull request to test.
Pull request overview
This PR introduces comprehensive integration tests for vector databases (PostgreSQL, OceanBase, SeekDB, Milvus) and adds support for OceanBase quantization with three levels (32, 8, 1) using different HNSW index types. The changes include refactoring the OceanBase/SeekDB controllers to support multiple database types, updating deployment configurations, and fixing Milvus API compatibility issues.
Changes:
- Added integration test suite for vector databases with factory pattern for shared test cases
- Implemented OceanBase vector quantization support with configurable HNSW index types (hnsw, hnsw_sq, hnsw_bq)
- Updated SeekDB and OceanBase docker-compose configurations with corrected connection strings and environment variables
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| vitest.config.mts | Excludes vector DB integration tests from main test suite |
| test/setup.ts | Minor whitespace formatting change |
| test/integrationTest/vectorDB/yml/docker-compose.yml | Docker compose configuration for test databases (PG, Milvus, OceanBase, SeekDB) |
| test/integrationTest/vectorDB/vitest.config.mts | Vitest configuration for vector DB integration tests |
| test/integrationTest/vectorDB/utils.ts | Utility for loading environment variables from .env.test.local |
| test/integrationTest/vectorDB/testSuites.ts | Reusable test suite factory for all vector databases |
| test/integrationTest/vectorDB/testData.ts | Test fixtures with vector data and ID generators |
| test/integrationTest/vectorDB/setup.ts | Test setup file that loads environment variables |
| test/integrationTest/vectorDB/seekdb/index.integration.test.ts | SeekDB integration tests |
| test/integrationTest/vectorDB/pg/index.integration.test.ts | PostgreSQL integration tests |
| test/integrationTest/vectorDB/oceanbase/index.integration.test.ts | OceanBase integration tests |
| test/integrationTest/vectorDB/milvus/index.integration.test.ts | Milvus integration tests |
| test/integrationTest/vectorDB/globalSetup.ts | Global setup for vector DB tests with environment logging |
| test/integrationTest/vectorDB/README.md | Documentation for vector DB integration tests |
| test/integrationTest/vectorDB/.env.test.tempalte | Environment template for test configuration (contains typo in filename) |
| test/integrationTest/READMD.md | Integration test directory documentation (contains typo in filename) |
| projects/app/.env.template | Updated vector quantization documentation and configuration |
| packages/service/common/vectorDB/seekdb/index.ts | Removed duplicate export |
| packages/service/common/vectorDB/oceanbase/index.ts | Refactored to support both OceanBase and SeekDB with quantization |
| packages/service/common/vectorDB/oceanbase/controller.ts | Refactored ObClass to support multiple database types |
| packages/service/common/vectorDB/milvus/index.ts | Fixed Milvus search API parameters and type handling |
| packages/service/common/vectorDB/controller.ts | Updated to pass type parameter to OceanBase/SeekDB constructors |
| packages/service/common/vectorDB/constants.ts | Added OceanBaseIndexConfig for quantization support |
| package.json | Added test:vector npm script |
| document/public/deploy/docker/* | Updated docker-compose files with corrected database URLs and ports |
| document/data/doc-last-modified.json | Updated documentation modification timestamp |
| document/content/docs/upgrading/4-14/4147.mdx | Added changelog entry for vector DB integration tests |
| document/content/docs/toc.mdx | Added link to version 4.14.7 documentation |
| deploy/templates/vector/* | Updated deployment templates with corrected configurations |
| deploy/init.mjs | Updated initial configuration with correct database URLs |
| deploy/docker/* | Updated docker configurations with corrected database URLs |
| # FastGPT integration tests |
| ## Contents |
| - vectorDB: vector databases |
| No newline at end of file |
The file name has a typo: 'READMD.md' should be 'README.md'. This should follow the standard README naming convention.
| - ./seekdb/data:/var/lib/mysql |
| - ./seekdb/config:/etc/mysql/conf.d |
The volume path change from '../seekdb/data' to './seekdb/data' could break existing deployments if users have data in the old path. This is a potentially breaking change that should be documented in the upgrade guide or migration notes. Consider keeping backward compatibility or providing clear migration instructions.
| - **fixtures.ts**: shared test data (`TEST_TEAM_ID`, `TEST_DATASET_ID`, `TEST_COLLECTION_ID`, 1536-dimensional `TEST_VECTORS`) used by all vector databases. |
| - **factory.ts**: factory function `runVectorDBTests(driver)`; one set of cases (init, insert, getVectorCount, embRecall, getVectorDataByTime, delete) reused by every driver. |
| - **integration.test.ts**: registers each driver (PG, later Oceanbase/Milvus) and decides whether to skip based on `driver.envKey`; every driver runs the same `runVectorDBTests(driver)`. |
| To add a vector database: add an entry (`name`, `envKey`, `createCtrl`) to the `drivers` array in `integration.test.ts`; fixtures and factory need no changes. |
The documentation refers to 'fixtures.ts', 'factory.ts', and 'integration.test.ts' files, but the actual implementation uses different file names ('testData.ts', 'testSuites.ts', and individual test files per database). The documentation should be updated to reflect the actual file structure.
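| - **fixtures.ts**: shared test data (`TEST_TEAM_ID`, `TEST_DATASET_ID`, `TEST_COLLECTION_ID`, 1536-dimensional `TEST_VECTORS`) used by all vector databases. |
| - **factory.ts**: factory function `runVectorDBTests(driver)`; one set of cases (init, insert, getVectorCount, embRecall, getVectorDataByTime, delete) reused by every driver. |
| - **integration.test.ts**: registers each driver (PG, later Oceanbase/Milvus) and decides whether to skip based on `driver.envKey`; every driver runs the same `runVectorDBTests(driver)`. |
| To add a vector database: add an entry (`name`, `envKey`, `createCtrl`) to the `drivers` array in `integration.test.ts`; fixtures and factory need no changes. |
| - **testData.ts**: shared test data (`TEST_TEAM_ID`, `TEST_DATASET_ID`, `TEST_COLLECTION_ID`, 1536-dimensional `TEST_VECTORS`) used by all vector databases. |
| - **testSuites.ts**: factory function `runVectorDBTests(driver)`; one set of cases (init, insert, getVectorCount, embRecall, getVectorDataByTime, delete) reused by every driver. |
| - **Per-driver integration test files**: each vector database (PG, later Oceanbase/Milvus, etc.) gets its own test entry point, skipped or run according to its environment variable; every driver runs the same `runVectorDBTests(driver)`. |
| To add a vector database: add an integration test file for that driver and reuse `testData.ts` and `testSuites.ts` in it; the two shared files need no changes. |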
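The factory pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `VectorDriver` interface, the injected `test` parameter, and `createMemoryDriver` are hypothetical, and the real `runVectorDBTests(driver)` in the PR takes only the driver and covers more cases (embRecall, getVectorDataByTime).

```typescript
// Hypothetical driver shape; the real controllers expose more methods and options.
interface VectorDriver {
  name: string;
  init(): Promise<void>;
  insert(datasetId: string, vector: number[]): Promise<string>;
  count(datasetId: string): Promise<number>;
  remove(datasetId: string): Promise<void>;
}

// A test-registration function; with Vitest you would pass `it` here.
type TestFn = (title: string, body: () => Promise<void>) => void;

// Registers the same cases for any driver, so each database only supplies its driver.
function runVectorDBTests(driver: VectorDriver, test: TestFn): void {
  const datasetId = `test_dataset_${driver.name}`;

  test(`${driver.name}: init`, () => driver.init());

  test(`${driver.name}: insert then count`, async () => {
    await driver.insert(datasetId, [0.1, 0.2, 0.3]);
    const n = await driver.count(datasetId);
    if (n !== 1) throw new Error(`expected 1 vector, got ${n}`);
  });

  test(`${driver.name}: delete`, async () => {
    await driver.remove(datasetId);
    const n = await driver.count(datasetId);
    if (n !== 0) throw new Error(`expected 0 vectors, got ${n}`);
  });
}

// In-memory stand-in used only to show how a driver plugs into the factory.
function createMemoryDriver(name: string): VectorDriver {
  const rows = new Map<string, number[][]>();
  return {
    name,
    async init() {},
    async insert(datasetId, vector) {
      const list = rows.get(datasetId) ?? [];
      list.push(vector);
      rows.set(datasetId, list);
      return `${datasetId}_${list.length}`;
    },
    async count(datasetId) {
      return rows.get(datasetId)?.length ?? 0;
    },
    async remove(datasetId) {
      rows.delete(datasetId);
    }
  };
}
```

Injecting `test` keeps the sketch independent of any runner; a per-driver test file would call the factory once with its own driver.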
| const describePg = isEnabled ? describe : describe.skip; |
| describePg('Seekdb Vector Integration', () => { |
The variable name 'describePg' is used for all vector database tests (SeekDB, PG, OceanBase, Milvus), not just PostgreSQL. This naming is misleading and should be renamed to something more generic like 'describeDB' or 'conditionalDescribe'.
| const describePg = isEnabled ? describe : describe.skip; |
| describePg('Seekdb Vector Integration', () => { |
| const describeDB = isEnabled ? describe : describe.skip; |
| describeDB('Seekdb Vector Integration', () => { |
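One way to avoid repeating a database-specific name in every test file is a small shared helper. The sketch below is an assumption, not part of the PR: `pickDescribe` and `DescribeFn` are hypothetical names, and the two describe functions are passed in so the snippet does not depend on a test runner.

```typescript
// Minimal shape of a describe-like function for this sketch.
type DescribeFn = (name: string, body: () => void) => void;

// Hypothetical helper: returns `describe` when the connection variable is set
// to a non-blank value, otherwise the skipping variant.
function pickDescribe(
  envValue: string | undefined,
  describe: DescribeFn,
  skip: DescribeFn
): DescribeFn {
  const isEnabled = Boolean(envValue && envValue.trim());
  return isEnabled ? describe : skip;
}
```

In a Vitest file this might be called as `pickDescribe(process.env.SEEKDB_URL, describe, describe.skip)`, with the result stored under a generic name such as `describeDB`.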
| const describePg = isEnabled ? describe : describe.skip; |
| describePg('PG Vector Integration', () => { |
The variable name 'describePg' is used for all vector database tests (SeekDB, PG, OceanBase, Milvus), not just PostgreSQL. This naming is misleading and should be renamed to something more generic like 'describeDB' or 'conditionalDescribe'.
| const describePg = isEnabled ? describe : describe.skip; |
| describePg('PG Vector Integration', () => { |
| const describeDB = isEnabled ? describe : describe.skip; |
| describeDB('PG Vector Integration', () => { |
| const describePg = isEnabled ? describe : describe.skip; |
| describePg('Oceanbase Vector Integration', () => { |
The variable name 'describePg' is used for all vector database tests (SeekDB, PG, OceanBase, Milvus), not just PostgreSQL. This naming is misleading and should be renamed to something more generic like 'describeDB' or 'conditionalDescribe'.
| const describePg = isEnabled ? describe : describe.skip; |
| describePg('Oceanbase Vector Integration', () => { |
| const describeDB = isEnabled ? describe : describe.skip; |
| describeDB('Oceanbase Vector Integration', () => { |
| VECTOR_VQ_LEVEL=32 |
| # PG |
| PG_URL=postgresql://username:password@localhost:6001/postgres |
| # OceanBase can be tested against a cloud service |
| # OCEANBASE_URL=mysql://root%40tenantname:tenantpassword@localhost:6005/mysql |
| # SeekDB vector database connection |
| SEEKDB_URL=mysql://root:seekdbpassword@127.0.0.1:6003/mysql |
| # Milvus vector database connection |
| MILVUS_ADDRESS=http://localhost:6002 |
| MILVUS_TOKEN= |
| No newline at end of file |
The file name has a typo: 'tempalte' should be 'template'. This should be '.env.test.template' to match the naming convention used in the codebase and referenced in the README.
| - ./ob/data:/root/ob |
| - ./ob/config:/root/.obd/cluster |
The volume path change from '../ob/data' to './ob/data' could break existing deployments if users have data in the old path. This is a potentially breaking change that should be documented in the upgrade guide or migration notes. Consider keeping backward compatibility or providing clear migration instructions.
| teamId: `test_team`, |
| datasetId: `test_dataset_${suffix}` |
The teamId is hardcoded as 'test_team' for all tests, which could cause test isolation issues if multiple test suites run concurrently. While the datasetId has a unique suffix, sharing the same teamId across concurrent tests could lead to race conditions or data conflicts, especially when tests clean up by teamId. Consider adding the same unique suffix to teamId to ensure complete test isolation.
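A minimal sketch of the suggested isolation, assuming a helper that builds both IDs from one suffix. `createTestIds` and the suffix scheme are hypothetical and not taken from the PR; only the `test_team` and `test_dataset_${suffix}` prefixes come from the snippet above.

```typescript
interface TestIds {
  teamId: string;
  datasetId: string;
}

// Hypothetical helper: derive both IDs from the same suffix so that cleanup
// by teamId in one suite cannot touch rows created by another suite.
function createTestIds(
  suffix: string = `${Date.now()}_${Math.random().toString(36).slice(2, 8)}`
): TestIds {
  return {
    teamId: `test_team_${suffix}`,
    datasetId: `test_dataset_${suffix}`
  };
}
```

Passing an explicit suffix keeps IDs reproducible when debugging; the default combines a timestamp with a short random string to make collisions between concurrent runs unlikely.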
| exclude: ['node_modules', 'dist'], |
| testTimeout: 60000, |
| hookTimeout: 60000, |
| fileParallelism: false, |
The test configuration sets 'fileParallelism: false', which means tests will run sequentially. However, the test suite structure with conditional describe blocks and unique dataset IDs suggests tests could potentially run in parallel. The comment or documentation should clarify whether this is a temporary limitation or a design decision, especially since parallel execution could significantly speed up test runs.
No description provided.