-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
feat(postgres): support vector/halfvec data types #11437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
commit: |
32d95fd to
b72e4bd
Compare
Resolved conflicts: - ColumnTypes.ts: Added both PostgreSQL vector types (vector, halfvec) and SAP vector types (half_vector, real_vector) - docker-compose.yml: Kept custom PostgreSQL image with pgvector extension - tests-linux.yml: Updated to use postgres-version matrix with custom pgvector image - docs/docs/entity/1-entities.md: Added vector column documentation to new documentation structure - Removed docs/zh_CN/entities.md as it was deleted in master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/driver/sap/SapDriver.ts (1)
535-566: Implement or remove doc claim about vectorOutputTypeNo occurrences of a { extra: { vectorOutputType: "Array" } } handler found in the repo. SapDriver.preparePersistentValue (src/driver/sap/SapDriver.ts:535–566) does not handle vector values; SapDriver only maps types to "real_vector"/"half_vector" (src/driver/sap/SapDriver.ts:648–653, 669–672). Use the Postgres driver’s vector handling as a reference (src/driver/postgres/PostgresDriver.ts ~645–729).
- Action: implement array<->vector coercion for persist (preparePersistentValue) and for result parsing (likely in SapQueryRunner), or remove the docs claim.
🧹 Nitpick comments (6)
.github/workflows/tests-linux.yml (2)
222-224: Use standard POSTGRES_USER env variableOfficial images expect
POSTGRES_USER.POSTGRES_USERNAMEis non‑standard and may be ignored, leading to surprises.- POSTGRES_USERNAME: postgres + POSTGRES_USER: postgres
218-229: Harden service healthcheckAdd a health start period to reduce flakiness on cold starts.
options: >- --health-cmd pg_isready + --health-start-period 30s --health-interval 10s --health-timeout 5s --health-retries 5src/driver/sap/SapDriver.ts (1)
669-675: Graceful downgrade maps vector types to varbinary on non‑Cloud HANAGood fallback, but add an integration test to assert DDL generation and round‑trip behavior when
isReleaseVersionOrGreater(...,"4.0")is false.Would you like me to add a test that boots a simulated pre‑4.0 environment and verifies
real_vector/half_vectornormalize tovarbinary?test/functional/database-schema/vectors/postgres/vector-similarity.ts (2)
25-37: Unnecessary clear; DB is already reloaded per test
reloadTestingDatabasesresets schema. The extrarepository.clear()is redundant.- await postRepository.clear() // Clear existing data + // DB is reloaded before each test; explicit clear is unnecessary
47-50: Cast param to vector to avoid type inference quirksBeing explicit prevents “could not determine data type” edge cases.
- `SELECT id, embedding FROM "post" ORDER BY embedding <-> $1 LIMIT 2`, + `SELECT id, embedding FROM "post" ORDER BY embedding <-> ($1)::vector LIMIT 2`,Apply similarly to other raw queries using vector/halfvec params.
docs/docs/entity/1-entities.md (1)
183-243: Add explicit pgvector extension note; duplication checked
- Append under the PostgreSQL bullet in the Note block of docs/docs/entity/1-entities.md:
> **Note**: > - **PostgreSQL**: Vector columns require the `pgvector` extension to be installed. +> Enable it in your database: +> `CREATE EXTENSION IF NOT EXISTS vector;`
- Duplication check: only one "### Vector columns" header found at line 183 in docs/docs/entity/1-entities.md.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
.github/workflows/tests-linux.yml(2 hunks)docker-compose.yml(1 hunks)docs/docs/drivers/sap.md(1 hunks)docs/docs/entity/1-entities.md(1 hunks)src/driver/postgres/PostgresDriver.ts(4 hunks)src/driver/postgres/PostgresQueryRunner.ts(1 hunks)src/driver/sap/SapDriver.ts(1 hunks)src/driver/types/ColumnTypes.ts(1 hunks)test/functional/database-schema/vectors/postgres/vector-similarity.ts(1 hunks)test/functional/migrations/vector/0000000000001-CreatePost.ts(1 hunks)test/functional/migrations/vector/vector-test.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- test/functional/migrations/vector/vector-test.ts
- src/driver/types/ColumnTypes.ts
- src/driver/postgres/PostgresDriver.ts
- docker-compose.yml
- src/driver/postgres/PostgresQueryRunner.ts
🧰 Additional context used
🧬 Code graph analysis (1)
test/functional/database-schema/vectors/postgres/vector-similarity.ts (1)
test/utils/test-utils.ts (3)
createTestingConnections(388-482)reloadTestingDatabases(500-505)closeTestingConnections(487-495)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (5)
.github/workflows/tests-linux.yml (2)
247-248: LGTM: Coveralls flag adjustment matches new matrix key
214-214: Verify GHCR image availability for Postgres 14 & 17
GHCR returned no .tags; I couldn't validate the images. Confirm naorpeled/typeorm-postgres exposes tags "pg14-postgis3-pgvectorv0.8.0" and "pg17-postgis3-pgvectorv0.8.0" or update .github/workflows/tests-linux.yml (line 214) to use available/pinned tags.src/driver/sap/SapDriver.ts (1)
649-653: Alias mapping for vector types looks correct
vector -> real_vectorandhalfvec -> half_vectorare appropriate for SAP HANA Cloud.test/functional/database-schema/vectors/postgres/vector-similarity.ts (1)
11-24: Ensure pgvector (Postgres vector extension) is available in the test DBSearch returned no explicit "CREATE EXTENSION ... vector" in tests. Either confirm the CI image pre‑installs pgvector, or add a test-setup step to run:
CREATE EXTENSION IF NOT EXISTS vector;
before creating test connections (e.g., in the before() hook that sets up connections in test/functional/database-schema/vectors/postgres/vector-similarity.ts).docs/docs/drivers/sap.md (1)
40-73: Docs: call out SAP HANA 2.0 fallback and clarify vectorOutputType origin
- The SAP docs should explicitly state that on SAP HANA 2.0 TypeORM falls back to varbinary (implemented in src/driver/sap/SapDriver.ts — the driver returns "varbinary" for real_vector/half_vector when the version check fails).
- The docs' claim that the driver can convert vectors to number[] via
{ extra: { vectorOutputType: "Array" } }is only exercised in tests (test/functional/database-schema/vectors/sap/sap-vector.ts sets that option). I could not find code in the driver that itself parses a vectorOutputType flag — either the option is passed through to the underlying HANA client or the conversion is performed elsewhere; clarify whether TypeORM or the @sap/hana-client performs the conversion and link to the implementation (or soften the wording to say "pass { extra: { vectorOutputType: 'Array' } } to the client/connection options to receive arrays" if the driver doesn't convert).Locations to check/adjust: docs/docs/drivers/sap.md; src/driver/sap/SapDriver.ts (version-gating / varbinary fallback); test/functional/database-schema/vectors/sap/sap-vector.ts (example using vectorOutputType).
| postgres-version: ["14", "17"] | ||
|
|
||
| services: | ||
| postgres: | ||
| image: postgis/postgis:${{ matrix.postgis-version }} | ||
| image: ghcr.io/naorpeled/typeorm-postgres:pg${{ matrix.postgres-version }}-postgis3-pgvectorv0.8.0 | ||
| ports: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supply‑chain risk: custom image from contributor namespace; pin by digest and/or move under org
Using ghcr.io/naorpeled/typeorm-postgres:pg${{ matrix.postgres-version }}-postgis3-pgvectorv0.8.0 introduces trust and reproducibility risk. Host the image under the typeorm org (or build in‑workflow) and pin by digest. Also parameterize the pgvector version so CI can bump it centrally.
Example tweak:
- image: ghcr.io/naorpeled/typeorm-postgres:pg${{ matrix.postgres-version }}-postgis3-pgvectorv0.8.0
+ image: ghcr.io/typeorm/postgres:pg${{ matrix.postgres-version }}-postgis3-pgvectorv${{ matrix.pgvector-version }}@sha256:<pindigest>And extend the matrix:
- postgres-version: ["14", "17"]
+ include:
+ - postgres-version: "14"
+ pgvector-version: "0.8.0"
+ - postgres-version: "17"
+ pgvector-version: "0.8.0"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| postgres-version: ["14", "17"] | |
| services: | |
| postgres: | |
| image: postgis/postgis:${{ matrix.postgis-version }} | |
| image: ghcr.io/naorpeled/typeorm-postgres:pg${{ matrix.postgres-version }}-postgis3-pgvectorv0.8.0 | |
| ports: | |
| include: | |
| - postgres-version: "14" | |
| pgvector-version: "0.8.0" | |
| - postgres-version: "17" | |
| pgvector-version: "0.8.0" | |
| services: | |
| postgres: | |
| image: ghcr.io/typeorm/postgres:pg${{ matrix.postgres-version }}-postgis3-pgvectorv${{ matrix.pgvector-version }}@sha256:<pindigest> | |
| ports: |
🤖 Prompt for AI Agents
In .github/workflows/tests-linux.yml around lines 214–219, the workflow
references an external contributor image
ghcr.io/naorpeled/typeorm-postgres:pg${{ matrix.postgres-version
}}-postgis3-pgvectorv0.8.0 which poses supply‑chain and reproducibility risks;
update the job to either (a) use an image hosted under our org (e.g.,
ghcr.io/typeorm/...) or build the image in‑workflow so we control the source,
(b) pin the selected image to a specific digest instead of a floating tag to
ensure immutability, (c) extract the pgvector version into a top-level parameter
(matrix or env like PGVECTOR_VERSION) so it can be bumped centrally, and (d)
extend the matrix to include any additional postgres versions desired; implement
these changes in the workflow and ensure ports/other references use the new
image variable or build output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/driver/postgres/PostgresDriver.ts (1)
756-764: Hydration: add halfvec and optional dimension check (use length as dimensions)
halfvecdeserialization is missing.- Optional validation against declared
length(dimensions) improves fail-fast behavior.Apply:
- } else if (columnMetadata.type === "vector") { - if ( - typeof value === "string" && - value.startsWith("[") && - value.endsWith("]") - ) { - if (value === "[]") return [] - return value.slice(1, -1).split(",").map(Number) - } + } else if ( + columnMetadata.type === "vector" || + columnMetadata.type === "halfvec" + ) { + if ( + typeof value === "string" && + value.startsWith("[") && + value.endsWith("]") + ) { + const arr = + value === "[]" + ? [] + : value.slice(1, -1).split(",").map((v) => Number(v)) + const expected = + columnMetadata.length != null && columnMetadata.length !== "" + ? parseInt(String(columnMetadata.length), 10) + : undefined + if (expected != null && arr.length !== expected) { + throw new TypeORMError( + `Invalid ${columnMetadata.type} length for column "${columnMetadata.databaseName}": expected ${expected}, got ${arr.length}`, + ) + } + return arr + }If throwing feels too strict for hydration, we can log or skip the check; DB constraints should already prevent mismatches.
🧹 Nitpick comments (2)
src/driver/postgres/PostgresQueryRunner.ts (1)
3494-3504: Vector/Halfvec length parsing: consider array forms and minor regex tighteningCurrent regex won’t match array types like
vector(3)[]. While length is later handled for arrays in the generic withLength block, you can make this block self-sufficient and avoid double work by also accepting an optional[].Apply this diff if you prefer to capture arrays here too:
- ].match(/^(?:vector|halfvec)\((\d+)\)$/) + ].match(/^(?:vector|halfvec)\((\d+)\)(?:\[\])?$/)src/driver/postgres/PostgresDriver.ts (1)
496-507: Auto-installing pgvector: OK, but message can clarify halfvecThe warning mentions “vector column” only. Consider “vector/halfvec” to be explicit. Functionality is fine.
Apply:
- "At least one of the entities has a vector column, but the 'vector' extension (pgvector) cannot be installed automatically. Please install it manually using superuser rights", + "At least one of the entities has a vector/halfvec column, but the 'vector' extension (pgvector) cannot be installed automatically. Please install it manually using superuser rights",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/driver/postgres/PostgresDriver.ts(10 hunks)src/driver/postgres/PostgresQueryRunner.ts(1 hunks)test/functional/database-schema/vectors/postgres/vector-similarity.ts(1 hunks)test/functional/database-schema/vectors/postgres/vector.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- test/functional/database-schema/vectors/postgres/vector.ts
- test/functional/database-schema/vectors/postgres/vector-similarity.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: tests-linux (20) / postgres (14)
- GitHub Check: tests-linux (20) / mongodb
- GitHub Check: tests-linux (20) / sqlite
- GitHub Check: tests-linux (20) / better-sqlite3
- GitHub Check: tests-linux (20) / postgres (17)
- GitHub Check: tests-linux (18) / sqlite
- GitHub Check: tests-linux (20) / oracle
- GitHub Check: tests-linux (18) / postgres (17)
- GitHub Check: tests-linux (18) / sap
- GitHub Check: tests-linux (18) / mysql_mariadb
- GitHub Check: tests-linux (18) / postgres (14)
- GitHub Check: tests-linux (18) / sqljs
- GitHub Check: tests-linux (18) / mssql
- GitHub Check: tests-linux (18) / mongodb
- GitHub Check: tests-linux (18) / oracle
- GitHub Check: tests-linux (18) / better-sqlite3
- GitHub Check: tests-windows / sqlite
- GitHub Check: tests-windows / sqljs
- GitHub Check: tests-windows / better-sqlite3
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (5)
src/driver/postgres/PostgresDriver.ts (5)
194-196: Additions to supported types look good
vectorandhalfveccorrectly included.
219-221: Length-capable types updated correctlyMaking
vector/halfveclength-aware aligns with pgvector dimensions.
576-586: Extension detection updated appropriately
hasVectorColumnscorrectly detects bothvectorandhalfvec; propagation intohasExtensionsis consistent.Also applies to: 600-611
1187-1190: Full type emission for vector/halfvec looks correctGenerates
vector(n)/halfvec(n)as expected; respects no-length case.
674-680: Add halfvec handling and avoid stringifying vector[]
- Serialize "vector" and "halfvec" only for scalar columns; for vector[] leave the value untouched and support typed arrays.
File: src/driver/postgres/PostgresDriver.ts (lines 674-680)
- } else if (columnMetadata.type === "vector") { - if (Array.isArray(value)) { - return `[${value.join(",")}]` - } else { - return value - } + } else if ( + columnMetadata.type === "vector" || + columnMetadata.type === "halfvec" + ) { + // Only serialize scalar vector columns. For vector[] columns let pg handle arrays. + if (!columnMetadata.isArray) { + const toArray = (v: any) => + Array.isArray(v) + ? v + : (ArrayBuffer.isView(v) ? Array.from(v as ArrayLike<number>) : undefined) + const arr = toArray(value) + return arr ? `[${arr.join(",")}]` : value + } + return valueAutomated search returned no matches for vector[] usage — confirm whether vector[] appears in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
docker-compose.yml (2)
52-52: Avoid personal registry; pin by digest and update comment context
- Prefer an org-owned image (or build in-repo) and pin by SHA256 digest for supply‑chain and reproducibility.
- The comments above still refer to the generic postgis image; update to reflect the custom pgvector image.
Apply this diff to refresh the comment:
- # postgis is postgres + PostGIS (only). If you need additional extensions, - # it's probably time to create a purpose-built image. + # Custom image includes PostGIS and pgvector (v0.8.0). + # Using a purpose-built image to run vector/halfvec tests.
65-65: Same as pg14: org-owned image and digest pinningMirror the pg14 guidance here: host under the TypeORM org (or build locally) and pin by digest.
src/driver/postgres/PostgresDriver.ts (1)
756-767: Dimension and NaN checks on hydration for symmetry with serializationAdd the same finite/length validation when parsing strings.
Apply this diff:
if ( typeof value === "string" && value.startsWith("[") && value.endsWith("]") ) { - if (value === "[]") return [] - return value.slice(1, -1).split(",").map(Number) + if (value === "[]") return [] + const arr = value + .slice(1, -1) + .split(",") + .map((s) => Number(s)) + if (arr.some((n) => !Number.isFinite(n))) { + throw new TypeORMError( + `Invalid ${columnMetadata.type} element on hydration: all elements must be finite numbers`, + ) + } + const expected = + columnMetadata.length != null + ? Number(columnMetadata.length) + : undefined + if ( + expected != null && + Number.isFinite(expected) && + arr.length !== expected + ) { + throw new TypeORMError( + `Invalid ${columnMetadata.type} length on hydration: expected ${expected}, got ${arr.length}`, + ) + } + return arr }
🧹 Nitpick comments (2)
docker-compose.yml (1)
55-56: Port collisions: postgres-14 and postgres-17 both expose 5432 on hostRunning both containers together will fail. Use distinct host ports.
Apply this diff:
postgres-17: @@ ports: - - "5432:5432" + - "5433:5432"Also applies to: 68-69
src/driver/postgres/PostgresDriver.ts (1)
496-507: Surface extension install errors to aid debuggingInclude the error message in the warning; today failures are silent aside from a generic note.
Apply this diff:
- if (hasVectorColumns) - try { + if (hasVectorColumns) + try { await this.executeQuery( connection, `CREATE EXTENSION IF NOT EXISTS "vector"`, ) - } catch (_) { - logger.log( + } catch (e: any) { + logger.log( "warn", - "At least one of the entities has a vector column, but the 'vector' extension (pgvector) cannot be installed automatically. Please install it manually using superuser rights", + `At least one of the entities has a vector column, but the 'vector' extension (pgvector) cannot be installed automatically. Please install it manually using superuser rights. Error: ${e?.message ?? e}`, ) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/tests-linux.yml(2 hunks)docker-compose.yml(2 hunks)docs/docs/entity/1-entities.md(1 hunks)src/driver/postgres/PostgresDriver.ts(10 hunks)src/driver/types/ColumnTypes.ts(1 hunks)test/functional/database-schema/vectors/postgres/vector-similarity.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- docs/docs/entity/1-entities.md
- src/driver/types/ColumnTypes.ts
- test/functional/database-schema/vectors/postgres/vector-similarity.ts
- .github/workflows/tests-linux.yml
🔇 Additional comments (5)
src/driver/postgres/PostgresDriver.ts (5)
194-196: Add vector/halfvec to supported types — LGTMThis unblocks metadata validation and schema diffs.
219-221: Mark vector/halfvec as length-capable — LGTMEnables vector(n)/halfvec(n) DDL generation.
576-583: Vector column detection — LGTMCovers both vector and halfvec across entities.
1190-1193: DDL rendering for vector/halfvec — LGTMGenerates vector(n)/halfvec(n) and respects arrays via the existing array suffix.
121-121: 'dimensions' option removed — 'length' is used for vector/halfvec columnsSearch shows @column("vector", { length: 3, ... }) and @column("halfvec", { length: 4, ... }) in test/functional/database-schema/vectors/postgres/entity/Post.ts; no occurrences of a "dimensions" option remain.
| } else if ( | ||
| columnMetadata.type === "vector" || | ||
| columnMetadata.type === "halfvec" | ||
| ) { | ||
| if (Array.isArray(value)) { | ||
| return `[${value.join(",")}]` | ||
| } else { | ||
| return value | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate vector payloads before sending to DB (finite numbers, dimension match)
Prevents malformed/NaN values and mismatched lengths from reaching Postgres.
Apply this diff:
} else if (
columnMetadata.type === "vector" ||
columnMetadata.type === "halfvec"
) {
if (Array.isArray(value)) {
- return `[${value.join(",")}]`
+ const arr = value.map((n) => Number(n))
+ if (arr.some((n) => !Number.isFinite(n))) {
+ throw new TypeORMError(
+ `Invalid ${columnMetadata.type} element: all elements must be finite numbers`,
+ )
+ }
+ const expected =
+ columnMetadata.length != null
+ ? Number(columnMetadata.length)
+ : undefined
+ if (
+ expected != null &&
+ Number.isFinite(expected) &&
+ arr.length !== expected
+ ) {
+ throw new TypeORMError(
+ `Invalid ${columnMetadata.type} length: expected ${expected}, got ${arr.length}`,
+ )
+ }
+ return `[${arr.join(",")}]`
} else {
return value
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/driver/postgres/PostgresDriver.ts around lines 671 to 679, the
vector/halfvec branch currently returns array values without validation; update
it to validate that the value is an array of numbers, every element is a finite
number (no NaN/Infinity), and the array length matches the expected dimension
(use columnMetadata.dimension or columnMetadata.size as appropriate), otherwise
throw a clear error; after validation, return the serialized form as before
(e.g. "[x,y,...]"); also handle the non-array case by throwing a validation
error rather than silently returning invalid data.
gioboa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👏
Description of change
Added
vectordata type support forpgvector.This is part of #11485.
Closes #10056.
Pull-Request Checklist
masterbranchnpm run formatto apply prettier formattingnpm run testpasses with this changeFixes #0000Summary by CodeRabbit
New Features
Documentation
Tests
Chores