feat(mssql): support 'vector' type for MS SQL Server #11732
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Pull Request Overview
This PR adds support for the vector column type in Microsoft SQL Server, enabling storage and querying of high-dimensional vector embeddings for machine learning and semantic search applications. The implementation follows SQL Server's native vector type specification and includes proper serialization/deserialization handling.
Key Changes
- Added vector type support to SQL Server driver with dimension validation and JSON serialization
- Implemented tests covering vector column creation, CRUD operations, and similarity searches
- Added comprehensive documentation and sample application demonstrating vector usage
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/driver/types/ColumnTypes.ts | Added mssql to the list of databases supporting vector type |
| src/driver/sqlserver/SqlServerDriver.ts | Implemented vector type handling including serialization, validation, and column definition |
| src/driver/sqlserver/SqlServerQueryRunner.ts | Added special length handling for vector columns to preserve dimensions |
| test/functional/database-schema/vectors/sqlserver/vector.ts | Comprehensive test suite covering vector operations and similarity searches |
| test/functional/database-schema/vectors/sqlserver/entity/*.ts | Test entities demonstrating vector column usage |
| sample/sample37-vector-sqlserver/* | Sample application with documentation showing real-world vector usage |
| docs/docs/entity/1-entities.md | Updated general vector documentation to include SQL Server support |
| docs/docs/drivers/microsoft-sqlserver.md | Added SQL Server-specific vector type documentation |
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sample/sample37-vector-sqlserver/README.md (1)
179-179: Fix typo in the NOTE.
Line 179 contains a typo: "supprot" should be "support".
Apply this diff:
-NOTE: general `halfvec` type supprot is unavailable because this feature is still in preview: https://learn.microsoft.com/en-us/sql/t-sql/data-types/vector-data-type
+NOTE: general `halfvec` type support is unavailable because this feature is still in preview: https://learn.microsoft.com/en-us/sql/t-sql/data-types/vector-data-type
🧹 Nitpick comments (1)
docs/docs/entity/1-entities.md (1)
241-254: Add security note about query construction in SQL Server vector examples.
The documentation example uses string interpolation for the vector parameter, which could mislead users into adopting this pattern in production code without understanding the security implications.
Consider adding a security note after the code example:
 const results = await dataSource.query(`
     DECLARE @question AS VECTOR (3) = '${JSON.stringify(queryEmbedding)}';
     SELECT TOP (5) id, embedding, VECTOR_DISTANCE('cosine', @question, embedding) AS distance
     FROM post
     ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
 `)
+
+> **Security Note**: When using vector queries with user-provided data, ensure proper input validation and sanitization. The example above uses string interpolation for demonstration purposes with trusted, application-generated embeddings. For production use with untrusted input, implement appropriate security measures.
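The "input validation" the note asks for can be made concrete. The following is a minimal sketch, not code from this PR: `toVectorLiteral` is a hypothetical helper that rejects anything other than a fixed-length array of finite numbers before the value is serialized with `JSON.stringify`.

```typescript
// Hypothetical helper (not part of TypeORM or this PR): validate an embedding
// before it is serialized for use in a SQL Server VECTOR query.
function toVectorLiteral(values: readonly number[], dimensions: number): string {
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
        throw new RangeError(`dimensions must be a positive integer, got ${dimensions}`)
    }
    if (values.length !== dimensions) {
        throw new RangeError(`expected ${dimensions} values, got ${values.length}`)
    }
    for (const v of values) {
        // Rejects NaN and Infinity, which JSON.stringify would turn into null.
        if (typeof v !== "number" || !Number.isFinite(v)) {
            throw new TypeError("embedding must contain only finite numbers")
        }
    }
    // For an array of finite numbers the output contains only digits, signs,
    // dots, exponents, commas and brackets, so it carries no quote characters.
    return JSON.stringify(values)
}
```

A validated literal still reads more safely when passed as a query parameter than when interpolated into the SQL text.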
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- docs/docs/drivers/microsoft-sqlserver.md (1 hunks)
- docs/docs/entity/1-entities.md (2 hunks)
- sample/sample37-vector-sqlserver/README.md (1 hunks)
- sample/sample37-vector-sqlserver/app.ts (1 hunks)
- sample/sample37-vector-sqlserver/entity/Document.ts (1 hunks)
- sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1 hunks)
- sample/sample37-vector-sqlserver/package.json (1 hunks)
- src/driver/sqlserver/SqlServerDriver.ts (7 hunks)
- src/driver/sqlserver/SqlServerQueryRunner.ts (1 hunks)
- src/driver/types/ColumnTypes.ts (1 hunks)
- test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (1 hunks)
- test/functional/database-schema/vectors/sqlserver/entity/Point.ts (1 hunks)
- test/functional/database-schema/vectors/sqlserver/vector.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (5)
- sample/sample37-vector-sqlserver/entity/Document.ts (1): Entity (9-19)
- sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1): Entity (10-27)
- test/functional/database-schema/vectors/sqlserver/entity/Point.ts (1): Entity (5-15)
- src/decorator/columns/PrimaryGeneratedColumn.ts (1): PrimaryGeneratedColumn (55-119)
- src/decorator/columns/Column.ts (1): Column (134-220)
sample/sample37-vector-sqlserver/app.ts (1)
- sample/playground/src/ormconfig.ts (1): AppDataSource (5-10)
test/functional/database-schema/vectors/sqlserver/entity/Point.ts (4)
- sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1): Entity (10-27)
- test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (1): Entity (5-18)
- src/decorator/columns/PrimaryGeneratedColumn.ts (1): PrimaryGeneratedColumn (55-119)
- src/decorator/columns/Column.ts (1): Column (134-220)
src/driver/sqlserver/SqlServerDriver.ts (2)
- src/error/TypeORMError.ts (1): TypeORMError (1-17)
- src/driver/sqlserver/SqlServerQueryRunner.ts (1): query (203-332)
sample/sample37-vector-sqlserver/entity/Document.ts (4)
- sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1): Entity (10-27)
- src/decorator/columns/PrimaryGeneratedColumn.ts (1): PrimaryGeneratedColumn (55-119)
- src/decorator/columns/Column.ts (1): Column (134-220)
- src/decorator/relations/OneToMany.ts (1): OneToMany (11-47)
sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (3)
- sample/sample37-vector-sqlserver/entity/Document.ts (1): Entity (9-19)
- src/decorator/relations/ManyToOne.ts (1): ManyToOne (33-77)
- src/decorator/relations/JoinColumn.ts (1): JoinColumn (31-48)
test/functional/database-schema/vectors/sqlserver/vector.ts (2)
- test/utils/test-utils.ts (3): createTestingConnections (388-482), reloadTestingDatabases (504-509), closeTestingConnections (487-499)
- src/driver/sqlserver/SqlServerQueryRunner.ts (1): query (203-332)
🪛 LanguageTool
docs/docs/drivers/microsoft-sqlserver.md
[grammar] ~179-~179: Ensure spelling is correct
Context: ...lications NOTE: general halfvec type supprot is unavailable because this feature is ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
sample/sample37-vector-sqlserver/README.md
[style] ~83-~83: The double modal “needed Run” is nonstandard (only accepted in certain dialects). Consider “to be Run”.
Context: ...ction settings in app.ts if needed 3. Run: ```bash npm install ts-nod...
(NEEDS_FIXED)
🪛 markdownlint-cli2 (0.18.1)
docs/docs/drivers/microsoft-sqlserver.md
179-179: Bare URL used
(MD034, no-bare-urls)
docs/docs/entity/1-entities.md
190-190: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
191-191: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
192-192: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
194-194: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
195-195: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🔇 Additional comments (19)
sample/sample37-vector-sqlserver/package.json (1)
1-19: LGTM!
The package.json configuration is well-structured for a sample project. Dependencies are appropriate:
- `mssql ^11.0.0` provides SQL Server connectivity
- `reflect-metadata ^0.2.2` is the latest stable version
- TypeScript tooling is properly configured
Note: Using `"typeorm": "latest"` is acceptable for samples but should be pinned to a specific version in production applications.
src/driver/types/ColumnTypes.ts (1)
78-78: LGTM!
The change correctly adds `mssql` to the list of databases supporting the `vector` column type. This aligns with the PR's objective of enabling vector support for Microsoft SQL Server.
docs/docs/drivers/microsoft-sqlserver.md (1)
170-225: Excellent documentation for vector type support.
The new section provides comprehensive documentation including:
- Clear explanation of vector type usage for embeddings and semantic search
- Entity definition example with vector column
- VECTOR_DISTANCE usage for similarity search
- Documentation of all three distance metrics
This will be very helpful for developers implementing vector-based features.
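To make the distance metrics more tangible, the two that appear in this review's queries (`'cosine'` and `'euclidean'`) can be illustrated client-side. This is a sketch using the textbook definitions, with cosine distance taken as 1 minus cosine similarity; that SQL Server's `VECTOR_DISTANCE` follows the same definitions is an assumption here, and the function names are hypothetical.

```typescript
// Textbook cosine distance: 1 - (a . b) / (|a| * |b|).
// Returns NaN when either vector has zero length.
function cosineDistance(a: readonly number[], b: readonly number[]): number {
    if (a.length !== b.length) throw new RangeError("dimension mismatch")
    let dot = 0
    let normA = 0
    let normB = 0
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Textbook Euclidean (L2) distance: square root of the summed squared differences.
function euclideanDistance(a: readonly number[], b: readonly number[]): number {
    if (a.length !== b.length) throw new RangeError("dimension mismatch")
    let sum = 0
    for (let i = 0; i < a.length; i++) {
        const diff = a[i] - b[i]
        sum += diff * diff
    }
    return Math.sqrt(sum)
}
```

Under these definitions a smaller value means a closer match, which is consistent with the ascending `ORDER BY VECTOR_DISTANCE(...)` used in the example queries.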
test/functional/database-schema/vectors/sqlserver/entity/Point.ts (1)
1-15: LGTM!
The Point entity is well-structured for testing vector functionality:
- Simple 3-dimensional vector for easy verification
- Nullable fields allow flexible test scenarios
- Proper TypeORM decorator usage
src/driver/sqlserver/SqlServerQueryRunner.ts (1)
3154-3166: LGTM! Critical fix for vector dimension preservation.
This change ensures vector column dimensions are always preserved during table reconstruction. The logic correctly:
- Detects vector type columns
- Preserves the length value (which represents dimensions) regardless of default-length checks
- Falls back to existing logic for non-vector types
This is essential because vector dimensions (e.g., 1998 in `vector(1998)`) must remain intact for semantic search operations to work correctly.
test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (1)
1-18: LGTM!
The DocumentChunk test entity is well-designed for testing vector functionality:
- Uses a large embedding dimension (1998), which SQL Server reports as 8000 bytes (1998*4 + 8), so length handling is exercised with a realistic-sized vector
- Nullable fields provide flexibility for various test scenarios
- Clean entity structure focused on testing vector column operations
sample/sample37-vector-sqlserver/entity/Document.ts (1)
1-19: LGTM!
The Document entity is well-structured for the sample application:
- Uses UUID primary key generation, appropriate for document identifiers
- OneToMany relationship properly configured with inverse side reference
- Clean entity design that complements the DocumentChunk relationship
sample/sample37-vector-sqlserver/app.ts (2)
6-19: Hardcoded credentials are acceptable for sample code.
The hardcoded database credentials and `trustServerCertificate: true` are appropriate for a local development sample. The inline comment on line 16 clearly explains the purpose.
21-88: Sample logic is sound and demonstrates vector functionality well.
The code correctly demonstrates:
- Entity creation and persistence
- Random embedding generation with consistent dimensions (1998)
- Vector similarity search using VECTOR_DISTANCE
- Proper cleanup with connection.destroy()
test/functional/database-schema/vectors/sqlserver/vector.ts (4)
26-40: Good test coverage for vector column schema validation.
The test correctly validates the vector column type and length. The comment on line 37 provides useful context about SQL Server's internal length calculation.
42-70: Well-structured persistence and hydration test.
The test properly validates array integrity and uses appropriate floating-point comparison with `closeTo(embedding[idx], 0.0001)` to handle precision issues.
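The need for a tolerance comes from storing values as 4-byte floats: a JavaScript double usually does not survive the round trip bit-for-bit. The sketch below simulates that with `Math.fround`; `approxEqual` is a hypothetical stand-in for the per-element `closeTo` assertion, not the test's actual code.

```typescript
// Element-wise comparison with an absolute tolerance, mirroring the idea of
// chai's closeTo(expected, delta) applied to every component.
function approxEqual(
    actual: readonly number[],
    expected: readonly number[],
    tolerance = 1e-4,
): boolean {
    return (
        actual.length === expected.length &&
        actual.every((v, i) => Math.abs(v - expected[i]) <= tolerance)
    )
}

const original = [0.1, 0.2, 0.3]
// Math.fround rounds to the nearest single-precision float, standing in for
// what a 4-byte float column would hand back.
const roundTripped = original.map((v) => Math.fround(v))

// Strict equality fails for 0.1, but the tolerant comparison passes.
const strictlyEqual = roundTripped.every((v, i) => v === original[i])
const closeEnough = approxEqual(roundTripped, original)
```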
203-222: Type cast for null embedding is acceptable for testing.
Using `as any` on line 210 to assign `null` to a `number[]` property is a pragmatic approach for testing null handling. The test correctly validates that null values persist and hydrate properly.
128-137: TypeORM SQL Server driver lacks parameterized vector type support; current string interpolation approach is necessary.
TypeORM's vector type support is limited to PostgreSQL (via pgvector extension) and SAP HANA, and the SQL Server driver's supported column types do not include the vector type. While SQL Server 2025 added native vector storage and vector search capabilities, TypeORM has not yet extended parameterized query support for vector parameters in SQL Server.
The string interpolation approach used in this test (lines 128-137) matches the pattern in the sample app and is the current required workaround. While not ideal from a security perspective, `JSON.stringify()` mitigates injection risks better than raw string concatenation. A comment documenting this TypeORM limitation would improve code clarity for future maintainers.
docs/docs/entity/1-entities.md (1)
185-260: Documentation comprehensively covers SQL Server vector support.
The documentation clearly explains vector types across databases, provides usage examples, and includes database-specific notes about requirements and compatibility.
sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1)
10-27: Entity definition is well-structured.
The entity correctly demonstrates:
- UUID primary key generation
- VARCHAR(MAX) for content storage
- Vector column with explicit length (1998 dimensions)
- Proper relationship mapping with `@ManyToOne` and `@JoinColumn`
src/driver/sqlserver/SqlServerDriver.ts (4)
145-145: Vector type correctly added to driver metadata.
The vector type is appropriately added to both `supportedDataTypes` and `withLengthColumnTypes`, ensuring proper schema generation and validation.
Also applies to: 168-168
554-560: Vector serialization logic is correct.
The implementation properly serializes vector arrays to JSON for persistence, with a fallback to return the value as-is if it's not an array.
597-604: Robust vector deserialization with error handling.
The JSON.parse is wrapped in a try-catch block with a fallback to return the original value, which gracefully handles edge cases where the stored value might not be valid JSON.
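The behavior described in the two comments above can be sketched as a pair of small functions. This is an illustration reconstructed from the review text only; the function names are hypothetical and the driver's actual code is not shown on this page.

```typescript
// Serialize: arrays become a JSON string; anything else passes through as-is.
function serializeVector(value: unknown): unknown {
    return Array.isArray(value) ? JSON.stringify(value) : value
}

// Deserialize: try to parse a string as JSON and fall back to the original
// value when it is not valid JSON. Non-string values pass through untouched.
function deserializeVector(value: unknown): unknown {
    if (typeof value !== "string") return value
    try {
        return JSON.parse(value)
    } catch {
        return value
    }
}
```

The pass-through branches are what let `null` embeddings persist and hydrate unchanged, as the null-handling test expects.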
727-735: Good validation for vector length requirement.
The code correctly enforces that vector columns must have an explicit length set, throwing a descriptive TypeORMError when missing. The comment on line 733 provides useful context about SQL Server's internal storage calculation.
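A minimal sketch of that length check, under stated assumptions: `vectorColumnDefinition` is a hypothetical name, a plain `Error` stands in for `TypeORMError`, and the message text is invented for illustration.

```typescript
// Build the SQL type for a vector column, refusing to guess a dimension count.
function vectorColumnDefinition(length?: number | string): string {
    const dimensions = Number(length)
    if (
        length === undefined ||
        length === "" ||
        !Number.isInteger(dimensions) ||
        dimensions <= 0
    ) {
        // Stand-in for the TypeORMError the review comment mentions.
        throw new Error('Column type "vector" requires an explicit length (number of dimensions)')
    }
    return `vector(${dimensions})`
}
```

For instance, `vectorColumnDefinition(1998)` yields `vector(1998)`, while omitting the length throws instead of silently producing a column with the wrong dimensions.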
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/driver/sqlserver/SqlServerDriver.ts (1)
1101-1113: Prevent endless schema diffs: compare vector dimensions, not raw byte length.
SQL Server reports `tableColumn.length` for VECTOR in bytes (N*4+8), while entity uses dimensions (N). Current comparison will always differ (e.g., 8000 vs 1998).
-    return (
-        tableColumn.length.toUpperCase() !==
-        this.getColumnLength(columnMetadata).toUpperCase()
-    )
+    if (tableColumn.type === "vector") {
+        const tableBytes = parseInt(String(tableColumn.length), 10)
+        const metaDims = Number(this.getColumnLength(columnMetadata))
+        if (!isNaN(tableBytes) && !isNaN(metaDims)) {
+            const expectedBytes = metaDims * 4 + 8
+            return tableBytes !== expectedBytes
+        }
+        // Fallback to string compare if parsing fails
+    }
+    return (
+        String(tableColumn.length).toUpperCase() !==
+        this.getColumnLength(columnMetadata).toUpperCase()
+    )
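The arithmetic behind this comment is easy to check in isolation. The sketch below takes the N*4+8 formula from the review as given; the constant and function names are hypothetical, and whether the query runner hands the driver bytes or dimensions is an assumption of this example.

```typescript
const BYTES_PER_COMPONENT = 4 // single-precision float, per the review's formula
const FIXED_OVERHEAD_BYTES = 8 // the constant term in N*4 + 8

// Byte length SQL Server is said to report for a vector of N dimensions.
function vectorByteLength(dimensions: number): number {
    return dimensions * BYTES_PER_COMPONENT + FIXED_OVERHEAD_BYTES
}

// Compare a database-reported byte length with an entity's dimension count.
// Falls back to a case-insensitive string comparison if either side is not numeric.
function vectorLengthChanged(tableLength: string, entityLength: string): boolean {
    const tableBytes = parseInt(tableLength, 10)
    const dimensions = parseInt(entityLength, 10)
    if (Number.isNaN(tableBytes) || Number.isNaN(dimensions)) {
        return tableLength.toUpperCase() !== entityLength.toUpperCase()
    }
    return tableBytes !== vectorByteLength(dimensions)
}
```

With these definitions `vectorByteLength(1998)` is 8000, so `vectorLengthChanged("8000", "1998")` is `false`, whereas a raw string comparison of `"8000"` and `"1998"` would report a change on every synchronization.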
🧹 Nitpick comments (5)
docs/docs/drivers/microsoft-sqlserver.md (1)
207-214: Parameterize the VECTOR assignment to avoid string interpolation.
Safer and consistent with TypeORM’s mssql parameter handling.
-const results = await dataSource.query(`
-    DECLARE @question AS VECTOR (1998) = '${JSON.stringify(queryEmbedding)}';
-    SELECT TOP (10) dc.*,
-           VECTOR_DISTANCE('cosine', @question, embedding) AS distance
-    FROM document_chunk dc
-    ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
-`)
+const results = await dataSource.query(
+    `
+    DECLARE @question AS VECTOR (1998) = @0;
+    SELECT TOP (10) dc.*,
+           VECTOR_DISTANCE('cosine', @question, embedding) AS distance
+    FROM document_chunk dc
+    ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
+    `,
+    [JSON.stringify(queryEmbedding)],
+)
test/functional/database-schema/vectors/sqlserver/vector.ts (3)
37-39: Avoid magic number; compute expected byte length from dimensions.
Keeps the assertion clear if dimensions change.
-// NOTE: real returned length is (N*4 + 8) where N is desired dimensions
-expect(embeddingColumn!.length).to.equal("8000")
+// NOTE: DB reports bytes: (N*4 + 8), where N is desired dimensions.
+const expectedBytes = (1998 * 4 + 8).toString()
+expect(embeddingColumn!.length).to.equal(expectedBytes)
128-137: Parameterize the query vector instead of interpolating JSON.
Prevents accidental injection and quoting issues.
-const results = await connection.query(
-    `
-    DECLARE @query AS VECTOR (1998) = '${JSON.stringify(
-        query,
-    )}';
-    SELECT TOP (2) *, VECTOR_DISTANCE('cosine', @query, embedding) AS distance
-    FROM document_chunk
-    ORDER BY VECTOR_DISTANCE('cosine', @query, embedding)
-    `,
-)
+const results = await connection.query(
+    `
+    DECLARE @query AS VECTOR (1998) = @0;
+    SELECT TOP (2) *, VECTOR_DISTANCE('cosine', @query, embedding) AS distance
+    FROM document_chunk
+    ORDER BY VECTOR_DISTANCE('cosine', @query, embedding)
+    `,
+    [JSON.stringify(query)],
+)
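The parameterized shape suggested above can be factored into a small builder so the SQL text never contains the embedding itself. This is a sketch, not code from the PR: `buildNearestChunksQuery` is a hypothetical name, the table and column names are copied from the test, and the assumption that a JSON string parameter is accepted for a `VECTOR` variable comes from the suggested diff rather than from verified driver behavior.

```typescript
interface ParameterizedQuery {
    sql: string
    parameters: string[]
}

// Build a cosine-distance search over document_chunk.embedding. Only validated
// integers are placed in the SQL text; the embedding travels as parameter @0.
function buildNearestChunksQuery(
    embedding: readonly number[],
    limit: number,
): ParameterizedQuery {
    if (!Number.isInteger(limit) || limit <= 0) {
        throw new RangeError("limit must be a positive integer")
    }
    if (embedding.length === 0) {
        throw new RangeError("embedding must not be empty")
    }
    const sql = `
        DECLARE @query AS VECTOR (${embedding.length}) = @0;
        SELECT TOP (${limit}) *, VECTOR_DISTANCE('cosine', @query, embedding) AS distance
        FROM document_chunk
        ORDER BY VECTOR_DISTANCE('cosine', @query, embedding)
    `
    return { sql, parameters: [JSON.stringify(embedding)] }
}
```

A caller would then pass both parts along in the same form the suggestion uses: `const { sql, parameters } = buildNearestChunksQuery(query, 2)` followed by `await connection.query(sql, parameters)`.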
179-186: Parameterize the Euclidean query too.
-const results = await connection.query(
-    `
-    DECLARE @origin AS VECTOR (3) = '${JSON.stringify(origin)}';
-    SELECT TOP (2) *, VECTOR_DISTANCE('euclidean', @origin, coords) AS distance
-    FROM point
-    ORDER BY VECTOR_DISTANCE('euclidean', @origin, coords)
-    `,
-)
+const results = await connection.query(
+    `
+    DECLARE @origin AS VECTOR (3) = @0;
+    SELECT TOP (2) *, VECTOR_DISTANCE('euclidean', @origin, coords) AS distance
+    FROM point
+    ORDER BY VECTOR_DISTANCE('euclidean', @origin, coords)
+    `,
+    [JSON.stringify(origin)],
+)
docs/docs/entity/1-entities.md (1)
190-196: Fix nested list indentation to satisfy markdownlint.Add a blank line before sublists and use consistent two-space indentation.
-- `vector` - stores vectors as 4-byte floats (single precision)
-    - PostgreSQL: native `vector` type via pgvector extension
-    - SQL Server: native `vector` type
-    - SAP HANA: alias for `real_vector` type
-- `halfvec` - stores vectors as 2-byte floats (half precision) for memory efficiency
-    - PostgreSQL: native `halfvec` type via pgvector extension
-    - SAP HANA: alias for `half_vector` type
+- `vector` - stores vectors as 4-byte floats (single precision)
+
+  - PostgreSQL: native `vector` type via pgvector extension
+  - SQL Server: native `vector` type
+  - SAP HANA: alias for `real_vector` type
+
+- `halfvec` - stores vectors as 2-byte floats (half precision) for memory efficiency
+
+  - PostgreSQL: native `halfvec` type via pgvector extension
+  - SAP HANA: alias for `half_vector` type
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- docs/docs/drivers/microsoft-sqlserver.md (1 hunks)
- docs/docs/entity/1-entities.md (2 hunks)
- sample/sample37-vector-sqlserver/app.ts (1 hunks)
- src/driver/sqlserver/SqlServerDriver.ts (5 hunks)
- test/functional/database-schema/vectors/sqlserver/vector.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- sample/sample37-vector-sqlserver/app.ts
🧰 Additional context used
🧬 Code graph analysis (2)
src/driver/sqlserver/SqlServerDriver.ts (1)
- src/error/TypeORMError.ts (1): TypeORMError (1-17)
test/functional/database-schema/vectors/sqlserver/vector.ts (2)
- test/utils/test-utils.ts (3): createTestingConnections (388-482), reloadTestingDatabases (504-509), closeTestingConnections (487-499)
- src/driver/sqlserver/SqlServerQueryRunner.ts (1): query (203-332)
🪛 markdownlint-cli2 (0.18.1)
docs/docs/drivers/microsoft-sqlserver.md
179-179: Bare URL used
(MD034, no-bare-urls)
docs/docs/entity/1-entities.md
191-191: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
191-191: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
192-192: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
192-192: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
193-193: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
193-193: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
195-195: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
195-195: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
196-196: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
196-196: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
Actionable comments posted: 2
♻️ Duplicate comments (1)
docs/docs/entity/1-entities.md (1)
191-196: Fix nested list indentation to satisfy markdownlint.
The nested list items under `vector` and `halfvec` have inconsistent indentation, causing MD005 and MD007 linting errors. These nested items need proper indentation relative to their parent items.
Apply this diff to correct the indentation:
 - `vector` - stores vectors as 4-byte floats (single precision)
-  - PostgreSQL: native `vector` type via pgvector extension
-  - SQL Server: native `vector` type
-  - SAP HANA: alias for `real_vector` type
+    - PostgreSQL: native `vector` type via pgvector extension
+    - SQL Server: native `vector` type
+    - SAP HANA: alias for `real_vector` type
 - `halfvec` - stores vectors as 2-byte floats (half precision) for memory efficiency
-  - PostgreSQL: native `halfvec` type via pgvector extension
-  - SAP HANA: alias for `half_vector` type
+    - PostgreSQL: native `halfvec` type via pgvector extension
+    - SAP HANA: alias for `half_vector` type
Based on static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/docs/entity/1-entities.md (2 hunks)
- src/driver/sqlserver/MssqlParameter.ts (1 hunks)
- src/driver/sqlserver/SqlServerQueryRunner.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/driver/sqlserver/SqlServerQueryRunner.ts
🔇 Additional comments (1)
docs/docs/entity/1-entities.md (1)
185-196: Approve vector column documentation updates.
The documentation now correctly reflects SQL Server support alongside PostgreSQL and SAP HANA, and the inline comment on line 206 properly clarifies that SQL Server requires explicit dimensions. The database-specific examples and notes are comprehensive and accurate.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/docs/entity/1-entities.md (1)
190-196: Fix markdown list indentation per linter.
Static analysis (markdownlint MD005/MD007) flags inconsistent indentation on lines 191–196. The nested database-specific items under `vector` and `halfvec` should use consistent indentation formatting.
 - `vector` - stores vectors as 4-byte floats (single precision)
-  - PostgreSQL: native `vector` type via pgvector extension
-  - SQL Server: native `vector` type
-  - SAP HANA: alias for `real_vector` type
+    - PostgreSQL: native `vector` type via pgvector extension
+    - SQL Server: native `vector` type
+    - SAP HANA: alias for `real_vector` type
 - `halfvec` - stores vectors as 2-byte floats (half precision) for memory efficiency
-  - PostgreSQL: native `halfvec` type via pgvector extension
-  - SAP HANA: alias for `half_vector` type
+    - PostgreSQL: native `halfvec` type via pgvector extension
+    - SAP HANA: alias for `half_vector` type
🧹 Nitpick comments (1)
sample/sample37-vector-sqlserver/README.md (1)
85-90: Minor grammar suggestion: Use 'then' or restructure the list.
Line 86 reads "if needed 3. Run:" which is slightly awkward. Consider adding "then" or restructuring for clarity.
 2. Update the connection settings in `app.ts` if needed
-3. Run:
+3. Then run:
or alternatively:
-2. Update the connection settings in `app.ts` if needed
-3. Run:
+2. Update the connection settings in `app.ts` if needed.
+3. Run:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/docs/drivers/microsoft-sqlserver.md (1 hunks)
- docs/docs/entity/1-entities.md (2 hunks)
- sample/sample37-vector-sqlserver/README.md (1 hunks)
- src/driver/sqlserver/MssqlParameter.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/driver/sqlserver/MssqlParameter.ts
- docs/docs/drivers/microsoft-sqlserver.md
🧰 Additional context used
🪛 LanguageTool
sample/sample37-vector-sqlserver/README.md
[style] ~86-~86: The double modal “needed Run” is nonstandard (only accepted in certain dialects). Consider “to be Run”.
Context: ...ction settings in app.ts if needed 3. Run: ```bash npm install ts-nod...
(NEEDS_FIXED)
🔇 Additional comments (5)
sample/sample37-vector-sqlserver/README.md (2)
48-66: Excellent parameterized query example.
The SQL Server vector similarity search example uses proper parameter binding (`@0`) instead of string interpolation, demonstrating secure query construction. The VECTOR_DISTANCE syntax and the JSON.stringify transformation are correct.
1-96: Well-documented sample with clear examples.
The README provides comprehensive coverage: entity definitions with vector columns, similarity search patterns, distance metrics, and requirements. The code examples are runnable and the documentation is clear for developers getting started with SQL Server vectors.
docs/docs/entity/1-entities.md (3)
206-206: Verify: SQL Server dimension requirement comment is accurate. Line 206's comment correctly notes that SQL Server requires explicit dimensions (unlike PostgreSQL and SAP HANA). This aligns with the dimension-specific code example below at lines 210–212 and the SQL Server sample app.
242-255: Excellent: SQL Server example uses parameterized queries. The SQL Server VECTOR_DISTANCE example correctly uses parameterized query approach (@0 placeholder with separate [JSON.stringify(queryEmbedding)] array), preventing SQL injection risks. The DECLARE @question AS VECTOR(3) = @0; syntax and distance metric usage are correct.
185-261: Strong documentation update for SQL Server vector support. The vector columns section now comprehensively covers SQL Server alongside PostgreSQL and SAP HANA. Examples are secure (parameterized), requirements are clear, and the SQL Server-specific VECTOR_DISTANCE function documentation is helpful for users implementing similarity search.
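For readers who want to see the parameterized pattern in one place, here is a minimal sketch assembled from the fragments quoted in this thread. The helper name buildVectorSearch and the VectorSearch interface are hypothetical and not part of TypeORM; the table and column names (document_chunk, embedding) are taken from the test suite quoted elsewhere in this review. The embedding travels only in the parameter array, never in the SQL text.

```typescript
// Hypothetical helper (not a TypeORM API): returns the SQL text and the
// parameter array separately so the embedding is never interpolated.
interface VectorSearch {
    sql: string
    parameters: string[]
}

function buildVectorSearch(queryEmbedding: number[], limit: number): VectorSearch {
    const dimensions = queryEmbedding.length
    if (dimensions === 0) {
        throw new Error("queryEmbedding must contain at least one element")
    }
    if (!Number.isInteger(limit) || limit <= 0) {
        throw new Error("limit must be a positive integer")
    }

    // Only validated integers (dimensions, limit) are inlined into the SQL text.
    const sql = `
        DECLARE @question AS VECTOR(${dimensions}) = @0;
        SELECT TOP (${limit}) *,
            VECTOR_DISTANCE('cosine', @question, embedding) AS distance
        FROM document_chunk
        ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
    `

    // The vector itself is passed as a JSON string bound to @0.
    return { sql, parameters: [JSON.stringify(queryEmbedding)] }
}
```

The returned pair would then be handed to a query call that accepts SQL plus a parameter array, as the README example does with its [JSON.stringify(queryEmbedding)] argument.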
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/tests-linux.yml (1 hunks)
- docker-compose.yml (1 hunks)
- sample/sample37-vector-sqlserver/app.ts (1 hunks)
- src/commands/InitCommand.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
sample/sample37-vector-sqlserver/app.ts (1)
sample/playground/src/ormconfig.ts (1)
AppDataSource(5-10)
🔇 Additional comments (3)
src/commands/InitCommand.ts (1)
622-622: LGTM! Consistent with infrastructure updates. The addition of MSSQL_PID: "Express" aligns with the updated MSSQL 2025 image and matches the configuration in docker-compose.yml and the CI workflow.
.github/workflows/tests-linux.yml (1)
73-79: LGTM! Workflow updated for SQL Server 2025 vector support. The MSSQL service configuration correctly uses the 2025-latest image and includes the required MSSQL_PID: "Express" environment variable, consistent with docker-compose.yml.
docker-compose.yml (1)
76-83: LGTM! Docker configuration updated for vector support. The MSSQL service now uses the 2025-latest image with the required MSSQL_PID: "Express" environment variable, enabling native vector data type support.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/driver/sqlserver/SqlServerDriver.ts (2)
554-560: Consider adding error handling for JSON.stringify. While unlikely for numeric vector arrays, JSON.stringify could theoretically fail on circular references or non-serializable values. Consider wrapping it in a try-catch for defensive coding:
  } else if (columnMetadata.type === "vector") {
      if (Array.isArray(value)) {
-         return JSON.stringify(value)
+         try {
+             return JSON.stringify(value)
+         } catch (e) {
+             throw new TypeORMError(
+                 `Failed to serialize vector value for column "${columnMetadata.propertyName}": ${e.message}`
+             )
+         }
      } else {
          return value
      }
  }
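As a standalone illustration of the serialization suggestion, the sketch below mirrors that branch outside the driver. The function name serializeVector is hypothetical, and a plain Error stands in for TypeORMError so the snippet needs no imports. One detail worth noting: under strict TypeScript settings the caught value is typed as unknown, so the sketch narrows it before reading the message.

```typescript
// Sketch only: standalone version of the persistence branch discussed above.
// A plain Error is used here as a stand-in for TypeORMError.
function serializeVector(propertyName: string, value: unknown): unknown {
    // Non-array values (null, already-serialized strings) pass through untouched.
    if (!Array.isArray(value)) {
        return value
    }
    try {
        return JSON.stringify(value)
    } catch (e) {
        // Narrow the caught value before using it in the message.
        const reason = e instanceof Error ? e.message : String(e)
        throw new Error(
            `Failed to serialize vector value for column "${propertyName}": ${reason}`,
        )
    }
}
```

A self-referencing array is one input that makes JSON.stringify throw, which is the case this wrapper turns into a column-specific error.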
597-604: Consider logging parse errors for debugging. The try-catch correctly prevents crashes, but silently ignoring parse errors might make debugging harder. Consider logging the error:
  } else if (columnMetadata.type === "vector") {
      if (typeof value === "string") {
          try {
              value = JSON.parse(value)
          } catch (e) {
-             // If parsing fails, return the value as-is
+             // If parsing fails, return the value as-is
+             // Note: This might indicate corrupted data or invalid format
          }
      }
  }
Alternatively, you could log at warn level if the connection has a logger available, though this would require accessing this.connection.logger.
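A rough sketch of the hydration path with an optional warning hook, for illustration only. The function name hydrateVector and the warn callback are hypothetical; the callback is a stand-in for whatever logger the driver might expose, which this thread does not confirm.

```typescript
// Sketch only: parse a stored vector string, falling back to the raw value.
// `warn` is a hypothetical callback standing in for a real logger.
function hydrateVector(
    value: unknown,
    warn: (message: string) => void = () => {},
): unknown {
    if (typeof value !== "string") {
        return value
    }
    try {
        return JSON.parse(value)
    } catch {
        // Parsing failed: report it, then return the value as-is.
        warn(`Could not parse vector value: ${value.slice(0, 40)}`)
        return value
    }
}
```

Keeping the fallback while surfacing a warning preserves the current non-crashing behaviour and still leaves a trace when stored data is malformed.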
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/driver/sqlserver/SqlServerDriver.ts (5 hunks)
- src/driver/sqlserver/SqlServerQueryRunner.ts (2 hunks)
- test/functional/database-schema/vectors/sqlserver/vector.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- test/functional/database-schema/vectors/sqlserver/vector.ts
- src/driver/sqlserver/SqlServerQueryRunner.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/driver/sqlserver/SqlServerDriver.ts (1)
src/error/TypeORMError.ts (1)
TypeORMError(1-17)
🔇 Additional comments (4)
src/driver/sqlserver/SqlServerDriver.ts (4)
145-145: LGTM: Vector type registration. The vector type is correctly added to supportedDataTypes and withLengthColumnTypes, aligning with MS SQL Server's vector data type requirements. Also applies to: 168-168
726-734: LGTM: Vector type validation and creation. The length validation is appropriate and prevents invalid DDL. The error message is clear, and the vector(${column.length}) format matches MS SQL Server syntax.
736-736: LGTM: Correct use of else-if. Changing to else if prevents vector types from being double-processed in the subsequent length-based logic, avoiding malformed type strings like vector(N)(N).
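To make the else-if point concrete, here is a simplified sketch of the branching, not the driver's actual createFullType. The names ColumnSketch and createFullTypeSketch and the error wording are assumptions for illustration; the real method handles more cases (precision, scale, and so on).

```typescript
// Simplified, hypothetical model of a column: only the fields needed here.
interface ColumnSketch {
    type: string
    length?: string | number
}

function createFullTypeSketch(column: ColumnSketch): string {
    let type = column.type

    if (column.type === "vector") {
        // Vectors need explicit dimensions; fail before emitting invalid DDL.
        if (!column.length) {
            throw new Error("Column of type vector requires a length (dimensions)")
        }
        type = `vector(${column.length})`
    } else if (column.length) {
        // `else if` keeps vector columns out of this branch, so the length
        // is appended once and never produces vector(N)(N).
        type += `(${column.length})`
    }

    return type
}
```

With a plain `if` in place of the `else if`, a vector column with length 3 would pass through both branches and come out as vector(3)(3).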
145-145: Well-implemented vector type support. The implementation correctly handles:
- Type registration in the appropriate arrays
- Persistence via JSON serialization
- Hydration with safe parsing
- Length validation and DDL generation
- Integration with existing type handling infrastructure
The suggested improvements above are optional enhancements for error handling and debugging, not blocking issues.
Also applies to: 168-168, 554-560, 597-604, 726-734, 736-736
Left a few minor comments, but looks good otherwise. Nice contribution, thank you very much!
add basic support of vector column type + sample app
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
a1c88a8 to 3b2e981
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.
Actionable comments posted: 0
🧹 Nitpick comments (2)
test/functional/database-schema/vectors/sqlserver/vector.ts (2)
127-136: Consider using parameterized queries for consistency. The test uses string interpolation to embed the vector query directly in SQL (line 129-131). While safe in this controlled test context, the README demonstrates parameterized queries using @0 parameters (README lines 54-66), which is more maintainable and consistent with best practices. Consider refactoring to use parameterized queries:
+ const queryParam = JSON.stringify(query)
  const results = await connection.query(
      `
-     DECLARE @query AS VECTOR (1998) = '${JSON.stringify(
-         query,
-     )}';
+     DECLARE @query AS VECTOR (1998) = @0;
      SELECT TOP (2) *, VECTOR_DISTANCE('cosine', @query, embedding) AS distance
      FROM document_chunk
      ORDER BY VECTOR_DISTANCE('cosine', @query, embedding)
      `,
+     [queryParam]
  )
178-185: Consider using parameterized queries for consistency. Similar to the cosine similarity test, this test uses string interpolation for the vector parameter. Consider using parameterized queries as shown in the README for consistency with documented best practices.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- .github/workflows/tests-linux.yml (1 hunks)
- docker-compose.yml (1 hunks)
- docs/docs/drivers/microsoft-sqlserver.md (1 hunks)
- docs/docs/entity/1-entities.md (2 hunks)
- sample/sample37-vector-sqlserver/README.md (1 hunks)
- sample/sample37-vector-sqlserver/app.ts (1 hunks)
- sample/sample37-vector-sqlserver/entity/Document.ts (1 hunks)
- sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1 hunks)
- sample/sample37-vector-sqlserver/package.json (1 hunks)
- src/commands/InitCommand.ts (1 hunks)
- src/driver/sqlserver/MssqlParameter.ts (1 hunks)
- src/driver/sqlserver/SqlServerDriver.ts (6 hunks)
- src/driver/sqlserver/SqlServerQueryRunner.ts (2 hunks)
- src/driver/types/ColumnTypes.ts (1 hunks)
- test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (1 hunks)
- test/functional/database-schema/vectors/sqlserver/entity/Point.ts (1 hunks)
- test/functional/database-schema/vectors/sqlserver/vector.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- docker-compose.yml
- src/driver/sqlserver/MssqlParameter.ts
- sample/sample37-vector-sqlserver/package.json
- docs/docs/drivers/microsoft-sqlserver.md
- docs/docs/entity/1-entities.md
- sample/sample37-vector-sqlserver/app.ts
- src/commands/InitCommand.ts
- sample/sample37-vector-sqlserver/entity/DocumentChunk.ts
- src/driver/sqlserver/SqlServerQueryRunner.ts
- sample/sample37-vector-sqlserver/entity/Document.ts
- src/driver/sqlserver/SqlServerDriver.ts
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-27T20:50:10.364Z
Learnt from: alumni
Repo: typeorm/typeorm PR: 11581
File: docs/docs/drivers/postgres.md:23-23
Timestamp: 2025-07-27T20:50:10.364Z
Learning: The correct data source type for Aurora PostgreSQL in TypeORM is `aurora-postgres`, not `aurora-data-api-pg`. The `aurora-data-api-pg` driver was renamed to `aurora-postgres` according to the CHANGELOG.md. This is defined in the DatabaseType union type and AuroraPostgresConnectionOptions interface.
Applied to files:
src/driver/types/ColumnTypes.ts
📚 Learning: 2025-10-16T19:19:25.146Z
Learnt from: ZBAGI
Repo: typeorm/typeorm PR: 11724
File: test/functional/schema-builder/entity/react-native/Post.ts:0-0
Timestamp: 2025-10-16T19:19:25.146Z
Learning: In TypeORM tests, react-native uses SQLite as its underlying database, not MySQL or MariaDB. SQLite treats double quotes as identifier quotes, making them appropriate for CHECK constraints and other SQL expressions in react-native entity files.
Applied to files:
test/functional/database-schema/vectors/sqlserver/vector.ts
🧬 Code graph analysis (3)
test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (4)
sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1)
Entity(10-27)
test/functional/database-schema/vectors/sqlserver/entity/Point.ts (1)
Entity(5-15)
src/decorator/columns/PrimaryGeneratedColumn.ts (1)
PrimaryGeneratedColumn(55-119)
src/decorator/columns/Column.ts (1)
Column(134-220)
test/functional/database-schema/vectors/sqlserver/entity/Point.ts (5)
sample/sample37-vector-sqlserver/entity/Document.ts (1)
Entity(9-19)
sample/sample37-vector-sqlserver/entity/DocumentChunk.ts (1)
Entity(10-27)
test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (1)
Entity(5-18)
src/decorator/columns/PrimaryGeneratedColumn.ts (1)
PrimaryGeneratedColumn(55-119)
src/decorator/columns/Column.ts (1)
Column(134-220)
test/functional/database-schema/vectors/sqlserver/vector.ts (2)
test/utils/test-utils.ts (3)
createTestingConnections(388-482)
reloadTestingDatabases(504-509)
closeTestingConnections(487-499)
src/driver/sqlserver/SqlServerQueryRunner.ts (1)
query(203-332)
🪛 LanguageTool
sample/sample37-vector-sqlserver/README.md
[style] ~86-~86: The double modal “needed Run” is nonstandard (only accepted in certain dialects). Consider “to be Run”.
Context: ...ction settings in app.ts if needed 3. Run: ```bash npm install ts-nod...
(NEEDS_FIXED)
🔇 Additional comments (10)
.github/workflows/tests-linux.yml (1)
73-79: LGTM! CI configuration updated for vector support. The update to SQL Server 2025 and Express edition is appropriate for testing the new vector type functionality. The Express edition provides the necessary vector support while remaining suitable for CI environments.
src/driver/types/ColumnTypes.ts (1)
78-78: LGTM! Type definition correctly updated. The addition of MSSQL to the vector type comment accurately reflects the new support for vector columns in SQL Server, consistent with the driver implementation changes.
test/functional/database-schema/vectors/sqlserver/entity/DocumentChunk.ts (1)
5-18: LGTM! Test entity is well-structured. The DocumentChunk entity appropriately defines a vector column with 1998 dimensions for testing. The simplified structure (without relations) is suitable for functional testing of vector column behavior.
test/functional/database-schema/vectors/sqlserver/entity/Point.ts (1)
5-15: LGTM! Point entity complements the test suite. The 3-dimensional vector column is well-suited for testing geometric distance calculations and provides good coverage alongside the 1998-dimension DocumentChunk entity.
sample/sample37-vector-sqlserver/README.md (1)
1-96: LGTM! Comprehensive documentation. The README provides excellent coverage of vector type usage, including entity definition, similarity search examples, and distance metrics. The parameterized query examples (lines 54-66) demonstrate proper usage patterns.
Note: The LanguageTool hint about "needed Run" on line 86 is a false positive caused by the line break formatting in the numbered list.
test/functional/database-schema/vectors/sqlserver/vector.ts (5)
12-24: LGTM! Test suite properly configured. The test setup follows TypeORM conventions with appropriate before/after hooks and schema management.
26-39: LGTM! DDL verification test is thorough. The test correctly verifies that the vector column is created with the specified type and dimensions.
41-69: LGTM! Persistence test handles floating-point precision correctly. The test appropriately uses closeTo with a 0.0001 tolerance for floating-point comparisons and thoroughly validates round-trip persistence of vector data.
71-92: LGTM! Update test validates vector modification. The test appropriately verifies that vector columns can be updated, using exact equality checks for the clean integer values.
202-221: LGTM! Null handling test is appropriate. The test correctly verifies that null vector values are handled properly through the persistence layer.
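The tolerance-based comparison mentioned for the persistence test can be illustrated with a small sketch. It assumes the stored vector elements are single-precision floats, simulated here with Math.fround; the helper name vectorsCloseTo is hypothetical and is not the assertion used in the test suite.

```typescript
// Hypothetical helper: element-wise comparison within an absolute tolerance.
function vectorsCloseTo(
    actual: number[],
    expected: number[],
    tolerance = 0.0001,
): boolean {
    if (actual.length !== expected.length) {
        return false
    }
    return actual.every((value, i) => Math.abs(value - expected[i]) <= tolerance)
}

// Assumption: a round trip narrows each element to single precision.
const original = [0.1, 0.2, 0.3]
const roundTripped = original.map((value) => Math.fround(value))

console.log(roundTripped[0] === original[0]) // false: 0.1 is not exact in float32
console.log(vectorsCloseTo(roundTripped, original)) // true: within 0.0001
```

This is also why the update test can use exact equality: small integers are represented exactly in single precision, so nothing is lost on the round trip.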
gioboa
left a comment
Thanks for your help @artiz 👏
LGTM 👍
Description of change
Added support for the vector column type for MS SQL Server, with related examples in the docs and a sample app.
Related SQL Server documentation: https://learn.microsoft.com/en-us/sql/t-sql/data-types/vector-data-type
No related issue found.
I use typeorm in my pet project https://github.com/artiz/kate-chat/ and already have basic RAG with MS SQL Server, but it would be great to have native support for vectors here.
Thank you for the great job!
Pull-Request Checklist
master branch
Fixes #00000
Summary by CodeRabbit
New Features
Documentation
Samples
Tests
Chores