Skip to content

Commit 2681051

Browse files
authored
feat(mssql): support 'vector' type for MS SQL Server (#11732)
1 parent e04ffd3 commit 2681051

File tree

17 files changed

+652
-30
lines changed

17 files changed

+652
-30
lines changed

.github/workflows/tests-linux.yml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,12 +70,13 @@ jobs:
7070

7171
services:
7272
mssql:
73-
image: "mcr.microsoft.com/mssql/server:2022-latest"
73+
image: "mcr.microsoft.com/mssql/server:2025-latest"
7474
ports:
7575
- "1433:1433"
7676
env:
7777
SA_PASSWORD: "Admin12345"
7878
ACCEPT_EULA: "Y"
79+
MSSQL_PID: "Express"
7980

8081
steps:
8182
- uses: actions/checkout@v5

docker-compose.yml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -73,13 +73,14 @@ services:
7373

7474
# mssql
7575
mssql:
76-
image: "mcr.microsoft.com/mssql/server:2022-latest"
76+
image: "mcr.microsoft.com/mssql/server:2025-latest"
7777
container_name: "typeorm-mssql"
7878
ports:
7979
- "1433:1433"
8080
environment:
8181
SA_PASSWORD: "Admin12345"
8282
ACCEPT_EULA: "Y"
83+
MSSQL_PID: "Express"
8384

8485
# cockroachdb
8586
cockroachdb:

docs/docs/drivers/microsoft-sqlserver.md

Lines changed: 61 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -165,4 +165,64 @@ Based on [tedious](https://tediousjs.github.io/node-mssql/) MSSQL implementation
165165

166166
## Column Types
167167

168-
`int`, `bigint`, `bit`, `decimal`, `money`, `numeric`, `smallint`, `smallmoney`, `tinyint`, `float`, `real`, `date`, `datetime2`, `datetime`, `datetimeoffset`, `smalldatetime`, `time`, `char`, `varchar`, `text`, `nchar`, `nvarchar`, `ntext`, `binary`, `image`, `varbinary`, `hierarchyid`, `sql_variant`, `timestamp`, `uniqueidentifier`, `xml`, `geometry`, `geography`, `rowversion`
168+
`int`, `bigint`, `bit`, `decimal`, `money`, `numeric`, `smallint`, `smallmoney`, `tinyint`, `float`, `real`, `date`, `datetime2`, `datetime`, `datetimeoffset`, `smalldatetime`, `time`, `char`, `varchar`, `text`, `nchar`, `nvarchar`, `ntext`, `binary`, `image`, `varbinary`, `hierarchyid`, `sql_variant`, `timestamp`, `uniqueidentifier`, `xml`, `geometry`, `geography`, `rowversion`, `vector`
169+
170+
### Vector Type (vector)
171+
172+
The `vector` data type is available in SQL Server for storing high-dimensional vectors, commonly used for:
173+
174+
- Semantic search with embeddings
175+
- Recommendation systems
176+
- Similarity matching
177+
- Machine learning applications
178+
179+
NOTE: general `halfvec` type support is unavailable because this feature is still in preview. See the Microsoft docs: [Vector data type](https://learn.microsoft.com/en-us/sql/t-sql/data-types/vector-data-type).
180+
181+
#### Usage
182+
183+
```typescript
184+
@Entity()
185+
export class DocumentChunk {
186+
@PrimaryGeneratedColumn()
187+
id: number
188+
189+
@Column("varchar")
190+
content: string
191+
192+
// Vector column with 1998 dimensions
193+
@Column("vector", { length: 1998 })
194+
embedding: number[]
195+
}
196+
```
197+
198+
#### Vector Similarity Search
199+
200+
SQL Server provides the `VECTOR_DISTANCE` function for calculating distances between vectors:
201+
202+
```typescript
203+
const queryEmbedding = [
204+
/* your query vector */
205+
]
206+
207+
const results = await dataSource.query(
208+
`
209+
DECLARE @question AS VECTOR (1998) = @0;
210+
SELECT TOP (10) dc.*,
211+
VECTOR_DISTANCE('cosine', @question, embedding) AS distance
212+
FROM document_chunk dc
213+
ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
214+
`,
215+
[JSON.stringify(queryEmbedding)],
216+
)
217+
```
218+
219+
**Distance Metrics:**
220+
221+
- `'cosine'` - Cosine distance (most common for semantic search)
222+
- `'euclidean'` - Euclidean (L2) distance
223+
- `'dot'` - Negative dot product
224+
225+
**Requirements:**
226+
227+
- SQL Server version with vector support enabled
228+
- Vector dimensions must be specified using the `length` option

docs/docs/entity/1-entities.md

Lines changed: 38 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -182,16 +182,17 @@ There are several special column types with additional functionality available:
182182

183183
### Vector columns
184184

185-
Vector columns are supported on both PostgreSQL (via [`pgvector`](https://github.com/pgvector/pgvector) extension) and SAP HANA Cloud, enabling storing and querying vector embeddings for similarity search and machine learning applications.
185+
Vector columns are supported on PostgreSQL (via [`pgvector`](https://github.com/pgvector/pgvector) extension), Microsoft SQL Server, and SAP HANA Cloud, enabling storing and querying vector embeddings for similarity search and machine learning applications.
186186

187187
TypeORM supports both `vector` and `halfvec` column types across databases:
188188

189-
- `vector` - stores vectors as 4-byte floats (single precision)
190-
- PostgreSQL: native `vector` type via pgvector extension
191-
- SAP HANA: alias for `real_vector` type
192-
- `halfvec` - stores vectors as 2-byte floats (half precision) for memory efficiency
193-
- PostgreSQL: native `halfvec` type via pgvector extension
194-
- SAP HANA: alias for `half_vector` type
189+
- `vector` - stores vectors as 4-byte floats (single precision)
190+
- PostgreSQL: native `vector` type via pgvector extension
191+
- SQL Server: native `vector` type
192+
- SAP HANA: alias for `real_vector` type
193+
- `halfvec` - stores vectors as 2-byte floats (half precision) for memory efficiency
194+
- PostgreSQL: native `halfvec` type via pgvector extension
195+
- SAP HANA: alias for `half_vector` type
195196

196197
You can specify the vector dimensions using the `length` option:
197198

@@ -201,45 +202,65 @@ export class Post {
201202
@PrimaryGeneratedColumn()
202203
id: number
203204

204-
// Vector without specified dimensions (works on PostgreSQL and SAP HANA)
205+
// Vector without specified dimensions (works on PostgreSQL and SAP HANA; SQL Server requires explicit dimensions)
205206
@Column("vector")
206207
embedding: number[] | Buffer
207208

208-
// Vector with 3 dimensions: vector(3) (works on PostgreSQL and SAP HANA)
209+
// Vector with 3 dimensions: vector(3)
209210
@Column("vector", { length: 3 })
210211
embedding_3d: number[] | Buffer
211212

212-
// Half-precision vector with 4 dimensions: halfvec(4) (works on PostgreSQL and SAP HANA)
213+
// Half-precision vector with 4 dimensions: halfvec(4) (PostgreSQL and SAP HANA only)
213214
@Column("halfvec", { length: 4 })
214215
halfvec_embedding: number[] | Buffer
215216
}
216217
```
217218

218-
Vector columns can be used for similarity searches using PostgreSQL's vector operators:
219+
**PostgreSQL** - Vector columns can be used for similarity searches using vector operators:
219220

220221
```typescript
221222
// L2 distance (Euclidean) - <->
222223
const results = await dataSource.query(
223224
`SELECT id, embedding FROM post ORDER BY embedding <-> $1 LIMIT 5`,
224-
["[1,2,3]"]
225+
["[1,2,3]"],
225226
)
226227

227228
// Cosine distance - <=>
228229
const results = await dataSource.query(
229230
`SELECT id, embedding FROM post ORDER BY embedding <=> $1 LIMIT 5`,
230-
["[1,2,3]"]
231+
["[1,2,3]"],
231232
)
232233

233234
// Inner product - <#>
234235
const results = await dataSource.query(
235236
`SELECT id, embedding FROM post ORDER BY embedding <#> $1 LIMIT 5`,
236-
["[1,2,3]"]
237+
["[1,2,3]"],
237238
)
238239
```
239240

240-
> **Note**:
241-
> - **PostgreSQL**: Vector columns require the `pgvector` extension to be installed. The extension provides the vector data types and similarity operators.
242-
> - **SAP HANA**: Vector columns require SAP HANA Cloud (2024Q1+) and a supported version of `@sap/hana-client`. Use the appropriate [vector similarity functions](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-sql-reference-guide/vector-functions) for similarity searches.
241+
**SQL Server** - Use the `VECTOR_DISTANCE` function for similarity searches:
242+
243+
```typescript
244+
const queryEmbedding = [1, 2, 3]
245+
246+
// Cosine distance
247+
const results = await dataSource.query(
248+
`
249+
DECLARE @question AS VECTOR(3) = @0;
250+
SELECT TOP (5) id, embedding,
251+
VECTOR_DISTANCE('cosine', @question, embedding) AS distance
252+
FROM post
253+
ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
254+
`,
255+
[JSON.stringify(queryEmbedding)],
256+
)
257+
```
258+
259+
> **Note**:
260+
>
261+
> - **PostgreSQL**: Vector columns require the `pgvector` extension to be installed. The extension provides the vector data types and similarity operators.
262+
> - **SQL Server**: Vector type support requires a compatible SQL Server version with vector functionality enabled.
263+
> - **SAP HANA**: Vector columns require SAP HANA Cloud (2024Q1+) and a supported version of `@sap/hana-client`. Use the appropriate [vector similarity functions](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-sql-reference-guide/vector-functions) for similarity searches.
243264
244265
## Column types
245266

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
# Vector Type Support in SQL Server
2+
3+
This sample demonstrates how to use the `vector` column type in SQL Server with TypeORM for storing and querying vector embeddings.
4+
5+
## Overview
6+
7+
SQL Server supports the `vector` data type for storing high-dimensional vectors, which is useful for:
8+
9+
- Semantic search with embeddings
10+
- Recommendation systems
11+
- Similarity matching
12+
- Machine learning applications
13+
14+
## Features Demonstrated
15+
16+
1. **Vector Column Definition**: Define columns with specific vector dimensions
17+
2. **Storing Embeddings**: Save vector data as arrays of numbers
18+
3. **Vector Similarity Search**: Use `VECTOR_DISTANCE` function for cosine similarity
19+
20+
## Entity Definition
21+
22+
```typescript
23+
@Entity("document_chunks")
24+
export class DocumentChunk {
25+
@PrimaryGeneratedColumn("uuid")
26+
id: string
27+
28+
@Column("varchar", { length: "MAX" })
29+
content: string
30+
31+
// Vector column with 1998 dimensions
32+
@Column("vector", { length: 1998 })
33+
embedding: number[]
34+
35+
@Column("uuid")
36+
documentId: string
37+
38+
@ManyToOne(() => Document, (document) => document.chunks)
39+
@JoinColumn({ name: "documentId" })
40+
document: Document
41+
}
42+
```
43+
44+
## Vector Similarity Search
45+
46+
SQL Server provides the `VECTOR_DISTANCE` function for calculating distances between vectors:
47+
48+
```typescript
49+
const queryEmbedding = [
50+
/* your query vector */
51+
]
52+
const documentIds = ["doc-id-1", "doc-id-2"]
53+
54+
const results = await connection.query(
55+
`
56+
DECLARE @question AS VECTOR (1998) = @0;
57+
SELECT TOP (10) dc.*,
58+
VECTOR_DISTANCE('cosine', @question, embedding) AS distance,
59+
d.fileName as "documentName"
60+
FROM document_chunks dc
61+
LEFT JOIN documents d ON dc.documentId = d.id
62+
WHERE documentId IN (@1))
63+
ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
64+
`,
65+
[JSON.stringify(queryEmbedding), documentIds.join(", ")],
66+
)
67+
```
68+
69+
## Distance Metrics
70+
71+
The `VECTOR_DISTANCE` function supports different distance metrics:
72+
73+
- `'cosine'` - Cosine distance (most common for semantic search)
74+
- `'euclidean'` - Euclidean (L2) distance
75+
- `'dot'` - Negative dot product
76+
77+
## Requirements
78+
79+
- SQL Server with vector support enabled
80+
- TypeORM with SQL Server driver (`mssql` package)
81+
82+
## Running the Sample
83+
84+
1. Make sure you have SQL Server running with vector support
85+
2. Update the connection settings in `app.ts` if needed
86+
3. Run:
87+
```bash
88+
npm install
89+
ts-node app.ts
90+
```
91+
92+
## Notes
93+
94+
- Vector dimensions must be specified using the `length` option
95+
- Embeddings are stored as JSON strings internally and converted to/from arrays automatically
96+
- The maximum vector dimension depends on your SQL Server version and configuration
Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
import "reflect-metadata"
2+
import { DataSource } from "../../src/index"
3+
import { DocumentChunk } from "./entity/DocumentChunk"
4+
import { Document } from "./entity/Document"
5+
6+
const AppDataSource = new DataSource({
7+
type: "mssql",
8+
host: "localhost",
9+
username: "sa",
10+
password: "Admin12345",
11+
database: "test",
12+
synchronize: true,
13+
dropSchema: true,
14+
logging: false,
15+
entities: [Document, DocumentChunk],
16+
options: {
17+
// Enable trust server certificate for local development
18+
trustServerCertificate: true,
19+
},
20+
})
21+
22+
AppDataSource.initialize()
23+
.then(async (connection) => {
24+
console.log("Inserting documents and chunks with vector embeddings...")
25+
26+
// Create a document
27+
const document = new Document()
28+
document.fileName = "sample-document.txt"
29+
await connection.manager.save(document)
30+
31+
// Generate sample embeddings (in a real app, these would come from an ML model)
32+
const generateEmbedding = (dimension: number): number[] => {
33+
return Array.from({ length: dimension }, () => Math.random())
34+
}
35+
36+
// Create document chunks with embeddings
37+
const chunk1 = new DocumentChunk()
38+
chunk1.content =
39+
"TypeORM is an ORM that can run in NodeJS and can be used with TypeScript and JavaScript."
40+
chunk1.embedding = generateEmbedding(1998)
41+
chunk1.document = document
42+
43+
const chunk2 = new DocumentChunk()
44+
chunk2.content =
45+
"It supports both Active Record and Data Mapper patterns."
46+
chunk2.embedding = generateEmbedding(1998)
47+
chunk2.document = document
48+
49+
const chunk3 = new DocumentChunk()
50+
chunk3.content =
51+
"TypeORM supports MySQL, PostgreSQL, MariaDB, SQLite, MS SQL Server, Oracle, and more."
52+
chunk3.embedding = generateEmbedding(1998)
53+
chunk3.document = document
54+
55+
await connection.manager.save([chunk1, chunk2, chunk3])
56+
57+
console.log("Documents and chunks have been saved!")
58+
59+
// Perform a vector similarity search
60+
console.log("\nPerforming vector similarity search...")
61+
62+
// Query embedding (in a real app, this would be generated from user query)
63+
const queryEmbedding = generateEmbedding(1998)
64+
const documentIds = [document.id]
65+
66+
const docIdParams = documentIds.map((_, i) => `@${i + 1}`).join(", ")
67+
const results = await connection.query(
68+
`
69+
DECLARE @question AS VECTOR (1998) = @0;
70+
SELECT TOP (3) dc.*, VECTOR_DISTANCE('cosine', @question, embedding) AS distance, d.fileName as "documentName"
71+
FROM document_chunks dc
72+
LEFT JOIN documents d ON dc.documentId = d.id
73+
WHERE documentId IN (${docIdParams})
74+
ORDER BY VECTOR_DISTANCE('cosine', @question, embedding)
75+
`,
76+
[JSON.stringify(queryEmbedding), ...documentIds],
77+
)
78+
79+
console.log("Search results (top 3 most similar chunks):")
80+
results.forEach((result: any, index: number) => {
81+
console.log(`\n${index + 1}. Distance: ${result.distance}`)
82+
console.log(` Content: ${result.content.substring(0, 80)}...`)
83+
console.log(` Document: ${result.documentName}`)
84+
})
85+
86+
await connection.destroy()
87+
})
88+
.catch((error) => console.log(error))

0 commit comments

Comments
 (0)