feat: Add documentation for CREATE VECTOR INDEX #27331
Summary:
## High level design

The process for executing a CREATE VECTOR INDEX SQL statement is as follows:

1. SQL Input & Parsing:
   - SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
   - The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
   - StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
   - This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
   - LogicalPlanner.createVectorIndexPlan() builds the core execution query:
     CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
   - The resulting plan tree includes:
     TableFinishNode(target = CreateVectorIndexReference)
     └── TableWriterNode(target = CreateVectorIndexReference)
         └── query plan
4. Connector Plan Optimization (Rewriting):
   - PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
   - ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
   - TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
   - Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
   - Default: The standard implementation throws NOT_SUPPORTED.
   - Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D91385788
Summary: High level design: same as in the first commit summary above.
Differential Revision: D91524358
…estodb#27261)
Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently.
- CreateVectorIndexReference: plan-time target carrying index metadata and source table reference
- beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in
- ClassLoaderSafeConnectorMetadata: delegation wrappers
Differential Revision: D95325176
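The opt-in behaviour described for the SPI can be sketched roughly as follows. This is a minimal illustration, not the actual Presto code: `SketchConnectorMetadata`, `DefaultOnlyConnectorMetadata`, `OptInConnectorMetadata`, and the string-based handle are hypothetical stand-ins, and `UnsupportedOperationException` stands in for a `PrestoException` carrying `NOT_SUPPORTED`.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for the ConnectorMetadata SPI.
// The real methods take sessions, table metadata, and layouts; strings are used here for brevity.
interface SketchConnectorMetadata
{
    // Default: connectors that do not override this cannot create vector indexes.
    default String beginCreateVectorIndex(String indexName, String sourceTableName)
    {
        throw new UnsupportedOperationException("This connector does not support creating vector indexes");
    }

    default Optional<String> finishCreateVectorIndex(String handle)
    {
        throw new UnsupportedOperationException("This connector does not support creating vector indexes");
    }
}

// A connector that keeps the defaults and therefore rejects CREATE VECTOR INDEX.
final class DefaultOnlyConnectorMetadata
        implements SketchConnectorMetadata
{
}

// A connector that opts in by overriding both halves of the begin/finish pair.
final class OptInConnectorMetadata
        implements SketchConnectorMetadata
{
    @Override
    public String beginCreateVectorIndex(String indexName, String sourceTableName)
    {
        return "handle:" + indexName + "<-" + sourceTableName;
    }

    @Override
    public Optional<String> finishCreateVectorIndex(String handle)
    {
        return Optional.of("committed " + handle);
    }
}
```

Putting the failure in the interface default means an existing connector compiles unchanged and only fails when a user actually issues the statement against it.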
Summary: Add create_vector_index function signature
High level design: same as in the first commit summary above.
Differential Revision: D95341384
Summary: Support vector search in LogicalPlanner
High level design: same as in the first commit summary above.
Differential Revision: D93690255
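The logical planning step of the high level design (scan, optional UPDATING FOR filter, aggregation, writer, finish) can be sketched as a chain of nodes. This is an illustrative sketch only: `SketchPlanNode` and `VectorIndexPlanSketch` are hypothetical types, and the labels merely echo the node names used in the design notes rather than the real planner classes.

```java
// Hypothetical, heavily simplified plan node: a label plus at most one source.
final class SketchPlanNode
{
    final String label;
    final SketchPlanNode source; // null marks a leaf

    SketchPlanNode(String label, SketchPlanNode source)
    {
        this.label = label;
        this.source = source;
    }

    // Renders the chain from this node down to the leaf, one node per line.
    String render()
    {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (SketchPlanNode node = this; node != null; node = node.source) {
            for (int i = 1; i < depth; i++) {
                out.append("    ");
            }
            if (depth > 0) {
                out.append("\u2514\u2500\u2500 ");
            }
            out.append(node.label).append('\n');
            depth++;
        }
        return out.toString();
    }
}

final class VectorIndexPlanSketch
{
    // Builds the plan bottom-up; the filter is added only when UPDATING FOR is present.
    static SketchPlanNode plan(boolean hasUpdatingFor)
    {
        SketchPlanNode scan = new SketchPlanNode("TableScanNode(my_table)", null);
        SketchPlanNode input = hasUpdatingFor
                ? new SketchPlanNode("FilterNode(updatingFor)", scan)
                : scan;
        SketchPlanNode aggregation = new SketchPlanNode("AggregationNode(create_vector_index)", input);
        SketchPlanNode writer = new SketchPlanNode("TableWriterNode(target = CreateVectorIndexReference)", aggregation);
        return new SketchPlanNode("TableFinishNode(target = CreateVectorIndexReference)", writer);
    }

    public static void main(String[] args)
    {
        System.out.print(plan(true).render());
    }
}
```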
Summary: Add documentation for CREATE VECTOR INDEX Differential Revision: D96414538
Reviewer's Guide
Implements full planner, analyzer, metadata, SPI, and parser support for a new CREATE VECTOR INDEX SQL statement, including a dummy create_vector_index aggregation function, a new TableWriter writer target, metadata plumbing, stats hooks, and initial documentation stub.
Sequence diagram for CREATE VECTOR INDEX execution path
sequenceDiagram
actor User
participant Parser as SqlParser
participant Analyzer as StatementAnalyzer
participant Analysis as Analysis
participant Planner as LogicalPlanner
participant TWN as TableWriterNode
participant TWI as TableWriteInfo
participant Metadata as MetadataManager
participant SPI as ConnectorMetadata
User->>Parser: parse CREATE VECTOR INDEX ...
Parser-->>User: AST CreateVectorIndex
User->>Analyzer: analyze CreateVectorIndex
Analyzer->>Metadata: metadataResolver.tableExists(sourceTable)
Analyzer-->>User: SemanticException if missing
Analyzer->>Metadata: metadataResolver.tableExists(targetIndexTable)
Analyzer-->>User: SemanticException if exists
Analyzer->>Analyzer: analyze(Table sourceTable)
Analyzer->>Metadata: getTableHandle(sourceTable)
Analyzer->>Metadata: getColumnHandles(sourceTable)
Analyzer->>Analyzer: analyzeWhere(updatingFor)
Analyzer->>Analysis: setCreateVectorIndexAnalysis(...)
Analysis-->>Analyzer: CreateVectorIndexAnalysis stored
User->>Planner: plan CreateVectorIndex
Planner->>Analysis: getCreateVectorIndexAnalysis()
Planner->>Metadata: getHandleVersion(sourceTableName)
Planner->>Metadata: getTableMetadata(sourceTableHandle)
Planner->>Metadata: getColumnHandles(sourceTableHandle)
Planner->>Planner: build TableScanNode
Planner->>Planner: add FilterNode for updatingFor (optional)
Planner->>Planner: lookupFunction create_vector_index
Planner->>Planner: build AggregationNode
Planner->>Planner: build TableWriterNode.CreateVectorIndexReference
Planner->>Planner: wrap in TableFinishNode
Planner-->>User: RelationPlan
User->>TWI: createWriterTarget(CreateVectorIndexReference)
TWI-->>User: PrestoException NOT_SUPPORTED if no optimizer
User->>Metadata: beginCreateVectorIndex(session, catalogName, indexMetadata, layout, sourceTableName)
Metadata->>SPI: beginCreateVectorIndex(connectorSession, indexMetadata, layout, sourceTableName)
SPI-->>Metadata: ConnectorOutputTableHandle
Metadata-->>User: OutputTableHandle
User->>Metadata: finishCreateVectorIndex(session, OutputTableHandle, fragments, stats)
Metadata->>SPI: finishCreateVectorIndex(connectorSession, handle, fragments, stats)
SPI-->>Metadata: Optional ConnectorOutputMetadata
Metadata-->>User: Optional ConnectorOutputMetadata
Class diagram for CREATE VECTOR INDEX AST, analysis, and planning
classDiagram
class Statement {
}
class CreateVectorIndex {
+QualifiedName indexName
+QualifiedName tableName
+List~Identifier~ columns
+Optional~Expression~ updatingFor
+List~Property~ properties
+CreateVectorIndex(QualifiedName indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+CreateVectorIndex(NodeLocation location, QualifiedName indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+QualifiedName getIndexName()
+QualifiedName getTableName()
+List~Identifier~ getColumns()
+Optional~Expression~ getUpdatingFor()
+List~Property~ getProperties()
}
Statement <|-- CreateVectorIndex
class Analysis {
-Optional~CreateVectorIndexAnalysis~ createVectorIndexAnalysis
+void setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis analysis)
+Optional~CreateVectorIndexAnalysis~ getCreateVectorIndexAnalysis()
}
class CreateVectorIndexAnalysis {
+QualifiedObjectName sourceTableName
+QualifiedObjectName targetTableName
+List~Identifier~ columns
+Map~String,Expression~ properties
+Optional~Expression~ updatingFor
+CreateVectorIndexAnalysis(QualifiedObjectName sourceTableName, QualifiedObjectName targetTableName, List~Identifier~ columns, Map~String,Expression~ properties, Optional~Expression~ updatingFor)
+QualifiedObjectName getSourceTableName()
+QualifiedObjectName getTargetTableName()
+List~Identifier~ getColumns()
+Map~String,Expression~ getProperties()
+Optional~Expression~ getUpdatingFor()
}
Analysis *-- CreateVectorIndexAnalysis
class StatementAnalyzer {
+Scope visitCreateVectorIndex(CreateVectorIndex node, Optional~Scope~ scope)
}
class LogicalPlanner {
+RelationPlan createVectorIndexPlan(Analysis analysis, CreateVectorIndex statement)
-Object evaluatePropertyExpression(Expression expression, Analysis analysis)
}
class TableWriterNode {
}
class CreateVectorIndexReference {
+ConnectorId connectorId
+ConnectorTableMetadata tableMetadata
+Optional~NewTableLayout~ layout
+Optional~List~OutputColumnMetadata~~ columns
+SchemaTableName sourceTableName
+CreateVectorIndexReference(ConnectorId connectorId, ConnectorTableMetadata tableMetadata, Optional~NewTableLayout~ layout, Optional~List~OutputColumnMetadata~~ columns, SchemaTableName sourceTableName)
+ConnectorId getConnectorId()
+ConnectorTableMetadata getTableMetadata()
+Optional~NewTableLayout~ getLayout()
+SchemaTableName getSchemaTableName()
+Optional~List~OutputColumnMetadata~~ getOutputColumns()
+SchemaTableName getSourceTableName()
}
class WriterTarget {
<<abstract>>
}
TableWriterNode o-- WriterTarget
WriterTarget <|-- CreateVectorIndexReference
class TableWriteInfo {
-static Optional~ExecutionWriterTarget~ createWriterTarget(Optional~TableWriterNode.WriterTarget~ target, Optional~TableExecuteNode.TableExecuteTarget~ tableExecuteTarget, Optional~TableWriterNode.MergeTarget~ mergeTarget)
}
CreateVectorIndex ..> Identifier
CreateVectorIndex ..> Expression
CreateVectorIndex ..> Property
StatementAnalyzer ..> Analysis
StatementAnalyzer ..> CreateVectorIndex
LogicalPlanner ..> Analysis
LogicalPlanner ..> CreateVectorIndex
LogicalPlanner ..> CreateVectorIndexReference
LogicalPlanner ..> TableWriterNode
TableWriteInfo ..> CreateVectorIndexReference
Class diagram for CREATE VECTOR INDEX metadata and SPI plumbing
classDiagram
class Metadata {
<<interface>>
+Optional~ConnectorOutputMetadata~ finishCreateTable(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
+OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
class MetadataManager {
+OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
class DelegatingMetadataManager {
+OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
class StatsRecordingMetadataManager {
-MetadataManagerStats stats
+OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
Metadata <|.. MetadataManager
Metadata <|.. DelegatingMetadataManager
Metadata <|.. StatsRecordingMetadataManager
class MetadataManagerStats {
-AtomicLong beginCreateVectorIndexCalls
-AtomicLong finishCreateVectorIndexCalls
-TimeStat beginCreateVectorIndexTime
-TimeStat finishCreateVectorIndexTime
+TimeStat getBeginCreateVectorIndexTime()
+TimeStat getFinishCreateVectorIndexTime()
+void recordBeginCreateVectorIndexCall(long duration)
+void recordFinishCreateVectorIndexCall(long duration)
}
StatsRecordingMetadataManager ..> MetadataManagerStats
class ConnectorMetadata {
<<interface>>
+ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
class ClassLoaderSafeConnectorMetadata {
-ConnectorMetadata delegate
+ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
ConnectorMetadata <|.. ClassLoaderSafeConnectorMetadata
MetadataManager ..> ConnectorMetadata
DelegatingMetadataManager ..> Metadata
StatsRecordingMetadataManager ..> Metadata
class OutputTableHandle {
+ConnectorId connectorId
+ConnectorTransactionHandle transactionHandle
+ConnectorOutputTableHandle connectorHandle
}
MetadataManager ..> OutputTableHandle
class CreateVectorIndexAggregation {
<<aggregate function>>
+static void inputRealArray(SliceState state, Block embedding)
+static void inputDoubleArray(SliceState state, Block embedding)
+static void inputRealArrayWithLongId(SliceState state, long id, Block embedding)
+static void inputRealArrayWithDoubleId(SliceState state, double id, Block embedding)
+static void inputRealArrayWithSliceId(SliceState state, Slice id, Block embedding)
+static void inputDoubleArrayWithLongId(SliceState state, long id, Block embedding)
+static void inputDoubleArrayWithDoubleId(SliceState state, double id, Block embedding)
+static void inputDoubleArrayWithSliceId(SliceState state, Slice id, Block embedding)
+static void combine(SliceState state, SliceState otherState)
+static void output(SliceState state, BlockBuilder out)
}
class SliceState {
}
CreateVectorIndexAggregation ..> SliceState
CreateVectorIndexAggregation ..> Block
CreateVectorIndexAggregation ..> BlockBuilder
class BuiltInTypeAndFunctionNamespaceManager {
-List~SqlFunction~ getBuiltInFunctions(FunctionsConfig functionsConfig)
}
BuiltInTypeAndFunctionNamespaceManager ..> CreateVectorIndexAggregation
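The stats hooks in the diagram (a call counter plus a timer per metadata method, recorded by a wrapper around the delegate) follow a common pattern that can be sketched as below. This is an assumption-laden illustration: `SketchMetadataStats` and `StatsRecordingSketch` are hypothetical, and a plain `AtomicLong` of nanoseconds stands in for the `TimeStat` type shown in the diagram.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical stand-in for MetadataManagerStats, limited to one method's counters.
final class SketchMetadataStats
{
    private final AtomicLong beginCreateVectorIndexCalls = new AtomicLong();
    private final AtomicLong beginCreateVectorIndexNanos = new AtomicLong();

    void recordBeginCreateVectorIndexCall(long durationNanos)
    {
        beginCreateVectorIndexCalls.incrementAndGet();
        beginCreateVectorIndexNanos.addAndGet(durationNanos);
    }

    long getBeginCreateVectorIndexCalls()
    {
        return beginCreateVectorIndexCalls.get();
    }

    long getBeginCreateVectorIndexNanos()
    {
        return beginCreateVectorIndexNanos.get();
    }
}

final class StatsRecordingSketch
{
    // Runs the delegate call and records its duration even if the call throws.
    static <T> T timedBeginCreateVectorIndex(SketchMetadataStats stats, Supplier<T> delegateCall)
    {
        long start = System.nanoTime();
        try {
            return delegateCall.get();
        }
        finally {
            stats.recordBeginCreateVectorIndexCall(System.nanoTime() - start);
        }
    }
}
```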
Hey - I've found 1 issue, and left some high level feedback:
- In `SqlBase.g4`, the `createVectorIndex` rule currently limits the column list to at most two columns with `identifier (',' identifier)?`; this should use `(',' identifier)*` to allow an arbitrary number of indexed columns as implied by the tests and overall design.
- The overloads in `CreateVectorIndexAggregation` mix `@TypeParameter("T")` with primitive `@SqlType` arguments (`long`, `double`, `Slice`) annotated as `@SqlType("T")`, which is inconsistent with how typed aggregation functions are normally declared and is unlikely to be resolvable by the function manager; consider using concrete SQL types (e.g., `@SqlType(StandardTypes.BIGINT)` / `VARCHAR`) or proper generic constraints instead of the current `T` placeholder.
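The difference between the `?` and `*` quantifiers in the first comment can be illustrated by analogy with regular expressions. The sketch below is not ANTLR and does not touch `SqlBase.g4`; `ColumnListQuantifierSketch` is a hypothetical class whose two patterns only mirror the shape of the rule fragments quoted above.

```java
import java.util.regex.Pattern;

final class ColumnListQuantifierSketch
{
    // Analogue of identifier (',' identifier)? : one identifier plus at most one more.
    static final Pattern AT_MOST_TWO = Pattern.compile("\\w+(,\\w+)?");

    // Analogue of identifier (',' identifier)* : one identifier plus any number more.
    static final Pattern ANY_NUMBER = Pattern.compile("\\w+(,\\w+)*");

    static boolean acceptsAtMostTwo(String columnList)
    {
        return AT_MOST_TWO.matcher(columnList).matches();
    }

    static boolean acceptsAnyNumber(String columnList)
    {
        return ANY_NUMBER.matcher(columnList).matches();
    }

    public static void main(String[] args)
    {
        String threeColumns = "id,embedding,ds";
        System.out.println(acceptsAtMostTwo(threeColumns));
        System.out.println(acceptsAnyNumber(threeColumns));
    }
}
```

With the `?` form a three-column list is rejected as a whole, while the `*` form accepts it.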
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `SqlBase.g4`, the `createVectorIndex` rule currently limits the column list to at most two columns with `identifier (',' identifier)?`; this should use `(',' identifier)*` to allow an arbitrary number of indexed columns as implied by the tests and overall design.
- The overloads in `CreateVectorIndexAggregation` mix `@TypeParameter("T")` with primitive `@SqlType` arguments (`long`, `double`, `Slice`) annotated as `@SqlType("T")`, which is inconsistent with how typed aggregation functions are normally declared and is unlikely to be resolvable by the function manager; consider using concrete SQL types (e.g., `@SqlType(StandardTypes.BIGINT)` / `VARCHAR`) or proper generic constraints instead of the current `T` placeholder.
## Individual Comments
### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java" line_range="1185" />
<code_context>
+ // Analyze UPDATING FOR predicate (validates column references, types, etc.)
+ node.getUpdatingFor().ifPresent(where -> analyzeWhere(node, tableScope, where));
+
+ validateProperties(node.getProperties(), scope);
+
+ Map<String, Expression> allProperties = mapFromProperties(node.getProperties());
</code_context>
<issue_to_address>
**issue (bug_risk):** Property validation may conflict with unregistered vector index properties
This uses `validateProperties`, but `LogicalPlanner.createVectorIndexPlan` notes that vector index properties are not registered with `TablePropertyManager` and are evaluated directly. If they remain unregistered, this validation will likely reject all such properties as unknown. Either register vector index properties with `TablePropertyManager` (per catalog), or bypass/adjust this validation for `CREATE VECTOR INDEX` so connector-specific properties are allowed.
</issue_to_address>
issue (bug_risk): Property validation may conflict with unregistered vector index properties
This uses validateProperties, but LogicalPlanner.createVectorIndexPlan notes that vector index properties are not registered with TablePropertyManager and are evaluated directly. If they remain unregistered, this validation will likely reject all such properties as unknown. Either register vector index properties with TablePropertyManager (per catalog), or bypass/adjust this validation for CREATE VECTOR INDEX so connector-specific properties are allowed.
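The conflict described in this comment can be sketched as follows. The code is a hypothetical illustration, not the Presto implementation: `VectorIndexPropertySketch` and the property name `index_type` are invented for the example, a plain `Set` stands in for whatever registry the validation consults, and `IllegalArgumentException` stands in for the analyzer's error type.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class VectorIndexPropertySketch
{
    // Mimics a registry-backed check: any name that was never registered is rejected.
    static void validateAgainstRegistry(Map<String, String> properties, Set<String> registeredNames)
    {
        for (String name : properties.keySet()) {
            if (!registeredNames.contains(name)) {
                throw new IllegalArgumentException("Unknown property: " + name);
            }
        }
    }

    // Mimics the bypass option: properties are handed on without a registry lookup.
    static Map<String, String> passThrough(Map<String, String> properties)
    {
        return Collections.unmodifiableMap(new HashMap<>(properties));
    }

    public static void main(String[] args)
    {
        Map<String, String> properties = new HashMap<>();
        properties.put("index_type", "ivf"); // hypothetical connector-specific property

        // Passes once the name is registered.
        validateAgainstRegistry(properties, Collections.singleton("index_type"));

        // Fails while the registry is empty, which is the situation the comment warns about.
        try {
            validateAgainstRegistry(properties, Collections.<String>emptySet());
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Either remedy named in the comment maps onto this sketch: registering the names makes the first path succeed, while bypassing validation corresponds to `passThrough`.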
Codenotify: Notifying subscribers in CODENOTIFY files for diff 78ae082...d485a5a.
Summary: Pull Request resolved: prestodb#27332 Pull Request resolved: prestodb#27331 Add documentation for CREATE VECTOR INDEX Reviewed By: zhichenxu-meta Differential Revision: D96414538
Summary: Pull Request resolved: prestodb#27332
Pull Request resolved: prestodb#27331
Add documentation for CREATE VECTOR INDEX
## Release Notes
```
== NO RELEASE NOTE ==
```
Reviewed By: zhichenxu-meta
Differential Revision: D96414538
Summary by Sourcery
Add planner, analyzer, metadata, and SPI support for a new CREATE VECTOR INDEX SQL statement, including its AST, grammar, and aggregation function, along with associated statistics tracking and parser tests.