feat: Add documentation for CREATE VECTOR INDEX #27331

Closed
skyelves wants to merge 6 commits into prestodb:master from skyelves:export-D96414538

@skyelves skyelves commented Mar 13, 2026

Member

Summary: Add documentation for CREATE VECTOR INDEX

Differential Revision: D96414538

Summary by Sourcery

Add planner, analyzer, metadata, and SPI support for a new CREATE VECTOR INDEX SQL statement, including its AST, grammar, and aggregation function, along with associated statistics tracking and parser tests.

New Features:

  • Introduce a CREATE VECTOR INDEX SQL command with full parser, AST, and formatter support, including properties and UPDATING FOR predicates.
  • Add logical planning and analysis for CREATE VECTOR INDEX, wiring it into the query type classification and table write pipeline with a dedicated writer target.
  • Provide metadata and connector SPI hooks, including a dummy create_vector_index aggregation function, to allow connectors to implement vector index creation.

Enhancements:

  • Track begin and finish statistics and timing for vector index creation in the metadata manager stats and expose them via the stats recording metadata manager.
  • Extend error handling tests and non-reserved keyword lists to accommodate the new VECTOR INDEX syntax.

Summary:

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:

**SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...**
**The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.**

2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
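
The plan shape from step 3 can be sketched in plain Java. This is an illustration only, not the Presto planner: `PlanNodeSketch` and `VectorIndexPlanSketch` are hypothetical names, and the nodes are labels rather than real plan nodes. The scan, optional filter, and aggregation under the writer are taken from the description of the planner in this pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal plan node: a label plus child nodes.
class PlanNodeSketch {
    final String label;
    final List<PlanNodeSketch> children = new ArrayList<>();

    PlanNodeSketch(String label, PlanNodeSketch... children) {
        this.label = label;
        for (PlanNodeSketch child : children) {
            this.children.add(child);
        }
    }

    // Renders the tree with two spaces of indentation per level.
    String render() {
        StringBuilder out = new StringBuilder();
        append(out, 0);
        return out.toString();
    }

    private void append(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(label).append('\n');
        for (PlanNodeSketch child : children) {
            child.append(out, depth + 1);
        }
    }
}

class VectorIndexPlanSketch {
    // Builds the tree bottom-up; the filter is added only when UPDATING FOR is present.
    static PlanNodeSketch build(boolean hasUpdatingFor) {
        PlanNodeSketch source = new PlanNodeSketch("TableScanNode(my_table)");
        if (hasUpdatingFor) {
            source = new PlanNodeSketch("FilterNode(UPDATING FOR predicate)", source);
        }
        PlanNodeSketch aggregation = new PlanNodeSketch("AggregationNode(create_vector_index)", source);
        PlanNodeSketch writer = new PlanNodeSketch("TableWriterNode(target = CreateVectorIndexReference)", aggregation);
        return new PlanNodeSketch("TableFinishNode(target = CreateVectorIndexReference)", writer);
    }

    public static void main(String[] args) {
        System.out.print(build(true).render());
    }
}
```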

Differential Revision: D91385788
Summary:

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:

SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:

**StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.**
**This results in a structured CreateVectorIndexAnalysis object.**

3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D91524358
…estodb#27261)

Summary:

Add dedicated WriterTarget subclass and ConnectorMetadata SPI for
CREATE VECTOR INDEX, enabling each connector to implement vector index
creation independently.

- CreateVectorIndexReference: plan-time target carrying index metadata
  and source table reference
- beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to
  NOT_SUPPORTED so connectors must opt in
- ClassLoaderSafeConnectorMetadata: delegation wrappers

Differential Revision: D95325176
Summary:

Add create_vector_index function signature

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:

SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D95341384
Summary:

Support vector search in LogicalPlanner

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:

SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
**• LogicalPlanner.createVectorIndexPlan() builds the core execution query:**
**CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...**
**• The resulting plan tree includes:**

**TableFinishNode(target = CreateVectorIndexReference)**
**└── TableWriterNode(target = CreateVectorIndexReference)**
**└── query plan**
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D93690255
Summary: Add documentation for CREATE VECTOR INDEX

Differential Revision: D96414538
@sourcery-ai

sourcery-ai Bot commented Mar 13, 2026

Contributor

Reviewer's Guide

Implements full planner, analyzer, metadata, SPI, and parser support for a new CREATE VECTOR INDEX SQL statement, including a dummy create_vector_index aggregation function, a new TableWriter writer target, metadata plumbing, stats hooks, and initial documentation stub.

Sequence diagram for CREATE VECTOR INDEX execution path

sequenceDiagram
    actor User
    participant Parser as SqlParser
    participant Analyzer as StatementAnalyzer
    participant Analysis as Analysis
    participant Planner as LogicalPlanner
    participant TWN as TableWriterNode
    participant TWI as TableWriteInfo
    participant Metadata as MetadataManager
    participant SPI as ConnectorMetadata

    User->>Parser: parse CREATE VECTOR INDEX ...
    Parser-->>User: AST CreateVectorIndex

    User->>Analyzer: analyze CreateVectorIndex
    Analyzer->>Metadata: metadataResolver.tableExists(sourceTable)
    Analyzer-->>User: SemanticException if missing
    Analyzer->>Metadata: metadataResolver.tableExists(targetIndexTable)
    Analyzer-->>User: SemanticException if exists
    Analyzer->>Analyzer: analyze(Table sourceTable)
    Analyzer->>Metadata: getTableHandle(sourceTable)
    Analyzer->>Metadata: getColumnHandles(sourceTable)
    Analyzer->>Analyzer: analyzeWhere(updatingFor)
    Analyzer->>Analysis: setCreateVectorIndexAnalysis(...)
    Analysis-->>Analyzer: CreateVectorIndexAnalysis stored

    User->>Planner: plan CreateVectorIndex
    Planner->>Analysis: getCreateVectorIndexAnalysis()
    Planner->>Metadata: getHandleVersion(sourceTableName)
    Planner->>Metadata: getTableMetadata(sourceTableHandle)
    Planner->>Metadata: getColumnHandles(sourceTableHandle)
    Planner->>Planner: build TableScanNode
    Planner->>Planner: add FilterNode for updatingFor (optional)
    Planner->>Planner: lookupFunction create_vector_index
    Planner->>Planner: build AggregationNode
    Planner->>Planner: build TableWriterNode.CreateVectorIndexReference
    Planner->>Planner: wrap in TableFinishNode
    Planner-->>User: RelationPlan

    User->>TWI: createWriterTarget(CreateVectorIndexReference)
    TWI-->>User: PrestoException NOT_SUPPORTED if no optimizer

    User->>Metadata: beginCreateVectorIndex(session, catalogName, indexMetadata, layout, sourceTableName)
    Metadata->>SPI: beginCreateVectorIndex(connectorSession, indexMetadata, layout, sourceTableName)
    SPI-->>Metadata: ConnectorOutputTableHandle
    Metadata-->>User: OutputTableHandle

    User->>Metadata: finishCreateVectorIndex(session, OutputTableHandle, fragments, stats)
    Metadata->>SPI: finishCreateVectorIndex(connectorSession, handle, fragments, stats)
    SPI-->>Metadata: Optional ConnectorOutputMetadata
    Metadata-->>User: Optional ConnectorOutputMetadata
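
The two existence checks at the start of the analysis phase in the diagram above can be sketched as follows. This is a minimal illustration under stated assumptions: `VectorIndexAnalyzerSketch` is a hypothetical class, the metadata resolver is reduced to a set of table names, and `IllegalStateException` stands in for `SemanticException`. The error messages are invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class VectorIndexAnalyzerSketch {
    // Hypothetical stand-in for the metadata resolver: a set of known table names.
    private final Set<String> existingTables;

    VectorIndexAnalyzerSketch(Set<String> existingTables) {
        this.existingTables = existingTables;
    }

    // The source table must exist and the target index table must not.
    void validate(String sourceTable, String targetIndexTable) {
        if (!existingTables.contains(sourceTable)) {
            throw new IllegalStateException("Source table '" + sourceTable + "' does not exist");
        }
        if (existingTables.contains(targetIndexTable)) {
            throw new IllegalStateException("Target table '" + targetIndexTable + "' already exists");
        }
    }

    // Convenience wrapper that returns "ok" or the validation error message.
    static String check(Set<String> tables, String source, String target) {
        try {
            new VectorIndexAnalyzerSketch(tables).validate(source, target);
            return "ok";
        }
        catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Set<String> tables = new HashSet<>(Arrays.asList("my_table", "old_index"));
        System.out.println(check(tables, "my_table", "my_index"));
        System.out.println(check(tables, "missing_table", "my_index"));
        System.out.println(check(tables, "my_table", "old_index"));
    }
}
```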

Class diagram for CREATE VECTOR INDEX AST, analysis, and planning

classDiagram
    class Statement {
    }

    class CreateVectorIndex {
        +QualifiedName indexName
        +QualifiedName tableName
        +List~Identifier~ columns
        +Optional~Expression~ updatingFor
        +List~Property~ properties
        +CreateVectorIndex(QualifiedName indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
        +CreateVectorIndex(NodeLocation location, QualifiedName indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
        +QualifiedName getIndexName()
        +QualifiedName getTableName()
        +List~Identifier~ getColumns()
        +Optional~Expression~ getUpdatingFor()
        +List~Property~ getProperties()
    }

    Statement <|-- CreateVectorIndex

    class Analysis {
        -Optional~CreateVectorIndexAnalysis~ createVectorIndexAnalysis
        +void setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis analysis)
        +Optional~CreateVectorIndexAnalysis~ getCreateVectorIndexAnalysis()
    }

    class CreateVectorIndexAnalysis {
        +QualifiedObjectName sourceTableName
        +QualifiedObjectName targetTableName
        +List~Identifier~ columns
        +Map~String,Expression~ properties
        +Optional~Expression~ updatingFor
        +CreateVectorIndexAnalysis(QualifiedObjectName sourceTableName, QualifiedObjectName targetTableName, List~Identifier~ columns, Map~String,Expression~ properties, Optional~Expression~ updatingFor)
        +QualifiedObjectName getSourceTableName()
        +QualifiedObjectName getTargetTableName()
        +List~Identifier~ getColumns()
        +Map~String,Expression~ getProperties()
        +Optional~Expression~ getUpdatingFor()
    }

    Analysis *-- CreateVectorIndexAnalysis

    class StatementAnalyzer {
        +Scope visitCreateVectorIndex(CreateVectorIndex node, Optional~Scope~ scope)
    }

    class LogicalPlanner {
        +RelationPlan createVectorIndexPlan(Analysis analysis, CreateVectorIndex statement)
        -Object evaluatePropertyExpression(Expression expression, Analysis analysis)
    }

    class TableWriterNode {
    }

    class CreateVectorIndexReference {
        +ConnectorId connectorId
        +ConnectorTableMetadata tableMetadata
        +Optional~NewTableLayout~ layout
        +Optional~List~OutputColumnMetadata~~ columns
        +SchemaTableName sourceTableName
        +CreateVectorIndexReference(ConnectorId connectorId, ConnectorTableMetadata tableMetadata, Optional~NewTableLayout~ layout, Optional~List~OutputColumnMetadata~~ columns, SchemaTableName sourceTableName)
        +ConnectorId getConnectorId()
        +ConnectorTableMetadata getTableMetadata()
        +Optional~NewTableLayout~ getLayout()
        +SchemaTableName getSchemaTableName()
        +Optional~List~OutputColumnMetadata~~ getOutputColumns()
        +SchemaTableName getSourceTableName()
    }

    class WriterTarget {
        <<abstract>>
    }

    TableWriterNode o-- WriterTarget
    WriterTarget <|-- CreateVectorIndexReference

    class TableWriteInfo {
        -static Optional~ExecutionWriterTarget~ createWriterTarget(Optional~TableWriterNode.WriterTarget~ target, Optional~TableExecuteNode.TableExecuteTarget~ tableExecuteTarget, Optional~TableWriterNode.MergeTarget~ mergeTarget)
    }

    CreateVectorIndex ..> Identifier
    CreateVectorIndex ..> Expression
    CreateVectorIndex ..> Property
    StatementAnalyzer ..> Analysis
    StatementAnalyzer ..> CreateVectorIndex
    LogicalPlanner ..> Analysis
    LogicalPlanner ..> CreateVectorIndex
    LogicalPlanner ..> CreateVectorIndexReference
    LogicalPlanner ..> TableWriterNode
    TableWriteInfo ..> CreateVectorIndexReference
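
The relationship between `WriterTarget`, `CreateVectorIndexReference`, and `TableWriteInfo` in the diagram above can be sketched as below. This is a hedged illustration, not the Presto classes: all types carry a `Sketch` suffix to mark them as hypothetical, execution targets are plain strings, and `UnsupportedOperationException` stands in for `NOT_SUPPORTED`. It assumes the behavior stated in this guide, where a `CreateVectorIndexReference` that no connector optimizer has rewritten is rejected.

```java
// Hypothetical stand-in for TableWriterNode.WriterTarget.
abstract class WriterTargetSketch {}

// Hypothetical ordinary target that can be turned into an execution target.
class CreateTableTargetSketch extends WriterTargetSketch {}

// Plan-time target carrying the source table name, as in the class diagram.
class CreateVectorIndexReferenceSketch extends WriterTargetSketch {
    final String sourceTableName;

    CreateVectorIndexReferenceSketch(String sourceTableName) {
        this.sourceTableName = sourceTableName;
    }
}

class TableWriteInfoSketch {
    // Maps a plan-time target to an execution target, rejecting unrewritten index targets.
    static String createWriterTarget(WriterTargetSketch target) {
        if (target instanceof CreateVectorIndexReferenceSketch) {
            throw new UnsupportedOperationException(
                    "CREATE VECTOR INDEX requires a connector plan optimizer to rewrite the plan");
        }
        if (target instanceof CreateTableTargetSketch) {
            return "CreateHandle";
        }
        throw new IllegalArgumentException("Unhandled target: " + target.getClass().getSimpleName());
    }

    // Helper for callers that only need to know whether the target is rejected.
    static boolean isRejected(WriterTargetSketch target) {
        try {
            createWriterTarget(target);
            return false;
        }
        catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(new CreateTableTargetSketch()));
        System.out.println(isRejected(new CreateVectorIndexReferenceSketch("my_table")));
    }
}
```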

Class diagram for CREATE VECTOR INDEX metadata and SPI plumbing

classDiagram
    class Metadata {
        <<interface>>
        +Optional~ConnectorOutputMetadata~ finishCreateTable(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
        +OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    class MetadataManager {
        +OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    class DelegatingMetadataManager {
        +OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    class StatsRecordingMetadataManager {
        -MetadataManagerStats stats
        +OutputTableHandle beginCreateVectorIndex(Session session, String catalogName, ConnectorTableMetadata indexMetadata, Optional~NewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(Session session, OutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    Metadata <|.. MetadataManager
    Metadata <|.. DelegatingMetadataManager
    Metadata <|.. StatsRecordingMetadataManager

    class MetadataManagerStats {
        -AtomicLong beginCreateVectorIndexCalls
        -AtomicLong finishCreateVectorIndexCalls
        -TimeStat beginCreateVectorIndexTime
        -TimeStat finishCreateVectorIndexTime
        +TimeStat getBeginCreateVectorIndexTime()
        +TimeStat getFinishCreateVectorIndexTime()
        +void recordBeginCreateVectorIndexCall(long duration)
        +void recordFinishCreateVectorIndexCall(long duration)
    }

    StatsRecordingMetadataManager ..> MetadataManagerStats

    class ConnectorMetadata {
        <<interface>>
        +ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    class ClassLoaderSafeConnectorMetadata {
        -ConnectorMetadata delegate
        +ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    ConnectorMetadata <|.. ClassLoaderSafeConnectorMetadata

    MetadataManager ..> ConnectorMetadata
    DelegatingMetadataManager ..> Metadata
    StatsRecordingMetadataManager ..> Metadata

    class OutputTableHandle {
        +ConnectorId connectorId
        +ConnectorTransactionHandle transactionHandle
        +ConnectorOutputTableHandle connectorHandle
    }

    MetadataManager ..> OutputTableHandle

    class CreateVectorIndexAggregation {
        <<aggregate function>>
        +static void inputRealArray(SliceState state, Block embedding)
        +static void inputDoubleArray(SliceState state, Block embedding)
        +static void inputRealArrayWithLongId(SliceState state, long id, Block embedding)
        +static void inputRealArrayWithDoubleId(SliceState state, double id, Block embedding)
        +static void inputRealArrayWithSliceId(SliceState state, Slice id, Block embedding)
        +static void inputDoubleArrayWithLongId(SliceState state, long id, Block embedding)
        +static void inputDoubleArrayWithDoubleId(SliceState state, double id, Block embedding)
        +static void inputDoubleArrayWithSliceId(SliceState state, Slice id, Block embedding)
        +static void combine(SliceState state, SliceState otherState)
        +static void output(SliceState state, BlockBuilder out)
    }

    class SliceState {
    }

    CreateVectorIndexAggregation ..> SliceState
    CreateVectorIndexAggregation ..> Block
    CreateVectorIndexAggregation ..> BlockBuilder

    class BuiltInTypeAndFunctionNamespaceManager {
        -List~SqlFunction~ getBuiltInFunctions(FunctionsConfig functionsConfig)
    }

    BuiltInTypeAndFunctionNamespaceManager ..> CreateVectorIndexAggregation
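
The call counting and timing shown for `MetadataManagerStats` and `StatsRecordingMetadataManager` can be sketched as follows. This is an approximation for illustration: `VectorIndexStatsSketch` and `StatsRecordingSketch` are hypothetical names, a second `AtomicLong` of accumulated nanoseconds stands in for the `TimeStat` field, and the delegate call is modeled as a `Supplier`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical stand-in for the begin-side counters in MetadataManagerStats.
class VectorIndexStatsSketch {
    private final AtomicLong beginCreateVectorIndexCalls = new AtomicLong();
    // Accumulated duration in nanoseconds; the real class uses a TimeStat instead.
    private final AtomicLong beginCreateVectorIndexNanos = new AtomicLong();

    void recordBeginCreateVectorIndexCall(long durationNanos) {
        beginCreateVectorIndexCalls.incrementAndGet();
        beginCreateVectorIndexNanos.addAndGet(durationNanos);
    }

    long getBeginCreateVectorIndexCalls() {
        return beginCreateVectorIndexCalls.get();
    }

    long getBeginCreateVectorIndexNanos() {
        return beginCreateVectorIndexNanos.get();
    }
}

// Hypothetical wrapper that times a delegate call and records it in the stats object.
class StatsRecordingSketch {
    private final VectorIndexStatsSketch stats;

    StatsRecordingSketch(VectorIndexStatsSketch stats) {
        this.stats = stats;
    }

    <T> T beginCreateVectorIndex(Supplier<T> delegateCall) {
        long start = System.nanoTime();
        T result = delegateCall.get();
        stats.recordBeginCreateVectorIndexCall(System.nanoTime() - start);
        return result;
    }

    public static void main(String[] args) {
        VectorIndexStatsSketch stats = new VectorIndexStatsSketch();
        StatsRecordingSketch recording = new StatsRecordingSketch(stats);
        recording.beginCreateVectorIndex(() -> "handle");
        System.out.println(stats.getBeginCreateVectorIndexCalls());
    }
}
```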

File-Level Changes

Change: Add CREATE VECTOR INDEX statement parsing, AST, formatting, and validation, including support for properties and an optional UPDATING FOR predicate.
Details:
  • Extend SQL grammar with CREATE VECTOR INDEX syntax, new VECTOR/INDEX/UPDATING tokens, and adjust error expectations in parser tests.
  • Introduce CreateVectorIndex AST node with columns, properties, and optional updatingFor expression, and wire it into AstVisitor and DefaultTraversalVisitor.
  • Implement SqlFormatter support to pretty-print CREATE VECTOR INDEX statements with WITH properties and UPDATING FOR clauses.
  • Add parser unit tests for valid and invalid CREATE VECTOR INDEX statements and update error handling tests.
Files:
  • presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/CreateVectorIndex.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/DefaultTraversalVisitor.java
  • presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java
  • presto-parser/src/main/java/com/facebook/presto/sql/SqlFormatter.java
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParser.java
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParserErrorHandling.java

Change: Analyze CREATE VECTOR INDEX statements, capture analysis state, enforce access control, and classify query type.
Details:
  • Add visitCreateVectorIndex in StatementAnalyzer to resolve source/target tables, validate index columns, analyze UPDATING FOR predicate, and validate properties.
  • Record required table and column read permissions for the source table and a TABLE_CREATE check for the target index table.
  • Add Analysis.CreateVectorIndexAnalysis container and plumbing on Analysis to store source/target names, columns, properties, and optional updatingFor.
  • Classify CreateVectorIndex as an INSERT-type query in StatementUtils.
Files:
  • presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
  • presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Analysis.java
  • presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/utils/StatementUtils.java

Change: Plan execution for CREATE VECTOR INDEX by scanning source data, applying optional filters, invoking a dummy aggregation, and writing through a new writer target.
Details:
  • Extend LogicalPlanner to dispatch CreateVectorIndex statements to a new createVectorIndexPlan method.
  • In createVectorIndexPlan, resolve the source table, build a TableScan over index and filter columns, optionally add a FilterNode for UPDATING FOR, and construct an AggregationNode over a create_vector_index aggregation call.
  • Create index table metadata with a single VARCHAR result column and evaluated connector-specific properties, including serializing the UPDATING FOR expression to a property.
  • Introduce TableWriterNode.CreateVectorIndexReference as a new WriterTarget carrying connector id, target table metadata, optional layout/columns, and source table name, and wrap the plan in a TableWriterNode/TableFinishNode that returns a VARCHAR result instead of a row count.
  • Make TableWriteInfo reject unoptimized CreateVectorIndexReference targets with a NOT_SUPPORTED error, directing connectors to provide a ConnectorPlanOptimizer.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/LogicalPlanner.java
  • presto-spi/src/main/java/com/facebook/presto/spi/plan/TableWriterNode.java
  • presto-main-base/src/main/java/com/facebook/presto/execution/scheduler/TableWriteInfo.java

Change: Plumb CREATE VECTOR INDEX operations through metadata and connector SPI, including stats recording and classloader-safe wrappers.
Details:
  • Extend Metadata and ConnectorMetadata with beginCreateVectorIndex/finishCreateVectorIndex default methods that throw NOT_SUPPORTED.
  • Implement beginCreateVectorIndex/finishCreateVectorIndex in MetadataManager, DelegatingMetadataManager, and StatsRecordingMetadataManager, including timing stats.
  • Add counters and TimeStat fields for begin/finish vector index calls in MetadataManagerStats with corresponding getters and record* methods.
  • Add classloader-safe wrappers for the new ConnectorMetadata methods in ClassLoaderSafeConnectorMetadata.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/metadata/Metadata.java
  • presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadata.java
  • presto-main-base/src/main/java/com/facebook/presto/metadata/MetadataManager.java
  • presto-main-base/src/main/java/com/facebook/presto/metadata/DelegatingMetadataManager.java
  • presto-main-base/src/main/java/com/facebook/presto/metadata/StatsRecordingMetadataManager.java
  • presto-main-base/src/main/java/com/facebook/presto/metadata/MetadataManagerStats.java
  • presto-spi/src/main/java/com/facebook/presto/spi/connector/classloader/ClassLoaderSafeConnectorMetadata.java

Change: Register and implement a dummy create_vector_index aggregation used only for planning, and hook it into built-in functions.
Details:
  • Add CreateVectorIndexAggregation, a no-op aggregation over embedding (and optional id) with multiple overloads for array(real)/array(double) and various id types, returning VARCHAR and never executing meaningful logic.
  • Register CreateVectorIndexAggregation in BuiltInTypeAndFunctionNamespaceManager getBuiltInFunctions so the function can be resolved during planning.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/operator/aggregation/CreateVectorIndexAggregation.java
  • presto-main-base/src/main/java/com/facebook/presto/metadata/BuiltInTypeAndFunctionNamespaceManager.java

Change: Add documentation scaffolding for CREATE VECTOR INDEX.
Details:
  • Create a new Sphinx documentation stub file for the CREATE VECTOR INDEX statement.
  • Reference the new SQL page from the central sql.rst index (diff not fully shown but implied by the new file path).
Files:
  • presto-docs/src/main/sphinx/sql/create-vector-index.rst
  • presto-docs/src/main/sphinx/sql.rst

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In SqlBase.g4, the createVectorIndex rule currently limits the column list to at most two columns with identifier (',' identifier)?; this should use (',' identifier)* to allow an arbitrary number of indexed columns as implied by the tests and overall design.
  • The overloads in CreateVectorIndexAggregation mix @TypeParameter("T") with primitive @SqlType arguments (long, double, Slice) annotated as @SqlType("T"), which is inconsistent with how typed aggregation functions are normally declared and is unlikely to be resolvable by the function manager; consider using concrete SQL types (e.g., @SqlType(StandardTypes.BIGINT) / VARCHAR) or proper generic constraints instead of the current T placeholder.
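
The difference between `(',' identifier)?` and `(',' identifier)*` in the first point can be illustrated with regular expressions, which use the same `?` and `*` quantifiers. This is only an analogy to the ANTLR rule, not the grammar itself: `ColumnListRuleSketch` is a hypothetical class and `\w+` is a simplification of `identifier`.

```java
import java.util.regex.Pattern;

class ColumnListRuleSketch {
    // Regex analogue of `identifier (',' identifier)?`: one or two columns only.
    static final Pattern AT_MOST_TWO = Pattern.compile("\\w+(\\s*,\\s*\\w+)?");
    // Regex analogue of `identifier (',' identifier)*`: one or more columns.
    static final Pattern ANY_NUMBER = Pattern.compile("\\w+(\\s*,\\s*\\w+)*");

    static boolean acceptsAtMostTwo(String columns) {
        return AT_MOST_TWO.matcher(columns).matches();
    }

    static boolean acceptsAnyNumber(String columns) {
        return ANY_NUMBER.matcher(columns).matches();
    }

    public static void main(String[] args) {
        String threeColumns = "id, embedding, ds";
        System.out.println(acceptsAtMostTwo(threeColumns));
        System.out.println(acceptsAnyNumber(threeColumns));
    }
}
```

With the `?` form a third column has nothing left to match it, so the whole column list is rejected; the `*` form repeats the comma-and-identifier group as often as needed.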
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `SqlBase.g4`, the `createVectorIndex` rule currently limits the column list to at most two columns with `identifier (',' identifier)?`; this should use `(',' identifier)*` to allow an arbitrary number of indexed columns as implied by the tests and overall design.
- The overloads in `CreateVectorIndexAggregation` mix `@TypeParameter("T")` with primitive `@SqlType` arguments (`long`, `double`, `Slice`) annotated as `@SqlType("T")`, which is inconsistent with how typed aggregation functions are normally declared and is unlikely to be resolvable by the function manager; consider using concrete SQL types (e.g., `@SqlType(StandardTypes.BIGINT)` / `VARCHAR`) or proper generic constraints instead of the current `T` placeholder.

## Individual Comments

### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java" line_range="1185" />
<code_context>
+            // Analyze UPDATING FOR predicate (validates column references, types, etc.)
+            node.getUpdatingFor().ifPresent(where -> analyzeWhere(node, tableScope, where));
+
+            validateProperties(node.getProperties(), scope);
+
+            Map<String, Expression> allProperties = mapFromProperties(node.getProperties());
</code_context>
<issue_to_address>
**issue (bug_risk):** Property validation may conflict with unregistered vector index properties

This uses `validateProperties`, but `LogicalPlanner.createVectorIndexPlan` notes that vector index properties are not registered with `TablePropertyManager` and are evaluated directly. If they remain unregistered, this validation will likely reject all such properties as unknown. Either register vector index properties with `TablePropertyManager` (per catalog), or bypass/adjust this validation for `CREATE VECTOR INDEX` so connector-specific properties are allowed.
</issue_to_address>


// Analyze UPDATING FOR predicate (validates column references, types, etc.)
node.getUpdatingFor().ifPresent(where -> analyzeWhere(node, tableScope, where));

validateProperties(node.getProperties(), scope);

issue (bug_risk): Property validation may conflict with unregistered vector index properties

This uses validateProperties, but LogicalPlanner.createVectorIndexPlan notes that vector index properties are not registered with TablePropertyManager and are evaluated directly. If they remain unregistered, this validation will likely reject all such properties as unknown. Either register vector index properties with TablePropertyManager (per catalog), or bypass/adjust this validation for CREATE VECTOR INDEX so connector-specific properties are allowed.
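
The conflict described in this comment can be sketched in isolation. This is a hedged illustration of the general pattern, not Presto's `validateProperties` or `TablePropertyManager`: `PropertyValidationSketch` is a hypothetical class, the registry is a plain set of names, and `registered_prop` and `connector_prop` are placeholder property names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

class PropertyValidationSketch {
    // Strict validation: every property name must appear in the registry.
    static void validateRegistered(Map<String, String> properties, Set<String> registered) {
        for (String name : properties.keySet()) {
            if (!registered.contains(name)) {
                throw new IllegalArgumentException("Unknown property: " + name);
            }
        }
    }

    // Returns true when strict validation accepts the properties.
    static boolean passesStrict(Map<String, String> properties, Set<String> registered) {
        try {
            validateRegistered(properties, registered);
            return true;
        }
        catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> registered = Collections.singleton("registered_prop");
        // A connector-specific property that was never registered is rejected by the strict check.
        System.out.println(passesStrict(Collections.singletonMap("connector_prop", "value"), registered));
        // Registering the property, or skipping the strict check, lets it through.
        System.out.println(passesStrict(Collections.singletonMap("registered_prop", "value"), registered));
    }
}
```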

@skyelves skyelves closed this Mar 13, 2026
skyelves added a commit to skyelves/presto that referenced this pull request Mar 13, 2026
Summary:

Add documentation for CREATE VECTOR INDEX

Differential Revision: D96414538
@github-actions

Codenotify: Notifying subscribers in CODENOTIFY files for diff 78ae082...d485a5a.

Notify File(s)
@aditi-pandit presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

skyelves added a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary:
Pull Request resolved: prestodb#27332

Pull Request resolved: prestodb#27331

Add documentation for CREATE VECTOR INDEX

Reviewed By: zhichenxu-meta

Differential Revision: D96414538
skyelves added a commit to skyelves/presto that referenced this pull request Mar 20, 2026
Summary:
Pull Request resolved: prestodb#27332

Pull Request resolved: prestodb#27331

Add documentation for CREATE VECTOR INDEX

## Release Notes
```
== NO RELEASE NOTE ==
```

Reviewed By: zhichenxu-meta

Differential Revision: D96414538