Add restructured PoC plan with two key investigation areas:
1) UDT/UDF impact on RelNode processing in Analytics engine
2) RelNode (de-)serialization via Calcite JSON serializer

Phases reordered to tackle SQL/PPL RelNode generation first, then routing and REST handler, then Analytics engine integration.
Phase 1: extract ANSI SQL logic from UnifiedQueryPlanner rather than copying the whole file. Note mock schema is temporary until replaced by Analytics engine SchemaProvider in Phase 4. Phase 4: add reference link to OpenSearch sandbox plugins.
Add Calcite native SQL parser path (SqlParser -> SqlValidator -> SqlToRelConverter -> RelNode) to UnifiedQueryPlanner for SQL queries. PPL path remains unchanged using ANTLR parser -> AST -> CalciteRelNodeVisitor. Logical optimization deferred to Phase 4 when Analytics engine schema with pushdown rules is integrated.
Phase 2: PPL queries against a schema with all 5 standard Calcite types (TIMESTAMP, DATE, TIME, VARBINARY, VARCHAR) produce RelNode plans with standard SqlTypeName preserved — no UDT interference. Key Area 1 finding: when schema bypasses OpenSearchTypeFactory, PPL datetime functions (hour, day) resolve correctly via PPLTypeChecker bridge and return standard INTEGER type.
Phase 3: Wire end-to-end unified query pipeline:
- AnalyticsExecutionEngine stubs transport action handoff to Analytics engine, returns empty results (TODO: serialize RelNode and submit)
- RestUnifiedQueryAction routes through UnifiedQueryPlanner -> RelNode -> AnalyticsExecutionEngine, formats via JdbcResponseFormatter
- RestSqlAction routes SQL queries with 'parquet_' to unified path
- RestPPLQueryAction routes PPL queries with 'parquet_' to unified path
- Add :api dependency to legacy/ and plugin/ modules
… builder
Phase 4: Analytics engine integration:
- New analytics-engine-stub/ submodule with SPI interfaces copied from analytics-framework (EngineContext, QueryPlanExecutor, SchemaProvider, EngineBridge, AnalyticsBackEndPlugin)
- OpenSearchSchemaBuilder copied from analytics-engine: builds Calcite SchemaPlus from ClusterState index mappings with standard SqlTypeName
- RestUnifiedQueryAction now uses real cluster state schema via OpenSearchSchemaBuilder instead of empty context
- Wire ClusterService through RestSqlAction and RestPPLQueryAction
Verify end-to-end flow: REST routing -> UnifiedQueryPlanner -> RelNode -> AnalyticsExecutionEngine (stub) -> JdbcResponseFormatter. AnalyticsExecutionEngine now derives schema from RelNode row type instead of returning empty schema, enabling proper JDBC response with correct column names and types. Tests: SQL select with filter, SQL aggregate, non-parquet regression.
Add 7 integration tests covering SQL/PPL query and explain paths:
- SQL select with filter, SQL aggregate
- PPL where+project, PPL stats
- PPL datetime function (Key Area 1: no UDT in plan)
- PPL match() search function (MAP arguments in plan)
- Non-parquet regression test

Each test uses fluent withSQL/withPPL().verifySchema().verifyDataRows().verifyExplain() pattern asserting full logical plan output.

Refactor: move isUnifiedQueryPath() to RestUnifiedQueryAction, deduplicate routing logic from RestSqlAction and RestPPLQueryAction.
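The prefix-based routing that isUnifiedQueryPath() centralizes can be sketched roughly as below. This is a minimal illustration, not the PR's implementation: the class name, the method signature, and the regex are assumptions, and the only detail taken from the commits is that queries targeting 'parquet_' indices go to the unified path while everything else falls through.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the real isUnifiedQueryPath() may inspect the parsed
// query or index metadata instead of matching on the query text.
final class UnifiedRoutingSketch {

  // Assumption: a query is "unified" when its source index name starts with
  // "parquet_", either after SQL FROM or after PPL source=.
  // Keywords are matched case-insensitively; the index prefix is not.
  private static final Pattern UNIFIED_TARGET =
      Pattern.compile("(?:(?i:\\bfrom)\\s+|(?i:\\bsource)\\s*=\\s*)parquet_\\w*");

  private UnifiedRoutingSketch() {}

  // Returns true when the query should be sent to the unified pipeline.
  static boolean isUnifiedQueryPath(String query) {
    return query != null && UNIFIED_TARGET.matcher(query).find();
  }
}
```

Under these assumptions, `SELECT * FROM parquet_logs` and `source = parquet_logs | stats count()` are routed to the unified path, while `SELECT * FROM accounts` stays on the existing path, which is the behavior the non-parquet regression test guards.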
Key findings from Phase 5 investigation:
- Transport action is NOT needed: both plugins run in same JVM
- Analytics engine uses ExtensiblePlugin for SPI discovery, not transport actions. Exposes QueryPlanExecutor via Guice bindings.
- SQL plugin has its own private Guice injector, isolated from the node-level injector where Analytics engine bindings live.
- RelNode can be passed directly in-process, no serialization needed.
- Calcite JSON serializer works as fallback for future cross-node needs.

Replace Phase 5 (serialization) and Phase 6 (transport action) with Phase 5 (direct Guice integration with QueryPlanExecutor).
…findings Key finding: each OpenSearch plugin gets its own URLClassLoader. Even though both SQL and Analytics plugins bundle calcite-core:1.41.0, RelNode is a different Class in each classloader -> ClassCastException. Solution: SQL plugin declares extendedPlugins = ['analytics-engine'], inheriting Analytics engine's classloader as parent. Both share the same Calcite classes, enabling direct RelNode passing without serialization. Calcite JSON serialization validated as fallback option.
Phase 5: Direct plugin extension integration:
- Remove analytics-engine-stub module, use real analytics-framework and analytics-engine jars from OpenSearch sandbox build in libs/
- SQL plugin declares extendedPlugins = ['analytics-engine'], sharing Calcite classloader, so no serialization is needed for RelNode handoff
- AnalyticsExecutionEngine uses real QueryPlanExecutor interface
- Exclude overlapping jars from SQL plugin zip to avoid jar hell
- Analytics-engine plugin loaded in integ-test cluster
- All 7 ITs pass with real analytics-engine plugin loaded
Document PoC findings including:
- Key Area 1: UDT interference (none with standard Calcite types)
- Key Area 2: RelNode handoff (direct via extendedPlugins, no serialization)
- Key Area 3: Optimization boundary (SQL plugin owns logical rewrites, Analytics engine owns physical pushdown)
- Component responsibilities ordered by workflow
- API contract: routing, schema, executor, result format
…atting
Wire Iterable<Object[]> from QueryPlanExecutor through ExprTupleValue to the existing JdbcResponseFormatter chain:
- Each Object[] row mapped to ExprTupleValue keyed by column names from RelNode.getRowType()
- ExprValueUtils.fromObjectValue() handles all standard Java types
- QueryResponse(Schema, List<ExprValue>, Cursor) flows through existing formatter unchanged
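The row mapping described above can be sketched with plain collections. This is only an illustration under stated assumptions: a `LinkedHashMap` stands in for ExprTupleValue, a `List<String>` stands in for the field names read from RelNode.getRowType(), and the class and method names are hypothetical rather than taken from the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Object[] -> ExprTupleValue conversion.
final class RowMappingSketch {

  private RowMappingSketch() {}

  // Pairs each positional value with the column name at the same index.
  // LinkedHashMap keeps column order, which a tabular response relies on.
  static List<Map<String, Object>> toTuples(List<String> columns, Iterable<Object[]> rows) {
    List<Map<String, Object>> tuples = new ArrayList<>();
    for (Object[] row : rows) {
      if (row.length != columns.size()) {
        throw new IllegalArgumentException(
            "Row has " + row.length + " values but schema has " + columns.size() + " columns");
      }
      Map<String, Object> tuple = new LinkedHashMap<>();
      for (int i = 0; i < row.length; i++) {
        tuple.put(columns.get(i), row[i]); // null values are kept as-is
      }
      tuples.add(tuple);
    }
    return tuples;
  }
}
```

Deriving the keys from the plan's row type rather than from the result rows keeps column names available even when the executor returns no rows.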
Phase 6 cross-cutting concerns:
- Schedule execution on sql-worker thread pool to avoid blocking transport threads (same pattern as OpenSearchExecutionEngine)
- Log planning and execution time with [unified] prefix
- Increment REQ_TOTAL/REQ_COUNT_TOTAL on success, FAILED_REQ_COUNT_SYS on failure
- Update PoC result doc with observability and thread management responsibilities
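A rough sketch of how these three concerns can sit around a single query execution is shown below. It uses only `java.util.concurrent`: the `ExecutorService` stands in for the sql-worker thread pool, the two `AtomicLong` fields stand in for the REQ_TOTAL and FAILED_REQ_COUNT_SYS metrics, and the class name and log wording are assumptions rather than the PR's code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper: offloads work to a worker pool, times it, counts outcomes.
final class UnifiedExecutionSketch {
  final AtomicLong reqTotal = new AtomicLong();          // stand-in for REQ_TOTAL
  final AtomicLong failedReqCountSys = new AtomicLong(); // stand-in for FAILED_REQ_COUNT_SYS
  private final ExecutorService worker;                  // stand-in for the sql-worker pool

  UnifiedExecutionSketch(ExecutorService worker) {
    this.worker = worker;
  }

  // Submits the query to the worker pool so the calling thread is not blocked.
  <T> Future<T> submit(Callable<T> query) {
    Callable<T> timed = () -> {
      long start = System.nanoTime();
      try {
        T result = query.call();
        reqTotal.incrementAndGet();
        return result;
      } catch (Exception e) {
        failedReqCountSys.incrementAndGet();
        throw e; // surfaces to the caller through Future.get()
      } finally {
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("[unified] execution took " + elapsedMs + " ms");
      }
    };
    return worker.submit(timed);
  }
}
```

Counting inside the worker task rather than in the REST handler means the metric reflects the outcome of execution itself, not merely whether the request was accepted.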
Demonstrate RelNode object tree structure via RelJsonWriter:
- demoTableScanIsStandardCalcite: LogicalTableScan with no custom node
- demoDatetimeFunctionProducesStandardTypes: HOUR as UDF producing standard INTEGER type, no UDT
- demoMatchFunctionIsPplUdf: match() as PPL UDF with MAP operands, class=UserDefinedFunctionBuilder
- Analytics engine must handle/reject

Add UnifiedQueryPlanner.optimize() API:
- Runs VolcanoPlanner with adapter-specific pushdown rules
- Test simulates Analytics engine with custom EngineTableScan and filter absorption rule: LogicalFilter absorbed into EngineTableScan after optimize()
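The effect of a filter absorption rule can be illustrated with a toy plan tree. This sketch deliberately does not use Calcite: ToyPlanNode, ToyScan and ToyFilter are hypothetical stand-ins for RelNode, EngineTableScan and LogicalFilter, and a simple recursive rewrite replaces VolcanoPlanner's rule matching. It shows only the before/after shape of the plan.

```java
// Toy model only: these classes are hypothetical and are not Calcite types.
interface ToyPlanNode {}

final class ToyScan implements ToyPlanNode {
  final String table;
  final String pushedFilter; // null when no filter has been absorbed

  ToyScan(String table, String pushedFilter) {
    this.table = table;
    this.pushedFilter = pushedFilter;
  }
}

final class ToyFilter implements ToyPlanNode {
  final String condition;
  final ToyPlanNode input;

  ToyFilter(String condition, ToyPlanNode input) {
    this.condition = condition;
    this.input = input;
  }
}

final class FilterAbsorptionSketch {

  private FilterAbsorptionSketch() {}

  // Rewrites Filter(Scan) into a Scan carrying the condition.
  // A scan that already holds a filter is left alone, so this toy rule
  // absorbs at most one filter per scan.
  static ToyPlanNode optimize(ToyPlanNode node) {
    if (!(node instanceof ToyFilter)) {
      return node;
    }
    ToyFilter filter = (ToyFilter) node;
    ToyPlanNode input = optimize(filter.input);
    if (input instanceof ToyScan && ((ToyScan) input).pushedFilter == null) {
      return new ToyScan(((ToyScan) input).table, filter.condition);
    }
    return new ToyFilter(filter.condition, input);
  }
}
```

In this toy model the optimized plan for a filter over a scan is a single scan node holding the condition, which mirrors the assertion that LogicalFilter no longer appears in the plan after optimize().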
PoC Result: Unified Query Pipeline (Option B)
Summary
This PoC validates the end-to-end unified query pipeline as shown in the diagram above:
1. RestSqlAction routes queries targeting non-Lucene indices to the new RestUnifiedQueryAction; Lucene queries fall through to the existing V2/V3 path unchanged.
2. RestUnifiedQueryAction calls OpenSearchSchemaBuilder.buildSchema(clusterState) from the Analytics engine to build a Calcite schema with standard SQL types.
3. UnifiedQueryPlanner.plan() parses the query (Calcite SQL or PPL V3 parser) and generates a logical RelNode against the schema.
4. AnalyticsExecutionEngine passes the RelNode directly to the Analytics engine's QueryPlanExecutor.execute() with no serialization needed.
5. QueryPlanExecutor returns Iterable<Object[]> results, which are converted to ExprValue and formatted as default JSON via the existing JdbcResponseFormatter.

Core classes: RestUnifiedQueryAction, AnalyticsExecutionEngine
Key Areas Verified
UDT/UDF Impact
- No UDT interference appears in the RelNode when the schema uses standard Calcite types.
- OpenSearchSchemaBuilder maps fields to standard Calcite types, and the unified query planner bypasses OpenSearchTypeFactory. Datetime functions like hour() resolve correctly with standard datetime types.

RelNode Handoff
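- No RelNode serialization is needed to pass plans between the SQL plugin and Analytics engine.
- The SQL plugin declares extendedPlugins = ['analytics-engine'] and so shares the Analytics engine's Calcite classes, which enables direct RelNode passing to QueryPlanExecutor.execute().

Optimization Boundary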
- The SQL plugin owns logical rewrites; the Analytics engine owns physical pushdown via TableScan.register() rules. Unified query library can provide a new UnifiedQueryPlanner.optimize() API if needed.

Component Responsibility Proposal
SQL/PPL Plugin
- Parse SQL via SqlParser → SqlValidator → SqlToRelConverter, or PPL via ANTLR → CalciteRelNodeVisitor, producing a logical RelNode
- Pass the RelNode to the Analytics engine's QueryPlanExecutor.execute()
- Convert Iterable<Object[]> results from the Analytics engine into the default response using the existing JdbcResponseFormatter
- Record request metrics (REQ_TOTAL, REQ_COUNT_TOTAL, FAILED_REQ_COUNT_SYS)
- Schedule execution on the sql-worker thread pool to avoid blocking transport threads

Analytics Engine Plugin
- Provide OpenSearchSchemaBuilder to build a Calcite SchemaPlus from cluster state index mappings with standard Calcite types (no UDTs)
- Replace LogicalTableScan with an engine-specific physical scan node and register pushdown rules (filter, project, aggregate, sort, limit) via TableScan.register()
- Execute the plan and return Iterable<Object[]> result rows

API Contracts
SQL/PPL Plugin provides:
Analytics Engine Plugin provides: