[RFC] Support `Append` command in PPL

**Problem Statement**
Implement `append` command to simulate SQL standard union all operation. The `append` command syntax is quite similar to the `appendcols` command, which is already implemented during Calcite V3 engine development. Although their syntax are similar, their final combined result formats are quite different. The key difference is that the `append` command functions as a special union operator in SQL, while the `appendcols` operates as a special join operator in SQL.

**Syntax**

`append <subsearch>`

**Core Functionality**
OpenSearch doesn’t have similar functionality as append command. So we need to prioritize the new command development.

The append command appends the results of subsearch to the current results vertically. Say, the current search outputs 1000 rows with fields schema [a, b] and the subsearch outputs another 1000 rows with different fields schema [a, c, d], the final output will contain 2000 rows with the combined fields of two searches, aka [a, b, c, d].

The following table illustrates the concept. The green colored results are the main query. The yellow colored results are the subquery output. When combining results, the command fill any missing fields with NULL values.

<img width="412" height="164" alt="Image" src="https://github.com/user-attachments/assets/52c6737f-d6c5-4f02-9e22-1fb398ccf3ba" />

**Implementation**
OpenSearch PPL has already supported subsearch syntax starting with search command, follows a serious of other commands. To support the core functionality, we just need to implement union operation over two result rows for the main query and the subquery. Standard SQL union operator (like that in Calcite) only supports union two datasets with the same field schema. But we can achieve the functionality by a simple workaround approach. The problem can be translated to a similar SQL query like:

```
select a, b, NULL as c, NULL as d from main_query
union all
select a, NULL as b, c, d from sub_query
```

**Proposed Solution**

* Analyze current query in Calcite RelNode visitor. Get its rowType (output fields schema).
* Analyze subquery in Calcite RelNode visitor. Get its rowType.
* Merge two schemas and add projects over two queries separately. The missing fields in each project will be filled with constant NULL values
* Perform union all operation over two projected RelNodes.
* Support different indices query in subsearch

**Long-Term Goals**
* Support subsearch timeout to avoid time consuming query
* Allow users to set maximum output rows
* Allow users to inherit timerange columns from upstream.

**Pushdown Support**

Don’t support it. It’s an operation that is suitable for execution on coordinator node. We should let the pushdown work in main query and subquery internally.

**Limitation**

Standard SQL union operator requires the exactly the same field schema(including the field types). If both the main query and the subquery has a same name field but with different types, the standard union could error out. A workaround could be using different field alias for conflicting field.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Support `Append` command in PPL #4078

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RFC] Support Append command in PPL #4078

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[RFC] Support `Append` command in PPL #4078