[RFC] Add table Command to PPL (Calcite Engine) #3877

@aalva500-prog

1. Overview

Description

Implement a table command in OpenSearch PPL that provides field selection operations. This command will return search results with only the specified fields.

Use Cases

  • Selecting specific fields from search results
  • Filtering result sets to include only fields of interest
  • Controlling field order in results
  • Using wildcard patterns for field selection

2. Final Syntax

table <wc-field-list>

Parameters

  • <wc-field-list>: A space- or comma-delimited list of field names. Supports wildcard patterns using the asterisk (*).
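
As an illustration of the delimiter rule, the following Python sketch splits a table argument into field tokens. It is not taken from the plugin: parse_field_list is a hypothetical helper, and accepting commas and spaces mixed within one list is an assumption that the RFC does not spell out.

```python
import re


def parse_field_list(arg):
    """Split a table argument on commas and/or whitespace.

    Hypothetical helper for illustration only. Mixed delimiters are
    accepted here, which is an assumption, not documented behavior.
    """
    # One or more commas/whitespace characters count as a single separator;
    # empty tokens (e.g. from a leading comma) are dropped.
    return [tok for tok in re.split(r"[,\s]+", arg.strip()) if tok]


fields = parse_field_list("EMPNO, ENAME JOB")
```

Wildcard tokens such as EMP* pass through unchanged, so they can be expanded in a later step.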

Options

  • Wildcard patterns: the asterisk (*) may appear at the end, at the start, or on both sides of a name, and patterns can be combined with explicit field names:
  • Prefix wildcard: EMP* (matches fields starting with "EMP")
  • Suffix wildcard: *NO (matches fields ending with "NO")
  • Contains wildcard: *E* (matches fields containing "E")
  • Mixed explicit and wildcard: ENAME, EMP*, JOB
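
The pattern forms above can be sketched as a small matcher. This is a minimal Python illustration, not the engine's implementation: wildcard_to_regex and matches are hypothetical names, and case-sensitive matching is an assumption. A hand-built regular expression is used instead of fnmatch because fnmatch also gives special meaning to ? and [ ], which the RFC does not mention.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern where '*' means 'any run of characters'."""
    # Escape each literal piece, then rejoin the pieces with ".*"
    # wherever the original pattern had an asterisk.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces))


def matches(pattern, field):
    """Return True if the whole field name matches the pattern.

    Matching is case-sensitive here, which is an assumption.
    """
    return wildcard_to_regex(pattern).fullmatch(field) is not None


prefix_hit = matches("EMP*", "EMPNO")     # prefix wildcard
suffix_hit = matches("*NO", "DEPTNO")     # suffix wildcard
contains_hit = matches("*E*", "ENAME")    # contains wildcard
```

A pattern without an asterisk degenerates to an exact name comparison, so explicit and wildcard entries can share one code path.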

3. Usage Examples

Basic Usage

source=EMP | table EMPNO, ENAME, JOB

Returns results containing only the EMPNO, ENAME, and JOB fields.

Advanced Usage

Using Renamed Fields

source=EMP | rename EMPNO as emp_id, ENAME as emp_name | table emp_id, emp_name, emp*

Returns results containing the emp_id and emp_name fields, plus any other fields matching the emp* pattern (which emp_id and emp_name also match).

Sorting and Filtering

source=EMP | where SAL > 1000 | sort - SAL | table ENAME, SAL, DEPTNO | head 3

Returns the three rows with the highest SAL among those with SAL above 1000, containing only the ENAME, SAL, and DEPTNO fields.

Using Wildcards

source=EMP | table ENAME, JOB, DEPT*

Returns results containing ENAME, JOB, and any fields matching the DEPT* pattern.

Table with Evaluation

source=EMP | dedup DEPTNO | eval dept_type=case(DEPTNO=10, 'accounting' else 'other') | table EMPNO, dept_type

Returns results containing only EMPNO and dept_type fields.

Table with Multiple Wildcards

source=EMP | table *NAME, *NO, JOB

Returns results containing the fields that match the *NAME and *NO patterns, plus the JOB field.

Best Practices

  • Place the table command at the end of search pipelines for optimal performance
  • Use wildcards to select groups of related fields efficiently
  • Perform field renaming before using the table command

Optimization Notes

  • The table command is a non-streaming, transforming command that processes the entire result set at once
  • For large result sets, consider limiting the number of fields to improve performance
  • When possible, use the table command as the final operation in a query pipeline

4. Implementation Details

Technical Approach

The table command will be implemented as a non-streaming, transforming command that:

  1. Accepts a list of field names with wildcard support
  2. Resolves wildcards against available fields in the result set
  3. Preserves the specified order of fields in the output
  4. Filters results to include only the specified fields
  5. Integrates with the existing PPL command pipeline architecture
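
Steps 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the planned Calcite implementation: resolve_fields and project are hypothetical names, wildcard matches are expanded in schema order, a field selected twice is kept only at its first position, and an unknown explicit field raises an error. The EMP column list is assumed to follow the SCOTT sample schema.

```python
import re


def resolve_fields(requested, available):
    """Expand requested names and wildcards into an ordered field list.

    Assumptions (not specified by the RFC): wildcard hits follow schema
    order, duplicates keep their first position, unknown names are errors.
    """
    resolved = []
    for item in requested:
        if "*" in item:
            # '*' matches any run of characters; literals are escaped.
            regex = re.compile(".*".join(re.escape(p) for p in item.split("*")))
            hits = [name for name in available if regex.fullmatch(name)]
        else:
            if item not in available:
                raise KeyError(f"unknown field: {item}")
            hits = [item]
        for name in hits:
            if name not in resolved:
                resolved.append(name)
    return resolved


def project(rows, fields):
    """Keep only the selected fields of each row, in the selected order."""
    return [{name: row[name] for name in fields} for row in rows]


# Assumed EMP columns (SCOTT sample schema).
EMP_FIELDS = ["EMPNO", "ENAME", "JOB", "MGR", "HIREDATE", "SAL", "COMM", "DEPTNO"]

selected = resolve_fields(["*NAME", "*NO", "JOB"], EMP_FIELDS)
# selected == ["ENAME", "EMPNO", "DEPTNO", "JOB"]
```

Resolving the field list once against the schema, before touching any rows, keeps the per-row work down to a plain dictionary projection.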

Dependencies

  • Existing PPL command infrastructure
  • Field resolution and wildcard pattern matching capabilities

Testing Strategy

  • Unit tests for wildcard pattern matching and field resolution
  • Integration tests with various field combinations and data types
  • Performance tests with large result sets
  • Compatibility tests with existing PPL commands

Labels: PPL (Piped processing language), enhancement (New feature or request), v3.3.0
