Problem Statement
In many analytical and observability queries, users want to inspect data in reverse order, for example:
- Viewing the most recent log lines last instead of first.
- Extracting the last N rows in the dataset.
- Reversing the effect of a sort without re-specifying fields.
Currently, PPL does not support a way to reverse the order of results in a query pipeline. This limits query flexibility and forces users to use workarounds or rely on external sorting.
Current State
- PPL supports
sort, which orders records by specified fields, but does not offer a simple way to reverse record order without explicitly specifying sort criteria again
- There is no current native support for a reverse operation in either Calcite or Spark engines
Long-Term Goals
- Provide a simple, intuitive mechanism (reverse) for users to invert row order in a PPL pipeline.
- Establish clear guidelines and patterns for lightweight, non-parametric commands in PPL.
- Avoid introducing performance bottlenecks by enforcing pushdown optimization or limiting use cases on large datasets.
Proposal
Syntax
<source> | [commands] | reverse | [commands]
reverse takes no arguments.
- It can appear anywhere in the pipeline including the end, and will reverse the order of rows produced up to that point.
- Unlike
sort, which allows field-based sorting, reverse simply inverts the row order as is.
Semantics
Expected Behavior:
- Action: Reverses the display order of all rows in the current result set
- Scope: Operates on the entire result set at the point where
reverse appears in the pipeline
- Data Preservation: Does not modify field values or schema
Alternative
sort - requires field: helpful but doesn't support reversing arbitrary pipeline outputs, and would require re-specifying field names.
reverse works even when no sort is specified, or when you're working with filtered, projected, or unsorted data.
Implementation Discussion
Default Implementation: To implement reverse, we insert a logical sort with a synthetic column based on ROW_NUMBER():
LogicalSort(reverse_row_num DESC)
Project(..., reverse_row_num = ROW_NUMBER() OVER ())
[upstream operator]
Pushdown Implementation: (TBD, will be in 2nd followup PR)
The optimization is triggered when the logical plan has this exact shape:
LogicalSort (reverse sort)
CalciteLogicalIndexScan
Conditions for pushdown [not implemented yet]:
- Reverse should be able to be pushed down when preceded by sort by flipping the direction of the sort from asc to desc or vice versa.
- However, pushdown is disabled if a filter or limit appears before reverse, or if the reverse requires a local fallback implementation using ROW_NUMBER().
Example Pushdown Optimization:
curl -X POST "localhost:9200/_plugins/_ppl/_explain" \
-H "Content-Type: application/json" \
-d '{
"query": "source = employees | reverse | head 5"
}'
If reverse appears directly after the source, it matches this optimized pattern:
Sort DESC (fetch=5)
IndexScan
This enables pushdown of the reverse and head commands into the scan phase.
LogicalProject(skills=[$0], performance_score=[$1], level=[$2], employee_id=[$3], name=[$4], hire_date=[$5], department=[$6], remote=[$7], salary=[$8], age=[$9], email=[$10])
LogicalSort(sort0=[$11], dir0=[DESC], fetch=[5])
CalciteLogicalIndexScan(table=[[OpenSearch, employees]])
Engine Support
- Currently implemented only in Calcite engine, not supported in Spark engine
- Once [FEATURE] Export PPL-Calcite Engine as Reusable Library (#3734) is completed,
reverse can be reused in the Spark engine.
Performance
- When reverse follows a sort on large datasets (ex: 20K) the sort is applied to all 20K values, then the reverse is applied to all 20K values properly, then on that final result OpenSearch returns a limited 10K values due to the automatically imposed limit, as if head 10000 was applied after the reverse.
Open Question: Semantics When sort Precedes reverse
Query:
curl -X POST "localhost:9200/_plugins/_ppl/_explain" \
-H "Content-Type: application/json" \
-d '{
"query": "source = employees | sort salary | reverse"
}'
Logical Plan:
LogicalProject(...)
LogicalSort(sort0=[$17], dir0=[DESC])
LogicalProject(..., reverse_row_num=[ROW_NUMBER() OVER ()])
LogicalSort(sort0=[$8], dir0=[ASC-nulls-first])
CalciteLogicalIndexScan(table=[[OpenSearch, employees]])
Answer: The first result returned is the highest salary as expected, which is 20000 in this case.
source = employees | sort salary | reverse | head 1 will return the item with salary = 20000. We delay head or limit pushdown until after reverse, ensuring the entire result set is available to reverse when appropriate, avoiding truncated or misleading results.
Open Question
- Should
reverse pushdown be disabled or restricted when preceded by sort or multiple sorts to avoid partial results and incorrect output?
- Should usage of
reverse be limited to queries bounded by head or similar limits?
- Should a configuration flag exist to disable or warn about
reverse usage on large unbounded result sets?
Problem Statement
In many analytical and observability queries, users want to inspect data in reverse order, for example:
Currently, PPL does not support a way to reverse the order of results in a query pipeline. This limits query flexibility and forces users to use workarounds or rely on external sorting.
Current State
sort, which orders records by specified fields, but does not offer a simple way to reverse record order without explicitly specifying sort criteria againLong-Term Goals
Proposal
Syntax
reversetakes no arguments.sort, which allows field-based sorting,reversesimply inverts the row order as is.Semantics
Expected Behavior:
reverseappears in the pipelineAlternative
sort -requires field: helpful but doesn't support reversing arbitrary pipeline outputs, and would require re-specifying field names.reverseworks even when no sort is specified, or when you're working with filtered, projected, or unsorted data.Implementation Discussion
Default Implementation: To implement reverse, we insert a logical sort with a synthetic column based on
ROW_NUMBER():Pushdown Implementation: (TBD, will be in 2nd followup PR)
The optimization is triggered when the logical plan has this exact shape:
Conditions for pushdown [not implemented yet]:
Example Pushdown Optimization:
If reverse appears directly after the source, it matches this optimized pattern:
This enables pushdown of the
reverseandheadcommands into the scan phase.Engine Support
reversecan be reused in the Spark engine.Performance
Open Question: Semantics When
sortPrecedesreverseQuery:
Logical Plan:
Answer: The first result returned is the highest salary as expected, which is 20000 in this case.
source = employees | sort salary | reverse | head 1 will return the item with salary = 20000. We delay head or limit pushdown until after reverse, ensuring the entire result set is available to reverse when appropriate, avoiding truncated or misleading results.
Open Question
reversepushdown be disabled or restricted when preceded bysortor multiple sorts to avoid partial results and incorrect output?reversebe limited to queries bounded byheador similar limits?reverseusage on large unbounded result sets?