-
Notifications
You must be signed in to change notification settings - Fork 190
[BUG] Multiple Aggregation Functions (MIN, MAX, FIRST, LAST, TAKE) Return Null/Empty Values With Alias Fields in PPL #4595
Description
Query Information
PPL Queries:
# All of these fail with alias fields:
source=test_alias_bug | stats MIN(@timestamp), MAX(@timestamp)
source=test_alias_bug | stats FIRST(@timestamp), LAST(@timestamp)
source=test_alias_bug | stats TAKE(@timestamp, 2)
Expected Results:
// MIN/MAX
{"MIN(@timestamp)": "2024-01-01T10:00:00.000Z", "MAX(@timestamp)": "2024-01-03T10:00:00.000Z"}
// FIRST/LAST
{"FIRST(@timestamp)": "2024-01-01T10:00:00.000Z", "LAST(@timestamp)": "2024-01-03T10:00:00.000Z"}
// TAKE
{"TAKE(@timestamp, 2)": ["2024-01-01T10:00:00Z", "2024-01-02T10:00:00Z"]}Actual Results:
// MIN/MAX
{"MIN(@timestamp)": null, "MAX(@timestamp)": null}
// FIRST/LAST
{"FIRST(@timestamp)": null, "LAST(@timestamp)": null}
// TAKE
{"TAKE(@timestamp, 2)": []}Dataset Information
Dataset/Schema Type
- Custom (minimal reproduction case)
Index Mapping
{
"mappings": {
"properties": {
"created_at": {"type": "date"},
"@timestamp": {"type": "alias", "path": "created_at"},
"value": {"type": "integer"},
"value_alias": {"type": "alias", "path": "value"}
}
}
}Sample Data
{"created_at": "2024-01-01T10:00:00Z", "value": 100}
{"created_at": "2024-01-02T10:00:00Z", "value": 200}
{"created_at": "2024-01-03T10:00:00Z", "value": 300}Bug Description
Issue Summary:
Five aggregation functions (MIN, MAX, FIRST, LAST, TAKE) return null or empty values when applied to alias fields in PPL queries. The same queries work correctly when using the original field names instead of aliases. This affects both date and numeric alias fields.
Steps to Reproduce:
- Create a test index with alias fields:
curl -X PUT "localhost:9200/test_alias_bug" -H 'Content-Type: application/json' -d'
{
"mappings": {
"properties": {
"created_at": {"type": "date"},
"@timestamp": {"type": "alias", "path": "created_at"},
"value": {"type": "integer"},
"value_alias": {"type": "alias", "path": "value"}
}
}
}'- Insert test documents:
curl -X POST "localhost:9200/test_alias_bug/_bulk" -H 'Content-Type: application/json' -d'
{"index":{}}
{"created_at": "2024-01-01T10:00:00Z", "value": 100}
{"index":{}}
{"created_at": "2024-01-02T10:00:00Z", "value": 200}
{"index":{}}
{"created_at": "2024-01-03T10:00:00Z", "value": 300}
'- Test affected functions with alias fields (all return null/empty):
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats MIN(@timestamp), MAX(@timestamp)"}'
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats FIRST(@timestamp), LAST(@timestamp)"}'
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats TAKE(@timestamp, 2)"}'
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats MIN(value_alias), MAX(value_alias)"}'- Test with original field names (all work correctly):
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats MIN(created_at), MAX(created_at)"}'- Verify unaffected functions still work with aliases:
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats SUM(value_alias), AVG(value_alias), COUNT(value_alias)"}'
# Returns correct values: SUM=600, AVG=200.0, COUNT=3Impact:
This bug prevents users from using standard timestamp alias fields (like @timestamp, commonly used in observability data) with multiple critical aggregation functions. It affects:
- 5 aggregation functions: MIN, MAX, FIRST, LAST, TAKE
- Any PPL query using these functions on alias fields
- Both date/time and numeric alias fields
- Users migrating from systems that rely on
@timestampas a standard field name - Observability use cases that depend on temporal aggregations
Unaffected Functions:
- SUM, AVG, COUNT, VAR_SAMP, VAR_POP, STDDEV_SAMP, STDDEV_POP work correctly with alias fields
Root Cause Analysis
Root Cause:
This is a preliminary analysis and requires further investigation.
The issue stems from how the Calcite v3 engine implements certain aggregation functions using OpenSearch's top_hits aggregation with the fetchSource API, which cannot retrieve alias fields.
Code Location:
Specific Implementation:
All affected functions use this pattern:
AggregationBuilders.topHits(aggFieldName)
.fetchSource(helper.inferNamedField(args.getFirst()).getRootName(), null)
.size(1)
.from(0)
.sort(fieldName, sortOrder)Why It Fails:
fetchSource()requests fields from OpenSearch's_sourcedocument- OpenSearch alias fields are not stored in
_source- they're virtual references to other fields - When
_sourceincludes an alias field, OpenSearch returns an empty object{} - The aggregation parser receives no data and returns null/empty values
Evidence:
Direct OpenSearch DSL test confirms the limitation:
# Using fetchSource with alias - returns empty _source
curl -X POST "localhost:9200/test_alias_bug/_search" -d'{
"size": 0,
"aggs": {
"test": {
"top_hits": {
"size": 1,
"_source": {"includes": ["@timestamp"]},
"sort": [{"@timestamp": "asc"}]
}
}
}
}'
# Result: "_source": {}Why Other Functions Work:
Functions like SUM, AVG, COUNT use standard OpenSearch aggregations (sum, avg, value_count) which operate on field values directly, not _source:
// These work with aliases:
AggregationBuilders.sum(name).field(fieldName)
AggregationBuilders.avg(name).field(fieldName)Tentative Proposed Fix
This is a preliminary analysis and requires further investigation.
Solution: Use OpenSearch Fields API Instead of fetchSource
Replace .fetchSource(fieldName, null) with .docValueField(fieldName) or use the fields API in all top_hits aggregations.
Verification that Fields API Works:
curl -X POST "localhost:9200/test_alias_bug/_search" -d'{
"size": 0,
"aggs": {
"test_fields_api": {
"top_hits": {
"size": 1,
"fields": ["@timestamp"],
"sort": [{"@timestamp": "asc"}]
}
}
}
}'
# Result: "fields": {"@timestamp": ["2024-01-01T10:00:00.000Z"]} ✓Required Code Changes in AggregateAnalyzer.java:
Replace .fetchSource() with .docValueField() in:
- MIN aggregation (lines 401-409)
- MAX aggregation (lines 421-429)
- FIRST aggregation (lines 473-479)
- LAST aggregation (lines 480-488)
- TAKE aggregation (lines 467-471)
Alternative Solution:
Resolve aliases to their target field paths before building aggregations in the logical plan optimizer or physical plan builder.
Disclaimer: This is a preliminary analysis based on code review and testing. The actual implementation should be reviewed by maintainers familiar with:
- TopHitsParser compatibility with fields API vs fetchSource
- Performance implications of using fields API
- Edge cases with nested fields or multi-field mappings
Workaround
Use the original field names instead of aliases in aggregation queries:
# Instead of:
source=index | stats MIN(@timestamp), MAX(@timestamp)
# Use:
source=index | stats MIN(created_at), MAX(created_at)
Related Issues
- RFC : PPL Query Engine Status and Roadmap for 2025/2026 #4287 - PPL Query Engine roadmap marks MIN/MAX as "Needs Fixes" for Q4/3.4 release
Environment Information
OpenSearch Version: 3.4.0-SNAPSHOT
Plugin Version: opensearch-sql 3.4.0.0-SNAPSHOT
Build Details:
- Build Hash: 4094287ce37848e38e71ee98dfa6fe483add7b9f
- Build Date: 2025-10-16T23:11:05.580414Z
- Lucene Version: 10.3.1
Metadata
Metadata
Assignees
Labels
Type
Projects
Status