Skip to content

[BUG] Multiple Aggregation Functions (MIN, MAX, FIRST, LAST, TAKE) Return Null/Empty Values With Alias Fields in PPL #4595

@alexey-temnikov

Description

@alexey-temnikov

Query Information

PPL Queries:

# All of these fail with alias fields:
source=test_alias_bug | stats MIN(@timestamp), MAX(@timestamp)
source=test_alias_bug | stats FIRST(@timestamp), LAST(@timestamp)
source=test_alias_bug | stats TAKE(@timestamp, 2)

Expected Results:

// MIN/MAX
{"MIN(@timestamp)": "2024-01-01T10:00:00.000Z", "MAX(@timestamp)": "2024-01-03T10:00:00.000Z"}

// FIRST/LAST
{"FIRST(@timestamp)": "2024-01-01T10:00:00.000Z", "LAST(@timestamp)": "2024-01-03T10:00:00.000Z"}

// TAKE
{"TAKE(@timestamp, 2)": ["2024-01-01T10:00:00Z", "2024-01-02T10:00:00Z"]}

Actual Results:

// MIN/MAX
{"MIN(@timestamp)": null, "MAX(@timestamp)": null}

// FIRST/LAST
{"FIRST(@timestamp)": null, "LAST(@timestamp)": null}

// TAKE
{"TAKE(@timestamp, 2)": []}

Dataset Information

Dataset/Schema Type

  • Custom (minimal reproduction case)

Index Mapping

{
  "mappings": {
    "properties": {
      "created_at": {"type": "date"},
      "@timestamp": {"type": "alias", "path": "created_at"},
      "value": {"type": "integer"},
      "value_alias": {"type": "alias", "path": "value"}
    }
  }
}

Sample Data

{"created_at": "2024-01-01T10:00:00Z", "value": 100}
{"created_at": "2024-01-02T10:00:00Z", "value": 200}
{"created_at": "2024-01-03T10:00:00Z", "value": 300}

Bug Description

Issue Summary:

Five aggregation functions (MIN, MAX, FIRST, LAST, TAKE) return null or empty values when applied to alias fields in PPL queries. The same queries work correctly when using the original field names instead of aliases. This affects both date and numeric alias fields.

Steps to Reproduce:

  1. Create a test index with alias fields:
curl -X PUT "localhost:9200/test_alias_bug" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "created_at": {"type": "date"},
      "@timestamp": {"type": "alias", "path": "created_at"},
      "value": {"type": "integer"},
      "value_alias": {"type": "alias", "path": "value"}
    }
  }
}'
  1. Insert test documents:
curl -X POST "localhost:9200/test_alias_bug/_bulk" -H 'Content-Type: application/json' -d'
{"index":{}}
{"created_at": "2024-01-01T10:00:00Z", "value": 100}
{"index":{}}
{"created_at": "2024-01-02T10:00:00Z", "value": 200}
{"index":{}}
{"created_at": "2024-01-03T10:00:00Z", "value": 300}
'
  1. Test affected functions with alias fields (all return null/empty):
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats MIN(@timestamp), MAX(@timestamp)"}'

curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats FIRST(@timestamp), LAST(@timestamp)"}'

curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats TAKE(@timestamp, 2)"}'

curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats MIN(value_alias), MAX(value_alias)"}'
  1. Test with original field names (all work correctly):
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats MIN(created_at), MAX(created_at)"}'
  1. Verify unaffected functions still work with aliases:
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'
{"query": "source=test_alias_bug | stats SUM(value_alias), AVG(value_alias), COUNT(value_alias)"}'
# Returns correct values: SUM=600, AVG=200.0, COUNT=3

Impact:

This bug prevents users from using standard timestamp alias fields (like @timestamp, commonly used in observability data) with multiple critical aggregation functions. It affects:

  • 5 aggregation functions: MIN, MAX, FIRST, LAST, TAKE
  • Any PPL query using these functions on alias fields
  • Both date/time and numeric alias fields
  • Users migrating from systems that rely on @timestamp as a standard field name
  • Observability use cases that depend on temporal aggregations

Unaffected Functions:

  • SUM, AVG, COUNT, VAR_SAMP, VAR_POP, STDDEV_SAMP, STDDEV_POP work correctly with alias fields

Root Cause Analysis

Root Cause:
This is a preliminary analysis and requires further investigation.

The issue stems from how the Calcite v3 engine implements certain aggregation functions using OpenSearch's top_hits aggregation with the fetchSource API, which cannot retrieve alias fields.

Code Location:

https://github.com/opensearch-project/sql/blob/main/opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java#L392-L490

Specific Implementation:

All affected functions use this pattern:

AggregationBuilders.topHits(aggFieldName)
    .fetchSource(helper.inferNamedField(args.getFirst()).getRootName(), null)
    .size(1)
    .from(0)
    .sort(fieldName, sortOrder)

Why It Fails:

  1. fetchSource() requests fields from OpenSearch's _source document
  2. OpenSearch alias fields are not stored in _source - they're virtual references to other fields
  3. When _source includes an alias field, OpenSearch returns an empty object {}
  4. The aggregation parser receives no data and returns null/empty values

Evidence:

Direct OpenSearch DSL test confirms the limitation:

# Using fetchSource with alias - returns empty _source
curl -X POST "localhost:9200/test_alias_bug/_search" -d'{
  "size": 0,
  "aggs": {
    "test": {
      "top_hits": {
        "size": 1,
        "_source": {"includes": ["@timestamp"]},
        "sort": [{"@timestamp": "asc"}]
      }
    }
  }
}'
# Result: "_source": {}

Why Other Functions Work:

Functions like SUM, AVG, COUNT use standard OpenSearch aggregations (sum, avg, value_count) which operate on field values directly, not _source:

// These work with aliases:
AggregationBuilders.sum(name).field(fieldName)
AggregationBuilders.avg(name).field(fieldName)

Tentative Proposed Fix

This is a preliminary analysis and requires further investigation.
Solution: Use OpenSearch Fields API Instead of fetchSource

Replace .fetchSource(fieldName, null) with .docValueField(fieldName) or use the fields API in all top_hits aggregations.

Verification that Fields API Works:

curl -X POST "localhost:9200/test_alias_bug/_search" -d'{
  "size": 0,
  "aggs": {
    "test_fields_api": {
      "top_hits": {
        "size": 1,
        "fields": ["@timestamp"],
        "sort": [{"@timestamp": "asc"}]
      }
    }
  }
}'
# Result: "fields": {"@timestamp": ["2024-01-01T10:00:00.000Z"]} ✓

Required Code Changes in AggregateAnalyzer.java:

Replace .fetchSource() with .docValueField() in:

  1. MIN aggregation (lines 401-409)
  2. MAX aggregation (lines 421-429)
  3. FIRST aggregation (lines 473-479)
  4. LAST aggregation (lines 480-488)
  5. TAKE aggregation (lines 467-471)

Alternative Solution:

Resolve aliases to their target field paths before building aggregations in the logical plan optimizer or physical plan builder.

Disclaimer: This is a preliminary analysis based on code review and testing. The actual implementation should be reviewed by maintainers familiar with:

  • TopHitsParser compatibility with fields API vs fetchSource
  • Performance implications of using fields API
  • Edge cases with nested fields or multi-field mappings

Workaround

Use the original field names instead of aliases in aggregation queries:

# Instead of:
source=index | stats MIN(@timestamp), MAX(@timestamp)

# Use:
source=index | stats MIN(created_at), MAX(created_at)

Related Issues

Environment Information

OpenSearch Version: 3.4.0-SNAPSHOT

Plugin Version: opensearch-sql 3.4.0.0-SNAPSHOT

Build Details:

  • Build Hash: 4094287ce37848e38e71ee98dfa6fe483add7b9f
  • Build Date: 2025-10-16T23:11:05.580414Z
  • Lucene Version: 10.3.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    PPLPiped processing languagebugSomething isn't working

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions