Query Information
PPL Command/Query:
source=jaeger-span-2024-05-08 | dedup references.refType
Expected Result:
The query should return deduplicated results based on the references.refType field values (e.g., "CHILD_OF", "FOLLOWS_FROM", and null), similar to how source=jaeger-span-2024-05-08 | fields references.refType successfully returns 4226 results with these values.
Actual Result:
The query returns 0 results:
{
"schema": [...],
"datarows": [],
"total": 0,
"size": 0
}
Dataset Information
Dataset/Schema Type
Index Mapping
{
"mappings": {
"properties": {
"references": {
"type": "nested",
"dynamic": "false",
"properties": {
"refType": {
"type": "keyword",
"ignore_above": 256
},
"spanID": {
"type": "keyword",
"ignore_above": 256
},
"traceID": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Sample Data
{
"traceID": "3c61c828d40ae3c8b553f0ba2d185898",
"spanID": "1dc84d1a2920a6bf",
"operationName": "oteldemo.ProductCatalogService/GetProduct",
"references": [
{
"refType": "CHILD_OF",
"traceID": "3c61c828d40ae3c8b553f0ba2d185898",
"spanID": "047eded12fcda381"
}
]
}
Bug Description
Issue Summary:
The dedup command returns zero results when applied to fields within nested objects, even though the same field can be successfully queried using the fields command.
Steps to Reproduce:
- Create an index with a nested field mapping (e.g., Jaeger span data with nested
references field)
- Insert documents containing nested objects with the target field populated
- Run
source=<index> | fields <nested_field> - this works correctly
- Run
source=<index> | dedup <nested_field> - this returns 0 results
Minimal Reproduction:
# Create test index
curl -X PUT "localhost:9200/test-nested-dedup" -H 'Content-Type: application/json' -d '{
"mappings": {
"properties": {
"items": {
"type": "nested",
"properties": {
"name": {"type": "keyword"}
}
}
}
}
}'
# Insert test data
curl -X POST "localhost:9200/test-nested-dedup/_bulk" -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"items":[{"name":"apple"}]}
{"index":{"_id":"2"}}
{"items":[{"name":"banana"}]}
'
# Refresh index
curl -X POST "localhost:9200/test-nested-dedup/_refresh"
# This works - returns 2 results
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
"query": "source=test-nested-dedup | fields items.name"
}'
# This fails - returns 0 results
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
"query": "source=test-nested-dedup | dedup items.name"
}'
Impact:
This bug prevents users from deduplicating results based on fields within nested objects.
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
The issue occurs with the Calcite-based query engine (V3).
Technical Analysis
Tentative Root Cause:
Note: This is a preliminary analysis and requires further investigation.
The dedup command internally generates an IS NOT NULL filter on the dedup field to exclude null values before applying the deduplication logic. The query execution plan shows:
LogicalFilter(condition=[IS NOT NULL($42)])
This filter is then converted to an OpenSearch exists query:
{
"query": {
"exists": {
"field": "references.refType"
}
}
}
However, for nested fields in OpenSearch, an exists query on a nested field path (e.g., references.refType) must be wrapped in a nested query to function correctly:
{
"query": {
"nested": {
"path": "references",
"query": {
"exists": {
"field": "references.refType"
}
}
}
}
}
Verification:
- Direct
exists query on references.refType: 0 results
exists query wrapped in nested query: 2565 results (correct)
Code Location:
The issue appears to be in /opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java around lines 577-586:
// OpenSearch DSL does not handle IS_NULL / IS_NOT_NULL on nested fields correctly
checkForNestedFieldOperands(call);
Expression a = call.getOperands().get(0).accept(this);
QueryExpression operand = QueryExpression.create((TerminalExpression) a);
return call.getKind() == SqlKind.IS_NOT_NULL ? operand.exists() : operand.notExists();
The code has a comment acknowledging the limitation and includes a check (checkForNestedFieldOperands) that throws an exception for nested fields. However, this check only detects when the operand type is ArraySqlType (the nested array itself), not when accessing a field within a nested object (e.g., references.refType).
Tentative Proposed Fix:
Note: This is a preliminary analysis and requires further investigation.
The fix would require:
-
Enhanced Detection: Modify the nested field detection logic to identify field paths that traverse nested objects (e.g., references.refType where references is nested).
-
Nested Query Wrapping: When generating an exists query for a field within a nested object, wrap it in a nested query with the appropriate path.
-
Implementation Approach:
- Extract the nested path from the field name (e.g.,
references from references.refType)
- Check if this path corresponds to a nested field in the index mapping
- If yes, generate a nested query wrapper around the exists query
Example Fix Location:
In PredicateAnalyzer.java, the postfix() method would need to:
- Detect if the field is within a nested object
- Generate an appropriate nested query DSL instead of a simple exists query
Note: This analysis is preliminary. A complete fix would require further investigation!
Related Issues
Query Information
PPL Command/Query:
Expected Result:
The query should return deduplicated results based on the
references.refTypefield values (e.g., "CHILD_OF", "FOLLOWS_FROM", and null), similar to howsource=jaeger-span-2024-05-08 | fields references.refTypesuccessfully returns 4226 results with these values.Actual Result:
The query returns 0 results:
{ "schema": [...], "datarows": [], "total": 0, "size": 0 }Dataset Information
Dataset/Schema Type
Index Mapping
{ "mappings": { "properties": { "references": { "type": "nested", "dynamic": "false", "properties": { "refType": { "type": "keyword", "ignore_above": 256 }, "spanID": { "type": "keyword", "ignore_above": 256 }, "traceID": { "type": "keyword", "ignore_above": 256 } } } } } }Sample Data
{ "traceID": "3c61c828d40ae3c8b553f0ba2d185898", "spanID": "1dc84d1a2920a6bf", "operationName": "oteldemo.ProductCatalogService/GetProduct", "references": [ { "refType": "CHILD_OF", "traceID": "3c61c828d40ae3c8b553f0ba2d185898", "spanID": "047eded12fcda381" } ] }Bug Description
Issue Summary:
The
dedupcommand returns zero results when applied to fields within nested objects, even though the same field can be successfully queried using thefieldscommand.Steps to Reproduce:
referencesfield)source=<index> | fields <nested_field>- this works correctlysource=<index> | dedup <nested_field>- this returns 0 resultsMinimal Reproduction:
Impact:
This bug prevents users from deduplicating results based on fields within nested objects.
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
The issue occurs with the Calcite-based query engine (V3).
Technical Analysis
Tentative Root Cause:
The
dedupcommand internally generates anIS NOT NULLfilter on the dedup field to exclude null values before applying the deduplication logic. The query execution plan shows:This filter is then converted to an OpenSearch
existsquery:{ "query": { "exists": { "field": "references.refType" } } }However, for nested fields in OpenSearch, an
existsquery on a nested field path (e.g.,references.refType) must be wrapped in anestedquery to function correctly:{ "query": { "nested": { "path": "references", "query": { "exists": { "field": "references.refType" } } } } }Verification:
existsquery onreferences.refType: 0 resultsexistsquery wrapped innestedquery: 2565 results (correct)Code Location:
The issue appears to be in
/opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.javaaround lines 577-586:The code has a comment acknowledging the limitation and includes a check (
checkForNestedFieldOperands) that throws an exception for nested fields. However, this check only detects when the operand type isArraySqlType(the nested array itself), not when accessing a field within a nested object (e.g.,references.refType).Tentative Proposed Fix:
Note: This is a preliminary analysis and requires further investigation.
The fix would require:
Enhanced Detection: Modify the nested field detection logic to identify field paths that traverse nested objects (e.g.,
references.refTypewherereferencesis nested).Nested Query Wrapping: When generating an
existsquery for a field within a nested object, wrap it in anestedquery with the appropriate path.Implementation Approach:
referencesfromreferences.refType)Example Fix Location:
In
PredicateAnalyzer.java, thepostfix()method would need to:Note: This analysis is preliminary. A complete fix would require further investigation!
Related Issues