[BUG] Push down redundant filter for time span #4811

@qianheng-aws

What is the bug?
Currently, a time span aggregation is always treated as bucket_nullable=false and never shows a null bucket; this behavior was introduced in PR #4327. It is implemented by adding a filter operator on the time field before the aggregation.

Once the span is pushed down as a date_histogram, the result already has no null bucket, yet we still push a redundant filter down into the scan. For example:

// PPL 
source=events  |  stats count() by span(@timestamp, 1d)

// Final plan
CalciteEnumerableIndexScan(table=[[OpenSearch, events]], PushDownContext=[[PROJECT->[@timestamp], FILTER->IS NOT NULL($0), AGGREGATION->rel#630:LogicalAggregate.NONE.[](input=RelSubset#629,group={0},count()=COUNT()), PROJECT->[count(), span(@timestamp,1d)], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"exists":{"field":"@timestamp","boost":1.0}},"_source":{"includes":["@timestamp"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10000,"sources":[{"span(@timestamp,1d)":{"date_histogram":{"field":"@timestamp","missing_bucket":false,"order":"asc","fixed_interval":"1d"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])

The PushDownContext contains FILTER->IS NOT NULL($0), and the DSL contains "query":{"exists":{"field":"@timestamp","boost":1.0}}.
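To make the redundancy concrete, here is a minimal Python sketch that inspects a request body like the one above and reports an `exists` query that only repeats what the composite `date_histogram` source already guarantees. The function name `redundant_exists_fields` is hypothetical and not part of the plugin, and the sketch assumes the `exists` clause sits at the top level of `query` (it does not look inside `bool` wrappers).

```python
def redundant_exists_fields(body: dict) -> list:
    """Return fields whose top-level `exists` query duplicates a
    composite date_histogram source that already skips missing values."""
    exists_field = body.get("query", {}).get("exists", {}).get("field")
    if exists_field is None:
        return []
    redundant = []
    for agg in body.get("aggregations", {}).values():
        for source in agg.get("composite", {}).get("sources", []):
            for spec in source.values():
                hist = spec.get("date_histogram")
                # With missing_bucket=false, documents lacking the field
                # produce no bucket, so the exists filter adds nothing.
                if (hist and hist.get("field") == exists_field
                        and not hist.get("missing_bucket", False)):
                    redundant.append(exists_field)
    return redundant


# Request body trimmed from the final plan shown above.
dsl = {
    "from": 0,
    "size": 0,
    "query": {"exists": {"field": "@timestamp", "boost": 1.0}},
    "aggregations": {"composite_buckets": {"composite": {
        "size": 10000,
        "sources": [{"span(@timestamp,1d)": {"date_histogram": {
            "field": "@timestamp", "missing_bucket": False,
            "order": "asc", "fixed_interval": "1d"}}}],
    }}},
}

print(redundant_exists_fields(dsl))  # ['@timestamp']
```

The check only applies when the aggregation is the sole consumer of the query, as here with "size":0; a request that also returns hits would still need the filter.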

The performance penalty is larger when the bucket field is a derived field, because the filter then generates a script in the DSL query.

How can one reproduce the bug?

  1. Create an index with a time field, e.g.
PUT localhost:9200/events
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "host": {
        "type": "text"
      },
      "cpu_usage": {
        "type": "double"
      },
      "region": {
        "type": "keyword"
      }
    }
  }
}
  2. Run explain on a query with a time span, e.g.
source=events  |  stats count() by span(@timestamp, 1d)
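The explain step can be scripted. The sketch below builds the HTTP request for the PPL explain endpoint (`_plugins/_ppl/_explain`) without sending it. The helper name `build_explain_request` is hypothetical, and the default base URL assumes a local cluster on port 9200 with no authentication and with the Calcite engine enabled.

```python
import json
import urllib.request


def build_explain_request(ppl: str, base_url: str = "http://localhost:9200"):
    """Build a POST request asking the PPL plugin to explain a query."""
    payload = json.dumps({"query": ppl}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_ppl/_explain",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_explain_request(
    "source=events | stats count() by span(@timestamp, 1d)"
)
# Against a running cluster, send it with:
#   urllib.request.urlopen(request).read().decode("utf-8")
```

The returned plan can then be searched for `FILTER->IS NOT NULL($0)` to confirm whether the redundant filter is still pushed down.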

What is the expected behavior?
The final plan should not contain the filter derived from the time span aggregation.

What is your host/environment?

  • Version 3.4-SNAPSHOT

Labels: aggregation, bug, calcite, pushdown

Status: Done