Mark shard failures caused by unsupported aggregations or queries against rolled up data so Kibana can identify them#89252
Conversation
To start with we decided that we would like to support only date histogram aggregations using 'fixed_interval' and throw an error if 'calendar_interval' is used on a time series index.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
server/src/main/java/org/elasticsearch/ElasticsearchException.java
Outdated
Show resolved
Hide resolved
.../rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/20_unsupported_aggs.yml
Outdated
Show resolved
Hide resolved
.../org/elasticsearch/search/aggregations/bucket/histogram/DateHistogramAggregationBuilder.java
Outdated
Show resolved
Hide resolved
Date histogram aggregations must fail if 'calendar_interval' or a non-utc time zone is used, only if the index the aggregation is running on is a rollup index. For time series indices, date histogram aggregations should work as expected without throwing any error.
| EnumSet.of(ClusterBlockLevel.WRITE) | ||
| ); | ||
|
|
||
| public boolean isRollupIndex() { |
There was a problem hiding this comment.
I have the feeling this is not the best I can do here. I am working on a better way to do this at the moment but I wanted to push this change because it might be good enough. If that feeling is shared but this is considered "good enough" we could proceed merging this to unblock Kibana folks experimenting with catching exceptions. Then I can create another issue to refactor this. N.B. I am on vacation next week and Christos is on vacation too and I would like to avoid Kibana folks being stuck waiting for this.
As a result, if required, I can work on this on another PR whose purpose would be to refactor this "isRollupIndex" logic.
There was a problem hiding this comment.
I think it's cool as a temporary thing. Maybe a TODO in the code so we know we thought it was temporary when we added it.
flash1293
left a comment
There was a problem hiding this comment.
Looks mostly good to me, left a nit about another test we could add
| - match: { error.root_cause.0.reason: "Field [total_memory_used] of type [aggregate_metric_double] is not supported for aggregation [percentiles]" } | ||
|
|
||
| --- | ||
| "Top-metrics aggregation on aggregate_metric_double field": |
There was a problem hiding this comment.
Not sure whether out of scope but what about a test for the mixed case (as this will be extremely common in practice)? Querying both indices at the same time, resulting in one of them succeeding and the other producing a shard failure so there's partial data in the response
There was a problem hiding this comment.
Oh that is a good test I think...I am not sure what the result is in this case. I guess we hava a shard failure for the rollup index and partial results coming from the time series index...?
There was a problem hiding this comment.
This test is not necessary for the errors we throw if the aggregation is not supported (aggregation on field of type agrgegate_metric_double). In that case we don't check if the index is a time series index or not...just the field type. Also when doing a rollup on a field, lat's say a histogram aggregation on a gauge field called test_field. In that case we create the rollup index and a new field that is called test_field_histogram. As a result it would not be possible to run a query on both indices on the same field because the field name in the rollup index would be different 'by construction' (original_field_name + aggregation_name).
Anyway, it is a good test for the date histogram case...in that case indeed the field is the timestamp field which is the same in both the source index and the rollup index.
Hitting both indices produces partial results coming from the source time series index and a shard failure coming from the rollup index.
| calendar_interval: hour | ||
| min_doc_count: 1 | ||
|
|
||
| - match: { _shards.total: 2 } |
There was a problem hiding this comment.
NOTE: here we rely on the fact that the rollup index is created with the same number of shards of the original index, which is set to 1 in the setup block.
| @@ -0,0 +1,250 @@ | |||
| setup: | |||
There was a problem hiding this comment.
I had to change this test adding the logic to create the rollup index from the source index because some settings and metadata are applied by the rollup operation and it is not possible to set them from the yaml test. The two settings I use are the source index name and the rollup status which I check to verify if the index is a rollup index or not.
| EnumSet.of(ClusterBlockLevel.WRITE) | ||
| ); | ||
|
|
||
| public boolean isRollupIndex() { |
There was a problem hiding this comment.
I think it's cool as a temporary thing. Maybe a TODO in the code so we know we thought it was temporary when we added it.
| String valuesSourceDescription, | ||
| String aggregationName | ||
| ) { | ||
| if (indexSettings.getIndexMetadata().isRollupIndex() && DateIntervalWrapper.IntervalTypeEnum.CALENDAR.equals(intervalType)) { |
There was a problem hiding this comment.
I think we want this test for all time rolled up indices. Not that there are any non-time series ones, but I think the addition of isRollupIndex means we don't need these new methods in IndexMode any more.
| final boolean rollupSuccess = IndexMetadata.RollupTaskStatus.SUCCESS.name() | ||
| .toLowerCase(Locale.ROOT) | ||
| .equals(indexRollupStatus != null ? indexRollupStatus.toLowerCase(Locale.ROOT) : IndexMetadata.RollupTaskStatus.UNKNOWN); | ||
| return Strings.isNullOrEmpty(sourceIndex) == false && rollupSuccess; |
| * Downsampling uses specific types while aggregating some fields (like 'aggregate_metric_double'). | ||
| * Such field types do not support some aggregations. | ||
| */ | ||
| public class UnsupportedAggregationOnDownsampledField extends AggregationExecutionException { |
There was a problem hiding this comment.
It's probably worth leaving a note that the name of this class is part of a contract with Kibana so we shouldn't change it.
| from: 401.0 | ||
|
|
||
| - match: { status: 400 } | ||
| - match: { error.root_cause.0.type: unsupported_aggregation_on_downsampled_field } |
There was a problem hiding this comment.
It's probably worth leaving a comment here that this type name is part of a contract with kibana. Just extra paranoia so folks don't change it.
csoulios
left a comment
There was a problem hiding this comment.
I had a very quick look at the code. I only left a small comment about naming the exception, because this is something we cannot change in the future.
| * Downsampling uses specific types while aggregating some fields (like 'aggregate_metric_double'). | ||
| * Such field types do not support some aggregations. | ||
| */ | ||
| public class UnsupportedAggregationOnDownsampledField extends AggregationExecutionException { |
There was a problem hiding this comment.
Since we name the action "rollup" everywhere in the code, what about naming this exception UnsupportedAggregationOnRollupField?
Also, since this is more generic than a rollup field we can name it UnsupportedAggregationOnRollupIndex
|
I've merged this so @flash1293 should be able to grab it sooner rather than later. |
|
FYI @tsullivan @ppisljar - you should be able to wire this up in your warnings handling pr |
When an unsupported aggregation is executed on a
aggregate_double_metricfield an error with a specifictypefield is generated. This allows clients like Kibanato identify these errors and handle them properly.
Also we would like to fail on queries using a date histogram
aggregation with
calendar_intervalon rollup indices.Date histogram aggregations are executed on rollup indices
only if using
fixed_intervaland aUTCtimezone.