[Rollup] Log deprecation if config and request intervals are mixed#33284
[Rollup] Log deprecation if config and request intervals are mixed#33284jimczi merged 6 commits intoelastic:6.4from
Conversation
Backport of elastic#32052, with some changes. In 6.5, we forbid mixed intervals (one fixed, one calendar). But for a 6.4.x bugfix release this would be a pretty extreme break. So we continue to allow mixed intervals in 6.4, but log a deprecation warning. 6.4 also doesn't enforce that the request is a multiple of the config. We also log a deprecation if that is not the case, but don't prevent the query. Finally, mixed intervals are only used if a better job cap can't be found. To support the case where we have to compare fixed and calendar, there is a table of rough conversions (e.g. 1 day == 86400000ms). This lets us crudely ensure the user doesn't try to query `2d` on an index that is set for `1w`.
|
Pinging @elastic/es-search-aggs |
| intervals, the <<rollup-search>> API can aggregate on any time interval hourly or greater. Intervals that are less than an hour will throw | ||
| an exception, since the data simply doesn't exist for finer granularities. | ||
| Rollups are stored at a certain granularity, as defined by the `date_histogram` group in the configuration. This means you | ||
| can only search/aggregate the rollup data with an interval that is greater-than or equal to the configured rollup interval. |
There was a problem hiding this comment.
This isn't technically true, but I'm inclined to leave as-is to encourage proper config in 6.4 :)
| A non-multiple wouldn't work, since the rolled up data wouldn't cleanly "overlap" with the buckets generated | ||
| by the aggregation, leading to incorrect results. | ||
|
|
||
| For that reason, an error is thrown if a whole multiple of the configured interval isn't found. |
|
Note: I also spotted a bug in the comparator, similar to the parsing bug that's being fixed. But I'd like to fix that in a followup PR because it applies to all branches. |
jimczi
left a comment
There was a problem hiding this comment.
It looks good @polyfractal , I left some comments
| */ | ||
| static boolean validateMixedInterval(long requestInterval, DateHistogramInterval configInterval) { | ||
| if (isCalendarInterval(configInterval)) { | ||
| long configIntervalMillis = CALENDAR_ORDERING.getOrDefault(configInterval.toString(), Long.MAX_VALUE); |
There was a problem hiding this comment.
We need to log the deprecation warning if the statement below is true ?
| // TODO should be divisor of interval | ||
| if (interval <= source.interval()) { | ||
| // query interval must be gte the configured interval, and a whole multiple | ||
| if (interval <= source.interval() && source.interval() % interval == 0) { |
There was a problem hiding this comment.
should we also turn this breaking change in a deprecation warning ?
There was a problem hiding this comment.
argh, yes. adding thanks
| // The request must be gte the config. The CALENDAR_ORDERING map values are integers representing | ||
| // relative orders between the calendar units | ||
| long requestOrder = CALENDAR_ORDERING.getOrDefault(requestInterval.toString(), Long.MAX_VALUE); | ||
| long configOrder = CALENDAR_ORDERING.getOrDefault(configInterval.toString(), Long.MAX_VALUE); |
There was a problem hiding this comment.
You can maybe replace the static map with:
DateTimeUnit configUnit = DateHistogramAggregationBuilder.DATE_FIELD_UNITS.get(configInterval.toString());
long configOrder = configUnit.field(DateTimeZone.UTC).getDurationField().getUnitMillis();
?
There was a problem hiding this comment.
Didn't realize you could do this. ++
|
Tidied up, and added a few more tests for the multiples warning |
Backport of #32052, with some changes.
In 6.5, we forbid mixed intervals (one fixed, one calendar). But for a 6.4.x bugfix release this would be a pretty extreme break. So we continue to allow mixed intervals in 6.4, but log a deprecation warning.
6.4 also doesn't enforce that the request is a multiple of the config. So we log a deprecation if that is not the case, but don't prevent the query.
Finally, mixed intervals are only used if a better job cap can't be found.
To support the case where we have to compare fixed and calendar, there is a table of rough conversions (e.g. 1 day == 86400000ms). This lets us crudely ensure the user doesn't try to query
2don an index that is set for1w.@jimczi Mind giving a quick review on this? The major change from the other PR is the addition of
validateMixedInterval(), changes to theCALENDAR_ORDERINGtable, and that we don't enforce multiples.