
Commit aecd9ac

Aggregations: Speed up include/exclude in terms aggregations with regexps.
Today we check every regular expression eagerly against every possible term. This can be very slow if you have lots of unique terms, and can even become the bottleneck if your query is selective. This commit switches to Lucene regular expressions instead of Java ones (the syntax is not exactly the same, but most existing regular expressions should keep working) and uses the same logic as RegExpQuery to intersect the regular expression with the terms dictionary. I wrote a quick benchmark (in the PR) to make sure it made things faster: the same request that took 750ms on master now takes 74ms with this change. Close #7526
1 parent 2a844fc commit aecd9ac
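The commit message contrasts checking a regular expression against every term with intersecting it with the sorted terms dictionary. The Java sketch below is a hypothetical, much-simplified illustration of why intersection helps, restricted to a literal-prefix pattern such as `water_.*`: instead of testing each term, it binary-searches to the first candidate and stops at the first term past the prefix range. It is not Lucene's `RegExpQuery` logic, which handles arbitrary patterns; the names `PrefixIntersection`, `eager` and `seekPrefix` are made up for this example, and it assumes ASCII terms held in a `List<String>` sorted in natural order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Lucene or Elasticsearch implementation.
// Assumes ASCII terms stored in a List<String> sorted in natural (String) order.
class PrefixIntersection {

    // Eager approach: run the regular expression against every term in the dictionary.
    static List<String> eager(List<String> sortedTerms, Pattern pattern) {
        List<String> accepted = new ArrayList<>();
        for (String term : sortedTerms) {
            if (pattern.matcher(term).matches()) {
                accepted.add(term);
            }
        }
        return accepted;
    }

    // Intersection approach, limited to a literal prefix such as "water_" (the pattern "water_.*").
    // All terms sharing a prefix form one contiguous range of a sorted dictionary, so we
    // binary-search to the start of that range and stop at the first term outside it.
    static List<String> seekPrefix(List<String> sortedTerms, String prefix) {
        int idx = Collections.binarySearch(sortedTerms, prefix);
        int start = idx >= 0 ? idx : -idx - 1; // insertion point when the prefix itself is not a term
        List<String> accepted = new ArrayList<>();
        for (int i = start; i < sortedTerms.size() && sortedTerms.get(i).startsWith(prefix); i++) {
            accepted.add(sortedTerms.get(i));
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("golf", "tennis", "water_polo", "water_sports", "windsurf");
        System.out.println(eager(terms, Pattern.compile("water_.*")));
        System.out.println(seekPrefix(terms, "water_"));
    }
}
```

Both methods return `[water_polo, water_sports]` for the sample dictionary, but `seekPrefix` only visits the matching range plus one term to detect its end, while `eager` runs the regex on all five terms. With millions of unique terms, that difference is what makes selective patterns cheap.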

14 files changed

Lines changed: 334 additions & 326 deletions


docs/reference/migration/migrate_2_0.asciidoc

Lines changed: 3 additions & 0 deletions
@@ -139,6 +139,9 @@ equivalent to the former `pre_zone` option. Setting `time_zone` to a value like
 being applied in the specified time zone but In addition to this, also the `pre_zone_adjust_large_interval` is removed because we
 now always return dates and bucket keys in UTC.
 
+`include`/`exclude` filtering on the `terms` aggregation now uses the same syntax as regexp queries instead of the Java syntax. While simple
+regexps should still work, more complex ones might need some rewriting. Also, the `flags` parameter is not supported anymore.
+
 === Terms filter lookup caching
 
 The terms filter lookup mechanism does not support the `cache` option anymore

docs/reference/search/aggregations/bucket/terms-aggregation.asciidoc

Lines changed: 1 addition & 36 deletions
@@ -482,42 +482,7 @@ with `water_` (so the tag `water_sports` will no be aggregated). The `include` r
 values are "allowed" to be aggregated, while the `exclude` determines the values that should not be aggregated. When
 both are defined, the `exclude` has precedence, meaning, the `include` is evaluated first and only then the `exclude`.
 
-The regular expression are based on the Java(TM) http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html[Pattern],
-and as such, they it is also possible to pass in flags that will determine how the compiled regular expression will work:
-
-[source,js]
---------------------------------------------------
-{
-    "aggs" : {
-        "tags" : {
-            "terms" : {
-                "field" : "tags",
-                "include" : {
-                    "pattern" : ".*sport.*",
-                    "flags" : "CANON_EQ|CASE_INSENSITIVE" <1>
-                },
-                "exclude" : {
-                    "pattern" : "water_.*",
-                    "flags" : "CANON_EQ|CASE_INSENSITIVE"
-                }
-            }
-        }
-    }
-}
---------------------------------------------------
-
-<1> the flags are concatenated using the `|` character as a separator
-
-The possible flags that can be used are:
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#CANON_EQ[`CANON_EQ`],
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#CASE_INSENSITIVE[`CASE_INSENSITIVE`],
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#COMMENTS[`COMMENTS`],
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#DOTALL[`DOTALL`],
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#LITERAL[`LITERAL`],
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#MULTILINE[`MULTILINE`],
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CASE[`UNICODE_CASE`],
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS[`UNICODE_CHARACTER_CLASS`] and
-http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNIX_LINES[`UNIX_LINES`]
+The syntax is the same as <<regexp-syntax,regexp queries>>.
 
 For matching based on exact values the `include` and `exclude` parameters can simply take an array of
 strings that represent the terms as they are found in the index:
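The documentation context above states that when both `include` and `exclude` are set, `exclude` takes precedence. The following hypothetical Java sketch expresses that rule as a per-term predicate. `IncludeExcludeSketch` and its methods are illustrative names, not the Elasticsearch `IncludeExclude` API, and `java.util.regex` is only a stand-in for the Lucene regexp syntax this commit adopts; `Matcher.matches()` tests the whole term, which mirrors how regexp query patterns are anchored.

```java
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of include/exclude precedence; not the Elasticsearch IncludeExclude API.
class IncludeExcludeSketch {

    // A term is kept if it matches `include` (when set) and does not match `exclude` (when set),
    // so `exclude` wins whenever both apply. A null argument means "no constraint".
    static Predicate<String> stringFilter(Predicate<String> include, Predicate<String> exclude) {
        return term -> (include == null || include.test(term))
                && (exclude == null || !exclude.test(term));
    }

    // Regex-based filter. java.util.regex stands in for Lucene's regexp syntax here;
    // matches() requires the whole term to match, like an anchored regexp query pattern.
    static Predicate<String> regex(String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return term -> compiled.matcher(term).matches();
    }

    // Exact-value filter, analogous to passing an array of terms.
    static Predicate<String> exact(Set<String> values) {
        return values::contains;
    }

    public static void main(String[] args) {
        Predicate<String> filter = stringFilter(regex(".*sport.*"), regex("water_.*"));
        for (String tag : new String[] {"sports_car", "water_sports", "tennis"}) {
            System.out.println(tag + " -> " + filter.test(tag));
        }
    }
}
```

With include `.*sport.*` and exclude `water_.*`, `sports_car` is accepted while `water_sports` (excluded) and `tennis` (not included) are rejected.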

src/main/java/org/elasticsearch/search/aggregations/bucket/significant/GlobalOrdinalsSignificantTermsAggregator.java

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ public class GlobalOrdinalsSignificantTermsAggregator extends GlobalOrdinalsStri
 
     public GlobalOrdinalsSignificantTermsAggregator(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals.FieldData valuesSource,
             BucketCountThresholds bucketCountThresholds,
-            IncludeExclude includeExclude, AggregationContext aggregationContext, Aggregator parent,
+            IncludeExclude.OrdinalsFilter includeExclude, AggregationContext aggregationContext, Aggregator parent,
             SignificantTermsAggregatorFactory termsAggFactory, Map<String, Object> metaData) throws IOException {
 
         super(name, factories, valuesSource, null, bucketCountThresholds, includeExclude, aggregationContext, parent, SubAggCollectionMode.DEPTH_FIRST, false, metaData);
@@ -145,7 +145,7 @@ public static class WithHash extends GlobalOrdinalsSignificantTermsAggregator {
 
         private final LongHash bucketOrds;
 
-        public WithHash(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals.FieldData valuesSource, BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude, AggregationContext aggregationContext, Aggregator parent, SignificantTermsAggregatorFactory termsAggFactory, Map<String, Object> metaData) throws IOException {
+        public WithHash(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals.FieldData valuesSource, BucketCountThresholds bucketCountThresholds, IncludeExclude.OrdinalsFilter includeExclude, AggregationContext aggregationContext, Aggregator parent, SignificantTermsAggregatorFactory termsAggFactory, Map<String, Object> metaData) throws IOException {
             super(name, factories, valuesSource, bucketCountThresholds, includeExclude, aggregationContext, parent, termsAggFactory, metaData);
             bucketOrds = new LongHash(1, aggregationContext.bigArrays());
         }

src/main/java/org/elasticsearch/search/aggregations/bucket/significant/SignificantStringTermsAggregator.java

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ public class SignificantStringTermsAggregator extends StringTermsAggregator {
 
     public SignificantStringTermsAggregator(String name, AggregatorFactories factories, ValuesSource valuesSource,
             BucketCountThresholds bucketCountThresholds,
-            IncludeExclude includeExclude, AggregationContext aggregationContext, Aggregator parent,
+            IncludeExclude.StringFilter includeExclude, AggregationContext aggregationContext, Aggregator parent,
             SignificantTermsAggregatorFactory termsAggFactory, Map<String, Object> metaData) throws IOException {
 
         super(name, factories, valuesSource, null, bucketCountThresholds, includeExclude, aggregationContext, parent, SubAggCollectionMode.DEPTH_FIRST, false, metaData);

src/main/java/org/elasticsearch/search/aggregations/bucket/significant/SignificantTermsAggregatorFactory.java

Lines changed: 6 additions & 3 deletions
@@ -65,7 +65,8 @@ public enum ExecutionMode {
         Aggregator create(String name, AggregatorFactories factories, ValuesSource valuesSource,
                 TermsAggregator.BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude,
                 AggregationContext aggregationContext, Aggregator parent, SignificantTermsAggregatorFactory termsAggregatorFactory, Map<String, Object> metaData) throws IOException {
-            return new SignificantStringTermsAggregator(name, factories, valuesSource, bucketCountThresholds, includeExclude, aggregationContext, parent, termsAggregatorFactory, metaData);
+            final IncludeExclude.StringFilter filter = includeExclude == null ? null : includeExclude.convertToStringFilter();
+            return new SignificantStringTermsAggregator(name, factories, valuesSource, bucketCountThresholds, filter, aggregationContext, parent, termsAggregatorFactory, metaData);
         }
 
     },
@@ -77,7 +78,8 @@ Aggregator create(String name, AggregatorFactories factories, ValuesSource value
                 AggregationContext aggregationContext, Aggregator parent, SignificantTermsAggregatorFactory termsAggregatorFactory, Map<String, Object> metaData) throws IOException {
             ValuesSource.Bytes.WithOrdinals valueSourceWithOrdinals = (ValuesSource.Bytes.WithOrdinals) valuesSource;
             IndexSearcher indexSearcher = aggregationContext.searchContext().searcher();
-            return new GlobalOrdinalsSignificantTermsAggregator(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, bucketCountThresholds, includeExclude, aggregationContext, parent, termsAggregatorFactory, metaData);
+            final IncludeExclude.OrdinalsFilter filter = includeExclude == null ? null : includeExclude.convertToOrdinalsFilter();
+            return new GlobalOrdinalsSignificantTermsAggregator(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, bucketCountThresholds, filter, aggregationContext, parent, termsAggregatorFactory, metaData);
         }
 
     },
@@ -87,7 +89,8 @@ Aggregator create(String name, AggregatorFactories factories, ValuesSource value
         Aggregator create(String name, AggregatorFactories factories, ValuesSource valuesSource,
                 TermsAggregator.BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude,
                 AggregationContext aggregationContext, Aggregator parent, SignificantTermsAggregatorFactory termsAggregatorFactory, Map<String, Object> metaData) throws IOException {
-            return new GlobalOrdinalsSignificantTermsAggregator.WithHash(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, bucketCountThresholds, includeExclude, aggregationContext, parent, termsAggregatorFactory, metaData);
+            final IncludeExclude.OrdinalsFilter filter = includeExclude == null ? null : includeExclude.convertToOrdinalsFilter();
+            return new GlobalOrdinalsSignificantTermsAggregator.WithHash(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, bucketCountThresholds, filter, aggregationContext, parent, termsAggregatorFactory, metaData);
         }
     };
 
src/main/java/org/elasticsearch/search/aggregations/bucket/significant/SignificantTermsParser.java

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ public AggregatorFactory parse(String aggregationName, XContentParser parser, Se
                 .scriptable(false)
                 .formattable(true)
                 .build();
-        IncludeExclude.Parser incExcParser = new IncludeExclude.Parser(aggregationName, SignificantStringTerms.TYPE, context);
+        IncludeExclude.Parser incExcParser = new IncludeExclude.Parser();
         aggParser.parse(aggregationName, parser, context, vsParser, incExcParser);
 
         TermsAggregator.BucketCountThresholds bucketCountThresholds = aggParser.getBucketCountThresholds();

src/main/java/org/elasticsearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java

Lines changed: 3 additions & 3 deletions
@@ -57,7 +57,7 @@
 public class GlobalOrdinalsStringTermsAggregator extends AbstractStringTermsAggregator {
 
     protected final ValuesSource.Bytes.WithOrdinals.FieldData valuesSource;
-    protected final IncludeExclude includeExclude;
+    protected final IncludeExclude.OrdinalsFilter includeExclude;
 
     // TODO: cache the acceptedglobalValues per aggregation definition.
     // We can't cache this yet in ValuesSource, since ValuesSource is reused per field for aggs during the execution.
@@ -71,7 +71,7 @@ public class GlobalOrdinalsStringTermsAggregator extends AbstractStringTermsAggr
 
     public GlobalOrdinalsStringTermsAggregator(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals.FieldData valuesSource,
             Terms.Order order, BucketCountThresholds bucketCountThresholds,
-            IncludeExclude includeExclude, AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode collectionMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
+            IncludeExclude.OrdinalsFilter includeExclude, AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode collectionMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
         super(name, factories, aggregationContext, parent, order, bucketCountThresholds, collectionMode, showTermDocCountError, metaData);
         this.valuesSource = valuesSource;
         this.includeExclude = includeExclude;
@@ -260,7 +260,7 @@ public static class WithHash extends GlobalOrdinalsStringTermsAggregator {
         private final LongHash bucketOrds;
 
         public WithHash(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals.FieldData valuesSource,
-                Terms.Order order, BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude, AggregationContext aggregationContext,
+                Terms.Order order, BucketCountThresholds bucketCountThresholds, IncludeExclude.OrdinalsFilter includeExclude, AggregationContext aggregationContext,
                 Aggregator parent, SubAggCollectionMode collectionMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
             super(name, factories, valuesSource, order, bucketCountThresholds, includeExclude, aggregationContext, parent, collectionMode, showTermDocCountError, metaData);
             bucketOrds = new LongHash(1, aggregationContext.bigArrays());
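The `GlobalOrdinalsStringTermsAggregator` hunks above now take an `IncludeExclude.OrdinalsFilter`, and the TODO mentions caching accepted global values. One way to picture an ordinals-based filter is a bitset of accepted ordinals computed once from the sorted dictionary, so that per-document collection becomes a bit lookup instead of a string or regex comparison. The sketch below is hypothetical: it assumes an ordinal is simply a term's index in a sorted `List<String>`, `AcceptedOrdinalsSketch` is a made-up name, and, unlike the commit, it fills the bitset by testing every term rather than by intersecting the regular expression with the terms dictionary.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: an ordinal is assumed to be a term's index in a sorted dictionary.
class AcceptedOrdinalsSketch {

    // Done once per dictionary: mark every ordinal whose term passes the filter.
    // For simplicity this tests each term; the commit instead intersects the regex with the terms dictionary.
    static BitSet acceptedOrdinals(List<String> sortedTerms, Predicate<String> filter) {
        BitSet accepted = new BitSet(sortedTerms.size());
        for (int ord = 0; ord < sortedTerms.size(); ord++) {
            if (filter.test(sortedTerms.get(ord))) {
                accepted.set(ord);
            }
        }
        return accepted;
    }

    // Done per document: only a bit lookup per ordinal, no string or regex work.
    static long[] collect(int[][] docOrds, BitSet accepted, int valueCount) {
        long[] counts = new long[valueCount];
        for (int[] ords : docOrds) {
            for (int ord : ords) {
                if (accepted.get(ord)) {
                    counts[ord]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("golf", "tennis", "water_polo", "water_sports");
        BitSet accepted = acceptedOrdinals(terms, term -> !term.startsWith("water_"));
        // Each inner array holds the ordinals of one document's values.
        long[] counts = collect(new int[][] {{0, 2}, {1}, {0, 3}}, accepted, terms.size());
        System.out.println(accepted + " " + Arrays.toString(counts));
    }
}
```

The design point is that the cost of evaluating the filter is paid per unique term at most once, independent of how many documents the query matches.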

src/main/java/org/elasticsearch/search/aggregations/bucket/terms/StringTermsAggregator.java

Lines changed: 2 additions & 2 deletions
@@ -45,11 +45,11 @@ public class StringTermsAggregator extends AbstractStringTermsAggregator {
 
     private final ValuesSource valuesSource;
     protected final BytesRefHash bucketOrds;
-    private final IncludeExclude includeExclude;
+    private final IncludeExclude.StringFilter includeExclude;
 
     public StringTermsAggregator(String name, AggregatorFactories factories, ValuesSource valuesSource,
             Terms.Order order, BucketCountThresholds bucketCountThresholds,
-            IncludeExclude includeExclude, AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode collectionMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
+            IncludeExclude.StringFilter includeExclude, AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode collectionMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
 
         super(name, factories, aggregationContext, parent, order, bucketCountThresholds, collectionMode, showTermDocCountError, metaData);
         this.valuesSource = valuesSource;

src/main/java/org/elasticsearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java

Lines changed: 6 additions & 3 deletions
@@ -50,7 +50,8 @@ public enum ExecutionMode {
         Aggregator create(String name, AggregatorFactories factories, ValuesSource valuesSource,
                 Terms.Order order, TermsAggregator.BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude,
                 AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode subAggCollectMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
-            return new StringTermsAggregator(name, factories, valuesSource, order, bucketCountThresholds, includeExclude, aggregationContext, parent, subAggCollectMode, showTermDocCountError, metaData);
+            final IncludeExclude.StringFilter filter = includeExclude == null ? null : includeExclude.convertToStringFilter();
+            return new StringTermsAggregator(name, factories, valuesSource, order, bucketCountThresholds, filter, aggregationContext, parent, subAggCollectMode, showTermDocCountError, metaData);
         }
 
         @Override
@@ -65,7 +66,8 @@ boolean needsGlobalOrdinals() {
         Aggregator create(String name, AggregatorFactories factories, ValuesSource valuesSource,
                 Terms.Order order, TermsAggregator.BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude,
                 AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode subAggCollectMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
-            return new GlobalOrdinalsStringTermsAggregator(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, order, bucketCountThresholds, includeExclude, aggregationContext, parent, subAggCollectMode, showTermDocCountError, metaData);
+            final IncludeExclude.OrdinalsFilter filter = includeExclude == null ? null : includeExclude.convertToOrdinalsFilter();
+            return new GlobalOrdinalsStringTermsAggregator(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, order, bucketCountThresholds, filter, aggregationContext, parent, subAggCollectMode, showTermDocCountError, metaData);
         }
 
         @Override
@@ -80,7 +82,8 @@ boolean needsGlobalOrdinals() {
         Aggregator create(String name, AggregatorFactories factories, ValuesSource valuesSource,
                 Terms.Order order, TermsAggregator.BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude,
                 AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode subAggCollectMode, boolean showTermDocCountError, Map<String, Object> metaData) throws IOException {
-            return new GlobalOrdinalsStringTermsAggregator.WithHash(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, order, bucketCountThresholds, includeExclude, aggregationContext, parent, subAggCollectMode, showTermDocCountError, metaData);
+            final IncludeExclude.OrdinalsFilter filter = includeExclude == null ? null : includeExclude.convertToOrdinalsFilter();
+            return new GlobalOrdinalsStringTermsAggregator.WithHash(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, order, bucketCountThresholds, filter, aggregationContext, parent, subAggCollectMode, showTermDocCountError, metaData);
         }
 
         @Override
