Add intervals query by romseygeek · Pull Request #36135 · elastic/elasticsearch

romseygeek · 2018-12-01T18:02:44Z

This commit exposes the Lucene intervals query in Elasticsearch, as a replacement for the
Span query family. Instead of building query structures up from individual terms, the intervals query
uses a tree of sources, the leaves of which are formed of match-type queries that are passed
through analysis. These sources can then be combined using `or`, `combine` or `relate` sources to test
for proximity or position.

Replaces #32406.

Closes #29636

elasticmachine · 2018-12-01T18:02:45Z

Pinging @elastic/es-search

jimczi

The change looks good overall @romseygeek , I left some comments.

jimczi · 2018-12-03T08:42:18Z

rest-api-spec/src/main/resources/rest-api-spec/test/search/230_interval_query.yml

@@ -0,0 +1,58 @@
+setup:
+  - skip:


Is this working ? I was not aware that the skip section can be set in the setup part but if it's working... ;)

jimczi · 2018-12-03T08:46:50Z

server/src/main/java/org/elasticsearch/index/query/IntervalBuilder.java

+            if (posAtt.getPositionIncrement() == 1) {
+                if (synonyms.size() == 1) {
+                    terms.add(synonyms.get(0));
+                }


nit: can you add an else to disambiguate?
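
A minimal sketch of the shape this nit seems to ask for, using plain strings in place of Lucene `IntervalsSource` objects. The class `SynonymFlushSketch`, the method `flush`, and the `or(...)` string rendering are hypothetical names chosen for illustration; this is not the code in the PR.

```java
import java.util.List;

final class SynonymFlushSketch {
    // Hypothetical helper: moves the tokens collected at one position into `terms`.
    static void flush(List<String> synonyms, List<String> terms) {
        if (synonyms.size() == 1) {
            // A single token at this position: add it unchanged.
            terms.add(synonyms.get(0));
        } else if (synonyms.size() > 1) {
            // Several tokens share the position: the explicit else branch
            // shows that they are combined into one alternative.
            terms.add("or(" + String.join(", ", synonyms) + ")");
        }
        // An empty buffer adds nothing; in every case the buffer is reset.
        synonyms.clear();
    }
}
```

With the explicit branch, a reader can see at a glance what happens for zero, one, or several tokens at the same position.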

jimczi · 2018-12-03T08:48:14Z

server/src/main/java/org/elasticsearch/index/query/IntervalQueryBuilder.java

+import java.util.Objects;
+
+import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg;
+


Can you add javadocs to explain the kind of queries that this builder handles ?

jimczi · 2018-12-03T08:50:02Z

server/src/main/java/org/elasticsearch/index/query/IntervalsSourceProvider.java

+
+import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg;
+import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg;
+


Same here, this is the user API so I'd expect some explanations regarding the different options ?

server/src/main/java/org/elasticsearch/search/SearchModule.java

romseygeek · 2018-12-05T14:10:52Z

I've added some better YAML tests and some basic documentation in the query DSL reference. I'm still a bit unsure of the DSL interface and some of the terms I'm using (eg source for something that generates intervals) but I think it's a good start.

cc @clintongormley

mayya-sharipova · 2018-12-05T15:06:28Z

rest

+
+./gradlew :distribution:archives:integ-test-zip:integTest \
+	-Dtests.class="org.elasticsearch.test.rest.*Yaml*IT" \
+	-Dtests.method="test {p0=$1}"


I wonder what is the role of this file?

It makes running rest tests easier, but I didn't mean to commit it :) Will remove

romseygeek · 2018-12-11T17:56:15Z

After conferring with @clintongormley I've reworked the API somewhat. My one reservation is the name of the query, which is still intervals; I think that is still too closely tied to the internal implementation of these queries. Something like proximity_match might be better? It links it to the other match_* queries. I'd also like to rename the internal intervals field names to something like rules.

jpountz

This is very clean overall. I'm curious why you decided to support a filter element in every intervals source instead of eg. having a filter intervals source?

Regarding the name, I don't dislike "intervals" as this is their name in the paper that introduced them. If we give them a more general name, I'm afraid that users would be even more surprised eg. by the behavior of any_of as it only returns minimal intervals?

jpountz · 2018-12-12T13:17:00Z

docs/reference/query-dsl/intervals-query.asciidoc

+favourite food is porridge` would not match, because the interval matching
+`cold porridge` starts before the interval matching `my favourite food`.
+
+[[intervals-match]]


Huge +1 to exposing a match and no term

jpountz · 2018-12-12T13:19:16Z

server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java

+        @Override
+        public IntervalsSource intervals(String text, int maxGaps, boolean ordered, NamedAnalyzer analyzer) throws IOException {
+            if (indexOptions().compareTo(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS) < 0) {
+                throw new IllegalArgumentException("Cannot create source against field [" + name() + "] with no positions indexed");


maybe say "intervals source" instead of just source?
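
A small sketch of how the positions check in the hunk above works, with the reviewer's suggested wording in the message. The enum below is a stand-in that mirrors the declaration order of Lucene's `IndexOptions`; the class `PositionsCheckSketch` and the method `requirePositions` are hypothetical, and the exact message that was merged is not shown in this thread.

```java
final class PositionsCheckSketch {
    // Stand-in mirroring the declaration order of Lucene's IndexOptions enum.
    enum IndexOptions {
        NONE,
        DOCS,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    // Hypothetical helper: rejects fields that were indexed without positions.
    static void requirePositions(String field, IndexOptions options) {
        // Enum.compareTo compares ordinals, so a negative result means the
        // field was indexed with less information than positions.
        if (options.compareTo(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS) < 0) {
            throw new IllegalArgumentException(
                "Cannot create intervals source against field [" + field + "] with no positions indexed");
        }
    }
}
```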

jpountz · 2018-12-12T13:20:10Z

server/src/main/java/org/elasticsearch/index/query/IntervalBuilder.java

+/**
+ * Constructs an IntervalsSource based on analyzed text
+ */
+public class IntervalBuilder {


Can we make it pkg-private?

It's called from TextFieldMapper which is in a different package, so unfortunately we can't.

should we move it to the same package to keep visibility to a minimum?

It's also called from IntervalsSourceProvider, which doesn't really fit in the mapper package. I think it has to stay public...

jpountz · 2018-12-12T13:22:14Z

server/src/main/java/org/elasticsearch/index/query/IntervalQueryBuilder.java

+                default:
+                    provider = IntervalsSourceProvider.fromXContent(parser);
+
+            }


it seems that this won't fail if multiple providers are provided?

Have updated
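
One way such a check could look, sketched with a plain `Map` standing in for the parsed request body. The class `SingleProviderSketch`, the method `parseSingleRule`, and both error messages are assumptions for illustration; the actual fix lives in `IntervalQueryBuilder` and is not shown in this thread.

```java
import java.util.Map;

final class SingleProviderSketch {
    // Hypothetical stand-in for parsing the object under the field name:
    // each key names one rule, and exactly one rule is allowed.
    static String parseSingleRule(Map<String, Object> body) {
        String provider = null;
        for (String ruleName : body.keySet()) {
            if (provider != null) {
                // A second rule at the same level is rejected instead of
                // silently replacing the first one.
                throw new IllegalArgumentException(
                    "Only one interval rule can be specified, found [" + provider + "] and [" + ruleName + "]");
            }
            provider = ruleName;
        }
        if (provider == null) {
            throw new IllegalArgumentException("Missing interval rule");
        }
        return provider;
    }
}
```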

jpountz · 2018-12-12T13:23:28Z

server/src/main/java/org/elasticsearch/index/query/IntervalQueryBuilder.java

+        MappedFieldType fieldType = context.fieldMapper(field);
+        if (fieldType == null) {
+            throw new IllegalArgumentException("Cannot create IntervalQuery over non-existent field [" + field + "]");
+        }


in general we are lenient with unmapped fields because of cross-index search, should we be lenient here too?

Have updated
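
A sketch of the lenient behaviour being suggested, assuming an unmapped field should simply match nothing rather than fail the request. The class `UnmappedFieldSketch`, the `mappings` map, and the returned strings are hypothetical stand-ins for the field lookup and for Lucene `Query` objects.

```java
import java.util.Map;

final class UnmappedFieldSketch {
    // Hypothetical: `mappings` maps field names to a type name, and the
    // returned string stands in for the Lucene Query that would be built.
    static String toQuery(String field, Map<String, String> mappings) {
        String fieldType = mappings.get(field);
        if (fieldType == null) {
            // Lenient path: the field may be mapped in another index that is
            // part of the same search, so this index just matches nothing.
            return "match_no_docs";
        }
        return "intervals(" + field + ")";
    }
}
```

In a cross-index search, this lets indices that do have the field return hits while the others contribute none.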

jpountz · 2018-12-12T13:31:32Z

docs/reference/query-dsl/intervals-query.asciidoc

+The `or` rule will match any of its nested sub-rules.
+
+[horizontal]
+`intervals`::


Maybe we need to add documentation here about the fact that this only returns minimal intervals and consequences?

I've added a section on minimization at the end of the doc, with some examples of queries that can produce surprising results, and how to deal with that.
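
To make "minimal intervals" concrete, here is a sketch under the assumption that an interval is minimal when it does not contain any other interval produced by the same source. Intervals are modelled as `{start, end}` pairs of inclusive token positions; the class `MinimalIntervalsSketch` and the method `minimize` are hypothetical and only illustrate the idea, not how Lucene iterates intervals.

```java
import java.util.ArrayList;
import java.util.List;

final class MinimalIntervalsSketch {
    // Keeps only the candidates that do not contain a different candidate.
    // Each interval is {start, end}, both inclusive token positions.
    static List<int[]> minimize(List<int[]> candidates) {
        List<int[]> minimal = new ArrayList<>();
        for (int[] outer : candidates) {
            boolean containsAnother = false;
            for (int[] inner : candidates) {
                boolean sameSpan = inner[0] == outer[0] && inner[1] == outer[1];
                if (!sameSpan && outer[0] <= inner[0] && inner[1] <= outer[1]) {
                    containsAnother = true;
                    break;
                }
            }
            if (!containsAnother) {
                minimal.add(outer);
            }
        }
        return minimal;
    }
}
```

For example, if one alternative matches positions 0 to 0 and another matches positions 0 to 2, only the shorter interval survives, so an enclosing rule never sees an interval ending at position 2. That is the kind of surprising result the reviewer is asking the docs to call out.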

romseygeek · 2018-12-13T10:51:26Z

I've pushed some changes addressing your comments @jpountz. For the filtering, it seemed to make more sense to add it as an option on to each rule, because the intervals produced by a filter all belong to the rule being filtered, and Clint and I thought that this was the best way of making that obvious.

romseygeek · 2018-12-13T12:07:53Z

@elasticmachine retest this please

romseygeek · 2018-12-13T13:32:59Z

@elasticmachine retest this please

jpountz

Thanks @romseygeek. I thought having some sort of filtering intervals would be more consistent with how we deal with queries but I don't feel too strongly about it either.

LGTM

jpountz · 2018-12-14T10:40:11Z

server/src/main/java/org/elasticsearch/index/query/IntervalBuilder.java

+/**
+ * Constructs an IntervalsSource based on analyzed text
+ */
+public class IntervalBuilder {


should we move it to the same package to keep visibility to a minimum?

clintongormley · 2018-12-15T12:59:59Z

Thanks @romseygeek. I thought having some sort of filtering intervals would be more consistent with how we deal with queries but I don't feel too strongly about it either.

@jpountz I didn't have strong reasons for adding filtering the way it is here. Happy to discuss if you think it makes sense doing it in a different way.

romseygeek added 14 commits July 26, 2018 14:13

Add IntervalQueryBuilder with support for match and combine intervals

6b7d175

Add relative intervals

7d9b9ef

Merge branch 'master' into interval-query

1197fdf

feedback

b0439c3

YAML test - broekn

6cb7fe8

Merge remote-tracking branch 'origin/master' into interval-query

df4d329

yaml test; begin to add block source

b0d28aa

Add block; make disjunction its own source

a8806e2

Merge remote-tracking branch 'origin/master' into interval-query

a4cecc9

WIP

8489e86

Merge remote-tracking branch 'origin/master' into interval-query

c8212f1

Extract IntervalBuilder and add tests for it

2a2244d

Fix eq/hashcode in Disjunction

6e5339d

New yaml test

52bcf1f

romseygeek added >feature, :Search/Search, v7.0.0 labels Dec 1, 2018

romseygeek self-assigned this Dec 1, 2018

romseygeek mentioned this pull request Dec 1, 2018

Make lucene's IntervalQuery available via the Query DSL #32406

Closed

romseygeek requested a review from jimczi December 1, 2018 18:03

romseygeek added 6 commits December 1, 2018 18:19

Merge remote-tracking branch 'origin/master' into interval-query

872f913

checkstyle

6f2c73c

license headers

f044495

test fix

1377bcc

YAML format

0368133

YAML formatting again

9c2f035

jimczi reviewed Dec 3, 2018

romseygeek added 2 commits December 3, 2018 14:44

yaml tests; javadoc

7cde116

Add OR test -> requires fix from LUCENE-8586

dabdd77

romseygeek added 2 commits December 5, 2018 13:08

Merge remote-tracking branch 'origin/master' into interval-query

ba979e5

Add docs

122f192

mayya-sharipova reviewed Dec 5, 2018

romseygeek added 7 commits December 11, 2018 12:02

Re-do API

22f99b4

Merge remote-tracking branch 'origin/master' into interval-query

6de587d

Clint's API

3146c47

Delete bash script

3bf1b0d

doc fixes

2d2df63

imports

67bc11a

docs

abf75bd

test fix

0b14af3

jpountz reviewed Dec 12, 2018

romseygeek added 2 commits December 13, 2018 10:39

feedback

45bf499

Merge remote-tracking branch 'origin/master' into interval-query

6780a57

romseygeek added 2 commits December 13, 2018 10:59

comma

a33d816

docs fixes

9834a06

jpountz approved these changes Dec 14, 2018

Tidy up doc references to old rule

a754165

romseygeek merged commit 09bf93d into elastic:master Dec 14, 2018

romseygeek deleted the interval-query branch December 14, 2018 15:14

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

romseygeek mentioned this pull request Apr 5, 2019

Make IntervalQuery available via the Query DSL #29636

Closed

consulthys mentioned this pull request Nov 26, 2019

Fuzziness support in intervals query #49595

Closed
