Make lucene's IntervalQuery available via the Query DSL#32406
Make lucene's IntervalQuery available via the Query DSL#32406romseygeek wants to merge 4 commits intoelastic:masterfrom romseygeek:interval-query
Conversation
|
This is still something of a work-in-progress, in that it needs Yaml tests and docs to be added, but I wanted to put up a PR to make sure that I'm heading in the right direction with this. |
|
Pinging @elastic/es-search-aggs |
jimczi
left a comment
There was a problem hiding this comment.
It looks good @romseygeek ! I left some comments.
| public TokenStream tokenize(String field, String text) { | ||
| throw new IllegalArgumentException("Can only tokenize text on text fields - not on [" + name + "] which is of type [" + typeName() + "]"); | ||
| } | ||
|
|
There was a problem hiding this comment.
Why do you need this ? We could check the index_options and fail match parsing if the field is not indexed with positions or check tokenized method on the field type ?
| MappedFieldType fieldType = context.fieldMapper(field); | ||
| if (fieldType == null) { | ||
| throw new IllegalArgumentException("Cannot create IntervalQuery over non-existent field [" + field + "]"); | ||
| } |
There was a problem hiding this comment.
Should we check if the field is indexed with positions and fail the query if not ?
| /** | ||
| * The new {@link IntervalsSourceProvider}s added by this plugin | ||
| */ | ||
| default List<IntervalSpec<?>> getIntervalsSourceProviders() { |
There was a problem hiding this comment.
Should we provide a scripted intervals source first, I have the feeling that it will cover most of the use cases and that writing a plugin should be for rare cases where the scripted intervals is not enough ?
There was a problem hiding this comment.
Please keep this. I will use this right out the gate for functionality I know will never be allowed to be contributed due to the potential performance risk.
There was a problem hiding this comment.
I'm guessing Matt wants this for wildcard support? Which wouldn't be covered by a script filter. I don't think we lose anything by making this pluggable.
There was a problem hiding this comment.
@romseygeek yes that is my immediate use-case assuming we don't get a wildcard interval into lucene.
There was a problem hiding this comment.
I don't think we lose anything by making this pluggable.
Sure, although this is really an expert use case that requires to know the internals of how minimum interval works. For wildcard we could have something in Lucene (assuming we limit the expansion to the max boolean clause limit) that would be complementary of the index_prefixes option on the text field ?
|
|
||
| @Override | ||
| protected IntervalQueryBuilder doCreateTestQueryBuilder() { | ||
| assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0); |
There was a problem hiding this comment.
You don't need this anymore, this method always run with a single registered type.
|
|
||
| public void testMatchInterval() throws IOException { | ||
|
|
||
| assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0); |
|
|
||
| public void testCombineInterval() throws IOException { | ||
|
|
||
| assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0); |
| IntervalsSourceProvider match2 = new IntervalsSourceProvider.Match("floo", Integer.MAX_VALUE, true); | ||
| IntervalsSourceProvider combi | ||
| = new IntervalsSourceProvider.Combine(Arrays.asList(match1, match2), IntervalsSourceProvider.Combine.Type.ORDERED, 30); | ||
| return new IntervalQueryBuilder(STRING_FIELD_NAME, combi); |
There was a problem hiding this comment.
Can you add some randomization ? We run this method multiple times and then perform some checks on the generated query (serialization, correctness, ...).
|
|
||
| private final List<IntervalsSourceProvider> subSources; | ||
| private final Type type; | ||
| private final int maxWidth; |
There was a problem hiding this comment.
Should we extract this option to have a dedicated source for max_width ? Or could it be handled by a scripted intervals source ?
There was a problem hiding this comment.
I was originally going to do this as a separate source, but it only really makes sense on ordered/unordered combinations, so I think it's useful to have it here?
There was a problem hiding this comment.
I'd prefer a dedicated source but I can live with it. I agree that it makes sense on ordered/unordered sources but it can also be useful for others so we could also have a dedicated source and an option in the match source ?
| public static final String NAME = "match"; | ||
|
|
||
| private final String text; | ||
| private final int maxWidth; |
There was a problem hiding this comment.
Should we extract this option to have a dedicated source for max_width ? Or could it be handled by a scripted intervals source ?
|
Closed in favour of #36135 |
Relates to #29636