LOOKUP JOIN on DATETIME uses incorrect date parsing #128961

As discussed in https://github.com/elastic/elasticsearch/pull/127962, the code path used for `LOOKUP JOIN` calls the `MappedFieldType.termQuery(Object)` method. This method was designed for the Query DSL, where it lets the field type itself handle many alternative formats of the incoming value. For `LOOKUP JOIN` the data is always already fully parsed into the raw internal Block types, so no parsing is needed. For `DateTime` the problem is much more severe, because that field type does a large number of clever things that are not only unnecessary, but could cause unexpected behaviour in the join use case:
 
* Date arithmetic
* Magic text like `now` and `||` is searched for and processed
* If the value is a long or integer, its magnitude determines its meaning: small values are interpreted as years, and larger values as ms since the epoch
* If the date is a formatted string ending with seconds (i.e. no fractional seconds), the underlying `rangeQuery` is called so that it matches a `1s` range; otherwise it matches a `1ms` range.
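To make the last point concrete for the join case, here is a small self-contained sketch in plain Java (no Elasticsearch classes). It assumes a join key reaches `termQuery` as a string formatted without fractional seconds, and models the resulting `1s` widening. The class, record and method names (`TermQueryWidening`, `MillisRange`, `widen`) are hypothetical and only model the behaviour described above; they are not the actual `DateFieldType` code.

```java
import java.time.Instant;

/**
 * Hypothetical model of how a formatted date key can be widened into a range
 * before matching. Not the actual Elasticsearch implementation.
 */
public class TermQueryWidening {

    /** Closed interval [from, to] of epoch milliseconds. */
    public record MillisRange(long from, long to) {
        public boolean contains(long millis) {
            return millis >= from && millis <= to;
        }
    }

    /**
     * Models the described behaviour: a key formatted with whole seconds only
     * is matched as a 1s range, a more precise key as a 1ms range.
     */
    public static MillisRange widen(long keyMillis, boolean formattedWithSeconds) {
        long width = formattedWithSeconds ? 1_000L : 1L;
        return new MillisRange(keyMillis, keyMillis + width - 1);
    }

    public static void main(String[] args) {
        long key = Instant.parse("2024-01-01T00:00:00Z").toEpochMilli();
        long doc = Instant.parse("2024-01-01T00:00:00.500Z").toEpochMilli();

        // Exact equality on the raw long values: the document does not match.
        System.out.println("exact match: " + (key == doc));
        // Key rendered as "2024-01-01T00:00:00" (no fractional seconds): 1s range, matches.
        System.out.println("1s range match: " + widen(key, true).contains(doc));
        // Key rendered with milliseconds: 1ms range, does not match.
        System.out.println("1ms range match: " + widen(key, false).contains(doc));
    }
}
```

In this model a lookup document 500ms away from the key is joined only because of how the key happened to be formatted, which is exactly the kind of surprise a join on already-parsed `long` values should never produce.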

For the DATE_NANOS support added in https://github.com/elastic/elasticsearch/pull/127962, we needed to bypass this mechanism, because it made `ns`-precision range checks impossible. We decided to add a new API to `DateFieldType`:
* `equalityQuery(Long)` for exact `ms` or `ns` matching (no `s` matching) based on the incoming `Long` values (read from the `LongBlock`). The `Resolution` of the `DateFieldType` determines how the incoming values are interpreted.
* `rangeQuery(Long, Long)` for range matching, following the same `ms` or `ns` rules as the previous method. In fact, just as with the original `termQuery`, `equalityQuery` simply delegates to `rangeQuery`.
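The delegation between the two methods can be sketched as follows. This is a deliberately simplified, hypothetical model: the class `EqualityQuerySketch` and the record `LongRange` are invented for illustration, and the real `DateFieldType` methods build Lucene queries and may take additional parameters. Only the bounds logic is modelled here: values are assumed to already be in the field's `Resolution`, so there is no date math, no `now`, no format guessing and no `1s` widening.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the proposed DateFieldType API.
 * It only models query bounds, not the Lucene queries the real code builds.
 */
public class EqualityQuerySketch {

    /** Unit of the raw long values stored in the field (and read from the LongBlock). */
    public enum Resolution { MILLISECONDS, NANOSECONDS }

    /** Closed interval [lower, upper] expressed in the field's own resolution. */
    public record LongRange(Resolution resolution, long lower, long upper) {
        public boolean matches(long stored) {
            return stored >= lower && stored <= upper;
        }
    }

    private final Resolution resolution;

    public EqualityQuerySketch(Resolution resolution) {
        this.resolution = resolution;
    }

    /** Range match on already-parsed values, interpreted in this field's resolution. */
    public LongRange rangeQuery(Long lower, Long upper) {
        Objects.requireNonNull(lower, "lower");
        Objects.requireNonNull(upper, "upper");
        return new LongRange(resolution, lower, upper);
    }

    /** Exact match: a degenerate range [value, value], never widened to 1s. */
    public LongRange equalityQuery(Long value) {
        return rangeQuery(value, value);
    }

    public static void main(String[] args) {
        EqualityQuerySketch nanos = new EqualityQuerySketch(Resolution.NANOSECONDS);
        long key = 1_704_067_200_000_000_123L; // ns since the epoch
        LongRange q = nanos.equalityQuery(key);
        System.out.println(q.matches(key));     // true
        System.out.println(q.matches(key + 1)); // false: 1ns away
    }
}
```

Keeping `equalityQuery` as a thin wrapper means there is a single place that defines how incoming values map onto the stored representation, which is the property the join needs.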

This issue requires that we apply the same fix to `LOOKUP JOIN` on `DATETIME` types, to avoid the risks associated with the overly clever parsing in the original `termQuery`. It could be argued that this should be done for all types, but we're focusing on `DateFieldType` for now because it has the most extreme set of edge cases.

This will also open the door to mixed ns-ms LOOKUP JOIN. For that we simply need to decide which type is the common type and cast accordingly. That should be done in a separate PR.
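The cast itself is simple arithmetic; the open question is only which direction to choose. The following hypothetical helpers (`DateResolutionCast`, `millisToNanos`, `nanosToMillis` are invented names, not existing Elasticsearch code) sketch both directions using only `java.lang.Math`, without assuming which common type will be picked.

```java
/**
 * Hypothetical helpers for casting join keys to a common date resolution.
 * Which common type to use is still an open decision; both directions are shown.
 */
public class DateResolutionCast {

    private static final long NANOS_PER_MILLI = 1_000_000L;

    /** ms -> ns: exact, but throws ArithmeticException if the result overflows a long. */
    public static long millisToNanos(long millis) {
        return Math.multiplyExact(millis, NANOS_PER_MILLI);
    }

    /** ns -> ms: lossy, drops the sub-millisecond part (floor, so negatives round down). */
    public static long nanosToMillis(long nanos) {
        return Math.floorDiv(nanos, NANOS_PER_MILLI);
    }

    public static void main(String[] args) {
        long millis = 1_704_067_200_000L;        // 2024-01-01T00:00:00Z
        long nanos = 1_704_067_200_000_000_123L; // same instant + 123ns

        System.out.println(millisToNanos(millis)); // 1704067200000000000
        System.out.println(nanosToMillis(nanos));  // 1704067200000
        try {
            millisToNanos(Long.MAX_VALUE / 1_000); // far beyond the range a long can hold in ns
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

The trade-off is visible even in this sketch: widening to nanoseconds can overflow and makes an exact match hit only documents whose nanosecond value falls exactly on a millisecond boundary, while narrowing to milliseconds can make several distinct nanosecond values collide on the same key.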
