LOOKUP JOIN on DATETIME uses incorrect date parsing #128961

@craigtaverner

Description

As discussed in #127962, the code path used for LOOKUP JOIN makes use of the MappedFieldType.termQuery(Object) method. This method was designed for the Query DSL, and lets the field type itself handle many alternative input formats. For LOOKUP JOIN we always have fully parsed data in the raw internal Block types, so no parsing is needed. For DATETIME the problem is more acute, because that field type does a number of clever things that are not only unnecessary but could cause unexpected behaviour for the join use case:

  • Date arithmetic
  • Magic text such as now and || is searched for and processed
  • If the value is a long or integer, its magnitude changes its meaning: small values are interpreted as years, and larger values are interpreted as ms since the epoch
  • If the date is a formatted string ending with seconds (i.e. no fractional seconds), the underlying rangeQuery is called in a way that looks for a 1s range; otherwise it looks for a 1ms range
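To make the last point concrete, here is a minimal sketch of why the 1s-versus-1ms behaviour is a problem for a join. It is a simplified model of the behaviour described above, not the actual Elasticsearch parsing code; the class `TermQueryRoundingSketch` and its methods are hypothetical names, and detecting a fraction by looking for a `.` is an assumption made only for illustration.

```java
import java.time.Instant;

// Hypothetical model of the rounding described above; NOT the real DateFieldType logic.
final class TermQueryRoundingSketch {

    // Returns {lower, upper} in epoch millis, both inclusive.
    // A string without fractional seconds widens to a 1s range,
    // a string with fractional seconds stays a 1ms range.
    static long[] rangeFor(String isoText) {
        long lower = Instant.parse(isoText).toEpochMilli();
        boolean hasFraction = isoText.contains(".");
        long widthMillis = hasFraction ? 1L : 1000L;
        return new long[] { lower, lower + widthMillis - 1 };
    }

    // True if a stored millisecond value falls inside the range built from the text.
    static boolean matches(String isoText, long storedMillis) {
        long[] range = rangeFor(isoText);
        return storedMillis >= range[0] && storedMillis <= range[1];
    }
}
```

Under this model, `"2025-01-01T00:00:00Z"` matches a stored value 500ms after midnight, while `"2025-01-01T00:00:00.000Z"` does not. A join key should only ever behave like the second case.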

When support for DATE_NANOS was added in #127962, we needed to bypass this mechanism, because it completely disallowed ns range checks. We decided to add a new API to DateFieldType:

  • equalityQuery(Long) for exact ms or ns matching (no s matching) based on the incoming Long values (read from the LongBlock). The Resolution of the DateFieldType controls the meaning of the incoming values.
  • rangeQuery(Long, Long) for range matching, with the same ms or ns rules as the previous method. Just as with the original termQuery, equalityQuery simply calls down to rangeQuery.
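The shape of that API can be sketched roughly as follows. This is a standalone illustration, not the code from #127962: `DateJoinSketch` and `LongRange` are hypothetical stand-ins for DateFieldType and the Lucene query it would build, and only the method names equalityQuery and rangeQuery come from the description above.

```java
// Hypothetical stand-in for DateFieldType; illustrates the delegation only.
final class DateJoinSketch {

    enum Resolution { MILLISECONDS, NANOSECONDS }

    // Inclusive range over raw stored long values (stand-in for a real query object).
    static final class LongRange {
        final long lower;
        final long upper;

        LongRange(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }

        boolean matches(long stored) {
            return stored >= lower && stored <= upper;
        }
    }

    private final Resolution resolution;

    DateJoinSketch(Resolution resolution) {
        this.resolution = resolution;
    }

    // The resolution decides whether the raw longs mean ms or ns since the epoch.
    Resolution resolution() {
        return resolution;
    }

    // Range over already-parsed values: no string parsing, date math or rounding.
    LongRange rangeQuery(long lower, long upper) {
        return new LongRange(lower, upper);
    }

    // Equality is a range covering exactly one unit of the field's resolution.
    LongRange equalityQuery(long value) {
        return rangeQuery(value, value);
    }
}
```

The point of the sketch is that the incoming long is used as-is: the same value 1000 means 1000ms on a millisecond field and 1000ns on a nanosecond field, and it never widens to a 1s range.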

This issue requires that we apply the same fix to LOOKUP JOIN on DATETIME types, to avoid the risks that come with the excessive cleverness of the original termQuery. It could be argued that this should be done for all types, but we're focusing on DateFieldType for now because it has the most extreme set of edge cases.

This will also open the door to mixed ns-ms LOOKUP JOIN. For that we simply need to decide which is the common type, and cast appropriately. That should be done in a separate PR.
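As a rough illustration of what "cast appropriately" could look like, the sketch below assumes nanoseconds are chosen as the common type. That choice is not made in this issue; it is only an assumption for the example, and `MixedResolutionSketch` and its methods are hypothetical names.

```java
// Hypothetical sketch of a mixed ns-ms comparison, assuming ns as the common type.
final class MixedResolutionSketch {

    static final long NANOS_PER_MILLI = 1_000_000L;

    // Widen an ms value to ns; throws ArithmeticException if the result overflows a long.
    static long millisToNanos(long millis) {
        return Math.multiplyExact(millis, NANOS_PER_MILLI);
    }

    // Compare an ns key with an ms key after casting the ms side up to ns.
    static boolean equalAsNanos(long nanos, long millis) {
        return nanos == millisToNanos(millis);
    }
}
```

Widening to ns keeps the comparison exact, but a long holds a much smaller date range in ns than in ms, which is why the overflow check matters. Casting the ns side down to ms instead would avoid the overflow but would silently match values that differ below the millisecond.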
