As discussed in #127962, the code path used for `LOOKUP JOIN` relies on the `MappedFieldType.termQuery(Object)` method. This method was designed for the Query DSL, where the field type itself handles many alternative input formats. For `LOOKUP JOIN` we always have fully parsed data in the raw internal `Block` types and do not need any parsing. For `DateTime` the situation is more extreme, because that field type does a large number of clever things that are not only unnecessary but could cause unexpected behaviour for the join use case:
- Date arithmetic
- Magic text like `now` and `||` is searched for and processed
- If the value is a long or integer, its magnitude changes its meaning: small values are interpreted as years, and larger values are interpreted as ms since the epoch
- If the date is a formatted string ending with seconds (i.e. no fractional seconds), then the underlying `rangeQuery` is called in a way that looks for a `1s` range; otherwise it looks for a `1ms` range.
In the case of the support for `DATE_NANOS` added in #127962, we needed to bypass this mechanism, because it completely disallowed `ns` range checks. We decided to provide a new API on `DateFieldType`:
- `equalityQuery(Long)` for exact `ms` or `ns` matching (no `s` matching) based on the incoming `Long` values (read from the `LongBlock`). The `Resolution` of the `DateFieldType` controls the meaning of the incoming values.
- `rangeQuery(Long, Long)` for range matching, with the same `ms` or `ns` rules as the previous method. In fact, just as with the original `termQuery`, `equalityQuery` simply calls down to `rangeQuery`.
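
The relationship between the two methods can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual Elasticsearch code: `DateFieldSketch`, `LongRange`, and the local `Resolution` enum are stand-ins invented for illustration, and the real methods would build a Lucene query rather than return a plain record.

```java
// Hypothetical, simplified model; NOT the real Elasticsearch DateFieldType.
import java.util.Objects;

// Stand-in for the field's resolution: it decides how raw longs are interpreted.
enum Resolution { MILLISECONDS, NANOSECONDS }

// Stand-in for a query: an inclusive range over raw long values in the given unit.
record LongRange(long from, long to, Resolution unit) {}

final class DateFieldSketch {
    private final Resolution resolution;

    DateFieldSketch(Resolution resolution) {
        this.resolution = Objects.requireNonNull(resolution);
    }

    // Range match on raw values: no string parsing, no date math,
    // no year-vs-epoch guessing, and no rounding to whole seconds.
    LongRange rangeQuery(Long from, Long to) {
        Objects.requireNonNull(from, "from");
        Objects.requireNonNull(to, "to");
        if (from > to) {
            throw new IllegalArgumentException("from must not be greater than to");
        }
        return new LongRange(from, to, resolution);
    }

    // Equality is the degenerate range [value, value], so the match is exact
    // at the field's own resolution (ms or ns).
    LongRange equalityQuery(Long value) {
        return rangeQuery(value, value);
    }
}
```

In this sketch the same raw value `1_700_000_000_000L` matches exactly one millisecond on a `MILLISECONDS` field, while on a `NANOSECONDS` field it would be read as a nanosecond count; the value is never reinterpreted based on its magnitude.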
This issue requires that we perform the same fix for `LOOKUP JOIN` on `DATETIME` types, to avoid the risks associated with the excessive cleverness of the original `termQuery`. It could be argued that this should be done for all types, but we're focusing on `DateFieldType` for now because it has the most extreme set of edge cases.
This will also open up the door to mixed ns-ms LOOKUP JOIN. For that we simply need to decide on which is the common type, and cast appropriately. That should be done in a separate PR.
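
As a rough illustration of what "decide on the common type and cast" could look like, the sketch below assumes nanoseconds are chosen as the common type. That choice, the class name `MixedResolutionSketch`, and both helper methods are assumptions made for illustration only; the actual decision is left to the separate PR.

```java
// Hypothetical sketch; assumes nanoseconds as the common type for a mixed ns-ms join.
final class MixedResolutionSketch {
    static final long NANOS_PER_MILLI = 1_000_000L;

    private MixedResolutionSketch() {}

    // Widen a millisecond value to nanoseconds.
    // Math.multiplyExact throws ArithmeticException instead of silently
    // overflowing when the value does not fit in a long as nanoseconds.
    static long millisToNanos(long millis) {
        return Math.multiplyExact(millis, NANOS_PER_MILLI);
    }

    // Compare a ms key with a ns key after casting both to the common type.
    static boolean sameInstant(long millis, long nanos) {
        return millisToNanos(millis) == nanos;
    }
}
```

Casting towards nanoseconds keeps sub-millisecond precision but narrows the representable range, which is why the conversion is overflow-checked here. Casting towards milliseconds would have the opposite trade-off.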