Epoch millis and second formats accept float implicitly#26119
Epoch millis and second formats accept float implicitly#26119albertzaharovits merged 4 commits intoelastic:masterfrom
Conversation
The coerce parameter is implicity true for the epoch millis DateFormater. It is not defined for other date formaters. This extends the current "coerce" from numbers to strings for all dates. See: elastic#14641
|
@cbuescher I think |
colings86
left a comment
There was a problem hiding this comment.
@albertzaharovits I left a small comment
|
|
||
| public void testThatFloatEpochsCanBeParsed() { | ||
|
|
||
| long millisFromEpoch = randomNonNegativeLong(); |
There was a problem hiding this comment.
Since dates previous to epoch 0 can still be expressed as a long and may well be encountered if the user is indexing historical data should we test negative epoch values here too?
cbuescher
left a comment
There was a problem hiding this comment.
@albertzaharovits this looks great, I left a couple of smaller comments but nothing big.
Some more things:
-
since the original issue revolves around documents not being indexed when they have float values, should we add an integration test for this? DateFieldMapperTests already more or less checks that, but I think it would be good to have one round-trip test here as well
-
Not sure if we also want to parse floats with
,as decimal separator, but maybe thats overkill. Any opinions on this @colings86 -
As I understand this change now makes
"coerce" : falsenot reject any Strings any more forepoch_millisandepoch_seconds. I just want to double check that this is okay, maybe we can document this somewhere?
| public int parseInto(DateTimeParserBucket bucket, String text, int position) { | ||
| boolean isPositive = text.startsWith("-") == false; | ||
| boolean isTooLong = text.length() > estimateParsedLength(); | ||
| int firstDotIndex = text.indexOf((int)'.'); |
There was a problem hiding this comment.
nit: I think we don't need the int cast here, it is done implicitely. At least my IDE removes it on "save"
| int factor = hasMilliSecondPrecision ? 1 : 1000; | ||
| try { | ||
| long millis = Long.valueOf(text) * factor; | ||
| long millis = new BigDecimal(text).longValue() * factor; |
There was a problem hiding this comment.
Nice, so this can handle all kinds of formats it seems.
|
|
||
| // test floats get truncated | ||
| String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong()); | ||
| assertThat(formatter.parser().parseDateTime(epochFloatValue).getMillis(), is(dateTime.getMillis())); |
There was a problem hiding this comment.
I'm not sure if we also should support european decimal separators like ,? Maybe not, but just wanted to throw it in. I'm not sure if this complicates things too much.
There was a problem hiding this comment.
I think we should accept numbers as defined by javascript/JSON schema, and not formatted string representation of numbers. The issue is that we didn't accept the numbers from javascript which does not have an integer datatype. Accepting string representation of valid javascript numbers is the bonus part since all dates are parsed as strings anyway.
| } | ||
|
|
||
| // test floats get truncated | ||
| String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong()); |
There was a problem hiding this comment.
Should this be a negative long in this test?
There was a problem hiding this comment.
It is negative since dateTime.getMillis() is negative, the non negative long is for the fractional part.
| assertEquals(mapping, mapper.mappingSource().toString()); | ||
|
|
||
| long millisFromEpoch = randomNonNegativeLong(); | ||
| String epochFloatValue = String.format(Locale.US, "%d.%d", millisFromEpoch, randomNonNegativeLong()); |
There was a problem hiding this comment.
Maybe also randomly append a negative prefix to also test parsing negative values here?
|
|
||
| // date_range ignores the coerce parameter and epoch_millis date format truncates floats (see issue: #14641) | ||
| if (type.equals("date_range")) { | ||
| return; |
There was a problem hiding this comment.
nit: maybe just personal preference, but early returns in test look strange to me. Can you change this to execute the rest of the test only for type.equals("date_range") == false)
| assertThat(searchResponse.getHits().getTotalHits(), equalTo(3L)); | ||
| buckets = checkBuckets(searchResponse.getAggregations().get("date_range"), "date_range", 2); | ||
| assertBucket(buckets.get(0), 2L, "1000-3000", 1000000L, 3000000L); | ||
| assertBucket(buckets.get(1), 1L, "3000-4000", 3000000L, 4000000L); |
|
@albertzaharovits thanks, those recent changes look good to me. That leaves the question about whether we should document the changed behaviour around |
|
@cbuescher I am not sure what you mean by:
|
Thanks, thats what I was missing. So the existing behaviour is that |
|
That's correct, |
There was a problem hiding this comment.
Thanks, LGTM. I think CI might be still failing because of unrelated problems, had the same yesterday. Maybe rebasing or merging in master helps to get a clean build. Also I don't know if you want to wait for @colings86 to have another look, I'm good.
Thanks a lot for this change.
* master: (30 commits) Rewrite range queries with open bounds to exists query (elastic#26160) Fix eclipse compilation problem (elastic#26170) Epoch millis and second formats parse float implicitly (Closes elastic#14641) (elastic#26119) fix SplitProcessor targetField test (elastic#26178) Fixed typo in README.textile (elastic#26168) Fix incorrect class name in deleteByQuery docs (elastic#26151) Move more token filters to analysis-common module reindex: automatically choose the number of slices (elastic#26030) Fix serialization of the `_all` field. (elastic#26143) percolator: Hint what clauses are important in a conjunction query based on fields Remove unused Netty-related settings (elastic#26161) Remove SimpleQueryStringIT#testPhraseQueryOnFieldWithNoPositions. Tests: reenable ShardReduceIT#testIpRange. Allow `ClusterState.Custom` to be created on initial cluster states (elastic#26144) Teach the build about betas and rcs (elastic#26066) Fix wrong header level inner hits: Unfiltered nested source should keep its full path Document how to import Lucene Snapshot libs when elasticsearch clients (elastic#26113) Use `global_ordinals_hash` execution mode when sorting by sub aggregations. (elastic#26014) Make the README use a single type in examples. (elastic#26098) ...
All floats parsed by epoch_millis/second date formatter get truncated, either as strings or as numbers - coerce behavior. This builds on the existing behavior of parsing all dates to strings.
In this way there is no 'coerce parameter' for the DateFieldMapper. 'Coerce parameter' remains valid only for numeric data types.
The coerce behavior is implicitly enabled for a specific Formatter only, i.e. epoch_*.
A coerce parameter at the DateFieldMapper level cannot be defined irrespective of the date format because of conflicts, e.g. basic_time and epoch_second as float.
Closes: #14641