Epoch millis and second formats accept float implicitly by albertzaharovits · Pull Request #26119 · elastic/elasticsearch

albertzaharovits · 2017-08-09T15:17:51Z

All floats parsed by the epoch_millis and epoch_second date formatters are truncated, whether they arrive as strings or as numbers (coerce-like behavior). This builds on the existing behavior of parsing all dates as strings.
As a result, there is no 'coerce' parameter for the DateFieldMapper; the 'coerce' parameter remains valid only for numeric data types.
The coerce behavior is implicitly enabled only for specific formatters, i.e. epoch_*.
A coerce parameter cannot be defined at the DateFieldMapper level independently of the date format, because of conflicts between formats, e.g. basic_time and epoch_second as float.

Closes: #14641

The coerce parameter is implicitly true for the epoch millis DateFormatter. It is not defined for other date formatters. This extends the current "coerce" from numbers to strings for all dates. See: elastic#14641

albertzaharovits · 2017-08-09T15:22:29Z

@cbuescher I think from/to as double from org.elasticsearch.search.aggregations.bucket.range.RangeAggregator can be removed, since all dates can now be parsed as strings, plus this avoids the conversion from double to long. What do you think?

colings86

@albertzaharovits I left a small comment

colings86 · 2017-08-10T06:27:57Z

core/src/test/java/org/elasticsearch/deps/joda/SimpleJodaTests.java


+    public void testThatFloatEpochsCanBeParsed() {
+
+        long millisFromEpoch = randomNonNegativeLong();


Since dates before epoch 0 can still be expressed as a long, and may well be encountered if the user is indexing historical data, should we test negative epoch values here too?

cbuescher

@albertzaharovits this looks great, I left a couple of smaller comments but nothing big.

Some more things:

- Since the original issue revolves around documents not being indexed when they have float values, should we add an integration test for this? DateFieldMapperTests already more or less checks that, but I think it would be good to have one round-trip test here as well.
- Not sure if we also want to parse floats with `,` as decimal separator, but maybe that's overkill. Any opinions on this @colings86?
- As I understand this change now makes "coerce" : false not reject any Strings any more for epoch_millis and epoch_seconds. I just want to double check that this is okay, maybe we can document this somewhere?

cbuescher · 2017-08-10T09:24:59Z

core/src/main/java/org/elasticsearch/common/joda/Joda.java

        public int parseInto(DateTimeParserBucket bucket, String text, int position) {
            boolean isPositive = text.startsWith("-") == false;
-            boolean isTooLong = text.length() > estimateParsedLength();
+            int firstDotIndex = text.indexOf((int)'.');


nit: I think we don't need the int cast here, it is done implicitly. At least my IDE removes it on "save".

cbuescher · 2017-08-10T09:43:39Z

core/src/main/java/org/elasticsearch/common/joda/Joda.java

            int factor = hasMilliSecondPrecision ? 1 : 1000;
            try {
-                long millis = Long.valueOf(text) * factor;
+                long millis = new BigDecimal(text).longValue() * factor;


Nice, so this can handle all kinds of formats it seems.
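To illustrate the remark above, here is a minimal, self-contained sketch of the truncation step from the diff. The class and method names are hypothetical, and the sketch leaves out the sign and length checks that surround this line in `Joda.java`; it only shows how `new BigDecimal(text).longValue() * factor` treats different textual forms.

```java
import java.math.BigDecimal;

public class EpochTruncationSketch {

    // Hypothetical helper, not the actual Joda.java method: converts an epoch
    // value given as text into milliseconds since the epoch.
    static long parseEpochToMillis(String text, boolean hasMilliSecondPrecision) {
        int factor = hasMilliSecondPrecision ? 1 : 1000;
        // BigDecimal accepts plain integers, decimals and exponent notation;
        // longValue() discards the fractional part (truncation toward zero).
        return new BigDecimal(text).longValue() * factor;
    }

    public static void main(String[] args) {
        System.out.println(parseEpochToMillis("1502292000123.75", true)); // 1502292000123
        System.out.println(parseEpochToMillis("1502292000.75", false));   // 1502292000000
        System.out.println(parseEpochToMillis("1.5e3", true));            // 1500
        System.out.println(parseEpochToMillis("-1.9", true));             // -1
    }
}
```

Note that for second precision the fraction is dropped before multiplying by `factor`, so `"1502292000.75"` yields `1502292000000` rather than `1502292000750`.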

cbuescher · 2017-08-10T09:54:17Z

core/src/test/java/org/elasticsearch/deps/joda/SimpleJodaTests.java

+
+        // test floats get truncated
+        String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong());
+        assertThat(formatter.parser().parseDateTime(epochFloatValue).getMillis(), is(dateTime.getMillis()));


I'm not sure if we should also support European decimal separators like `,`? Maybe not, but just wanted to throw it in. I'm not sure if this complicates things too much.

I think we should accept numbers as defined by javascript/JSON schema, and not formatted string representation of numbers. The issue is that we didn't accept the numbers from javascript which does not have an integer datatype. Accepting string representation of valid javascript numbers is the bonus part since all dates are parsed as strings anyway.
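As a rough sketch of what this means for the `BigDecimal`-based parsing in this PR: a string with a `,` decimal separator is not a valid number for `BigDecimal` and would be rejected. The helper name below is hypothetical, and `BigDecimal`'s grammar is not identical to the JSON number grammar (it also accepts a leading `+`, for example), so this is only an approximation of "numbers as defined by JSON".

```java
import java.math.BigDecimal;

public class DecimalSeparatorSketch {

    // Hypothetical helper: true if BigDecimal can parse the text as a number.
    static boolean isParsableNumber(String text) {
        try {
            new BigDecimal(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isParsableNumber("1502292000.5")); // true
        System.out.println(isParsableNumber("1.502292E9"));   // true
        System.out.println(isParsableNumber("1502292000,5")); // false: comma separator
        System.out.println(isParsableNumber("12:30"));        // false: not a number
    }
}
```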

cbuescher · 2017-08-10T10:01:52Z

core/src/test/java/org/elasticsearch/deps/joda/SimpleJodaTests.java

        }

+        // test floats get truncated
+        String epochFloatValue = String.format(Locale.US, "%d.%d", dateTime.getMillis() / (parseMilliSeconds ? 1L : 1000L), randomNonNegativeLong());


Should this be a negative long in this test?

It is negative since dateTime.getMillis() is negative, the non negative long is for the fractional part.

I get it now, thanks.
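For readers following along, a small sketch of how the test string is built: the sign comes from the integer part, and the non-negative long only supplies the digits after the dot. The class and method names are hypothetical, and the fixed values stand in for the randomized ones used in the real test.

```java
import java.math.BigDecimal;
import java.util.Locale;

public class NegativeEpochStringSketch {

    // Hypothetical helper mirroring the String.format call in the test.
    static String epochFloatString(long integerPart, long fractionalDigits) {
        return String.format(Locale.US, "%d.%d", integerPart, fractionalDigits);
    }

    public static void main(String[] args) {
        String text = epochFloatString(-1234L, 567L);
        System.out.println(text);                             // -1234.567
        // Truncation drops the fraction and keeps the negative integer part.
        System.out.println(new BigDecimal(text).longValue()); // -1234
    }
}
```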

cbuescher · 2017-08-10T10:07:11Z

core/src/test/java/org/elasticsearch/index/mapper/DateFieldMapperTests.java

+        assertEquals(mapping, mapper.mappingSource().toString());
+
+        long millisFromEpoch = randomNonNegativeLong();
+        String epochFloatValue = String.format(Locale.US, "%d.%d", millisFromEpoch, randomNonNegativeLong());


Maybe also randomly append a negative prefix to also test parsing negative values here?

cbuescher · 2017-08-10T10:13:06Z

core/src/test/java/org/elasticsearch/index/mapper/RangeFieldMapperTests.java


+        // date_range ignores the coerce parameter and epoch_millis date format truncates floats (see issue: #14641)
+        if (type.equals("date_range")) {
+            return;


nit: maybe just personal preference, but early returns in tests look strange to me. Can you change this to execute the rest of the test only if type.equals("date_range") == false?

cbuescher · 2017-08-10T10:13:19Z

core/src/test/java/org/elasticsearch/search/aggregations/bucket/DateRangeIT.java

+        assertThat(searchResponse.getHits().getTotalHits(), equalTo(3L));
+        buckets = checkBuckets(searchResponse.getAggregations().get("date_range"), "date_range", 2);
+        assertBucket(buckets.get(0), 2L, "1000-3000", 1000000L, 3000000L);
+        assertBucket(buckets.get(1), 1L, "3000-4000", 3000000L, 4000000L);


Great this works

cbuescher · 2017-08-10T13:56:49Z

@albertzaharovits thanks, those recent changes look good to me. That leaves the question about whether we should document the changed behaviour around "coerce" : false. Maybe @colings86 also wants to take another look at this?

albertzaharovits · 2017-08-10T14:18:17Z

@cbuescher I am not sure what you mean by:

"coerce" : false not reject any Strings any more for epoch_millis and epoch_seconds

coerce parameter is invalid for date field type and is ignored in aggregations.
Any String that is a valid number is acceptable, other strings are considered malformed.

cbuescher · 2017-08-10T14:30:20Z

coerce parameter is invalid for date field type and is ignored in aggregations

Thanks, that's what I was missing. So the existing behaviour is that coerce doesn't work with the date datatype? I wasn't really sure from reading the coerce docs.

albertzaharovits · 2017-08-10T15:08:40Z

That's correct, coerce does not work with date and this is not changing.

cbuescher

Thanks, LGTM. I think CI might be still failing because of unrelated problems, had the same yesterday. Maybe rebasing or merging in master helps to get a clean build. Also I don't know if you want to wait for @colings86 to have another look, I'm good.
Thanks a lot for this change.

colings86

LGTM

#26119) `epoch_millis` and `epoch_second` date formats truncate float values, as numbers or as strings. The `coerce` parameter is not defined for `date` field type and this is not changing. See PR #26119 Closes #14641

* master: (30 commits) Rewrite range queries with open bounds to exists query (elastic#26160) Fix eclipse compilation problem (elastic#26170) Epoch millis and second formats parse float implicitly (Closes elastic#14641) (elastic#26119) fix SplitProcessor targetField test (elastic#26178) Fixed typo in README.textile (elastic#26168) Fix incorrect class name in deleteByQuery docs (elastic#26151) Move more token filters to analysis-common module reindex: automatically choose the number of slices (elastic#26030) Fix serialization of the `_all` field. (elastic#26143) percolator: Hint what clauses are important in a conjunction query based on fields Remove unused Netty-related settings (elastic#26161) Remove SimpleQueryStringIT#testPhraseQueryOnFieldWithNoPositions. Tests: reenable ShardReduceIT#testIpRange. Allow `ClusterState.Custom` to be created on initial cluster states (elastic#26144) Teach the build about betas and rcs (elastic#26066) Fix wrong header level inner hits: Unfiltered nested source should keep its full path Document how to import Lucene Snapshot libs when elasticsearch clients (elastic#26113) Use `global_ordinals_hash` execution mode when sorting by sub aggregations. (elastic#26014) Make the README use a single type in examples. (elastic#26098) ...

Epoch millis and second formats accept float implicitly

5d3f8e5

The coerce parameter is implicitly true for the epoch millis DateFormatter. It is not defined for other date formatters. This extends the current "coerce" from numbers to strings for all dates. See: elastic#14641

albertzaharovits added :Core/Infra/Core Core issues without another label >enhancement review v6.0.0 v6.1.0 v7.0.0 labels Aug 9, 2017

albertzaharovits self-assigned this Aug 9, 2017

albertzaharovits requested review from cbuescher and colings86 August 9, 2017 15:17

colings86 reviewed Aug 10, 2017

View reviewed changes

Better float timestamp truncation tests

daac500

cbuescher reviewed Aug 10, 2017

View reviewed changes

Date Field Mapper and Integ tests improvements

5d8229f

albertzaharovits changed the title ~~Epoch millis and second formats accept float implicitly~~ Epoch millis and second formats accept float implicitly (Closes #14641) Aug 10, 2017

cbuescher approved these changes Aug 10, 2017

View reviewed changes

colings86 approved these changes Aug 11, 2017

View reviewed changes

Merge branch 'master' into fix/14641

8c17da4

albertzaharovits merged commit 3e3132f into elastic:master Aug 13, 2017

clintongormley changed the title ~~Epoch millis and second formats accept float implicitly (Closes #14641)~~ Epoch millis and second formats accept float implicitly Aug 17, 2017

colings86 added 6.0.0-beta2 and removed v6.0.0 labels Aug 24, 2017

colings86 added v6.0.0-beta2 and removed 6.0.0-beta2 labels Aug 24, 2017

lcawl removed the v6.1.0 label Dec 12, 2017

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

