Describe the feature: Bug in Dynamic field mapping of floating point numbers with numeric_detection enabled.
Elasticsearch version (bin/elasticsearch --version): 6.2.2
Plugins installed: []
JVM version (java -version): 1.8.0_66
OS version (uname -a if on a Unix-like system): RHEL6
Description of the problem including expected versus actual behavior:
According to the documentation on Dynamic field mapping, when numeric_detection is enabled, passing a floating point number as a string should map the field to a double:
PUT my_index/_doc/1
{
"my_float": "1.0",
"my_integer": "1"
}
The my_float field is added as a double field.
However, this actually results in the field being mapped as a float.
In addition, the documentation on Dynamic templates indicates that only the following datatypes can be dynamically mapped using match_mapping_type:
Only the following datatypes can be automatically detected: boolean, date, double, long, object, string.
Therefore, based on the absence of the float datatype in that list and the fact that the floating point numbers are being dynamically mapped with a float datatype, it would seem that dynamically mapped floating point numbers cannot be mapped to a double datatype using match_mapping_type.
There is a workaround using match_mapping_type. I verified dynamic templates will not allow a value of float for the match_mapping_type, but I found that using a dynamic template with a match_mapping_type of double will map the fields that would have been dynamically mapped as float to double. See Steps to reproduce for an example of the workaround.
DISCLAIMER: The following is a shameless plug for 64-bit unsigned integer support in Elasticsearch.
The dynamic mapping to double is needed because Elasticsearch has no support for 64-bit unsigned integers, whereas the system publishing the data (a custom Elastic Beat written in Go) does support them. The Beat was coded to work around this limitation by publishing the 64-bit unsigned integer values as strings with a trailing '.0', so that they are dynamically mapped as doubles. Otherwise, calculations in Elasticsearch, including those done in aggregations, were generating inaccurate results due to overflows. Some precision loss was expected and accepted; getting a negative result when that is not mathematically possible is another matter. I've only been working with the stack for a short time, but I appreciate its complexity, capabilities and power. However, it feels "hackish" to have to treat 64-bit unsigned integers (not an uncommon thing) as floating point numbers masquerading as strings in order to dynamically map and store them in a way that can be used efficiently in calculations and aggregations.
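As a general illustration of the overflow concern (not a trace of what Elasticsearch does internally), the sketch below shows how an unsigned 64-bit value above 2^63 - 1 turns negative when its bits are reinterpreted as a signed 64-bit integer, and how a double keeps the magnitude at the cost of some precision. The helper name `reinterpret_as_int64` and the sample value are hypothetical and chosen only for this example.

```python
import struct


def reinterpret_as_int64(value: int) -> int:
    """Reinterpret the bits of an unsigned 64-bit integer as a signed one.

    Hypothetical helper for illustration; it mimics two's-complement
    wraparound and is not an Elasticsearch API.
    """
    return struct.unpack("<q", struct.pack("<Q", value))[0]


# Sample value that fits in a uint64 but not in a signed 64-bit long.
counter = 2**63 + 5

wrapped = reinterpret_as_int64(counter)   # negative after wraparound
as_double = float(counter)                # magnitude kept, low bits lost

print(wrapped)
print(as_double)
```

Values that fit in a signed long are unaffected by the reinterpretation, which is why the problem only shows up for the upper half of the unsigned range.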
Steps to reproduce:
- Turn on numeric detection
PUT /_template/my_index_template
{
"index_patterns": ["my_index"],
"mappings": {
"doc": {
"numeric_detection": true
}
}
}
- Add document
POST my_index/doc/
{
"start": "1527613753042816000.0",
"end": "1527613753110000128.0",
"iterations": "100"
}
- Get mapping
start and end fields are dynamically mapped as float
{
"my_index": {
"mappings": {
"doc": {
"numeric_detection": true,
"properties": {
"end": {
"type": "float"
},
"iterations": {
"type": "long"
},
"start": {
"type": "float"
}
}
}
}
}
}
WORKAROUND
- Delete index
- Replace template [with one that maps double to double]
PUT /_template/my_index_template
{
"index_patterns": ["my_index"],
"mappings": {
"doc": {
"numeric_detection": true,
"dynamic_templates": [
{
"not_so_double_to_double": {
"match_mapping_type": "double",
"mapping": {
"type": "double"
}
}
}
]
}
}
}
- Add [same] document
POST my_index/doc/
{
"start": "1527613753042816000.0",
"end": "1527613753110000128.0",
"iterations": "100"
}
- Get the mapping
start and end fields are dynamically mapped as double
{
"my_index": {
"mappings": {
"doc": {
"dynamic_templates": [
{
"not_so_double_to_double": {
"match_mapping_type": "double",
"mapping": {
"type": "double"
}
}
}
],
"numeric_detection": true,
"properties": {
"end": {
"type": "double"
},
"iterations": {
"type": "long"
},
"start": {
"type": "double"
}
}
}
}
}
}
EPILOGUE
To illustrate the impact of the mapping, consider the following query:
GET my_index/_search?size=0
{
"aggs": {
"avgDurationPerIteration": {
"avg": {
"script": "(doc['end'].value - doc['start'].value) / doc['iterations'].value"
}
}
}
}
- When fields are mapped as float
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"avgDurationPerIteration": {
"value": 0
}
}
}
- When fields are mapped as double
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"avgDurationPerIteration": {
"value": 671841.28
}
}
}
NOTE: Results are the same even when using BigDecimal to do the calculations:
GET my_index/_search?size=0
{
"aggs": {
"avgDurationPerIteration": {
"avg": {
"script": "BigDecimal.valueOf(doc['end'].value).subtract(BigDecimal.valueOf(doc['start'].value)).divide(BigDecimal.valueOf(doc['iterations'].value*1.0)).doubleValue()"
}
}
}
}
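The two aggregation results can be reproduced outside Elasticsearch with a minimal sketch. It assumes that a float field holds an IEEE 754 single-precision value and a double field a double-precision one. The helper `to_float32` is hypothetical and only rounds a Python float (which is a double) to the nearest 32-bit float.

```python
import struct


def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Values from the document indexed in the steps to reproduce.
start = float("1527613753042816000.0")
end = float("1527613753110000128.0")
iterations = 100

# Double precision: both timestamps are kept, so the difference survives.
as_double = (end - start) / iterations

# Single precision: both timestamps round to the same 32-bit value.
as_float = (to_float32(end) - to_float32(start)) / iterations

print(as_double)
print(as_float)
```

Under that assumption, the timestamps differ by far less than the spacing between adjacent 32-bit floats at this magnitude (2^37), so they collapse to the same value before any arithmetic happens. That would also be consistent with BigDecimal making no difference: the script can only operate on the values already read back from the field.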
Provide logs (if relevant):