Dynamic Field Mapping and Templates using numeric_detection does not match documentation #30939

@StevenToth

Describe the feature: Bug in Dynamic field mapping of floating point numbers with numeric_detection enabled.

Elasticsearch version (bin/elasticsearch --version): 6.2.2

Plugins installed: []

JVM version (java -version): 1.8.0_66

OS version (uname -a if on a Unix-like system): RHEL6

Description of the problem including expected versus actual behavior:
According to the documentation on Dynamic field mapping, when numeric_detection is enabled, a floating point number passed as a string is mapped to a double field.

PUT my_index/_doc/1
{
  "my_float":   "1.0",
  "my_integer": "1" 
}

The my_float field is added as a double field.

However, this actually results in the field being mapped as a float.

In addition, the documentation on Dynamic templates indicates that only the following datatypes can be dynamically mapped using match_mapping_type:

Only the following datatypes can be automatically detected: boolean, date, double, long, object, string.

Because float is absent from that list, and yet floating point numbers are being dynamically mapped as float, it would seem that match_mapping_type cannot be used to map dynamically detected floating point numbers to double.

There is a workaround, however. I verified that dynamic templates do not accept float as a value for match_mapping_type, but I found that a dynamic template with a match_mapping_type of double matches the fields that would otherwise have been dynamically mapped as float, and can map them to double. See Steps to reproduce for an example of the workaround.

DISCLAIMER: The following is a shameless plug for 64-bit unsigned integer support in Elasticsearch.
The dynamic mapping to double is needed because Elasticsearch has no support for 64-bit unsigned integers, whereas the system publishing the data (a custom Elastic Beat written in Go) does. The Beat was coded to work around this limitation by publishing 64-bit unsigned integer values as strings with a trailing '.0', so that they are dynamically mapped as doubles. Otherwise, calculations in Elasticsearch, including those done in aggregations, generated inaccurate results due to overflows. Some precision loss was expected and acceptable; getting a negative result when that is not mathematically possible is another matter. I've only been working with the stack for a short time, and I appreciate its complexity, capabilities and power. However, it feels "hackish" to have to treat 64-bit unsigned integers (not an uncommon thing) as floating point numbers masquerading as strings in order to dynamically map and store them in a way that can be used efficiently in calculations and aggregations.
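
As a rough illustration of the overflow concern, here is a minimal Python sketch (Python rather than the Beat's Go). The helper names `as_signed_long` and `encode_for_numeric_detection` are hypothetical and are not taken from the Beat. The sketch shows signed wrap-around in general, assuming an unsigned 64-bit value is reinterpreted bit for bit as a signed 64-bit long; it is not a trace of what Elasticsearch does internally.

```python
import struct


def as_signed_long(value: int) -> int:
    """Reinterpret the bits of an unsigned 64-bit value as a signed 64-bit long."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def encode_for_numeric_detection(value: int) -> str:
    """Hypothetical helper: render the value as a string with a trailing '.0'."""
    return f"{value}.0"


big = 2**63 + 5  # a valid uint64 that does not fit in a signed long

print(as_signed_long(big))                # -9223372036854775803
print(encode_for_numeric_detection(big))  # 9223372036854775813.0
```

The string form keeps every digit of the original value; any precision loss then happens only when the string is parsed into a floating point field.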

Steps to reproduce:

  1. Turn on numeric detection
PUT /_template/my_index_template
{
  "index_patterns": ["my_index"],
  "mappings": {
    "doc": {
      "numeric_detection": true
    }
  }
}
  2. Add document
POST my_index/doc/
{
  "start":        "1527613753042816000.0",
  "end":          "1527613753110000128.0",
  "iterations":   "100"
}
  3. Get mapping
GET my_index/_mapping/

start and end fields are dynamically mapped as float

{
  "my_index": {
    "mappings": {
      "doc": {
        "numeric_detection": true,
        "properties": {
          "end": {
            "type": "float"
          },
          "iterations": {
            "type": "long"
          },
          "start": {
            "type": "float"
          }
        }
      }
    }
  }
}
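
To check a mapping response like the one above without reading it by eye, something along these lines might help. It is only a sketch: `fields_of_type` is a hypothetical helper, it assumes the 6.x response layout shown here (index, then mappings, then mapping type, then properties), and it looks at top-level properties only.

```python
def fields_of_type(mapping_response: dict, wanted: str) -> list:
    """Return 'index.field' names whose mapped type equals `wanted`.

    Only top-level properties are inspected; nested objects are ignored.
    """
    found = []
    for index_name, index_body in mapping_response.items():
        for _type_name, type_body in index_body.get("mappings", {}).items():
            for field, spec in type_body.get("properties", {}).items():
                if spec.get("type") == wanted:
                    found.append(f"{index_name}.{field}")
    return sorted(found)


# The GET my_index/_mapping/ response from the step above, as a Python dict.
response = {
    "my_index": {
        "mappings": {
            "doc": {
                "numeric_detection": True,
                "properties": {
                    "end": {"type": "float"},
                    "iterations": {"type": "long"},
                    "start": {"type": "float"},
                },
            }
        }
    }
}

print(fields_of_type(response, "float"))  # ['my_index.end', 'my_index.start']
```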

WORKAROUND

  1. Delete index
DELETE /my_index
  2. Replace template [with one that maps double to double]
PUT /_template/my_index_template
{
  "index_patterns": ["my_index"],
  "mappings": {
    "doc": {
      "numeric_detection": true,
      "dynamic_templates": [
        {
          "not_so_double_to_double": {
            "match_mapping_type": "double",
            "mapping": {
              "type": "double"
            }
          }
        }
       ]
    }
  }
}
  3. Add [same] document
POST my_index/doc/
{
  "start":        "1527613753042816000.0",
  "end":          "1527613753110000128.0",
  "iterations":   "100"
}
  4. Get the mapping
GET my_index/_mapping/

start and end fields are dynamically mapped as double

{
  "my_index": {
    "mappings": {
      "doc": {
        "dynamic_templates": [
          {
            "not_so_double_to_double": {
              "match_mapping_type": "double",
              "mapping": {
                "type": "double"
              }
            }
          }
        ],
        "numeric_detection": true,
        "properties": {
          "end": {
            "type": "double"
          },
          "iterations": {
            "type": "long"
          },
          "start": {
            "type": "double"
          }
        }
      }
    }
  }
}

EPILOGUE
To illustrate the impact of the mapping, consider the following query:

GET my_index/_search?size=0
{
  "aggs": {
    "avgDurationPerIteration": {
      "avg": {
        "script": "(doc['end'].value - doc['start'].value) / doc['iterations'].value"
      }
    }
  }
}
  1. When fields are mapped as float
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avgDurationPerIteration": {
      "value": 0
    }
  }
}
  2. When fields are mapped as double
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avgDurationPerIteration": {
      "value": 671841.28
    }
  }
}
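
The two results can be reproduced outside Elasticsearch. The sketch below assumes only that a float field holds an IEEE 754 single-precision value and a double field a double-precision one; `to_float32` is a hypothetical helper that rounds a Python float (a double) to single precision via `struct`. Near 1.5e18 adjacent single-precision values are 2^37 (about 1.37e11) apart, which is far more than the 67184128 separating start and end, so both collapse to the same float.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


start = float("1527613753042816000.0")
end = float("1527613753110000128.0")
iterations = 100

as_double = (end - start) / iterations
as_float = (to_float32(end) - to_float32(start)) / iterations

print(as_double)  # 671841.28
print(as_float)   # 0.0
```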

NOTE: Results are the same even when using BigDecimal to do the calculations:

GET my_index/_search?size=0
{
  "aggs": {
    "avgDurationPerIteration": {
      "avg": {
        "script": "BigDecimal.valueOf(doc['end'].value).subtract(BigDecimal.valueOf(doc['start'].value)).divide(BigDecimal.valueOf(doc['iterations'].value*1.0)).doubleValue()"
      }
    }
  }
}
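
A plausible reason BigDecimal makes no difference is that the precision is already gone by the time the script reads the field values. The sketch below uses Python's `decimal` module as a stand-in for BigDecimal and the same hypothetical `to_float32` rounding helper to imitate a float-mapped field; it is an analogy, not the Painless script itself.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Values as a float-mapped field would hold them.
start32 = to_float32(1527613753042816000.0)
end32 = to_float32(1527613753110000128.0)

# Exact decimal arithmetic on inputs that were already rounded.
result = (Decimal(end32) - Decimal(start32)) / Decimal(100)

print(start32 == end32)  # True
print(result == 0)       # True
```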

Provide logs (if relevant):

Labels: :Search Foundations/Mapping, >bug, >docs, Team:Search Foundations
