Problem
Elasticsearch currently supports indexing string data either as analyzed strings, which are great for unstructured, full-text search, or as not_analyzed strings, which are great for structured search (e.g., exact matches). However, there is frequently an in-between case where you want exact matches, but you want them to ignore case or accented characters (AA == aa == Ââ). This forces the use of analyzers purely for normalization.
Partial Workaround
For those scenarios, you are currently forced to use the analyzed string variant with a specific analyzer. This generally leads to users forgetting to disable a lot of things like norms, positions, and frequencies. Even if you happen to get all of that right, you still cannot take advantage of doc values.
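For illustration, the workaround today looks roughly like the following (the analyzer name my_normalizer is hypothetical): a custom analyzer built from the keyword tokenizer plus normalizing filters, with norms and positions manually disabled on the field.

```json
PUT /my-index
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "my_normalizer" : {
          "type" : "custom",
          "tokenizer" : "keyword",
          "filter" : [ "lowercase", "trim" ]
        }
      }
    }
  },
  "mappings" : {
    "my-type" : {
      "properties" : {
        "constant_string" : {
          "type" : "string",
          "analyzer" : "my_normalizer",
          "norms" : { "enabled" : false },
          "index_options" : "docs"
        }
      }
    }
  }
}
```

Even with every setting correct, the field is still an analyzed string, so doc values remain unavailable for sorting and aggregations.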
Potential Solution
It would be interesting to rethink strings and how they are mapped. For analyzed strings, there really isn't much need for improvement beyond perhaps the naming. For not_analyzed strings, there is a lot of room for improvement.
Mockup
PUT /my-index
{
  "mappings" : {
    "my-type" : {
      "properties" : {
        "full_text" : {
          "type" : "string",
          "analyzer" : "standard"
        },
        "constant_string" : {
          "type" : "constant_string",
          "filter" : [ "lowercase", "trim" ],
          "char_filter" : [ "..." ]
        }
      }
    }
  }
}
Note: the difference is that analyzed strings stay "string" and not_analyzed strings become "constant_string". It's unlikely that we could easily move away from "string" for analyzed text, but if we could, then perhaps analyzed strings could become "text" and not_analyzed strings could become just "string".
This avoids a lot of questions and recurring problems. If you choose not to supply a filter or char_filter for a constant_string, it provides exactly the functionality we have today. But it also adds flexibility: users can finally use doc values with filtered text, so normalized values can be reasonably sorted and aggregated, without the possibility of confusingly tokenizing the string or unnecessarily storing norms, position, or frequency data.
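Assuming the constant_string mapping from the mockup above (a proposed type that does not exist yet, so this is purely illustrative), exact matching and aggregation on the normalized value might look like this, with "  FOO-Bar  " stored and matched as "foo-bar" after the lowercase and trim filters run:

```json
PUT /my-index/my-type/1
{ "constant_string" : "  FOO-Bar  " }

GET /my-index/_search
{
  "query" : {
    "term" : { "constant_string" : "foo-bar" }
  },
  "aggs" : {
    "by_value" : {
      "terms" : { "field" : "constant_string" }
    }
  }
}
```

The term query matches because both the indexed value and (under this proposal) the query input pass through the same normalization chain, while the terms aggregation can run off doc values holding the single normalized token.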