Scripts to expose whole values for fields of the `text` family #81246

One complaint I keep hearing is that it is too hard to define a runtime field that uses Painless to extract information from a `message` field, for example the HTTP status code from an Apache access log line.
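
As a rough illustration of the extraction users are asking for, here is a plain Java sketch (not Painless and not an Elasticsearch API) that pulls the status code out of a whole access-log line. It assumes the line follows the Apache Common Log Format; the class name `StatusExtractor` and the sample line are hypothetical.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatusExtractor {
    // Assumes Apache Common Log Format:
    // host ident authuser [date] "request" status bytes
    private static final Pattern STATUS =
        Pattern.compile("^\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"[^\"]*\" (\\d{3}) ");

    // Returns the HTTP status code, or empty if the line does not match.
    public static OptionalInt extractStatus(String line) {
        Matcher m = STATUS.matcher(line);
        if (m.find()) {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(extractStatus(line)); // OptionalInt[200]
    }
}
```

The regex only works because it sees the whole line, including the quotes and brackets that delimit the fields.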

I think this issue has been folded into the general meta issue of "doing simple things with Painless should be simpler", but in my opinion this particular issue has more to do with mappings than with Painless. Historically, fielddata on analyzed `string` fields uninverted the inverted index in memory, and Elasticsearch considered the value of a field to be the set of analyzed terms it contained. This required a lot of memory, and over time we have increasingly discouraged users from relying on it.

These semantics do not work well for extracting data at runtime. If you apply a regular expression to `doc['message']`, you get an exception saying that fielddata is disabled by default on `text` fields. And even if Elasticsearch did return values, you would get individual analyzed terms, which generally lose the ordering and punctuation of the original message, so you cannot use them to reliably extract data from it.
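
To show why individual terms are not enough, here is a toy Java comparison. The `toyAnalyze` method is a crude stand-in for an analyzer (lowercase, split on non-alphanumerics, keep distinct terms sorted), assumed for illustration only; it is not Lucene's standard analyzer, and the class name is hypothetical.

```java
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class TermsVsWholeValue {
    // Assumes Common Log Format: the status code follows the quoted request.
    private static final Pattern STATUS = Pattern.compile("\"[^\"]*\" (\\d{3}) ");

    // Crude stand-in for an analyzer: lowercase, split on anything that is not
    // a letter or digit, and keep the distinct terms in sorted order.
    public static SortedSet<String> toyAnalyze(String text) {
        SortedSet<String> terms = new TreeSet<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Applies the regex to the whole value.
    public static boolean wholeValueMatches(String value) {
        return STATUS.matcher(value).find();
    }

    // Applies the same regex to each term separately.
    public static boolean anyTermMatches(SortedSet<String> terms) {
        return terms.stream().anyMatch(t -> STATUS.matcher(t).find());
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        SortedSet<String> terms = toyAnalyze(line);
        System.out.println(terms);
        System.out.println(wholeValueMatches(line)); // true
        System.out.println(anyTermMatches(terms));   // false
    }
}
```

The term `200` does survive analysis, but so do `127`, `2000` and `2326`, and nothing in the sorted set says which one was the status code.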

I suggest that we change the semantics of fielddata on fields of the `text` family (including `text` and `match_only_text`) so that it returns whole values instead. This would give a more intuitive scripting experience, where `doc` could read values from `_source` for `text` fields (#80504).

Note that this has a downside. To make it easy to slice and dice data, Elasticsearch lets users take the terms produced by a `terms` aggregation and use them in `term` filters to drill down into the data that falls within a given bucket. This would not work on `text` fields, since a bucket key would be a whole value while `term` filters match individual analyzed terms. I don't think it's the end of the world, since `terms` aggregations do not work on `text` fields today anyway (fielddata is disabled by default), but I wanted to highlight it because it would create an exception to a rule that `keyword`, `ip` and numeric fields otherwise honor.
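
To make the lost round trip concrete, here is a toy Java model; it is not Elasticsearch code, and every class and method name is hypothetical. It assumes the proposed semantics, where a `terms` aggregation on a `text` field would bucket by whole value, approximates analysis with a crude lowercase-and-split tokenizer rather than a real analyzer, and models a `term` filter as an exact match against a document's indexed terms.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class BucketRoundTrip {
    // Keyword-like indexing: the whole value is the only indexed term.
    public static Set<String> keywordTerms(String value) {
        return Set.of(value);
    }

    // Text-like indexing: crude analyzer (lowercase, split on non-alphanumerics).
    // Only an approximation of what a real analyzer would produce.
    public static Set<String> textTerms(String value) {
        Set<String> terms = new TreeSet<>();
        for (String t : value.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Buckets under whole-value semantics: one bucket per distinct whole value.
    public static Map<String, Integer> termsAgg(List<String> docs) {
        Map<String, Integer> buckets = new LinkedHashMap<>();
        for (String d : docs) {
            buckets.merge(d, 1, Integer::sum);
        }
        return buckets;
    }

    // A term filter matches a document if the key equals one of its indexed terms.
    public static int termFilterCount(List<String> docs, String key,
                                      Function<String, Set<String>> indexer) {
        int count = 0;
        for (String d : docs) {
            if (indexer.apply(d).contains(key)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("GET /index.html", "GET /index.html", "POST /login");
        for (Map.Entry<String, Integer> bucket : termsAgg(docs).entrySet()) {
            String key = bucket.getKey();
            System.out.printf("%s: bucket=%d keyword=%d text=%d%n",
                key, bucket.getValue(),
                termFilterCount(docs, key, BucketRoundTrip::keywordTerms),
                termFilterCount(docs, key, BucketRoundTrip::textTerms));
        }
        // GET /index.html: bucket=2 keyword=2 text=0
        // POST /login: bucket=1 keyword=1 text=0
    }
}
```

For the keyword-like field every bucket key filters back to exactly the documents counted in its bucket; for the text-like field the same keys match nothing, because no analyzed term equals a whole value.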
