In PR #126492, we implemented an optimization to skip some redundant UTF8 to UTF16 conversions. There were several follow-ups to that PR, which are tracked here.

- [x] Support for escaped and/or non-ascii characters (#129169)
- [x] Support for match_only_text fields (#129371)
- [ ] Support for text fields. Change field type to use the `UTF8DecodingReader` for indexed field.
- [ ] Support for wildcard fields. Adopt `UTF8DecodingReader` for indexed field. There is also an unneeded utf16 to utf8 conversion for binary doc values.

Optional followups:

- [ ] Support for other xcontent types (~cbor~ #132542, smile, yaml)
- [ ] Remove `XContentParser#optimizedText()` and instead have `XContentParser#text()` return `XContentString` instead of `String`

Maybe not even possible:

- [ ] Support for running normalizers on utf-8 encoded data instead of needing to convert to utf-16 strings