[8.19] Skip UTF8 to UTF16 conversion during document indexing (#126492) by jordan-powers · Pull Request #129023 · elastic/elasticsearch

jordan-powers · 2025-06-06T02:51:25Z

Backports the following commits to 8.19:

Skip UTF8 to UTF16 conversion during document indexing (#126492)

When parsing documents, we receive the document as UTF-8 encoded data, parse it, and convert the fields to Java-native UTF-16 encoded Strings. We then convert these Strings back to UTF-8 for storage in Lucene. This patch skips the redundant conversion, instead passing Lucene a direct reference to the received UTF-8 bytes when possible.
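For illustration only, here is a minimal sketch of the two paths described above, using just the Java standard library. It is not the code from this patch: `Utf8PassThrough`, `Utf8Slice`, `roundTrip`, and `passThrough` are hypothetical names, and `Utf8Slice` is only a stand-in for a (bytes, offset, length) reference of the kind Lucene accepts. The offset and length of the field value are hard-coded here; a real parser would report them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8PassThrough {

    /** Hypothetical stand-in for a (bytes, offset, length) reference into a shared buffer. */
    static final class Utf8Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Utf8Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Copies the referenced range; used here only to compare results. */
        byte[] copy() {
            return Arrays.copyOfRange(bytes, offset, offset + length);
        }
    }

    /** Round trip: decode UTF-8 into a String (UTF-16), then encode back to UTF-8. */
    static byte[] roundTrip(byte[] source, int offset, int length) {
        String decoded = new String(source, offset, length, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Pass-through: keep a reference to the received bytes, with no decoding and no copy. */
    static Utf8Slice passThrough(byte[] source, int offset, int length) {
        return new Utf8Slice(source, offset, length);
    }

    public static void main(String[] args) {
        // A tiny document as it would arrive over the wire: UTF-8 bytes.
        byte[] doc = "{\"msg\":\"h\u00e9llo\"}".getBytes(StandardCharsets.UTF_8);

        // The value of "msg" starts at byte 8 and is 6 bytes long
        // (the accented character takes two bytes in UTF-8).
        int valueOffset = 8;
        int valueLength = 6;

        byte[] viaString = roundTrip(doc, valueOffset, valueLength);
        Utf8Slice direct = passThrough(doc, valueOffset, valueLength);

        // Both paths yield the same UTF-8 bytes; only the second avoids the conversions.
        System.out.println(Arrays.equals(viaString, direct.copy()));
        // The slice shares the original buffer rather than owning a copy.
        System.out.println(direct.bytes == doc);
    }
}
```

The sketch assumes the field value appears in the input as plain UTF-8 with nothing that needs decoding, which is one reading of the "when possible" qualifier above.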

elasticsearchmachine mentioned this pull request Jun 6, 2025

Skip UTF8 to UTF16 conversion during document indexing #126492

Merged

elasticsearchmachine added the v8.19.0 label Jun 6, 2025

elasticsearchmachine merged commit cf0b1ef into elastic:8.19 Jun 6, 2025
15 checks passed

jordan-powers deleted the backport/8.19/pr-126492 branch June 6, 2025 04:04
