_uid should be indexed in Lucene in binary form, not base64 #18154

@mikemccand

@rmuir had this idea:

Today, when ES auto-generates an ID (TimeBasedUUIDGenerator.getBase64UUID), it uses 15 bytes, but then we immediately Base64 encode that to 20 bytes, a 33% "waste".
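The size arithmetic can be checked with a small sketch. This is illustrative Python, not the Elasticsearch code: `base64_uid` is a hypothetical helper, random bytes stand in for the generated ID, and the URL-safe, unpadded Base64 variant is an assumption (for a 15-byte input no padding is produced either way).

```python
import base64
import os


def base64_uid(raw: bytes) -> bytes:
    """Hypothetical helper: URL-safe Base64 of the raw ID, with any padding stripped."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=")


# Random bytes stand in for the 15-byte auto-generated ID.
raw_id = os.urandom(15)
encoded = base64_uid(raw_id)

# Base64 maps every 3 input bytes to 4 output characters: 15 / 3 * 4 = 20.
overhead = (len(encoded) - len(raw_id)) / len(raw_id)
print(len(raw_id), len(encoded), f"{overhead:.0%}")
```

Every indexed ID term carries those 5 extra bytes, which is the overhead the issue proposes to remove.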

This is really a holdover from the past when Lucene could not index fully binary terms.

I think we should explore passing the raw binary form to Lucene instead. We could implement back-compat by checking the version with which the index was created.
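One way to picture the version-gated back-compat is the sketch below. It is a rough illustration only: `uid_term`, `BINARY_UID_MIN_VERSION`, and the cutoff value are all hypothetical and do not come from Elasticsearch or Lucene.

```python
import base64

# Hypothetical cutoff: indices created at or after this version would use binary terms.
BINARY_UID_MIN_VERSION = (6, 0, 0)


def uid_term(raw_id: bytes, index_created_version: tuple) -> bytes:
    """Return the bytes to index as the ID term for an index created at the given version."""
    if index_created_version >= BINARY_UID_MIN_VERSION:
        # Newer index: hand the raw binary ID to the index unchanged.
        return raw_id
    # Older index: keep the Base64 form so existing terms still match on lookup.
    return base64.urlsafe_b64encode(raw_id).rstrip(b"=")


raw = bytes(range(15))
print(len(uid_term(raw, (5, 4, 0))), len(uid_term(raw, (6, 0, 0))))
```

Keying the choice on the index creation version, rather than on the running node version, means an existing index keeps a single term encoding for its whole lifetime.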
