Add best_compression option for indices #8863
Closed
rmuir wants to merge 3 commits into elastic:master from
Contributor
+1. I left some comments.
Contributor
LGTM
Contributor, Author
I want @jpountz's opinion too, when he has some time.
Contributor
+1 to the named codec approach. And I see that
Member
This is great! Now part of the time-based data story can also be an optional codec change, optimizing to reduce storage for "old" indices, potentially significantly.

This is wonderful, thank you guys!
Upgrades Lucene to the latest version and adds support for the BEST_COMPRESSION parameter that Lucene now supports (with backwards compatibility, etc.). This option uses DEFLATE, tuned for highly compressible data.
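To give a rough feel for what DEFLATE does on highly compressible data, here is a minimal sketch that uses only the JDK's `java.util.zip.Deflater` and `Inflater`. It is not Lucene's stored fields code: the class name `DeflateSketch`, the helper methods, and the sample log line are all made up for illustration, and Lucene's block sizes and tuning are not reproduced.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: a hypothetical class, not part of Lucene or Elasticsearch.
public class DeflateSketch {

    // Compress the input with DEFLATE at the given level,
    // for example Deflater.BEST_COMPRESSION.
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Decompress a complete DEFLATE stream produced by deflate().
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid DEFLATE stream");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("invalid DEFLATE stream", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Made-up, highly repetitive "document" data, similar in spirit to log lines.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("{\"level\":\"INFO\",\"message\":\"request served\"}\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(raw, Deflater.BEST_COMPRESSION);
        byte[] restored = inflate(packed);

        System.out.println("raw bytes:      " + raw.length);
        System.out.println("deflated bytes: " + packed.length);
        System.out.println("round trip ok:  " + Arrays.equals(raw, restored));
    }
}
```

The higher compression level costs more CPU time when writing, which is the same trade-off the option makes for stored fields.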
index.codec:: The default value compresses stored data with LZ4 compression, but this can be set to best_compression for a higher compression ratio, at the expense of slower stored fields performance.
IMO it's safest to implement this as a named codec here, because ES already has logic to handle this correctly, and because it's unrealistic to have a plethora of options for Lucene's default codec... we are practically limited in Lucene to what we can support with back compat, so I don't think we should overengineer this and add additional unnecessary plumbing.
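The named codec idea can be sketched roughly as a fixed lookup from the `index.codec` value to a small, closed set of supported configurations. This is only an illustration of the design choice, not the code in this PR: the class `NamedCodecSketch`, the enum `StoredFieldsMode`, and the `resolve` method are hypothetical names, and the fallback to `default` for an unset value is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: hypothetical names, not Elasticsearch's or Lucene's API.
public class NamedCodecSketch {

    // Hypothetical stand-in for the stored fields configuration behind a codec name.
    enum StoredFieldsMode { LZ4_FAST, DEFLATE_BEST_COMPRESSION }

    static final String DEFAULT_CODEC = "default";
    static final String BEST_COMPRESSION_CODEC = "best_compression";

    // A closed set of names: each one maps to a configuration that can be
    // supported with backwards compatibility, instead of exposing free-form options.
    private static final Map<String, StoredFieldsMode> CODECS;
    static {
        Map<String, StoredFieldsMode> m = new LinkedHashMap<>();
        m.put(DEFAULT_CODEC, StoredFieldsMode.LZ4_FAST);
        m.put(BEST_COMPRESSION_CODEC, StoredFieldsMode.DEFLATE_BEST_COMPRESSION);
        CODECS = Collections.unmodifiableMap(m);
    }

    // Resolve the value of the index.codec setting.
    // Assumption: an unset (null) value falls back to "default".
    static StoredFieldsMode resolve(String settingValue) {
        String name = settingValue == null ? DEFAULT_CODEC : settingValue;
        StoredFieldsMode mode = CODECS.get(name);
        if (mode == null) {
            throw new IllegalArgumentException(
                "unknown codec [" + name + "], expected one of " + CODECS.keySet());
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve("best_compression"));
    }
}
```

Rejecting unknown names keeps the supported surface small, which matches the back-compat concern raised above.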
See also:
https://issues.apache.org/jira/browse/LUCENE-5914
https://issues.apache.org/jira/browse/LUCENE-6089
https://issues.apache.org/jira/browse/LUCENE-6090
https://issues.apache.org/jira/browse/LUCENE-6100