Sort binary doc values by default#140244
Conversation
3679bd0 to
966199d
Compare
966199d to
35e72a4
Compare
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
I don't think I'm onboard with this change. I agree that sometimes |
|
We can't do that because our tests will fail. We expect different things across different fields and across different use cases. Some expect a deduped sorted list, other a sorted list thats not deduped, and others an unsorted list. We're inconsistent. We either need to agree to be consistent and introduce a breaking change or we deal with the mess. |
|
We synced offline to agree to sort at index time and adjust any tests that may fail. |
35e72a4 to
4338523
Compare
| protected void writeLenAndValues(BytesStreamOutput out) throws IOException { | ||
| // sort the ArrayList variant of the collection prior to serializing it into a binary array | ||
| if (values instanceof ArrayList<BytesRef>) { | ||
| ((ArrayList<BytesRef>) values).sort(Comparator.naturalOrder()); |
There was a problem hiding this comment.
list.sort() is more efficient than Collections.sort() since it sorts in place
42a1347 to
023790d
Compare
023790d to
2207387
Compare
| private final boolean store; | ||
| private final boolean index; | ||
| private final Integer ignoreAbove; | ||
| private final boolean fallbackUsesBinaryDocValues; |
There was a problem hiding this comment.
this is needed for annotated text. If we plan on using binary doc values for fallback annotated text fields, then this can later be removed
| } | ||
|
|
||
| tasks.named("yamlRestCompatTestTransform").configure { task -> | ||
| task.skipTest("match_only_text/10_basic/synthetic_source match_only_text as multi-field with ignored keyword as parent", "ignore_above now uses sorted binary doc values") |
There was a problem hiding this comment.
I wonder whether we can use replaceValueInMatch here instead of skipTest? Then we can still run the test with previous assertion.
There was a problem hiding this comment.
If this doesn't work, then please ignore my comment.
There was a problem hiding this comment.
I tried this but couldn't get it working
server/src/main/java/org/elasticsearch/index/mapper/MultiValuedBinaryDocValuesField.java
Outdated
Show resolved
Hide resolved
* Store fallback annotated text fields in binary doc values * Address feedback
MultiValuedSortedBinaryDocValues's name is misleading as it doesn't actually sort anything. It also violates the contract imposed bySortedBinaryDocValues(parent class).Now, sorting values may not actually be necessary as sometimes the underlying binary array is created in sorted order during indexing (example code). However, more often than not, its not (example code).
I found that leaving things as is makes introducing binary doc values into match only text difficult due to match only text fields having a dedicated wrapper that sorts stored fields. Leaving things as is means we need to introduce a similar abstraction to StoredFieldSortedBinaryIndexFieldData, which is a lot of extra code simply to make up for the fact that
MultiValuedSortedBinaryDocValuesdoesn't sort things despite "Sorted" literally being in its name.This PR address this inconsistency. All values in
MultiValuedBinaryDocValuesFieldare now sorted by default.