Rewrite CollectionUtils dedup to work with any type#85352
Rewrite CollectionUtils dedup to work with any type#85352rjernst merged 14 commits intoelastic:masterfrom
Conversation
CollectionUtils contains a method for sorting and deduplicating an hppc list of byte arrays. That is a very specific type. Yet the algorithm for deduplicating a sorted list is very simple and does not need to be specially typed. This commit removes the sort portion of the method, as that is already easily available (and timsort in Java should be just fine for this purpose, we don't need introsort), renames to unique, and makes the method take a generic List along with a Comparator. relates elastic#84735
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
jpountz
left a comment
There was a problem hiding this comment.
This looks much cleaner.
| T prevValue = list.get(0); | ||
| for (int i = 1; i < list.size(); ++i) { | ||
| T nextValue = list.get(i); | ||
| if (cmp.compare(nextValue, prevValue) != 0 && prevNdx++ != i) { |
There was a problem hiding this comment.
Do we need to pre-increment rather than post-increment? Otherwise it looks to me like a list where all elements are unique would still overwrite all the time?
There was a problem hiding this comment.
You are right, it should be pre-increment.
| return array == null || array.length == 0; | ||
| } | ||
|
|
||
| public static <T> void unique(List<T> list, Comparator<T> cmp) { |
There was a problem hiding this comment.
Add javadocs that the list must be sorted according to the given comparator?
Also a bit of a nit-pick, but since this modifies the list in-place, I feel like naming the method after a verb would be more appropriate?
| if (list.size() <= 1) { | ||
| return; | ||
| } | ||
|
|
There was a problem hiding this comment.
Should we verify that the list implements random access?
There was a problem hiding this comment.
The algorithm only requires a forward iterator. I've rewritten to use ListIterator instead of indices. The only caveat is that for LinkedList Java does not provide an efficient means to remove the rest of a list from a given point.
|
@jpountz I think I addressed all your comments, can you take another look? |
jpountz
left a comment
There was a problem hiding this comment.
Thanks this looks good to me. One thing that makes me a bit nervous is that I'm unsure of the performance impact that this change would have for users of binary fields, but it would only possibly affect binary fields that are effectively multi-valued, which is a rare combination I suspect, so probably nothing to worry about?
server/src/main/java/org/elasticsearch/common/util/CollectionUtils.java
Outdated
Show resolved
Hide resolved
Yes that is my thinking. |
…tils.java Co-authored-by: Adrien Grand <jpountz@gmail.com>
When a plugin specifies an elasticsearch version with an empty string in their plugin-descriptor.properties file, we don't enforce the version check. This PR ensures that we check the version string for empty as well as null value.
Fix small grammatical mistake. Closes elastic#85355
Since Java 9, the JDK has provided a means of parsing Java versions and getting the current Java version. That class obviates the need for the JavaVersion class of Elasticsearch. This commit removes the JavaVersion class in favor of Runtime.Version. Note that most of the changes here simply removed logic around versioning because this change is intended only for the master branch, where Java 17 is required.
This commit removes leftover sort methods in CollectionUtils which were only used in tests.
CollectionUtils contains a method for sorting and deduplicating an hppc
list of byte arrays. That is a very specific type. Yet the algorithm for
deduplicating a sorted list is very simple and does not need to be
specially typed.
This commit removes the sort portion of the method, as that is already
easily available (and timsort in Java should be just fine for this
purpose, we don't need introsort), renames to unique, and makes the
method take a generic List along with a Comparator.
relates #84735