Simplify and Speed up some Compression Usage #60953

original-brownbear merged 3 commits into elastic:master from original-brownbear:faster-compression
Conversation
Use thread-local buffers and `Deflater`/`Inflater` instances to speed up compressing and decompressing from in-memory bytes. Not manually invoking `end()` on these should be safe, since their off-heap memory is eventually reclaimed by the finalizer thread, which should not be an issue for thread-locals that are not instantiated at a high frequency. This significantly reduces the amount of byte copying and object creation relative to the previous approach, which created a fresh temporary buffer (that was then resized multiple times during operations), copied bytes out of that buffer into a freshly allocated `byte[]`, needlessly used 4k stream buffers when working with bytes that are already in arrays (`writeTo` now handles efficient writing to the compression logic), etc. Relates #57284, which should be helped by this change to some degree. Also, I expect this change to speed up mapping/template updates a little, as those make heavy use of these code paths.
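The reuse pattern described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration (class and field names are mine, not the actual Elasticsearch code): one `Deflater`/`Inflater` per thread, `reset()` between uses instead of allocating a fresh instance and calling `end()` each time.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ReusableCompression {

    // nowrap = true produces/consumes raw DEFLATE data (no zlib header).
    private static final ThreadLocal<Deflater> DEFLATER =
        ThreadLocal.withInitial(() -> new Deflater(Deflater.DEFAULT_COMPRESSION, true));
    private static final ThreadLocal<Inflater> INFLATER =
        ThreadLocal.withInitial(() -> new Inflater(true));

    public static byte[] compress(byte[] input) {
        Deflater deflater = DEFLATER.get();
        deflater.reset();                  // reuse the per-thread instance
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[Math.max(64, input.length)];
        int total = 0;
        while (deflater.finished() == false) {
            if (total == buffer.length) {
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
            total += deflater.deflate(buffer, total, buffer.length - total);
        }
        return Arrays.copyOf(buffer, total);
    }

    public static byte[] decompress(byte[] compressed, int uncompressedLength) throws DataFormatException {
        Inflater inflater = INFLATER.get();
        inflater.reset();
        inflater.setInput(compressed);
        byte[] out = new byte[uncompressedLength];
        int total = 0;
        while (total < uncompressedLength && inflater.finished() == false) {
            total += inflater.inflate(out, total, out.length - total);
        }
        return out;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "some mapping or template bytes".getBytes();
        byte[] roundTripped = decompress(compress(original), original.length);
        System.out.println(Arrays.equals(original, roundTripped)); // prints "true"
    }
}
```

Note that neither instance is ever `end()`-ed; as the description argues, that is acceptable here because only one instance exists per thread, so the native memory held until finalization stays bounded.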
Pinging @elastic/es-core-infra (:Core/Infra/Core)

Jenkins run elasticsearch-ci/packaging-sample-windows (known Jenkins+Windows issue)

Jenkins run elasticsearch-ci/1 (unrelated test failure #60954)
```diff
-        return Arrays.equals(uncompressed(), that.uncompressed());
+        return uncompressed().equals(uncompressed());
```
Suggested change:

```diff
-        return uncompressed().equals(uncompressed());
+        return uncompressed().equals(that.uncompressed());
```
also looking at the code above, I wonder why we don't compare crc32 first before comparing the compressed byte arrays
🤦 thanks for spotting.
> I wonder why we don't compare crc32 first before comparing the compressed byte arrays
I guess you could argue that it's really unlikely that, in the equal case (as in equal uncompressed bytes), the compressed bytes aren't actually equal, so comparing crc32 first would just add an extra int comparison in the equal case. So if we assume it's mostly the equal case here, then the current version is better; I have no clue if that's true though. Probably doesn't matter much in practice since byte array comparison in JDK9+ is blazing fast anyway via `jdk.internal.util.ArraysSupport#vectorizedMismatch`? :)
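For illustration, the crc32-first variant being discussed could look like the following. This is a hypothetical sketch (the class name and fields are mine, not the actual Elasticsearch `CompressedXContent` code): the cheap int comparison short-circuits most unequal pairs, while `Arrays.equals` settles the rest.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class CompressedReference {
    private final byte[] compressed;
    private final int crc32;

    public CompressedReference(byte[] compressed) {
        this.compressed = compressed;
        CRC32 crc = new CRC32();
        crc.update(compressed, 0, compressed.length);
        this.crc32 = (int) crc.getValue();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        CompressedReference that = (CompressedReference) o;
        // Cheap int comparison first: rules out almost all unequal values
        // immediately. In the equal case it costs one extra comparison before
        // the (vectorized, on JDK 9+) Arrays.equals call.
        return crc32 == that.crc32 && Arrays.equals(compressed, that.compressed);
    }

    @Override
    public int hashCode() {
        return crc32;
    }
}
```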
> Probably doesn't matter much in practice
I agree that it doesn't matter much. Just a random thought that popped into my head :)
```java
    private static final ThreadLocal<Inflater> inflaterRef = ThreadLocal.withInitial(() -> new Inflater(true));
```
IMO it would be a good idea to add a comment about why these threadlocals are not used in the other methods of the class
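One plausible reason (an assumption on my part, worth stating in that comment): in the streaming methods the compressor's lifetime escapes the method, since the returned stream is owned by the caller, so a per-thread shared instance cannot safely back it. A minimal sketch of such a streaming path, with an illustrative class name of my own:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class StreamingCompression {

    // The returned stream escapes this method, so it must own a fresh Deflater
    // (and end() it on close) rather than borrow a shared thread-local one.
    public static OutputStream compressingStream(OutputStream out) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        return new DeflaterOutputStream(out, deflater) {
            @Override
            public void close() throws IOException {
                try {
                    super.close(); // finishes the deflate stream
                } finally {
                    deflater.end(); // safe: this instance is not shared
                }
            }
        };
    }
}
```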
Thanks Jay!