Conversation
This is a new stored fields format that leverages ZSTD via JNA.
|
What is the improvement? Especially interested in performance versus a "good" zlib: e.g. your comment here: https://issues.apache.org/jira/browse/LUCENE-8739?focusedCommentId=17322015&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17322015 I'm just curious because it definitely adds complexity by adding a third-party dependency/native code to lucene-codecs. JNA is complicated (e.g. .so hell with various "system versions" of it, doesn't work out of box with non-executable /tmp, doesn't support all architectures, etc) If the improvement can be done in an easier way (e.g. instructions to luser on how to swap in better zlib), that seems like a better way to go? |
|
Right, zstd definitely outperforms zlib on large files, but when I tested with Lucene stored fields, the difference wasn't very significant, which makes me wonder that most of the benefit of zstd comes from its very large window. One could increase the block size for stored fields to try to leverage this, but it would slow down document retrieval. I wasn't aware of these limitations with JNA. I guess I'll close the PR, it should already make it easier for folks who are curious about zstd to test it with Lucene. |
|
@jpountz another alternative would be to build against the new JDK foreign api? If we need to call out to some native code, maybe we should look at the best available mechanism. From what I looked at before, I feel like it would have less hassles than JNA or JNI and give safer native code, but never tried it out. |
|
With foreign API you can directly wrap arrays to compress and wrap as MemorySegment and pass to foreign function (native MethodHandle). Jextract parses header file and creates MethodHandlers out of it. So yes, that's the way to go in future... See also MmapDirectory v2 #177 for JDK 17. |
Description
First one
Second one
Thanks. |
This is a new stored fields format that leverages ZSTD via JNA.
JNA doesn't have a NOTICE file on their repository, so I copied the bottom of the README that seems to cover what the NOTICE file usually does.