Initial rewrite of MMapDirectory for JDK-16 preview (incubating) Panama APIs (>= JDK-16-ea-b32)#173

Closed
uschindler wants to merge 38 commits into apache:main from uschindler:draft/jdk-foreign-mmap

Conversation

@uschindler
Contributor

@uschindler uschindler commented Jun 7, 2021

INFO: This is a clone/update of apache/lucene-solr#2176 (for more detailed discussion see this old PR from the Lucene/Solr combined repository)

This is just a draft PR to give a first look at the memory mapping improvements in JDK 16+.

Some background information: Starting with JDK 14, there is a new incubating module "jdk.incubator.foreign" that provides a new, not yet stable API for accessing off-heap memory (later it will also support calling functions located in libraries like .so or .dll files through classical MethodHandles). This incubator module has gone through several versions:

  • first version: https://openjdk.java.net/jeps/370 (slow, very buggy, and with thread confinement, making it unusable with Lucene)
  • second version: https://openjdk.java.net/jeps/383 (still thread confinement, but now allows transfer of "ownership" to other threads; this is still impossible to use with Lucene)
  • third version in JDK 16: https://openjdk.java.net/jeps/393 (this version includes "Support for shared segments"). This now allows us to safely use the same external mmapped memory from different threads and also unmap it!

This module more or less overcomes several problems:

  • The ByteBuffer API is limited to 32-bit (int) addressing (in fact MMapDirectory has to chunk files into 1 GiB portions)
  • There is no official way to unmap ByteBuffers when the file is no longer used. There is a way to use sun.misc.Unsafe to forcefully unmap segments, but any IndexInput accessing the file from another thread will then crash the JVM with SIGSEGV or SIGBUS. We learned to live with that and we happily apply the unsafe unmapping, but that's the main issue.
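To illustrate the chunking mentioned in the first bullet, here is a minimal sketch of how a 64-bit file position can be split into a chunk index and an int offset inside that chunk. The class and method names are hypothetical and this is not Lucene's actual code; it only assumes 1 GiB chunks (a chunk size power of 30).

```java
/** Hypothetical helper: splits a 64-bit file position into (chunk, offset) for 1 GiB chunks. */
final class ChunkAddressing {
  // Assumption for this sketch: 2^30 bytes = 1 GiB per chunk, so offsets always fit in an int.
  static final int CHUNK_SIZE_POWER = 30;
  static final long CHUNK_SIZE_MASK = (1L << CHUNK_SIZE_POWER) - 1;

  /** Index of the chunk (one ByteBuffer per chunk) that contains the given position. */
  static int chunkIndex(long pos) {
    return (int) (pos >>> CHUNK_SIZE_POWER);
  }

  /** Offset of the position inside its chunk; always less than 2^30. */
  static int chunkOffset(long pos) {
    return (int) (pos & CHUNK_SIZE_MASK);
  }

  public static void main(String[] args) {
    long pos = 5L * (1L << 30) + 123; // 123 bytes into the sixth chunk
    System.out.println("chunk " + chunkIndex(pos) + ", offset " + chunkOffset(pos));
    // prints "chunk 5, offset 123"
  }
}
```

Every read that crosses a chunk boundary has to be stitched together from two buffers, which is part of the bookkeeping a 64-bit segment API can avoid.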

@uschindler had many discussions with the team at OpenJDK and finally, with the third incubator, we have an API that works with Lucene. These were very fruitful discussions (thanks to @mcimadamore!)

With the third incubator we are now finally able to do some tests (especially performance tests). As this is an incubating module, this PR first changes the build system a bit:

  • disable -Werror for :lucene:core
  • add the incubating module to compiler of :lucene:core and enable it for all test builds. This is important, as you have to pass --add-modules jdk.incubator.foreign also at runtime!

The code basically just modifies MMapDirectory to use LONG instead of INT for the chunk size parameter. In addition it adds MemorySegmentIndexInput that is a copy of our ByteBufferIndexInput (still there, but unused), but using MemorySegment instead of ByteBuffer behind the scenes. It works in exactly the same way, just the try/catch blocks for supporting EOFException or moving to another segment were rewritten.

The openInput code uses MemorySegment.mapFile() to get a memory mapping. This method is unfortunately a bit buggy in JDK-16-ea-b30, so I added some workarounds. See JDK issues: https://bugs.openjdk.java.net/browse/JDK-8259027, https://bugs.openjdk.java.net/browse/JDK-8259028, https://bugs.openjdk.java.net/browse/JDK-8259032, https://bugs.openjdk.java.net/browse/JDK-8259034. The bugs with alignment and zero byte mmaps are fixed in b32, and this PR was adapted accordingly (hacks removed).

It passes all tests and it looks like you can use it to read indexes. The default chunk size is now 16 GiB (but you can raise or lower it as you like; tests are doing this). Of course you can set it to Long.MAX_VALUE; in that case every index file is always mapped to one big memory mapping. My testing with Windows 10 has shown that this is not a good idea!!! Huge mappings fragment address space over time, and as we can only use something like 43 or 46 bits (depending on OS), the fragmentation will at some point kill you. So 16 GiB looks like a good compromise: Most files will be smaller than 6 GiB anyways (unless you optimize your index to one huge segment). So for most Lucene installations, the number of mapped segments will equal the number of open files, so huge Elasticsearch users will be very happy. The sysctl max_map_count may not need to be touched anymore.
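As a rough illustration of why a larger chunk size reduces the number of mappings, the sketch below counts how many chunks a file of a given length needs. The class and method names are hypothetical, and the rounding is a simplification that may differ from what MMapDirectory actually does (for example for empty files).

```java
/** Hypothetical helper: counts the memory mappings needed for one file. */
final class MappingCount {
  /**
   * Number of chunks of size 2^chunkSizePower needed to cover fileLength bytes.
   * Simplified model for illustration only; the real implementation may round differently.
   */
  static long mappingsFor(long fileLength, int chunkSizePower) {
    if (fileLength < 0 || chunkSizePower < 1 || chunkSizePower > 62) {
      throw new IllegalArgumentException("invalid length or chunk size power");
    }
    long chunkSize = 1L << chunkSizePower;
    long fullChunks = fileLength >>> chunkSizePower;
    boolean hasPartialChunk = (fileLength & (chunkSize - 1)) != 0;
    return fullChunks + (hasPartialChunk ? 1 : 0);
  }

  public static void main(String[] args) {
    long sixGiB = 6L << 30;
    System.out.println(mappingsFor(sixGiB, 30)); // 1 GiB chunks: prints 6
    System.out.println(mappingsFor(sixGiB, 34)); // 16 GiB chunks: prints 1
  }
}
```

Under this model a 6 GiB file costs six entries against max_map_count with 1 GiB chunks but only one with 16 GiB chunks.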

In addition, this implements readLongs in a better way than @jpountz did (no caching or arbitrary objects). Nevertheless, the new MemorySegment API relies on final, unmodifiable classes, and copying memory from a MemorySegment to an on-heap Java array requires us to wrap that array in a MemorySegment each time (e.g. in readBytes() or readLongs()), so there may be some overhead due to short-lived object allocations (those are NOT reusable!!!). In short: In future we should move away from copying/loading our stuff to heap and maybe throw away IndexInput completely and base our code fully on random access. The new foreign-vector APIs will in future also be written with MemorySegment as their focus. So you can allocate a vector view on a MemorySegment and let the vectorizer work fully outside the Java heap, inside our mmapped files! :-)

It would be good if you could checkout this branch and try it in production.

But be aware:

  • You need JDK 11 to run Gradle (set JAVA_HOME to it)
  • You need JDK 16-ea-b32 (set RUNTIME_JAVA_HOME to it)
  • The lucene-core.jar will be JDK16 class files and requires JDK-16 to execute.
  • Also you need to add --add-modules jdk.incubator.foreign to the command line of your Java program/Solr server/Elasticsearch server

It would be good to get some benchmarks, especially by @rmuir or @mikemccand. Take your time and enjoy the complexity of setting this up! ;-)

My plan is the following:

  • report any bugs or slowness, especially with Hotspot optimizations. The last time I talked to Maurizio, he talked about Hotspot not being able to fully optimize for-loops that use long instead of int, so it may take some time until the full performance is there.
  • wait until the final version of project PANAMA-foreign goes into Java's Core Library (no module needed anymore)
  • add a MR-JAR for lucene-core.jar and compile the MemorySegmentIndexInput and maybe some helper classes with JDK 17/18/19 (hopefully?).

In addition there are some comments in the code talking about safety (e.g., we need IOUtils.close() taking AutoCloseable instead of just Closeable, so we can also enforce that all memory segments are closed after usage). Furthermore, by default all VarHandles are aligned: by default it refuses to read a LONG from an address which is not a multiple of 8. I had to disable this feature, as all our index files are heavily unaligned. In the meantime we should not only convert our files to little endian, but also make all non-compressed types (like long[] arrays or non-encoded integers) aligned to the correct boundaries in files. The most horrible thing I have seen is that our CFS file format starts the "inner" files totally unaligned. We should fix the CFSWriter to always start new files at multiples of 8 bytes. I will open an issue about this.
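The alignment proposal for the CFS writer can be sketched as follows: before starting a new inner file, round the current file position up to the next multiple of 8 and write that many padding bytes. The class and method names here are hypothetical; this only illustrates the arithmetic, not an actual change to the file format.

```java
/** Hypothetical helper: computes aligned start positions for inner files. */
final class Alignment {
  /** Rounds pos up to the next multiple of alignment, which must be a power of two. */
  static long alignUp(long pos, int alignment) {
    if (pos < 0 || alignment <= 0 || (alignment & (alignment - 1)) != 0) {
      throw new IllegalArgumentException("pos must be >= 0 and alignment a power of two");
    }
    // Adding (alignment - 1) and clearing the low bits rounds up to the boundary.
    return (pos + alignment - 1) & ~(alignment - 1L);
  }

  /** Number of padding bytes to write before the next inner file starts. */
  static int padding(long pos, int alignment) {
    return (int) (alignUp(pos, alignment) - pos);
  }

  public static void main(String[] args) {
    System.out.println(alignUp(13, Long.BYTES)); // prints 16
    System.out.println(padding(13, Long.BYTES)); // prints 3
    System.out.println(padding(16, Long.BYTES)); // prints 0, already aligned
  }
}
```

With inner files starting on 8-byte boundaries, a long written at an aligned offset inside an inner file is also aligned in the surrounding compound file.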

…ning "buffer" to "segment"; also make the segments array final (curSegment == null when closed)
…eException: Cannot close while another thread is accessing the segment"
…ng objects to extend their functionality (like asserting in tests)
… can correctly throw AlreadyClosedEx; TODO: add a test
@uschindler uschindler self-assigned this Jun 7, 2021
@uschindler uschindler marked this pull request as draft June 7, 2021 12:06
@uschindler
Contributor Author

I moved this old pull request from apache/lucene-solr#2176 to the Lucene repository:

All tests still pass, the new policeman Jenkins job is: https://jenkins.thetaphi.de/view/Lucene/job/Lucene-jdk16panama-Linux/ (Linux), https://jenkins.thetaphi.de/view/Lucene/job/Lucene-jdk16panama-Windows/ (Windows)

@uschindler uschindler requested review from dweiss, jpountz and msokolov June 7, 2021 12:12
@uschindler
Contributor Author

The JDK 17 version is now here: #177

@MarcusSorealheis
Contributor

I know this is early days but want to clarify:

Is the plan to work on this and #177 in parallel until we know which is the more sustainable option, or abandon this one altogether with expectations that JDK 17 will be better? I'm going to go through the pain of setting one up but probably cannot do both.

@uschindler
Contributor Author

Is the plan to work on this and #177 in parallel until we know which is the more sustainable option, or abandon this one altogether with expectations that JDK 17 will be better? I'm going to go through the pain of setting one up but probably cannot do both.

I don't expect either of them to be in a stable release of Lucene yet. The version for 16 (this one) is here for reference only. It has performance problems, because JDK 16 was not able to optimize loops with 64-bit (long) indices. With JDK 17 this should be better, but I wasn't able to test this with #177. So I'd recommend doing performance tests with #177.

Hopefully at some point this will land in the JDK so we can officially use it. The problem currently is that every major release changes the API in significant ways, so there is no way to include it in official builds. It also won't work without command line settings, so if you'd like to use it with software like Solr or Elasticsearch, you would need to change startup scripts, too. MR-JARs don't help, because MR-JARs resolve class files based on a minimum version and you can't make it "only use this class for JDK 16".
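The MR-JAR limitation can be shown with a small model of the resolution rule: for each class, the runtime picks the highest versioned entry (META-INF/versions/N) that is not newer than the running JDK, and falls back to the base entry otherwise. The class and method names below are hypothetical; this only models the selection rule and is not JDK code.

```java
/** Hypothetical model of how a multi-release JAR picks a versioned class entry. */
final class MrJarResolution {
  /**
   * Returns the versioned directory N that would be used on the given runtime feature
   * version, or 0 if only the base (unversioned) entry applies.
   */
  static int selectVersion(int runtimeFeature, int... versionedDirs) {
    int best = 0;
    for (int v : versionedDirs) {
      // An entry applies to its own release and to every later release.
      if (v <= runtimeFeature && v > best) {
        best = v;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    System.out.println(selectVersion(11, 16)); // prints 0: base classes are used
    System.out.println(selectVersion(16, 16)); // prints 16
    System.out.println(selectVersion(17, 16)); // prints 16 as well
  }
}
```

The last case is the problem: a class compiled against the JDK 16 incubator API would also be selected on JDK 17, where that API has changed.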

A plan might be (as this is quite isolated) to create a separate GitHub project with just the directory implementation, so it can be downloaded as a separate JAR file and included in projects, possibly with a DirectoryFactory for Solr or a similar plugin for Elasticsearch. My time is a bit limited at the moment, but that's obviously the best way to go. The setup as a draft pull request with hacked code inside Lucene was mainly done to run all Lucene tests easily against it and to compare performance with the old MMapDirectory.

# Conflicts:
#	lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
@uschindler uschindler closed this Aug 6, 2021
@uschindler uschindler deleted the draft/jdk-foreign-mmap branch August 6, 2021 16:43
@uschindler uschindler restored the draft/jdk-foreign-mmap branch August 6, 2021 16:45
@uschindler uschindler reopened this Aug 6, 2021
@xiaoshi2013
Contributor

so cool!

uschindler and others added 3 commits October 11, 2021 00:33
…o draft/jdk-foreign-mmap

# Conflicts:
#	gradle/testing/defaults-tests.gradle
#	lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
#	lucene/core/src/java/org/apache/lucene/util/Unwrappable.java
uschindler and others added 7 commits December 20, 2021 13:19
# Conflicts:
#	gradle/java/javac.gradle
…o draft/jdk-foreign-mmap

# Conflicts:
#	lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
#	lucene/core/src/test/org/apache/lucene/store/TestMmapDirectory.java
@uschindler uschindler force-pushed the draft/jdk-foreign-mmap branch from a795bc1 to b63001e Compare December 22, 2021 08:49
# Conflicts:
#	lucene/core/src/java/module-info.java
#	lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java
@uschindler
Contributor Author

This PR is no longer maintained!
