Only call madvise when necessary.#13907
Conversation
This commit tries to save calls to `madvise` which are not necessary, either because they map to the OS' default, or because the advice would be overridden later on anyway. I have not noticed specific problems with this, but it seems desirable to keep calls to `madvise` to a minimum. As a consequence: - Files that are open with `ReadAdvice.NORMAL` do not call `madvise` since this is the OS' default. - Compound files are always open with `ReadAdvice.NORMAL`, and the actual is only set when opening sub files of these compound files. To make the latter less trappy, the `IOContext` parameter has been removed from `CompoundFormat#getCompoundReader`.
|
In the original code when I added madvise for the first time, I had an exclusion on the mapping function between ReadAdvise and the platform constant to return a NULL integer for NORMAL. The later code then excluded the call when the mapping function returned NULL. The latter check is still there, so maybe we can clean this up, too. The updated code for compound files looks fine, although why should the madvise only called when read advice is NORMAL. The linux kernel has no problem when you open the CFS file with SEQUENTIAL and then you change the read advice to RANDOM. I was also thinking of generally supporting this in the API for merging. On the mailing list there was a complaint about merging being slower for docvalues and vectors, because those use RANDOM. So we could think about changing the readAdvice before merging for all files to SEQUENTIAL and restore it to the original value later. This could be done by adding an IndexInput method (similar to the new slice() method with advise). |
OK I'll do this.
If we want to not call
Indeed it's not a problem, just inefficient, as setting a SEQUENTIAL advice would be useless if we then immediately set RANDOM? (Merging is a bit different, as we may be reading the segment a few times for search operations before we use it for merging, so it would make sense to change the advice over time.) |
Of course, we should not do this. The compound outer IndexInput should be opened with "NORMAL" read advice (which is a NOOP) and for the sub-files we use an individual one. I was just stumbling on the javadocs that its not "legal": https://github.com/apache/lucene/pull/13907/files#diff-0ac049bc7bd12b9665b6b3e7605bc8029e6623a98aa300ab52df7a8e9fe302d9R130-R131
People already experimented with something like this, although their hacky patch doe snot fit the current API. My plan would be to allow to change the read advice to "SEQUENTIAL" or "NORMAL" before the merging starts and revert back at end. As this affects all clones, it must be restored at end. We can make a separate PR of this. Uwe |
You basically restored the original state. Thanks. Possibly we could still implement is like it was done recently. My conmment was more about removing the null check and making the mapper function return The new / restored code has the problem that we can't restore the read advise when we change it back to NORMAL after merging. |
This reverts commit f38ceba.
This comment refers to the compound file, not the inner file. I just tried to make it clearer.
Thanks, I had misunderstood your comment. I just did that. I agree it would be useful to be able to restore a |
| String sliceDescription, long offset, long length, ReadAdvice advice) throws IOException { | ||
| MemorySegmentIndexInput slice = slice(sliceDescription, offset, length); | ||
| if (NATIVE_ACCESS.isPresent()) { | ||
| if (NATIVE_ACCESS.isPresent() && advice != ReadAdvice.NORMAL) { |
There was a problem hiding this comment.
At moment the slice method would not allow to reset the read advice for a slice back to NORMAL. This is perfectly fine here, because this methods is only used for CFS files (which are always opened with NORMAL).
We should just keep this in mind. The Javadocs ob IndexInput already mentions why this method is available and should be used only for compound files. I just wanted to mention this here for the record, no need to fix anything.
Another idea for discussion possibly in another PR: How about to save the advice/IOContext that was used for opening the input as field in MemorySegmentIndexInput and only set the advise here, when it differs from the current?
There was a problem hiding this comment.
If we want to do some tricks for merging, I think we'll have to do something like that indeed. For CFS files, opening the compound file handle with a NORMAL advice felt like the right thing to do.
| if (preload) { | ||
| segment.load(); | ||
| } else if (nativeAccess.filter(na -> segment.address() % na.getPageSize() == 0).isPresent()) { | ||
| } else if (readAdvice != ReadAdvice.NORMAL |
There was a problem hiding this comment.
Idea for another PR: When we extend the whole code to be able to change read advice later, we could move this code to the MemorySegmentIndexInput class. Wecould just pass the advise or IOContext to the instance created (see above for more info).
This commit tries to save calls to `madvise` which are not necessary, either because they map to the OS' default, or because the advice would be overridden later on anyway. I have not noticed specific problems with this, but it seems desirable to keep calls to `madvise` to a minimum. As a consequence: - Files that are open with `ReadAdvice.NORMAL` do not call `madvise` since this is the OS' default. - Compound files are always open with `ReadAdvice.NORMAL`, and the actual is only set when opening sub files of these compound files. To make the latter less trappy, the `IOContext` parameter has been removed from `CompoundFormat#getCompoundReader`.
| /** Returns a Directory view (read-only) for the compound files in this segment */ | ||
| public abstract CompoundDirectory getCompoundReader( | ||
| Directory dir, SegmentInfo si, IOContext context) throws IOException; | ||
| public abstract CompoundDirectory getCompoundReader(Directory dir, SegmentInfo si) |
There was a problem hiding this comment.
Heya, just raising that this specific change requires adjusting consumers code when upgrading to Lucene 10.1. This class is marked experimental so I think we are good, but I thought it's worth double checking just to make sure.
Relates to apache/lucene#13907 getCompoundReader no longer takes IOContext as its last argument.
This reverts commit b8bfffa.
This commit tries to save calls to
madvisewhich are not necessary, either because they map to the OS' default, or because the advice would be overridden later on anyway. I have not noticed specific problems with this, but it seems desirable to keep calls tomadviseto a minimum.As a consequence:
ReadAdvice.NORMALdo not callmadvisesince this is the OS' default.ReadAdvice.NORMAL, and the actual is only set when opening sub files of these compound files.To make the latter less trappy, the
IOContextparameter has been removed fromCompoundFormat#getCompoundReader.