Skip to content

Add a preload hint for preloading mmap data on specific open calls#14604

Merged
ChrisHegarty merged 3 commits intoapache:mainfrom
thecoop:preload-hint
May 9, 2025
Merged

Add a preload hint for preloading mmap data on specific open calls#14604
ChrisHegarty merged 3 commits intoapache:mainfrom
thecoop:preload-hint

Conversation

@thecoop
Copy link
Contributor

@thecoop thecoop commented May 2, 2025

Add a PreloadHint to tell MMapDirectory to preload data, and use it to preload the HNSW index into memory

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

FileTypeHint.DATA,
FileDataHint.KNN_VECTORS,
DataAccessHint.RANDOM,
PreloadHint.INSTANCE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've not preloaded this before. But I think it is ok, since it's just a hint to the IndexInput, which by default will be be ignored - since one needs to provide a preLoadOverride to MMapDirectory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not an override, but there is a new constant for the already preexisting mechanism: mmapdir.setPreload(PRELOAD_HINT)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uschindler correct (my comments are out of date). Your words are better than my clumsy ones above. Thank you.

@ChrisHegarty ChrisHegarty merged commit 8d45ad6 into apache:main May 9, 2025
7 checks passed
@thecoop thecoop deleted the preload-hint branch May 9, 2025 11:30
ChrisHegarty pushed a commit that referenced this pull request May 9, 2025
…14604)

Add a PreloadHint to tell MMapDirectory to preload data, and use it to preload the HNSW index into memory.
@jpountz
Copy link
Contributor

jpountz commented May 9, 2025

Can you add a CHANGES entry for this? Similar changes have not always been well received in the past, because it increases the cost of opening a reader, even though the reader may not need the field that is pre-loaded. I personally think that we should do this, because Lucene is a search library and we want to move costs to open-time rather than run-time whenever possible, but this is a notable change in runtime behavior, so I'd like to make sure it's visible in the release notes.

@thecoop
Copy link
Contributor Author

thecoop commented May 9, 2025

After comments from @ChrisHegarty, this PR does not change the default behaviour. You need to use the hint by calling MMapDirectory.setPreload(PRELOAD_HINT)

jpountz added a commit to jpountz/lucene that referenced this pull request May 9, 2025
This enables the `PreloadHint` introduced in apache#14604 on completion fields and
memory terms dictionaries, which are both expected to fit in the page cache in
practice.

I don't have specific interest in these two file formats, I was more interested
in having more than one file format that uses `PreloadHint` to make sure it's
generally useful and not only to KNN vectors.
@jpountz
Copy link
Contributor

jpountz commented May 9, 2025

Ahh, I saw the constant on MMapDirectory and failed to see that it was unused.

@ChrisHegarty
Copy link
Contributor

Ahh, I saw the constant on MMapDirectory and failed to see that it was unused.

We can of course choose to change this - enable preLoad Hints by default, but I'd rather separate that out since the default behaviour is far more controversial than the hint itself.

jpountz added a commit that referenced this pull request May 11, 2025
…es. (#14634)

This enables the `PreloadHint` introduced in #14604 on completion fields and
memory terms dictionaries, which are both expected to fit in the page cache in
practice.

I don't have specific interest in these two file formats, I was more interested
in having more than one file format that uses `PreloadHint` to make sure it's
generally useful and not only to KNN vectors.
jpountz added a commit that referenced this pull request May 11, 2025
…es. (#14634)

This enables the `PreloadHint` introduced in #14604 on completion fields and
memory terms dictionaries, which are both expected to fit in the page cache in
practice.

I don't have specific interest in these two file formats, I was more interested
in having more than one file format that uses `PreloadHint` to make sure it's
generally useful and not only to KNN vectors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants