Description
With the relatively recent capability to call madvise in Lucene, we've started to use MADV_RANDOM in several places where it makes conceptual sense, e.g. for accessing vector data when navigating the graph. The memory access is truly random, but we've seen several reports of performance regressions that appear as a result of this.
Of particular concern is the interaction of MADV_RANDOM with Multi-Gen LRU (MGLRU) [1]. From my reading of the code, and someone please correct me, the semantics of MADV_RANDOM have changed in the kernel with MGLRU, specifically after torvalds/linux@8788f67, and pages are now reclaimed more eagerly, even when there is no memory pressure.
This Elasticsearch issue has more of the lower-level details: elastic/elasticsearch#124499.
This issue may also have some connection, #14281.
I opened this issue to help facilitate a discussion and hopefully converge on a potential direction to mitigate the possibility of performance regressions. For example, one possible mitigation would be to expose the ReadAdvice that will be used as part of the API, so that callers can have more fine-grained control over whether or not to use MADV_RANDOM.
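To make the suggested mitigation concrete, here is a minimal sketch of what caller-controlled advice could look like. Everything in it is hypothetical: the `Advice` enum is a stand-in and not Lucene's `ReadAdvice`, and the `resolve` method and the `example.readAdviceOverride` property name are invented for illustration only. The idea is simply that the code path keeps its conceptual default (random access for vector data) while a caller can override it.

```java
import java.util.Locale;

public class ReadAdviceOverride {

    // Hypothetical stand-in for an advice enum; NOT Lucene's actual ReadAdvice.
    public enum Advice { NORMAL, RANDOM, SEQUENTIAL }

    // Returns the caller-supplied override when one is given,
    // otherwise falls back to the default chosen by the code path.
    // Throws IllegalArgumentException if the override names no known advice.
    public static Advice resolve(Advice defaultAdvice, String override) {
        if (override == null || override.trim().isEmpty()) {
            return defaultAdvice;
        }
        return Advice.valueOf(override.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Assumed property name, used here only for illustration.
        String override = System.getProperty("example.readAdviceOverride");
        // Vector data would conceptually default to RANDOM access.
        Advice advice = resolve(Advice.RANDOM, override);
        System.out.println("Opening vector data with advice: " + advice);
    }
}
```

With a shape like this, a deployment that hits the MGLRU regression could run with `-Dexample.readAdviceOverride=normal` and avoid MADV_RANDOM without a code change, while everyone else keeps the current default.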
[1] https://docs.kernel.org/admin-guide/mm/multigen_lru.html