-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Description
Motivation
The current incremental-index implementations (on-heap and off-heap) suffer from poor memory utilization and sub-optimal performance. In some ingestion scenarios, we observed 200% memory overhead and 70% runtime overhead that both are attributed to the GC mechanism. This is mainly due to the large number of metadata objects created by Java’s ConcurrentSkipList (CSL).
Proposed changes
We implemented an alternative incremental-index (OakIncrementalIndex) that has two main attributes that are different from the current implementations:
- It stores both keys and values off-heap (as opposed to the off-heap implementation that stores only the values off-heap).
- It is based on
OakMap[1] instead of Java’sConcurrentSkipList(CSL).
These two changes significantly reduce the number of heap-objects and thus decrease dramatically the GC’s memory and performance overhead.
This implementation was proposed before (#5698 and #7676). This issue expands on these with system-level experiments results, as well as more comprehensive component-level benchmarks results (as requested by the community). In addition to improved performance compared to older versions.
[1] Oak: a Scalable Off-Heap Allocated Key-Value Map. ACM Conference on Principles and Practices of Parallel Programming (PPoPP) ‘2020.
Rationale
Our implementation (OakIncrementalIndex) instantiates a sub-linear number of objects with respect to the number of rows in the incremental-index, as opposed to a linear number of metadata objects that are instantiated by CSL. For typical Incremental-Index sizes (e.g., the current flush threshold is 1M rows), this overhead is millions of Java metadata objects just for internal CSL use. In addition, an on-heap multi-dimensional key might include many small objects that increase the memory overhead even further, as opposed to OakIncrementalIndex that needs only one buffer object for many multi-dimensional keys.
Our experiments show that when using OnHeapIncrementalIndex and OffHeapIncrementalIndex, Java GC requires roughly 200% memory compared to the raw data size to achieve reasonable ingestion speed. Furthermore, this large number of objects also incur longer GC pauses (about 40% of the runtime in our experiments) as there are many long-living objects to traverse. OakIncrementalIndex has only 2% memory overhead and negligible GC runtime overhead.
We evaluated OakIncrementalIndex with comparison to OnHeapIncrementalIndex and OffHeapIncrementalIndex via system-level experiments and component-level benchmarks. The experimental setup and the results are depicted here.
The system-level experiments show improved ingestion memory and CPU efficiency. It uses 60% less memory and 50% less CPU-time to achieve the same performance. This translates to nearly double the system's ingestion-throughput with the same memory budget, and a 75% increase in throughput with the same CPU-time budget. The component-level benchmarks show almost 33% of the memory usage and 60% of the runtime (1.7x ingestion throughput) compared to the on-heap and off-heap implementations.
Test plan
We modified all the unit-test and benchmarks to test all the available incremental-index implementations (on-heap, off-heap, and Oak). All the unit tests passed successfully.
Operational impact
This change will not affect any existing clusters. It will work seamlessly and interchangeably with existing incremental index implementations. See our wiki’s usage section for more details.