Current state
Currently, the data in several partial indexes (or just one, for transformations) is transformed during merging in the following way:
- (0) Iterator < TimeAndDims + Object[] metrics (entry in IncrementalIndex) >
  --> sorting dimension value indexes, aka unsortedToSorted
- (1) Iterator < Rowboat (Object[] dims, Object[] metrics) >
  --> optionally, reordering dims
- (2) Iterator < Rowboat (Object[] dims, Object[] metrics) >
  // here the array elements are the same objects as at the previous step, but the Object[] arrays are new if reordering of dims and/or metrics is actually required
  --> another reindexing, based on the merged dictionary
- (3) Iterator < Rowboat (Object[] dims, Object[] metrics) >
  --> final merge.
Here, the Object[] elements are either int[] (for a DimensionSelector) or Long, Double, or Float (for the corresponding numeric ColumnValueSelectors).
So during a merge, each entry generates 2-3 extra Rowboat objects, 4-7 new Object[] arrays, and N * 2 new int[] arrays (where N is the number of string dimensions), plus new boxed primitive objects if the merge uses a QueryableIndex as a source.
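To give a sense of scale, the per-entry counts above can be turned into a rough estimate. The sketch below is only arithmetic on the numbers quoted in this issue; the class and method names are hypothetical, and boxed primitives are left out because their count depends on the number of metrics.

```java
// Hypothetical helper: bounds on short-lived objects allocated per merged row,
// using the counts quoted above (boxed primitives excluded).
final class MergeGarbageEstimate {
    // 2 Rowboat objects + 4 Object[] arrays + 2 int[] arrays per string dimension
    static long minObjectsPerRow(int stringDims) {
        return 2 + 4 + 2L * stringDims;
    }

    // 3 Rowboat objects + 7 Object[] arrays + 2 int[] arrays per string dimension
    static long maxObjectsPerRow(int stringDims) {
        return 3 + 7 + 2L * stringDims;
    }
}
```

For example, assuming 10 string dimensions, this gives 26 to 30 temporary objects per row, so a merge over 10 million rows would allocate roughly 260 to 300 million short-lived objects before counting boxed metrics.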
Garbage-free approach
Rowboat contains an array of ColumnValueSelector objects representing the stream of dimensions, and another array of ColumnValueSelector objects representing the stream of metrics, both "under cursor". When QueryableIndex is used as the source for merging, the existing Cursor and ColumnValueSelectorFactory infrastructure is reused with minimal modifications.
The 0->1 (unsortedToSorted) and 2->3 (reindexing based on the merged dictionary) conversions, as described above, are implemented as ColumnValueSelector transformations, without creating new arrays, boxed primitives, etc. The 1->2 transformation (reordering) is essentially a no-op: it creates a Rowboat object with the same array of ColumnValueSelectors, ordered differently.
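A minimal sketch of the idea follows. It deliberately does not use Druid's real interfaces: `DimSelector`, `ArrayCursor`, `RemappedSelector`, and `SelectorRow` are hypothetical, simplified stand-ins for DimensionSelector, Cursor, and Rowboat, and the sketch ignores details such as re-sorting values within a multi-value row. It only illustrates that an id remapping can be a wrapper created once per merge, and that reordering can be a one-time shuffle of selector references rather than a per-row array copy.

```java
// Hypothetical stand-in for a dimension selector "under cursor".
interface DimSelector {
    int size();      // number of values in the current row
    int get(int i);  // i-th dictionary id in the current row
}

// Hypothetical source: rows of dictionary ids plus a cursor position.
final class ArrayCursor implements DimSelector {
    private final int[][] rows;
    private int pos = 0;

    ArrayCursor(int[][] rows) { this.rows = rows; }

    boolean isDone() { return pos >= rows.length; }
    void advance() { pos++; }

    public int size() { return rows[pos].length; }
    public int get(int i) { return rows[pos][i]; }
}

// Id conversion (e.g. unsorted -> sorted, or local -> merged dictionary)
// expressed as a wrapper: no per-row arrays or boxed values are created.
final class RemappedSelector implements DimSelector {
    private final DimSelector base;
    private final int[] oldToNew;

    RemappedSelector(DimSelector base, int[] oldToNew) {
        this.base = base;
        this.oldToNew = oldToNew;
    }

    public int size() { return base.size(); }
    public int get(int i) { return oldToNew[base.get(i)]; }
}

// Rowboat-like holder: one object for the whole stream, not one per row.
final class SelectorRow {
    final DimSelector[] dims;

    SelectorRow(DimSelector[] dims) { this.dims = dims; }

    // Reordering allocates once per merge; the selectors themselves are shared.
    SelectorRow reorder(int[] newOrder) {
        DimSelector[] reordered = new DimSelector[newOrder.length];
        for (int i = 0; i < newOrder.length; i++) {
            reordered[i] = dims[newOrder[i]];
        }
        return new SelectorRow(reordered);
    }
}
```

With this shape, advancing the underlying cursor is enough for every wrapper to expose the next row's values; the only allocations happen when the selector chain is set up.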