Every DataCell is a slice into a larger chunk of arrow data living somewhere on the heap.
That slice is represented by the erased array type: Box<dyn Array>.
We've had plenty of performance issues caused by that erased type in the past. Most infamously, its type-erased clone() implementation is very CPU-unfriendly and orders of magnitude slower than simply bumping an Arc's refcount, which is why we introduced DataCell in the first place: so we could add our own refcounting layer on top.
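A minimal sketch of that layering, with a hypothetical Array trait and PrimitiveArray standing in for arrow2's (names are illustrative, not the real DataCell):

```rust
use std::sync::Arc;

// Hypothetical stand-ins for arrow2's erased array trait and a concrete array.
trait Array {
    fn len(&self) -> usize;
}

struct PrimitiveArray(Vec<u32>);

impl Array for PrimitiveArray {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// DataCell-style wrapper: cloning bumps one atomic refcount instead of
// going through the erased, potentially deep clone() of the array itself.
#[derive(Clone)]
struct DataCell {
    values: Arc<dyn Array>,
}

fn main() {
    let cell = DataCell {
        values: Arc::new(PrimitiveArray(vec![1, 2, 3])),
    };
    let copy = cell.clone(); // O(1): one refcount bump, no data copied
    assert_eq!(copy.values.len(), 3);
    assert_eq!(Arc::strong_count(&cell.values), 2);
}
```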
The problems don't end there, unfortunately. Box<dyn Array> has huge space overhead.
Every Box<dyn Array> carries with it a DataType plus some type-specific metadata: this can easily add up to 100 bytes or more.
We've recently removed the heap overhead of DataType, but that's not nearly enough: it still takes 48 bytes of stack space (std::mem::size_of::<DataType>() = 48)!
Take e.g. a slice of uint32: std::mem::size_of::<arrow2::array::PrimitiveArray<u32>>() = 104.
That means if you're slicing a single uint32, you're paying 104 bytes of stack space to reference 4 bytes of data (a 26x overhead), and that's before we even take into account the cost of bucketing, timepoint metadata, etc.
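To see where those bytes go, here's a hypothetical struct mirroring the rough shape of arrow2's PrimitiveArray<u32>; the field sizes are illustrative approximations, not the real definitions:

```rust
use std::mem::size_of;

// Illustrative stand-ins, sized to roughly match the numbers above;
// these are NOT arrow2's actual definitions.
#[allow(dead_code)]
struct MockDataType([u8; 48]); // mirrors size_of::<DataType>() == 48

#[allow(dead_code)]
struct MockBuffer {
    // arrow2 buffers are refcounted; a raw pointer stands in here.
    ptr: *const u32,
    offset: usize,
    len: usize,
}

#[allow(dead_code)]
struct MockValidity(Option<Box<[u8]>>); // optional validity bitmap

#[allow(dead_code)]
struct MockPrimitiveArray {
    data_type: MockDataType,
    values: MockBuffer,
    validity: MockValidity,
}

fn main() {
    // The payload is 4 bytes...
    assert_eq!(size_of::<u32>(), 4);
    // ...while the array header alone dwarfs it, even in this trimmed-down mock.
    assert!(size_of::<MockPrimitiveArray>() >= 20 * size_of::<u32>());
    println!("mock header: {} bytes", size_of::<MockPrimitiveArray>());
}
```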
Either we need to change our approach for small slices (e.g. TimeSeriesScalar), or we need a more efficient slicing mechanism.
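One possible shape for the small-slice approach (purely hypothetical, not something Rerun implements): inline tiny scalar slices directly into the cell, and only fall back to an erased, heap-backed array for larger ones.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical: the common single-scalar case is stored inline, with
// no allocation and no ~100-byte array header. Arc<dyn Any> stands in
// for Arc<dyn Array>.
enum CellData {
    InlineU32(u32),
    Erased(Arc<dyn Any>),
}

fn main() {
    let small = CellData::InlineU32(42); // no heap, no DataType header
    let big = CellData::Erased(Arc::new(vec![1u32; 1024]));

    // The whole enum stays within a few machine words on the stack:
    assert!(std::mem::size_of::<CellData>() <= 24);

    match (&small, &big) {
        (CellData::InlineU32(v), CellData::Erased(_)) => assert_eq!(*v, 42),
        _ => unreachable!(),
    }
}
```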