All `arrow2` arrays are defined roughly as follows:
```rust
pub struct Array<T> {
    data_type: DataType,
    values: Buffer<T>,
    validity: Option<Bitmap>,
}
```
When you clone/slice/index an `Array`, you get another `Array` in roughly O(1), thanks to both the values buffer and the validity bitmap being refcounted behind the scenes:
```rust
pub struct Buffer<T> {
    data: Arc<Bytes<T>>,
    offset: usize,
    length: usize,
}

pub struct Bitmap {
    bytes: Arc<Bytes<u8>>,
    offset: usize,
    length: usize,
    unset_bits: usize,
}
```
Well... not really. It turns out the `DataType` is *not* refcounted, and it can get huge: it's a massive heap-recursive enum, potentially filled with strings and the like.
Say you have a `ListArray` that contains a bunch of `StructArray`s (i.e. a column of component data), and you want to extract references to the individual `StructArray`s in that list (i.e. the individual `DataCell`s): each of these arrays now carries a full copy of the `StructArray`'s schema.
For tiny `DataCell`s (which are very common in Rerun), the overhead is enormous.
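The cost difference can be sketched with a hypothetical mini `DataType` (a toy stand-in, not the real `arrow2` enum): cloning the enum deep-copies every field-name string and nested vector, while wrapping it in an `Arc` reduces the copy to a refcount bump.

```rust
use std::sync::Arc;

// Toy stand-in for a heap-recursive schema enum; the real arrow2
// DataType has many more variants, all similarly heap-heavy.
#[derive(Clone)]
enum DataType {
    UInt8,
    Struct(Vec<(String, DataType)>), // field names + child types on the heap
    List(Box<DataType>),
}

fn main() {
    // A list of structs with many named fields: the schema alone is sizeable.
    let fields: Vec<(String, DataType)> = (0..128)
        .map(|i| (format!("field_{i}"), DataType::UInt8))
        .collect();
    let schema = DataType::List(Box::new(DataType::Struct(fields)));

    // Without refcounting, every extracted child array deep-copies the whole
    // schema: all 128 field-name strings get reallocated, per clone.
    let per_cell_copy = schema.clone();
    drop(per_cell_copy);

    // Refcounted, the same "copy" is a pointer bump, just like Buffer/Bitmap.
    let shared = Arc::new(schema);
    let cheap = Arc::clone(&shared);
    assert_eq!(Arc::strong_count(&shared), 2);
    drop(cheap);
}
```

This is why the deep clone hurts most for tiny cells: the per-clone cost is proportional to the size of the schema, not the size of the data.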