Arrow Row Format

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**


I think this crate has pretty good stories for operating on individual columns, either by downcasting to a concrete type, or invoking a `dyn` kernel.

The stories for multi-column operations are substantially weaker, with patchy support for common operations such as sorts, groupings, aggregations, reassembly, etc. We have some pieces, such as `MutableArrayData` and `DynComparator`, but they are neither especially performant, since they make extensive use of dynamic dispatch at the row level, nor easy to use.
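To make the row-level dynamic dispatch concrete, here is a minimal std-only sketch of a multi-column sort driven by boxed per-column comparators. It is an illustration of the pattern, not the crate's actual implementation: `ColumnCmp`, `int_cmp`, `str_cmp` and `lexsort_indices` are hypothetical names, and plain slices stand in for arrays.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for a type-erased per-column comparator:
// it compares the values at two row indices of a single column.
type ColumnCmp<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

fn int_cmp(col: &[i64]) -> ColumnCmp<'_> {
    Box::new(move |i, j| col[i].cmp(&col[j]))
}

fn str_cmp<'a>(col: &'a [&'a str]) -> ColumnCmp<'a> {
    Box::new(move |i, j| col[i].cmp(col[j]))
}

// Sorts row indices lexicographically across all columns.
fn lexsort_indices(columns: &[ColumnCmp<'_>], len: usize) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..len).collect();
    indices.sort_by(|&i, &j| {
        // Every row comparison walks the columns and pays one
        // dynamically dispatched call per column it visits.
        for column_cmp in columns {
            match column_cmp(i, j) {
                Ordering::Equal => continue,
                other => return other,
            }
        }
        Ordering::Equal
    });
    indices
}

fn main() {
    let ids = [2i64, 1, 2, 1];
    let names = ["b", "z", "a", "c"];
    let columns = vec![int_cmp(&ids), str_cmp(&names)];
    let order = lexsort_indices(&columns, ids.len());
    println!("{:?}", order);
}
```

A sort performs on the order of `n log n` row comparisons, so the indirect calls in the inner loop add up quickly as the number of rows and columns grows.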

**Describe the solution you'd like**


Having a first-class row representation will not only allow us to implement more performant versions of existing kernels such as lexsort, but also give downstreams a pretty compelling primitive with which to implement more advanced operations such as streaming merges, joins, aggregates, etc. There is also precedent, with the C++ arrow library providing its own row format.

**Goals**

* Each row should be encoded as a single sequence of bytes
* Comparison of the byte arrays should be sufficient to establish ordering of the rows
* It should be possible to convert a selection of rows back to arrays
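As a rough illustration of these three goals, the sketch below encodes rows of nullable `i32` columns into byte sequences whose plain byte-wise comparison matches the row ordering, then decodes them back. The layout (a one-byte null marker followed by a sign-flipped big-endian value, with nulls sorting first) is an assumption made for this example, not the encoding this issue commits to, and all function names are hypothetical.

```rust
// Width of one encoded column: 1 marker byte + 4 value bytes.
const WIDTH: usize = 5;

// Appends one nullable i32 so that byte order matches value order.
fn encode_i32(value: Option<i32>, out: &mut Vec<u8>) {
    match value {
        // Assumption: nulls sort first. A 0 marker plus zero padding
        // keeps every encoded column the same width.
        None => out.extend_from_slice(&[0u8; WIDTH]),
        Some(v) => {
            out.push(1);
            // Flipping the sign bit maps signed order onto unsigned order;
            // big-endian puts the most significant byte first.
            out.extend_from_slice(&((v as u32) ^ 0x8000_0000).to_be_bytes());
        }
    }
}

fn decode_i32(bytes: &[u8]) -> Option<i32> {
    if bytes[0] == 0 {
        return None;
    }
    let raw = u32::from_be_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    Some((raw ^ 0x8000_0000) as i32)
}

// Each row becomes a single sequence of bytes.
fn encode_row(columns: &[Option<i32>]) -> Vec<u8> {
    let mut row = Vec::with_capacity(columns.len() * WIDTH);
    for value in columns {
        encode_i32(*value, &mut row);
    }
    row
}

// Converts an encoded row back to its column values.
fn decode_row(row: &[u8]) -> Vec<Option<i32>> {
    row.chunks_exact(WIDTH).map(decode_i32).collect()
}

fn main() {
    let a = [Some(1), Some(-5), None, Some(1)];
    let b = [Some(7), Some(3), Some(0), None];
    let mut rows: Vec<Vec<u8>> = (0..a.len())
        .map(|i| encode_row(&[a[i], b[i]]))
        .collect();
    // Sorting the byte sequences sorts the rows; no per-column dispatch.
    rows.sort();
    let decoded: Vec<Vec<Option<i32>>> = rows.iter().map(|r| decode_row(r)).collect();
    println!("{:?}", decoded);
}
```

Variable-length types such as strings would need a different scheme to stay byte-comparable, which this sketch does not attempt.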

**Non-Goals**

* Support introspection or mutation of the row values
* Provide a stable encoding for FFI, IO, etc.
* Provide an "optimal" encoding; the aim is rather a reasonable out-of-the-box baseline for common use-cases

**Describe alternatives you've considered**


We could extend the row format in DataFusion; however, this would limit its benefits to DataFusion. I think a row-oriented representation is such a fundamental primitive that it makes sense to include it in arrow-rs, so that it can be used both in its kernels and by downstreams that don't make use of DataFusion.

**Additional context**

Arrow Row Format #2677
