String-based path column projection #182

@alamb

Description

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11618

There is currently no way to select a column by its path, e.g. 'a.b.c'. We have to select the column by its index, which is not trivial for nested structures.

For example, consider a record with the following schema. Column paths are shown in parentheses and leaf column indices in square brackets:

{code}
schema:
  a [struct] ("a")
    b [struct] ("a.b")
      c [int32] ("a.b.c") [0]
      d [struct] ("a.b.d")
        e [int32] ("a.b.d.e") [1]
        f [bool] ("a.b.d.f") [2]
      g [int64] ("a.b.g") [3]
{code}

If one wants to select 'a.b.d', they need to know that it spans two leaf columns (indices 1 and 2). Working this out by hand is inconvenient, and it may push readers to read whole records just to avoid computing the indices.

A string-based projection could allow one to select columns 1 and 2 via "a.b.d", or column 3 via "a.b.g".
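To make the idea concrete, here is a minimal sketch of how a dotted path could be resolved to leaf column indices. It assumes the leaf column paths are already available as a flat list in leaf-index order. The function name `leaf_indices_for_path` is hypothetical and is not an existing parquet crate API.

```rust
/// Return the indices of all leaf columns selected by `path`.
///
/// A leaf is selected when its dotted path equals `path` or lies beneath it.
/// Hypothetical helper for illustration only; not part of the parquet crate.
fn leaf_indices_for_path(leaf_paths: &[&str], path: &str) -> Vec<usize> {
    leaf_paths
        .iter()
        .enumerate()
        .filter(|(_, leaf)| {
            // Match on whole path segments, so "a.b" selects "a.b.c"
            // but would not select a sibling named "a.bc".
            **leaf == path
                || leaf
                    .strip_prefix(path)
                    .map_or(false, |rest| rest.starts_with('.'))
        })
        .map(|(idx, _)| idx)
        .collect()
}
```

With the leaf paths of the example schema (`["a.b.c", "a.b.d.e", "a.b.d.f", "a.b.g"]`), "a.b.d" resolves to indices 1 and 2, "a.b.g" to index 3, and "a.b" to all four leaves. The resulting indices could then be passed to the existing index-based projection.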

Labels: enhancement (any new improvement worthy of an entry in the changelog), parquet (changes to the parquet crate)
