Add NameMapping to Spec Document #3542

@RussellSpitzer

Currently, the column projection section of the spec contains this text:

Column Projection
Columns in Iceberg data files are selected by field id. The table schema’s column names and order may change after a data file is written, and projection must be done using field ids. If a field id is missing from a data file, its value for each row should be null.

For example, a file may be written with schema 1: a int, 2: b string, 3: c double and read using projection schema 3: measurement, 2: name, 4: a. This must select file columns c (renamed to measurement), b (now called name), and a column of null values called a; in that order.
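To make the quoted example concrete, here is a minimal sketch of projection by field id. The function name `project_by_field_id` and the tuple-based schema representation are hypothetical, chosen only for illustration; this is not how any engine actually implements it.

```python
from typing import Any, Dict, List, Optional, Tuple

# (field_id, column_name) pairs; a hypothetical, simplified schema representation.
Schema = List[Tuple[int, str]]


def project_by_field_id(
    file_schema: Schema,
    rows: List[Tuple[Any, ...]],
    projection: Schema,
) -> List[Dict[str, Optional[Any]]]:
    """Select file columns by field id, using the names and order of the projection."""
    # Map each field id in the file to its column position.
    position = {field_id: i for i, (field_id, _) in enumerate(file_schema)}
    projected = []
    for row in rows:
        projected.append(
            {
                # A field id missing from the file yields null (None) for every row.
                name: (row[position[field_id]] if field_id in position else None)
                for field_id, name in projection
            }
        )
    return projected


# File written with schema 1: a int, 2: b string, 3: c double
file_schema = [(1, "a"), (2, "b"), (3, "c")]
rows = [(1, "x", 1.5), (2, "y", 2.5)]

# Read with projection schema 3: measurement, 2: name, 4: a
projection = [(3, "measurement"), (2, "name"), (4, "a")]

projected = project_by_field_id(file_schema, rows, projection)
```

Note that the projected column `a` has field id 4, so it does not pick up the file's column `a` (field id 1); it is filled with nulls instead.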

While technically true, this seems to miss the well-supported concept of NameMappings, which already exist in several engines and are explicitly used in the Snapshot, Migrate, and AddFiles actions. I suggest we fully document NameMapping in the spec so there are fewer surprises when moving from one engine to another. Something like:

...

Files without field ids, like those imported from another system, will have ids assigned to them based on the table's NameMapping. The name mapping should be a JSON map of column names to field ids, stored in the table properties under the key schema.name-mapping.default. This mapping is only used on files where field ids have not been set.
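As a rough sketch of the proposed behavior, the snippet below assigns ids from a name mapping only when the file carries no field ids. It assumes a flat JSON map of column name to field id, as the proposed text describes; the format engines actually serialize may differ. The function name `assign_field_ids` and the `(field_id, name)` representation are hypothetical.

```python
import json
from typing import Dict, List, Optional, Tuple

NAME_MAPPING_KEY = "schema.name-mapping.default"

# (field_id or None, column_name) pairs; a hypothetical, simplified file schema.
FileFields = List[Tuple[Optional[int], str]]


def assign_field_ids(
    file_fields: FileFields, table_properties: Dict[str, str]
) -> FileFields:
    """Assign ids from the table's name mapping to a file that has no field ids."""
    # The mapping is only used on files where field ids have not been set.
    if any(field_id is not None for field_id, _ in file_fields):
        return file_fields

    mapping_json = table_properties.get(NAME_MAPPING_KEY)
    if mapping_json is None:
        return file_fields

    # Assumption: a flat JSON object of column name -> field id.
    mapping = json.loads(mapping_json)
    # Columns absent from the mapping keep a None id in this sketch.
    return [(mapping.get(name), name) for _, name in file_fields]


table_properties = {NAME_MAPPING_KEY: json.dumps({"a": 1, "b": 2, "c": 3})}

# A file imported from another system, with no field ids.
imported = [(None, "a"), (None, "b"), (None, "extra")]
resolved = assign_field_ids(imported, table_properties)

# A file that already has field ids is left untouched.
native = [(7, "a"), (8, "b")]
unchanged = assign_field_ids(native, table_properties)
```

Once ids are assigned this way, the file can be read with the same field-id projection rules as any other data file.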
