Add NameMapping to Spec Document

Currently, the column projection section contains this text:

> Column Projection
> Columns in Iceberg data files are selected by field id. The table schema’s column names and order may change after a data file is written, and projection must be done using field ids. If a field id is missing from a data file, its value for each row should be null.
> 
> For example, a file may be written with schema 1: a int, 2: b string, 3: c double and read using projection schema 3: measurement, 2: name, 4: a. This must select file columns c (renamed to measurement), b (now called name), and a column of null values called a; in that order.

While technically true, this seems to miss the well-supported concept of NameMapping, which already exists in several engines and is explicitly used in the Snapshot, Migrate, and AddFiles actions. I suggest we fully document NameMapping in the spec so we have fewer surprises when moving from one engine to another. Something like:

> ...
> 
> Files without field ids, like those imported from another system, will have ids assigned to them based on the table's NameMapping. The name mapping should be a JSON map of column names to field ids stored in the table properties under the key `schema.name-mapping.default`. This mapping is only used on files where field ids have not been set.
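
To make the proposed behavior concrete, here is a minimal Python sketch of how a reader might combine the two rules: assign ids to an id-less file from the name mapping, then project by field id. The flat name-to-id JSON shape follows the wording proposed above and is an assumption; the serialized format that engines actually use may differ. The helper names `assign_field_ids` and `project` are hypothetical and do not correspond to any Iceberg API.

```python
import json

# Assumed shape: a flat JSON map of column name -> field id, as the proposed
# spec text describes. This would be the value of the table property
# `schema.name-mapping.default`.
name_mapping = json.loads('{"a": 1, "b": 2, "c": 3}')


def assign_field_ids(file_columns, mapping):
    """Return {field_id: column_name} for a file whose columns carry no ids.

    Columns that are absent from the mapping get no id and are skipped.
    """
    return {mapping[name]: name for name in file_columns if name in mapping}


def project(file_ids, projection):
    """Resolve each (field_id, name) pair to a file column, or None if missing.

    The result preserves the order of the projection schema.
    """
    return [(name, file_ids.get(field_id)) for field_id, name in projection]


# A file imported from another system: column names only, no field ids.
file_ids = assign_field_ids(["a", "b", "c"], name_mapping)

# Projection schema from the spec example: 3: measurement, 2: name, 4: a.
projection = [(3, "measurement"), (2, "name"), (4, "a")]
result = project(file_ids, projection)
print(result)  # [('measurement', 'c'), ('name', 'b'), ('a', None)]
```

With the mapping applied, the imported file resolves the same way as the spec example: `c` is read as `measurement`, `b` as `name`, and field id 4 has no matching column, so `a` is filled with nulls.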

Add NameMapping to Spec Document #3542
