Description
Currently, the Column Projection section of the spec contains this text:
Column Projection
Columns in Iceberg data files are selected by field id. The table schema’s column names and order may change after a data file is written, and projection must be done using field ids. If a field id is missing from a data file, its value for each row should be null. For example, a file may be written with schema 1: a int, 2: b string, 3: c double and read using projection schema 3: measurement, 2: name, 4: a. This must select file columns c (renamed to measurement), b (now called name), and a column of null values called a; in that order.
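The example in the quoted text can be sketched in a few lines of Python. This is only an illustration of projection by field id, not how any engine implements it; the names `FILE_SCHEMA`, `PROJECTION`, and `project_row`, and the sample row values, are made up for this sketch.

```python
# Schemas are lists of (field id, column name) pairs. Types are omitted.
FILE_SCHEMA = [(1, "a"), (2, "b"), (3, "c")]               # schema the file was written with
PROJECTION = [(3, "measurement"), (2, "name"), (4, "a")]   # schema used to read it


def project_row(file_schema, row, projection):
    """Select values by field id, in projection order; missing ids become None."""
    # Index the file's values by field id, ignoring the names stored in the file.
    by_id = {field_id: value for (field_id, _), value in zip(file_schema, row)}
    # Look up each projected id; an id that is absent from the file reads as null.
    return {name: by_id.get(field_id) for field_id, name in projection}


row = (7, "sensor-1", 0.5)  # hypothetical values for file columns a, b, c
projected = project_row(FILE_SCHEMA, row, PROJECTION)
print(projected)  # {'measurement': 0.5, 'name': 'sensor-1', 'a': None}
```

Note that the projected column `a` (id 4) is null even though the file has a column named `a` (id 1), because matching is done by id and never by name.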
Although this text is technically correct, it omits the well-supported concept of NameMappings, which already exist in several engines and are used explicitly by the Snapshot, Migrate and AddFiles actions. I suggest we fully document NameMapping in the spec so that there are fewer surprises when moving from one engine to another. Something like:
...
Files without field ids, like those imported from another system, will have ids assigned to them based on the table's NameMapping. The name mapping should be a JSON map of column names to field ids, stored in the table properties under the key schema.name-mapping.default. This mapping is only used on files where field ids have not been set.
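A rough Python sketch of the fallback described in the proposed wording follows. It assumes the flat name-to-id JSON map suggested above, which is an assumption of this sketch; the serialized format that engines actually store under this property may be structured differently. The function `assign_field_ids` and the sample properties are hypothetical.

```python
import json

NAME_MAPPING_KEY = "schema.name-mapping.default"

# Hypothetical table properties holding the mapping as a JSON string.
table_properties = {NAME_MAPPING_KEY: json.dumps({"a": 1, "b": 2, "c": 3})}


def assign_field_ids(file_columns, file_field_ids, properties):
    """Return {column name: field id} for one data file.

    file_field_ids is None when the file was written without field ids,
    for example when it was imported from another system.
    """
    if file_field_ids is not None:
        # Ids stored in the file always win; the mapping is not consulted.
        return dict(zip(file_columns, file_field_ids))
    mapping = json.loads(properties.get(NAME_MAPPING_KEY, "{}"))
    # Columns with no entry in the mapping get no id (None) in this sketch.
    return {name: mapping.get(name) for name in file_columns}


imported = assign_field_ids(["a", "b", "x"], None, table_properties)
native = assign_field_ids(["a", "b"], [10, 20], table_properties)
```

Here `imported` takes its ids from the table property, while `native` keeps the ids written in the file. How to treat a column that has neither a stored id nor a mapping entry is left open in this sketch and would need to be defined by the spec.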