
GH-452: Clarify use of RowGroup.ordinal field #453

Merged: ggershinsky merged 2 commits into apache:master from ggershinsky:gh452 on Sep 25, 2024

@ggershinsky (Contributor)

Encrypted Parquet files use three types of ordinals: row group, column, and page. All three are simple local counters in both writers and readers. In addition, the row group ordinal is stored in the Parquet footer (the RowGroup.ordinal field). Parquet implementers would benefit from a clarification of why this field exists and how it is intended to be used.
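To make the three counters concrete, here is a minimal Python sketch of a writer walking row groups, columns and pages. It assumes a hypothetical input shape (per row group, a list of data page counts per column) and a hypothetical RowGroupMeta class standing in for the footer's RowGroup struct. Only the row group counter is persisted, mirroring RowGroup.ordinal; the 32767 limit assumes that field is a Thrift i16.

```python
from dataclasses import dataclass
from typing import List, Tuple

I16_MAX = 32767  # assumption: RowGroup.ordinal is a signed 16-bit (Thrift i16) field


@dataclass
class RowGroupMeta:
    # Hypothetical stand-in for the footer's RowGroup struct;
    # only the ordinal field is modeled here.
    ordinal: int


def assign_ordinals(
    pages_per_column: List[List[int]],
) -> Tuple[List[Tuple[int, int, int]], List[RowGroupMeta]]:
    """Return (row group, column, page) ordinals for every data page,
    plus the footer entries that record each row group ordinal.

    pages_per_column[rg][col] is the number of data pages in that
    column chunk (a hypothetical input shape for this sketch).
    """
    ordinals: List[Tuple[int, int, int]] = []
    footer: List[RowGroupMeta] = []
    # Each counter is local: it restarts at 0 inside its enclosing scope.
    for rg_ordinal, columns in enumerate(pages_per_column):
        if rg_ordinal > I16_MAX:
            raise ValueError("row group ordinal does not fit in an i16")
        for col_ordinal, num_pages in enumerate(columns):
            for page_ordinal in range(num_pages):
                ordinals.append((rg_ordinal, col_ordinal, page_ordinal))
        # Only the row group ordinal is stored in the footer.
        footer.append(RowGroupMeta(ordinal=rg_ordinal))
    return ordinals, footer


ordinals, footer = assign_ordinals([[2, 1], [1, 1]])
print(ordinals)  # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
print([rg.ordinal for rg in footer])  # [0, 1]
```

A reader that walks the same structure in the same order reproduces the same counters locally.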

@ggershinsky (Contributor, Author)

cc @mapleFU @pitrou

@mapleFU (Member) left a comment

Just curious:

  1. If multiple files are merged (or combined in some other way), is the row group ordinal carried over unchanged, or should it be rewritten?
  2. Is this field only required when an AAD suffix is used?

Also cc @alamb

@ggershinsky (Contributor, Author)

> Just curious:
>
> 1. If multiple files are merged (or combined in some other way), is the row group ordinal carried over unchanged, or should it be rewritten?

Each encrypted Parquet file has a unique file ID, which is used to authenticate every module of the file (to ensure, for example, that modules are not swapped). Also, each file typically has its own encryption key. Therefore, a merged file needs a new file ID, new row group ordinals, and a new key, and each module must be re-encrypted with the new key and AAD.
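The merge requirements above can be sketched as a plan in Python. This is a hypothetical helper, not part of any Parquet library: it draws a fresh random file unique ID and a fresh key, and renumbers the input row groups sequentially in the merged file. It assumes an 8-byte file unique ID and a 128-bit AES key; the actual re-encryption of each module with the new key and AAD is out of scope here.

```python
import os
from dataclasses import dataclass
from typing import List, Tuple

I16_MAX = 32767  # assumption: row group ordinals must fit in a signed 16-bit field


@dataclass
class MergePlan:
    file_unique_id: bytes  # fresh ID for the merged file (assumed 8 bytes)
    key: bytes  # fresh data key (assumed 128-bit AES key)
    # (source file index, source row group ordinal, new row group ordinal)
    row_group_map: List[Tuple[int, int, int]]


def plan_merge(row_groups_per_file: List[int]) -> MergePlan:
    """Plan a merge of encrypted files; row_groups_per_file[i] is the
    number of row groups in input file i (hypothetical input shape)."""
    mapping: List[Tuple[int, int, int]] = []
    new_ordinal = 0
    for file_idx, count in enumerate(row_groups_per_file):
        for src_ordinal in range(count):
            # Source ordinals are not reused: the merged file gets its own sequence.
            mapping.append((file_idx, src_ordinal, new_ordinal))
            new_ordinal += 1
    # The highest assigned ordinal is new_ordinal - 1.
    if new_ordinal - 1 > I16_MAX:
        raise ValueError("too many row groups for a 16-bit ordinal")
    # A new file ID and key mean every module must be re-encrypted,
    # since its AAD and key differ from those used in the source file.
    return MergePlan(os.urandom(8), os.urandom(16), mapping)


plan = plan_merge([2, 1])
print(plan.row_group_map)  # [(0, 0, 0), (0, 1, 1), (1, 0, 2)]
```

Keeping the plan separate from the encryption step makes it easy to check that no source ordinal or file ID leaks into the merged file before any bytes are rewritten.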

> 2. Is this field only required when an AAD suffix is used?

The row group ordinal is part of the AAD suffix of most modules.
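To illustrate where the row group ordinal sits, here is a minimal Python sketch of the module AAD suffix as we read the Parquet modular encryption specification: the file unique ID, a one-byte module type, then the row group and column ordinals as 2-byte little-endian values, plus a page ordinal for data pages and data page headers only. The footer suffix carries no ordinals, which is why the row group ordinal appears in most modules rather than all. The function name and the all-zero ID are illustrative assumptions; check the byte layout against the specification before relying on it.

```python
import struct
from enum import IntEnum
from typing import Optional


class ModuleType(IntEnum):
    # Module type codes as listed in the Parquet modular encryption spec.
    FOOTER = 0
    COLUMN_META_DATA = 1
    DATA_PAGE = 2
    DICTIONARY_PAGE = 3
    DATA_PAGE_HEADER = 4
    DICTIONARY_PAGE_HEADER = 5
    COLUMN_INDEX = 6
    OFFSET_INDEX = 7
    BLOOM_FILTER_HEADER = 8
    BLOOM_FILTER_BITSET = 9


# Only data pages and their headers carry a page ordinal.
PAGE_MODULES = {ModuleType.DATA_PAGE, ModuleType.DATA_PAGE_HEADER}


def module_aad_suffix(
    file_unique_id: bytes,
    module_type: ModuleType,
    row_group_ordinal: Optional[int] = None,
    column_ordinal: Optional[int] = None,
    page_ordinal: Optional[int] = None,
) -> bytes:
    """Build the module AAD suffix (hypothetical helper)."""
    aad = file_unique_id + bytes([int(module_type)])
    if module_type == ModuleType.FOOTER:
        return aad  # the footer suffix has no ordinals
    if row_group_ordinal is None or column_ordinal is None:
        raise ValueError("row group and column ordinals are required")
    # "<h" = 2-byte little-endian signed short; struct.error if out of range.
    aad += struct.pack("<hh", row_group_ordinal, column_ordinal)
    if module_type in PAGE_MODULES:
        if page_ordinal is None:
            raise ValueError("page ordinal is required for data pages")
        aad += struct.pack("<h", page_ordinal)
    return aad


fid = bytes(8)  # all-zero ID, for illustration only
print(module_aad_suffix(fid, ModuleType.DATA_PAGE, 1, 2, 3).hex())
# 000000000000000002010002000300
```

Because the file ID and row group ordinal are both bound into the AAD, a module copied into another file, or into a different row group of the same file, fails authentication on decryption.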
