[Proposal]: Allow choosing the encoding for Parquet columns #1761

@norberttech

Describe the Proposal

The Parquet library currently supports the following encodings:

  • Plain
  • RLEDictionary
  • Delta (work in progress)

But more will come.

Right now Plain is used for everything, RLEDictionary is unused, and Delta is going to be used for Int32 and Int64.

However, it would be good to let the end user decide which encoding to use for which type of column.

API Adjustments

We need to add a new option, ColumnsEncodings, that accepts an array<string, string> where the key is a column and the value is an Encoding enum.

Whenever this option is provided, it should override the default encodings.
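
The override behaviour described above could look roughly like the following. This is only a language-neutral sketch written in Python, not the library's actual API: `Encoding`, `default_encoding`, `resolve_encoding`, the `columns_encodings` parameter, and the physical type strings are all hypothetical names chosen for illustration, and the Delta default for Int32/Int64 reflects the planned behaviour mentioned earlier rather than anything already shipped.

```python
from enum import Enum
from typing import Mapping, Optional


class Encoding(Enum):
    # Hypothetical enum mirroring the encodings listed in this proposal.
    PLAIN = "PLAIN"
    RLE_DICTIONARY = "RLE_DICTIONARY"
    DELTA = "DELTA"


def default_encoding(physical_type: str) -> Encoding:
    """Assumed defaults: Delta for Int32/Int64 (planned), Plain for everything else."""
    if physical_type in ("INT32", "INT64"):
        return Encoding.DELTA
    return Encoding.PLAIN


def resolve_encoding(
    column: str,
    physical_type: str,
    columns_encodings: Optional[Mapping[str, str]] = None,
) -> Encoding:
    """Return the user-supplied encoding for a column, else the default for its type."""
    if columns_encodings and column in columns_encodings:
        # Encoding(...) raises ValueError for a name that is not a known encoding.
        return Encoding(columns_encodings[column])
    return default_encoding(physical_type)


# Usage: the option maps column -> encoding and wins over the defaults.
overrides = {"user_id": "PLAIN", "country": "RLE_DICTIONARY"}
for name, ptype in [("user_id", "INT64"), ("country", "BYTE_ARRAY"), ("age", "INT32")]:
    print(name, resolve_encoding(name, ptype, overrides).value)
```

Columns missing from the map fall back to the default for their type, so the option stays optional. The sketch accepts any known encoding for any column; whether a chosen encoding should also be validated against the column type (for example, Delta on a string column) is left open here.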

Are you intending to also work on the proposed change?

Yes

Are you interested in sponsoring this change?

None

Integration & Dependencies

No response

Status

Done
