Commit 4130e1f

feat: Add Parquet format max_row_group_length option (#19583)
#### Summary

The support for setting the row group length was added in #19578 implicitly via cloudquery/filetypes#589. This PR adds documentation for it.
1 parent da0440a commit 4130e1f

10 files changed: 24 additions, 0 deletions

plugins/destination/azblob/docs/_configuration.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ spec:
 # Parquet specific parameters:
 # version: "v2Latest"
 # root_repetition: "repeated"
+# max_row_group_length: 134217728 # 128 * 1024 * 1024
 
 # Optional parameters
 # compression: "" # options: gzip

plugins/destination/azblob/docs/overview.md

Lines changed: 4 additions & 0 deletions
@@ -99,3 +99,7 @@ Reserved for future use.
 [Repetition option to use for the root node](https://github.com/apache/arrow/issues/20243). Supported values are `undefined`, `required`, `optional` and `repeated`.
 
 Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into [Snowflake](https://www.snowflake.com/en/) requires the root repetition to be `undefined`.
+
+- `max_row_group_length` (`integer`) (optional) (default: `134217728` (= 128 * 1024 * 1024))
+
+The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.

plugins/destination/file/docs/_configuration.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ spec:
 # Parquet specific parameters:
 # version: "v2Latest"
 # root_repetition: "repeated"
+# max_row_group_length: 134217728 # 128 * 1024 * 1024
 # compression: "" # options: gzip
 # no_rotate: false
 # batch_size: 10000

plugins/destination/file/docs/overview.md

Lines changed: 4 additions & 0 deletions
@@ -96,3 +96,7 @@ Reserved for future use.
 [Repetition option to use for the root node](https://github.com/apache/arrow/issues/20243). Supported values are `undefined`, `required`, `optional` and `repeated`.
 
 Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into [Snowflake](https://www.snowflake.com/en/) requires the root repetition to be `undefined`.
+
+- `max_row_group_length` (`integer`) (optional) (default: `134217728` (= 128 * 1024 * 1024))
+
+The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.

plugins/destination/gcs/docs/_configuration.md

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ spec:
 # Parquet specific parameters:
 # version: "v2Latest"
 # root_repetition: "repeated"
+# max_row_group_length: 134217728 # 128 * 1024 * 1024
 
 # Optional parameters
 # compression: "" # options: gzip

plugins/destination/gcs/docs/overview.md

Lines changed: 4 additions & 0 deletions
@@ -104,6 +104,10 @@ Reserved for future use.
 
 Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into [Snowflake](https://www.snowflake.com/en/) requires the root repetition to be `undefined`.
 
+- `max_row_group_length` (`integer`) (optional) (default: `134217728` (= 128 * 1024 * 1024))
+
+The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.
+
 ## Authentication
 
 :authentication

plugins/destination/kafka/docs/_configuration.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ spec:
 # Parquet specific parameters:
 # version: "v2Latest"
 # root_repetition: "repeated"
+# max_row_group_length: 134217728 # 128 * 1024 * 1024
 
 # Optional parameters
 # compression: "" # options: gzip

plugins/destination/kafka/docs/overview.md

Lines changed: 3 additions & 0 deletions
@@ -89,6 +89,9 @@ Reserved for future use.
 
 Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into [Snowflake](https://www.snowflake.com/en/) requires the root repetition to be `undefined`.
 
+- `max_row_group_length` (`integer`) (optional) (default: `134217728` (= 128 * 1024 * 1024))
+
+The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.
 
 ### topic_details
 

plugins/destination/s3/docs/_configuration.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ spec:
 # Parquet specific parameters:
 # version: "v2Latest"
 # root_repetition: "repeated"
+# max_row_group_length: 134217728 # 128 * 1024 * 1024
 
 # Optional parameters
 # compression: "" # options: gzip

plugins/destination/s3/docs/overview.md

Lines changed: 4 additions & 0 deletions
@@ -147,6 +147,10 @@ Reserved for future use.
 
 Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into [Snowflake](https://www.snowflake.com/en/) requires the root repetition to be `undefined`.
 
+- `max_row_group_length` (`integer`) (optional) (default: `134217728` (= 128 * 1024 * 1024))
+
+The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.
+
 ### server_side_encryption_configuration
 
 - `sse_kms_key_id` (`string`) (required)
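
To get a feel for the `max_row_group_length` trade-off documented in this commit, the following is a rough sketch, not part of the plugins, that estimates how many row groups a file of a given row count would need at least. The helper name `row_group_count` is hypothetical, and the estimate assumes the writer only starts a new row group when the limit is reached; a real writer may flush row groups earlier, so treat the result as a lower bound.

```python
# The default documented in this commit: 128 * 1024 * 1024 rows.
DEFAULT_MAX_ROW_GROUP_LENGTH = 128 * 1024 * 1024


def row_group_count(total_rows: int,
                    max_row_group_length: int = DEFAULT_MAX_ROW_GROUP_LENGTH) -> int:
    """Return the minimum number of row groups needed for total_rows rows.

    Hypothetical helper for illustration only. It assumes a row group is
    closed only when it holds max_row_group_length rows.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if max_row_group_length <= 0:
        raise ValueError("max_row_group_length must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return (total_rows + max_row_group_length - 1) // max_row_group_length


if __name__ == "__main__":
    rows = 1_000_000
    for limit in (DEFAULT_MAX_ROW_GROUP_LENGTH, 100_000, 10_000):
        print(f"max_row_group_length={limit}: at least {row_group_count(rows, limit)} row group(s)")
```

Under these assumptions, a one-million-row table fits in a single row group at the default, so a reader that loads whole row groups has to hold all of it at once. Lowering the limit to `100000` splits the same table into at least ten smaller groups, which is the memory-versus-efficiency trade-off the option description refers to.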
