Fix Parquet writer to produce evenly-sized row groups #7734
marin-ma wants to merge 1 commit into facebookincubator:main
Conversation
With this patch, the row groups generated in the output file are as follows:
@mbasmanova Could you help review? Thanks!
mbasmanova left a comment:
@marin-ma Rong, thank you for the improvement, but would you explain a bit more the new logic? It will make it easier to review.
@majetideepak Deepak, would you help review this PR?
Thank you! I've included additional explanations in the PR description for clarity.
> This patch ensures the exact number of rows for each row group is flushed if possible.

@marin-ma Can you also explain the motivation for the above requirement? It should be fine if each row group in a Parquet file has a slightly different number of rows. I assume a batch to write is usually much smaller than the row-group row count. Is this assumption invalid in your case?
@majetideepak Thanks. The row-group configuration of the Parquet file produced without this modification was posted in the issue; apologies for any confusion. The key point here is that the Velox Parquet writer delays writing until the input reaches or exceeds the configured row-group size. However, when it flushes the cached RecordBatches through the Arrow Parquet writer and the total row count is larger than the configured row count, the data is divided into two row groups: the first matches the configured row count, while the second comprises the remaining data. This often results in the second row group being significantly smaller than the first.
I think it's not our case. We set the row-group row count to an expected value, and the row-group size to a very large value, to make sure the row-count threshold is reached first. Therefore, the total rows of the cached RecordBatches before flush is always >= the configured row-group row count. But each RecordBatch is small; in our case we set the
@marin-ma Thank you for the clarification. I missed the description in the linked issue.
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!
When setting `parquet::DefaultFlushPolicy::rowsInRowGroup_`, the Velox Parquet writer might not produce consistent row groups in the output file. This patch ensures the exact number of rows for each row group is flushed if possible.

This PR makes the following changes:

When a new call to `write(data)` is made, it first flushes the cached batches if the staging rows or bytes meet the flush condition. Whether flushed or not, it then performs the following operations on the input data:

- If the data does not reach `parquet::DefaultFlushPolicy::rowsInRowGroup_`, cache the data and return.
- If the data exceeds `parquet::DefaultFlushPolicy::rowsInRowGroup_`, slice the data in a loop so that each block has exactly `rowsInRowGroup_` rows, until the remaining data <= `rowsInRowGroup_`. After slicing, the data will be like:

  `| stagingRows + sliced data1 | sliced data2 | ... | (remaining data)`

  Except for the remaining data, all blocks are flushed before the next slice. Cache the remaining data and return.
Issue: #7733