GH-39789: [Go][Parquet] Close current row group when finished writing unbuffered batch by joellubi · Pull Request #43326 · apache/arrow

joellubi · 2024-07-18T21:18:46Z

Rationale for this change

The number of bytes reported by FileWriter.RowGroupTotalBytesWritten() was consistently lower than the actual number of bytes in the output buffer when it was read before closing the writer. The last column's data page was not flushed until the entire writer was closed, so its bytes were not included in the total. By closing the row group writer before returning from Write(), we can ensure all pages are flushed and that totalBytesWritten is accurate.
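To make the failure mode concrete, here is a deliberately simplified toy model of the behavior described above. It does not use or mirror the Arrow Parquet internals: `toyWriter`, `closeOnWrite`, and `demo` are hypothetical names invented for this sketch. The assumption is only that a page for one column is flushed when the next column starts, so the last column's page stays pending until something closes the row group.

```go
package main

import (
	"bytes"
	"fmt"
)

// toyWriter is a hypothetical stand-in for a file writer. It is NOT the
// Arrow implementation; it only models "the last page is flushed late".
type toyWriter struct {
	sink         bytes.Buffer // output buffer
	pending      []byte       // page of the most recent column, not yet flushed
	totalWritten int64        // bytes counted so far
	closeOnWrite bool         // true models the behavior after this change
}

// flush writes the pending page to the sink and counts its bytes.
func (w *toyWriter) flush() {
	n, _ := w.sink.Write(w.pending) // bytes.Buffer.Write always returns a nil error
	w.totalWritten += int64(n)
	w.pending = nil
}

// Write writes one batch with one page per column. Starting a new column
// flushes the previous column's page.
func (w *toyWriter) Write(columns [][]byte) {
	for _, page := range columns {
		w.flush()
		w.pending = append([]byte(nil), page...)
	}
	if w.closeOnWrite {
		w.flush() // models closing the row group before returning
	}
}

// Close flushes whatever is still pending.
func (w *toyWriter) Close() { w.flush() }

// demo returns the count observed right after Write and the final output size.
func demo(closeOnWrite bool) (reported int64, final int) {
	w := &toyWriter{closeOnWrite: closeOnWrite}
	w.Write([][]byte{[]byte("aaaa"), []byte("bbbbbb")})
	reported = w.totalWritten
	w.Close()
	return reported, w.sink.Len()
}

func main() {
	for _, fixed := range []bool{false, true} {
		reported, final := demo(fixed)
		fmt.Printf("closeOnWrite=%v reported=%d final=%d\n", fixed, reported, final)
	}
	// Output:
	// closeOnWrite=false reported=4 final=10
	// closeOnWrite=true reported=10 final=10
}
```

In the toy model, the count read after Write() misses the 6 bytes of the last column until the writer is closed. Flushing before returning from Write() makes the two numbers agree.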

What changes are included in this PR?

Close row group writer before returning from FileWriter.Write()
Test to ensure stats are up to date before closing the writer

Are these changes tested?

Yes

Are there any user-facing changes?

FileWriter.RowGroupTotalBytesWritten() will be accurate when read while still writing to the file.

GitHub Issue: [Go][Parquet] Potential inconsistency between TotalBytesWritten tracked by RowGroupWriter and actual bytes written to io.Writer #39789

github-actions · 2024-07-18T21:19:12Z

⚠️ GitHub issue #39789 has been automatically assigned in GitHub to PR creator.

conbench-apache-arrow · 2024-07-19T15:05:54Z

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit ed67a42.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

…rgetSize on ingestion (#2026)

Fixes: #1997

**Core Changes**
- Change ingestion `writeParquet` function to use unbuffered writer, skipping 0-row records to avoid recurrence of #1847
- Use parquet writer's internal `RowGroupTotalBytesWritten()` method to track output file size in favor of `limitWriter`
- Unit test to validate that file cutoff occurs precisely when expected

**Secondary Changes**
- Bump arrow dependency to `v18` to pull in the changes from [ARROW-43326](apache/arrow#43326)
- Fix flightsql test that depends on hardcoded arrow version
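
For readers unfamiliar with size-based file cutoff, the idea referenced in the commit message above can be sketched roughly as follows. This is not the arrow-adbc `writeParquet` code: `splitBySize` is a hypothetical helper, the batch sizes stand in for whatever a running byte counter would report, and the cutoff rule (close a file once the running total reaches the target) is an assumption made for illustration.

```go
package main

import "fmt"

// splitBySize groups batch sizes into files. A file is closed as soon as
// its running byte total reaches target. Zero-size batches are skipped,
// loosely mirroring "skipping 0-row records". Hypothetical policy, not the
// actual ingestion logic.
func splitBySize(batchSizes []int64, target int64) [][]int64 {
	var files [][]int64
	var current []int64
	var written int64
	for _, size := range batchSizes {
		if size == 0 {
			continue // nothing to write for an empty batch
		}
		current = append(current, size)
		written += size
		if written >= target {
			// Target reached: cut the file here and start a new one.
			files = append(files, current)
			current, written = nil, 0
		}
	}
	if len(current) > 0 {
		files = append(files, current) // trailing partial file
	}
	return files
}

func main() {
	fmt.Println(splitBySize([]int64{4, 0, 3, 5, 2}, 7))
	// Output: [[4 3] [5 2]]
}
```

A cutoff like this is only as precise as the byte counter it reads, which is why an accurate count while the file is still being written matters here.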

close row group when finished writing

2ba04ec

github-actions bot added the Component: Go and awaiting committer review labels Jul 18, 2024

joellubi mentioned this pull request Jul 18, 2024

go/adbc/driver/flightsql: Default Value (10 MB) For adbc.snowflake.rpc.ingest_target_file_size Not Used In 1.1.0 apache/arrow-adbc#1997

Closed

joellubi marked this pull request as ready for review July 18, 2024 21:38

joellubi requested a review from zeroshade as a code owner July 18, 2024 21:38

zeroshade approved these changes Jul 19, 2024

github-actions bot added the awaiting merge label and removed the awaiting committer review label Jul 19, 2024

zeroshade merged commit ed67a42 into apache:main Jul 19, 2024

zeroshade removed the awaiting merge label Jul 19, 2024

zeroshade mentioned this pull request Jul 19, 2024

[Go][Parquet] Potential inconsistency between TotalBytesWritten tracked by RowGroupWriter and actual bytes written to io.Writer #39789

Closed

joellubi mentioned this pull request Jul 19, 2024

fix(go/adbc/driver/snowflake): split files properly after reaching targetSize on ingestion apache/arrow-adbc#2026

Merged

zeroshade merged 1 commit into apache:main from joellubi:gh-1997-writer