Skip to content

GH-39870: [Go] Include buffered pages in TotalBytesWritten#40105

Merged
zeroshade merged 1 commit intoapache:mainfrom
matthewmcnew:buffered-pages
Feb 21, 2024
Merged

GH-39870: [Go] Include buffered pages in TotalBytesWritten#40105
zeroshade merged 1 commit intoapache:mainfrom
matthewmcnew:buffered-pages

Conversation

@matthewmcnew
Copy link
Copy Markdown
Contributor

@matthewmcnew matthewmcnew commented Feb 16, 2024

Rationale for this change

Currently, buffered data pages are not included in TotalBytesWritten this means that their is not an accurate estimate of the size of the current size.

Are there any user-facing changes?

RowGroupTotalBytesWritten will include the TotalBytes in buffered DataPages minus the buffered data pages headers.

@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@matthewmcnew matthewmcnew marked this pull request as ready for review February 20, 2024 20:54
@matthewmcnew matthewmcnew changed the title Include buffered pages in TotalBytesWritten GH-39870: [Go] Include buffered pages in TotalBytesWritten Feb 20, 2024
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #39870 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Feb 20, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 20, 2024
Copy link
Copy Markdown
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeroshade zeroshade merged commit 29a0581 into apache:main Feb 21, 2024
@zeroshade zeroshade removed the awaiting change review Awaiting change review label Feb 21, 2024
@github-actions github-actions bot added the awaiting merge Awaiting merge label Feb 21, 2024
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 29a0581.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Go][Parquet] Inaccurate RowGroupTotalCompressedBytes/RowGroupTotalBytesWritten with go parquet file writer

2 participants