Blog post for DataFusion 51.0.0 #124
Co-authored-by: Yongting You <2010youy01@gmail.com>
Thank you @2010YOUY01

### Decimal32/Decimal64 support

The new Arrow types `Decimal32` and `Decimal64` are now supported in DataFusion.
oh nice!
comphead left a comment
Thanks @alamb and @2010YOUY01, IMO it is LGTM.
Btw, I found that the release notes are now concise and easy to read and follow!
2010YOUY01 left a comment
Great blog post, thank you!
**Fewer object store round-trips for Parquet by default**

DataFusion now sets a default `metadata_size_hint` for [Apache Parquet] scans
([#18118]), avoiding the extra request for the last 8 bytes of each file that a
reader otherwise needs before it can fetch the footer. Remote scans typically
drop from five requests to four per file, cutting latency and transfer costs
without any application changes. Thanks to [zhuqi-lucas] for leading this
effort.

[apache parquet]: https://parquet.apache.org/
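To make the request savings concrete, here is a minimal Rust sketch of a footer read against a simulated object store. It is not DataFusion's or parquet-rs's implementation: `MockObject`, `make_file`, and `read_footer` are hypothetical names, each `get_range` call stands in for one remote range request, and the 512 KiB hint is an illustrative value, not necessarily DataFusion's actual default. It relies only on the Parquet file layout, which ends with the footer metadata, a 4-byte little-endian metadata length, and the `PAR1` magic. Without a hint the reader must first fetch those last 8 bytes to learn the metadata length and then issue a second request; with a large enough hint, one suffix request already contains the whole footer.

```rust
/// Simulated remote object: every `get_range` call counts as one round-trip.
/// (Hypothetical stand-in for an object store client.)
struct MockObject {
    bytes: Vec<u8>,
    requests: usize,
}

impl MockObject {
    /// Fetch bytes `start..end`, counting one request.
    fn get_range(&mut self, start: usize, end: usize) -> Vec<u8> {
        self.requests += 1;
        self.bytes[start..end].to_vec()
    }
}

/// Build a fake Parquet-shaped file: data, footer metadata, a 4-byte
/// little-endian metadata length, then the "PAR1" magic. The leading
/// "PAR1" magic is omitted because the footer read never touches it.
fn make_file(data_len: usize, meta_len: usize) -> Vec<u8> {
    let mut bytes = vec![0u8; data_len]; // column data; contents irrelevant here
    bytes.extend(std::iter::repeat(1u8).take(meta_len)); // footer metadata
    bytes.extend_from_slice(&(meta_len as u32).to_le_bytes());
    bytes.extend_from_slice(b"PAR1");
    bytes
}

/// Read the footer metadata, optionally guessing its size up front.
fn read_footer(obj: &mut MockObject, size_hint: Option<usize>) -> Vec<u8> {
    let len = obj.bytes.len();
    // With no hint, fetch only the 8-byte trailer; otherwise fetch a larger suffix.
    let fetch = size_hint.unwrap_or(8).max(8).min(len);
    let tail = obj.get_range(len - fetch, len);
    let trailer = &tail[tail.len() - 8..];
    assert_eq!(&trailer[4..], b"PAR1", "not a Parquet file");
    let meta_len = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as usize;
    let needed = meta_len + 8;
    if needed <= tail.len() {
        // The suffix already contains the metadata: no second request.
        tail[tail.len() - needed..tail.len() - 8].to_vec()
    } else {
        // The suffix was too small: issue a second request for the metadata.
        obj.get_range(len - needed, len - 8)
    }
}

fn main() {
    let file = make_file(1_000_000, 20_000);

    let mut no_hint = MockObject { bytes: file.clone(), requests: 0 };
    let meta = read_footer(&mut no_hint, None);
    assert_eq!(meta.len(), 20_000);
    println!("without hint: requests = {}", no_hint.requests); // prints requests = 2

    let mut with_hint = MockObject { bytes: file, requests: 0 };
    let meta = read_footer(&mut with_hint, Some(512 * 1024));
    assert_eq!(meta.len(), 20_000);
    println!("with hint: requests = {}", with_hint.requests); // prints requests = 1
}
```

The sketch models only the footer fetch, so it shows two requests dropping to one; the five-to-four figure counts every request a remote scan makes per file. If the metadata turns out to be larger than the hint, the reader still falls back to a second request.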
I think this is a duplicate of the 'Better Defaults for Remote Parquet Reads' section below.
That is a great catch -- I consolidated them in 33e4375
We are proud to announce the release of [DataFusion 51.0.0]. This post highlights
some of the major improvements since [DataFusion 50.0.0]. The complete list of
changes is available in the [changelog]. Thanks to the [128 contributors] for
128, wow
Indeed -- I think this is the part of this blog post I am most proud of
I plan to publish this tomorrow, 2025-11-25. Please let me know if anyone wants more time to review or has any additional comments.

The blog post is now live: https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/
51.0.0 release datafusion#18548
See rendered preview: https://datafusion.staged.apache.org/blog/2025/11/25/datafusion-51.0.0/
For anyone curious, I asked codex to draft this PR with the following prompt. It did a pretty good job for the rough draft.
We are going to write a blog post for the DataFusion 51.0.0 release
We need to cover the major features in this release. If you are unsure of any content, please leave a "TODO" note in the text and we can fill it in later.
I have copied the old release post here as a starting point: content/blog/2025-11-25-datafusion-51.0.0.md
Here are the PRs this release (approx based on dates) - https://github.com/apache/datafusion/pulls?q=is%3Apr+merged%3A2025-09-16..2025-11-08
The changelog is here: https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md
The list of major features can be found here apache/datafusion#17558 under the section "Features to mention in the blog (if they make it)"
(please only include the ones that made it into the release, with a checkmark)