ARROW-3762: [C++] manage ChunkedArrayBuilder capacity explicitly#4695
ARROW-3762: [C++] manage ChunkedArrayBuilder capacity explicitly#4695bkietz wants to merge 5 commits intoapache:masterfrom
Conversation
095862e to
fd35c7f
Compare
wesm
left a comment
There was a problem hiding this comment.
This looks okay, but I think we need to add some unit tests (that aren't disabled) to exercise the various code paths in ChunkedBinaryBuilder::Reserve to assert that they do what we expect. Otherwise if we refactor later we're going to be playing whackamole
pitrou
left a comment
There was a problem hiding this comment.
Looks like a reasonable improvement to me. Unfortunately it'll be difficult to come up with a test that doesn't blow 16+GB of memory.
|
I'll add some tests to src/arrow/array-binary-test.cc |
fd35c7f to
9ae140f
Compare
|
Hm. Can we make the 2GB limit something that's configurable rather than hard-coded? In theory that's the purpose of the max chunk size. We can't have such a long-running test in CI |
|
We could certainly add another to apply to the # of strings in the string_array, or alternatively use |
|
Oh okay, sorry for my confusion. Can we make the number of elements per chunk configurable, then? I can see a use case for having no more than 64K elements per chunk, no matter what their size. Then this is much easier to test |
|
Will do |
16baa10 to
95a4e30
Compare
|
Merging, the failure is the cloud.r-project.org DNS issue |
ChunkedArrayBuilder should never request BinaryBuilder reserve space for more strings than it can hold (`kListMaximumElements`); with this change it will instead of begin a new chunk Author: Benjamin Kietzman <bengilgit@gmail.com> Closes apache#4695 from bkietz/3762-Parquet-arrowTable-reads-error-when-over and squashes the following commits: 95a4e30 <Benjamin Kietzman> add configurable maximum element count to ChunkedBinaryBuilder 5331a2c <Benjamin Kietzman> add test for ChunkedBinaryBuilder's new reserve behavior 3eb5719 <Benjamin Kietzman> manage ChunkedBinaryBuilder's capacity d626c07 <Benjamin Kietzman> attempted fix, killed: OOM cb000ab <Benjamin Kietzman> add (failing) test which repros issue in c++
ChunkedArrayBuilder should never request BinaryBuilder reserve space for more strings than it can hold (
kListMaximumElements); with this change it will instead of begin a new chunk