Upgrade Arrow C++ to 24.0.0 #647

Merged
adamreeve merged 9 commits into G-Research:master from adamreeve:arrow-24-update
May 4, 2026
@adamreeve

No description provided.

adamreeve commented May 4, 2026

The benchmark project is reliably segfaulting. I have a minimal repro here: https://github.com/adamreeve/ParquetSharp/blob/segfault-repro/SegfaultRepro/Program.cs

This looks related to mimalloc, and it only happens when Parquet is written from more than one thread (the threads don't need to run concurrently). The backtrace is:

Details
Thread 15 "SegfaultRepro" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fdfb6ffe6c0 (LWP 238809)]
0x00007fdfa9b2b353 in _mi_theap_default_set (theap=0x7fdfabd3e700 <theap_main>) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/init.c:938
938       mi_assert_internal(theap->tld->thread_id==0 || theap->tld->thread_id==_mi_thread_id());
(gdb) bt
#0  0x00007fdfa9b2b353 in _mi_theap_default_set (theap=0x7fdfabd3e700 <theap_main>) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/init.c:938
#1  0x00007fdfa9b2ae65 in _mi_thread_init_theap_default () at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/init.c:610
#2  0x00007fdfa9b2b16d in mi_thread_init () at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/init.c:709
#3  0x00007fdfa9b36dab in _mi_malloc_generic (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16455, zero_huge_alignment=0, usable=0x0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/page.c:984
#4  0x00007fdfa9b10189 in mi_theap_malloc_generic (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16447, zero=false, huge_alignment=0, usable=0x0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc.c:176
#5  _mi_theap_malloc_zero_ex (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16447, zero=false, huge_alignment=0, usable=0x0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc.c:239
#6  _mi_theap_malloc_zero (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16447, zero=false, usable=0x0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc.c:244
#7  0x00007fdfa9b16d4e in mi_theap_malloc_zero_no_guarded (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16447, zero=false, usable=0x0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc-aligned.c:57
#8  0x00007fdfa9b16f67 in mi_theap_malloc_zero_aligned_at_overalloc (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16384, alignment=64, offset=0, zero=false, usable=0x0)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc-aligned.c:94
#9  0x00007fdfa9b17364 in mi_theap_malloc_zero_aligned_at_generic (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16384, alignment=64, offset=0, zero=false, usable=0x0)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc-aligned.c:179
#10 0x00007fdfa9b1760c in mi_theap_malloc_zero_aligned_at (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16384, alignment=64, offset=0, zero=false, usable=0x0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc-aligned.c:230
#11 0x00007fdfa9b1764b in mi_theap_malloc_aligned_at (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16384, alignment=64, offset=0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc-aligned.c:239
#12 0x00007fdfa9b1767a in mi_theap_malloc_aligned (theap=0x7fdfabbd7140 <_mi_theap_empty>, size=16384, alignment=64) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc-aligned.c:243
#13 0x00007fdfa9b177d7 in mi_malloc_aligned (size=16384, alignment=64) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/mimalloc/src/v3.3.1-c528aac713.clean/src/alloc-aligned.c:274
#14 0x00007fdfa8efa5e4 in arrow::(anonymous namespace)::MimallocAllocator::AllocateAligned (size=16384, alignment=64, out=0x7fdfb6ffc950) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/memory_pool.cc:404
#15 0x00007fdfa8efc9a9 in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::MimallocAllocator>::Allocate (this=0x7fdfabd3a5c0 <arrow::global_state+576>, size=16384, alignment=64, out=0x7fdfb6ffc950)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/memory_pool.cc:473
#16 0x00007fdfa8f011d8 in arrow::PoolBuffer::Reserve (this=0x7fdfac0f9d70, capacity=16384) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/memory_pool.cc:932
#17 0x00007fdfa8f014d4 in arrow::PoolBuffer::Resize (this=0x7fdfac0f9d70, new_size=16384, shrink_to_fit=true) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/memory_pool.cc:956
#18 0x00007fdfa8efbf9d in arrow::(anonymous namespace)::ResizePoolBuffer<std::unique_ptr<arrow::ResizableBuffer>, std::unique_ptr<arrow::PoolBuffer> > (buffer=..., size=16384)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/memory_pool.cc:1003
#19 0x00007fdfa8efbabb in arrow::AllocateResizableBuffer (size=16384, alignment=64, pool=0x7fdfabd3a5c0 <arrow::global_state+576>) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/memory_pool.cc:1030
#20 0x00007fdfa82eae6d in arrow::BufferBuilder::Resize (this=0x7fdfac0f6d20, new_capacity=16384, shrink_to_fit=true) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/buffer_builder.h:78
#21 0x00007fdfa8440df5 in arrow::TypedBufferBuilder<arrow::internal::HashTable<arrow::internal::ScalarMemoTable<int, arrow::internal::HashTable>::Payload>::Entry, void>::Resize (this=0x7fdfac0f6d20, new_capacity=1024, shrink_to_fit=true)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/buffer_builder.h:287
#22 0x00007fdfa8440375 in arrow::internal::HashTable<arrow::internal::ScalarMemoTable<int, arrow::internal::HashTable>::Payload>::UpsizeBuffer (this=0x7fdfac0f6d00, capacity=1024)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/util/hashing.h:351
#23 0x00007fdfa843e914 in arrow::internal::HashTable<arrow::internal::ScalarMemoTable<int, arrow::internal::HashTable>::Payload>::HashTable (this=0x7fdfac0f6d00, pool=0x7fdfabd3a5c0 <arrow::global_state+576>, capacity=1024)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/util/hashing.h:252
#24 0x00007fdfa843cc4e in arrow::internal::ScalarMemoTable<int, arrow::internal::HashTable>::ScalarMemoTable (this=0x7fdfac0f6cf8, pool=0x7fdfabd3a5c0 <arrow::global_state+576>, entries=1024)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/arrow/util/hashing.h:423
#25 0x00007fdfa84137af in parquet::(anonymous namespace)::DictEncoderImpl<parquet::PhysicalType<(parquet::Type::type)1> >::DictEncoderImpl (this=0x7fdfac0f6ca0, desc=0x7fdfac0f6c40, pool=0x7fdfabd3a5c0 <arrow::global_state+576>, __in_chrg=<optimized out>,
    __vtt_parm=<optimized out>) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/parquet/encoder.cc:472
#26 0x00007fdfa840f820 in std::make_unique<parquet::(anonymous namespace)::DictEncoderImpl<parquet::PhysicalType<(parquet::Type::type)1> >, parquet::ColumnDescriptor const*&, arrow::MemoryPool*&> () at /usr/include/c++/15/bits/unique_ptr.h:1084
#27 0x00007fdfa8408a26 in parquet::MakeEncoder (type_num=parquet::Type::INT32, encoding=parquet::Encoding::RLE_DICTIONARY, use_dictionary=true, descr=0x7fdfac0f6c40, pool=0x7fdfabd3a5c0 <arrow::global_state+576>)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/parquet/encoder.cc:1773
#28 0x00007fdfa8363c87 in parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> >::TypedColumnWriterImpl (this=0x7fdfac0148b0, metadata=0x7fdfac0dde90, pager=std::unique_ptr<parquet::PageWriter> = {...}, use_dictionary=true,
    encoding=parquet::Encoding::RLE_DICTIONARY, properties=0x7fdfac013ec0, bloom_filter=0x0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/parquet/column_writer.cc:1291
#29 0x00007fdfa8361859 in std::_Construct<parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, bool const&, parquet::Encoding::type&, parquet::WriterProperties const*&, parquet::BloomFilter*&> (__p=0x7fdfac0148b0) at /usr/include/c++/15/bits/stl_construct.h:133
#30 0x00007fdfa835e68c in std::allocator_traits<std::allocator<void> >::construct<parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, bool const&, parquet::Encoding::type&, parquet::WriterProperties const*&, parquet::BloomFilter*&> (__p=0x7fdfac0148b0) at /usr/include/c++/15/bits/alloc_traits.h:805
#31 std::_Sp_counted_ptr_inplace<parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> >, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, bool const&, parquet::Encoding::type&, parquet::WriterProperties const*&, parquet::BloomFilter*&> (this=0x7fdfac0148a0, __a=...) at /usr/include/c++/15/bits/shared_ptr_base.h:606
#32 0x00007fdfa835ac39 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> >, std::allocator<void>, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, bool const&, parquet::Encoding::type&, parquet::WriterProperties const*&, parquet::BloomFilter*&> (this=0x7fdfb6ffd5a8, __p=@0x7fdfb6ffd5a0: 0x0, __a=...) at /usr/include/c++/15/bits/shared_ptr_base.h:969
#33 0x00007fdfa83513be in std::__shared_ptr<parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> >, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, bool const&, parquet::Encoding::type&, parquet::WriterProperties const*&, parquet::BloomFilter*&> (this=0x7fdfb6ffd5a0, __tag=...) at /usr/include/c++/15/bits/shared_ptr_base.h:1719
#34 0x00007fdfa8346b24 in std::shared_ptr<parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> > >::shared_ptr<std::allocator<void>, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, bool const&, parquet::Encoding::type&, parquet::WriterProperties const*&, parquet::BloomFilter*&> (this=0x7fdfb6ffd5a0, __tag=...) at /usr/include/c++/15/bits/shared_ptr.h:463
#35 0x00007fdfa833eff6 in std::make_shared<parquet::TypedColumnWriterImpl<parquet::PhysicalType<(parquet::Type::type)1> >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, bool const&, parquet::Encoding::type&, parquet::WriterProperties const*&, parquet::BloomFilter*&> () at /usr/include/c++/15/bits/shared_ptr.h:1008
#36 0x00007fdfa8326873 in parquet::ColumnWriter::Make (metadata=0x7fdfac0dde90, pager=std::unique_ptr<parquet::PageWriter> = {...}, properties=0x7fdfac013ec0, bloom_filter=0x0)
    at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/parquet/column_writer.cc:2722
#37 0x00007fdfa8469db1 in parquet::RowGroupSerializer::CreateColumnWriterForColumn (this=0x7fdfac0104d0, col_meta=0x7fdfac0dde90, column_ordinal=0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/parquet/file_writer.cc:324
#38 0x00007fdfa8468de7 in parquet::RowGroupSerializer::NextColumn (this=0x7fdfac0104d0) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/parquet/file_writer.cc:150
#39 0x00007fdfa8467237 in parquet::RowGroupWriter::NextColumn (this=0x7fdfac010580) at /home/adam/dev/gross/ParquetSharp/build/vcpkg/buildtrees/arrow/src/e-arrow-24-991446d5c5.clean/cpp/src/parquet/file_writer.cc:56
#40 0x00007fdfa818b67a in RowGroupWriter_NextColumn (row_group_writer=0x7fdfac010580, column_writer=0x7fdfb6ffda38) at /home/adam/dev/gross/ParquetSharp/cpp/RowGroupWriter.cpp:28
#41 0x00007fff7918f8c3 in ?? ()
#42 0x00007fdfb6ffda38 in ?? ()
#43 0x00007fdfb6ffda38 in ?? ()
#44 0x0000000000000001 in ?? ()

The repro works fine when ARROW_DEFAULT_MEMORY_POOL is set to system or jemalloc instead of the default mimalloc.
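
To compare the allocator backends, the same command can be run once per value of ARROW_DEFAULT_MEMORY_POOL. The following is a minimal sketch for a POSIX shell, not something taken from this PR: the `run_with_pool` and `compare_pools` helper names are hypothetical, and only the environment variable and its three values come from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a chosen Arrow memory pool backend.
# The variable is set only for that one command via env(1), so the caller's
# environment is left untouched.
run_with_pool() {
    rwp_pool="$1"
    shift
    env ARROW_DEFAULT_MEMORY_POOL="$rwp_pool" "$@"
}

# Hypothetical helper: run the same command under each backend mentioned
# above and print one line per backend with the outcome.
compare_pools() {
    for cp_pool in system jemalloc mimalloc; do
        if run_with_pool "$cp_pool" "$@"; then
            echo "$cp_pool: ok"
        else
            # In the else branch, $? still holds the exit status of the command.
            echo "$cp_pool: failed (exit status $?)"
        fi
    done
}
```

Usage might look like `compare_pools dotnet run --project SegfaultRepro`, where the project path is an assumption about how the repro is launched. A process killed by SIGSEGV is normally reported by the shell as exit status 139 (128 + 11), though an intermediate launcher may report a different status.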

The current mimalloc version after updating the vcpkg baseline is 3.3.1. In the previous 23.0.0 release of ParquetSharp, mimalloc 3.2.7 was used. I tested downgrading mimalloc to 3.2.7 and that works, so I think we should just pin mimalloc to the older version for now and report this upstream (ideally with a repro that just uses mimalloc directly from C++).

Edit: mimalloc 3.2.8 also works so I've pinned to that. The first broken version is 3.3.0, and 3.3.2 is also broken.

Comment thread vcpkg.json Outdated
Comment thread .github/workflows/ci.yml
Comment on lines +124 to +126
curl -L -o /tmp/bison-${BISON_VERSION}.tar.gz https://ftp.gnu.org/gnu/bison/bison-${BISON_VERSION}.tar.gz &&
tar -xf /tmp/bison-${BISON_VERSION}.tar.gz -C /tmp &&
cd /tmp/bison-${BISON_VERSION} && ./configure && make && make install"

This is a bit slow, and we could instead use the bison-bin PyPI package (https://github.com/trim21/bison-py), but I don't think this is widely used enough that we should trust it, and it's hard to verify the wheel contents.

Maybe we should be building a docker image that we can cache instead, but that feels like it should be a separate PR.

@adamreeve adamreeve merged commit 4f99f4f into G-Research:master May 4, 2026
58 of 60 checks passed
2 participants