Segmentation fault when reading from a Parquet file with parquet native reader v3 #87509
Closed
Labels
bug: Confirmed user-visible misbehaviour in official release
comp-formats: Input/output formats (CSV/JSON/Parquet/ORC/Arrow/Protobuf/etc.)
crash: Crash / segfault / abort
Description
Company or project name
No response
Describe what's wrong
I hit a ClickHouse crash when selecting from a file that has a Bool column with a bloom filter applied to it:
[665bc865432b] 2025.09.23 13:23:03.769097 [ 82 ] <Fatal> BaseDaemon: ########################################
[665bc865432b] 2025.09.23 13:23:03.769219 [ 82 ] <Fatal> BaseDaemon: (version 25.8.4.13 (official build), build id: AA4DCCDCCDCECCBA683D762F836A961FE245B65E, git hash: 42d08f157cb5e869d62715738177452c45dc007e) (from thread 940) (query_id: 968c5383-f61f-4490-a967-5cc280361fdc) (query: SELECT * FROM file('boolean_bloom.gz.parquet', 'Parquet') SETTINGS input_format_parquet_use_native_reader_v3 = 1) Received signal Segmentation fault (11)
[665bc865432b] 2025.09.23 13:23:03.769269 [ 82 ] <Fatal> BaseDaemon: Address: 0x101010109. Access: write. Address not mapped to object.
[665bc865432b] 2025.09.23 13:23:03.769330 [ 82 ] <Fatal> BaseDaemon: Stack trace: 0x000000001a1d8a85 0x000000001a1ee72b 0x000000001a1cfcc8 0x0000000014e3fb56 0x0000000013508aab 0x000000001350fe26 0x0000000013505a92 0x000000001350d55a 0x00007be600b16ac3 0x00007be600ba7a04
[665bc865432b] 2025.09.23 13:23:03.769485 [ 82 ] <Fatal> BaseDaemon: 2. DB::Parquet::PlainBooleanDecoder::~PlainBooleanDecoder() @ 0x000000001a1d8a85
[665bc865432b] 2025.09.23 13:23:03.769574 [ 82 ] <Fatal> BaseDaemon: 3. DB::Parquet::Reader::decodePrimitiveColumn(DB::Parquet::Reader::ColumnChunk&, DB::Parquet::Reader::PrimitiveColumnInfo const&, DB::Parquet::Reader::ColumnSubchunk&, DB::Parquet::Reader::RowGroup const&, DB::Parquet::Reader::RowSubgroup const&) @ 0x000000001a1ee72b
[665bc865432b] 2025.09.23 13:23:03.769639 [ 82 ] <Fatal> BaseDaemon: 4. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_1, void ()>>(std::__function::__policy_storage const*) @ 0x000000001a1cfcc8
[665bc865432b] 2025.09.23 13:23:03.769709 [ 82 ] <Fatal> BaseDaemon: 5. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::ThreadPoolCallbackRunnerFast::initThreadPool(ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>&, unsigned long, String, std::shared_ptr<DB::ThreadGroup>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x0000000014e3fb56
[665bc865432b] 2025.09.23 13:23:03.769787 [ 82 ] <Fatal> BaseDaemon: 6. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x0000000013508aab
[665bc865432b] 2025.09.23 13:23:03.769884 [ 82 ] <Fatal> BaseDaemon: 7. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000001350fe26
[665bc865432b] 2025.09.23 13:23:03.769962 [ 82 ] <Fatal> BaseDaemon: 8. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x0000000013505a92
[665bc865432b] 2025.09.23 13:23:03.770012 [ 82 ] <Fatal> BaseDaemon: 9. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001350d55a
[665bc865432b] 2025.09.23 13:23:03.770100 [ 82 ] <Fatal> BaseDaemon: 10. ? @ 0x0000000000094ac3
[665bc865432b] 2025.09.23 13:23:03.770156 [ 82 ] <Fatal> BaseDaemon: 11. ? @ 0x0000000000125a04
[665bc865432b] 2025.09.23 13:23:03.904590 [ 82 ] <Fatal> BaseDaemon: Integrity check of the executable successfully passed (checksum: DC9D3CA0A2E89B903B7F99DC0AC7E60E)
[665bc865432b] 2025.09.23 13:23:04.502618 [ 82 ] <Fatal> BaseDaemon: Report this error to https://github.com/ClickHouse/ClickHouse/issues
[665bc865432b] 2025.09.23 13:23:04.502899 [ 82 ] <Fatal> BaseDaemon: Changed settings: input_format_parquet_use_native_reader_v3 = true
Does it reproduce on the most recent release?
Yes
How to reproduce
ClickHouse version: 25.8.4.13
Settings: input_format_parquet_use_native_reader_v3=1
The file that caused the issue: https://github.com/Selfeer/parquet-files/blob/main/parquet/boolean_bloom.gz.parquet
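Because the crash showed up only after running the query a couple of times, a small loop can make the reproduction less manual. The following is a minimal sketch, not the reporter's procedure: it assumes the `clickhouse` binary is on `PATH`, that `boolean_bloom.gz.parquet` has been downloaded into the working directory, and that `clickhouse local` is an acceptable stand-in for the server the log above came from (whether the crash also reproduces there is not confirmed by the report). The helper names `build_query` and `run_repeatedly` are hypothetical.

```python
import shutil
import subprocess

# Values taken from the report.
PARQUET_FILE = "boolean_bloom.gz.parquet"
SETTING = "input_format_parquet_use_native_reader_v3"


def build_query(path: str = PARQUET_FILE, use_v3: bool = True) -> str:
    """Return the SELECT from the report, with the v3 reader switched on or off."""
    return (
        f"SELECT * FROM file('{path}', 'Parquet') "
        f"SETTINGS {SETTING} = {int(use_v3)}"
    )


def run_repeatedly(query: str, attempts: int = 10, binary: str = "clickhouse") -> list[int]:
    """Run the query `attempts` times via `clickhouse local`; return the exit codes.

    Assumption: on Linux, a negative return code from subprocess means the
    process was terminated by a signal (-11 would correspond to SIGSEGV).
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    codes = []
    for _ in range(attempts):
        proc = subprocess.run(
            [binary, "local", "--query", query],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(proc.returncode)
    return codes
```

Calling `run_repeatedly(build_query())` and then `run_repeatedly(build_query(use_v3=False))` would give a rough comparison between the v3 reader and the default one; any non-zero exit code in the first list but not the second would point at the native reader v3 path.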
Steps:
I just ran the SELECT query a couple of times:
SELECT * FROM file('boolean_bloom.gz.parquet', 'Parquet') SETTINGS input_format_parquet_use_native_reader_v3 = 1
Expected behavior
No response
Error message and/or stacktrace
No response
Additional context
No response