Cannot read Hive-partitioned Parquet files with s3 and the RawBlob and One formats #87515
Closed
Altinity/ClickHouse#1149
Labels
potential bug: To be reviewed by developers and confirmed/rejected.
Description
Company or project name
No response
Describe what's wrong
Starting from 25.8, I cannot read Hive-partitioned Parquet files through the s3 table function with the RawBlob or One formats.
SELECT *
FROM s3('http://minio:9000/warehouse/data/**/**.parquet', 'admin', 'password', 'RawBlob')
SETTINGS use_hive_partitioning = 1
Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: This input format is only suitable for tables with a single column of type String but the number of columns is 2: While executing S3(_table_function.s3)ReadStep. (BAD_ARGUMENTS)
Does it reproduce on the most recent release?
Yes
How to reproduce
- Put a Hive-partitioned Parquet file into object storage
- Run the following query:
SELECT *
FROM s3('http://minio:9000/warehouse/data/**/**.parquet', 'admin', 'password', 'One')
SETTINGS use_hive_partitioning = 1
Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: One input format is only suitable for tables with a single column of type UInt8 but the number of columns is 2: While executing S3(_table_function.s3)ReadStep. (BAD_ARGUMENTS)
With use_hive_partitioning = 0, the query works correctly:
SELECT *
FROM s3('http://minio:9000/warehouse/data/**/**.parquet', 'admin', 'password', 'One')
SETTINGS use_hive_partitioning = 0
    ┌─dummy─┐
 1. │     0 │
 2. │     0 │
 3. │     0 │
 4. │     0 │
 5. │     0 │
 6. │     0 │
 7. │     0 │
 8. │     0 │
 9. │     0 │
10. │     0 │
11. │     0 │
12. │     0 │
13. │     0 │
14. │     0 │
    └───────┘
14 rows in set. Elapsed: 0.019 sec.
Expected behavior
No exception, correct result
Error message and/or stacktrace
[clickhouse1] 2025.09.23 16:51:14.119734 [ 41 ] {2932d212-ada5-45ed-9525-4927be0a64f0} <Error> executeQuery: Code: 36. DB::Exception: This input format is only suitable for tables with a single column of type String but the number of columns is 2: While executing ReadFromObjectStorage. (BAD_ARGUMENTS) (version 25.8.4.13 (official build)) (from 127.0.0.1:45920) (query 1, line 1) (in query: SELECT * FROM s3('http://minio:9000/warehouse/data/**/**.parquet', 'admin', '[HIDDEN]', 'RawBlob') SETTINGS use_hive_partitioning = 1), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133aa85f
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8559ce
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c855480
3. DB::Exception::Exception<unsigned long>(int, FormatStringHelperImpl<std::type_identity<unsigned long>::type>, unsigned long&&) @ 0x000000000e293cab
4. std::shared_ptr<DB::IInputFormat> std::__function::__policy_invoker<std::shared_ptr<DB::IInputFormat> (DB::ReadBuffer&, DB::Block const&, DB::RowInputFormatParams const&, DB::FormatSettings const&)>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::registerInputFormatRawBLOB(DB::FormatFactory&)::$_0, std::shared_ptr<DB::IInputFormat> (DB::ReadBuffer&, DB::Block const&, DB::RowInputFormatParams const&, DB::FormatSettings const&)>>(std::__function::__policy_storage const*, DB::ReadBuffer&, DB::Block const&, DB::RowInputFormatParams const&, DB::FormatSettings const&) (.llvm.7030994725452934287) @ 0x0000000019d46bab
5. DB::FormatFactory::getInput(String const&, DB::ReadBuffer&, DB::Block const&, std::shared_ptr<DB::Context const> const&, unsigned long, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::FormatParserSharedResources>, std::shared_ptr<DB::FormatFilterInfo>, bool, DB::CompressionMethod, bool) const @ 0x0000000019a69271
6. DB::StorageObjectStorageSource::createReader(unsigned long, std::shared_ptr<DB::IObjectIterator> const&, std::shared_ptr<DB::StorageObjectStorageConfiguration> const&, std::shared_ptr<DB::IObjectStorage> const&, DB::ReadFromFormatInfo&, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::Context const> const&, DB::SchemaCache*, std::shared_ptr<Poco::Logger> const&, unsigned long, std::shared_ptr<DB::FormatParserSharedResources>, std::shared_ptr<DB::FormatFilterInfo>, bool) @ 0x0000000016333fd3
7. DB::StorageObjectStorageSource::generate() @ 0x0000000016331334
8. DB::ISource::tryGenerate() @ 0x0000000019abf0de
9. DB::ISource::work() @ 0x0000000019abec76
10. DB::ExecutionThreadContext::executeTask() @ 0x0000000019add642
11. DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::atomic<bool>*) @ 0x0000000019acf7d0
12. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads(std::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x0000000019ad3583
13. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x0000000013508aab
14. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000001350fe26
15. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x0000000013505a92
16. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001350d55a
17. ? @ 0x0000000000094ac3
18. ? @ 0x0000000000126850
Additional context
Just in case, this is what the data looks like when no format is specified:
SELECT *
FROM s3('http://minio:9000/warehouse/data/**/**.parquet', 'admin', 'password')
    ┌─name────┬─double─┬─integer─┐
 1. │ Grace   │  56.78 │      80 │
 2. │ Charlie │  67.89 │      40 │
 3. │ Frank   │  12.34 │      70 │
 4. │ Ivan    │  34.56 │     100 │
 5. │ Karl    │  23.45 │     120 │
 6. │ Heidi   │  90.12 │      90 │
 7. │ Mallory │  11.12 │     140 │
 8. │ Eve     │  89.01 │      60 │
 9. │ Alice   │ 195.23 │      20 │
10. │ Leo     │  67.89 │     130 │
11. │ Judy    │   78.9 │     110 │
12. │ Nina    │  34.56 │     150 │
13. │ David   │  45.67 │      50 │
14. │ Bob     │ 123.45 │      30 │
    └─────────┴────────┴─────────┘
14 rows in set. Elapsed: 0.020 sec.
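Both errors report 2 columns where the format expects 1. That would be consistent with one Hive partition key from the path being added next to the format's single column, but this is my assumption and is not confirmed here. A minimal Python sketch of how key=value path segments could turn into extra columns; the object key and the partition name `bucket` are hypothetical (the real layout under warehouse/data/ is not shown in this report), and this is not ClickHouse's implementation:

```python
from urllib.parse import unquote


def hive_partition_columns(object_key: str) -> dict[str, str]:
    """Collect key=value directory segments from an object key.

    Illustration only; not how ClickHouse actually parses paths.
    """
    columns: dict[str, str] = {}
    # The last segment is the file name, so only directories are inspected.
    for segment in object_key.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            columns[key] = unquote(value)
    return columns


# Hypothetical layout: the actual partition key in this report is unknown.
key = "warehouse/data/bucket=1/part-0.parquet"
partition_columns = hive_partition_columns(key)

# The One format exposes a single column named dummy
# (as seen in the output with use_hive_partitioning = 0).
format_columns = ["dummy"]
all_columns = format_columns + list(partition_columns)

# One format column plus one partition column gives the count from the error.
print(len(all_columns), all_columns)
```

If this reading is right, any single-column format would hit the same check as soon as the path contributes at least one partition column.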