Join between delta lake table and distributed table only works in one direction #85173
Closed
Labels
potential bug (To be reviewed by developers and confirmed/rejected.)
Description
Company or project name
No response
Describe what's wrong
Joining a delta lake table with a distributed table only works when the delta table is on the left. If it is on the right, the query fails with the following error:
2025.08.07 06:27:05.208265 [ 821 ] {3c15ba69-b370-4400-8ee7-7142ec00bca1} <Error> ForcedCriticalErrorsLogger: Code: 49. DB::Exception: Distributed task iterator is not initialized. (LOGICAL_ERROR), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000012b99d9b
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c0c1bac
2. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x000000000c0cedab
3. DB::RemoteQueryExecutor::processPacket(DB::Packet) @ 0x00000000164f43e6
4. DB::RemoteQueryExecutor::readAsync() @ 0x00000000164f5b92
5. DB::RemoteSource::tryGenerate() @ 0x00000000193d53f0
6. DB::ISource::work() @ 0x0000000018ff70f6
7. DB::ExecutionThreadContext::executeTask() @ 0x0000000019016582
8. DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::atomic<bool>*) @ 0x0000000019007b10
9. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads(std::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x000000001900bb83
10. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x0000000012ce636b
11. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x0000000012ced3e6
12. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x0000000012ce3412
13. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x0000000012ceac9a
14. ? @ 0x0000000000094ac3
15. ? @ 0x0000000000125a04
Related issues for other table functions:
- S3 Table with local table joins don't work in one direction on distributed tables #52022
- Logical error: 'Distributed task iterator is not initialized' #84658
Does it reproduce on the most recent release?
Yes
How to reproduce
create table a_base on cluster default (
id UInt8, val char
) engine = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/{uuid}', '{replica}') order by id;
create table a on cluster ingestion AS a_base
engine = Distributed(default, default, a_base, rand());
insert into a values (1, 'A'),(2, 'B'),(3, 'C'),(4, 'D'),(5, 'E'),(6, 'F'),(7, 'G'),(8, 'H'),(9, 'I');
df = spark.createDataFrame([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f'), (7, 'g'), (8, 'h'), (9, 'i')], ['id', 'val'])
df.write.format("delta").save("s3a://delta-lake/b")
The join fails when the delta table is on the right:
:) with b as (select * from deltaLake(minio_creds, url='http://minio:9023/delta-lake/b'))
select a.val, b.val from a join b on a.id = b.id;
Received exception from server (version 25.8.1):
Code: 49. DB::Exception: Received from localhost:9004. DB::Exception: Distributed task iterator is not initialized: While executing Remote. (LOGICAL_ERROR)
The join works when the delta table is on the left:
:) with b as (select * from deltaLake(minio_creds, url='http://minio:9023/delta-lake/b'))
select a.val, b.val from b join a on a.id = b.id;
+-----+---+
|a.val|val|
+-----+---+
|E |e |
|D |d |
|G |g |
|I |i |
|H |h |
|C |c |
|A |a |
|B |b |
|F |f |
+-----+---+
Expected behavior
No response
Error message and/or stacktrace
No response
Additional context
No response
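To compare the two join orders side by side, the queries from the reproduction steps can be generated from one template. This is only a minimal sketch: `DELTA_CTE` and `build_join_query` are hypothetical names introduced here for illustration, and the script only builds the SQL strings. Running them against a server (for example by pasting them into the client) is left to the reader.

```python
# Hypothetical helper, not part of ClickHouse: builds the two query variants
# from this report so they can be run back to back.

# CTE copied from the report; `minio_creds` is the named collection used there.
DELTA_CTE = (
    "with b as (select * from deltaLake(minio_creds, "
    "url='http://minio:9023/delta-lake/b'))"
)


def build_join_query(left: str, right: str) -> str:
    """Return the report's join query with the given table order."""
    return (
        f"{DELTA_CTE} "
        f"select a.val, b.val from {left} join {right} on a.id = b.id"
    )


# Delta table on the right: reported to fail with LOGICAL_ERROR on 25.8.1.
failing_query = build_join_query("a", "b")
# Delta table on the left: reported to return the nine joined rows.
working_query = build_join_query("b", "a")

if __name__ == "__main__":
    print(failing_query)
    print(working_query)
```

The two generated statements differ only in the order of `a` and `b` in the `from` clause, which is the single variable the report identifies.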