Skip to content

Commit 552bbb7

Browse files
Fix race condition in sharded hashed dictionary parallel loading
There was a TOCTOU race in the worker thread loop: after `tryPop` timed out (confirming the queue was empty), a context switch could allow the main thread to push the final block and call `finish()` before the worker checked `isFinished()`. The worker would then see the queue as finished and exit, leaving the last block unprocessed. This caused occasional loss of ~2000-3000 rows (one shard's portion of the last source block) in sharded dictionaries loaded in parallel. The fix replaces `isFinished()` with `isFinishedAndEmpty()`, which atomically verifies both that the queue is marked finished AND that no items remain. If items were pushed between the `tryPop` timeout and this check, the worker correctly retries and processes them. Closes #84468 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f3e2aaa commit 552bbb7

1 file changed

Lines changed: 9 additions & 1 deletion

File tree

src/Dictionaries/HashedDictionaryParallelLoader.h

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -186,8 +186,16 @@ class HashedDictionaryParallelLoader : public boost::noncopyable
186186
if (!shard_queue.tryPop(block, /* milliseconds= */ 100))
187187
{
188188
/// Check if we need to stop
189-
if (stop_all_workers || shard_queue.isFinished())
189+
if (stop_all_workers)
190190
break;
191+
192+
/// Note: we use isFinishedAndEmpty() instead of isFinished() to avoid
193+
/// a race condition where items could be pushed to the queue between
194+
/// tryPop timing out and this check. With isFinished(), the worker
195+
/// could exit while the queue still had unprocessed blocks.
196+
if (shard_queue.isFinishedAndEmpty())
197+
break;
198+
191199
/// Timeout expired, but the queue is not finished yet, try again
192200
continue;
193201
}

0 commit comments

Comments
 (0)