-
-
Notifications
You must be signed in to change notification settings - Fork 48
Description
@domis86 noticed during his precise and detailed research, that implementation of External Sort mechanism that is based on Cache is currently not as efficient as it supposed to be.
It's because of those lines:
foreach ($this->generators as $cacheId => $generator) {
if ($generator->valid()) {
/** @var Rows $rows */
$rows = $generator->current();
foreach ($rows->chunks(1) as $chunk) {
$heap->insert(CachedRow::fromRows($chunk, $cacheId));
}
$generator->next();
}
}
Each Generator represents all Rows from cached chunks.
This means that at once it might load 1k rows when in reality we should just load the first row from the generator and move to the next generator.
However to achieve that I think we need to update the following logic:
$minHeap = $cachedParts->createHeap(...$refs);
$bufferCache = new BufferCache($this->cache, $maxRowsSize);
while ($cachedParts->notEmpty() || !$minHeap->isEmpty()) {
$cachedParts->takeNext($minHeap, $this->id, $bufferCache);
}
$bufferCache->close();
Currently we are first creating the heap from all rows and then just reading from it, but that's wrong.
Let say we have 3 chunks each with 1k rows.
Each iteration should put to the MinHeap only 3 rows at once (1 from each chunk), so we can later add them to the $bufferCache.
Accordingly to: https://web.archive.org/web/20150202022830/http://faculty.simpson.edu/lydia.sinapova/www/cmsc250/LN250_Weiss/L17-ExternalSortEX2.htm
Metadata
Metadata
Assignees
Labels
Type
Projects
Status