Closed
Description
It looks like sorting works fine when the whole dataset is small enough to fit within the memory allowed for the external sort implementation (sorry for the overcomplicated example, I was testing multiple theories):
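// Imports assume the standard Flow DSL namespace.
use function Flow\ETL\DSL\{data_frame, from_array, ref, to_output};

// Uncomment to force the external (filesystem-backed) sort path.
// putenv('FLOW_EXTERNAL_SORT_MAX_MEMORY=10M');

// Build 100 x 200 rows with zero-padded ids such as "00001-001".
$arr = [];
for ($j = 100; $j > 0; $j--) {
    for ($i = 200; $i > 0; $i--) {
        $arr[] = ['id' => str_pad((string) $j, 5, '0', STR_PAD_LEFT).'-'.str_pad((string) $i, 3, '0', STR_PAD_LEFT)];
    }
}
shuffle($arr);

data_frame()
    ->read(from_array($arr))
    ->sortBy(ref('id'))
    ->limit(10)
    ->write(to_output(truncate: true))
    ->run();

Output: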
+-----------+
| id |
+-----------+
| 00001-001 |
| 00001-002 |
| 00001-003 |
| 00001-004 |
| 00001-005 |
| 00001-006 |
| 00001-007 |
| 00001-008 |
| 00001-009 |
| 00001-010 |
+-----------+
But when the memory limit is set below the amount required to run the sort in memory (the example above reaches ~60 MB in my tests), e.g. by uncommenting putenv('FLOW_EXTERNAL_SORT_MAX_MEMORY=10M');, the output of the same code is:
+-----------+
| id |
+-----------+
| 00100-200 |
| 00100-199 |
| 00100-198 |
| 00100-197 |
| 00100-196 |
| 00100-195 |
| 00100-194 |
| 00100-193 |
| 00100-192 |
| 00100-191 |
+-----------+
I believe this is a bug: sorting should give the same result whether or not the (filesystem) cache is used to support it, right? 🙂
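To make the expectation concrete, here is a minimal sketch of the general external sort idea in plain PHP. It is not Flow's actual implementation: `external_sort_sketch` and `RunHeap` are hypothetical names, and in-memory arrays stand in for the runs that would be spilled to the filesystem. Each run is sorted on its own and the runs are then merged through a min-heap, so the result should match a plain in-memory sort regardless of the run size.

```php
<?php

declare(strict_types=1);

/** Min-heap over [value, runIndex, offset] entries, ordered byte-wise by value. */
final class RunHeap extends SplHeap
{
    protected function compare(mixed $a, mixed $b): int
    {
        // SplHeap keeps the "largest" element on top, so invert for a min-heap.
        return strcmp($b[0], $a[0]);
    }
}

/**
 * Hypothetical sketch: sort $values in runs of at most $runSize items
 * (arrays stand in for spilled files), then k-way merge the sorted runs.
 *
 * @param list<string> $values
 * @return list<string>
 */
function external_sort_sketch(array $values, int $runSize): array
{
    // Phase 1: split into runs and sort each run independently.
    $runs = [];
    foreach (array_chunk($values, $runSize) as $run) {
        sort($run, SORT_STRING);
        $runs[] = $run;
    }

    // Phase 2: seed the heap with the smallest element of every run.
    $heap = new RunHeap();
    foreach ($runs as $runIndex => $run) {
        $heap->insert([$run[0], $runIndex, 0]);
    }

    // Repeatedly take the global minimum and refill from the same run.
    $sorted = [];
    while (!$heap->isEmpty()) {
        [$value, $runIndex, $offset] = $heap->extract();
        $sorted[] = $value;
        $next = $offset + 1;
        if (isset($runs[$runIndex][$next])) {
            $heap->insert([$runs[$runIndex][$next], $runIndex, $next]);
        }
    }

    return $sorted;
}

// Same ids as in the report: 100 x 200 zero-padded pairs, shuffled.
$ids = [];
for ($j = 100; $j > 0; $j--) {
    for ($i = 200; $i > 0; $i--) {
        $ids[] = sprintf('%05d-%03d', $j, $i);
    }
}
shuffle($ids);

$inMemory = $ids;
sort($inMemory, SORT_STRING);

// A run size far below the dataset size forces many runs to be merged.
$spilled = external_sort_sketch($ids, 1500);

// Both paths are expected to agree.
var_dump($inMemory === $spilled);
```

In this sketch the merge step always emits ascending order, which is the behaviour I would expect from the limited-memory path as well.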
Thanks in advance for taking a look into this!