Heap implementation is very slow
Please refer to this stackoverflow question: http://stackoverflow.com/questions/13978484/kanwei-minheap-slow-ruby
Insertion is supposed to be constant time, according to this @igrigorik blog post. It looks like it is not quite even linear, though I haven't delved too deeply. See this gist
From the gist, notably:
- It takes about 3 seconds to insert 7500 items (in a vagrant VM)
- Insertion time clearly increases with the size of the heap
- It looks roughly linear, so
O(n), but a heap insert should beO(log n)naively andO(1)for a Fibonacci heap
I think this section makes insertion depend on N, the size of the heap:
https://github.com/kanwei/algorithms/blame/master/lib/containers/heap.rb#L80
Check out this implementation as a point of comparison: https://github.com/rickhull/compsci/blob/master/lib/compsci/heap.rb
OUTPUT
...
790000th insert: 0.00000097 s
800000th insert: 0.00000245 s
810000th insert: 0.00000148 s
820000th insert: 0.00000100 s
830000th insert: 0.00000103 s
inserted 837191 items in 3.0 s
Run options: --seed 55035
# Running:
bench_insert 0.000085 0.000032 0.000562 0.001365 0.013642
.
Finished in 0.040194s, 24.8793 runs/s, 24.8793 assertions/s.
1 runs, 1 assertions, 0 failures, 0 errors, 0 skips
Note that this is asserting constant insertion performance, not linear, with 0.9999 error threshold. I suspect it will ultimately prove to be O(log n) at higher n. We're getting over 800,000 insertions in 3 seconds as compared to 7600 or so.
now getting over 500k push / sec:
1510000th push: 0.00000184 s
1520000th push: 0.00000614 s
1530000th push: 0.00000141 s
1540000th push: 0.00000294 s
1550000th push: 0.00000199 s
1560000th push: 0.00000876 s
1570000th push: 0.00000134 s
pushed 1577943 items in 3.0 s
still a heap with 1577958 items? YES - 1.633 sec
While the heap implementation doesn't have the best throughput on insertions (regrettable, for sure), it does have a bunch of features that rb_heap and others don't, such as re-scoring elements and removing elements, and merging heaps.
I've just pushed a PR that fixes push performance. It used to be O(n) but once this is merged it will be O(1).