algorithms icon indicating copy to clipboard operation
algorithms copied to clipboard

Heap implementation is very slow

Open generall opened this issue 9 years ago • 6 comments

Please refer to this stackoverflow question: http://stackoverflow.com/questions/13978484/kanwei-minheap-slow-ruby

generall avatar Jan 22 '17 10:01 generall

Insertion is supposed to be constant time, according to this @igrigorik blog post. It looks like it is not quite even linear, though I haven't delved too deeply. See this gist

From the gist, notably:

  • It takes about 3 seconds to insert 7500 items (in a vagrant VM)
  • Insertion time clearly increases with the size of the heap
  • It looks roughly linear, so O(n), but a heap insert should be O(log n) naively and O(1) for a Fibonacci heap

rickhull avatar Oct 19 '17 03:10 rickhull

I think this section makes insertion depend on N, the size of the heap:

https://github.com/kanwei/algorithms/blame/master/lib/containers/heap.rb#L80

rickhull avatar Oct 19 '17 04:10 rickhull

Check out this implementation as a point of comparison: https://github.com/rickhull/compsci/blob/master/lib/compsci/heap.rb

OUTPUT

...
790000th insert: 0.00000097 s
800000th insert: 0.00000245 s
810000th insert: 0.00000148 s
820000th insert: 0.00000100 s
830000th insert: 0.00000103 s
inserted 837191 items in 3.0 s
Run options: --seed 55035

# Running:

bench_insert     0.000085        0.000032        0.000562        0.001365       0.013642
.

Finished in 0.040194s, 24.8793 runs/s, 24.8793 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

Note that this is asserting constant insertion performance, not linear, with 0.9999 error threshold. I suspect it will ultimately prove to be O(log n) at higher n. We're getting over 800,000 insertions in 3 seconds as compared to 7600 or so.

rickhull avatar Oct 21 '17 10:10 rickhull

now getting over 500k push / sec:

1510000th push: 0.00000184 s
1520000th push: 0.00000614 s
1530000th push: 0.00000141 s
1540000th push: 0.00000294 s
1550000th push: 0.00000199 s
1560000th push: 0.00000876 s
1570000th push: 0.00000134 s
pushed 1577943 items in 3.0 s

still a heap with 1577958 items? YES - 1.633 sec

rickhull avatar Oct 29 '17 03:10 rickhull

While the heap implementation doesn't have the best throughput on insertions (regrettable, for sure), it does have a bunch of features that rb_heap and others don't, such as re-scoring elements and removing elements, and merging heaps.

alexkalderimis avatar May 26 '21 19:05 alexkalderimis

I've just pushed a PR that fixes push performance. It used to be O(n) but once this is merged it will be O(1).

BMorearty avatar Dec 09 '21 22:12 BMorearty