Proposal Details
Background
A heap is a kind of collection where each element has a value, and the values are ordered: each is either greater than, less than or equal to another. A binary min-heap is a complete binary tree of elements with the heap property: the value of each node is less than or equal to the values of its children. That implies that the smallest element is at the root, and therefore quickly available. Insertions, deletions and changes can be done in logarithmic time. Common uses for heaps are:
- Priority queues, which appear in schedulers, discrete event simulations, and graph algorithms like Dijkstra's algorithm.
- Heapsort: while slower than quicksort on average, this sorting algorithm guarantees optimal O(n*log(n)) performance. Some sorting implementations, like the one in the standard library (based on pdqsort), fall back to heapsort when quicksort behaves poorly. Since heapsort delivers the sorted sequence in order, it can also be used to sort incrementally, in cases when the entire sorted sequence might not be needed.
- K-way merge: a heap holds the minimum element from each of k sorted lists.
- Top K: a heap can be used to efficiently find the largest (or smallest) K items in a sequence in space proportional to K.
The standard library has a binary min-heap implementation in container/heap. While it has served well for the past decade, it has several issues:
- It is not generic, so users must write type assertions when retrieving elements.
- The insertion function has an argument of type
any, so inserting some values requires an allocation.
- Users must supply their own storage and implement an interface with five methods. The storage is nearly always a slice, but the heap interface must be reimplemented each time, with nearly identical code.
- Users must pass a value of that interface to top-level functions that implement the heap. The names
Push and Pop appear in both the interface and as heap functions, with different meanings. The API is confusing enough so that even experienced Go developers have trouble understanding it.
Proposal
We propose a new standard package, container/heap/v2, with a type representing a binary min-heap. The data structure is the same as that of container/heap, but its API solves all of the above problems.
An overview follows. The github.com/jba/heap module provides a reference implementation with the full API.
Our heap type, Heap[T], requires a comparison function, which is passed to the constructor, New . There are methods for retrieving the smallest element (Min), removing and returning the smallest (TakeMin), finding the size of the heap (Len), iterating over the heap items (All), and clearing the heap’s contents (Clear).
With the Init method, a heap can be initialized with a slice. The heap owns the slice, so this code:
var items []T
// append to items
h := heap.New[T](compareFunc)
h.Init(items)
has the same costs as the similar code for container/heap:
type tHeap []T // tHeap implements heap.Interface
var items tHeap
// append to items
heap.Init(items)
The Insert method inserts a single element. It incurs the cost of maintaining the heap property, so inserting k items individually into a heap of size n takes O(k*log(n+k)) time. There is also an InsertAll method which takes an iter.Seq[T] and re-creates a heap from scratch in O(n+k) time, where k is the number of elements in the sequence.
The Drain method returns an iterator over the elements in sorted order, emptying the heap in the process; to implement heapsort, build a heap and then call Drain.
The ChangeMin method modifies the minimum element and moves it to its proper place in the heap. It provides a simple, efficient way to implement Top K, as shown in this example.
Referring to elements
Deleting or re-ordering a specific element in a heap is a common requirement—for example, when a task’s priority changes and its position must be adjusted to restore the heap property. The trouble is that, unlike a map, a heap doesn't have a natural key to find an element. Using element equality could work (it’s what Java does), but then the user would have to provide a separate equality function on elements themselves, in addition to the comparison function on the element values used for the heap property, which is confusing. And since heaps don’t support efficient lookup by equality, the delete function would need to visit each element, turning what should be a logarithmic operation into a linear one.
Looking at uses of container/heap in the ecosystem, we noticed that when heap elements need to be modified, the element type is invariably a pointer to a struct with an int field that maintains the index into the heap slice, as in this code from Cockroach DB. Users of container/heap must maintain this index themselves, in the Swap and Push methods they are required to implement. Our heap performs this task for them: they need only provide a function that sets the index of an element. For that purpose, a heap can be constructed with NewIndexed, which takes both a comparison function and a func(e T, v int) whose job is to set the index field of e to v. A Heap[T] so constructed invokes the index function to maintain the indexes. This enables two other heap methods, called Delete and Changed, both taking an index. Delete deletes the element at the index. Changed is called after changing the element’s value, to adjust its position. Both panic if the heap was created with New rather than NewIndexed, unless the index is 0, which can always be used to refer to the minimum element.
Here is an example of using an index function along with the Changed method to change a task’s priority:
type Task struct { priority, index int }
func CompareTasks(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) }
func (t *Task) SetIndex(i int) { t.index = i }
// ...
// Create a max-heap on priority (higher is better).
h := heap.NewIndexed(CompareTasks, (*Task).SetIndex)
// ...
t.priority++
h.Change(t.index)
Comparison with container/heap
Compared to container/heap, our proposed heap behaves more like traditional abstract data types: it manages its own storage and exports methods to manipulate it. However, having direct access to the heap’s underlying storage can be more efficient, by avoiding a copy. For example, this function in the Go Ethereum implementation benefits from sorting the heap in place. Also, as a side effect of running heapsort—that is, repeatedly calling heap.Pop until the heap is empty—the underlying storage will be sorted in the reverse order. (This behavior is not documented, but probably couldn’t be changed at this point, and is a natural consequence of the implementation.) The heap types we propose would require a copy in both cases. If we feel these applications are worth supporting, we could provide a method to expose the underlying slice.
We also took the opportunity to change some names. The names with “min” make it clear without having to consult the documentation that this is a min-heap. Insert seems clearer than Push (especially with Pop gone), and Changed seems clearer than Fix.
Alternatives considered
Heaps for ordered types
A generic heap whose element type is constrained to ordered types (those implementing cmp.Ordered, like int and float64) is almost twice as fast on heapsort than the general heap described above, because the comparison function cmp.Compare is inlined. After a survey of open-source Go code, we decided not to propose this optimized heap.
We first looked for uses of container/heap that involved ordered types. We found several production implementations in prominent modules like Ethereum and LetsEncrypt.
We then scanned the open-source Go ecosystem for optimized heap implementations that did not use container/heap, by looking for types based on slices of ordered types with method names like “heapify”, “siftUp”, and so on. That heuristic search turned up only three hits which might be part of production systems; all the rest were teaching examples, exercise solutions, personal projects, or copies of the standard library’s sort package (which does indeed use an optimized heapsort).
- m3db ostensibly uses a heap in its quantile estimator. But in fact, the data structure’s heap property is unused: about five years ago, the implementation was changed to sort the heap slice. Apparently this did not affect performance enough for anyone to notice, perhaps because the heap is small.
- Consensys/gnark uses a heap to implement an incremental sort, in order to remove the values in one slice from another slice. Putting the values to be removed into a map is usually faster on average. We did not explore this module enough to know whether the heap is the best choice here.
- Grafana Pyroscope uses an optimized heap for Top K in several places. When we compared their optimized heap implementation with the one proposed here on their existing benchmark for one of those uses, we found no difference. A likely reason is that when running Top K on a random sequence of length n, the heap operation that involves comparison is invoked only O(log(n)) times. Any significant time spent in the O(n) part of the algorithm dominates the time spent in the heap. (Other uses of Top K in that Grafana module would likely benefit from the optimization; our benchmarks show a speed advantage of around 7%.)
To summarize, heaps optimized for ordered types are rare, and are not always an improvement when they do occur. We conclude that it's not worth complicating the package by providing one.
We can always add the optimized version later, of course. Admittedly, however, some of our proposed choices would result in a suboptimal overall design if we did:
- If there were a heap type for ordered types, the more general type should be called
HeapFunc and the constructor NewFunc, to better align with names in the standard library, like slices.SortFunc.
- A heap for ordered types should support both min and max versions, because there is no way to negate the comparison function. That means that either we would need a separate max-heap type with different names for some methods—
Max, TakeMax and ChangeMax—or we should use order-agnostic names now.
Just to be clear, we don't think we should make any of these changes to the proposal, because it seems so unlikely that we would add the optimized heap.
A Comparer type parameter
We considered adding a second type parameter to the heap for the comparison. If we did that, the compiler would have more information about the comparison function, so it could in theory produce better code. That design would look like this:
type Comparer[T any] interface { Compare(a, b T) int }
type Heap[T any, C Comparer[T]] struct { compare C; ... }
func New[T any, C Comparer[T]]() *Heap[T, C] { return &Heap[T, C]{} }
For ordered types, we would provide this type:
type OrderedComparer[T cmp.Ordered] struct{}
func (OrderedComparer[T]) Compare(a, b T) int { return cmp.Compare(a, b) }
That would make it possible to create a heap on, say, ints with this line:
h := heap.New[int, heap.OrderedComparer[int]]()
This pattern works for any Comparer implementation whose zero value is ready to use:
type Task struct { priority, index int }
type TaskComparer struct{}
func (TaskComparer) Compare(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) }
// ...
h := heap.New[*Task, TaskComparer]()
For comparers that must be initialized before use, we would provide a method:
func (h *Heap[T, C]) SetComparer(c C) { h.compare = c }
As we said, the attraction here is that the compiler is guaranteed to have the information it needs to generate better code. But we decided against this design, because the Go compiler does not currently generate better code for it, and there are no plans for it to do so. To get a meaningful speedup, the compiler would have to generate different code for each instantiation of Heap. When generics were first implemented, such a monomorphizing design was rejected for good reasons: it would lead to bloated binaries and slow compile times. That doesn’t mean it will never happen, though more likely some instantiations that satisfy a set of heuristics will be monomorphized, much like inlining happens now. But there currently no concrete plans for that, just brainstorming.
Rog Peppe has pointed out that if we adopted this design, we could use a type switch that would allow us to specialize the implementation to ordered types, without waiting for compiler improvements. But as we argued above, it’s not worth optimizing for ordered types.
It would still make sense to adopt this design speculatively, if it weren’t any worse than our current proposal. But it is worse. As you can see from the code above, someone who wanted to use a heap for their type would have to define a second type with a method (like TaskComparer above) to create their heap, where in the current proposal they can just pass a function literal:
h := heap.New(func(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) })
We could provide a type to help them with this:
type CompareFunc[T any] func(a, b T) int
func (f CompareFunc[T]) Compare(a, b T) int { return f(a, b) }
Now the user no longer has to write a second type, but the result is still clumsy:
h := heap.New[*Task, heap.CompareFunc[*Task]]()
h.SetComparer(heap.CompareFunc[*Task](func(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) })
The proposal for a general hash map came to a different conclusion: its map type includes a type parameter for the hasher, which is an interface with hash and equality functions. There are two reasons for the difference. First, the performance advantage with hash maps comes from improved escape analysis, not inlining. Better generic escape analysis can be done without monomorphizing, and will be implemented in the near future. Second, the hasher is an interface with two methods, so implementations already require a separate type, while the heap comparison function can be written as a function literal without a type.
Proposal Details
Background
A heap is a kind of collection where each element has a value, and the values are ordered: each is either greater than, less than or equal to another. A binary min-heap is a complete binary tree of elements with the heap property: the value of each node is less than or equal to the values of its children. That implies that the smallest element is at the root, and therefore quickly available. Insertions, deletions and changes can be done in logarithmic time. Common uses for heaps are:
The standard library has a binary min-heap implementation in
container/heap. While it has served well for the past decade, it has several issues:any, so inserting some values requires an allocation.PushandPopappear in both the interface and as heap functions, with different meanings. The API is confusing enough so that even experienced Go developers have trouble understanding it.Proposal
We propose a new standard package,
container/heap/v2, with a type representing a binary min-heap. The data structure is the same as that ofcontainer/heap, but its API solves all of the above problems.An overview follows. The github.com/jba/heap module provides a reference implementation with the full API.
Our heap type,
Heap[T], requires a comparison function, which is passed to the constructor,New. There are methods for retrieving the smallest element (Min), removing and returning the smallest (TakeMin), finding the size of the heap (Len), iterating over the heap items (All), and clearing the heap’s contents (Clear).With the
Initmethod, a heap can be initialized with a slice. The heap owns the slice, so this code:has the same costs as the similar code for
container/heap:The
Insertmethod inserts a single element. It incurs the cost of maintaining the heap property, so inserting k items individually into a heap of size n takes O(k*log(n+k)) time. There is also anInsertAllmethod which takes aniter.Seq[T]and re-creates a heap from scratch in O(n+k) time, where k is the number of elements in the sequence.The
Drainmethod returns an iterator over the elements in sorted order, emptying the heap in the process; to implement heapsort, build a heap and then callDrain.The
ChangeMinmethod modifies the minimum element and moves it to its proper place in the heap. It provides a simple, efficient way to implement Top K, as shown in this example.Referring to elements
Deleting or re-ordering a specific element in a heap is a common requirement—for example, when a task’s priority changes and its position must be adjusted to restore the heap property. The trouble is that, unlike a map, a heap doesn't have a natural key to find an element. Using element equality could work (it’s what Java does), but then the user would have to provide a separate equality function on elements themselves, in addition to the comparison function on the element values used for the heap property, which is confusing. And since heaps don’t support efficient lookup by equality, the delete function would need to visit each element, turning what should be a logarithmic operation into a linear one.
Looking at uses of
container/heapin the ecosystem, we noticed that when heap elements need to be modified, the element type is invariably a pointer to a struct with anintfield that maintains the index into the heap slice, as in this code from Cockroach DB. Users ofcontainer/heapmust maintain this index themselves, in theSwapandPushmethods they are required to implement. Our heap performs this task for them: they need only provide a function that sets the index of an element. For that purpose, a heap can be constructed withNewIndexed, which takes both a comparison function and afunc(e T, v int)whose job is to set the index field ofetov. AHeap[T]so constructed invokes the index function to maintain the indexes. This enables two other heap methods, calledDeleteandChanged, both taking an index.Deletedeletes the element at the index.Changedis called after changing the element’s value, to adjust its position. Both panic if the heap was created withNewrather thanNewIndexed, unless the index is 0, which can always be used to refer to the minimum element.Here is an example of using an index function along with the
Changedmethod to change a task’s priority:Comparison with container/heap
Compared to
container/heap, our proposed heap behaves more like traditional abstract data types: it manages its own storage and exports methods to manipulate it. However, having direct access to the heap’s underlying storage can be more efficient, by avoiding a copy. For example, this function in the Go Ethereum implementation benefits from sorting the heap in place. Also, as a side effect of running heapsort—that is, repeatedly callingheap.Popuntil the heap is empty—the underlying storage will be sorted in the reverse order. (This behavior is not documented, but probably couldn’t be changed at this point, and is a natural consequence of the implementation.) The heap types we propose would require a copy in both cases. If we feel these applications are worth supporting, we could provide a method to expose the underlying slice.We also took the opportunity to change some names. The names with “min” make it clear without having to consult the documentation that this is a min-heap.
Insertseems clearer thanPush(especially withPopgone), andChangedseems clearer thanFix.Alternatives considered
Heaps for ordered types
A generic heap whose element type is constrained to ordered types (those implementing
cmp.Ordered, likeintandfloat64) is almost twice as fast on heapsort than the general heap described above, because the comparison functioncmp.Compareis inlined. After a survey of open-source Go code, we decided not to propose this optimized heap.We first looked for uses of
container/heapthat involved ordered types. We found several production implementations in prominent modules like Ethereum and LetsEncrypt.We then scanned the open-source Go ecosystem for optimized heap implementations that did not use
container/heap, by looking for types based on slices of ordered types with method names like “heapify”, “siftUp”, and so on. That heuristic search turned up only three hits which might be part of production systems; all the rest were teaching examples, exercise solutions, personal projects, or copies of the standard library’s sort package (which does indeed use an optimized heapsort).To summarize, heaps optimized for ordered types are rare, and are not always an improvement when they do occur. We conclude that it's not worth complicating the package by providing one.
We can always add the optimized version later, of course. Admittedly, however, some of our proposed choices would result in a suboptimal overall design if we did:
HeapFuncand the constructorNewFunc, to better align with names in the standard library, likeslices.SortFunc.Max,TakeMaxandChangeMax—or we should use order-agnostic names now.Just to be clear, we don't think we should make any of these changes to the proposal, because it seems so unlikely that we would add the optimized heap.
A Comparer type parameter
We considered adding a second type parameter to the heap for the comparison. If we did that, the compiler would have more information about the comparison function, so it could in theory produce better code. That design would look like this:
For ordered types, we would provide this type:
That would make it possible to create a heap on, say,
ints with this line:This pattern works for any
Comparerimplementation whose zero value is ready to use:For comparers that must be initialized before use, we would provide a method:
As we said, the attraction here is that the compiler is guaranteed to have the information it needs to generate better code. But we decided against this design, because the Go compiler does not currently generate better code for it, and there are no plans for it to do so. To get a meaningful speedup, the compiler would have to generate different code for each instantiation of
Heap. When generics were first implemented, such a monomorphizing design was rejected for good reasons: it would lead to bloated binaries and slow compile times. That doesn’t mean it will never happen, though more likely some instantiations that satisfy a set of heuristics will be monomorphized, much like inlining happens now. But there currently no concrete plans for that, just brainstorming.Rog Peppe has pointed out that if we adopted this design, we could use a type switch that would allow us to specialize the implementation to ordered types, without waiting for compiler improvements. But as we argued above, it’s not worth optimizing for ordered types.
It would still make sense to adopt this design speculatively, if it weren’t any worse than our current proposal. But it is worse. As you can see from the code above, someone who wanted to use a heap for their type would have to define a second type with a method (like
TaskComparerabove) to create their heap, where in the current proposal they can just pass a function literal:We could provide a type to help them with this:
Now the user no longer has to write a second type, but the result is still clumsy:
The proposal for a general hash map came to a different conclusion: its map type includes a type parameter for the hasher, which is an interface with hash and equality functions. There are two reasons for the difference. First, the performance advantage with hash maps comes from improved escape analysis, not inlining. Better generic escape analysis can be done without monomorphizing, and will be implemented in the near future. Second, the hasher is an interface with two methods, so implementations already require a separate type, while the heap comparison function can be written as a function literal without a type.