proposal: container/heap/v2: generic heap

### Proposal Details

# Background

A heap is a kind of collection where each element has a _value_, and the values are ordered: each is either greater than, less than or equal to another. A binary min-heap is a complete binary tree of elements with the _heap property_: the value of each node is less than or equal to the values of its children. That implies that the smallest element is at the root, and therefore quickly available. Insertions, deletions and changes can be done in logarithmic time. Common uses for heaps are:

- Priority queues, which appear in schedulers, discrete event simulations, and graph algorithms like Dijkstra's algorithm.
- Heapsort: while slower than quicksort on average, this sorting algorithm guarantees optimal _O(n\*log(n))_ performance. Some sorting implementations, like the one in the standard library (based on [pdqsort](https://github.com/orlp/pdqsort)), fall back to heapsort when quicksort behaves poorly. Since heapsort delivers the sorted sequence in order, it can also be used to sort incrementally, in cases when the entire sorted sequence might not be needed.
- K-way merge: a heap holds the minimum element from each of k sorted lists.
- Top K: a heap can be used to efficiently find the largest (or smallest) K items in a sequence in space proportional to K.

The standard library has a binary min-heap implementation in `container/heap`. While it has served well for the past decade, it has several issues:

- It is not generic, so users must write type assertions when retrieving elements.
- The insertion function has an argument of type `any`, so inserting some values requires an allocation.
- Users must supply their own storage and implement an interface with five methods. The storage is nearly always a slice, but the heap interface must be reimplemented each time, with nearly identical code.
- Users must pass a value of that interface to top-level functions that implement the heap. The names `Push` and `Pop` appear in both the interface and as heap functions, with different meanings. The API is confusing enough so that even experienced Go developers have trouble understanding it. 

# Proposal

We propose a new standard package, `container/heap/v2`, with a type representing a binary min-heap. The data structure is the same as that of `container/heap`, but its API solves all of the above problems.

An overview follows. The [github.com/jba/heap](https://pkg.go.dev/github.com/jba/heap@v0.9.0) module provides a reference implementation with the full API.

Our heap type, `Heap[T]`, requires a comparison function, which is passed to the constructor, `New` . There are methods for retrieving the smallest element (`Min`), removing and returning the smallest (`TakeMin`), finding the size of the heap (`Len`), iterating over the heap items (`All`), and clearing the heap’s contents (`Clear`).

With the `Init` method, a heap can be initialized with a slice. The heap owns the slice, so this code:
```
var items []T
// append to items
h := heap.New[T](compareFunc)
h.Init(items)
```

has the same costs as the similar code for `container/heap`:

```
type tHeap []T  // tHeap implements heap.Interface
var items tHeap
// append to items
heap.Init(items)
```

The `Insert` method inserts a single element. It incurs the cost of maintaining the heap property, so inserting k items individually into a heap of size n takes _O(k*log(n+k))_ time. There is also an `InsertAll` method which takes an `iter.Seq[T]` and re-creates a heap from scratch in _O(n+k)_ time, where k is the number of elements in the sequence.

The `Drain` method returns an iterator over the elements in sorted order, emptying the heap in the process; to implement heapsort, build a heap and then call `Drain`.

The `ChangeMin` method modifies the minimum element and moves it to its proper place in the heap. It provides a simple, efficient way to implement Top K, as shown in [this example](https://pkg.go.dev/github.com/jba/heap#example-package-TopK).

## Referring to elements

Deleting or re-ordering a specific element in a heap is a common requirement&mdash;for example, when a task’s priority changes and its position must be adjusted to restore the heap property. The trouble is that, unlike a map, a heap doesn't have a natural key to find an element. Using element equality could work (it’s what [Java does](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/PriorityQueue.java#L360)), but then the user would have to provide a separate equality function on elements themselves, in addition to the comparison function on the element values used for the heap property, which is confusing. And since heaps don’t support efficient lookup by equality, the delete function would need to visit each element, turning what should be a logarithmic operation into a linear one.

Looking at uses of `container/heap` in the ecosystem, we noticed that when heap elements need to be modified, the element type is invariably a pointer to a struct with an `int` field that maintains the index into the heap slice, as in [this code from Cockroach DB](https://github.com/cockroachdb/cockroach/blob/v25.4.2/pkg/sql/rowcontainer/numbered_row_container.go#L341-L348). Users of `container/heap` must maintain this index themselves, in the `Swap` and `Push` methods they are required to implement. Our heap performs this task for them: they need only provide a function that sets the index of an element. For that purpose, a heap can be constructed with `NewIndexed`, which takes both a comparison function and a `func(e T, v int)` whose job is to set the index field of `e` to `v`. A `Heap[T]` so constructed invokes the index function to maintain the indexes. This enables two other heap methods, called `Delete` and `Changed`, both taking an index. `Delete` deletes the element at the index. `Changed` is called after changing the element’s value, to adjust its position. Both panic if the heap was created with `New` rather than `NewIndexed`, unless the index is 0, which can always be used to refer to the minimum element.

Here is an example of using an index function along with the `Changed` method to change a task’s priority:

```
type Task struct { priority, index int }

func CompareTasks(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) }

func (t *Task) SetIndex(i int) { t.index = i }

// ... 

// Create a max-heap on priority (higher is better).
h := heap.NewIndexed(CompareTasks, (*Task).SetIndex)

// ...

t.priority++
h.Change(t.index)
```


# Comparison with container/heap

Compared to `container/heap`, our proposed heap behaves more like traditional abstract data types: it manages its own storage and exports methods to manipulate it. However, having direct access to the heap’s underlying storage can be more efficient, by avoiding a copy. For example, [this function in the Go Ethereum implementation](https://github.com/ethereum/go-ethereum/blob/v1.16.7/core/txpool/legacypool/list.go#L157-L180) benefits from sorting the heap in place. Also, as a side effect of running heapsort&mdash;that is, repeatedly calling `heap.Pop` until the heap is empty&mdash;the underlying storage will be sorted in the reverse order. (This behavior is not documented, but probably couldn’t be changed at this point, and is a natural consequence of the implementation.) The heap types we propose would require a copy in both cases. If we feel these applications are worth supporting, we could provide a method to expose the underlying slice.

We also took the opportunity to change some names. The names with “min” make it clear without having to consult the documentation that this is a min-heap. `Insert` seems clearer than `Push` (especially with `Pop` gone), and `Changed` seems clearer than `Fix`.

# Alternatives considered

## Heaps for ordered types

A generic heap whose element type is constrained to ordered types (those implementing `cmp.Ordered`, like `int` and `float64`) is almost twice as fast on heapsort than the general heap described above, because the comparison function `cmp.Compare` is inlined. After a survey of open-source Go code, we decided not to propose this optimized heap.

We first looked for uses of `container/heap` that involved ordered types. We found several production implementations in prominent modules like [Ethereum](https://github.com/ethereum/go-ethereum/blob/v1.16.8/core/txpool/legacypool/list.go) and  [LetsEncrypt](https://github.com/letsencrypt/boulder/blob/main/nonce/nonce.go).

We then scanned the open-source Go ecosystem for optimized heap implementations that did not use `container/heap`, by looking for types based on slices of ordered types with method names like “heapify”, “siftUp”, and so on. That heuristic search turned up only three hits which might be part of production systems; all the rest were teaching examples, exercise solutions, personal projects, or copies of the standard library’s sort package (which does indeed use an optimized heapsort). 

- [m3db](https://github.com/m3db/m3/blob/master/src/aggregator/aggregation/quantile/cm/heap.go) ostensibly uses a heap in its quantile estimator. But in fact, the data structure’s heap property is unused: about five years ago, the implementation was changed to [sort the heap slice](https://github.com/m3db/m3/blob/master/src/aggregator/aggregation/quantile/cm/stream.go#L294). Apparently this did not affect performance enough for anyone to notice, perhaps because the heap is small.
- [Consensys/gnark](https://github.com/Consensys/gnark/blob/master/internal/utils/heap.go) uses a heap to implement an incremental sort, in order to remove the values in one slice from another slice. Putting the values to be removed into a map is usually faster on average. We did not explore this module enough to know whether the heap is the best choice here.
- [Grafana Pyroscope](https://go-mod-viewer.appspot.com/github.com/grafana/pyroscope@v1.18.0/pkg/util/minheap/minheap.go) uses an optimized heap for Top K in several places. When we compared their optimized heap implementation with the one proposed here on their [existing benchmark](https://github.com/grafana/pyroscope/blob/main/pkg/model/flamegraph_diff_bench_test.go#L10) for one of those uses, we found no difference. A likely reason is that when running Top K on a random sequence of length n, the heap operation that involves comparison is [invoked only _O(log(n))_ times.](https://math.stackexchange.com/questions/1034155/find-average-number-of-updates-needed-to-find-maximum-number-in-an-array) Any significant time spent in the _O(n)_ part of the algorithm dominates the time spent in the heap. (Other uses of Top K in that Grafana module would likely benefit from the optimization; our benchmarks show a speed advantage of around 7%.)

To summarize, heaps optimized for ordered types are rare, and are not always an improvement when they do occur. We conclude that it's not worth complicating the package by providing one.

We can always add the optimized version later, of course. Admittedly, however, some of our proposed choices would result in a suboptimal overall design if we did:

- If there were a heap type for ordered types, the more general type should be called `HeapFunc` and the constructor `NewFunc`, to better align with names in the standard library, like `slices.SortFunc`.
- A heap for ordered types should support both min and max versions, because there is no way to negate the comparison function. That means that either we would need a separate max-heap type with different names for some methods&mdash;`Max`, `TakeMax` and `ChangeMax`&mdash;or we should use order-agnostic names now.

Just to be clear, we don't think we should make any of these changes to the proposal, because it seems so unlikely that we would add the optimized heap.

## A Comparer type parameter

We considered adding a second type parameter to the heap for the comparison. If we did that, the compiler would have more information about the comparison function, so it could in theory produce better code. That design would look like this:

```
type Comparer[T any] interface { Compare(a, b T) int }

type Heap[T any, C Comparer[T]] struct { compare C; ... }

func New[T any, C Comparer[T]]() *Heap[T, C] { return &Heap[T, C]{} }
```

For ordered types, we would provide this type:

```
type OrderedComparer[T cmp.Ordered] struct{}

func (OrderedComparer[T]) Compare(a, b T) int { return cmp.Compare(a, b) }
```

That would make it possible to create a heap on, say, `int`s with this line:

```
h := heap.New[int, heap.OrderedComparer[int]]()
```

This pattern works for any `Comparer` implementation whose zero value is ready to use:

```
type Task struct { priority, index int }

type TaskComparer struct{}

func (TaskComparer) Compare(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) }

// ...

h := heap.New[*Task, TaskComparer]()
```

For comparers that must be initialized before use, we would provide a method:
```
func (h *Heap[T, C]) SetComparer(c C) { h.compare = c }
```

As we said, the attraction here is that the compiler is guaranteed to have the information it needs to generate better code. But we decided against this design, because the Go compiler does not currently generate better code for it, and there are no plans for it to do so. To get a meaningful speedup, the compiler would have to generate different code for each instantiation of `Heap`. When generics were first implemented, such a _monomorphizing_ design was rejected for good reasons: it would lead to bloated binaries and slow compile times. That doesn’t mean it will never happen, though more likely some instantiations that satisfy a set of heuristics will be monomorphized, much like inlining happens now. But there currently no concrete plans for that, just brainstorming.

Rog Peppe has pointed out that if we adopted this design, we could use a type switch that would allow us to specialize the implementation to ordered types, without waiting for compiler improvements. But as we argued above, it’s not worth optimizing for ordered types.

It would still make sense to adopt this design speculatively, if it weren’t any worse than our current proposal. But it is worse. As you can see from the code above, someone who wanted to use a heap for their type would have to define a second type with a method (like `TaskComparer` above) to create their heap, where in the current proposal they can just pass a function literal:

```
h := heap.New(func(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) })
```
We could provide a type to help them with this:
```
type CompareFunc[T any] func(a, b T) int 

func (f CompareFunc[T]) Compare(a, b T) int { return f(a, b) }
```
Now the user no longer has to write a second type, but the result is still clumsy:
```
h := heap.New[*Task, heap.CompareFunc[*Task]]()
h.SetComparer(heap.CompareFunc[*Task](func(t1, t2 *Task) int { return cmp.Compare(t2.priority, t1.priority) })
```

The [proposal for a general hash map](https://github.com/golang/go/issues/69559) came to a different conclusion: its map type includes a type parameter for the hasher, which is an interface with hash and equality functions. There are two reasons for the difference. First, the performance advantage with hash maps comes from improved escape analysis, not inlining. Better generic escape analysis can be done without monomorphizing, and will be implemented in the near future. Second, the hasher is an interface with two methods, so implementations already require a separate type, while the heap comparison function can be written as a function literal without a type.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: container/heap/v2: generic heap #77397

Proposal Details

Background

Proposal

Referring to elements

Comparison with container/heap

Alternatives considered

Heaps for ordered types

A Comparer type parameter

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

proposal: container/heap/v2: generic heap #77397

Description

Proposal Details

Background

Proposal

Referring to elements

Comparison with container/heap

Alternatives considered

Heaps for ordered types

A Comparer type parameter

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions