Optimized SFL cluster linearization #34023
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/34023.

Reviews: see the guideline for information on the review process.

Conflicts: reviewers, this pull request conflicts with other open pull requests. If you consider this pull request important, please also help to review the conflicting pull requests, ideally starting with the one that should be merged first.
🚧 At least one of the CI tasks failed. Hints: try to run the tests locally, according to the documentation. Leave a comment here if you need help tracking down a confusing failure.
Force-pushed c3b0b64 to 73c3ea3
Force-pushed 447c370 to fd6d05b
Force-pushed 5ba9b9b to e136646
Force-pushed ad3cc10 to 6a04276
instagibbs left a comment:
ACK 87d74c1
Changes make sense, did not run the fuzzer myself. Non-blocking nits only.
In the chart's x-axis, what's the denominator?
Addressed @instagibbs' comments, added a commit to drop the
"benchmark count", I guess. Each benchmark takes up equal x-axis space, sorted by master's runtime. EDIT: oh, in the label? Those are "ntx/ndeps" for the test.
ajtowns left a comment:
> Pushed an update that splits the "get rid of DepData" commit in two:
Yeah, I'm still not smart enough to understand things unless they're way more bite-size than that. I broke the first of those commits down into 12 commits before it really made sense. OTOH, I think that found a couple of bugs/typos, so yay?
I didn't really see why you swapped the parent/child ordering in Activate (return child_chunk_idx), but I don't see any harm in it either.
ACK 7a0309b, though probably should do the fixups.
Since the deterministic ordering change, SpanningForestState holds a reference to the DepGraph it is linearizing. This means we no longer need to pass it to SanityCheck() as an argument.
This splits the chunk_deps variable in LoadLinearization in two, one for tracking tx dependencies and one for chunk dependencies. This is a preparation for a later commit, where chunks won't be identified anymore by a representative transaction in them, but by a separate index. With that, it seems weird to keep them both in the same structure if they will be indexed in an unrelated way. Note that the changes in src/test/util/cluster_linearize.h to the table of worst observed iteration counts are due to switching to a different data set, and are unrelated to the changes in this commit.
This small optimization avoids the need to loop over the parents of each transaction when initializing the dependency-counting structures inside GetLinearization().
This is a preparation for the next commit, where chunks will no longer be identified using a representative transaction, but using a set index. Reduce the load of line changes by doing this rename ahead of time.

-BEGIN VERIFY SCRIPT-
sed --in-place 's/_rep/_idx/g' src/cluster_linearize.h
-END VERIFY SCRIPT-
This significantly changes the data structures used in SFL, based on the observation that the DepData::top_setinfo fields are quite wasteful: there is one per dependency (up to n^2/4), but we only ever need one per active dependency (of which there are at most n-1). In total, the number of chunks plus the number of active dependencies is always exactly equal to the number of transactions, so it makes sense to have a shared pool of SetInfos, used for both chunks and top sets.

To that effect, introduce a separate m_set_info variable, which stores a SetInfo per transaction. Some of these are used for chunk sets, and some for active dependencies' top sets. Every activation transforms the parent's chunk into the top set for the new dependency. Every deactivation transforms the top set into the new parent chunk.

With indexes into m_set_data (SetIdx) becoming bounded by the number of transactions, we can use a SetType to represent sets of SetIdxs. Specifically, an m_chunk_idxs set is added which contains all SetIdx referring to chunks. This leads to a much more natural way of iterating over chunks. Also use this opportunity to normalize many variable names.
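The counting argument above is the standard forest invariant: in a spanning forest over ntx nodes, the number of trees (chunks) plus the number of edges (active dependencies) always equals ntx. A minimal sketch of why ntx pooled slots suffice, with hypothetical names rather than the PR's actual classes:

```cpp
#include <cassert>

// Hypothetical sketch (not the PR's code): each Activate merges two chunks
// across one new active dependency; each Deactivate splits one chunk in two.
// The invariant chunks + active_deps == ntx is why a shared pool of ntx
// SetInfo slots can serve both chunk sets and active dependencies' top sets.
struct ForestCounts {
    int ntx;          // transactions (forest nodes)
    int chunks;       // trees in the spanning forest
    int active_deps;  // edges in the spanning forest
    explicit ForestCounts(int n) : ntx(n), chunks(n), active_deps(0) {}
    void Activate() { --chunks; ++active_deps; }    // merge two chunks
    void Deactivate() { ++chunks; --active_deps; }  // split a chunk in two
    bool Invariant() const { return chunks + active_deps == ntx; }
};
```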
With the earlier change to pool SetInfo objects, there is little need for DepData anymore. Use parent/child TxIdxs to refer to dependencies, and find their top set by having a child TxIdx-indexed vector in each TxData, rather than a list of dependencies. This makes code for iterating over dependencies more natural and simpler.
The combined size of TxData::dep_top_idx can be 16 KiB with 64 transactions and SetIdx = uint32_t. Use a smaller type where possible to reduce memory footprint and improve cache locality of m_tx_data. Also switch from an std::vector to an std::array, reducing allocation overhead and indirections.
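To make the 16 KiB figure concrete: 64 transactions, each with a 64-entry table of 4-byte indices, is 64 × 64 × 4 bytes. A one-byte element type is enough to index up to 64 SetInfo slots, and a fixed std::array keeps the table inline with its TxData instead of behind a heap pointer. A hedged sketch with illustrative names (not the PR's actual definitions):

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr std::size_t MAX_CLUSTER = 64;  // assumed cluster size limit

// Wide variant: a heap-allocated vector of 32-bit indices per transaction.
struct TxDataWide { std::vector<uint32_t> dep_top_idx; };

// Narrow variant: an inline fixed array of 8-bit indices (enough to index
// up to 64 SetInfo slots); no allocation, contiguous with the TxData.
struct TxDataNarrow { std::array<uint8_t, MAX_CLUSTER> dep_top_idx; };

// Combined table sizes across a full 64-transaction cluster:
constexpr std::size_t WIDE_BYTES = MAX_CLUSTER * MAX_CLUSTER * sizeof(uint32_t);
constexpr std::size_t NARROW_BYTES = MAX_CLUSTER * MAX_CLUSTER * sizeof(uint8_t);
```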
This is a simple refactor to make the code more readable.
The current process consists of iterating over the transactions of the chunk one by one, and then for each figuring out which of its parents/children are in unprocessed chunks. Simplify this (and speed it up slightly) by splitting this process into two phases: first determine the union of all parents/children, and then find which chunks those belong to.
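The two-phase idea can be sketched as follows, using a uint64_t bitmask as a stand-in for the BitSet type (function and parameter names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// tx_parents[i]: bitmask of parents of tx i; chunk_of[i]: chunk index of tx i.
// Returns the chunks containing parents of the given chunk's transactions.
std::set<int> ParentChunks(uint64_t chunk_txs,
                           const std::vector<uint64_t>& tx_parents,
                           const std::vector<int>& chunk_of) {
    // Phase 1: union of all parents of transactions in the chunk.
    uint64_t all_parents = 0;
    for (std::size_t i = 0; i < tx_parents.size(); ++i) {
        if (chunk_txs >> i & 1) all_parents |= tx_parents[i];
    }
    all_parents &= ~chunk_txs;  // only parents outside the chunk itself
    // Phase 2: find which chunks those parents belong to.
    std::set<int> out;
    for (std::size_t i = 0; i < chunk_of.size(); ++i) {
        if (all_parents >> i & 1) out.insert(chunk_of[i]);
    }
    return out;
}
```

Compared with a per-transaction scan, the union is built first, so each outside transaction is mapped to its chunk at most once.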
Instead of computing the set of reachable transactions inside PickMergeCandidate, precompute this information, updating it in Activate (by merging the two chunks' reachable sets) and Deactivate (by recomputing). This is a small performance gain in itself, but also a preparation for future optimizations that rely on quickly testing whether dependencies between chunks exist.
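A sketch of the incremental-maintenance idea, again with uint64_t standing in for BitSet and hypothetical names: merging needs no recomputation because reachability unions, and the precomputed sets make the "does a dependency between chunks exist" test a single mask intersection.

```cpp
#include <cassert>
#include <cstdint>

// reach_x: transactions reachable from chunk X's transactions through the
// dependency graph, as a bitmask. On Activate (merge), the merged chunk's
// reachable set is just the union of the inputs'; only Deactivate (split)
// needs to recompute from scratch.
uint64_t MergeReachable(uint64_t reach_a, uint64_t reach_b) {
    return reach_a | reach_b;
}

// With reachable sets precomputed, testing whether chunk A reaches any
// transaction of chunk B is one intersection:
bool Reaches(uint64_t reach_a, uint64_t txs_b) {
    return (reach_a & txs_b) != 0;
}
```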
Future changes will rely on knowing the chunk indexes of the two created chunks after a split. It is natural to return this information from Deactivate, which also simplifies MergeSequence.
After a split, if the top part has a dependency on the bottom part, the first MergeSequence will always perform this merge and then stop. This is referred to as a self-merge. We can special-case these by detecting self-merges early, avoiding the overhead of a full MergeSequence, which involves two PickMergeCandidate calls (a successful and an unsuccessful one).
This means we can iterate over all active dependencies in a cluster/chunk in O(ntx) time rather than O(ndeps) (*), as the number of active dependencies in a set of ntx transactions is at most ntx-1. (*) Asymptotically, this is not actually true, as for large transaction counts, iterating over a BitSet still scales with ntx. In practice, however, where BitSets are represented by a constant number of integers, it holds.
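A sketch of the iteration claim (hypothetical names, uint64_t standing in for BitSet): with one pooled SetInfo slot per transaction, the slots not marked as chunks are exactly the active dependencies, so one pass over ntx bits visits them all, regardless of how many dependencies the cluster has in total.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// all_slots: one bit per transaction's pooled SetInfo slot.
// chunk_idxs: the slots currently used for chunks. The remaining slots hold
// active dependencies' top sets, so enumerating them is one bitset walk of
// ntx bits, not a traversal of all ndeps dependencies.
std::vector<int> ActiveDepSlots(uint64_t all_slots, uint64_t chunk_idxs) {
    std::vector<int> out;
    uint64_t active = all_slots & ~chunk_idxs;
    for (int i = 0; i < 64; ++i) {
        if (active >> i & 1) out.push_back(i);
    }
    return out;
}
```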
This avoids adding them a second time to m_suboptimal_chunks when they happen to already be there.
It suffices to initially only attempt one direction of merges in MakeTopological(), and only try both directions on chunks that are the result of other merges.
The two calls to UpdateChunk, in Activate and Deactivate each, are subtly different: the top one needs to update the chunk_idx of iterated transactions, while the bottom one leaves it unchanged. To exploit this difference, inline the four function calls, getting rid of UpdateChunk. This is also a preparation for a future improvement that inlines the recomputation of reachable sets in the same loop in Deactivate.
Avoid two full iterations over all of a chunk's transactions to recompute the reachable sets, by inlining them into the dependency-updating loops. Note that there is no need to do the same for Activate, because the reachable sets after merging can be computed directly from the input chunks' reachable sets. Deactivate needs to recompute them, however.
reACK c2fcf25
Yay! Do you want me to try to perform some of those splits here? I think the introduction/usage of
The reason is that on Activation, the top chunk becomes the new dependency's top set, and thus the other one - the bottom chunk - needs to become the merged chunk's set (and thus returned by
I think it's useful for understanding the changes incrementally, but I'm not sure anyone's likely to want to do that (vs just understanding the latest implementation from scratch) outside of reviewing this PR, in which case they can look at my branch if they want?
Ah, essentially saving swapping two (24-byte?) SetInfos, I guess. reACK c2fcf25
Follow-up to #32545, part of #30289.
This contains many of the optimizations that were originally part of #32545 itself.
Here is a list of commits and the geometric average of the benchmark timings. Note that these aren't worst cases, but because of the nature of the optimizations here, I do expect them to apply roughly equally to all kinds of clusters. In other words, the relative improvement shown by these numbers should be representative:
And to show that they apply to all clusters roughly similarly: