txgraph: randomize order of same-feerate distinct-cluster transactions #33335

Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/33335.

Reviews: see the guideline for information on the review process.

Conflicts: reviewers, this pull request conflicts with the following ones: … If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Force-pushed from c91e128 to 593d418 (compare).
concept ACK, I also think it removes the kinda weird behavior where same-CFR clusters get mined/evicted based on the underlying cluster's original sequence value
instagibbs left a comment
ACK 593d418137e4802bbe229707dcda5796522e2e5e
double-checked Trim benchmark just in case, looks unchanged
Force-pushed from 593d418 to 45341ba (compare).
Rebased on top of #33157, as there are some conflicts, and I expect the other one will go in first.
Force-pushed from 45341ba to cc52a89 (compare).
Rebased.
Force-pushed from cc52a89 to 949353a (compare).
instagibbs left a comment
Thinking a bit more on the fact that this randomizes eviction ordering in addition to INV gossiping, but the rebase was straightforward:
git range-diff master 593d418137e4802bbe229707dcda5796522e2e5e 949353ac63249bf51fce34f50ac27f6ec51d0172
Before, the implicit ordering of transactions in a TxGraph's main graph is sorted by:
- feerate, high to low
- Cluster, low to high m_sequence
- linearization order within clusters

However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph. In the full cluster mempool PR, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order. To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (which needs ~192000 clusters to have a probability of 10^-9 of existing), fall back to comparing the m_sequence values directly.
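For illustration, a minimal standalone sketch of the comparison scheme described above: the mixer below is just a stand-in for a real keyed 64-bit hash such as SipHash, and all names are made up here rather than taken from the PR.

```cpp
#include <cstdint>

// Toy keyed 64-bit mixer (splitmix64 finalizer) standing in for a real keyed
// hash such as SipHash; the key would be drawn randomly once per TxGraph.
uint64_t HashSequence(uint64_t key, uint64_t sequence)
{
    uint64_t x = sequence ^ key;
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Order two clusters' m_sequence values without revealing insertion order:
// compare the keyed hashes, and only fall back to the raw sequence numbers
// in the (astronomically unlikely) case of a 64-bit hash collision.
int CompareSequences(uint64_t seq_a, uint64_t seq_b, uint64_t key)
{
    const uint64_t ha = HashSequence(key, seq_a);
    const uint64_t hb = HashSequence(key, seq_b);
    if (ha != hb) return ha < hb ? -1 : 1;
    if (seq_a != seq_b) return seq_a < seq_b ? -1 : 1; // collision fallback
    return 0;
}
```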
Force-pushed from 949353a to e325c3a (compare).
f0699bd5718eb099e242900d2b4494c171d70825 does something similar, making chunk ordering within the same cluster at the same chunk feerate randomized.

@instagibbs Indeed, and with the same rationale.
Should TxGraph::m_rng be instrumented to allow it to be deterministic, to increase fuzz stability?

The … Without that, there is even logic to show an error if the global RNG is used. See …

ACK e325c3a
After discussion with @gmaxwell, there is a possibility to do something more deterministic here, though I'm not sure it's worth it. It would be possible to sort by this metric: within a run of equal-feerate chunks of the same cluster, the sum of the sizes of the transactions in those chunks up to and including the current one.

Instead of this "sum of sizes of transactions in equal-feerate chunks up to current chunk", it can be any function of the cluster that is monotonically increasing within equal-feerate transaction groups of the same cluster. It can even be multiple functions, e.g. first sum of sizes, and then sum of hashes of topologies of chunks or so. This would make things more deterministic, but in an arbitrary way - making certain transactions preferred over others, without them being fundamentally more desirable for miners. Size at least has some bearing on the block building logic, but it's not clear to me that smallest-first is necessarily best: if the block boundary falls near the end of a group of equal-feerate chunks, you'd want the smallest ones at the end. Using a hash of topology or so is even more arbitrary.
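A rough sketch of such a metric, assuming each chunk of a linearized cluster is available as a (fee, size) pair; the names and types here are invented for illustration. Monotonicity within an equal-feerate run follows simply from summing non-negative sizes.

```cpp
#include <cstdint>
#include <vector>

struct ChunkInfo {
    int64_t fee;  // total fee of the chunk
    int32_t size; // total size (or weight) of the chunk
};

// For each chunk of a linearized cluster, compute the sum of sizes over the
// maximal run of preceding equal-feerate chunks, including the chunk itself.
// Within such a run the value strictly increases with chunk position, so it
// respects the cluster's linearization order while breaking ties by size.
std::vector<int64_t> EqualFeeratePrefixSizes(const std::vector<ChunkInfo>& chunks)
{
    std::vector<int64_t> out(chunks.size());
    int64_t run_total = 0;
    for (size_t i = 0; i < chunks.size(); ++i) {
        // A new run starts when the feerate differs from the previous chunk;
        // compare fee_a*size_b with fee_b*size_a to avoid division.
        if (i == 0 || chunks[i].fee * chunks[i - 1].size != chunks[i - 1].fee * chunks[i].size) {
            run_total = 0;
        }
        run_total += chunks[i].size;
        out[i] = run_total;
    }
    return out;
}
```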
Tend to agree, I don't think the additional complexity is worth it, since: …
hm? you can absolutely eliminate all non-determinism -- e.g., as with the old behavior, use the wtxid as a tiebreaker, if nothing else. I think you'll probably have to eliminate all non-determinism one way or another eventually if this sort order is eventually used for efficient block relay. It may be desirable to eliminate it now so it doesn't have to be done later, and so e.g. metrics that determine if miners are censoring transactions get less noise from randomness here, or even compact block reconstruction rates are less harmed by differences in eviction choices if the network starts evicting. They're fringe benefits (at least now), but I'm not aware of wtxid grinding to get a slight tiebreak advantage ever having been a concern.

In any case, for the reasons that determinism is better (or worse), any marginal increase in determinism is good/bad.

My discussion with sipa was basically just on 'can't you prefer smaller first', under the idea that it would be better for block packing, but he convinced me that the block boundary could fall anywhere in the cluster and so that wasn't an obvious win. But being able to sort on cumulative size is more or less orthogonal to making the sort more deterministic.

@0xB10C any thoughts about block content predictability here?
Right, I was speaking of the exact proposed methods (not wtxid!); we previously used time as a tiebreaker iirc (pre cluster mempool). Edit: guess I didn't publish my other time-related comment.
Well, I don't think we used wtxid as a tiebreak previously, so it wouldn't have been a concern? In practice I don't think it would affect network resources, so I'm not particularly concerned, it's just something new people might do.
Ah, I guess it was the relay sort that tie-broke on wtxid then. In any case I think it's good to tiebreak first on things we want people to do, but after all that, the choice is something like wtxid or random (I agree arrival order is right out). A final arbitrary criterion risks that people game it to get a slight preference, but the alternative of random has the downside of making block inclusion/order less consistent between nodes.
There are several places where tie-break decisions need to be made: …

So even with this PR, the current situation is arguably inconsistent (we prefer smaller chunks, but don't tie-break based on size elsewhere). I feel like we should make a decision for what properties are desirable, and then implement it consistently. I think we definitely want highest-feerate chunk first (that's what cluster mempool is), and probably want …

I think all of these are doable for implementation. For any of the versions with a dependency on wtxid (d/t/w), I think it would involve handing …

Marking as draft until it's decided what approach we want.
Other preferences like "creates fewest outputs" could also be used before breaking on size, but perhaps it would be rare that this would be the tiebreaker? Like size, it has the advantage that if someone is "gaming" it, that's a good-ish thing. In any case, the more criteria you stick in front of a grindable tiebreaker, the less the grinding matters. (Or, if you do go with a local pseudorandom choice -- which I think you shouldn't -- the more criteria in front of it, the less unpredictable the behavior will be.)

You could prefer the lowest maximum confirmed input height as a tie breaker before txid: it's a deterministic, naturally scarce resource -- and favoring the coins that have moved less recently in the event of a tie seems healthy.

Does your comparison need to obey the triangle inequality -- I assume not, if you could decide 'randomly'? H(sorted(txid_a,txid_b))^txid_a < H(sorted(txid_a,txid_b))^txid_b as the tiebreaker? This can't usefully be ground.
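A sketch of what that pairwise tie-break could look like, assuming Bitcoin Core's CSHA256 from crypto/sha256.h; the TxId alias and function name are placeholders. Because the same hash masks both txids, neither side can grind its own txid to win without already knowing the other one.

```cpp
#include <crypto/sha256.h>

#include <array>

using TxId = std::array<unsigned char, 32>;

// Illustrative pairwise tiebreak:
//   H(sorted(txid_a, txid_b)) ^ txid_a  <  H(sorted(txid_a, txid_b)) ^ txid_b
bool PairwiseTieBreakLess(const TxId& a, const TxId& b)
{
    // Hash the two txids in a canonical (sorted) order.
    const TxId& lo = (a < b) ? a : b;
    const TxId& hi = (a < b) ? b : a;
    unsigned char h[CSHA256::OUTPUT_SIZE];
    CSHA256().Write(lo.data(), lo.size()).Write(hi.data(), hi.size()).Finalize(h);

    // Compare (h ^ a) and (h ^ b) lexicographically, byte by byte.
    for (size_t i = 0; i < a.size(); ++i) {
        const unsigned char xa = h[i] ^ a[i];
        const unsigned char xb = h[i] ^ b[i];
        if (xa != xb) return xa < xb;
    }
    return false; // identical txids
}
```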
Indeed, but I don't think it's particularly useful. It should be pretty rare to have a different number of outputs if the size is already the same. And similarly, after grinding for smaller size (which probably already means fewer outputs) you can't really make the number of outputs smaller while keeping the size the same. There is also some CPU/memory cost to having more criteria, especially if it somehow turns out that they need to be cached on a per-chunk basis (which may be the case for (c)).
Ha, tie-break by anti-priority.
Yes, it needs to be consistent. In the current implementation, that's achieved by mapping everything to a score function and comparing the scores. For random, the score isn't actually a function, just a randomly-generated value assigned to each chunk/transaction. For (2) and (3), any scoring function works (even if the result is some complicated tuple of uint256s and feerates), really, because it's internal to the linearization algorithm: compute it once for each chunk/txn and throw it into the decision-making heaps. For (1), it's more complicated, because it either needs to be cached per chunk/cluster, or recomputed any time two equal-feerate chunks are sorted. Furthermore, it must respect chunk ordering within clusters. So we need something that can be implemented as a function of (cluster, chunk within that cluster), in such a way that f(cluster, chunk_pos1) <= f(cluster, chunk_pos2) if chunk_pos1 <= chunk_pos2 and the two chunks have equal feerate. That's why I suggest sums of values (sizes, or hashes) of the whole cluster, or of equal-feerate chunk prefixes; those are naturally monotonically increasing with chunk position. Anything that's just a function of the cluster satisfies this, however. So it's possible for both (1) and (2) to just use SHA256(concat(sorted(txids))) or so (for all txids within the cluster, or all txids within the chunk). But I'm not sure that is desirable, because it means any new transaction could completely change the existing orderings.
What about the following fully-deterministic ordering:
- First, prefer smaller sizes.
- Then, tie-break by "maximum txid": …

I think this can be done with a very small impact on CPU and memory (8 bytes per transaction: 4 to store the equal-feerate-chunk-prefix-weight, and 4 to store a prefix of the max txid).
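One possible reading of that proposal, sketched below with invented names; whether "maximum txid" means the largest txid in the equal-feerate chunk prefix, and whether a larger or smaller value should win, are assumptions here, and a real implementation would still need a fallback when the 4-byte txid prefix ties.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical 8 bytes of per-transaction tie-break data.
struct TieBreakInfo {
    uint32_t prefix_weight;   // weight of equal-feerate chunks up to this one
    uint32_t max_txid_prefix; // first 4 bytes of the maximum txid seen so far
};

// Among entries already known to have equal chunk feerate: prefer the smaller
// prefix weight, then break remaining ties by the max-txid prefix.
bool DeterministicTieBreakLess(const TieBreakInfo& a, const TieBreakInfo& b)
{
    return std::tie(a.prefix_weight, a.max_txid_prefix) <
           std::tie(b.prefix_weight, b.max_txid_prefix);
}
```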
Closing in favor of #34257.
Wouldn't it be better to prefer larger sizes? If you have two equal-feerate chunks, A and B, and you're filling a block, then either: …
@ajtowns There is also the possibility that A is smaller and fits, and B is bigger and doesn't fit (regardless of whether A has been included). In this case, you want A first. I believe this will exactly compensate for the probability that one but not both fits, where you prefer the bigger one first. (EDIT: this only matters within a cluster - across clusters they're independent) Relevant IRC discussion today: …
Right -- when they're in different clusters, considering B first is fine; it'll be skipped and A taken; but when they're in the same cluster, skipping B would also prevent anything else from the cluster being included. So within a cluster, consider smaller chunks first; across clusters, consider the bigger chunk first? Maybe that's a bit too much complexity for too little value though, especially since it's not stable between cluster merges/splits?

FWIW, I don't find the grinding argument very persuasive: if you can find a way to make your tx smaller, then you're already encouraged to do that by fee rate prioritisation. Making a larger tx and also paying more fees for it just to keep the fee rate constant isn't attractive, as far as I can see.
Part of #30289.

Before, the implicit ordering of transactions in a TxGraph's main graph is sorted by:
- feerate, high to low
- Cluster, low to high m_sequence
- linearization order within clusters

However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph.

In #28676, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order.

To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (which needs ~192000 clusters to have a probability of 10^-9 of existing), fall back to comparing the m_sequence values directly.

Alternatives: we could instead use the pre-cluster-mempool approach of sorting by wtxid in case of ties, which is equally non-revealing, and deterministic. I find it cleaner to have a randomized internal total ordering rather than needing a partial ordering within TxGraph that needs to be complemented by tie-breaking logic inside the mempool, however, and I don't think the determinism is that valuable.

For context: #28676 (comment)
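As a sanity check on the ~192000-cluster figure, assuming the hashed sequence values are 64 bits, the birthday bound gives:

```latex
% Collision probability among n independent 64-bit hashed sequence values:
P_{\text{collision}} \approx \frac{n(n-1)}{2 \cdot 2^{64}} \approx \frac{n^2}{2^{65}}
% Setting P_{\text{collision}} = 10^{-9} and solving for n:
n \approx \sqrt{2^{65} \cdot 10^{-9}} \approx \sqrt{3.7 \cdot 10^{10}} \approx 1.9 \times 10^{5}
```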