txgraph: randomize order of same-feerate distinct-cluster transactions #33335

Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/33335.

Reviews: see the guideline for information on the review process.

Conflicts: reviewers, this pull request conflicts with the following ones: … If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Force-pushed from c91e128 to 593d418 (compare).
concept ACK, I also think it removes the kinda weird behavior where same-CFR clusters get mined/evicted based on the underlying cluster's original sequence value
instagibbs left a comment
ACK 593d418137e4802bbe229707dcda5796522e2e5e
double-checked Trim benchmark just in case, looks unchanged
Force-pushed from 593d418 to 45341ba (compare).
Rebased on top of #33157, as there are some conflicts, and I expect the other one will go in first.
Force-pushed from 45341ba to cc52a89 (compare).
Rebased.
Force-pushed from cc52a89 to 949353a (compare).
instagibbs left a comment
Thinking a bit more on the fact that this randomizes eviction ordering in addition to INV gossiping, but the rebase was straightforward:
git range-diff master 593d418137e4802bbe229707dcda5796522e2e5e 949353ac63249bf51fce34f50ac27f6ec51d0172
Before, the implicit ordering of transactions in a TxGraph's main graph is sorted by:
- feerate, high to low
- Cluster, low to high m_sequence
- linearization order within clusters

However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph. In the full cluster mempool PR, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order. To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (which needs ~192000 clusters to have a probability of 10^-9 of existing), fall back to comparing the m_sequence values directly.
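For illustration, a minimal standalone sketch of the comparison scheme described above: the mixer below is just a stand-in for a real keyed 64-bit hash such as SipHash, and all names are made up here rather than taken from the PR.

```cpp
#include <cstdint>

// Toy keyed 64-bit mixer (splitmix64 finalizer) standing in for a real keyed
// hash such as SipHash; the key would be drawn randomly once per TxGraph.
uint64_t HashSequence(uint64_t key, uint64_t sequence)
{
    uint64_t x = sequence ^ key;
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Order two clusters' m_sequence values without revealing insertion order:
// compare the keyed hashes, and only fall back to the raw sequence numbers
// in the (astronomically unlikely) case of a 64-bit hash collision.
int CompareSequences(uint64_t seq_a, uint64_t seq_b, uint64_t key)
{
    const uint64_t ha = HashSequence(key, seq_a);
    const uint64_t hb = HashSequence(key, seq_b);
    if (ha != hb) return ha < hb ? -1 : 1;
    if (seq_a != seq_b) return seq_a < seq_b ? -1 : 1; // collision fallback
    return 0;
}
```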
Force-pushed from 949353a to e325c3a (compare).
f0699bd5718eb099e242900d2b4494c171d70825 does something similar, making chunk ordering within the same cluster at the same chunk feerate randomized.

@instagibbs Indeed, and with the same rationale.
Should TxGraph::m_rng be instrumented to allow it to be deterministic, to increase fuzz stability?

The … Without that, there is even logic to show an error if the global RNG is used. See …

ACK e325c3a
After discussion with @gmaxwell, there is a possibility to do something more deterministic here, though I'm not sure it's worth it. It would be possible to sort by this metric: within a run of equal-feerate chunks of the same cluster, the sum of the sizes of the transactions in those chunks up to and including the current one.

Instead of this "sum of sizes of transactions in equal-feerate chunks up to current chunk", it can be any function of the cluster that is monotonically increasing within equal-feerate transaction groups of the same cluster. It can even be multiple functions, e.g. first sum of sizes, and then sum of hashes of topologies of chunks or so. This would make things more deterministic, but in an arbitrary way - making certain transactions preferred over others, without them being fundamentally more desirable for miners. Size at least has some bearing on the block building logic, but it's not clear to me that smallest-first is necessarily best: if the block boundary falls near the end of a group of equal-feerate chunks, you'd want the smallest ones at the end. Using a hash of topology or so is even more arbitrary.
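A rough sketch of such a metric, assuming each chunk of a linearized cluster is available as a (fee, size) pair; the names and types here are invented for illustration. Monotonicity within an equal-feerate run follows simply from summing non-negative sizes.

```cpp
#include <cstdint>
#include <vector>

struct ChunkInfo {
    int64_t fee;  // total fee of the chunk
    int32_t size; // total size (or weight) of the chunk
};

// For each chunk of a linearized cluster, compute the sum of sizes over the
// maximal run of preceding equal-feerate chunks, including the chunk itself.
// Within such a run the value strictly increases with chunk position, so it
// respects the cluster's linearization order while breaking ties by size.
std::vector<int64_t> EqualFeeratePrefixSizes(const std::vector<ChunkInfo>& chunks)
{
    std::vector<int64_t> out(chunks.size());
    int64_t run_total = 0;
    for (size_t i = 0; i < chunks.size(); ++i) {
        // A new run starts when the feerate differs from the previous chunk;
        // compare fee_a*size_b with fee_b*size_a to avoid division.
        if (i == 0 || chunks[i].fee * chunks[i - 1].size != chunks[i - 1].fee * chunks[i].size) {
            run_total = 0;
        }
        run_total += chunks[i].size;
        out[i] = run_total;
    }
    return out;
}
```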
Tend to agree, I don't think the additional complexity is worth it, since: …
hm? you can absolutely eliminate all non-determinism -- e.g., as with the old behavior, use the wtxid as a tiebreaker, if nothing else. I think you'll probably have to eliminate all non-determinism one way or another eventually if this sort order is eventually used for efficient block relay. It may be desirable to eliminate it now so it doesn't have to be done later, and so e.g. metrics that determine if miners are censoring transactions get less noise from randomness here, or even compact block reconstruction rates are less harmed by differences in eviction choices if the network starts evicting. They're fringe benefits (at least now), but I'm not aware of wtxid grinding to get a slight tiebreak advantage ever having been a concern.

In any case, for the reasons that determinism is better (or worse), any marginal increase in determinism is good/bad.

My discussion with sipa was basically just on 'can't you prefer smaller first', under the idea that it would be better for block packing, but he convinced me that the block boundary could fall anywhere in the cluster and so that wasn't an obvious win. But being able to sort on cumulative size is more or less orthogonal to making the sort more deterministic.

@0xB10C any thoughts about block content predictability here?
Right, I was speaking of the exact proposed methods (not wtxid!); we previously used time as a tiebreaker iirc (pre cluster mempool). Edit: guess I didn't publish my other time-related comment.
Well, I don't think we used wtxid as a tiebreak previously, so it wouldn't have been a concern? In practice I don't think it would affect network resources, so I'm not particularly concerned, it's just something new people might do.
Ah, I guess it was the relay sort that tie-broke on wtxid then. In any case I think it's good to tiebreak first on things we want people to do, but after all that, the choice is something like wtxid or random (I agree arrival order is right out). A final arbitrary criterion risks that people game it to get a slight preference, but the alternative of random has the downside of making block inclusion/order less consistent between nodes.
There are several places where tie-break decisions need to be made: …

So even with this PR, the current situation is arguably inconsistent (we prefer smaller chunks, but don't tie-break based on size elsewhere). I feel like we should make a decision for what properties are desirable, and then implement it consistently. I think we definitely want highest-feerate chunk first (that's what cluster mempool is), and probably want …

I think all of these are doable for implementation. For any of the versions with a dependency on wtxid (d/t/w), I think it would involve handing …

Marking as draft until it's decided what approach we want.
Other preferences like "creates fewest outputs" could also be used before breaking on size, but perhaps it would be rare that this would be the tiebreaker? Like size, it has the advantage that if someone is "gaming" it, that's a good-ish thing. In any case, the more criteria you stick in front of a grindable tiebreaker, the less the grinding matters. (Or, if you do go with a local pseudorandom choice -- which I think you shouldn't -- the more criteria in front of it, the less unpredictable the behavior will be.)

You could prefer the lowest maximum confirmed input height as a tie breaker before txid: it's a deterministic, naturally scarce resource -- and favoring the coins that have moved less recently in the event of a tie seems healthy.

Does your comparison need to obey the triangle inequality -- I assume not, if you could decide 'randomly'? H(sorted(txid_a,txid_b))^txid_a < H(sorted(txid_a,txid_b))^txid_b as the tiebreaker? This can't usefully be ground.
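A sketch of what that pairwise tie-break could look like, assuming Bitcoin Core's CSHA256 from crypto/sha256.h; the TxId alias and function name are placeholders. Because the same hash masks both txids, neither side can grind its own txid to win without already knowing the other one.

```cpp
#include <crypto/sha256.h>

#include <array>

using TxId = std::array<unsigned char, 32>;

// Illustrative pairwise tiebreak:
//   H(sorted(txid_a, txid_b)) ^ txid_a  <  H(sorted(txid_a, txid_b)) ^ txid_b
bool PairwiseTieBreakLess(const TxId& a, const TxId& b)
{
    // Hash the two txids in a canonical (sorted) order.
    const TxId& lo = (a < b) ? a : b;
    const TxId& hi = (a < b) ? b : a;
    unsigned char h[CSHA256::OUTPUT_SIZE];
    CSHA256().Write(lo.data(), lo.size()).Write(hi.data(), hi.size()).Finalize(h);

    // Compare (h ^ a) and (h ^ b) lexicographically, byte by byte.
    for (size_t i = 0; i < a.size(); ++i) {
        const unsigned char xa = h[i] ^ a[i];
        const unsigned char xb = h[i] ^ b[i];
        if (xa != xb) return xa < xb;
    }
    return false; // identical txids
}
```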
Indeed, but I don't think it's particularly useful. It should be pretty rare to have a different number of outputs if the size is already the same. And similarly, after grinding for smaller size (which probably already means fewer outputs) you can't really make the number of outputs smaller while keeping the size the same. There is also some CPU/memory cost to having more criteria, especially if it somehow turns out that they need to be cached on a per-chunk basis (which may be the case for (c)).
Ha, tie-break by anti-priority.
Yes, it needs to be consistent. In the current implementation, that's achieved by mapping everything to a score function and comparing the scores. For random, the score isn't actually a function, just a randomly-generated value assigned to each chunk/transaction. For (2) and (3), any scoring function works (even if the result is some complicated tuple of uint256s and feerates), really, because it's internal to the linearization algorithm: compute it once for each chunk/txn and throw it into the decision-making heaps. For (1), it's more complicated, because it either needs to be cached per chunk/cluster, or recomputed any time two equal-feerate chunks are sorted. Furthermore, it must respect chunk ordering within clusters. So we need something that can be implemented as a function of (cluster, chunk within that cluster), in such a way that f(cluster, chunk_pos1) <= f(cluster, chunk_pos2) if chunk_pos1 <= chunk_pos2 and the two chunks have equal feerate. That's why I suggest sums of values (sizes, or hashes) of the whole cluster, or of equal-feerate chunk prefixes; those are naturally monotonically increasing with chunk position. Anything that's just a function of the cluster satisfies this, however. So it's possible for both (1) and (2) to just use SHA256(concat(sorted(txids))) or so (for all txids within the cluster, or all txids within the chunk). But I'm not sure that is desirable, because it means any new transaction could completely change the existing orderings.
What about the following fully-deterministic ordering:
- First, prefer smaller sizes.
- Then, tie-break by "maximum txid": …

I think this can be done with a very small impact on CPU and memory (8 bytes per transaction: 4 to store the equal-feerate-chunk-prefix-weight, and 4 to store a prefix of the max txid).
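One possible reading of that proposal, sketched below with invented names; whether "maximum txid" means the largest txid in the equal-feerate chunk prefix, and whether a larger or smaller value should win, are assumptions here, and a real implementation would still need a fallback when the 4-byte txid prefix ties.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical 8 bytes of per-transaction tie-break data.
struct TieBreakInfo {
    uint32_t prefix_weight;   // weight of equal-feerate chunks up to this one
    uint32_t max_txid_prefix; // first 4 bytes of the maximum txid seen so far
};

// Among entries already known to have equal chunk feerate: prefer the smaller
// prefix weight, then break remaining ties by the max-txid prefix.
bool DeterministicTieBreakLess(const TieBreakInfo& a, const TieBreakInfo& b)
{
    return std::tie(a.prefix_weight, a.max_txid_prefix) <
           std::tie(b.prefix_weight, b.max_txid_prefix);
}
```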
Closing in favor of #34257.
Wouldn't it be better to prefer larger sizes? If you have two equal-feerate chunks, A and B, and you're filling a block, then either: …
@ajtowns There is also the possibility that A is smaller and fits, and B is bigger and doesn't fit (regardless of whether A has been included). In this case, you want A first. I believe this will exactly compensate for the probability that one but not both fits, where you prefer the bigger one first. (EDIT: this only matters within a cluster - across clusters they're independent) Relevant IRC discussion today: …
Right -- when they're in different clusters, considering B first is fine; it'll be skipped and A taken; but when they're in the same cluster, skipping B would also prevent anything else from the cluster being included. So within a cluster, consider smaller chunks first; across clusters, consider the bigger chunk first? Maybe that's a bit too much complexity for too little value though, especially since it's not stable between cluster merges/splits?

FWIW, I don't find the grinding argument very persuasive: if you can find a way to make your tx smaller, then you're already encouraged to do that by fee rate prioritisation. Making a larger tx and also paying more fees for it just to keep the fee rate constant isn't attractive, as far as I can see.
Part of #30289.

Before, the implicit ordering of transactions in a TxGraph's main graph is sorted by:
- feerate, high to low
- Cluster, low to high m_sequence
- linearization order within clusters

However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph.

In #28676, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order.

To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (which needs ~192000 clusters to have a probability of 10^-9 of existing), fall back to comparing the m_sequence values directly.

Alternatives: we could instead use the pre-cluster-mempool approach of sorting by wtxid in case of ties, which is equally non-revealing, and deterministic. I find it cleaner to have a randomized internal total ordering rather than needing a partial ordering within TxGraph that needs to be complemented by tie-breaking logic inside the mempool, however, and I don't think the determinism is that valuable.

For context: #28676 (comment)
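As a sanity check on the ~192000-cluster figure, assuming the hashed sequence values are 64 bits, the birthday bound gives:

```latex
% Collision probability among n independent 64-bit hashed sequence values:
P_{\text{collision}} \approx \frac{n(n-1)}{2 \cdot 2^{64}} \approx \frac{n^2}{2^{65}}
% Setting P_{\text{collision}} = 10^{-9} and solving for n:
n \approx \sqrt{2^{65} \cdot 10^{-9}} \approx \sqrt{3.7 \cdot 10^{10}} \approx 1.9 \times 10^{5}
```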