Fix unbounded heap growth in the priority mempool.#8944
Fix unbounded heap growth in the priority mempool.#8944creachadair merged 17 commits intov0.35.xfrom
Conversation
5d9dc66 to
74c70b7
Compare
|
I think one of these test failures is real. I'll take a closer look in the morning. |
cmwaters
left a comment
There was a problem hiding this comment.
Nice work on this so far 🎉 . Left a few comments.
866c76c to
efb70ea
Compare
Ok, this should be fixed now. |
|
@cmwaters Thanks for all the comments! I believe all should be addressed now, please take another look to see if you agree with the adjustments. |
6b7d1ec to
1a266e5
Compare
The primary effect of this change is to simplify the implementation of the priority mempool to eliminate an unbounded heap growth observed by Vega team when it was enabled in their testnet. It updates and fixes #8775. The main body of this change is to remove the auxiliary indexing structures, and use only the concurrent list structure (the same as the legacy mempool) to maintain both gossip order and priority. This means that operations that require priority information, such as block updates and insert-time evictions, require a linear scan over the mempool. This tradeoff greatly simplifies the code and eliminates the long-term heap load, at the cost of some extra CPU and short-lived working memory during CheckTx and Update calls. Rough benchmark results: - This PR: BenchmarkTxMempool_CheckTx-10 486373 2271 ns/op - Original priority mempool implementation: BenchmarkTxMempool_CheckTx-10 500302 2113 ns/op - Legacy (v0) mempool: BenchmarkCheckTx-10 364591 3571 ns/op These benchmarks are not a good proxy for production load, but at least suggest that the overhead of the implementation changes are not cause for concern. In addition: - Rework synchronization so that access to shared data structures is safe. Previously shared locks were used to exclude block updates during calls that update mempool state. Now access is properly exclusive where necessary. - Fix a bug in the recheck flow, where priority updates from the application were not correctly reflected in the index structures. - Eliminate the need for separate recheck cursors during block update. This avoids the need to explicitly invalidate elements of the concurrent list, which averts the dependency cycle that led to objects being pinned. - Clean up, clarify, and fix inaccuracies in documentation comments throughout the package.
6d9f145 to
4631f69
Compare
williambanfield
left a comment
There was a problem hiding this comment.
Looks good. There are a few edge cases on adding a new transaction with priority / size / gas that may be worth ensuring there's a test for.
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Yes, I plan to add tests for all those corners. Nothing so far is really exercising those conditions at all. |
This is a manual forward-port of #8944 and related fixes from v0.35.x. One difference of note is that the CheckTx response messages no longer have a field to record an error from the ABCI application. The code is set up so that these could be reported directly to the CheckTx caller, but it would be an API change, and right now a bunch of test plumbing depends on the existing semantics.
This is a manual forward-port of #8944 and related fixes from v0.35.x. One difference of note is that the CheckTx response messages no longer have a field to record an error from the ABCI application. The code is set up so that these could be reported directly to the CheckTx caller, but it would be an API change, and right now a bunch of test plumbing depends on the existing semantics.
This is a manual forward-port of #8944 and related fixes from v0.35.x. One difference of note is that the CheckTx response messages no longer have a field to record an error from the ABCI application. The code is set up so that these could be reported directly to the CheckTx caller, but it would be an API change, and right now a bunch of test plumbing depends on the existing semantics. Also fix up tests relying on implementation-specific mempool behavior. - Commit was setting the expected mempool size incorrectly. - Fix sequence test not to depend on the incorrect size.
This is a manual forward-port of #8944 and related fixes from v0.35.x. One difference of note is that the CheckTx response messages no longer have a field to record an error from the ABCI application. The code is set up so that these could be reported directly to the CheckTx caller, but it would be an API change, and right now a bunch of test plumbing depends on the existing semantics. Also fix up tests relying on implementation-specific mempool behavior. - Commit was setting the expected mempool size incorrectly. - Fix sequence test not to depend on the incorrect size.
This is a manual forward-port of #8944 and related fixes from v0.35.x. One difference of note is that the CheckTx response messages no longer have a field to record an error from the ABCI application. The code is set up so that these could be reported directly to the CheckTx caller, but it would be an API change, and right now a bunch of test plumbing depends on the existing semantics. Also fix up tests relying on implementation-specific mempool behavior. - Commit was setting the expected mempool size incorrectly. - Fix sequence test not to depend on the incorrect size.
Description
The primary effect of this change is to simplify the implementation of the
priority mempool to eliminate an unbounded heap growth observed by Vega team
when it was enabled in their testnet. It updates and fixes #8775.
The main body of this change is to remove the auxiliary indexing structures,
and use only the concurrent list structure (the same as the legacy mempool) to
maintain both gossip order and priority.
This means that operations that require priority information, such as block
updates and insert-time evictions, require a linear scan over the mempool.
This tradeoff greatly simplifies the code and eliminates the long-term heap
load, at the cost of some extra CPU and short-lived working memory during
CheckTx and Update calls.
Rough benchmark results:
BenchmarkTxMempool_CheckTx-10 486373 2271 ns/op
BenchmarkTxMempool_CheckTx-10 500302 2113 ns/op
BenchmarkCheckTx-10 364591 3571 ns/op
These benchmarks are not a good proxy for production load, but at least suggest
that the overhead of the implementation changes are not cause for concern.
In addition:
Rework synchronization so that access to shared data structures is safe.
Previously shared locks were used to exclude block updates during calls that
update mempool state. Now access is properly exclusive where necessary.
Fix a bug in the recheck flow, where priority updates from the application
were not correctly reflected in the index structures.
Eliminate the need for separate recheck cursors during block update. This
avoids the need to explicitly invalidate elements of the concurrent list,
which averts the dependency cycle that led to objects being pinned.
Clean up, clarify, and fix inaccuracies in documentation comments throughout
the package.
Note to Reviewers
This change passes all the existing tests in the package (other than those eliminated by removal of the data structures they exercised). However, the existing test coverage is relatively weak, and should be improved before this code is put into production.