Skip to content

Fix unbounded heap growth in the priority mempool.#8944

Merged
creachadair merged 17 commits intov0.35.xfrom
mjf/priority-mempool
Jul 7, 2022
Merged

Fix unbounded heap growth in the priority mempool.#8944
creachadair merged 17 commits intov0.35.xfrom
mjf/priority-mempool

Conversation

@creachadair
Copy link

@creachadair creachadair commented Jul 6, 2022

Description

The primary effect of this change is to simplify the implementation of the
priority mempool to eliminate an unbounded heap growth observed by Vega team
when it was enabled in their testnet. It updates and fixes #8775.

The main body of this change is to remove the auxiliary indexing structures,
and use only the concurrent list structure (the same as the legacy mempool) to
maintain both gossip order and priority.

This means that operations that require priority information, such as block
updates and insert-time evictions, require a linear scan over the mempool.
This tradeoff greatly simplifies the code and eliminates the long-term heap
load, at the cost of some extra CPU and short-lived working memory during
CheckTx and Update calls.

Rough benchmark results:

  • This PR:
    BenchmarkTxMempool_CheckTx-10 486373 2271 ns/op
  • Original priority mempool implementation:
    BenchmarkTxMempool_CheckTx-10 500302 2113 ns/op
  • Legacy (v0) mempool:
    BenchmarkCheckTx-10 364591 3571 ns/op

These benchmarks are not a good proxy for production load, but at least suggest
that the overhead of the implementation changes are not cause for concern.

In addition:

  • Rework synchronization so that access to shared data structures is safe.
    Previously shared locks were used to exclude block updates during calls that
    update mempool state. Now access is properly exclusive where necessary.

  • Fix a bug in the recheck flow, where priority updates from the application
    were not correctly reflected in the index structures.

  • Eliminate the need for separate recheck cursors during block update. This
    avoids the need to explicitly invalidate elements of the concurrent list,
    which averts the dependency cycle that led to objects being pinned.

  • Clean up, clarify, and fix inaccuracies in documentation comments throughout
    the package.

Note to Reviewers

This change passes all the existing tests in the package (other than those eliminated by removal of the data structures they exercised). However, the existing test coverage is relatively weak, and should be improved before this code is put into production.

@creachadair creachadair force-pushed the mjf/priority-mempool branch 3 times, most recently from 5d9dc66 to 74c70b7 Compare July 6, 2022 02:14
@creachadair
Copy link
Author

I think one of these test failures is real. I'll take a closer look in the morning.

Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this so far 🎉 . Left a few comments.

@creachadair creachadair force-pushed the mjf/priority-mempool branch 2 times, most recently from 866c76c to efb70ea Compare July 6, 2022 14:42
@creachadair
Copy link
Author

I think one of these test failures is real. I'll take a closer look in the morning.

Ok, this should be fixed now.

@creachadair
Copy link
Author

@cmwaters Thanks for all the comments! I believe all should be addressed now, please take another look to see if you agree with the adjustments.

@creachadair creachadair force-pushed the mjf/priority-mempool branch from 6b7d1ec to 1a266e5 Compare July 6, 2022 20:36
Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

M. J. Fromberger added 13 commits July 6, 2022 15:34
The primary effect of this change is to simplify the implementation of the
priority mempool to eliminate an unbounded heap growth observed by Vega team
when it was enabled in their testnet. It updates and fixes #8775.

The main body of this change is to remove the auxiliary indexing structures,
and use only the concurrent list structure (the same as the legacy mempool) to
maintain both gossip order and priority.

This means that operations that require priority information, such as block
updates and insert-time evictions, require a linear scan over the mempool.
This tradeoff greatly simplifies the code and eliminates the long-term heap
load, at the cost of some extra CPU and short-lived working memory during
CheckTx and Update calls.

Rough benchmark results:

 - This PR:
   BenchmarkTxMempool_CheckTx-10             486373              2271 ns/op
 - Original priority mempool implementation:
   BenchmarkTxMempool_CheckTx-10             500302              2113 ns/op
 - Legacy (v0) mempool:
   BenchmarkCheckTx-10                       364591              3571 ns/op

These benchmarks are not a good proxy for production load, but at least suggest
that the overhead of the implementation changes are not cause for concern.

In addition:

- Rework synchronization so that access to shared data structures is safe.
  Previously shared locks were used to exclude block updates during calls that
  update mempool state. Now access is properly exclusive where necessary.

- Fix a bug in the recheck flow, where priority updates from the application
  were not correctly reflected in the index structures.

- Eliminate the need for separate recheck cursors during block update. This
  avoids the need to explicitly invalidate elements of the concurrent list,
  which averts the dependency cycle that led to objects being pinned.

- Clean up, clarify, and fix inaccuracies in documentation comments throughout
  the package.
@creachadair creachadair force-pushed the mjf/priority-mempool branch from 6d9f145 to 4631f69 Compare July 6, 2022 22:36
Copy link
Contributor

@williambanfield williambanfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. There are a few edge cases on adding a new transaction with priority / size / gas that may be worth ensuring there's a test for.

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
@creachadair
Copy link
Author

Looks good. There are a few edge cases on adding a new transaction with priority / size / gas that may be worth ensuring there's a test for.

Yes, I plan to add tests for all those corners. Nothing so far is really exercising those conditions at all.

@creachadair creachadair merged commit 9b02094 into v0.35.x Jul 7, 2022
@creachadair creachadair deleted the mjf/priority-mempool branch July 7, 2022 14:15
creachadair pushed a commit that referenced this pull request Jul 8, 2022
This is a manual backport of the changes from these commits:

- bc49f66 Add more unit tests for the priority mempool. (#8961)
- 9b02094 Fix unbounded heap growth in the priority mempool. (#8944)

Imports and type signatures have been updated to match the v0.34 usage.
creachadair pushed a commit that referenced this pull request Jul 8, 2022
This is a manual backport of the changes from these commits:

- bc49f66 Add more unit tests for the priority mempool. (#8961)
- 9b02094 Fix unbounded heap growth in the priority mempool. (#8944)

Imports and type signatures have been updated to match the v0.34 usage.
creachadair pushed a commit that referenced this pull request Jul 20, 2022
This is a manual forward-port of #8944 and related fixes from v0.35.x.

One difference of note is that the CheckTx response messages no longer have a
field to record an error from the ABCI application. The code is set up so that
these could be reported directly to the CheckTx caller, but it would be an API
change, and right now a bunch of test plumbing depends on the existing semantics.
creachadair pushed a commit that referenced this pull request Jul 20, 2022
This is a manual forward-port of #8944 and related fixes from v0.35.x.

One difference of note is that the CheckTx response messages no longer have a
field to record an error from the ABCI application. The code is set up so that
these could be reported directly to the CheckTx caller, but it would be an API
change, and right now a bunch of test plumbing depends on the existing semantics.
creachadair pushed a commit that referenced this pull request Jul 20, 2022
This is a manual forward-port of #8944 and related fixes from v0.35.x.

One difference of note is that the CheckTx response messages no longer have a
field to record an error from the ABCI application. The code is set up so that
these could be reported directly to the CheckTx caller, but it would be an API
change, and right now a bunch of test plumbing depends on the existing semantics.

Also fix up tests relying on implementation-specific mempool behavior.

- Commit was setting the expected mempool size incorrectly.
- Fix sequence test not to depend on the incorrect size.
creachadair pushed a commit that referenced this pull request Jul 20, 2022
This is a manual forward-port of #8944 and related fixes from v0.35.x.

One difference of note is that the CheckTx response messages no longer have a
field to record an error from the ABCI application. The code is set up so that
these could be reported directly to the CheckTx caller, but it would be an API
change, and right now a bunch of test plumbing depends on the existing semantics.

Also fix up tests relying on implementation-specific mempool behavior.

- Commit was setting the expected mempool size incorrectly.
- Fix sequence test not to depend on the incorrect size.
creachadair pushed a commit that referenced this pull request Jul 20, 2022
This is a manual forward-port of #8944 and related fixes from v0.35.x.

One difference of note is that the CheckTx response messages no longer have a
field to record an error from the ABCI application. The code is set up so that
these could be reported directly to the CheckTx caller, but it would be an API
change, and right now a bunch of test plumbing depends on the existing semantics.

Also fix up tests relying on implementation-specific mempool behavior.

- Commit was setting the expected mempool size incorrectly.
- Fix sequence test not to depend on the incorrect size.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants