Skip to content

feat(sequencer): make mempool balance aware#1408

Merged
lobstergrindset merged 6 commits intomainfrom
lilyjjo/mempool_balance_aware
Sep 11, 2024
Merged

feat(sequencer): make mempool balance aware#1408
lobstergrindset merged 6 commits intomainfrom
lilyjjo/mempool_balance_aware

Conversation

@lobstergrindset
Copy link
Copy Markdown
Contributor

@lobstergrindset lobstergrindset commented Aug 26, 2024

Summary

Added logic to the mempool to make its pending vs parked logic balance aware. Transactions will only be added to the pending queue if the signing account has enough balance to cover the cost of all of the transactions in the pending plus the new transaction. Transactions that exceed the account's available balance are put into parked.

This PR also contains the logic needed to recost the transactions in the mempool when a FeeAssetChange or FeeChange actions are ran.

image

Background

Previously the mempool did not have sufficient logic to deal with transaction costs. CheckTx would kick out any transaction that the account did not have enough balance to cover. This didn't account for the edge cases of:

  1. In-flight transactions from other accounts funding the deficient account.
  2. Sequential transactions that together exceed an account's balance but independently are fine. This causes unnecessary transaction failures in prepare_proposal()'s block construction.

Changes

  • Added a cost field to the TimemarkedTransaction struct.
  • Made the PendingTransactionsForAccount type's add() functionality reject transactions that would exceed an account's balance.
  • Added cost-aware demotion logic to the PendingTransactionsForAccount type and promotion logic to the ParkedTransactionsForAccount type which is invoked in the mempool's run_maintenance() function.
  • Factored out some functionality of internal functions to be used elsewhere.
  • Has the app flag when a FeeAssetChange or FeeChange actions is executed and proceeds to recost the mempool in the run_maintenance() function.

Testing

Wrote unit tests and ran locally.

Metrics

Added timing metrics to CheckTx to track how long the transaction costing logic takes:

  • check_tx_duration_seconds_convert_address
  • check_tx_duration_seconds_fetch_balances
  • check_tx_duration_seconds_fetch_tx_cost
    Added metrics to count how many times the mempool is recosted:
  • mempool_recosted

Related Issues

closes #1336

@github-actions github-actions bot added the sequencer pertaining to the astria-sequencer crate label Aug 26, 2024
@lobstergrindset lobstergrindset marked this pull request as ready for review August 27, 2024 18:47
@lobstergrindset lobstergrindset requested a review from a team as a code owner August 27, 2024 18:47
Copy link
Copy Markdown
Contributor

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review with some minor suggestions.

Comment on lines +320 to +306
.chain(parked.addresses())
.collect();
Copy link
Copy Markdown
Contributor

@noot noot Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there could be duplicates of addresses here if it's in both pending and parked? maybe sort and then dedup?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're collected into a HashSet which does the dedup by nature of the container. Happy to change into something else if you'd prefer

Comment on lines +353 to +347
if demotion_txs.is_empty() {
// nothing to demote, check for transactions to promote
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we only promote txs if there's nothing to demote? shouldn't both occur either way?

Copy link
Copy Markdown
Contributor Author

@lobstergrindset lobstergrindset Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we demoted something from an account's pending queue that means there shouldn't be anything to promote from the account's parked queue. The pending queue's enforcement of sequential nonces is the main reason for this.

For example, if we have nonces [0,1,2] in pending and nonces [3,4] in parked , and [2] gets demoted from pending because of lack of account funds, then parked would be [2,3,4]. During the promotion loop 2 would be ineligible for promotion due to lack of account funds, also making 3,4 ineligible because of the sequential nonce requirement.

I should write up docs on how the different queues work.

Comment on lines +225 to +247
/// Note: assumes that the balances in `current_account_balance` are large enough
/// to cover costs for contained transactions. Will log an error if this is not true
/// but will not fail.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return an error? what cases would that error happen, developer error or other?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be a code logic error. It is possible to have too high costs in pending after transaction re-costing or a different node including a different transaction for the same account's nonce in a block, but we only call get_remaining_balances() after we run the demotion logic on pending which should make the transaction costs covered.

Fraser and I thought it would be better to let the mempool continue running if somewhere else in the code there was a bug that would make the above assumptions untrue instead of having the mempool crash. The idea was that someone would see the error logs and we'd know we need to go fix something 😬

/// Note: this function only operates on the front of the queue. If the target nonce is not at
/// the front, an error will be logged and nothing will be returned.
fn pop_front_contiguous(
/// the front, nothing will be returned.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pass in the target nonce if the function only works at the front of the queue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target nonce is the next nonce that pending is expecting, I can add that to the documentation. It's not guaranteed that parked will contain a transaction with the target nonce and I was letting this function internalize that check instead of doing it explicitly. I can make it explicit if you'd prefer.

There shouldn't be the scenario where parked and pending have overlapping nonces as we only add things to parked if their nonce isn't already in pending.

) -> bool;

/// Returns `Ok` if adding `ttx` would not break the balance precondition, i.e. enough
/// balance to cover all transactions in `CostsCovered` mode.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have such a thing as CostsCovered mode?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to do the naming similar to the SequentialNonces mode we have but for saying the container only contains transactions that are covered by the account's balances. On reflection CostsCovered isn't the same as the SequentialNonces case because we have to reclean to stay in CostsCovered when SequentialNonces is always true.

Happy to remove the comments if you think they are misleading.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if you're ok to remove just the " in CostsCovered mode" part, that would make more sense to me.

@lobstergrindset
Copy link
Copy Markdown
Contributor Author

Addressed all comments I didn't comment responses to, marking this ready for review again

Copy link
Copy Markdown
Contributor

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments on the merged asset/asset-balance stream

Copy link
Copy Markdown
Contributor

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now - optimistic approval assuming @SuperFluffy's latest comments are addressed.

ttx: &TimemarkedTransaction,
current_account_balances: &HashMap<IbcPrefixed, u128>,
) -> bool {
let mut current_account_balances = current_account_balances.clone();
Copy link
Copy Markdown
Contributor

@SuperFluffy SuperFluffy Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty innocently named functions where I would not expect a deep copy to happen.

I think this could already be addressed by changing the type signature (to make the clone explicit at the call site) and adding a comment:

/// Tests if `account_balances` has enough funds to cover `self` and `ttx`.
fn has_balance_to_cover(
    &self,
    ttx: &TimeMarkedTransaction,
    mut account_balances: HashMap<IbcPrefixed, u128,
) -> bool

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. Will change!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this was done to prevent doing copies of the account_balances for the parked transactions container, which is a noop for the has_balance_to_cover fn. We can avoid making unnecessary copies if we do the clone on the inside for the pending container

@SuperFluffy
Copy link
Copy Markdown
Contributor

Given this being a breaking charge, please mark this patch as breaking using !: feat(sequencer)!: make mempool balance aware.

Also, please add a section # Breaking Changelist explaining:

  1. why changes to the mempool constitute a breaking change?
  2. why changes to the mempool affect the app hash?

@lobstergrindset lobstergrindset changed the title feat(sequencer): make mempool balance aware feat(sequencer)!: make mempool balance aware Sep 11, 2024
@lobstergrindset lobstergrindset changed the title feat(sequencer)!: make mempool balance aware feat(sequencer): make mempool balance aware Sep 11, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 11, 2024
…e keys (#1487)

## Summary
Updated public addresses in `app/test_util.rs` to addresses with known
private keys.

## Background
Needed for PR #1408. We want to separate out the 'breaking change' into
a different PR since it is unrelated.

## Breaking Changes

While the sequencer app hash snapshots are changed, this patch is not
breaking as not business logic was updated.
lobstergrindset and others added 5 commits September 11, 2024 13:02
…1411)

This PR builds on top of the mempool balance aware logic to recost the
mempool when a `FeeAssetChange` or `FeeChange` action are seen in a
finalized block.

There needs to be a follow up PR to ensure that the `FeeChange` and
`FeeAssetChange` actions are only ran at the end of the block to ensure
that transactions are invalidated mid-block due to fee changes.

The mempool needs to recost transactions if the fees are changed, as the
original placement of the transaction in the `parked` or `pending`
structure may be out of date.

- Added logic to flag when `FeeAssetChange` or `FeeChange` actions are
included in a block.
- The mempool will now re-cost transactions when this happens.
- Changed the mempool to use the app's state directly instead of using
getters

Unit tests for the mempool and the app logic.

This doesn't have breaking changes, but the snapshot files were
re-generated because I changed the public addresses for `CAROL_ADDRESS`,
`BOB_ADDRESS`, and `JUDY_ADDRESS` in the `test_utils.rs` file. I needed
access to their private keys to sign transactions for testing promotion
flows on the app level.

Added `MEMPOOL_RECOSTED` counter to provide metrics on how often fees
are being changed.

---------

Co-authored-by: Jordan Oroshiba <jordan@astria.org>
@lobstergrindset lobstergrindset force-pushed the lilyjjo/mempool_balance_aware branch from 43b2186 to 420fc3a Compare September 11, 2024 17:06
@lobstergrindset lobstergrindset added this pull request to the merge queue Sep 11, 2024
Merged via the queue into main with commit b401e4f Sep 11, 2024
@lobstergrindset lobstergrindset deleted the lilyjjo/mempool_balance_aware branch September 11, 2024 18:24
steezeburger added a commit that referenced this pull request Sep 23, 2024
* main:
  feat(sequencer): make mempool balance aware (#1408)
  chore(sequencer): change test addresses to versions with known private keys (#1487)
  chore(chart): update geth tag (#1485)
  feat(sequencer): report deposit events (#1447)
  feat(proto, core, sequencer)!: add traceability to rollup deposits (#1410)
  fix(bridge-withdrawer, cli, sequencer-client): migrate from `broadcast_tx_commit` to `broadcast_tx_sync` (#1376)
  fix(sequencer): add `end_block` to `app_execute_transaction_with_every_action_snapshot` (#1455)
  release: end of iteration release cuts (#1456)
  chore(charts): rollupName templates (#1458)
  chore(sequencer-relayer): Add instrumentation (#1375)
  feat(proto, core, sequencer)!: permit bech32 compatible addresses (#1425)
  chore: memoize `address_bytes` of verification key (#1444)
  chore(ci): include `ibc-bridge-test` in `docker` CI target (#1438)
  chore(charts): bump celestia versions (#1431)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sequencer pertaining to the astria-sequencer crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sequencer, mempool: implement account based balance-aware transaction validation and storage

4 participants