perf: TxSearch pagination (backport #2855)#2910

Merged
melekes merged 2 commits into v1.x from mergify/bp/v1.x/pr-2855
Apr 27, 2024
Conversation

mergify bot (Contributor) commented Apr 27, 2024

Since moving to faster blocks, Osmosis public RPC nodes have noticed massive RAM spikes, resulting in nodes constantly crashing:

![Screenshot 2024-04-20 at 11 25 36 AM](https://github.com/osmosis-labs/cometbft/assets/40078083/18d0513e-25fc-4510-b4bd-b48472a9df69)

Heap profiling made clear that the issue was coming from TxSearch, which was unmarshaling a huge amount of data.

![Screenshot 2024-04-20 at 11 28 29 AM](https://github.com/osmosis-labs/cometbft/assets/40078083/5d88a66a-c72d-4752-8770-a2c00e6d7669)

After looking into the method, the issue is that txSearch retrieves all hashes matching the query condition, but then calls Get on (and therefore unmarshals) every matched transaction from the transaction index store, regardless of whether the transaction falls within the pagination request. Therefore, calling txSearch on an event that occurs in almost every transaction causes the node to unmarshal essentially every transaction.

The key already contains all the data we need to sort the transaction hashes without unmarshaling the transactions at all! This PR filters and sorts the hashes, paginates them, and then only retrieves the transactions that fall within the requested page.
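The paginate-before-fetch idea can be sketched as follows. This is a minimal illustration, not CometBFT's actual code: the `txKey` type and `paginate` helper are hypothetical, standing in for the ordering data the index key already carries (height and index within the block).

```go
package main

import (
	"fmt"
	"sort"
)

// txKey carries the ordering data already present in the index key,
// so no transaction needs to be unmarshaled in order to sort.
// (Hypothetical type, for illustration only.)
type txKey struct {
	Height int64
	Index  uint32
	Hash   string
}

// paginate sorts the matched keys by (height, index) and returns only
// the slice of keys that falls on the requested 1-based page.
func paginate(keys []txKey, page, perPage int) []txKey {
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Height != keys[j].Height {
			return keys[i].Height < keys[j].Height
		}
		return keys[i].Index < keys[j].Index
	})
	start := (page - 1) * perPage
	if start >= len(keys) {
		return nil
	}
	end := start + perPage
	if end > len(keys) {
		end = len(keys)
	}
	return keys[start:end]
}

func main() {
	// Suppose the query matched five transactions; only the two on
	// page 2 would then be fetched from the store and unmarshaled.
	matched := []txKey{
		{3, 0, "c"}, {1, 0, "a"}, {2, 1, "b2"}, {2, 0, "b1"}, {4, 0, "d"},
	}
	for _, k := range paginate(matched, 2, 2) {
		// Only now would we Get(k.Hash) and unmarshal the transaction.
		fmt.Println(k.Hash)
	}
}
```

The fix is O(n log n) in the number of matched keys for the sort, but only `perPage` transactions are ever read and unmarshaled, instead of all n.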

We have run this patch on two of our RPC nodes, and have seen zero spikes on the patched ones thus far!

![Screenshot 2024-04-20 at 11 33 11 AM](https://github.com/osmosis-labs/cometbft/assets/40078083/fd815f81-5756-45bd-b1c0-818e6774ea53)

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code comments
- [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

This is an automatic backport of pull request #2855 done by [Mergify](https://mergify.com).


(cherry picked from commit b420f07)

# Conflicts:
#	rpc/core/tx.go
@mergify mergify bot requested a review from a team as a code owner April 27, 2024 07:25
@mergify mergify bot requested a review from a team April 27, 2024 07:25
@mergify mergify bot added the conflicts label Apr 27, 2024
mergify bot (Contributor, Author) commented Apr 27, 2024

Cherry-pick of b420f07 has failed:

On branch mergify/bp/v1.x/pr-2855
Your branch is up to date with 'origin/v1.x'.

You are currently cherry-picking commit b420f0765.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   .changelog/unreleased/improvements/2855-fix-txsearch-performance.md
	modified:   internal/inspect/inspect_test.go
	modified:   rpc/core/blocks.go
	modified:   state/indexer/sink/psql/backport.go
	modified:   state/pruner_test.go
	modified:   state/txindex/indexer.go
	modified:   state/txindex/kv/kv.go
	modified:   state/txindex/kv/kv_bench_test.go
	modified:   state/txindex/kv/kv_test.go
	modified:   state/txindex/mocks/tx_indexer.go
	modified:   state/txindex/null/null.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   rpc/core/tx.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@melekes melekes merged commit b2db3ed into v1.x Apr 27, 2024
@melekes melekes deleted the mergify/bp/v1.x/pr-2855 branch April 27, 2024 07:58
czarcas7ic added a commit to osmosis-labs/cometbft that referenced this pull request May 10, 2024
sergio-mena pushed a commit that referenced this pull request Jul 25, 2024
sergio-mena added a commit that referenced this pull request Jul 25, 2024
See #2855 or #2910 for a detailed description

---

#### PR checklist

- ~[ ] Tests written/updated~
- ~[ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)~
- ~[ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments~
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Adam Tucker <adam@osmosis.team>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
PaddyMc pushed a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024