Skip to content

fix(state): TxIndexer no longer leaks its database's goroutines.#4642

Merged
alesforz merged 8 commits intomainfrom
alesforz/close-txindexer
Dec 11, 2024
Merged

fix(state): TxIndexer no longer leaks its database's goroutines.#4642
alesforz merged 8 commits intomainfrom
alesforz/close-txindexer

Conversation

@alesforz
Copy link
Collaborator

@alesforz alesforz commented Dec 10, 2024

Context

The TxIndexer interface defines how to index and search transactions. Its implementers need to interact with a database, which callers are typically expected to close when done. However, TxIndexer does not provide a Close method. This prevents closing the database used by the transaction indexer, causing goroutines associated with that database to leak.

Two tests, TestInspectConstructor and TestInspectRun, show this problem indirectly. These tests check whether an Inspector leaks goroutines. The Inspector sets up three databases: a block database, a state database, and a transaction indexer database. It closes only the block and state databases because it cannot close the transaction indexer’s database (because of the missing Close method). Therefore, the transaction indexer database's goroutines leak, and the two tests above detect that.

In main, both tests pass because they use a MemDB, an in-memory store that does not spawn goroutines. However, in the work to remove cometbft-db, the tests switch to a “real” database that does spawn goroutines. Without a proper Close method in TxIndexer, these goroutines remain active, causing TestInspectConstructor and TestInspectRun to fail.

This Change

This PR adds a Close method to the TxIndexer interface and updates the existing implementation to properly close the underlying database. It also modifies the code to call TxIndexer.Close where needed, ensuring that database resources are released and preventing goroutine leaks.


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments

Alessandro Sforzin added 3 commits December 10, 2024 13:18
This method is necessary to close the underlying database that a TxIndexer uses.
If not called, the TxIndexer will leak the goroutines associated with the database.
They now implement the `Close` method to close their database.
@alesforz alesforz changed the title fix(store): TxIndexer no longer leaks its database's goroutines. fix(state): TxIndexer no longer leaks its database's goroutines. Dec 10, 2024
@alesforz alesforz self-assigned this Dec 10, 2024
@alesforz alesforz added state breaking A breaking change P:tech-debt Priority: Technical debt that needs to be paid off to enable us to move faster, reliably labels Dec 10, 2024
@alesforz alesforz marked this pull request as ready for review December 10, 2024 15:37
@alesforz alesforz requested a review from a team as a code owner December 10, 2024 15:37
@alesforz alesforz requested a review from a team December 10, 2024 15:37
Copy link
Collaborator

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch @alesforz 👍

My only suggestion is to amend the documentation slightly to reflect the fact that we're assuming there's an underlying database used, and Close will close it + no calls are allowed after.

@alesforz alesforz added this pull request to the merge queue Dec 11, 2024
Merged via the queue into main with commit 27b1014 Dec 11, 2024
@alesforz alesforz deleted the alesforz/close-txindexer branch December 11, 2024 13:17
alesforz pushed a commit that referenced this pull request Dec 11, 2024
The test unfolds too quickly for the node's indexer to finish its operations.
Therefore, the goroutine below that calls n.Stop() will close the indexer's
database right when the indexer is trying to write to it at line
state/txindex/indexer_service.go:109
thus causing a panic.  This small sleep allows the indexer to finish its
write operation before the node is shut down.
Unfortunately, there is no graceful shutdown available for the database layer,
i.e., make the database wait for all the active connections before closing.

The test does not fail in `main` because the code in `main` does not close
the indexer's database when shutting down a node (which causes its goroutines
to leak. See #4642).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking A breaking change P:tech-debt Priority: Technical debt that needs to be paid off to enable us to move faster, reliably state

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants