Summary
Allow (at the very least) ABCI QuerySync to run in parallel with AppHash computation.
Problem Definition
The mutex in local_client.go prevents the Cosmos SDK from replying to ABCI QuerySync requests while the chain's state machine is computing the AppHash. This harms scalability for chains with high block compute utilisation whose RPC fullnodes also need to serve low-latency RPC queries.
Agoric's conundrum
The Agoric Cosmos-SDK-based chain is intended to provide hardened JS smart contract services and expects to run at high utilisation. We have mechanisms for postponing and scheduling work for future blocks, but we do much more computation per block (say 8-10 seconds to compute the AppHash for the next block) than seems typical of other Cosmos chains. Additionally, our chain is part of a distributed object system that will have many automated clients making RPC requests, whether submitting transactions or querying state to catch up more quickly than relying exclusively on events.
The aggressive serialisation of ABCI makes our RPC nodes unresponsive while they compute their AppHashes. This leads to a "thundering herd" problem: clients making queries are blocked for the 8 seconds the AppHash computation takes. There is then a brief window while voting happens in which several RPC queries are served (I typically count 25), until the cycle begins again and more queries pile up. The result is a block or two with high utilisation, followed by several empty blocks while the clients compete to read their state and send new transactions.
Cosmos evolution
In #6048 there was some discussion of lock contention and query performance. @alexanderbez even asked in #6048 (comment) why Info and Query need to be serialised at all.
Recently, IAVL has been improved, and the Cosmos SDK is removing its own global ABCI lock around its gRPC Query handlers (cosmos/cosmos-sdk#10045).
I believe the rationale for the single serialised lock (which was allegedly there to aid the Cosmos SDK) is now obsolete.
With careful implementation and testing, much finer-grained locking (if any) would let some threads keep queries responsive even while other threads are fully occupied computing the AppHash.
Proposal
We need a careful analysis of the actual locking requirements of local_client.go.
At the very least, I've tested a naive proof-of-concept change that adds a separate queryMtx, independent of the existing mtx (not an RWLock, since that would still block during our AppHash calculation in EndBlock), and uses only queryMtx in the QuerySync() method. That's enough to unblock the RPC nodes and free them to do their jobs at every phase of the ABCI cycle. I can achieve a steady 100 qps between curl and a single Agoric RPC node running on my laptop, even while EndBlock is busy for 8 seconds at a time.
One consequence of that PoC is that separate queries are still serialised against each other. Folks with deeper Tendermint and Cosmos knowledge need to help decide the best path forward; I'm reluctant to propose anything beyond this strawman because I lack that knowledge.
Thanks for Tendermint!
Attn: @dtribble, @JimLarson, @warner, @zmanian
For Admin Use
- Not duplicate issue
- Appropriate labels applied
- Appropriate contributors tagged
- Contributor assigned/self-assigned