Lite Client Spec #3796

Closed
tac0turtle wants to merge 2 commits into master from zm_light_client_spec

Conversation

Contributor

@tac0turtle commented Jul 14, 2019

Opening this PR to spur discussions here

Refs #1413

  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

blockchain header, and further the corresponding Merkle proofs.
## Lite client requirements from Tendermint and Proof of Stake modules

Before explaining lite client mechanisms and operations we need to define some requirements expected from the Tendermint blockchain. Tendermint provides a deterministic, Byzantine fault-tolerant, source of time (called
Contributor

So I don't think that BFT time has a huge amount to do with lite clients.

Weak subjectivity is entirely in terms of the client's local clock.

It is slightly relevant in an IBC context where BFT time is the local clock because the lite client is another blockchain.

Contributor

I think we need to constrain drifts between local time and BFT time to be able to come up with some guarantees. Otherwise (at least in theory) clocks can drift arbitrarily.

Contributor

I thought we're going to remove BFT time, no? #2840

Contributor

The plan isn't to remove BFT time but to make it proposer-based, so that an attack to accelerate time requires 2f+1 instead of f+1.

Contributor

@cwgoes left a comment

Commented! I think this PR, #3710, and #3795 should be combined.


Before explaining lite client mechanisms and operations we need to define some requirements expected from the Tendermint blockchain. Tendermint provides a deterministic, Byzantine fault-tolerant, source of time (called
[BFT Time](/Users/zarkomilosevic/go-workspace/src/github.com/tendermint/tendermint/docs/spec/consensus/bft-time.md)).
BFT time is monotonically increasing and in case of at most 1/3 of voting power equivalent of faulty validators guaranteed to be close to the wall time of correct validators. For correct functioning of lite client we need a guarantee that BFT time does not drift more than some known parameter BFT_TIME_DRIFT_BOUND (that should normally be measured in hours, maybe even days) from client wall time. Note that this requirement currently only holds in case
Contributor

What is "close"?

Contributor

It is made more concrete below.

BFT time is monotonically increasing and in case of at most 1/3 of voting power equivalent of faulty validators guaranteed to be close to the wall time of correct validators. For correct functioning of lite client we need a guarantee that BFT time does not drift more than some known parameter BFT_TIME_DRIFT_BOUND (that should normally be measured in hours, maybe even days) from client wall time. Note that this requirement currently only holds in case
at most 1/3 of voting power equivalent of validators report wrong time, but we might need to strengthen this requirement further to also be able to tolerate time-related misbehavior of more than 1/3 voting power equivalent of validators (https://github.com/tendermint/tendermint/issues/2653, https://github.com/tendermint/tendermint/issues/2840).

Furthermore, lite client security is tightly coupled with the notion of UNBONDING_PERIOD, which is at the core of the security of proof-of-stake blockchain systems (for example, the Cosmos Hub). UNBONDING_PERIOD is the period of time that must pass from the withdraw event until the stake is liquid. During this period an unbonded validator cannot participate in the consensus protocol (and is therefore not rewarded) but can be slashed for misbehavior (committed either before the withdraw event or during UNBONDING_PERIOD). This protects against a validator attacking the network and then immediately withdrawing its stake. The Cosmos Hub currently enforces a 21-day UNBONDING_PERIOD. Note that UNBONDING_PERIOD is measured with respect to BFT time, and that this has a significant effect on the security of lite client operation, as validators are not slashable outside their UNBONDING_PERIOD. There is an implicit assumption regarding UNBONDING_PERIOD: we assume that Tendermint will always generate blocks within the duration of UNBONDING_PERIOD. If the chain halts for the duration of UNBONDING_PERIOD, the security of lite clients is jeopardized. A probably more secure solution would be to define UNBONDING_PERIOD as a hybrid of wall time and logical time (number of block heights), so that UNBONDING_PERIOD is over only when both conditions are true. In that case no assumption is made about chain progress (which is in theory hard to make, as Tendermint operates in a partially synchronous system model), and the system is secure (including lite clients) even in case of long halts.
Contributor

  • The proof-of-stake system is defined in the Cosmos SDK; it's not (necessarily) particular to the Cosmos Hub.
  • Validators are unbonding during the unbonding period.
  • We also assume synchrony of evidence submission.

obtain `ResultValidators`, which contains the validators that have committed block h. Then we check whether the MerkleRoot
of the validator set is equal to the trusted validator set hash. If verification fails, initialization exits with an error; otherwise it proceeds.

Next step is determining if the block at hight h is correctly signed by the obtained validator set. This is achieved by
Contributor

Suggested change
Next step is determining if the block at hight h is correctly signed by the obtained validator set. This is achieved by
Next step is determining if the block at height h is correctly signed by the obtained validator set. This is achieved by

Tendermint RPC:
`header.Time + UNBONDING_PERIOD <= Now() - BFT_TIME_DRIFT_BOUND`.

Note that outside this time window lite client cannot trust validator set as validators could potentially unbonded its stake so security of the lite client does not hold as they are not slashable for its actions. Therefore, they can eclipse client and cheat about the system state without risk of being punished for such misbehavior.
Contributor

Suggested change
Note that outside this time window lite client cannot trust validator set as validators could potentially unbonded its stake so security of the lite client does not hold as they are not slashable for its actions. Therefore, they can eclipse client and cheat about the system state without risk of being punished for such misbehavior.
Note that outside this time window lite client cannot trust validator set as validators could potentially have unbonded their stake so security of the lite client does not hold as they are not slashable for its actions. Therefore, they can eclipse client and cheat about the system state without risk of being punished for such misbehavior.

Contributor

I don't understand what eclipsing and the unbonding period have to do with one another. If the light client is eclipsed for the full duration of the unbonding period it doesn't matter whether the validators are unbonding or not, since the light client can't submit evidence anyways. Outside the unbonding period, it doesn't matter if the client is eclipsed since the state machine will reject the evidence even if it is submitted.

Contributor

I assume here that being eclipsed is not a permanent state of things, since at any point in time you can decide to change the full node you are connected to, or connect to a new full node. In that sense it matters whether you are operating within the unbonding period: there you still have a chance of connecting to a correct node, detecting a fork (for example), and submitting evidence, whereas outside the unbonding period there is no remedy if you are cheated.

Note that this formula shows a fundamental dependence of lite client security on the wall time. If UNBONDING_PERIOD
were defined only in terms of logical time (block heights), the lite client would not have a way to know if the trusted validator set is still within its UNBONDING_PERIOD, as it does not have a way of reliably determining the top of the chain.
Contributor

I think we should do this! (define unbonding period as a minimum of time passed and of blocks passed)

It's not very hard at all.

Contributor

I feel the same. I think complexity is not significantly added and we increase security of the lite client significantly.


Note that this formula shows a fundamental dependence of lite client security on the wall time. If UNBONDING_PERIOD
were defined only in terms of logical time (block heights), the lite client would not have a way to know if the trusted validator set is still within its UNBONDING_PERIOD, as it does not have a way of reliably determining the top of the chain.
Therefore, it seems that having BFT time in sync with standard notions of time (for example NTP) is necessarily for correct operations of the system.
Contributor

Suggested change
Therefore, it seems that having BFT time in sync with standard notions of time (for example NTP) is necessarily for correct operations of the system.
Therefore, it seems that having BFT time in sync with standard notions of time (for example NTP) is necessary for correct operations of the system.

(and see above comment)


Lite client security depends also on the guarantee that faulty validator behavior will be punished. Therefore if a client detect faulty behavior we need to guarantee that proof of misbehavior evidence transaction will be committed within UNBONDING_PERIOD of faulty validators so it can be slashed. This can be achieved by having client considering
Contributor

Suggested change
Lite client security depends also on the guarantee that faulty validator behavior will be punished. Therefore if a client detect faulty behavior we need to guarantee that proof of misbehavior evidence transaction will be committed within UNBONDING_PERIOD of faulty validators so it can be slashed. This can be achieved by having client considering
Lite client security depends also on the guarantee that faulty validator behavior will be punished. Therefore if a client detects faulty behavior we need to guarantee that proof of misbehavior evidence transaction will be committed within UNBONDING_PERIOD of faulty validators so it can be slashed. This can be achieved by having client considering

validator set sequence number and the validator set init time.
The core of the light client logic is captured by the VerifyAndUpdate function that is used to 1) verify if the given header is valid,
and 2) update the validator set (when the given header is valid and it is more recent than the seen headers).
To be able to validate a Merkle proof, a light client needs to validate the blockchain header that contains the root app hash.Validating a blockchain header in Tendermint consists in verifying that the header is committed (signed) by >2/3 of the voting power of the corresponding validator set. As the validator set is a dynamic set (it is changing), one of the core functionality of the lite client is updating the current validator set, that is then used to verify the
Contributor

Suggested change
To be able to validate a Merkle proof, a light client needs to validate the blockchain header that contains the root app hash.Validating a blockchain header in Tendermint consists in verifying that the header is committed (signed) by >2/3 of the voting power of the corresponding validator set. As the validator set is a dynamic set (it is changing), one of the core functionality of the lite client is updating the current validator set, that is then used to verify the
To be able to validate a Merkle proof, a light client needs to validate the blockchain header that contains the root app hash. Validating a blockchain header in Tendermint consists in verifying that the header is committed (signed) by >2/3 of the voting power of the corresponding validator set. As the validator set is a dynamic set (it is changing), one of the core functionalities of the lite client is updating the current validator set, which is then used to verify the

The core of the light client logic is captured by the VerifyAndUpdate function that is used to 1) verify if the given header is valid,
and 2) update the validator set (when the given header is valid and it is more recent than the seen headers).
To be able to validate a Merkle proof, a light client needs to validate the blockchain header that contains the root app hash.Validating a blockchain header in Tendermint consists in verifying that the header is committed (signed) by >2/3 of the voting power of the corresponding validator set. As the validator set is a dynamic set (it is changing), one of the core functionality of the lite client is updating the current validator set, that is then used to verify the
blockchain header, and further the corresponding Merkle proofs.
Contributor

Suggested change
blockchain header, and further the corresponding Merkle proofs.
blockchain header, and subsequently the corresponding Merkle proofs.
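
The ">2/3 of the voting power" condition from the paragraph under review can be illustrated with a minimal Go sketch. It only tallies voting power and deliberately leaves out signature verification; the `signer` type and `hasTwoThirds` function are hypothetical names for this example, not Tendermint types.

```go
package main

import "fmt"

// signer is a hypothetical, simplified view of one validator's vote.
type signer struct {
	votingPower int64
	signed      bool
}

// hasTwoThirds reports whether strictly more than 2/3 of the total voting
// power signed. Integer arithmetic avoids floating-point rounding; it assumes
// the powers are small enough that 3*signed does not overflow int64.
func hasTwoThirds(set []signer) bool {
	var total, signed int64
	for _, s := range set {
		total += s.votingPower
		if s.signed {
			signed += s.votingPower
		}
	}
	return total > 0 && 3*signed > 2*total
}

func main() {
	exactlyTwoThirds := []signer{{10, true}, {10, true}, {10, false}}
	threeOfFour := []signer{{1, true}, {1, true}, {1, true}, {1, false}}

	fmt.Println(hasTwoThirds(exactlyTwoThirds)) // false: exactly 2/3 is not enough
	fmt.Println(hasTwoThirds(threeOfFour))      // true: 3/4 > 2/3
}
```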

i.e., that it will always be used to verify more recent headers. In case a light client needs to be used to verify older
headers (go backward) the same mechanisms and similar logic can be used. In case a call to the FullNode or subsequent
checks fail, a light client needs to implement some recovery strategy, for example connecting to another FullNode.
Contributor

The light client should always try to use a randomized load balancing strategy, considering the possibility of malevolent eclipse attacks (or just innocuous but inconvenient stale full node data).

@tac0turtle
Contributor Author

Closing this PR and #3710, as @milosevic & @josef-widder will open a new PR with these two documents combined.

@tac0turtle closed this Jul 22, 2019
@tac0turtle mentioned this pull request Jul 22, 2019
4 tasks
5 participants