Update specification of light client algorithm to align with the code#61
Update specification of light client algorithm to align with the code#61
Conversation
milosevic
commented
Nov 15, 2019
- Fix timing issues by introducing Delta parameter
bb6b481 to
3fbdc58
Compare
- Fix timing issues by introducing Delta parameter
3fbdc58 to
a4b68ec
Compare
ancazamfir
left a comment
There was a problem hiding this comment.
Did a first pass. Added some comments. In general I think we could simplify more.
ancazamfir
left a comment
There was a problem hiding this comment.
I added some comment mainly to help with readability. For the purpose of this document I think having (bool, err) as return in CheckSupport() makes the pseudocode harder to read.
Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
e17582c to
2306108
Compare
4d60a43 to
069906a
Compare
…ent_algo # Conflicts: # spec/consensus/light-client.md
ancazamfir
left a comment
There was a problem hiding this comment.
Lots of good changes! Left a few comments.
60303ef to
74d6c04
Compare
ebuchman
left a comment
There was a problem hiding this comment.
A few concerns from reviewing the latest implementation update (cometbft/tendermint-rs#100 (review))
spec/consensus/light-client.md
Outdated
| Store.Add(h2) | ||
| return nil | ||
| } | ||
| if err != ErrTooMuchChange return err |
There was a problem hiding this comment.
If we do this here, doesn't it mean we never start bisecting?
There was a problem hiding this comment.
We are in line 359 with err != nil and, if err != ErrTooMuchChange, the only possible values for err (looking at CheckSupport()) are:
ErrHeaderNotWithinTrustedPeriodErrInvalidAdjacentHeaders
They are both indicators that current bisection has failed, further recursive calls should not be done and therefore the call stack will unwind propagating this error to the final caller (VerifyHeader()).
I mentioned this in an earlier comment, it is better to write 359 and others like:
if fatalCheckSupportError(err) return err
Also, potential future error types added to CheckSupport() would be easier to implement without changing all the callers.
There was a problem hiding this comment.
Alternatively, we could use sth else than an error to indicate that there was too much change in the validator set (going from h1 to h2). Returning a bool for instance. Or simply renaming the method name to indicate the only relevant case directly: ValidatorsChangedTooMuch (or EnoughIntersection? I'm sure there are better names).
FWIW, linking the related discussion regarding the rust implementation: https://github.com/interchainio/tendermint-rs/pull/100/files#r360502272
spec/consensus/light-client.md
Outdated
|
|
||
| _Verification Condition:_ We may need a Tendermint invariant stating that if _h2.Header.height = h1.Header.height + 1_ then _signers(h2.Commit) \subseteq h1.Header.NextV_. | ||
| h2 := Commit(height) | ||
| if !verify(h2) { return ErrInvalidHeader(h2) } |
There was a problem hiding this comment.
Shouldn't we have a flow where we CheckSupport before we verify ?
spec/consensus/light-client.md
Outdated
| if CheckSupport(h1,h2,trustlevel) { | ||
| return true | ||
| if isWithinTrustedPeriod(h2) { | ||
| Store.add(h2) |
There was a problem hiding this comment.
Isn't this redundant? CanTrust already has stored h2 here. At least in the bisection case because:
func CanTrustBisection(h1,h2,trustThreshold) error {
assume h1.Header.Height < h2.header.Height
err = CheckSupport(h1,h2,trustThreshold)
if err == nil {
Store.Add(h2)
return nil
}
if err != ErrTooMuchChange return err
pivot := (h1.Header.height + h2.Header.height) / 2
hp := Commit(pivot)
if !verify(hp) return ErrInvalidHeader(hp)
err = CanTrustBisection(h1,hp,trustThreshold)
if err == nil {
Store.Add(hp)
err2 = CanTrustBisection(hp,h2,trustThreshold)
if err2 == nil {
Store.Add(h2)
return nil
}
return err2
}
return err
}There was a problem hiding this comment.
Bisection needs to store headers in between (the pivot headers). But maybe we leave it to the caller to actually deal with the last header h2?
I think the name of this method (CanTrust / CanTrustBisection) should actually indicate that it is updating the store and not just answering the question "Can we trust this header based on h1".
spec/consensus/light-client.md
Outdated
| func Bisection(h1,h2,trustlevel) bool{ | ||
| if CheckSupport(h1,h2,trustlevel) { | ||
| return true | ||
| if isWithinTrustedPeriod(h2) { |
There was a problem hiding this comment.
Why do we do this twice? When we fetch the commit / signed header above and then again here? Shouldn't we assume that we are still within the trusted period here?
h2 := Commit(height)
if !verify(h2) { return ErrInvalidHeader(h2) }
if !isWithinTrustedPeriod(h2) { return ErrHeaderNotWithinTrustedPeriod(h2) }
// ...
if isWithinTrustedPeriod(h2) { /*...*/ }If isWithinTrustedPeriod was false, we would have already returned with an error.
There was a problem hiding this comment.
I think it's because of #57 but I agree it doesn't really make sense, especially if time is going to be passed in
spec/consensus/light-client.md
Outdated
| err = CanTrust(trusted_h, untrusted_h, trustThreshold) // or CanTrustBisection((trusted_h, untrusted_h, trustThreshold) | ||
| if err != nil { return err } | ||
|
|
||
| if isWithinTrustedPeriod(untrusted_h) { |
There was a problem hiding this comment.
If this re-checking is necessary, I think it needs an explanation (see #61 (comment)).
spec/consensus/light-client.md
Outdated
| if err != nil { return err } | ||
|
|
||
| if isWithinTrustedPeriod(untrusted_h) { | ||
| Store.add(untrusted_h) |
There was a problem hiding this comment.
This seems redundant, at least in the CanTrustBisection case untrusted_h was already stored.
|
I think I may have a simpler structure - please see cometbft/tendermint-rs#114 |
spec/consensus/light-client.md
Outdated
| while `Header.Time` corresponds to the [BFT time](bft-time.md). In this note, we assume that clocks of correct processes | ||
| are synchronized (for example using NTP), and therefore there is bounded clock drift (CLOCK_DRIFT) between local clocks and | ||
| BFT time. More precisely, for every correct process p and every header (correctly generated by the Tendermint consensus) | ||
| time (BFT time) the following inequality holds: `Header.Time < now + CLOCK_DRIFT`. |
There was a problem hiding this comment.
Do we want to include the lower bound here? Or at least mention why we don't care about it ? It might also be helpful to clarify that we mostly need this to hold for the light client's local clock, not just the validators ...
There was a problem hiding this comment.
I am not sure what would be the lower bound? Not sure if we can say anything more precise than genesis time, but not sure how this is useful. My understanding is that upper bound ensures that we don't consider headers that are outside the assumption that lite client clock is in sync with blockchain time; as time progresses, not sure what we can say about the past.
There was a problem hiding this comment.
Ah, but it's currently written from the perspective of validators. For the validators, we want a lower bound (I think?). For the light client, we only need the upper bound.
There was a problem hiding this comment.
It is not clear to me when this inequality should hold. I guess when the header is generated? Also, there are two moving parts in the definition, i.e., "Header.Time" and "now", so it is not clear who to blame when the inequality is violated. I guess our assumption is that "Header.Time" is always correct (by definition; it serves as a time reference for the system), and that violation of the inequality means that the Lite Client is faulty. IOW it is in the responsibility of the lite client to keep its clock synchronized.
There was a problem hiding this comment.
The inequality holds from the moment header is generated. I was assuming that header is always correct, i.e., that header is coming from the main chain that is not forked. We need to make this assumption more clear. As we also assume that lite client processes are synchronised with respect to BFT time, header that does not satisfy this inequality must come from a faulty full node. We can think about trying to weaken this assumption to eventually hold. I think that it would be useful trying to more precisely understand attack vectors in case local lite client clock drifts more than clockDrift. Ideally, safety should not be violated in case this assumption does not hold, but only termination should temporary be violated.
There was a problem hiding this comment.
I agree. We rely on bfttime for the failure model and the checks heavily. So if the lite clients clock is off too far, everything might be lost. The question is, whether within the Bisection we should/can check whether the lite client's clock is synchronized (and what timing assumptions this would impose), or whether safety has to rely on the synchronization of the lite client's clock.
spec/consensus/light-client.md
Outdated
| The function _Bisection_ checks whether to trust header _h2_ based on the trusted header _h1_. It does so by calling | ||
| the function _CheckSupport_ in the process of | ||
| bisection/recursion. _CheckSupport_ implements the trusted period method and, for two adjacent headers (in term of heights), it checks uninterrupted sequence of proof. | ||
| now = System.Time() |
There was a problem hiding this comment.
Do we really want the function to have access to the system clock ?
There was a problem hiding this comment.
The idea of this function is to illustrate how to correctly use VerifyBisection function. In case we don't check if we are still within trusted period of initial trusted state after VerifyBisection is executed, we can't give precise guarantees to the user. I don't know how to avoid this. As most of complexity is happening within VerifyBisection this shouldn't be a big problem from the testing perspective.
liamsi
left a comment
There was a problem hiding this comment.
I agree, we should merge this!
Move light specs to their own dir, add readme and diagram
22af52f to
026fdde
Compare