Skip to content

WIP: blockchain/reactor: respondWithNoResponseMessage for missing height#648

Closed
odeke-em wants to merge 1 commit intotendermint:developfrom
orijtech:react-to-no-block-found
Closed

WIP: blockchain/reactor: respondWithNoResponseMessage for missing height#648
odeke-em wants to merge 1 commit intotendermint:developfrom
orijtech:react-to-no-block-found

Conversation

@odeke-em
Copy link
Contributor

@odeke-em odeke-em commented Sep 7, 2017

Fixes #514
Replaces #540

If a peer requests a block with a height that we don't have
respond with a bcNoBlockResponseMessage.
However, according to the Tendermint spec, if all nodes are honest
this condition shouldn't occur, so this is a possible hint of an
dishonest node.

@odeke-em odeke-em requested a review from ebuchman September 7, 2017 10:17
@odeke-em odeke-em changed the base branch from master to develop September 7, 2017 10:17
@odeke-em
Copy link
Contributor Author

odeke-em commented Sep 7, 2017

@ebuchman PTAL at this work in progress, it's super tricky trying to test this out because setting up the reactor itself is quite involved, the only part that's left is to figure out how to add the blocks in despite the requester being unset at the *BlockchainReactor.pool(*blockPool).requests.

Am off to bed for now but I'd appreciate some feedback and will take a look at it. Thank you!

Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks fine (other than 2 comments). The testing is obviously quite difficult given the current structure of reactors. It can be simplified a bunch as is, but to do it right we probably need a peer interface.

// Otherwise peer is asking for things we don't have.
// According to the Tendermint spec, if the peer asks for
// something that we don't have have, some of the nodes aren't honest
bcR.Logger.Debug("Respond", "src", src, "flagging", "potentially malicious peer",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify this message. Should be Info level and msg can just be Peer asking for block we don't have. Tags can just be src and height

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger that, thanks. I'll update that.

return cmn.Fmt("[bcBlockRequestMessage %v]", m.Height)
}

type bcNoBlockResponseMessage struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be registered with go-wire. See call to RegisterInterface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger that, updating, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thank you!

"github.com/tendermint/tmlibs/log"
)

type bcBlockRequestMessage struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this ?

fmt.Printf("% x\n", buf.Bytes())
}

func newBlockchainReactor(logger log.Logger, blockNumbers ...int) (*blockchain.BlockchainReactor, *sm.State) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need such complex setup. See how we setup consensus tests for reference.

We make whole networks in process here: https://github.com/tendermint/tendermint/blob/master/consensus/common_test.go#L357

And just the state machine here: https://github.com/tendermint/tendermint/blob/master/consensus/common_test.go#L232

We can hard code an inproc dummy app. No need for handshaker. No need for tx indexing. No need to log if we're a validator. Fast sync should be on. etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you @ebuchman for the reference. Yeah, it's mostly out of lack of familiarity, and it yeah I was also telling myself that y'all def wrote simpler test setups, but TIL!

}

tests := [...]struct {
heightBytes []byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a byte array ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a byte array, preserialized by go-wire so that we can pass them directly in *BlockchainReactor.Receive(byte, *p2p.Peer, []byte)

for i, tt := range tests {
msgBytes := append([]byte{0x10, 0x02}, tt.heightBytes...)
bcr.Receive(chID, peer, msgBytes)
gotErr := bytes.Contains(logBuf.Bytes(), nonExistentSubstr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy. This is unfortunate. This is why I opened #631.

}
} else {
// TODO peer is asking for things we don't have.
if queued := bcR.respondToPeer(msg, src); !queued {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move this out into its own function ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed quite noisy in there, plus the extra logic for responding to non-existent blocks would have crowded it more.

@ebuchman
Copy link
Contributor

This should be easier to fix up the tests for now that #653 (peer interfaces) is merged.

// According to the Tendermint spec, if all nodes are honest, no node should
// be requesting for a block that's non-existent.
func (bcR *BlockchainReactor) respondToPeer(msg *bcBlockRequestMessage, src *p2p.Peer) (queued bool) {
// Got a request for a block. Respond with block if we have it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is unnecessary. the code looks simple enough

return src.TrySend(BlockchainChannel, struct{ BlockchainMessage }{msg})
}

// Otherwise peer is asking for things we don't have.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no need to duplicate the log message

Otherwise peer is asking for things we don't have.
Peer asking for a block we don't have
According to the Tendermint spec,

do we need to focus on this? maybe we can skip this part? i.e. just if the peer asks ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it entirely, thanks.

} else {
// TODO peer is asking for things we don't have.
if queued := bcR.respondToPeer(msg, src); !queued {
// Unfortunately not queued since the queue is full.
Copy link
Contributor

@melekes melekes Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this emotional touch :) haha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heheh

Fixes #514
Replaces #540

If a peer requests a block with a height that we don't have
respond with a bcNoBlockResponseMessage.
However, according to the Tendermint spec, if all nodes are honest
this condition shouldn't occur, so this is a possible hint of an
dishonest node.
@odeke-em
Copy link
Contributor Author

Alright, I've updated the code, thanks for the initial reviews @ebuchman and @melekes, PTAL.

@codecov-io
Copy link

codecov-io commented Sep 19, 2017

Codecov Report

Merging #648 into develop will increase coverage by 1.49%.
The diff coverage is 75%.

@@             Coverage Diff             @@
##           develop     #648      +/-   ##
===========================================
+ Coverage    55.66%   57.15%   +1.49%     
===========================================
  Files           80       79       -1     
  Lines         8356     8306      -50     
===========================================
+ Hits          4651     4747      +96     
+ Misses        3317     3156     -161     
- Partials       388      403      +15

@ebuchman
Copy link
Contributor

push to next non-breaking release

type bcrTestPeer struct {
key string
ch chan *keyValue
kvm map[byte]interface{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not seeing where it is being used...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks for this catch. It is a vestige from the previous round of review.

// Currently the repsonses take asynchronous paths so
// the results are skewed. However, we can ensure that
// the (expectedHeight, expectedExistence) all tally up.
heightTally := map[int]bool{}
Copy link
Contributor

@melekes melekes Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you help me understand why we need this tally stuff and the code below?

can't we just test that if we call Receive(bcBlockRequestMessage{-100}) we will get bsNoBlockResponseMessage?

(p testPeer) TrySend() {
  map[channel] = message
}

bcr.Receive(bcBlockRequestMessage{-100})
assert.Equal(p.map[blockchainChannel], bcNoBlockResponse)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, so in the tests we use different heights https://github.com/tendermint/tendermint/pull/648/files#diff-4b45e760192d8a11e9b4e278a8c8ea32R72.
Responses are sent with different heights echoed back, the channels might be the same for the same type of response, but the heights vary thus for different heights we strictly expect a response back. Does that make sense?

@ebuchman
Copy link
Contributor

ebuchman commented Oct 4, 2017

Ok - I have dramatically simplified the test code. @odeke-em please tell me if I'm failing to test something there that you were testing here. #714

The test in this PR was hard to read. It may have been correct, but too difficult to follow. We need to try to pare down our tests to be as simple as possible so it's easy to understand what's going on. If that means we have to refactor an entire package (as we did for this), so be it. And even then, we need to continue to ensure the tests are simple :)

@ebuchman ebuchman closed this Oct 4, 2017
@odeke-em
Copy link
Contributor Author

odeke-em commented Oct 4, 2017

Thanks for the simplification @ebuchman. I agree that we should aim for tests that are simple and easy to follow.
The problem though is that if we want to be thorough enough, IMHO we have to bite the bullet.
The latest simplification just checks that a block was received back, but it doesn't check that the height that was sent back was the one for the test. However, the responses are sent out of order which is what I commented here https://github.com/tendermint/tendermint/pull/648/files#diff-4b45e760192d8a11e9b4e278a8c8ea32R94
and that's why there was the complexity of tallying up heights with the respective responses and at the end ensuring that no other response came out https://github.com/tendermint/tendermint/pull/648/files#diff-4b45e760192d8a11e9b4e278a8c8ea32R143. The current test runs just len(test) times but doesn't check that we have no rogue responses out and also the heights most definitely don't match with those at i, tt, you can confirm this by printing out the responding height along with the wanted height.

@ebuchman
Copy link
Contributor

ebuchman commented Oct 5, 2017

Hmm, so since your comment I force pushed a change to check the height.

you can confirm this by printing out the responding height along with the wanted height.

This seems to contradict what's now implemented, where we explicitly check the height in the response message, and it matches the tt.height. Can you check again please? Am I still missing something?

the responses are sent out of order which is what I commented here

I don't understand this part. How so ? We are calling Receive synchronously in a loop, which calls TrySend, which just loads the msg onto a buffered channel, and then we read it off immediately after. There are no go routines. How could the messages come out of order?

The current test runs just len(test) times but doesn't check that we have no rogue responses

Valid point. We can add a final select statement to pull on the peers internal channel and ensure theres no rogue messages.

@odeke-em
Copy link
Contributor Author

odeke-em commented Oct 5, 2017

@ebuchman

  This seems to contradict what's now implemented, where we explicitly check the height in the 
  response message, and it matches the tt.height. Can you check again please? Am I still missing 
  something?

I don't understand this part. How so ? We are calling Receive synchronously in a loop, which calls TrySend, which just loads the msg onto a buffered channel, and then we read it off immediately after. There are no go routines. How could the messages come out of order?

Interesting, I think I caused the confusion myself, my apologies. So when adding blocks, I'd get blockStatusResponse messages which skewed the results. However, I compensated for that with https://github.com/tendermint/tendermint/pull/648/files#diff-4b45e760192d8a11e9b4e278a8c8ea32R198 and never simplified the tally code, of which this you did for me, thank you!

In short, thank you for the PR, I hadn't thought through it to propagate the simplifications I previously encountered; my last comments were outdated and the PR update that you made fixed it. LGTM!

@odeke-em odeke-em deleted the react-to-no-block-found branch May 11, 2021 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants