
[blockchain] v2-riri Scheduler implementation #3821

Closed
brapse wants to merge 70 commits into tendermint:master from brapse:brapse/blockchain-v2-riri-implementation

Conversation

@brapse (Contributor) commented Jul 19, 2019

The blockchain reactor is responsible for fetching and applying blocks from peers in order to catch up a node. #3777 specifies a new architecture which includes a scheduler component that:

  • Decouples scheduling from IO
  • Keeps business logic strictly deterministic
  • Uses explicit time instead of timers

This PR is one such scheduler, modeled as a block- and peer-wise finite state machine. Blocks transition through:
unknown -> new -> pending -> received -> processed
and peers transition through:
new -> ready -> removed
based on synchronous method calls.
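The block-wise state machine above can be sketched as follows. This is a minimal illustration of the transitions described in the PR description; the type and constant names are illustrative, not necessarily the PR's exact identifiers.

```go
package main

import "fmt"

// blockState models the per-block states named in the PR description.
type blockState int

const (
	blockStateUnknown blockState = iota
	blockStateNew
	blockStatePending
	blockStateReceived
	blockStateProcessed
)

// validBlockTransition reports whether a block may move directly from one
// state to the next, following unknown -> new -> pending -> received -> processed.
func validBlockTransition(from, to blockState) bool {
	next := map[blockState]blockState{
		blockStateUnknown:  blockStateNew,
		blockStateNew:      blockStatePending,
		blockStatePending:  blockStateReceived,
		blockStateReceived: blockStateProcessed,
	}
	n, ok := next[from]
	return ok && n == to
}

func main() {
	fmt.Println(validBlockTransition(blockStateNew, blockStatePending))     // true
	fmt.Println(validBlockTransition(blockStatePending, blockStateUnknown)) // false
}
```

Because every transition is a pure function of the current state, the scheduler's business logic stays strictly deterministic and testable without timers or IO.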

  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

Anca Zamfir added 30 commits March 7, 2019 18:39
added block requests under peer

moved the request trigger in the reactor poolRoutine, triggered now by a ticker

in general moved everything required for making block requests smarter in the poolRoutine

added a simple map of heights to keep track of what will need to be requested next

added a few more tests
send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
If we tried to send a request to a peer not present in the switch, a
missing continue statement caused the request to be black-holed: it stayed
assigned to the removed peer and was never retried.

While this bug was manifesting, the reactor kept asking for other
blocks that would be stored and never consumed. Added the number of
unconsumed blocks to the math for requesting blocks ahead of the current
processing height, so eventually no more blocks are requested
until the already received ones are consumed.
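The missing-continue bug described in this commit message can be illustrated with a small sketch. The function, peer list, and callbacks below are hypothetical stand-ins for the reactor's pool routine, not the actual code.

```go
package main

import "fmt"

// dispatchRequests assigns each height to a peer, skipping peers that are no
// longer in the switch. The `continue` below is the fix: without it, the
// request was silently left assigned to a removed peer and never retried.
func dispatchRequests(heights []int64, peer string, hasPeer func(string) bool,
	send func(string, int64)) (retry []int64) {
	for _, h := range heights {
		if !hasPeer(peer) {
			// Re-queue the height so it can be retried with another peer.
			retry = append(retry, h)
			continue
		}
		send(peer, h)
	}
	return retry
}

func main() {
	var sent []int64
	retry := dispatchRequests([]int64{5, 6, 7}, "gone",
		func(string) bool { return false },
		func(_ string, h int64) { sent = append(sent, h) })
	fmt.Println(len(sent), retry) // 0 [5 6 7]
}
```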
@brapse brapse requested a review from ancazamfir July 19, 2019 15:18
return Skip{}
}

type scPrunePeerEv struct {
Contributor:

U1000: type scPrunePeerEv is unused (from unused)

reason error
}

func (sdr *Scheduler) handleBlockProcessError(peerID p2p.ID, height int64) Event {
Contributor:

U1000: func (*Scheduler).handleBlockProcessError is unused (from unused)

return Skip{}
}

type scSchedulerFailure struct {
Contributor:

U1000: type scSchedulerFailure is unused (from unused)

reason error
}

func (sdr *Scheduler) handleTimeCheck(now time.Time) Events {
Contributor:

U1000: func (*Scheduler).handleTimeCheck is unused (from unused)

@ancazamfir (Contributor) left a comment:

Some comments on scheduler.go. Will do the tests later.

return &sc
}

func (sc *schedule) addPeer(peerID p2p.ID) error {
Contributor:

Should addPeer have a height parameter? What triggers a peer addition the way it is right now?
In the older version it is the first StatusResponseMessage that creates a new peer with a height.

Contributor Author:

This would most likely be passed as an event from reactor.AddPeer.


func (sc *schedule) addPeer(peerID p2p.ID) error {
if _, ok := sc.peers[peerID]; ok {
return fmt.Errorf("Cannot add duplicate peer %s", peerID)
Contributor:

Should a check for removed state be added here? What happens if a peer was removed and later re-added?

Contributor Author:

That's a good question! What does @milosevic think?
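One possible answer to the question raised above is to track a removed state on the peer and reject re-adds explicitly. This is only a sketch of that behavior, using string peer IDs in place of p2p.ID and illustrative field names.

```go
package main

import "fmt"

type peerState int

const (
	peerStateNew peerState = iota
	peerStateReady
	peerStateRemoved
)

type peer struct{ state peerState }

type schedule struct{ peers map[string]*peer }

// addPeer rejects both duplicate peers and peers that were previously
// removed, so a removed peer cannot silently re-enter the schedule.
func (sc *schedule) addPeer(peerID string) error {
	if p, ok := sc.peers[peerID]; ok {
		if p.state == peerStateRemoved {
			return fmt.Errorf("cannot re-add removed peer %s", peerID)
		}
		return fmt.Errorf("cannot add duplicate peer %s", peerID)
	}
	sc.peers[peerID] = &peer{state: peerStateNew}
	return nil
}

func main() {
	sc := &schedule{peers: map[string]*peer{}}
	fmt.Println(sc.addPeer("p1")) // <nil>
	sc.peers["p1"].state = peerStateRemoved
	fmt.Println(sc.addPeer("p1"))
}
```

Whether a removed peer should be permanently banned or allowed back after some interval is a policy question the reviewers left open here.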


peer.height = height
peer.state = peerStateReady
for i := sc.minHeight(); i <= height; i++ {
Contributor:

Assume we start at height 1 and we have a peer that declares a very high height. This loop may take a long time and also add a lot of entries to blockStates.

Contributor Author:

Yes, on the surface this seems pretty inefficient. The goal is to ensure correctness first and then optimize once the spec is reasonably well defined.

Contributor:

It's not only a performance factor but also a safety/security concern. If a "bad" node sends a huge height, setPeerHeight() will hog the CPU potentially forever. Try for example TestPeerHeight() with a height of 2^40 and observe the blockchain.test process CPU and memory usage.
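One way to address this concern is to cap how far ahead of the current minimum height the scheduler tracks blocks, so a peer advertising a huge height (e.g. 2^40) cannot force an unbounded loop or map. The function and the maxLookahead constant below are hypothetical, not part of the PR.

```go
package main

import "fmt"

// maxLookahead bounds how many heights beyond the current minimum the
// scheduler will track at once (hypothetical tuning constant).
const maxLookahead int64 = 1000

// boundedTargetHeight clamps a peer's advertised height so that the
// blockStates loop touches at most maxLookahead entries per call.
func boundedTargetHeight(minHeight, peerHeight int64) int64 {
	if peerHeight-minHeight > maxLookahead {
		return minHeight + maxLookahead
	}
	return peerHeight
}

func main() {
	fmt.Println(boundedTargetHeight(1, 1<<40)) // 1001
	fmt.Println(boundedTargetHeight(1, 500))   // 500
}
```

The remaining heights would then be scheduled incrementally as the processing height advances, keeping memory and CPU use bounded regardless of what a peer claims.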

}

func (sc *schedule) allBlocksProcessed() bool {
for _, state := range sc.blockStates {
Contributor:

I assume this is called every time a block is processed. The range loop is expensive given the number of entries in the blockStates map.

Contributor Author:

This would be called at some interval. The same comment as above: let's make sure it's correct first, then profile to figure out where best to optimize.
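If profiling does show the range loop to be hot, one standard fix is to keep a running counter of unprocessed blocks so allBlocksProcessed becomes O(1). This is a sketch of that alternative, not the PR's code; the names are illustrative.

```go
package main

import "fmt"

// schedule keeps a counter updated on each state change instead of
// re-scanning the whole blockStates map.
type schedule struct {
	numUnprocessed int // incremented when a height is tracked, decremented when processed
}

func (sc *schedule) trackBlock()    { sc.numUnprocessed++ }
func (sc *schedule) markProcessed() { sc.numUnprocessed-- }

// allBlocksProcessed is now constant-time regardless of how many heights
// the schedule tracks.
func (sc *schedule) allBlocksProcessed() bool {
	return sc.numUnprocessed == 0
}

func main() {
	sc := &schedule{}
	sc.trackBlock()
	sc.trackBlock()
	sc.markProcessed()
	fmt.Println(sc.allBlocksProcessed()) // false
	sc.markProcessed()
	fmt.Println(sc.allBlocksProcessed()) // true
}
```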

return schedulerErrorEv{peerID, err}
}

return scBlockReceivedEv{peerID}
Contributor:

Who receives this event and wouldn't the height be needed?

Contributor Author:

Good catch! Yes this will be sent to the processor and require the height, blockData, etc.
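The fix agreed on above would enrich the event so the processor has everything it needs. A minimal sketch, with illustrative field names and a []byte payload standing in for the real block type:

```go
package main

import "fmt"

// scBlockReceivedEv carries the peer, height, and block payload to the
// processor, per the review discussion above (sketch, not the PR's code).
type scBlockReceivedEv struct {
	peerID string
	height int64
	block  []byte
}

func main() {
	ev := scBlockReceivedEv{peerID: "p1", height: 42, block: []byte("blockdata")}
	fmt.Printf("received block %d from %s (%d bytes)\n",
		ev.height, ev.peerID, len(ev.block))
}
```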

@brapse brapse marked this pull request as ready for review July 30, 2019 09:55
@brapse brapse requested review from ebuchman, melekes and xla as code owners July 30, 2019 09:55
@brapse brapse changed the base branch from ancaz/blockchain_reactor_reorg to master July 30, 2019 15:30
@brapse (Contributor Author) commented Jul 30, 2019

This PR was replaced by #3848

@brapse brapse closed this Jul 30, 2019
cboh4 pushed a commit to scrtlabs/tendermint that referenced this pull request Apr 7, 2025
fork contains updated dependencies
