Skip to content

Conversation

@TheBlueMatt
Copy link
Contributor

@TheBlueMatt TheBlueMatt commented Jul 2, 2019

Built on #16323, this takes one next step on the first future-work bullet point listed there - we now do not immediately wait on cs_main after returning from ProcessMessage and can (maybe) get to another peer before blocking ProcessMessage waiting on ActivateBestChain to finish in the background. Sadly this isn't expected to result in a material performance improvement on IBD until the next ProcessNewBlock calls' CheckBlock can begin running without waiting on cs_main, which is not yet the case.

Further, this now moves CBlock::fChecked out of cs_main, making CheckBlock parallel.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 2, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #16878 (Fix non-deterministic coverage of test DoS_mapOrphans by davereikher)
  • #16839 (Replace Connman and BanMan globals with Node local by ryanofsky)
  • #16834 (Fetch Headers over DNS by TheBlueMatt)
  • #16762 (Rust-based Backup over-REST block downloader by TheBlueMatt)
  • #16757 (doc: CChainState return values by MarcoFalke)
  • #16748 ([WIP] Add support for addrv2 (BIP155) by dongcarl)
  • #16688 (log: Add validation interface logging by jkczyz)
  • #16442 (Serve BIP 157 compact filters by jimpo)
  • #16411 (Signet support by kallewoof)
  • #16333 (test: Set BIP34Height = 2 for regtest by MarcoFalke)
  • #16279 (Return the AcceptBlock CValidationState directly in ProcessNewBlock by TheBlueMatt)
  • #16202 (Refactor network message deserialization by jonasschnelli)
  • #15545 ([doc] explain why CheckBlock() is called before AcceptBlock by Sjors)
  • #15206 (Immediately disconnect on invalid net message checksum by jonasschnelli)
  • #14053 (Add address-based index (attempt 4?) by marcinja)
  • #8994 (Testchains: Introduce custom chain whose constructor... by jtimon)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@TheBlueMatt TheBlueMatt force-pushed the 2019-07-peerstate-initial-moves branch 2 times, most recently from 9101ff4 to 072de59 Compare July 3, 2019 04:09
@TheBlueMatt TheBlueMatt force-pushed the 2019-07-peerstate-initial-moves branch 2 times, most recently from 1d011c0 to c2814df Compare July 3, 2019 22:25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This peerstate shadows the peerstate in the outer context. Consider renaming :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not captured so does is it really shadows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be const ref since only used as such?

@TheBlueMatt
Copy link
Contributor Author

Working on a big structural cleanup, will reopen something new once I have a new PR series.

@TheBlueMatt TheBlueMatt reopened this Jul 30, 2019
@TheBlueMatt TheBlueMatt force-pushed the 2019-07-peerstate-initial-moves branch from c2814df to 47e762a Compare July 30, 2019 18:05
@TheBlueMatt
Copy link
Contributor Author

Actually I was somewhat confused by some existing bugs triggering on travis. I think this is good as-is, original description applies, modulo moving CheckBlock out of cs_main.

9fdf05d resolved some lock
inversion warnings in denialofservice_tests, but left in a number
of cs_main locks that are unnecessary (introducing lock inversion
warnings in future changes).
In practice this means that CheckBlock+ContextualCheckBlock are
called with a passed-in CValidationState before we move onto
connecting the best chain. This makes conceptual sense as these
calls represent the DoS checks on a block (ie PoW and malleability)
which the caller almost certainly wants to know about right away
and shouldn't have to wait on a callback for (and other
validationinterface clients shouldn't care about someone submitting
bogus malleated blocks to PNB).

This also makes it much, much easier to move the best chain
activation logic to a background thread as it implies that if PNB
returns with a IsValid() CValidationState we don't need to care
about trying to process (non-malleated) copies of the block from
other peers.
This prepares for making best-chain-activation and disk writes
happen in a separate thread from the caller, even though all
callsites currently block on the return value immediately.
CNodeState was added for validation-state-tracking, and thus,
logically, was protected by cs_main. However, as it has grown to
include non-validation state (taking state from CNode), and as
we've reduced cs_main usage for other unrelated things, CNodeState
is left with lots of cs_main locking in net_processing.

In order to ease transition to something new, this adds only a
dummy CPeerState which is held as a reference for the duration of
message processing.

Note that moving things is somewhat tricky pre validation-thread as
a consistent lockorder must be kept - we can't take a lock on the
new cs_peerstate in anything that's called directly from
validation.
Essentially, our goal is to not process anything for the given peer
until the block finishes processing (emulating the previous behavior)
without actually blocking the ProcessMessages loops. Obviously, in
most cases, we'll just go on to the next peer and immediately hit a
cs_main lock, blocking us anyway, but this we can slowly improve
that state over time by moving things from CNodeState to CPeerState.
@TheBlueMatt TheBlueMatt force-pushed the 2019-07-peerstate-initial-moves branch from e57cf36 to 9a97bb7 Compare August 20, 2019 00:07
Spawn a background thread at startup which validates each block as
it comes in from ProcessNewBlock, taking advantage of the new
std::future return value to keep tests simple (and the new
net_processing handling of such values async already).

This makes introducing subtle validationinterface deadlocks much
harder as any locks held going into ProcessNewBlock do not interact
with (in the form of lockorder restrictions) locks taken in
validationinterface callbacks.

Note that after this commit, feature_block and feature_assumevalid
tests time out due to increased latency between block processing
when those blocks do not represent a new best block. This will be
resolved in the next commit.
This resolves the performance regression introduced in the previous
commit by always waking the message processing thread after each
block future resolves.

Sadly, this is somewhat awkward - all other validationinterface
callbacks represent an actual change to the global validation state,
whereas this callback indicates only that a call which one
validation "client" made has completed. After going back and forth
for some time I didn't see a materially better way to resolve this
issue, and luckily its a rather simple change, but its far from
ideal. Note that because we absolutely do not want to ever block on
a ProcessNewBlock-returned-future, the callback approach is
critical.
To keep the API the same (and for simplicity of clients, ie
net_processing), this splits AcceptBlock into the do-I-want-this
stage, the checking stage, and the writing stage.

ProcessNewBlock calls the do-I-want-this and checking (ie
malleability checking) stuff, and then dumps blocks that pass
into the background thread. In the background, we re-test the
do-I-want-this logic but skip the checking stuff, before writing
the block to disk and activating the best chain.
As reject messages are required to go out in-order (ie before any
further messages are processed), this sadly requires that we
further delay re-enabling a peer after a block has been processed
by waiting for current validationinterface callbacks to drain.

This commit enables further reduction of cs_main in net_processing
by allowing us to lock cs_peerstate before cs_main in BlockChecked
(ie allows us to move things which are accessed in BlockChecked,
including DoS state and rejects into CPeerState and out of
CNodeState).
This technically resolves a race where entries are added to
mapBlockSource before we know that they're non-malleated and then
removed only after PNB returns, though in practice this wasn't an
issue since all access to mapBlockSource already held cs_peerstate.
This removes the cs_main lock which is taken on every
ProcessMessages call.
This moves one more group of variables out of cs_main. Importantly,
these moves allow us to do block processing almost entirely without
cs_main once mapBlockIndex moves to its own (read) lock.
This has the effect of allowing CheckBlock to run in parallel with
other block connection things in some cases during IBD.
@TheBlueMatt TheBlueMatt force-pushed the 2019-07-peerstate-initial-moves branch from 9a97bb7 to 99108e2 Compare August 20, 2019 03:54
@DrahtBot
Copy link
Contributor

Needs rebase

@ryanofsky
Copy link
Contributor

Marking up for grabs

http://www.erisian.com.au/bitcoin-core-dev/log-2019-11-12.html#l-394

<BlueMatt> #16323 and #16324 are up for grabs if anyone wants to work on them, but there seems to be ~zero interest in reviewing them, cause they have wonderfully scary titles (despite the code actually being pretty simple) :)

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants