Skip to content

performance improvement in many aspects#257

Merged
unclezoro merged 10 commits intomasterfrom
improve_8
Jul 29, 2021
Merged

performance improvement in many aspects#257
unclezoro merged 10 commits intomasterfrom
improve_8

Conversation

@unclezoro
Copy link
Copy Markdown
Contributor

@unclezoro unclezoro commented May 26, 2021

Description

This PR tries to improve the performance of BSC.

Rationale

Part of Block Processing can be parallel, and the layer of the snapshot is too deeper for the current implementation. This PR focus on:

  1. Do BlockBody verification concurrently;
  2. Do the calculation of intermediate root concurrently;
  3. Commit the MPTs concurrently;
  4. Preload accounts before processing blocks;
  5. Make the snapshot layers configurable.
  6. Reuse some objects to reduce GC.
  7. Cherry-pick the RLP improvement from go-ethereum.
  8. Other minor improvements.

Example

The testing result on an 8 core 2.5G Hz Linux VM:

  1. The full sync speed on the main-net is 40% faster.
  2. The execution of an BEP20 transfer is 50% faster.

Changes

New command flags: --triesInMemory

Preflight checks

  • build passed (make build)
  • tests passed (make test)
  • manual transaction test passed

Already reviewed by

...

Related issues

... reference related issue #'s here ...

@unclezoro unclezoro force-pushed the improve_8 branch 13 times, most recently from 62904a0 to 5bdb7b4 Compare May 31, 2021 07:35
@unclezoro unclezoro force-pushed the improve_8 branch 17 times, most recently from 0a1cc10 to e750b5d Compare June 7, 2021 13:19
fjl added 4 commits July 12, 2021 09:41
All encoding/decoding operations read the type cache to find the
writer/decoder function responsible for a type. When analyzing CPU
profiles of geth during sync, I found that the use of sync.RWMutex in
cache lookups appears in the profiles. It seems we are running into
CPU cache contention problems when package rlp is heavily used
on all CPU cores during sync.

This change makes it use atomic.Value + a writer lock instead of
sync.RWMutex. In the common case where the typeinfo entry is present in
the cache, we simply fetch the map and lookup the type.
This change improves the performance of encoding/decoding [N]byte.

    name                     old time/op    new time/op    delta
    DecodeByteArrayStruct-8     336ns ± 0%     246ns ± 0%  -26.98%  (p=0.000 n=9+10)
    EncodeByteArrayStruct-8     225ns ± 1%     148ns ± 1%  -34.12%  (p=0.000 n=10+10)

    name                     old alloc/op   new alloc/op   delta
    DecodeByteArrayStruct-8      120B ± 0%       48B ± 0%  -60.00%  (p=0.000 n=10+10)
    EncodeByteArrayStruct-8     0.00B          0.00B          ~     (all equal)
This change grows the static integer buffer in Stream to 32 bytes,
making it possible to decode 256bit integers without allocating a
temporary buffer.

In the recent commit 088da24, Stream struct size decreased from 120
bytes down to 88 bytes. This commit grows the struct to 112 bytes again,
but the size change will not degrade performance because Stream
instances are internally cached in sync.Pool.

    name             old time/op    new time/op    delta
    DecodeBigInts-8    12.2µs ± 0%     8.6µs ± 4%  -29.58%  (p=0.000 n=9+10)

    name             old speed      new speed      delta
    DecodeBigInts-8   230MB/s ± 0%   326MB/s ± 4%  +42.04%  (p=0.000 n=9+10)
Getting the raw value is not necessary to decode this type, and
decoding it directly from the stream is faster.
}
}
}
wg.Wait()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wait here instead of line 1487

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to state.Commit and blockBatch.Write() run concurrently

if data, ok := storage[storageHash]; ok {
snapshotDirtyStorageHitMeter.Mark(1)
snapshotDirtyStorageHitDepthHist.Update(int64(depth))
//snapshotDirtyStorageHitDepthHist.Update(int64(depth))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we delete the comments here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is commented out because of the performance issue, if we need the metrics in some scenery, we get it back without deleting it.

chainRules: chainConfig.Rules(blockCtx.BlockNumber),
interpreters: make([]Interpreter, 0, 1),
}
evm := EvmPool.Get().(*EVM)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do we put the evm back to pool

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

"sync"

"github.com/ethereum/go-ethereum/common/gopool"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// Issue the header retrieval request (absolute upwards without gaps)
go p.peer.RequestHeadersByNumber(from, count, 0, false)

go p.peer.RequestHeadersByNumber(from, count, 0, false)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use gopool here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used when clients doing fast sync and snap sync. Not related to performance

p2p/rlpx/rlpx.go Outdated
"github.com/ethereum/go-ethereum/rlp"
"github.com/golang/snappy"
"golang.org/x/crypto/sha3"
"github.com/oxtoacart/bpool"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
if len(accountsSlice) >= preLoadLimit {
objsChan := make(chan []*state.StateObject, runtime.NumCPU())
for i := 0; i < runtime.NumCPU(); i++ {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make sure that len(accountsSlice) > runtime.NumCPU()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one more condition: len(accountsSlice) > runtime.NumCPU()

@adithyaxx
Copy link
Copy Markdown

Can we get this PR merged?

@unclezoro unclezoro mentioned this pull request Jul 29, 2021
3 tasks
Copy link
Copy Markdown

@yvettep321 yvettep321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_thank you _

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BSC synchronization issues

8 participants