This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Ensure clean wasm instances #2931

Closed

cmichi wants to merge 8 commits into master from cmichi-ensure-clean-wasm-instances

Conversation

@cmichi
Contributor

@cmichi cmichi commented Jun 24, 2019

Addresses #2051.

The runtime cache which Substrate currently uses reuses one runtime instance for every call.
To enable this, the instance is cleaned up before every call. This leads to the problems detailed in
#2051.

Following up to private discussions with @pepyakin and @folsen, this PR tries out an alternative
approach: it replaces the runtime cache with a pool of pre-created runtime instances. For every call a "fresh" instance is taken out of the pool; a background thread monitors the pool and adds new instances.

In order to make this possible I had to implement threadsafe types for wasmi (wasmi-labs/wasmi#189, not yet merged).
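A minimal sketch of the pool idea described above (the `Instance` type and `InstancePool` are hypothetical stand-ins, not the PR's actual code):

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for a wasm runtime instance.
#[derive(Clone)]
struct Instance {
    id: u64,
}

struct InstancePool {
    template: Instance,
    ready: Arc<Mutex<Vec<Instance>>>,
}

impl InstancePool {
    fn new(template: Instance, min_size: usize) -> Self {
        let ready = Arc::new(Mutex::new(Vec::new()));
        let bg = Arc::clone(&ready);
        let tmpl = template.clone();
        // Background thread keeps the pool topped up with fresh clones.
        thread::spawn(move || loop {
            {
                let mut pool = bg.lock().unwrap();
                while pool.len() < min_size {
                    pool.push(tmpl.clone());
                }
            }
            thread::sleep(Duration::from_millis(1));
        });
        InstancePool { template, ready }
    }

    // Take a fresh instance; fall back to a synchronous clone if the
    // background thread has not replenished the pool yet.
    fn take(&self) -> Instance {
        self.ready
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| self.template.clone())
    }
}
```

The consumer never waits on the producer here: if the pool happens to be empty, it clones synchronously rather than blocking.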

I did a first (very simple) benchmark by timing invocations of call_in_wasm_module and running cargo run --release -- --dev for 40 blocks:

AuraApi_authorities:
master:    169  µs (100%)
pool:      196  µs (115%)

AuraApi_slot_duration:
master:    2192 µs (100%)
pool:      96   µs (4%)

BlockBuilder_apply_extrinsic:
master:    1805 µs (100%)
pool:      2551 µs (141%)

BlockBuilder_finalize_block:
master:    6035 µs (100%)
pool:      4774 µs (79%)

BlockBuilder_inherent_extrinsics:
master:    996  µs (100%)
pool:      1416 µs (142%)

Core_execute_block:
master:    1335 µs (100%)
pool:      965  µs (72%)

Core_initialize_block:
master:    1731 µs (100%)
pool:      2143 µs (123%)

GrandpaApi_grandpa_authorities:
master:    2357 µs (100%)
pool:      150  µs (6%)

GrandpaApi_grandpa_forced_change:
master:    136  µs (100%)
pool:      155  µs (113%)

GrandpaApi_grandpa_pending_change:
master:    122  µs (100%)
pool:      121  µs (99%)

OffchainWorkerApi_offchain_worker:
master:    3    µs (100%)
pool:      11   µs (366%)

There is some room for optimization though (Jef had some comments on the wasmi PR).

At first we had intended for the runtime pool to do more complex actions when adding instances to the pool. Right now it just clones a runtime instance from a "template instance". I measured the clone operation in release mode on dev over 40 blocks and it takes on average about 1 µs on my machine. So it seems rather cheap and I'm wondering if we could maybe even do this synchronously without having the overhead of a separate thread?
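The synchronous alternative floated above could look roughly like this (hypothetical `Instance` type; the timing is illustrative only, not a reproduction of the measurement):

```rust
use std::time::Instant;

// Hypothetical stand-in for a wasm runtime instance; in the real
// runtime this would hold linear memory, globals, etc.
#[derive(Clone)]
struct Instance {
    memory: Vec<u8>,
}

// Clone a fresh instance on demand, per call, instead of maintaining
// a background-replenished pool. Prints how long the clone took.
fn call_with_fresh_instance(template: &Instance) -> Instance {
    let start = Instant::now();
    let fresh = template.clone();
    eprintln!("clone took {:?}", start.elapsed());
    fresh
}
```

If the clone really is on the order of 1 µs, doing it inline removes the extra thread, the locking, and the replenishment logic entirely.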

(The CI fails because I use wasmi = { path = "../../../wasmi", features = [ "threadsafe" ] } in the code, since that PR is not yet merged. That package is not (yet) available in the CI. All tests run locally though.)

cmichi added 4 commits June 24, 2019 10:22
Original is from @pepyakin in 3d7b27f.
I adapted it to work with the latest master.
...to have the threadsafe version available.
A background thread monitors the pool and adds
new instances.
@cmichi cmichi added the A3-in_progress Pull request is in progress. No review needed at this stage. label Jun 24, 2019
@cmichi cmichi requested a review from pepyakin June 24, 2019 08:30
@bkchr
Member

bkchr commented Jun 24, 2019

Wouldn't it be much cleaner to just restore the original data section?
This all looks like a big hack.

@pepyakin
Contributor

pepyakin commented Jun 24, 2019

Wouldn't it be much cleaner to just restore the original data section?

I assume you meant restoration of the original data snapshot as well as the other components of an instance (basically, cloning a wasm instance), otherwise it would have the same problems as #2051 to some degree.

I think we might have tried to restore/clean the whole memory instance, but it took considerably more time. I think that is why we ended up with the current optimization, which attempts to zero only the (heuristically detected) dirty areas of memory (I think @arkpar might have some info on this).

The proposed approach, at least conceptually (I haven't had a look at the code yet), is very simple, so I think I can repeat it here:

  1. Create a pool of instances.
  2. On one side of the pool are producers, which replenish the pool in background threads.
  3. On the other side of the pool are consumers - the wasm executor, which asks for an instance.

The general assumption behind this is that the typical workload is more or less synchronous, and thus the latency of wasm runtime execution matters more than the actual time spent preparing these instances. While threading and inter-thread communication definitely add some overhead, we are lowering the latency on the critical path to almost zero on average. Besides that, it doesn't use any feature specific to a wasm execution engine (wasmi at the moment), so this mechanism can easily be reused for another execution engine (i.e. #2634).
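The producer/consumer split in the three steps above can be sketched with a bounded channel (hypothetical types; a sketch of the shape, not the PR's actual code):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical stand-in for a wasm runtime instance.
#[derive(Clone)]
struct Instance {
    generation: u32,
}

// Take `n` fresh instances off the pool. The bounded channel *is* the
// pool: the producer blocks once it holds 8 ready instances, and it
// shuts down cleanly when the consumer side is dropped.
fn take_fresh(n: usize) -> Vec<Instance> {
    let (producer, consumer) = sync_channel::<Instance>(8);
    let template = Instance { generation: 0 };
    // Producer side: replenish the pool in a background thread.
    thread::spawn(move || {
        while producer.send(template.clone()).is_ok() {}
    });
    // Consumer side (the wasm executor): one fresh instance per call.
    (0..n).map(|_| consumer.recv().unwrap()).collect()
}
```

A bounded channel gives the back-pressure for free: the producer cannot run ahead of what the pool is allowed to hold.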

All of this, and the simplicity of the approach, indicated that it is a good path to explore.

It did uncover a problem with wasmi, though: wasmi in its current state is unusable in a multithreaded environment, and I am not happy about that either. That said, I think the general direction is still worth exploring.

Do you have anything in particular that you don't like in this solution (conceptually)?

cmichi and others added 2 commits June 24, 2019 13:36
Co-Authored-By: Sergei Pepyakin <s.pepyakin@gmail.com>
@pepyakin
Contributor

Updates:

  1. This approach uses a raw thread, which is a step backwards with respect to
    #2416 (Add a light client code that compiles in the browser).
  2. If the initial assumption that copying the whole memory takes too long is not true, then plain cloning of the runtime instance template is actually very tempting because of its simplicity.

@cmichi
Contributor Author

cmichi commented Jun 24, 2019

At first we had intended for the runtime pool to do more complex actions when adding instances to the pool. Right now it just clones a runtime instance from a "template instance". I measured the clone operation in release mode on dev over 40 blocks and it takes on average about 1 µs on my machine. So it seems rather cheap and I'm wondering if we could maybe even do this synchronously without having the overhead of a separate thread?

If the initial assumption that copying the whole memory takes too long is not true, then plain cloning of the runtime instance template is actually very tempting because of its simplicity.

I'll try it out in a separate branch/PR. This would also save us from having to switch to thread-safe types for wasmi.

@bkchr
Member

bkchr commented Jun 24, 2019

My complaint was mostly about copying everything ^^ But if there is no better way, okay..

Conceptually, I did not look that much into the code. In general you are right, we should not spawn raw threads here. Then we need to take a look at how we evict old runtimes from the cache. Maybe it makes sense to remove them after some blocks and not immediately when we see a new runtime.
And if we already touch this, I would really like to see the global cache variable gone and the cache passed in from the outside. Having global variables was not a good solution from the beginning.

@arkpar
Member

arkpar commented Jun 24, 2019

The problem is with the way the rust-to-wasm toolchain organizes wasm memory. IIRC it allocates 1MB for the stack and the data segment follows it. So each new runtime instance requires a 1MB write to memory, which impacts performance when a lot of runtimes are created for simple calls into WASM.
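To make that cost concrete, here is a rough illustration of what each fresh instantiation implies (the 1MB stack size follows the description above; the function is hypothetical, not Substrate code):

```rust
// The rust-to-wasm toolchain places a stack region at the start of
// linear memory, with the data segment after it. 1MB is the size
// described above.
const STACK_SIZE: usize = 1024 * 1024;

// Each fresh runtime instance effectively requires writing this whole
// region anew - a 1MB memory write per instantiation.
fn reset_linear_memory(memory: &mut [u8]) {
    for byte in memory.iter_mut().take(STACK_SIZE) {
        *byte = 0;
    }
}
```

Done once per simple runtime call, a write of this size can dominate the call itself (compare the µs-level figures in the benchmark above).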

I'd really prefer this to be fixed fundamentally by avoiding that huge unnecessary work, rather than working around it. Pooling would waste CPU and would degrade heavily under load, e.g. when a lot of RPC requests come in in parallel, which is quite typical in various blockchain applications.

@pepyakin
Contributor

Pooling would waste CPU and would degrade heavily under load, e.g. when a lot of RPC requests come in in parallel, which is quite typical in various blockchain applications.

This is really good input. As I mentioned, the assumption was that the node has spare CPU time. If you think that is not the case then this would be an issue.

I have some ideas in mind for how to fix it properly. However, they would require a non-trivial amount of modification to the current execution engine - wasmi - and to the execution engines that would be introduced in the future (#2634). Opting into this more sophisticated approach would inhibit experimentation as well.

But most importantly, this issue is pressing due to an upcoming audit. This solution seemed pretty simple and shouldn't regress performance much.

That said, I really agree that we need to use a proper solution eventually and I see the proposed solution as a stop-gap.

@arkpar
Member

arkpar commented Jun 26, 2019

Would still love to see how it affects real block import performance, i.e. importing 10000 testnet blocks from a file. Also please compare CPU load.

@cmichi
Contributor Author

cmichi commented Jun 28, 2019

I'm closing this PR as we are currently favoring the approach from #2938 and have stopped working on this one. If the other one turns out to be a dead end we can always reopen this PR.

@cmichi cmichi closed this Jun 28, 2019
