Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Applying WASM snapshot fails occasionally #3780

@raykyri

Description

@raykyri

Our nodes occasionally panic when the executor's WASM snapshot gets out of sync. This happens more frequently on different nodes than others (e.g. nodes that have been running for longer / were started earlier).

We're using this version of Substrate: e0d486d

Oct 07 00:35:25 edg09-1 run_node.sh[20824]: ====================
Oct 07 00:35:25 edg09-1 run_node.sh[20824]: Version: 1.0.0-5a52390-x86_64-linux-gnu
Oct 07 00:35:25 edg09-1 run_node.sh[20824]: stack backtrace:
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    0: substrate_panic_handler::set::{{closure}}
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    1: std::panicking::rust_panic_with_hook
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libstd/panicking.rs:481
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    2: std::panicking::continue_panic_fmt
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libstd/panicking.rs:384
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    3: rust_begin_unwind
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libstd/panicking.rs:311
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    4: core::panicking::panic_fmt
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libcore/panicking.rs:85
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    5: core::result::unwrap_failed
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libcore/result.rs:1084
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    6: std::thread::local::LocalKey<T>::with
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    7: substrate_state_machine::StateMachine<B,H,N,T,O,Exec>::execute_aux
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    8: substrate_state_machine::StateMachine<B,H,N,T,O,Exec>::execute_using_consensus_failure_handler
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:    9: <substrate_client::call_executor::LocalCallExecutor<B,E> as substrate_client::call_executor::CallExecutor<Block,substrate_primitives
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   10: <substrate_client::client::Client<B,E,Block,RA> as substrate_client::runtime_api::CallRuntimeAt<Block>>::call_api_at
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   11: substrate_client::runtime_api::runtime_decl_for_Core::initialize_block_call_api_at
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   12: substrate_client::runtime_api::Core::initialize_block
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   13: <substrate_client::call_executor::LocalCallExecutor<B,E> as substrate_client::call_executor::CallExecutor<Block,substrate_primitives
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   14: <substrate_client::client::Client<B,E,Block,RA> as substrate_client::runtime_api::CallRuntimeAt<Block>>::call_api_at
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   15: substrate_client::runtime_api::runtime_decl_for_Metadata::metadata_call_api_at
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   16: substrate_client::runtime_api::Metadata::metadata
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   17: <substrate_rpc::state::state_full::FullState<B,E,Block,RA> as substrate_rpc::state::StateBackend<B,E,Block,RA>>::metadata
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   18: <jsonrpc_core::delegates::DelegateAsyncMethod<T,F> as jsonrpc_core::calls::RpcMethod<M>>::call
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   19: <futures::future::lazy::Lazy<F,R> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   20: futures::future::chain::Chain<A,B,C>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   21: <futures::future::then::Then<A,B,F> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   22: <futures::future::map::Map<A,F> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   23: <futures::future::either::Either<A,B> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   24: <futures::future::map::Map<A,F> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   25: <futures::future::select::Select<A,B> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   26: <futures::future::map::Map<A,F> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   27: <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   28: futures::task_impl::std::set
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   29: std::panicking::try::do_call
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   30: __rust_maybe_catch_panic
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libpanic_unwind/lib.rs:80
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   31: tokio_threadpool::task::Task::run
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   32: tokio_threadpool::worker::Worker::run_task
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   33: tokio_threadpool::worker::Worker::run
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   34: std::thread::local::LocalKey<T>::with
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   35: std::thread::local::LocalKey<T>::with
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   36: std::thread::local::LocalKey<T>::with
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   37: tokio::runtime::threadpool::builder::Builder::build::{{closure}}
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   38: std::thread::local::LocalKey<T>::with
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   39: std::thread::local::LocalKey<T>::with
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   40: std::sys_common::backtrace::__rust_begin_short_backtrace
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   41: std::panicking::try::do_call
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   42: __rust_maybe_catch_panic
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libpanic_unwind/lib.rs:80
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   43: core::ops::function::FnOnce::call_once{{vtable.shim}}
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   44: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   45: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:       std::sys_common::thread::start_thread
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libstd/sys_common/thread.rs:13
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:       std::sys::unix::thread::Thread::new::thread_start
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:              at src/libstd/sys/unix/thread.rs:79
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   46: start_thread
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:   47: __clone
Oct 07 00:35:25 edg09-1 run_node.sh[20824]: Thread 'jsonrpc-eventloop-0' panicked at 'applying the snapshot can only fail if the passed instance is different
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:                         from the one that was used for creation of the snapshot;
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:                         we use the snapshot that is directly associated with the instance;
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:                         thus the snapshot was created using the instance;
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:                         qed: ApplySnapshotFailed', src/libcore/result.rs:1084
Oct 07 00:35:25 edg09-1 run_node.sh[20824]: This is a bug. Please report it at:
Oct 07 00:35:25 edg09-1 run_node.sh[20824]:         https://github.com/hicommonwealth/edgeware-node/issues/new
Oct 07 00:35:25 edg09-1 systemd[1]: edgeware.service: Main process exited, code=exited, status=1/FAILURE
Oct 07 00:35:25 edg09-1 systemd[1]: edgeware.service: Failed with result 'exit-code'.

Sometimes this happens with the offchain worker API:

Oct 07 03:10:43 edg09-1 run_node.sh[9920]: Version: 1.0.0-5a52390-x86_64-linux-gnu
Oct 07 03:10:43 edg09-1 run_node.sh[9920]: stack backtrace:
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    0: substrate_panic_handler::set::{{closure}}
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    1: std::panicking::rust_panic_with_hook
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libstd/panicking.rs:481
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    2: std::panicking::continue_panic_fmt
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libstd/panicking.rs:384
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    3: rust_begin_unwind
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libstd/panicking.rs:311
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    4: core::panicking::panic_fmt
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libcore/panicking.rs:85
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    5: core::result::unwrap_failed
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libcore/result.rs:1084
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    6: std::thread::local::LocalKey<T>::with
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    7: substrate_state_machine::StateMachine<B,H,N,T,O,Exec>::execute_aux
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    8: substrate_state_machine::StateMachine<B,H,N,T,O,Exec>::execute_using_consensus_failure_handler
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:    9: <substrate_client::call_executor::LocalCallExecutor<B,E> as substrate_client::call_executor::CallExecutor<Block,substrate_primitives:
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   10: <substrate_client::client::Client<B,E,Block,RA> as substrate_client::runtime_api::CallRuntimeAt<Block>>::call_api_at
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   11: substrate_offchain_primitives::runtime_decl_for_OffchainWorkerApi::offchain_worker_call_api_at
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   12: substrate_offchain_primitives::OffchainWorkerApi::offchain_worker_with_context
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   13: std::sys_common::backtrace::__rust_begin_short_backtrace
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   14: std::panicking::try::do_call
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   15: __rust_maybe_catch_panic
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libpanic_unwind/lib.rs:80
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   16: core::ops::function::FnOnce::call_once{{vtable.shim}}
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   17: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   18: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at /rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:       std::sys_common::thread::start_thread
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libstd/sys_common/thread.rs:13
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:       std::sys::unix::thread::Thread::new::thread_start
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:              at src/libstd/sys/unix/thread.rs:79
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   19: start_thread
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:   20: __clone
Oct 07 03:10:43 edg09-1 run_node.sh[9920]: Thread '<unnamed>' panicked at 'applying the snapshot can only fail if the passed instance is different
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:                         from the one that was used for creation of the snapshot;
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:                         we use the snapshot that is directly associated with the instance;
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:                         thus the snapshot was created using the instance;
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:                         qed: ApplySnapshotFailed', src/libcore/result.rs:1084
Oct 07 03:10:43 edg09-1 run_node.sh[9920]: This is a bug. Please report it at:
Oct 07 03:10:43 edg09-1 run_node.sh[9920]:         https://github.com/hicommonwealth/edgeware-node/issues/new
Oct 07 03:10:43 edg09-1 systemd[1]: edgeware.service: Main process exited, code=exited, status=1/FAILURE
Oct 07 03:10:43 edg09-1 systemd[1]: edgeware.service: Failed with result 'exit-code'.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions