fix race condition between bootstrap pull and push#3721
Merged
dsiganos merged 1 commit intonanocurrency:developfrom Feb 8, 2022
Merged
fix race condition between bootstrap pull and push#3721dsiganos merged 1 commit intonanocurrency:developfrom
dsiganos merged 1 commit intonanocurrency:developfrom
Conversation
The unit test bootstrap_processor.push_one fails intermittently and it exposes a race condition in the bootstrap code of the node. nanocurrency#3533 The race condition is that after finishing with bootstrap frontiers pull, the code is supposed to put the socket back into the idle bootstrap connections queue. However it does that in a racy way. The problem is in the function nano::frontier_req_client::received_frontier. Near the bottom of that function, we should call: connection->connections.pool_connection (connection); before calling promise.set_value (false); Once the promise is given a value then nano::bootstrap_attempt_legacy::run() continues to call push_request in parallel with pool_connection and it is a race condition. Although this is a bug, this is likely not a major nor catastrophic bug because the bootstrap push functionality is only an efficiency. resolves nanocurrency#3533 resolves nanocurrency#3720
clemahieu
approved these changes
Feb 8, 2022
theohax
approved these changes
Feb 8, 2022
Contributor
There was a problem hiding this comment.
2022-02-08T04:32:05.3477625Z + [[ Linux == \L\i\n\u\x ]]
2022-02-08T04:32:05.3477883Z + clang --version
2022-02-08T04:32:05.3478439Z ./ci/build-travis.sh: line 33: clang: command not found
2022-02-08T04:32:05.3478947Z + BACKTRACE=-DNANO_STACKTRACE_BACKTRACE=ON
2022-02-08T04:32:05.3480243Z + cmake '-GUnix Makefiles' -DACTIVE_NETWORK=nano_dev_network -DNANO_TEST=ON -DNANO_GUI=ON -DPORTABLE=1 -DNANO_WARN_TO_ERR=ON -DCMAKE_BUILD_TYPE=Debug -DCMAKE_VERBOSE_MAKEFILE=ON -DBOOST_ROOT=/tmp/boost -DNANO_SHARED_BOOST=ON -DQt5_DIR=/usr/lib/x86_64-linux-gnu/cmake/Qt5 -DCI_TEST=1 -DNANO_STACKTRACE_BACKTRACE=ON ..
2022-02-08T04:32:05.3481095Z fatal: No names found, cannot describe anything.
2022-02-08T04:32:05.3482385Z CMake Warning at flatbuffers/CMake/Version.cmake:22 (message):
2022-02-08T04:32:05.3599182Z git describe failed with exit code: 128
2022-02-08T04:32:05.3599477Z Call Stack (most recent call first):
2022-02-08T04:32:05.3599737Z flatbuffers/CMakeLists.txt:559 (include)
2022-02-08T04:32:05.3600146Z
Probably this makes the CI fail, I'll have a look into it. Fix itself looks good to me (y)
Oh, actually I think this is the culprit:
[ RUN ] block_store.empty_bootstrap
+ core_test_res=124
Looks like a timeout?
Contributor
Author
|
Return value 124 is the timeout code of the command timeout. |
Contributor
Yes, looks like a timeout... probably a buggy test therefore |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The unit test bootstrap_processor.push_one fails intermittently and it
exposes a race condition in the bootstrap code of the node. #3533
The race condition is that after finishing with bootstrap frontiers pull,
the code is supposed to put the socket back into the idle bootstrap
connections queue. However it does that in a racy way. The problem is in
the function nano::frontier_req_client::received_frontier.
Near the bottom of that function, we should call:
connection->connections.pool_connection (connection);
before calling
promise.set_value (false);
Once the promise is given a value then
nano::bootstrap_attempt_legacy::run()
continues to call push_request in parallel with pool_connection and it is
a race condition.
Although this is a bug, this is likely not a major nor catastrophic bug
because the bootstrap push functionality is only an efficiency.
resolves #3533
resolves #3720