-
Notifications
You must be signed in to change notification settings - Fork 38.7k
ci: use Debian Bookworm and Valgrind 3.19 in Valgrind jobs #27444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: use Debian Bookworm and Valgrind 3.19 in Valgrind jobs #27444
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process. |
|
Not sure. Requiring clang-16, and valgrind to be compiled from source, along with removing support for gcc seems a bit aggressive. |
|
What do you mean by "removing support for GCC"? |
|
You removed the suppression for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65434 without explanation? (Or with the explanation that it is an unused GUI supppression?) |
I removed that suppression as I can't seem to recreate the issue. I assume it's been "fixed" due to using the newer libstdc++ versions shipped with the newer Ubuntu. Note that this was the case with clang-16 & the ubuntu shipped valgrind 3.19 as well. The GUI suppression was removed because it's unused, and any versions would be out of date when changing Ubuntu versions. |
|
In any case, it would be good to provide a reason for the bump. Using non-LTS devel OS releases in CI should be an exception reserved for bugfixes or important features, not the norm. No objection if you want to use bookworm, but for lunar there should be a rationale. |
|
I just tried bookworm and it was sufficient to remove the gcc suppression. https://packages.debian.org/bookworm/gcc |
077cd40 to
75a3bf1
Compare
Ok. I've just moved to I'll split out the compile-from-source change, as I'd still like to improve support for aarch64, and don't think the overhead of compiling Valgrind is much. |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use gcc and drop the dwarf flags, if you want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave this for now, and follow up in the build-from-source PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry. You'll need clang/llvm libfuzzer anyway. Discussion can be closed/resolved.
What is the error message on aarch64? Does it happen with gcc and clang? |
I'll open a separate issue for this. I've opened some PRs in other repos in readiness for this change:
So going to go-ahead and merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to use clang-15 over clang? Absent a reason, this makes it harder to bump or quickly modify the image name tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather be using newer tools/compilers when available, generally less bugs/problems, but also too catch potential issues introduced by newer versions of the tools, i.e bitcoin-core/secp256k1#1257, sooner, rather than later. However if this is going to be too annoying for hotswapping images, I can drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using newer tools/compilers
No strong opinion, but using clang makes that easier, because you can just bump the tag to ubuntu:devel and get clang 16 that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll leave this for now, so we can hotswap easier
Remove no-longer-required libstdc++ suppression. Remove unused (and versioned) GUI suppression.
75a3bf1 to
e047ae8
Compare
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-lgtm. Didn't test, except that the gcc suppression can be dropped on Bookworm and later.
|
Did you test this? CI is red, looking at https://cirrus-ci.com/task/5858910658101248 |
|
Yea, these work for me locally. Isn't that CI expected that to fail, before this was merged, because it'd be using the wrong image to run the job? |
|
Locally it also fails for me: |
|
Can you provide the full config.log? |
No, you didn't change anything in this pull, other than the image tag. So if the CI passed before and after, there is no "wrong image" tag.
Should be the same I linked to above. |
The image tags in the qa-assets repo are now wrong right? They are pointing to the wrong OS, than what we are expecting to run the job on? |
|
I wouldn't call it "wrong", just "mismatch". I think absent any other reason the CI should support the largest range possible of image tags. See #27444 (comment) This means a user/dev can easily force the tag to be a different one to quickly check different package versions. |
Yea I think this is fine, but obviously constrained to where the exact same apt invocations are going to work. i.e you can't necessarily drop in an |
This fixes some cases, i.e under --no-install-recommends, where libclang-rt-dev wouldn't be installed, and configuring would then fail. Followup to bitcoin#27444.
2c60826 ci: explicitly install libclang-rt-dev in valgrind jobs (fanquake) Pull request description: This fixes some cases, i.e under --no-install-recommends, where libclang-rt-dev wouldn't be installed, and configuring would then fail. Followup to #27444. Top commit has no ACKs. Tree-SHA512: d1ab0050731df47c21f6ac4f575a728b045b4617beaa1fa8b878050e07e5ddda18fb7d066c7b32bee5ed0ac0e878958a812d4c6b5bd704612755ccb3c172d7e2
| fun:_dl_init | ||
| obj:*/ld-*.so | ||
| } | ||
| # * aarch64 (Debian Bookworm system libs, clang, without gui) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, this fails with:
test/sighash_tests.cpp(163): Leaving test case "sighash_from_data"; testing time: 3749686us
test/sighash_tests.cpp(120): Entering test case "sighash_test"
==21825== Source and destination overlap in memcpy(0x8704270, 0x8704270, 36)
==21825== at 0x488CFA0: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==21825== by 0x895673: CTxIn::operator=(CTxIn const&) (transaction.h:74)
==21825== by 0x8DD22F: SignatureHashOld(CScript, CTransaction const&, unsigned int, int) (sighash_tests.cpp:76)
==21825== by 0x8DC7E3: sighash_tests::sighash_test::test_method() (sighash_tests.cpp:138)
==21825== by 0x8DC437: sighash_tests::sighash_test_invoker() (sighash_tests.cpp:120)
==21825== by 0x358BBB: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117)
==21825== by 0x24EBE7: boost::function0<void>::operator()() const (function_template.hpp:763)
==21825== by 0x2C9EC7: boost::detail::forward::operator()() (execution_monitor.ipp:1388)
==21825== by 0x2C9AFF: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:137)
==21825== by 0x2C3C13: boost::function0<int>::operator()() const (function_template.hpp:763)
==21825== by 0x2282EB: int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()> >(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) (e
xecution_monitor.ipp:301)
==21825== by 0x1EAAF7: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (execution_monitor.ipp:903)
==21825==
{
<insert_a_suppression_name_here>
Memcheck:Overlap
fun:__GI_memcpy
fun:_ZN5CTxInaSERKS_
fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji
fun:_ZN13sighash_tests12sighash_test11test_methodEv
fun:_ZN13sighash_testsL20sighash_test_invokerEv
fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
fun:_ZNK5boost9function0IvEclEv
fun:_ZN5boost6detail7forwardclEv
fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE
fun:_ZNK5boost9function0IiEclEv
fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_
fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this, and the other failure in the benches you'll see, if you suppress this one (see below), were already known, and the whole reason I wanted to move to 3.20, rather than continuing to fight with broken tools. I'll open the build from source PR.
Running bench/bench_bitcoin (one iteration sanity check, only high priority)...
bench/bench_bitcoin -sanity-check -priority-level=high
Running with -sanity-check option, output is being suppressed as benchmark results will be useless.
==23097== Source and destination overlap in memcpy(0x78796c0, 0x78796c0, 36)
==23097== at 0x488D070: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==23097== by 0x32D48B: std::enable_if<__and_<std::__not_<std::__is_tuple_like<COutPoint> >, std::is_move_constructible<COutPoint>, std::is_move_assignable<COutPoint> >::value, void>::type std::swap<COutPoint>(COutPoint&, COutPoint&) (move.h:205)
==23097== by 0x32D3FB: std::pair<COutPoint, long>::swap(std::pair<COutPoint, long>&) (stl_pair.h:209)
==23097== by 0x30E3B3: std::enable_if<__and_<std::__is_swappable<COutPoint>, std::__is_swappable<long> >::value, void>::type std::swap<COutPoint, long>(std::pair<COutPoint, long>&, std::pair<COutPoint, long>&) (stl_pair.h:709)
==23097== by 0x3078E7: TestChain100Setup::PopulateMempool(FastRandomContext&, unsigned long, bool) (setup_common.cpp:420)
==23097== by 0x24AFF3: MempoolCheck(ankerl::nanobench::Bench&) (mempool_stress.cpp:109)
==23097== by 0x193A13: void std::__invoke_impl<void, void (*&)(ankerl::nanobench::Bench&), ankerl::nanobench::Bench&>(std::__invoke_other, void (*&)(ankerl::nanobench::Bench&), ankerl::nanobench::Bench&) (invoke.h:61)
==23097== by 0x193943: std::enable_if<is_invocable_r_v<void, void (*&)(ankerl::nanobench::Bench&), ankerl::nanobench::Bench&>, void>::type std::__invoke_r<void, void (*&)(ankerl::nanobench::Bench&), ankerl::nanobench::Bench&>(void (*&)(ankerl::nanobench::Bench&), ankerl::nanobench::Bench&) (invoke.h:111)
==23097== by 0x19373F: std::_Function_handler<void (ankerl::nanobench::Bench&), void (*)(ankerl::nanobench::Bench&)>::_M_invoke(std::_Any_data const&, ankerl::nanobench::Bench&) (std_function.h:290)
==23097== by 0x19A5B3: std::function<void (ankerl::nanobench::Bench&)>::operator()(ankerl::nanobench::Bench&) const (std_function.h:591)
==23097== by 0x197727: benchmark::BenchRunner::RunAll(benchmark::Args const&) (bench.cpp:119)
==23097== by 0x1E9C57: main (bench_bitcoin.cpp:132)
==23097==
{
Suppresion which I will produce the valgrind info for.
Memcheck:Overlap
fun:__GI_memcpy
fun:_ZN5CTxInaSERKS_
fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji
fun:_ZN13sighash_tests12sighash_test11test_methodEv
fun:_ZN13sighash_testsL20sighash_test_invokerEv
fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
fun:_ZNK5boost9function0IvEclEv
fun:_ZN5boost6detail7forwardclEv
fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE
fun:_ZNK5boost9function0IiEclEv
fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_
fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like gcc works fine, see also #27444 (comment). Though, this won't help with the fuzz task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fuzz task seems to be passing, see #27364 (comment), so no need to bump valgrind. You can simply use gcc:
diff --git a/ci/test/00_setup_env_native_valgrind.sh b/ci/test/00_setup_env_native_valgrind.sh
index 97b85755e..794a2dcca 100755
--- a/ci/test/00_setup_env_native_valgrind.sh
+++ b/ci/test/00_setup_env_native_valgrind.sh
@@ -8,10 +8,10 @@ export LC_ALL=C.UTF-8
export CI_IMAGE_NAME_TAG="debian:bookworm"
export CONTAINER_NAME=ci_native_valgrind
-export PACKAGES="valgrind clang llvm libclang-rt-dev python3-zmq libevent-dev bsdmainutils libboost-dev libdb5.3++-dev libminiupnpc-dev libnatpmp-dev libzmq3-dev libsqlite3-dev"
+export PACKAGES="valgrind python3-zmq libevent-dev bsdmainutils libboost-dev libdb5.3++-dev libminiupnpc-dev libnatpmp-dev libzmq3-dev libsqlite3-dev"
export USE_VALGRIND=1
export NO_DEPENDS=1
export TEST_RUNNER_EXTRA="--nosandbox --exclude feature_init,rpc_bind,feature_bind_extra" # Excluded for now, see https://github.com/bitcoin/bitcoin/issues/17765#issuecomment-602068547
export GOAL="install"
# Temporarily pin dwarf 4, until using Valgrind 3.20 or later
-export BITCOIN_CONFIG="--enable-zmq --with-incompatible-bdb --with-gui=no CC=clang CXX=clang++ CFLAGS='-gdwarf-4' CXXFLAGS='-gdwarf-4'" # TODO enable GUI
+export BITCOIN_CONFIG="--enable-zmq --with-incompatible-bdb --with-gui=no " # TODO enable GUIThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll still open a PR for discussion, as I don't think our reponse to a tool being broken/having known issues, should be, change compiler, as opposed to just using a non-broken version of the tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building from source.
:(
So the only alternative to building from source would be to use gcc, see #27444 (comment), or is that going to run into #27741 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If building from source works fine, but the debian package fails, then maybe a bug should be reported to debian?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what I don't understand is, why this only happens in the unit tests and bench tests, but none of the other binaries (fuzz tests, or bitcoind).
To reproduce without CI on debian:experimental:
export DEBIAN_FRONTEND=noninteractive && apt update && apt install curl wget htop git vim ccache -y && git clone https://github.com/bitcoin/bitcoin.git --depth=1 ./bitcoin-core && cd bitcoin-core && apt install -t experimental build-essential libtool autotools-dev automake pkg-config bsdmainutils python3-zmq libevent-dev libboost-dev libsqlite3-dev libdb++-dev clang llvm valgrind -y && ./autogen.sh && ./configure --disable-wallet --disable-fuzz-binary CC=clang CXX=clang++ CFLAGS='-gdwarf-4' CXXFLAGS='-gdwarf-4' && make -j $(nproc) && valgrind ./src/test/test_bitcoin -t sighash_tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tracked in https://reviews.llvm.org/D86993
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also be tested with something like this (on clang++ -O0 arm64):
#include <array>
int main() {
using A = std::array<unsigned char, 33>;
A obj{};
A copy{std::move(obj)};
obj = std::move(obj); // valid, but unspecified state
obj = std::move(copy);
}…nd jobs 2c60826 ci: explicitly install libclang-rt-dev in valgrind jobs (fanquake) Pull request description: This fixes some cases, i.e under --no-install-recommends, where libclang-rt-dev wouldn't be installed, and configuring would then fail. Followup to bitcoin#27444. Top commit has no ACKs. Tree-SHA512: d1ab0050731df47c21f6ac4f575a728b045b4617beaa1fa8b878050e07e5ddda18fb7d066c7b32bee5ed0ac0e878958a812d4c6b5bd704612755ccb3c172d7e2
This fixes some cases, i.e under --no-install-recommends, where libclang-rt-dev wouldn't be installed, and configuring would then fail. Followup to bitcoin#27444.
I've been using this branch for some time, for working Valgrind CI jobs on aarch64. Benefits include: * Valgrind CI jobs work across x86_64 & aarch64. * Can use latest (hopefully less buggy) Valgrind, rather than whatever the distro happens to package. * No need to "bless" a specific compiler for use with Valgrind, (current discussion includes switching from Clang to GCC). * Valgrind from source seems to run significantly faster compared to running the system package. i.e, when fuzzing under Valgrind: Master: ```bash asmap_direct with args Done 646 runs in 155 second(s) .... addrman_deserialize with args Done 2944 runs in 2875 second(s) ``` vs running this branch: ```bash asmap_direct with args Done 646 runs in 23 second(s) ... addrman_deserialize with args Done 2944 runs in 413 second(s) ``` This is also being seen in the qa-assets repo: bitcoin-core/qa-assets#136 (comment). For example, the `descriptor_parse` target under Valgrind currently takes: `Done 6304 runs in 12971 second(s)` however [with this branch, it takes](https://cirrus-ci.com/task/4623075795271680?): `Done 6304 runs in 4609 second(s)`. Running the native_valgrind CI (master, aarch64): ```bash test/sighash_tests.cpp(120): Entering test case "sighash_test" ==21957== Source and destination overlap in memcpy(0x871e4b0, 0x871e4b0, 36) ==21957== at 0x488CFA0: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so) ==21957== by 0x8F7F63: CTxIn::operator=(CTxIn const&) (transaction.h:74) ==21957== by 0x93F96B: SignatureHashOld(CScript, CTransaction const&, unsigned int, int) (sighash_tests.cpp:76) ==21957== by 0x93EF1F: sighash_tests::sighash_test::test_method() (sighash_tests.cpp:138) ==21957== by 0x93EB73: sighash_tests::sighash_test_invoker() (sighash_tests.cpp:120) ==21957== by 0x36CF47: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117) ==21957== by 0x25B367: boost::function0<void>::operator()() const (function_template.hpp:763) ==21957== by 0x2D6647: boost::detail::forward::operator()() (execution_monitor.ipp:1388) ==21957== by 0x2D627F: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:137) ==21957== by 0x2D0393: boost::function0<int>::operator()() const (function_template.hpp:763) ==21957== by 0x234A6B: int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()> >(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) (execution_monitor.ipp:301) ==21957== by 0x1F7277: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (execution_monitor.ipp:903) ==21957== { <insert_a_suppression_name_here> Memcheck:Overlap fun:__GI_memcpy fun:_ZN5CTxInaSERKS_ fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji fun:_ZN13sighash_tests12sighash_test11test_methodEv fun:_ZN13sighash_testsL20sighash_test_invokerEv fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE fun:_ZNK5boost9function0IvEclEv fun:_ZN5boost6detail7forwardclEv fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE fun:_ZNK5boost9function0IiEclEv fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_ fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE } ``` vs running this branch: ```bash real 118m55.057s ``` Disadvantages includes: * Becoming slightly more of a package manager in the CI. Related to the discussion in bitcoin#27444.
Switch to using Debian Bookworm and valgrind 3.19 in the Valgrind jobs. Also update the suppressions file.
This originally contained a changed to build valgrind 3.20 from source (for improved aarch64 support), but I'll split that into it's own change.