@vszakats I am not sure how to interpret the thread-sanitizer errors here: Any advice? |
Hm, interesting and somewhat puzzling. It may not be a new problem, but possibly the first time the sanitizer Added in a2bcec0 #14751 Maybe building the whole stack in static mode could make a difference, |
|
Thanks for having a look. Running into a wall trying to find the problem. Will try tomorrow to reproduce locally on my Debian box. |
|
Maybe just rebuilding openssl-tsan with a fresh toolchain may help? Another guess is an Ubuntu package necessary for this, which isn't |
|
@vszakats I reproduced the errors with The runs work when nothing fails, but in case something is detected the analysis runs into this problem. |
I wonder if the Another option may be cmake, which doesn't alter Do you have a diff to make it detect something, to verify? |
Or not build the shared version, which then avoids the libtool wrapper |
the |
|
Agree, not fighting the build tool is better. cmake so far works (and 30s quicker). Trying to trip tsan with: It looks fine now, what do you think?: |
|
nice work, @vszakats ! |
Replace autotools with cmake to avoid libtool wrappers that are changing
`LD_LIBRARY_PATH` in a way incompatible with the thread sanitizer.

To fix the output when the sanitizer is finding something:
```
==51718==WARNING: Can't write to symbolizer at fd 7
/usr/bin/llvm-symbolizer-18: /home/runner/work/curl/curl/bld/lib/.libs/libcurl.so.4: no version information available (required by /usr/bin/llvm-symbolizer-18)
/usr/bin/llvm-symbolizer-18: symbol lookup error: /home/runner/openssl/lib/libcrypto.so.3: undefined symbol: __tsan_func_entry
```
Ref: https://github.com/curl/curl/actions/runs/16911402500/job/47913783729#step:39:4466

After:
```
13:50:04.117885 == Info:ThreadSanitizer: thread T1 finished with ignores enabled, created at:
closing connection #0
    #0 pthread_create <null> (libtests+0x6bc0f) (BuildId: 4fe889446291259934205ac03931c397aa0210d3)
    #1 Curl_thread_create /home/runner/work/curl/curl/lib/curl_threads.c:73:6 (libcurl.so.4+0x55a76) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
    #2 async_thrdd_init /home/runner/work/curl/curl/lib/asyn-thrdd.c:500:26 (libcurl.so.4+0x1c153) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
[...]
```
Ref: https://github.com/curl/curl/actions/runs/16939193922/job/48003405272?pr=18274#step:39:4018

Also:
- disable memory tracker which turned out to be incompatible with the
  thread sanitizer and detaching threads. Ref: #18263 and #curl IRC.
- the job is ~30 seconds faster after this patch.

Reported-by: Stefan Eissing
Bug: #18263 (comment)
Follow-up to a2bcec0 #14751
Closes #18274
Changed strategy to start up and terminate resolver thread.
When starting up:
Start the thread with the mutex acquired, then wait for a signal from
the thread that it has started and has incremented the ref counter.
The thread disables pthread_cancel() before that and only enables
cancelling during resolving itself. This ensures that the ref
counter is correct and that the unlinking of the resolve context
always happens.
When shutting down resolving:
If ref counting shows the thread has finished, join it and free
everything. If the thread has not finished, try pthread_cancel()
(non-Windows), but keep the thread handle around.
When destroying resolving:
Shut down first, then, if the thread is still there and 'quick_exit' is
not set, join it and free everything. This might incur a delay if
getaddrinfo() hangs and cannot be interrupted by pthread_cancel().
Destroying resolving happens when another resolve is started on an
easy handle or when the easy handle is closed.
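
As a rough illustration of the shutdown and destroy steps, here is a POSIX-only sketch. All names (`struct resolver`, `resolver_shutdown()`, `resolver_destroy()`, `demo_destroy()`) are hypothetical, `sleep()` stands in for a hanging lookup, and the reference is counted before the thread starts purely to keep the example short. What happens to a live thread when `quick_exit` is set is not described above, so the sketch leaves that case open.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical resolver state; names are illustrative only. */
struct resolver {
  pthread_mutex_t mutx;
  pthread_t thread;
  bool have_thread; /* handle not joined yet */
  int ref_count;    /* 2 while the thread runs, 1 once it is done */
  bool quick_exit;
};

/* Cleanup handler: runs on normal exit and on cancellation. */
static void thread_unlink(void *arg)
{
  struct resolver *r = arg;
  pthread_mutex_lock(&r->mutx);
  r->ref_count--;
  pthread_mutex_unlock(&r->mutx);
}

/* Stand-in for a lookup that hangs; sleep() is a cancellation point. */
static void *slow_resolver(void *arg)
{
  pthread_cleanup_push(thread_unlink, arg);
  sleep(2);
  pthread_cleanup_pop(1);
  return NULL;
}

/* Shutdown: join if finished, otherwise try to cancel, keep the handle. */
static void resolver_shutdown(struct resolver *r)
{
  bool finished;

  if(!r->have_thread)
    return;
  pthread_mutex_lock(&r->mutx);
  finished = (r->ref_count == 1);
  pthread_mutex_unlock(&r->mutx);
  if(finished) {
    pthread_join(r->thread, NULL);
    r->have_thread = false;
  }
  else
    pthread_cancel(r->thread); /* best effort */
}

/* Destroy: shut down first, then join unless quick_exit is set. */
static void resolver_destroy(struct resolver *r)
{
  resolver_shutdown(r);
  if(r->have_thread && !r->quick_exit) {
    pthread_join(r->thread, NULL); /* may block if cancel had no effect */
    r->have_thread = false;
  }
  /* quick_exit with a live thread: intentionally not handled here */
}

/* Small driver: returns 0 when the thread was joined and unlinked. */
int demo_destroy(void)
{
  struct resolver r;

  pthread_mutex_init(&r.mutx, NULL);
  r.have_thread = false;
  r.quick_exit = false;
  r.ref_count = 2; /* simplification: counted before the thread starts */
  if(pthread_create(&r.thread, NULL, slow_resolver, &r))
    return -1;
  r.have_thread = true;
  resolver_destroy(&r);
  assert(!r.have_thread);
  assert(r.ref_count == 1);
  pthread_mutex_destroy(&r.mutx);
  return 0;
}
```

In this sketch the join returns quickly because `sleep()` honors the cancel request; a lookup that ignores cancellation would make the join wait until the lookup returns, which is the delay mentioned above.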
Add test795 to check that connect timeout triggers correctly
when resolving is delayed. Add debug env var `CURL_DNS_DELAY_MS`
to simulate delays in resolving.
Fix test1557 to set `quick_exit` and use `xxx.invalid` as domain
instead of `nothing`, which was leading to hangs in CI.
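
For readers unfamiliar with this kind of debug knob, a delay variable such as `CURL_DNS_DELAY_MS` could be read along these lines. This is only a sketch and not curl's implementation: `parse_delay_ms()`, `dns_delay_ms()` and `demo_delay()` are hypothetical helpers, and treating invalid input as "no delay" is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not curl's code: parse a millisecond delay.
   Returns 0 for NULL, empty, negative or non-numeric input. */
long parse_delay_ms(const char *val)
{
  char *end;
  long ms;

  if(!val || !*val)
    return 0;
  errno = 0;
  ms = strtol(val, &end, 10);
  if(errno || *end || ms < 0)
    return 0;
  return ms;
}

/* Read the delay from the environment variable named in the text. */
long dns_delay_ms(void)
{
  return parse_delay_ms(getenv("CURL_DNS_DELAY_MS"));
}

/* Self-check for the parser. */
int demo_delay(void)
{
  assert(parse_delay_ms("150") == 150);
  assert(parse_delay_ms("-5") == 0);
  assert(parse_delay_ms("12abc") == 0);
  assert(parse_delay_ms(NULL) == 0);
  return 0;
}
```

A resolver thread in a debug build could sleep for `dns_delay_ms()` milliseconds before the lookup, which gives a test a predictable way to let the connect timeout fire first.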
|
Nice work! |
|
Thanks, it was a bit of a struggle.😌 |
|
CI seems to be bumping into macOS gcc-12 hangs after merge: Also in the AWS-LC job, and the !ssl one seen here earlier. Ref: #18330 (comment) |
|
@vszakats I find it difficult to read what "gcc-12" really installs via homebrew. Also, does it come with its own libc? |
|
checking notes... (_build.sh from curl-for-win) and I think after installing it with |
A wrong type here was seen to manifest in CI failures with gcc-12 on macOS.

Ref: #18348 (comment)
Ref: https://github.com/curl/curl/actions/runs/17153761944/job/48665734013?pr=18349
Follow-up to b63cce7 #18339
Follow-up to 88fc6c4 #18263
Closes #18355
mingw32ce, CM 4.4.0-arm schannel:
```
lib/asyn-thrdd.c: In function 'gethostbyname_thread':
lib/asyn-thrdd.c:349: error: too many arguments to function 'async_thrd_cleanup'
```
Ref: https://github.com/curl/curl/actions/runs/17158865566/job/48682687295?pr=18039#step:9:21

Follow-up to 88fc6c4 #18263
Closes #18371