Skip to content

fix threaded resolver shutdown#18263

Closed
icing wants to merge 1 commit intocurl:masterfrom
icing:asyn-thrdd-no-join
Closed

fix threaded resolver shutdown#18263
icing wants to merge 1 commit intocurl:masterfrom
icing:asyn-thrdd-no-join

Conversation

@icing
Copy link
Contributor

@icing icing commented Aug 12, 2025

Changed strategy to terminate resolver thread.

When starting up:

Start the thread with mutex acquired, wait for signal from thread that it started and has incremented the ref counter. Thread set pthread_cancel() to disabled before that and only enables cancelling during resolving itself. This assure that the ref counter is correct and the unlinking of the resolve context always happens.

When shutting down resolving:

If ref counting shows thread has finished, join it, free everything. If thread has not finished, try pthread_cancel() (non Windows), but keep the thread handle around.

When destroying resolving:

Shutdown first, then, if the thread is still there and 'quick_exit' is not set, join it and free everything. This might occur a delay if getaddrinfo() hangs and cannot be interrupted by pthread_cancel().

Destroying resolving happens when another resolve is started on an easy handle or when the easy handle is closed.

Add test795 to check that connect timeout triggers correctly when resolving is delayed. Add debug env var CURL_DNS_DELAY_MS to simulate delays in resolving.

@github-actions github-actions bot added the tests label Aug 12, 2025
@icing
Copy link
Contributor Author

icing commented Aug 12, 2025

@vszakats I am not sure how to interpret the thread-sanitizer errors here:

 ==51718==WARNING: Can't write to symbolizer at fd 7
 /usr/bin/llvm-symbolizer-18: /home/runner/work/curl/curl/bld/lib/.libs/libcurl.so.4: no version information available (required by /usr/bin/llvm-symbolizer-18)
 /usr/bin/llvm-symbolizer-18: symbol lookup error: /home/runner/openssl/lib/libcrypto.so.3: undefined symbol: __tsan_func_entry

Any advice?

@vszakats
Copy link
Member

@vszakats I am not sure how to interpret the thread-sanitizer errors here:

 ==51718==WARNING: Can't write to symbolizer at fd 7
 /usr/bin/llvm-symbolizer-18: /home/runner/work/curl/curl/bld/lib/.libs/libcurl.so.4: no version information available (required by /usr/bin/llvm-symbolizer-18)
 /usr/bin/llvm-symbolizer-18: symbol lookup error: /home/runner/openssl/lib/libcrypto.so.3: undefined symbol: __tsan_func_entry

Any advice?

Hm, interesting and somewhat puzzling.

It may not be a new problem, but possibly the first time the sanitizier
is tripping and running into this?

Added in a2bcec0 #14751
Can't see fundamental changes since then, and no obvious suspects
for regressions.

Maybe building the whole stack in static mode could make difference,
but it's just a random shot in the dark.

@icing
Copy link
Contributor Author

icing commented Aug 12, 2025

Thanks for having a look. Running into a wall trying to find the problem. Will try tomorrow to reproduce locally on my debian box.

@vszakats
Copy link
Member

Maybe just rebuilding openssl-tsan with a fresh toolchain may help?
This one is easy to try.

Another guess is an Ubuntu package necessary for this, which isn't
explicitly installed and, got lost when disabling recommended/suggested
packages in b327a53, which
later became the default when swiching to ubuntu-2024. Also easy
to test by switching back to ubuntu-2022.

@icing icing force-pushed the asyn-thrdd-no-join branch from 9e527be to 8ea57ce Compare August 13, 2025 09:31
@curl curl deleted a comment from testclutch Aug 13, 2025
@github-actions github-actions bot added the CI Continuous Integration label Aug 13, 2025
@icing
Copy link
Contributor Author

icing commented Aug 13, 2025

@vszakats I reproduced the errors with /usr/bin/llvm-symbolizer-18 on my debian box. The problem is that libtests (and src/curl) change LD_LIBRARY_PATH and the llvm symbolizer gets confused by that. Not sure what to do about that.

The runs work when nothing fails, but in case something is detected the analysis runs into this problem.

@icing icing requested a review from bagder August 13, 2025 12:52
@vszakats
Copy link
Member

vszakats commented Aug 13, 2025

@vszakats I reproduced the errors with /usr/bin/llvm-symbolizer-18 on my debian box. The problem is that libtests (and src/curl) change LD_LIBRARY_PATH and the llvm symbolizer gets confused by that. Not sure what to do about that.

The runs work when nothing fails, but in case something is detected the analysis runs into this problem.

I wonder if the LD_LIBRARY_PATH change is done by the libtool wrapper
created by autotools? With some shell hackery the real binaries under
.libs could be moved to replace the wrappers, and LD_LIBRARY_PATH
tweaked by hand (if necessary) to make both libcurl and the symbolizer
parts found.

Another option may be cmake, which doesn't alter LD_LIBRARY_PATH.
I'm testing this in #18274.

Do you have a diff to make it detect something, to verify?

@bagder
Copy link
Member

bagder commented Aug 13, 2025

Another option may be cmake, which doesn't alter LD_LIBRARY_PATH.

Or not build the shared version, which then avoids the libtool wrapper

@icing
Copy link
Contributor Author

icing commented Aug 13, 2025

@vszakats I reproduced the errors with /usr/bin/llvm-symbolizer-18 on my debian box. The problem is that libtests (and src/curl) change LD_LIBRARY_PATH and the llvm symbolizer gets confused by that. Not sure what to do about that.
The runs work when nothing fails, but in case something is detected the analysis runs into this problem.

I wonder if the LD_LIBRARY_PATH change is done by the libtool wrapper created by autotools? With some shell hackery the real binaries under .libs could be moved to replace the wrappers, and LD_LIBRARY_PATH tweaked by hand (if necessary) to make both libcurl and the symbolizer parts found.

Another option may be cmake, which doesn't alter LD_LIBRARY_PATH. I'm testing this in #18274.

Do you have a diff to make it detect something, to verify?

the src/.libs/curl and tests/libtest/.libs/libtests are linked against $PREFIX/lib. Calling them directly would result in the wrong libcurl to be linked.

@vszakats
Copy link
Member

Agree, not fighting the build tool is better.

cmake so far works (and 30s quicker). trying to trip tsan with:
8ea57ce

It looks fine now, what do you think?:
https://github.com/curl/curl/actions/runs/16939193922/job/48003405272?pr=18274#step:39:3012

=== Start of file stderr1553
 URL: imap://non-existing-host.haxx.se:46299/1553
 13:49:55.939483 == Info: !!! WARNING !!
 13:49:55.939635 == Info: This is a debug build of libcurl, do not use in production.
 13:49:55.945316 == Info: closing connection #0
 ThreadSanitizer: thread T1  finished with ignores enabled, created at:
     #0 pthread_create <null> (libtests+0x6bc0f) (BuildId: 4fe889446291259934205ac03931c397aa0210d3)
     #1 Curl_thread_create /home/runner/work/curl/curl/lib/curl_threads.c:73:6 (libcurl.so.4+0x55a76) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
     #2 async_thrdd_init /home/runner/work/curl/curl/lib/asyn-thrdd.c:500:26 (libcurl.so.4+0x1c153) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
     #3 Curl_async_getaddrinfo /home/runner/work/curl/curl/lib/asyn-thrdd.c:793:6 (libcurl.so.4+0x1b9b3) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
     #4 Curl_resolv /home/runner/work/curl/curl/lib/hostip.c:932:12 (libcurl.so.4+0x7df0c) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
[...]
   One of the following ignores was not ended (in order of probability)
   Ignore was enabled at:
     #0 getaddrinfo <null> (libtests+0x7440b) (BuildId: 4fe889446291259934205ac03931c397aa0210d3)
     #1 Curl_getaddrinfo_ex /home/runner/work/curl/curl/lib/curl_addrinfo.c:122:11 (libcurl.so.4+0x4c7dc) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
     #2 getaddrinfo_thread /home/runner/work/curl/curl/lib/asyn-thrdd.c:341:8 (libcurl.so.4+0x115242) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
     #3 curl_thread_create_thunk /home/runner/work/curl/curl/lib/curl_threads.c:57:3 (libcurl.so.4+0x55b6c) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
=== End of file stderr1553
test 1557...[Remove easy handle in pending connections doesn't leave dangling entry]

@icing
Copy link
Contributor Author

icing commented Aug 13, 2025

nice work, @vszakats !

vszakats added a commit that referenced this pull request Aug 13, 2025
Replace autotools with cmake to avoid libtool wrappers that are changing
`LD_LIBRARY_PATH` in a way incompatible with the thread sanitizer.

To fix the output when the sanitizier is finding something:
```
==51718==WARNING: Can't write to symbolizer at fd 7
 /usr/bin/llvm-symbolizer-18: /home/runner/work/curl/curl/bld/lib/.libs/libcurl.so.4: no version information available (required by /usr/bin/llvm-symbolizer-18)
 /usr/bin/llvm-symbolizer-18: symbol lookup error: /home/runner/openssl/lib/libcrypto.so.3: undefined symbol: __tsan_func_entry
```
Ref: https://github.com/curl/curl/actions/runs/16911402500/job/47913783729#step:39:4466

After:
```
 13:50:04.117885 == Info:ThreadSanitizer: thread T1  finished with ignores enabled, created at:
  closing connection #0
     #0 pthread_create <null> (libtests+0x6bc0f) (BuildId: 4fe889446291259934205ac03931c397aa0210d3)
     #1 Curl_thread_create /home/runner/work/curl/curl/lib/curl_threads.c:73:6 (libcurl.so.4+0x55a76) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
     #2 async_thrdd_init /home/runner/work/curl/curl/lib/asyn-thrdd.c:500:26 (libcurl.so.4+0x1c153) (BuildId: cb0f14ba2ad68c9cab0c980d9a5d7a53cc0782da)
[...]
```
Ref: https://github.com/curl/curl/actions/runs/16939193922/job/48003405272?pr=18274#step:39:4018

Also:
- disable memory tracker which turned out to be incompatible with
  the thread sanitizer and detaching threads.
  Ref: #18263 and #curl IRC.
- the job is ~30 seconds faster after this patch.

Reported-by: Stefan Eissing
Bug: #18263 (comment)
Follow-up to a2bcec0 #14751
Closes #18274
@icing icing force-pushed the asyn-thrdd-no-join branch from fe14c3f to 5e1beae Compare August 14, 2025 08:21
@bagder
Copy link
Member

bagder commented Aug 18, 2025

This branch has conflicts that must be resolved

@icing icing force-pushed the asyn-thrdd-no-join branch 3 times, most recently from b24fbba to e70aad0 Compare August 19, 2025 13:26
@icing icing marked this pull request as draft August 19, 2025 15:16
@icing icing force-pushed the asyn-thrdd-no-join branch from 4334033 to 9decb0e Compare August 20, 2025 09:07
        Changed strategy to start up and terminate resolver thread.

        When starting up:

        Start the thread with mutex acquired, wait for signal from thread
        that it started and has incremented the ref counter. Thread set
        pthread_cancel() to disabled before that and only enables
        cancelling during resolving itself. This assure that the ref
        counter is correct and the unlinking of the resolve context
        always happens.

        When shutting down resolving:

        If ref counting shows thread has finished, join it, free everything.
        If thread has not finished, try pthread_cancel() (non Windows), but
        keep the thread handle around.

        When destroying resolving:

        Shutdown first, then, if the thread is still there and 'quick_exit' is
        not set, join it and free everything. This might occur a delay if
        getaddrinfo() hangs and cannot be interrupted by pthread_cancel().

        Destroying resolving happens when another resolve is started on an
        easy handle or when the easy handle is closed.

        Add test795 to check that connect timeout triggers correctly
        when resolving is delayed. Add debug env var `CURL_DNS_DELAY_MS`
        to simulate delays in resolving.

        Fix test1557 to set `quick_exit` and use `xxx.invalid` as domain
        instead of `nothing` that was leading to hangers in CI.
@icing icing force-pushed the asyn-thrdd-no-join branch from 8b2d9ac to 3c25322 Compare August 20, 2025 13:50
@icing icing marked this pull request as ready for review August 20, 2025 14:39
@icing icing requested a review from bagder August 20, 2025 14:39
@bagder bagder closed this in 88fc6c4 Aug 21, 2025
@bagder
Copy link
Member

bagder commented Aug 21, 2025

Nice work!

@icing
Copy link
Contributor Author

icing commented Aug 21, 2025

Thanks, it was a bit of a struggle.😌

@vszakats
Copy link
Member

vszakats commented Aug 21, 2025

@icing
Copy link
Contributor Author

icing commented Aug 21, 2025

@vszakats I find it difficult to read what "gcc-12" really installs via homebrew. Also, does it come with its own libc?

@vszakats
Copy link
Member

checking notes... (_build.sh from curl-for-win) and I think /usr/lib/libSystem.B.dylib should be used there as well. Plus there is -lgcc with the intrinsics.

after installing it with brew install gcc@12, and building a short test:

$ gcc-12 test.c -v
Using built-in specs.
COLLECT_GCC=gcc-12
COLLECT_LTO_WRAPPER=/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../libexec/gcc/aarch64-apple-darwin23/12/lto-wrapper
Target: aarch64-apple-darwin23
Configured with: ../configure --prefix=/opt/homebrew/opt/gcc@12 --libdir=/opt/homebrew/opt/gcc@12/lib/gcc/12 --disable-nls --enable-checking=release --with-gcc-major-version-only --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-12 --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl --with-zstd=/opt/homebrew/opt/zstd --with-pkgversion='Homebrew GCC 12.4.0' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --with-system-zlib --build=aarch64-apple-darwin23 --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk --with-ld=/Library/Developer/CommandLineTools/usr/bin/ld-classic
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.4.0 (Homebrew GCC 12.4.0) 
COLLECT_GCC_OPTIONS='-v' '-mmacosx-version-min=15.0.0' '-asm_macosx_version_min=15.0' '-nodefaultexport' '-mcpu=apple-m1' '-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a-'
 /opt/homebrew/Cellar/gcc@12/12.4.0/bin/../libexec/gcc/aarch64-apple-darwin23/12/cc1 -quiet -v -iprefix /opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12/ -D__DYNAMIC__ t.c -fPIC -quiet -dumpdir a- -dumpbase t.c -dumpbase-ext .c -mmacosx-version-min=15.0.0 -mcpu=apple-m1 -mlittle-endian -mabi=lp64 -version -o /var/folders/zm/2t3lpfq88xj1b9006s7hcdyh0000gn/T//ccQmwTCz.s
GNU C17 (Homebrew GCC 12.4.0) version 12.4.0 (aarch64-apple-darwin23)
	compiled by GNU C version 12.4.0, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version isl-0.27-GMP

warning: MPFR header version 4.2.1 differs from library version 4.2.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12/../../../../../../aarch64-apple-darwin23/include"
ignoring duplicate directory "/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/../../../../lib/gcc/12/gcc/aarch64-apple-darwin23/12/include"
ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/usr/local/include"
ignoring duplicate directory "/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/../../../../lib/gcc/12/gcc/aarch64-apple-darwin23/12/include-fixed"
ignoring nonexistent directory "/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/../../../../lib/gcc/12/gcc/aarch64-apple-darwin23/12/../../../../../../aarch64-apple-darwin23/include"
ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/Library/Frameworks"
#include "..." search starts here:
#include <...> search starts here:
 /opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12/include
 /opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12/include-fixed
 /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/usr/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/System/Library/Frameworks
End of search list.
Compiler executable checksum: 6a6f7894960128a07e0fd72128946c71
COLLECT_GCC_OPTIONS='-v' '-mmacosx-version-min=15.0.0'  '-nodefaultexport' '-mcpu=apple-m1' '-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a-'
 as -arch arm64 -v -mmacosx-version-min=15.0 -o /var/folders/zm/2t3lpfq88xj1b9006s7hcdyh0000gn/T//ccvLS2cl.o /var/folders/zm/2t3lpfq88xj1b9006s7hcdyh0000gn/T//ccQmwTCz.s
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
 "/Library/Developer/CommandLineTools/usr/bin/clang" -cc1as -triple arm64-apple-macosx15.0.0 -target-sdk-version=15.5 -filetype obj -main-file-name ccQmwTCz.s -target-cpu apple-m1 -target-feature +zcm -target-feature +zcz -target-feature +v8.5a -target-feature +aes -target-feature +altnzcv -target-feature +ccdp -target-feature +complxnum -target-feature +crc -target-feature +dotprod -target-feature +fp-armv8 -target-feature +fp16fml -target-feature +fptoint -target-feature +fullfp16 -target-feature +jsconv -target-feature +lse -target-feature +neon -target-feature +pauth -target-feature +perfmon -target-feature +predres -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +sb -target-feature +sha2 -target-feature +sha3 -target-feature +specrestrict -target-feature +ssbs -fdebug-compilation-dir=[...] -dwarf-debug-producer "Apple clang version 17.0.0 (clang-1700.0.13.5)" -dwarf-version=5 -mrelocation-model pic -o /var/folders/zm/2t3lpfq88xj1b9006s7hcdyh0000gn/T//ccvLS2cl.o /var/folders/zm/2t3lpfq88xj1b9006s7hcdyh0000gn/T//ccQmwTCz.s
COMPILER_PATH=/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../libexec/gcc/aarch64-apple-darwin23/12/:/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../libexec/gcc/
LIBRARY_PATH=/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12/:/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/:/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12/../../../
COLLECT_GCC_OPTIONS='-v' '-mmacosx-version-min=15.0.0'  '-nodefaultexport' '-mcpu=apple-m1' '-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a.'
 /opt/homebrew/Cellar/gcc@12/12.4.0/bin/../libexec/gcc/aarch64-apple-darwin23/12/collect2 -demangle -syslibroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/ -dynamic -arch arm64 -platform_version macos 15.0.0 0.0 -o a.out -L/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12 -L/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc -L/opt/homebrew/Cellar/gcc@12/12.4.0/bin/../lib/gcc/12/gcc/aarch64-apple-darwin23/12/../../.. -lemutls_w -lheapt_w /var/folders/zm/2t3lpfq88xj1b9006s7hcdyh0000gn/T//ccvLS2cl.o -lgcc -lSystem -no_compact_unwind -rpath @loader_path -rpath /opt/homebrew/Cellar/gcc@12/12.4.0/lib/gcc/12/gcc/aarch64-apple-darwin23/12 -rpath /opt/homebrew/Cellar/gcc@12/12.4.0/lib/gcc/12/gcc -rpath /opt/homebrew/Cellar/gcc@12/12.4.0/lib/gcc/12

$ otool -L a.out 
a.out:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1345.120.2)

vszakats added a commit that referenced this pull request Aug 22, 2025
A wrong type here has seen to manifest in CI failures with gcc-12 macOS.

Ref: #18348 (comment)
Ref: https://github.com/curl/curl/actions/runs/17153761944/job/48665734013?pr=18349

Follow-up to b63cce7 #18339
Follow-up to 88fc6c4 #18263

Closes #18355
vszakats added a commit that referenced this pull request Aug 22, 2025
mingw32ce, CM 4.4.0-arm schannel:
```
lib/asyn-thrdd.c: In function 'gethostbyname_thread':
lib/asyn-thrdd.c:349: error: too many arguments to function 'async_thrd_cleanup'
```
Ref: https://github.com/curl/curl/actions/runs/17158865566/job/48682687295?pr=18039#step:9:21

Follow-up to 88fc6c4 #18263
Closes #18371
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI Continuous Integration tests

Development

Successfully merging this pull request may close these issues.

3 participants