sched_getaffinity with native tid in std::rt::lang_start_internal, pthread_getattr_np #3626
Description
In Rust programs run under Shadow, the strace log shows calls to sched_getaffinity with the native thread ID.
Running a Rust program natively under strace -k, which prints a stack trace for each syscall, we see something like:
sched_getaffinity(3305701, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]) = 8
> /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_getaffinity_np+0x20) [0x95cf0]
> /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_getattr_np+0x134) [0x95f14]
> /home/jnewsome/projects/shadow/build/src/target/debug/test_determinism(std::rt::lang_start_internal+0x252) [0x26772]
> /home/jnewsome/projects/shadow/build/src/target/debug/test_determinism(main+0x34) [0x9d14]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x90) [0x29d90]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x29e40]
> /home/jnewsome/projects/shadow/build/src/target/debug/test_determinism(_start+0x25) [0x8b65]
Earlier in the native strace we can see a call to set_tid_address, which returns the tid:
set_tid_address(0x70451abf0a50) = 3305701
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_deallocate_tls+0x6ce) [0x1513e]
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x3a38) [0x20e08]
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x727c) [0x2464c]
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x246c) [0x1f83c]
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x41c8) [0x21598]
> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x2ec8) [0x20298]
That call doesn't appear at all in the shadow strace log, so I think it happens before shadow gets control.
Looking at the glibc source, I don't see the call to set_tid_address in _dl_deallocate_tls, and I'm not sure why thread local storage would be getting deallocated during early initialization.
Searching glibc for set_tid_address though, it does look like it gets called during TLS initialization in __tls_init_tp, and the returned tid is stashed away in the thread control block.
My best guess is that the pthread internals later see this stashed native thread ID when looking at data about the main thread.
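To have a known-good reference point next to the libc-internal call in the trace above, here is a minimal Rust sketch that asks the kernel for the current tid and passes it explicitly to sched_getaffinity. It assumes Linux with glibc 2.30 or newer (which exports gettid); the helper names current_tid and affinity_cpu_count are made up for illustration and are not part of Shadow or the Rust standard library.

```rust
use std::os::raw::c_int;

// Minimal FFI declarations so the sketch needs no external crates.
// Assumption: Linux with glibc 2.30+, which exports a `gettid` wrapper.
extern "C" {
    fn gettid() -> c_int;
    fn sched_getaffinity(pid: c_int, cpusetsize: usize, mask: *mut u64) -> c_int;
}

/// glibc's fixed-size cpu_set_t holds 1024 bits, i.e. 16 u64 words.
const CPU_SET_WORDS: usize = 16;

/// Thread ID of the calling thread, obtained through the gettid syscall wrapper.
pub fn current_tid() -> i32 {
    unsafe { gettid() }
}

/// Calls sched_getaffinity for `tid` (0 means the calling thread) and
/// returns the number of CPUs set in the resulting mask, or None on failure.
pub fn affinity_cpu_count(tid: i32) -> Option<u32> {
    let mut mask = [0u64; CPU_SET_WORDS];
    let size = std::mem::size_of_val(&mask);
    // The glibc wrapper returns 0 on success and -1 on error.
    let rc = unsafe { sched_getaffinity(tid, size, mask.as_mut_ptr()) };
    if rc != 0 {
        return None;
    }
    Some(mask.iter().map(|word| word.count_ones()).sum())
}
```

Calling affinity_cpu_count(current_tid()) from main and tracing the binary should produce a sched_getaffinity entry whose first argument comes from a fresh gettid, which can then be compared against the one issued from pthread_getattr_np. I have not verified how this behaves under Shadow.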
Possible fixes:
- Get control earlier, before set_tid_address is called. This would probably involve a major architectural change, along the lines of what we'd need to do to work without LD_PRELOAD (Running 100% statically linked executables #1839).
- Get hold of the thread control block and overwrite the stashed tid. This is potentially less work, but also potentially fragile.
I don't see any earlier syscalls where it might have gotten hold of the native thread ID under shadow. There's a set_tid_address call, which returns the tid, but shadow correctly intercepts that and returns the emulated thread ID.