sched_getaffinity with native tid in std::rt::lang_start_internal, pthread_getattr_np #3626

@sporksmith

Description

In Rust programs running under shadow, the strace log shows calls to sched_getaffinity with the native thread ID instead of the emulated one.

Running a Rust program natively under strace -k to get full stack traces, we see something like:

sched_getaffinity(3305701, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]) = 8
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_getaffinity_np+0x20) [0x95cf0]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_getattr_np+0x134) [0x95f14]
 > /home/jnewsome/projects/shadow/build/src/target/debug/test_determinism(std::rt::lang_start_internal+0x252) [0x26772]
 > /home/jnewsome/projects/shadow/build/src/target/debug/test_determinism(main+0x34) [0x9d14]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x90) [0x29d90]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x29e40]
 > /home/jnewsome/projects/shadow/build/src/target/debug/test_determinism(_start+0x25) [0x8b65]
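
For reference, the syscall at the top of that trace takes an explicit thread ID as its first argument (3305701 above) rather than 0 for "the calling thread". A minimal Python sketch of the two forms, assuming Linux and Python 3.8+. It only illustrates the tid argument and is not what the Rust runtime or glibc actually execute:

```python
import os
import threading

# Kernel (native) thread ID of the calling thread, as gettid() would report it.
tid = threading.get_native_id()

# Query by explicit tid, as in the trace above.
cpus_by_tid = os.sched_getaffinity(tid)

# Query with 0, meaning "the calling thread"; no tid is handed to the kernel.
cpus_by_zero = os.sched_getaffinity(0)

print("tid:", tid)
print("same CPU set either way:", cpus_by_tid == cpus_by_zero)
```

Natively the two forms agree. The explicit form is only correct if the tid passed in is the one the kernel (or, under shadow, the simulator) knows the thread by.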

Earlier in the native strace we can see a call to set_tid_address, which returns the tid:

set_tid_address(0x70451abf0a50)         = 3305701
 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_deallocate_tls+0x6ce) [0x1513e]
 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x3a38) [0x20e08]
 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x727c) [0x2464c]
 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x246c) [0x1f83c]
 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x41c8) [0x21598]
 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2(_dl_catch_error+0x2ec8) [0x20298]

That call doesn't appear at all in the shadow strace log, so I think it happens before shadow gets control.

Looking at the glibc source, I don't see the call to set_tid_address in _dl_deallocate_tls, and I'm not sure why thread local storage would be getting deallocated during early initialization.

Searching glibc for set_tid_address though, it does look like it gets called during TLS initialization in __tls_init_tp, and the returned tid is stashed away in the thread control block.

My best guess is that the pthread internals later see this stashed native thread ID when looking at data about the main thread.
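
To make that guess concrete, here is a toy model in Python. Everything in it is hypothetical: ToySimulator, ThreadControlBlock, and the emulated tid 1000 are made-up stand-ins, not shadow or glibc code. Only 3305701 is taken from the native trace above, as a sample value.

```python
from dataclasses import dataclass

NATIVE_TID = 3305701  # sample value copied from the native trace
EMULATED_TID = 1000   # hypothetical id a simulator might hand out


@dataclass
class ThreadControlBlock:
    """Toy stand-in for the per-thread block where libc stashes the tid."""
    tid: int


class ToySimulator:
    """Hypothetical simulator that only intercepts syscalls once attached."""

    def __init__(self):
        self.attached = False

    def set_tid_address(self):
        # Before attaching, the call reaches the real kernel: native tid.
        return EMULATED_TID if self.attached else NATIVE_TID

    def knows_tid(self, tid):
        # An intercepted sched_getaffinity(tid, ...) would expect the emulated tid.
        return tid == EMULATED_TID


sim = ToySimulator()
tcb = ThreadControlBlock(tid=sim.set_tid_address())  # early TLS init, not intercepted
sim.attached = True                                  # simulator takes control afterwards

leaked = not sim.knows_tid(tcb.tid)      # a later call reuses the stashed native tid
intercepted_tid = sim.set_tid_address()  # a later, intercepted call sees the emulated tid
print("stashed tid:", tcb.tid, "leaked native tid:", leaked)
```

In this model the tid stashed before the simulator attaches stays native, while any set_tid_address call made after attaching returns the emulated one.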

Possible fixes:

  • Get control earlier, before set_tid_address is called. This would probably involve a major architectural change, along the lines of what we'd need to do to work without LD_PRELOAD (Running 100% statically linked executables #1839).
  • Get hold of the thread control block and overwrite the stashed tid. This is potentially less work, but also potentially fragile.
    I don't see any earlier syscalls where the process might have gotten hold of the native thread ID under shadow. There is a set_tid_address call in the shadow log, which returns the tid, but shadow correctly intercepts that one and returns the emulated thread ID.

Labels: Type: Bug (Error or flaw producing unexpected results)
