Add a fallback mechanism for interposing syscalls in ThreadPreload #991
Description
Background/Motivation
Right now ThreadPreload can miss syscalls when the plugin calls a libc interface that we haven't reimplemented in the shim. This is because inside glibc, syscall wrapper functions are called via private symbols. Calls to such symbols are compiled as direct calls that don't go through the PLT. As a result, they can't be interposed via LD_PRELOAD. For more about this issue, see https://www.jimnewsome.net/posts/interposing-internal-libc-calls/.
Our plan has been to reimplement every libc interface that makes syscalls we want to be able to interpose. For some interfaces, this is fairly trivial (e.g. sleep(3), which basically just scales its argument and calls usleep(3)). For others, though, it's more complex. fwrite(3) doesn't just call write(2); it also does buffering in user space. pthread_create(3) doesn't just call clone(2); it does a decent amount of work in user space to set things up properly, including setting up thread-local storage.
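To illustrate the "trivial" end of that spectrum, here is a minimal sketch of a sleep reimplementation. The names `shim_sleep` and `shim_usleep` are hypothetical and not taken from the actual shim, and `shim_usleep` is a stub that only records the requested time so the example stays self-contained.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shim's own interposed usleep. A real shim
 * would hand the request to the simulator; this stub only records it. */
static uint64_t requested_usec_total = 0;

static int shim_usleep(uint64_t usec) {
    requested_usec_total += usec;
    return 0;
}

/* Hypothetical sleep replacement: scale seconds to microseconds and
 * delegate. The 64-bit product cannot overflow for any unsigned int input. */
unsigned int shim_sleep(unsigned int seconds) {
    shim_usleep((uint64_t)seconds * 1000000u);
    return 0; /* sleep(3) returns the number of seconds left unslept */
}
```

Nothing comparable exists for fwrite(3) or pthread_create(3), where the wrapper would have to reproduce glibc's buffering or thread setup rather than forward a scaled argument.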
It'd be preferable if we didn't have to write and maintain all of this functionality that would be mostly duplicative of glibc.
In the short term we're doing initial development and benchmarking using a patched libc that makes its syscalls by calling the syscall(2) function via the PLT; this allows us to interpose everything with our own syscall function in the LD_PRELOADed shim. However, we don't want to write and maintain such glibc patches for every supported platform.
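As a rough sketch of what a single interposed syscall entry point buys us, the dispatcher below emulates one syscall and forwards the rest natively. Everything here is an assumption for illustration: the name `shim_syscall6`, the fixed six-argument signature, and the simulated PID are hypothetical, and the real shim would define `syscall` itself and talk to the simulator instead of returning a constant.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical value that a simulator might report instead of the real PID. */
#define SIMULATED_PID 1000

/* Number of calls that passed through the dispatcher. */
static unsigned long interposed_calls = 0;

/* Hypothetical fixed-arity entry point. With a patched libc, every syscall
 * would funnel through one function like this. */
long shim_syscall6(long number, long a1, long a2, long a3,
                   long a4, long a5, long a6) {
    interposed_calls++;
    switch (number) {
    case SYS_getpid:
        /* Emulated: answered without entering the kernel. */
        return SIMULATED_PID;
    default:
        /* Not emulated in this sketch: forward to the native syscall. */
        return syscall(number, a1, a2, a3, a4, a5, a6);
    }
}
```

For example, `shim_syscall6(SYS_getpid, 0, 0, 0, 0, 0, 0)` returns the simulated PID, while `SYS_getppid` falls through to the kernel.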
Alternatives under consideration
1. ptrace-attach in ThreadPreload as well. On a syscall-stop, service the syscall as ThreadPtrace does now. In this model, parts of the libc interface that we do reimplement would go through the ThreadPreload fast path, and parts that we don't would go through the slower ptrace-path, but would at least be reliably interposed.
2. ptrace-attach in ThreadPreload as well. On a syscall-stop, hot-patch the executable in-memory to call the shim's syscall function instead of doing a direct syscall. This is more complex, but potentially means good performance without reimplementing any of the libc interfaces at all.
3. Patch syscall instructions at load-time, using a tool such as https://github.com/srg-imperial/SaBRe.
We're currently leaning towards first implementing option 1 (reimplementing the hot paths of the libc interface, and using ptrace to handle the other paths), since it's relatively simple and we believe it shouldn't be too difficult to reimplement the "hot path interfaces" for tor.
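The ptrace half of that approach can be sketched as follows. This is a minimal Linux-only illustration, not ThreadPtrace's actual code: the function name `count_syscall_stops` is hypothetical, the tracee is a forked child rather than a plugin process, and the point where the syscall would be serviced is only a counter.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a tracee that makes a raw syscall, and returns the number of
 * syscall-stops the tracer observed, or -1 on error. */
long count_syscall_stops(void) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        /* Tracee: request tracing, then stop so the tracer can set options. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0) _exit(1);
        raise(SIGSTOP);
        syscall(SYS_getppid); /* stands in for a syscall the shim missed */
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status)) return -1;

    /* TRACESYSGOOD marks syscall-stops as SIGTRAP|0x80, which makes them
     * distinguishable from ordinary signal-delivery-stops. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) != 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }

    long stops = 0;
    for (;;) {
        /* Resume until the next syscall entry or exit. Passing 0 as the
         * signal drops any pending signal, which is fine for this sketch. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) != 0) return -1;
        if (waitpid(child, &status, 0) < 0) return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status)) break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            /* A real tracer would read the registers here and service the
             * syscall. This sketch only counts the stop. */
            stops++;
        }
    }
    return stops;
}
```

Each traced syscall produces both an entry stop and an exit stop, which is part of why this path is slower than the preload fast path.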
Rejected alternatives
- Dynamically rewrite the entire binary with a tool like Pin, Valgrind, or DynamoRIO. These would introduce substantial runtime overhead and possibly unacceptably large memory overhead (particularly if every page of every program is rewritten, and those pages don't end up getting shared across multiple instances of the same process). These are also fairly complex dependencies to bring in.
- Tools such as https://dyninst.org/dyninst enable more selective binary rewriting. This could address the performance and memory usage concerns of rewriting the whole binary, but is still a complex dependency to bring in.