Issues with multithreaded code and CPU dispatching. #65
Description
Suppose we are calling base64_encode or base64_decode in a loop (for different inputs) and doing it from multiple threads (for different data).
- If we pass non-zero `flags` to these routines, they repeatedly write to a single global variable in the `codec_choose_forced` function, which leads to "false sharing" and poor scalability.
- There is no method to pre-initialize the choice of codec. (Actually, there is: we can simply call one of the encode/decode routines in advance with empty input, but it looks silly.) If we don't do that and run our code under ThreadSanitizer, it will complain about a data race on the codec function pointers. In fact, it is safe, because it is a single pointer: a single machine word that is (supposedly) placed in an aligned memory location. But we have to annotate it as `_Atomic` and store/load it with `memory_order_relaxed`. See the similar issue here: Make dynamic dispatch free of TSan warnings simdjson/simdjson#256
- Suppose we use these routines in a loop for short inputs. They have a branch to check whether the encoders/decoders were initialized. We want to move these branches out of the loop: check the CPU once and call the specialized implementation directly. But the architecture-specific methods are not exported, so we cannot do that. We also have to pay for two non-inlined function calls.
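
To make the three points above concrete, here is a minimal sketch of what a friendlier dispatch layer could look like. It is not the library's actual code: `encode_fn`, `encode_plain`, `encode_simd`, `detect_encoder`, `resolve_encoder`, and the `FORCE_*` flag values are all hypothetical names invented for illustration, and the two "encoders" just copy bytes. The idea is that a forced choice is returned without touching shared state, the cached default is an `_Atomic` pointer accessed with `memory_order_relaxed`, and the caller gets the function pointer back so it can resolve once outside the loop.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec signature, for illustration only. */
typedef size_t (*encode_fn)(const char *src, size_t len, char *out);

/* Stand-ins for architecture-specific encoders; both simply copy bytes. */
static size_t encode_plain(const char *src, size_t len, char *out)
{
    memcpy(out, src, len);
    return len;
}

static size_t encode_simd(const char *src, size_t len, char *out)
{
    memcpy(out, src, len);
    return len;
}

/* Hypothetical flag values, not the library's real constants. */
#define FORCE_PLAIN 1
#define FORCE_SIMD  2

/* Cached default codec; static storage, so it starts out as NULL. */
static _Atomic(encode_fn) g_encode;

/* Placeholder for a real CPU feature probe (assumption: always "plain"). */
static encode_fn detect_encoder(void)
{
    return encode_plain;
}

/* Returns the codec for the given flags. A forced choice never writes to
 * shared memory, so concurrent callers do not contend on a cache line. */
encode_fn resolve_encoder(int flags)
{
    if (flags & FORCE_SIMD)
        return encode_simd;
    if (flags & FORCE_PLAIN)
        return encode_plain;

    /* Relaxed ordering is enough here on the assumption that every thread
     * would compute and store the same pointer to immutable code. */
    encode_fn fn = atomic_load_explicit(&g_encode, memory_order_relaxed);
    if (fn == NULL) {
        fn = detect_encoder();
        atomic_store_explicit(&g_encode, fn, memory_order_relaxed);
    }
    return fn;
}
```

A caller would then write something like `encode_fn enc = resolve_encoder(0);` once per thread (which also serves as explicit pre-initialization) and call `enc(src, len, out)` inside the hot loop, leaving a single indirect call and no initialization branch per iteration.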
All these issues were found while integrating this library into ClickHouse: ClickHouse/ClickHouse#8397