Issues with multithreaded code and CPU dispatching.

Suppose we are calling `base64_encode` or `base64_decode` in a loop (for different inputs) and doing it from multiple threads (for different data).

1. If we pass non-zero `flags` to these routines, every call writes to a single global variable in the `codec_choose_forced` function. With many threads doing this repeatedly, the cache line holding that variable bounces between cores ("false sharing"), which leads to poor scalability.

2. There is no dedicated way to pre-initialize the choice of codec. (Actually, there is a workaround: we can call one of the encode/decode routines in advance with empty input, but that looks silly.) If we don't do that and run our code under ThreadSanitizer, it reports a data race on the codec function pointers. In practice this is safe, because each one is a single pointer: one machine word that is (supposedly) placed at an aligned memory location. But to make it formally correct we have to declare it `_Atomic` and store/load it with `memory_order_relaxed`. See the similar issue here: https://github.com/lemire/simdjson/pull/256

3. Suppose we use these routines in a loop over short inputs. Each call contains a branch that checks whether the encoders/decoders have been initialized. We want to move this branch out of the loop: check the CPU once and then call the specialized implementation directly. But the architecture-specific functions are not exported, so we cannot do that. We also have to pay for two non-inlined function calls per input.
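The three points above can be illustrated together with a minimal sketch. It does not reflect the library's actual internals or API: `encode_fn`, `enc_plain`, `enc_fast`, `FORCE_PLAIN`, `FORCE_FAST`, `detect_best`, `codec_init`, `codec_select`, and `encode_many` are all hypothetical names, and the "encoders" only copy bytes so that the example compiles and runs on its own. The idea is that a forced choice is returned by value instead of being written to a global (point 1), the lazily detected pointer is an `_Atomic` accessed with `memory_order_relaxed` and can also be set up front (point 2), and the caller selects the encoder once before the loop (point 3).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoder signature; the real library's differs. */
typedef size_t (*encode_fn)(const char *src, size_t len, char *out);

/* Stand-ins for architecture-specific encoders: they only copy bytes. */
static size_t enc_plain(const char *src, size_t len, char *out) {
    memcpy(out, src, len);
    return len;
}
static size_t enc_fast(const char *src, size_t len, char *out) {
    memcpy(out, src, len);
    return len;
}

#define FORCE_PLAIN 1  /* hypothetical flag values */
#define FORCE_FAST  2

/* Stand-in for real CPU feature detection. */
static encode_fn detect_best(void) { return enc_fast; }

/* Point 2: static storage, so this starts out as a null pointer. */
static _Atomic(encode_fn) g_encoder;

/* Point 2: explicit pre-initialization, callable before threads start. */
void codec_init(void) {
    atomic_store_explicit(&g_encoder, detect_best(), memory_order_relaxed);
}

/* Points 1 and 3: hand the chosen encoder back to the caller.
 * A forced choice is returned by value and never stored globally. */
encode_fn codec_select(int flags) {
    if (flags & FORCE_FAST)  return enc_fast;
    if (flags & FORCE_PLAIN) return enc_plain;

    encode_fn fn = atomic_load_explicit(&g_encoder, memory_order_relaxed);
    if (fn == NULL) {
        /* Racing threads all compute and store the same value. */
        fn = detect_best();
        atomic_store_explicit(&g_encoder, fn, memory_order_relaxed);
    }
    return fn;
}

/* Point 3: dispatch once, then loop without the initialization branch. */
size_t encode_many(const char *const *inputs, const size_t *lens,
                   size_t n, char *out, int flags) {
    encode_fn fn = codec_select(flags);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += fn(inputs[i], lens[i], out + total);
    return total;
}
```

Relaxed ordering is enough in this sketch because nothing else is published through the pointer: the functions it can point to are immutable code. The loop still pays one indirect call per input; removing that as well would require the architecture-specific functions to be exported so the caller can invoke them directly.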

All these issues were found while integrating this library into ClickHouse: https://github.com/ClickHouse/ClickHouse/issues/8397

Issues with multithreaded code and CPU dispatching. #65