keyspace - Unify key and value & use dict no_value=1#13806
Conversation
ShooterIT
left a comment
There was a problem hiding this comment.
I haven’t fully mastered it yet, but it looks interesting. I believe it could improve both memory usage and performance, especially for datasets with expiration times on many keys. Maybe the introduction of this PR may provide us with the opportunity to use other data types to represent the expire key dictionary. Since the kvobject already contains the expiration time, the expire key data structure only needs iteration functionality. For instance, using an rax tree to represent the expire dictionary could save memory, with each element being a combination of [expiretime + kvobject_addr].
@ShooterIT, that's exactly the direction I'm heading! However, the approach you suggested was already evaluated long ago by Salvatore and didn’t prove effective. My goal is to assess the ebuckets data structure for this purpose, as it has demonstrated low memory consumption per key while maintaining decent performance. |
|
Did some basic benchmarking. I run Takeaway
|
This PR is based on: valkey-io/valkey#861 > ### Memory Access Amortization > (Designed and implemented by [dan touitou](https://github.com/touitou-dan)) > > Memory Access Amortization (MAA) is a technique designed to optimize the performance of dynamic data structures by reducing the impact of memory access latency. It is applicable when multiple operations need to be executed concurrently. The principle behind it is that for certain dynamic data structures, executing operations in a batch is more efficient than executing each one separately. > > Rather than executing operations sequentially, this approach interleaves the execution of all operations. This is done in such a way that whenever a memory access is required during an operation, the program prefetches the necessary memory and transitions to another operation. This ensures that when one operation is blocked awaiting memory access, other memory accesses are executed in parallel, thereby reducing the average access latency. > > We applied this method in the development of dictPrefetch, which takes as parameters a vector of keys and dictionaries. It ensures that all memory addresses required to execute dictionary operations for these keys are loaded into the L1-L3 caches when executing commands. Essentially, dictPrefetch is an interleaved execution of dictFind for all the keys. ### Implementation of Redis When the main thread processes clients with ready-to-execute commands (i.e., clients for which the IO thread has parsed the commands), a batch of up to 16 commands is created. Initially, the command's argv, which were allocated by the IO thread, is prefetched to the main thread's L1 cache. Subsequently, all the dict entries and values required for the commands are prefetched from the dictionary before the command execution. #### Memory prefetching for main hash table As shown in the picture, after #13806 , we unify key value and the dict uses no_value optimization, so the memory prefetching has 4 steps: 1. prefetch the bucket of the hash table 2. prefetch the entry associated with the given key's hash 3. prefetch the kv object of the entry 4. prefetch the value data of the kv object we also need to handle the case that the dict entry is the pointer of kv object, just skip step 3. MAA can improves single-threaded memory access efficiency by interleaving the execution of multiple independent operations, allowing memory-level parallelism and better CPU utilization. Its key point is batch-wise interleaved execution. Split a batch of independent operations (such as multiple key lookups) into multiple state machines, and interleave their progress within a single thread to hide the memory access latency of individual requests. The difference between serial execution and interleaved execution: **naive serial execution** ``` key1: step1 → wait → step2 → wait → done key2: step1 → wait → step2 → wait → done ``` **interleaved execution** ``` key1: step1 → step2 → done key2: step1 → step2 → done key3: step1 → step2 → done ↑ While waiting for key1’s memory, progress key2/key3 ``` #### New configuration This PR involves a new configuration `prefetch-batch-max-size`, but we think it is a low level optimization, so we hide this config: When multiple commands are parsed by the I/O threads and ready for execution, we take advantage of knowing the next set of commands and prefetch their required dictionary entries in a batch. This reduces memory access costs. The optimal batch size depends on the specific workflow of the user. The default batch size is 16, which can be modified using the 'prefetch-batch-max-size' config. When the config is set to 0, prefetching is disabled. --------- Co-authored-by: Uri Yagelnik <uriy@amazon.com> Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
…4233) Introduced by #13806 Fixed a crash in the MOVE command when moving hash objects that have both key expiration and field expiration. The issue occurred in the following scenario: 1. A hash has both key expiration and field expiration. 2. During MOVE command, `setExpireByLink()` is called to set the expiration time for the target hash, which may reallocate the kvobj of hash. 3. Since the hash has field expiration, `hashTypeAddToExpires()` is called to update the minimum field expiration time Issue: However, the kvobj pointer wasn't updated with the return value from `setExpireByLink()`, causing `hashTypeAddToExpires()` to use freed memory.
The idea of packing the key (
sds), value (robj) and optionally TTL into a single struct in memory was mentioned a few times in the past by the community in various flavors. This approach improves memory efficiency, reduces pointer dereferences for faster lookups, and simplifies expiration management by keeping all relevant data in one place. This change goes along with setting keyspace's dict to no_value=1, and saving considerable amount of memory.Two more motivations that well aligned with this unification are:
ebucketsdata structure that was introduced as part of Hash Field Expiration feature. Using this data structure requires embedding the ExpireMeta structure within each object.Before this PR, I POC'ed on a variant of open addressing hash-table and was surprised to find that dict with no_value actually could provide a good balance between performance, memory efficiency, and simplicity. This realization prompted the separation of the unification step from the evaluation of a new hash table to avoid introducing too many changes at once and to evaluate its impact independently before considering replacement of existing hash-table. On an earlier commit I extended dict no_value optimization (which saves keeping dictEntry where possible) to be relevant also for objects with even addresses in memory. Combining it with this unification saves a considerable amount of memory for keyspace.
kvobj
This PR adopts Valkey’s packing layout and logic for key, value, and TTL. However, unlike Valkey implementation, which retained a common
robjthroughout the project, this PR distinguishes between the general-purpose, overusedrobj, and the newkvobj, which embeds both the key and value and used by the keyspace. Conceptually,robjserves as a base class, whilekvobjacts as a derived class.Two new flags introduced into redis object,
iskvobjandexpirable:When the
iskvobjflag is set, the object includes also the key and it is appended to the end of the object. If theexpirableflag is set, an additional 8 bytes are added to the object. If the object is of type string, and the string is rather short, then it will be embedded as well.As a result, all keys in the keyspace are promoted to be of type
kvobj. This term attempts to align with the existing Redis object, robj, and the kvstore data structure.EXPIRE Implementation
As
kvobjembeds expiration time as well, looking up expiration times is now an O(1) operation. And the hash-table of EXPIRE is set now to beno_valuemode, directly referencingkvobjentries, and in turn, saves memory.Next, I plan to evaluate replacing the EXPIRE implementation with the ebuckets data structure, which would eliminate keyspace scans for expired keys. This requires embedding
ExpireMetawithin eachkvobjof each key with expiration. In such implementation, theexpirableflag will be shifted to indicate whetherExpireMetais attached.Benchmark
Tested briefly on my laptop. From the limited tests I conducted, there is no change in RPS, but the memory savings is significant for short strings:
Implementation notes
Manipulating keyspace (find, modify, insert)
Initially, unifying the key and value into a single object and storing it in dict with
no_valueoptimization seemed like a quick win. However, it (quickly) became clear that this change required deeper modifications to how keys are manipulated. The challenge was handling cases where a dictEntry is opt-out due to no_value optimization. In such cases, many of the APIs that return the dictEntry from a lookup become insufficient, as it just might be the key itself. To address this issue, a new-old approach of returning a "link" to the looked-up key'sdictEntryinstead of thedictEntryitself. The termlinkwas already somewhat available in dict API, and is well aligned with the new dictEntLink declaration:This PR introduces two new function APIs to dict to leverage returned link from the search:
After calling
link = dictFindLink(...), any necessary updates must be performed immediately after by callingdictSetKeyAtLink()without any intervening operations on given dict. Otherwise,dictEntLinkmay become invalid. Example:dict.h
Hash Field Expiration
Until now each hash object with expiration on fields needed to maintain a reference to its key-name (of the hash object), such that in case it will be active-expired, then it will be possible to resolve the key-name for the notification sake. Now there is no need anymore.