keyspace - Unify key and value & use dict no_value=1 by moticless · Pull Request #13806 · redis/redis

moticless · 2025-02-16T12:39:37Z

The idea of packing the key (sds), value (robj) and optionally TTL into a single struct in memory was mentioned a few times in the past by the community in various flavors. This approach improves memory efficiency, reduces pointer dereferences for faster lookups, and simplifies expiration management by keeping all relevant data in one place. This change goes along with setting keyspace's dict to no_value=1, and saving considerable amount of memory.

Two more motivations that well aligned with this unification are:

Prepare the groundwork for replacing EXPIRE scan based implementation and evaluate instead new ebuckets data structure that was introduced as part of Hash Field Expiration feature. Using this data structure requires embedding the ExpireMeta structure within each object.
Consider replacing dict with a more space efficient open addressing approach hash table that might rely on keeping a single pointer to object.

Before this PR, I POC'ed on a variant of open addressing hash-table and was surprised to find that dict with no_value actually could provide a good balance between performance, memory efficiency, and simplicity. This realization prompted the separation of the unification step from the evaluation of a new hash table to avoid introducing too many changes at once and to evaluate its impact independently before considering replacement of existing hash-table. On an earlier commit I extended dict no_value optimization (which saves keeping dictEntry where possible) to be relevant also for objects with even addresses in memory. Combining it with this unification saves a considerable amount of memory for keyspace.

kvobj

This PR adopts Valkey’s packing layout and logic for key, value, and TTL. However, unlike Valkey implementation, which retained a common robj throughout the project, this PR distinguishes between the general-purpose, overused robj, and the new kvobj, which embeds both the key and value and used by the keyspace. Conceptually, robj serves as a base class, while kvobj acts as a derived class.

Two new flags introduced into redis object, iskvobj and expirable:

struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    unsigned iskvobj : 1;             /* new flag */
    unsigned expirable : 1;           /* new flag */
    unsigned refcount : 30;           /* modified: 32bits->30bits */
    void *ptr;
};

typedef struct redisObject robj;
typedef struct redisObject kvobj;

When the iskvobj flag is set, the object includes also the key and it is appended to the end of the object. If the expirable flag is set, an additional 8 bytes are added to the object. If the object is of type string, and the string is rather short, then it will be embedded as well.

As a result, all keys in the keyspace are promoted to be of type kvobj. This term attempts to align with the existing Redis object, robj, and the kvstore data structure.

EXPIRE Implementation

As kvobj embeds expiration time as well, looking up expiration times is now an O(1) operation. And the hash-table of EXPIRE is set now to be no_value mode, directly referencing kvobj entries, and in turn, saves memory.

Next, I plan to evaluate replacing the EXPIRE implementation with the ebuckets data structure, which would eliminate keyspace scans for expired keys. This requires embedding ExpireMeta within each kvobj of each key with expiration. In such implementation, the expirable flag will be shifted to indicate whether ExpireMeta is attached.

Benchmark

Tested briefly on my laptop. From the limited tests I conducted, there is no change in RPS, but the memory savings is significant for short strings:

NumOfKeys	SET_Ops/sec (unstable)	GET_Ops/sec (unstable)	used_memory_human (unstable)	SET_Ops/sec (unified)	GET_Ops/sec (unified)	used_memory_human (unified)
1,000,000	1,093,691.04	1,483,019.43	77.34M	1,157,560.14	1,448,104.29	59.87M
2,000,000	1,058,650.84	1,440,737.43	161.63M	1,137,901.74	1,415,035.60	119.07M
3,000,000	1,030,358.48	1,436,057.39	253.93M	1,080,256.58	1,380,673.88	183.08M
4,000,000	1,079,633.79	1,422,600.57	330.22M	1,087,372.86	1,362,898.23	237.47M
5,000,000	1,023,470.64	1,401,456.06	438.51M	1,068,068.43	1,272,556.06	312.37M
6,000,000	953,786.65	1,274,957.56	514.81M	1,040,961.13	1,352,156.86	365.47M
7,000,000	949,587.36	1,200,983.37	591.10M	1,029,271.15	1,265,204.82	419.46M
8,000,000	954,916.48	1,228,501.61	667.39M	931,313.21	1,111,849.72	474.25M
9,000,000	917,554.47	1,119,246.37	807.69M	950,059.29	1,019,576.66	571.71M
10,000,000	898,903.31	1,209,029.03	883.98M	908,114.36	1,058,442.09	624.07M

Implementation notes

Manipulating keyspace (find, modify, insert)

Initially, unifying the key and value into a single object and storing it in dict with no_value optimization seemed like a quick win. However, it (quickly) became clear that this change required deeper modifications to how keys are manipulated. The challenge was handling cases where a dictEntry is opt-out due to no_value optimization. In such cases, many of the APIs that return the dictEntry from a lookup become insufficient, as it just might be the key itself. To address this issue, a new-old approach of returning a "link" to the looked-up key's dictEntry instead of the dictEntry itself. The term link was already somewhat available in dict API, and is well aligned with the new dictEntLink declaration:

typedef dictEntry **dictEntLink;

This PR introduces two new function APIs to dict to leverage returned link from the search:

dictEntLink dictFindLink(dict *d, const void *key, dictEntLink *bucket);
void dictSetKeyAtLink(dict *d, void *key, dictEntLink *link, int newItem);

After calling link = dictFindLink(...), any necessary updates must be performed immediately after by calling dictSetKeyAtLink() without any intervening operations on given dict. Otherwise, dictEntLink may become invalid. Example:

/* replace existing key */
link = dictFindLink(d, key, &bucket, 0);
// ... Do something, but don't modify the dict ...
// assert(link != NULL);
dictSetKeyAtLink(d, kv, &link, 0);
     
/* Add new value (If no space for the new key, dict will be expanded and 
   bucket will be looked up again.) */  
link = dictFindLink(d, key, &bucket);
// ... Do something, but don't modify the dict ...
// assert(link == NULL);
dictSetKeyAtLink(d, kv, &bucket, 1);

dict.h

The dict API has became cluttered with many unused functions. I have removed these from dict.h.
Additionally, APIs specifically related to hash maps (no_value=0), primarily those handling key-value access, have been gathered and isolated.
Removed entirely internal functions ending with “*ByHash()” that were originally added for optimization and not required any more.
Few other legacy dict functions were adapted at API level to work with the term dictEntLink as well.
Simplified and generalized an optimization that related to comparison of length of keys of type strings.

Hash Field Expiration

Until now each hash object with expiration on fields needed to maintain a reference to its key-name (of the hash object), such that in case it will be active-expired, then it will be possible to resolve the key-name for the notification sake. Now there is no need anymore.

ShooterIT

I haven’t fully mastered it yet, but it looks interesting. I believe it could improve both memory usage and performance, especially for datasets with expiration times on many keys. Maybe the introduction of this PR may provide us with the opportunity to use other data types to represent the expire key dictionary. Since the kvobject already contains the expiration time, the expire key data structure only needs iteration functionality. For instance, using an rax tree to represent the expire dictionary could save memory, with each element being a combination of [expiretime + kvobject_addr].

moticless · 2025-02-24T23:13:18Z

Since the kvobject already contains the expiration time, the expire key data structure only needs iteration functionality. For instance, using an rax tree to represent the expire dictionary could save memory, with each element being a combination of [expiretime + kvobject_addr].

@ShooterIT, that's exactly the direction I'm heading! However, the approach you suggested was already evaluated long ago by Salvatore and didn’t prove effective. My goal is to assess the ebuckets data structure for this purpose, as it has demonstrated low memory consumption per key while maintaining decent performance.

moticless · 2025-03-10T10:00:01Z

Did some basic benchmarking. I run memtier_benchmark (-c 1 -t 1 --pipeline 1000) of SET and GET with short strings on c6i.2xlarge :

numitems Test        Step Cmd        Ops/sec_unify  Ops/sec_unstable  Ops/sec_diff_%  used_memory_unify  used_memory_unstable  used_memory_diff_%                                                                                                            
100000   SINGLE_HT   1    Hsets         1276959.81        1506863.76           18.00            7086088               6562392               -7.39
                     2    Hgets         1733102.25        1813697.04            4.65            6635864               6636456                0.01
                     3    Hdels         1507386.19        1642548.58            8.97             811888                812464                0.07
         MULTIPLE_HT 1    Hsets          293292.12         294659.30            0.47           15931872              17061008                7.09
                     2    Hgets         1683416.66        1744683.08            3.64           15931872              17061008                7.09
                     3    Hdels         1715589.56        1765754.95            2.92           15931872              17061008                7.09
         STRING      1    Sets          1553735.96        1525483.20           -1.82            6364712               8285848               30.18
                     2    Sets          1688533.17        1780087.94            5.42            6364712               8285848               30.18
                     3    Gets          2177368.43        2182214.95            0.22            6389400               8310536               30.07
                     4    Dels          2149197.27        2174291.18            1.17             886104                886680                0.07
         STRING_EX   1    Sets           829730.92         883946.64            6.53            8742168              11783864               34.79
                     2    Sets           981807.11        1176456.75           19.83            8742168              11783864               34.79
                     3    Dels          1690960.13        1557389.81           -7.90             886200                886776                0.06
         INCR        1    Incrs         1449968.83        1632972.99           12.62            6430776               5959912               -7.32
         LIST        1    Lpushs        1234674.60        1209935.99           -2.00            8055464               9184600               14.02
                     2    Lranges       1727891.63        1766441.15            2.23            8080152               9209288               13.97
         SORTED_SET  1    Zadds          862039.24         895768.39            3.91            8904992              10034128               12.68
                     2    Zranges       1547628.26        1737076.15           12.24            8929680              10058816               12.64
         SET         1    Sadds         1205865.33        1141135.66           -5.37            8154368               9283656               13.85
                     2    Smemberss     1702533.37        1921451.08           12.86            8179056               9308344               13.81
                     3    Srems         1256407.68        1258114.84            0.14            1083760               1084488                0.07
500000   SINGLE_HT   1    Hsets         1080557.74        1253333.87           15.99           29278152              29278896                0.00
                     2    Hgets         1908528.07        1993135.64            4.43           29278152              29278896                0.00
                     3    Hdels         1503772.97        1807671.03           20.21            1083760               1084488                0.07
         MULTIPLE_HT 1    Hsets          308362.74         309097.99            0.24           76113392              81278760                6.79
                     2    Hgets         1876426.08        1785880.12           -4.83           76113392              81278760                6.79
                     3    Hdels         1556018.21        1658495.81            6.59           76113392              81278760                6.79
         STRING      1    Sets          1350234.67        1466882.20            8.64           28121544              37278912               32.56
                     2    Sets          1578920.78        1773848.51           12.35           28121544              37278912               32.56
                     3    Gets          2182872.31        1981783.45           -9.21           28121544              37278912               32.56
                     4    Dels          2041841.41        2052300.83            0.51            1083912               1084640                0.07
         STRING_EX   1    Sets           702617.25         821823.69           16.97           39159272              53473280               36.55
                     2    Sets           934191.79        1128734.42           20.82           39159272              53473280               36.55
                     3    Dels          1575468.07        1479942.34           -6.06            1084008               1084736                0.07
         INCR        1    Incrs         1430828.05        1514114.58            5.82           28113544              25278912              -10.08
         LIST        1    Lpushs        1067229.03        1057636.99           -0.90           36113696              41279064               14.30
                     2    Lranges       1664851.98        1679249.85            0.86           36113696              41279064               14.30
         SORTED_SET  1    Zadds          813232.93         783066.96           -3.71           40113848              45279216               12.88
                     2    Zranges       1623160.55        1614465.61           -0.54           40113848              45279216               12.88
         SET         1    Sadds         1088191.38        1146828.45            5.39           36114000              41279368               14.30
                     2    Smemberss     1898513.08        1935561.29            1.95           36114000              41279368               14.30
                     3    Srems         1230808.62        1215690.18           -1.23            1084368               1085096                0.07
1000000  SINGLE_HT   1    Hsets         1176701.78        1312120.72           11.51           57473064              57473808                0.00
                     2    Hgets         1817497.78        1856034.80            2.12           57473064              57473808                0.00
                     3    Hdels         1469903.00        1683079.50           14.50            1084368               1085096                0.07
         MULTIPLE_HT 1    Hsets          298590.00         299179.50            0.20          151153344             161473672                6.83
                     2    Hgets         1729969.12        1737613.42            0.44          151153344             161473672                6.83

Takeaway

Memory Savings: This PR considerably reduces used_mem by 25-37% for short strings.
Performance Tradeoff: This table also reflects the tradeoff between maintaining simple dict with key and value versus optimized dict with no value. The expected RPS gain from avoiding an extra pointer indirection is offset by additional maintenance overhead in the PR’s packed key-value design.
Room to optimize:
- Packing key and value efficiency and readability.
- Consider replacing the hash table in the future.
- SDS is pricy. Consider use in the future plain string and fix 4 bytes length for keys.

sundb · 2025-05-07T13:31:03Z

fully CI: https://github.com/sundb/redis/actions/runs/14884654571

This PR is based on: valkey-io/valkey#861 > ### Memory Access Amortization > (Designed and implemented by [dan touitou](https://github.com/touitou-dan)) > > Memory Access Amortization (MAA) is a technique designed to optimize the performance of dynamic data structures by reducing the impact of memory access latency. It is applicable when multiple operations need to be executed concurrently. The principle behind it is that for certain dynamic data structures, executing operations in a batch is more efficient than executing each one separately. > > Rather than executing operations sequentially, this approach interleaves the execution of all operations. This is done in such a way that whenever a memory access is required during an operation, the program prefetches the necessary memory and transitions to another operation. This ensures that when one operation is blocked awaiting memory access, other memory accesses are executed in parallel, thereby reducing the average access latency. > > We applied this method in the development of dictPrefetch, which takes as parameters a vector of keys and dictionaries. It ensures that all memory addresses required to execute dictionary operations for these keys are loaded into the L1-L3 caches when executing commands. Essentially, dictPrefetch is an interleaved execution of dictFind for all the keys. ### Implementation of Redis When the main thread processes clients with ready-to-execute commands (i.e., clients for which the IO thread has parsed the commands), a batch of up to 16 commands is created. Initially, the command's argv, which were allocated by the IO thread, is prefetched to the main thread's L1 cache. Subsequently, all the dict entries and values required for the commands are prefetched from the dictionary before the command execution. #### Memory prefetching for main hash table As shown in the picture, after #13806 , we unify key value and the dict uses no_value optimization, so the memory prefetching has 4 steps: 1. prefetch the bucket of the hash table 2. prefetch the entry associated with the given key's hash 3. prefetch the kv object of the entry 4. prefetch the value data of the kv object we also need to handle the case that the dict entry is the pointer of kv object, just skip step 3. MAA can improves single-threaded memory access efficiency by interleaving the execution of multiple independent operations, allowing memory-level parallelism and better CPU utilization. Its key point is batch-wise interleaved execution. Split a batch of independent operations (such as multiple key lookups) into multiple state machines, and interleave their progress within a single thread to hide the memory access latency of individual requests. The difference between serial execution and interleaved execution: **naive serial execution** ``` key1: step1 → wait → step2 → wait → done key2: step1 → wait → step2 → wait → done ``` **interleaved execution** ``` key1: step1 → step2 → done key2: step1 → step2 → done key3: step1 → step2 → done ↑ While waiting for key1’s memory, progress key2/key3 ``` #### New configuration This PR involves a new configuration `prefetch-batch-max-size`, but we think it is a low level optimization, so we hide this config: When multiple commands are parsed by the I/O threads and ready for execution, we take advantage of knowing the next set of commands and prefetch their required dictionary entries in a batch. This reduces memory access costs. The optimal batch size depends on the specific workflow of the user. The default batch size is 16, which can be modified using the 'prefetch-batch-max-size' config. When the config is set to 0, prefetching is disabled. --------- Co-authored-by: Uri Yagelnik <uriy@amazon.com> Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>

…4233) Introduced by #13806 Fixed a crash in the MOVE command when moving hash objects that have both key expiration and field expiration. The issue occurred in the following scenario: 1. A hash has both key expiration and field expiration. 2. During MOVE command, `setExpireByLink()` is called to set the expiration time for the target hash, which may reallocate the kvobj of hash. 3. Since the hash has field expiration, `hashTypeAddToExpires()` is called to update the minimum field expiration time Issue: However, the kvobj pointer wasn't updated with the return value from `setExpireByLink()`, causing `hashTypeAddToExpires()` to use freed memory.

keyspace - Unify Key+Value into one struct and use dict no_value=1

3e667b1

moticless marked this pull request as draft February 17, 2025 13:27

ShooterIT mentioned this pull request Feb 24, 2025

Reduce redundant key slot calculations on expiration checks #13796

Merged

1 task

ShooterIT reviewed Feb 24, 2025

View reviewed changes

Comment thread src/sds.c

Comment thread src/server.c Outdated

Comment thread src/sds.c Outdated

Comment thread src/object.c Outdated

Comment thread src/object.c Outdated

Comment thread src/object.c Outdated

Comment thread src/object.c Outdated

Fixes, arrangement and renaming

6ac1a52

moticless added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Mar 3, 2025

Merge branch 'unstable' into unify-keys-and-vals

e9082d1

redis deleted a comment from fcostaoliveira Mar 5, 2025

moticless added action:run-benchmark Triggers the benchmark suite for this Pull Request and removed action:run-benchmark Triggers the benchmark suite for this Pull Request labels Mar 5, 2025

moticless added 3 commits March 5, 2025 15:58

Refine memefficiency test as kvobj improve defrag (less frag)

12c4694

Revert populate 20000 to 10000

702e8e1

Bring back modified dictFind key length optimization

de8b344

Refactor SDS code

c04b8b1

moticless added action:run-benchmark Triggers the benchmark suite for this Pull Request and removed action:run-benchmark Triggers the benchmark suite for this Pull Request labels Mar 18, 2025

redis deleted a comment from fcostaoliveira Mar 18, 2025

moticless added action:run-benchmark Triggers the benchmark suite for this Pull Request and removed action:run-benchmark Triggers the benchmark suite for this Pull Request labels Mar 18, 2025

redis deleted a comment from fcostaoliveira Mar 18, 2025

moticless added action:run-benchmark Triggers the benchmark suite for this Pull Request and removed action:run-benchmark Triggers the benchmark suite for this Pull Request labels Mar 18, 2025

redis deleted a comment from fcostaoliveira Mar 18, 2025

moticless added action:run-benchmark Triggers the benchmark suite for this Pull Request and removed action:run-benchmark Triggers the benchmark suite for this Pull Request labels Mar 18, 2025

sundb requested changes Mar 21, 2025

View reviewed changes

Comment thread src/sds.h Outdated

Comment thread src/dict.c Outdated

Comment thread src/t_string.c Outdated

Comment thread src/sentinel.c Outdated

Comment thread src/sds.c

Comment thread src/server.h Outdated

PR fixes

23ed15d

Fix test repliation-rdbchannel threshold

e743cc3

ShooterIT reviewed May 6, 2025

View reviewed changes

Comment thread src/dict.h Outdated

Comment thread src/t_string.c

moticless added 2 commits May 6, 2025 18:27

Rename dictEntLink -> dictEntryLink

d2275f8

PR fixes

c9ce257

ShooterIT reviewed May 7, 2025

View reviewed changes

Comment thread src/server.c Outdated

sundb reviewed May 7, 2025

View reviewed changes

Comment thread src/t_string.c Outdated

sundb reviewed May 7, 2025

View reviewed changes

Comment thread src/expire.c Outdated

sundb reviewed May 7, 2025

View reviewed changes

Comment thread src/dict.c Outdated

moticless added 6 commits May 7, 2025 17:07

PR fixes

c607259

remove likely

eb097a9

remove likely

1e2ce01

Merge branch 'unstable' into unify-keys-and-vals

865901f

remove unneeded dummy file

eae8975

Fix info test

b749e84

sundb approved these changes May 9, 2025

View reviewed changes

sundb and others added 3 commits May 9, 2025 14:08

Merge branch 'unstable' into unify-keys-and-vals

05ff973

Fix setting EXPIRE when HFEs exists

2eed8c3

Merge branch 'unstable' into unify-keys-and-vals

4a26ee6

moticless merged commit e1789e4 into redis:unstable May 12, 2025

ShooterIT mentioned this pull request May 14, 2025

Improve multithreaded performance with memory prefetching #14017

Merged

moticless mentioned this pull request May 25, 2025

Avoid accessing the expire dict for keys without expire #9547

Closed

moticless deleted the unify-keys-and-vals branch May 28, 2025 03:43

moticless mentioned this pull request Jun 8, 2025

[DRAFT] EXPIRE Implementation Redesign #14108

Draft

9 tasks

hilikredis mentioned this pull request Jul 21, 2025

[NEW]dynamic expand kvobj to support additional per key metadata and support module interfaces for this. #14211

Open

sundb mentioned this pull request Jul 30, 2025

Fix missing kvobj reassignment after reallocation in MOVE command #14233

Merged

sundb removed the action:run-benchmark Triggers the benchmark suite for this Pull Request label Sep 30, 2025

sundb mentioned this pull request Oct 19, 2025

[QUESTION] Why does a 40-byte string become RAW instead of EMBSTR in Redis 8.2.1? #14438

Open

Uh oh!

Conversation

moticless commented Feb 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

kvobj

EXPIRE Implementation

Benchmark

Implementation notes

Manipulating keyspace (find, modify, insert)

dict.h

Hash Field Expiration

Uh oh!

ShooterIT left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

moticless commented Feb 24, 2025

Uh oh!

moticless commented Mar 10, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sundb commented May 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

moticless commented Feb 16, 2025 •

edited

Loading