
Rethinking the main Redis dictionary #10802

@madolson

I want to start a conversation about trying to come up with a more memory-efficient main Redis dictionary. The motivation is primarily that many Redis users have lots of small keys and values, while Redis has ~52 bytes of overhead for every value and ~32 bytes of additional overhead for a TTL. If you have a lot of small keys/values, this overhead will dominate. It's even worse in cluster mode, since there are another 16 bytes of overhead per key.

I'm not necessarily implying I have an answer, but I think we should be investigating this further. If other folks have relevant ideas, maybe we can discuss them here.

Background (Current structure)

Currently keys and values are stored in a simple hash map that uses a linked list to resolve collisions. The key is stored as an SDS string inside the map, and the value is a "redisObject" (robj), a container with some metadata.

Current state

52 bytes:

  • 8-byte dictEntry pointer in the hash table (assuming a 100% load factor on the DB)
  • 24 bytes for the dictEntry (8-byte key pointer, 8-byte next pointer, 8-byte value pointer)
  • 16-byte Redis object (4 bytes for type/encoding/LRU, 8 for the value pointer, 4 for the refcount)
  • 4 bytes of SDS overhead (header of at most 3 bytes + \0; might be a bit less depending on jemalloc arenas)

32 additional bytes for TTL:

  • 8-byte dictEntry pointer in the TTL table (assuming a 100% load factor)
  • 24-byte dictEntry

Additional memory for cluster mode:

  • 16 bytes: next and previous pointers for maintaining the slot-to-keys mapping.

One possible alternative

New State

  • Use an open addressing scheme and embed the value directly into the main dictionary. (Saves 8 bytes: no next pointer. Not strictly required, and it means a good bit of work to figure out how to maintain SCAN/rehashing semantics, but it's doable.)
  • Promote the Redis object to be the dictEntry itself, saving a pointer. (Saves 8 bytes.)
  • Embed the key when possible, which is basically always unless it's really large, and prefetch the value before doing the key comparison. Unlike the embedded-string robj case, every key/value pair has an embeddable string, and most keys are shorter than values. (Our internal estimates say the average key is ~17 bytes, easily embeddable!) (Saves 8 bytes when we can embed the key.)
  • Embed the TTL information into the refcount field. AFAIK we don't have high refcounts, so stealing some bits of TTL precision should be OK. (No memory saving, just moving data around.) Since we are removing the key->TTL dictionary, we still need to track the TTL somewhere.
  • Replace the TTL dictionary with a RAX that maps TTLs -> buckets. A TTL bucket is a fixed number of Redis objects with similar TTLs; buckets are split as they fill up and are marked so they can be dropped together. (Saves about 8 bytes.) The intuition is that most of the time items are added near each other (you're usually incrementing the expire or setting it to a fixed increment in the future), so buckets will typically be mostly full.

In total this saves about 16 bytes, potentially an extra 8 from embedding the key, and the TTL changes save another 8. Since TTLs are now ordered, we can also deterministically and proactively expire items, which should reduce steady-state memory overhead.

Another possible downside is that we lose refcounting for independent objects, which is used very frequently for strings on the argument vector. It might be time to try to be more efficient here anyway and introduce a refcountable string object.

Other ongoing efforts
