Skip to content

Batched MGET/MSET dict prefetch with dictType-driven payload hints#15133

Merged
ShooterIT merged 28 commits into
redis:unstablefrom
filipecosta90:mget-mset-dict-pure
May 11, 2026
Merged

Batched MGET/MSET dict prefetch with dictType-driven payload hints#15133
ShooterIT merged 28 commits into
redis:unstablefrom
filipecosta90:mget-mset-dict-pure

Conversation

@fcostaoliveira

@fcostaoliveira fcostaoliveira commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator

Reduce MGET / MSET latency by overlapping the dict-lookup memory accesses
across the keys of a single multi-key command. Builds on the cross-command
batched prefetch framework introduced in #14017 and the dict-prefetch state
machine in memory_prefetch.c, and lifts the kvobject-aware bits out of the
state machine into two new dictType callbacks so the same machinery can
be reused for other dict-encoded types later (hash hashtable, sets, sorted
sets) without paying for kvobj-specific code paths in the core loop.

Bundles the work originally proposed in #14899 (MGET prefetch framework, by
@mpozniak95) and #15043 (MSET batch prefetch), with several follow-up
commits on top:

  • review-pass simplifications addressing @ShooterIT — drop the dead
    expire-dict prefetch callback, drop the single-key MGET special case,
    route MSET through dictPrefetchKeys so the existing entry's value is
    warmed too;
  • structural refactor of memory_prefetch.c by @ShooterIT (clearer
    types + explicit Init/Reset/Run/Free lifecycle);
  • MGET single-path simplification addressing @ShooterIT's second pass —
    drop MGET_LIGHT_PREFETCH_MAX and the inner bucket-only branch; small
    requests get a single dictPrefetchKeys call covering all keys, larger
    requests batch in chunks via the same call;
  • shared prefetchKeysBatch(c, slot, start, stride) helper by
    @ShooterIT that deduplicates MGET (stride=1) and MSET (stride=2)
    batching paths, with adaptive batch sizing (≥ 2 batches remain → take
    one; else take all remaining in one call) and a stack-allocating
    fixed-size lookup array.

Design

Two new optional callbacks on dictType:

typedef struct dictType {
    ...
    /* Bring the entry's key payload into cache before keyCompare runs.
     * Returns the address to prefetch, or NULL if the entry alone is enough. */
    void *(*prefetchEntryKey)(const dictEntry *de);

    /* Called only after a key match. Returns the value-side payload to
     * prefetch (or NULL). */
    void *(*prefetchEntryValue)(const dictEntry *de);
} dictType;

dbDictType registers both. The kv-aware logic — the dictEntryIsKey()
shortcut for embedded kvobjs, and kv->ptr for OBJ_STRING /
OBJ_ENCODING_RAW values — now lives in two small helpers in server.c:

static void *dbDictPrefetchEntryKey(const dictEntry *de) {
    return dictEntryIsKey(de) ? NULL : dictGetKey(de);
}

static void *dbDictPrefetchEntryValue(const dictEntry *de) {
    kvobj *kv = dictGetKey(de);
    return (kv->type == OBJ_STRING && kv->encoding == OBJ_ENCODING_RAW)
            ? kv->ptr : NULL;
}

The PrefetchGetValueDataFunc typedef and the per-call get_val_data
parameter on dictPrefetchKeys() / dictPrefetch() are removed — the
dict's own type drives both ends. This also removes the foot-gun where
callers (like mgetCommand) had to remember whether to pass
prefetchGetObjectValuePtr or NULL. memory_prefetch.c no longer
references kvobj, kvobjGetKey, or any specific value layout.

State machine

Two file-local types in memory_prefetch.c:

Type Role
dictPrefetchLookup Per-key snapshot of an in-flight, software-pipelined dictFind (mirrors the locals a synchronous dictFind would carry across one bucket walk).
dictPrefetcher Driver that advances a batch of dictPrefetchLookups through the FSM, yielding to the next in-flight lookup each time a prefetch is issued.

Five-stage lifecycle for each lookup, driven by the prefetcher:

                                                           │
                                                         start
                                                           │
                                                  ┌────────▼─────────┐
                                       ┌─────────►│  PREFETCH_BUCKET ├────►────────┐
                                       │          └────────┬─────────┘            no more tables
                                       │             bucket│found                  │
                                       │                   │                       │
        entry not found - goto next table         ┌────────▼────────┐              │
                                       └────◄─────┤ PREFETCH_ENTRY  │              ▼
                                    ┌────────────►└────────┬────────┘              │
                                    │                 entry│found                  │
                                    │                      │                       │
                                    │          ┌───────────▼─────────────┐         │
                                    │          │   PREFETCH_ENTRY_KEY    │ ◄── dictType->prefetchEntryKey(de)
                                    │          └───────────┬─────────────┘         │
                                    │                      │                       │
        key mismatch - goto next entry                     │                       │
                                    │          ┌───────────▼─────────────┐         │
                                    └──────◄───│   PREFETCH_ENTRY_VALUE  │ ◄── keyCompare; on match,
                                               └───────────┬─────────────┘     dictType->prefetchEntryValue(de)
                                                           │                       │
                                                 ┌─────────▼─────────────┐         │
                                                 │     PREFETCH_DONE     │◄────────┘
                                                 └───────────────────────┘

PREFETCH_BUCKET first picks ht_table[0], then flips to ht_table[1]
if the dict is mid-rehash, then transitions to PREFETCH_DONE if no
more tables remain.

memory_prefetch.c exposes a small lifecycle that any caller can drive:

dictPrefetcherInit(p, max_keys);                  /* one-shot heap alloc of lookups[] */
dictPrefetcherReset(p, dicts, keys, nkeys);       /* configure for one batch */
dictPrefetcherRun(p);                             /* drive FSM until all PREFETCH_DONE */
dictPrefetcherFree(p);                            /* release */

Each FSM stage is a named static function (dictPrefetchBucket,
dictPrefetchEntry, dictPrefetchEntryKey, dictPrefetchEntryValue),
so the dictPrefetcherRun driver is a four-line switch over the state.

The state machine is dict-pure: no kvobj field on dictPrefetchLookup,
no kvobjGetKey reach-through. Round-robin advance semantics — a state
only advances the cursor if a prefetch was actually issued — are
preserved, so the embedded-kvobj fast path
(dictEntryIsKey(de) == 1 → callback returns NULL) still skips the
extra prefetch and falls straight into the compare on the next loop
iteration.

The cross-command path (prefetchCommands / PrefetchCommandsBatch)
embeds a dictPrefetcher initialized once at startup and reset per
batch, so cross-command prefetching no longer allocates per call.

Intra-command API

void dictPrefetchKeys(dict **dicts, void **keys, size_t nkeys);

A single multi-key command (e.g. MGET) can prefetch dict data for a
batch of its own keys, reusing the same state machine that the
cross-command path uses. Single-key calls (nkeys <= 1) early-return —
nothing to interleave with. The implementation stack-allocates a
fixed-size lookup array bounded by DICT_PREFETCH_MAX_SIZE = 64 (no
VLA, predictable stack usage), so the intra-command path doesn't touch
the heap.

Notes on the call sites

A shared helper picks the next prefetch batch and warms it via
dictPrefetchKeys:

/* Pick the next prefetch batch starting at argv[start] and warm it via
 * dictPrefetchKeys. 'stride' is 1 for keys-only args (MGET) or 2 for
 * key/value pairs (MSET). Returns the chosen batch size in items. */
static int prefetchKeysBatch(client *c, int slot, int start, int stride);

Adaptive batch sizing inside the helper: if at least two full batches
(PREFETCH_BATCH_SIZE * 2 = 32 items) remain, take one batch
(PREFETCH_BATCH_SIZE = 16); otherwise take all remaining items in one
call. This generalizes the small-request fast path so the trailing
batch of a large request also gets the single-call benefit.

  • MGET (mgetCommand) — gated by
    do_prefetch = server.prefetch_batch_max_size && !already_prefetched && numkeys > 1,
    with already_prefetched = c->current_pending_cmd && (c->current_pending_cmd->flags & PENDING_CMD_KEYS_PREFETCHED).
    When do_prefetch is set, each iteration calls
    prefetchKeysBatch(c, slot, j, 1) and then sequentially
    lookupKeyReads + replies the chosen batch. When do_prefetch is
    clear (cross-command path already warmed the keys, or batch
    prefetching is off), the loop takes all remaining items in one go
    and skips the prefetch.

  • MSET / MSETNX (msetGenericCommand) — same do_prefetch gate as
    MGET with stride = 2. For the NX flag the NX-check loop runs
    lookupKeyWrite (which already warmed everything via
    prefetchKeysBatch); the SET loop then disables further prefetch
    (do_prefetch &&= !nx) so we don't re-prefetch on the second pass.
    Going through the full state machine (rather than bucket-only) means
    dbDictType's prefetchEntryValue callback runs on a key match —
    warming the old kvobj's payload, which setKey -> dbReplaceValue -> updateKeysizesHist(oldlen, newlen) then reads to compute the
    histogram delta. The slot dict is re-fetched per batch — in cluster
    mode the slot dict can be freed mid-MSET (KVSTORE_FREE_EMPTY_DICTS

    • expireIfNeeded), so a cached pointer would otherwise dangle.
  • Cross-command batch path (addCommandToBatch) — sets
    PENDING_CMD_KEYS_PREFETCHED on every command added to the batch,
    even on partial-batch overflow (was: only when ALL keys fit). The
    intra-command path then uniformly skips supplemental prefetching for
    any command the batch touched. Rationale: running both paths
    (cross-command warm + intra-command supplement) caused a measured
    −9.6 % regression on x86 with pipeline-10, and the partial cross-
    command warmup is sufficient for the head of the keyset; the cold
    tail goes through normal lookup, which is still cheaper than running
    the FSM a second time on already-warm keys.

  • Future types: each dict's dictType can register its own
    prefetchEntryKey / prefetchEntryValue (e.g. for the hashtable hash
    encoding, the field-sds and value-sds payloads), without touching
    memory_prefetch.c.

Benchmark validation

Validated on x86-aws-m7i.metal-24xl-2 (Intel Xeon 8488C, Sapphire Rapids,
bare-metal). PR head 792c9ca1c vs unstable 2432f5527.

Test Baseline 2432f5527 (Ops/sec) PR 792c9ca1c (Ops/sec) Δ
5Mkeys-string-mget-100B-2keys 162 657.34 164 968.43 +1.42 %
5Mkeys-string-mget-100B-5keys 146 959.66 150 364.64 (mean of 3 dp) +2.32 %
5Mkeys-string-mget-100B-10keys 125 275.66 133 586.86 +6.63 %
5Mkeys-string-mget-100B-20keys 90 897.79 107 037.88 +17.76 %
5Mkeys-string-mget-100B-30keys 73 006.87 88 796.06 +21.63 %
5Mkeys-string-mget-100B-100keys 31 149.71 42 723.75 +37.16 %
10Mkeys-load-string-mset-10-keys-with-10B-values 131 641.65 134 606.62 +2.25 %
10Mkeys-load-string-mset-10-keys-with-100B-values 127 918.11 132 016.12 +3.20 %
10Mkeys-load-string-mset-50-keys-with-10B-values 65 659.64 67 208.60 +2.36 %
10Mkeys-load-string-mset-50-keys-with-100B-values 60 718.20 63 409.87 +4.43 %

MGET wins scale with batch size — the prefetch hides one DRAM hop per
key, so the longer the batch, the more memory parallelism we recover.
MSET is positive across the board, with the largest test
(50-keys-100B) showing the value-prefetch fix paying off first
(highest cache pressure). ARM (Graviton4) was uniformly flat in earlier
runs against the same unstable head, which matches what the underlying
hardware prefetcher already gives.


Note

Medium Risk
Touches core dict lookup prefetching and rewrites MGET/MSET hot paths; while performance-focused, subtle mistakes could cause correctness regressions (e.g., stale dict pointers, cluster slot handling) or reduced performance under some workloads.

Overview
Adds dictType-configurable prefetch hooks (prefetchEntryKey/prefetchEntryValue) and rewrites the dict prefetch logic into a reusable, kvobj-agnostic state machine that can warm both key and value payloads during lookups.

Introduces a new intra-command API dictPrefetchKeys() and updates MGET/MSET/MSETNX to prefetch their own key batches (with adaptive batching), while coordinating with the existing cross-command prefetcher via a new PENDING_CMD_KEYS_PREFETCHED flag to avoid double-prefetching.

Registers db keyspace-specific prefetch callbacks in dbDictType and adds unit tests covering multi-batch MSET execution and an expiry/batch-boundary regression case.

Reviewed by Cursor Bugbot for commit bc081cb. Bugbot is set up for automated code reviews on this repo. Configure here.

mpozniak95 and others added 17 commits March 30, 2026 21:08
…keys

Add PENDING_CMD_KEYS_PREFETCHED flag to pendingCommand. Set it in
addCommandToBatch() when a command is added to the cross-command
prefetch batch. In mgetCommand, check the flag and fall back to
plain sequential lookups when keys are already warm from the batch.

This eliminates redundant prefetch work that caused regressions
of up to -16% with many I/O threads, while preserving the large
gains (+52-54%) for pipelined workloads where the batch overflows.
The previous guard only skipped intra-command dictPrefetchKeys() when
PENDING_CMD_KEYS_PREFETCHED was set, which requires ALL keys to fit in
the batch. For MGET 100 keys with a batch size of 32, the flag was
never set — both prefetch paths always ran.

On x86 with pipeline-10, this double-prefetching caused -9.6% regression
on mget-100B-100keys-pipeline-10 and -5.6% on setget2000c with 8 IO
threads due to cache-bandwidth contention in ctxPrefetchAndAdvance.

Fix: skip intra-command prefetch when either:
1. PENDING_CMD_KEYS_PREFETCHED is set (all keys fit in batch), OR
2. Pipeline is active (ready_len > 1) — partial batch warmup is
   sufficient, double-prefetching hurts more than it helps.

Single non-pipelined MGET (ready_len <= 1) still uses intra-command
prefetch, preserving the +11% ARM / +25% x86 improvement.
The ready_len > 1 heuristic that skips intra-command dictPrefetchKeys()
assumed cross-command batch prefetching always runs during pipelining.
When prefetch-batch-max-size is 0 (batch disabled), addCommandToBatch()
returns C_ERR immediately and no keys are warmed, but ready_len > 1
still holds for any pipeline — causing MGET to skip its only prefetch
opportunity.

Add server.prefetch_batch_max_size > 0 to the ready_len guard so
intra-command prefetch runs when the batch system is disabled.
The prefetchGetObjectValuePtr callback adds one state-machine iteration
per key to prefetch the first cache line of the value allocation.
For EMBSTR values (<=44B) it returns NULL (value is embedded in the kv
object), making the iteration pure overhead.  For large RAW values
(>=1KiB) it warms <10% of the data while adding measurable cost —
profiling shows 3B samples in ctxPrefetchValdata vs 17.8B total in the
prefetch path.

Pass NULL instead: skip value-data prefetch entirely in the MGET
intra-command path.  The dict bucket/entry/kv-object prefetch provides
the bulk of the improvement; addReplyBulk handles value data access
sequentially.
Three optimizations to the dict prefetch state machine:

1. Replace modulo with branch in ctxPrefetchAndAdvance — avoids
   expensive division on every state transition.

2. Add remaining counter to DictPrefetchCtx — ctxNextInfo returns
   NULL immediately when all keys are done instead of scanning the
   full array. Eliminates O(n) scan per call as keys complete.

3. Fast-path in ctxPrefetchKvobj when no value-data callback and
   entry is a direct key — do key comparison inline and skip the
   VALDATA state entirely, saving one round-trip through the state
   machine per key.
For batches of <=10 keys, replace the full dictPrefetchKeys state
machine with a simple loop that prefetches hash table buckets only.

The state machine has ~4 states per key with function dispatch and
round-robin scanning overhead.  Profiling shows this costs 4.3% of
total CPU for 10-key MGET with 1KiB values — matching the observed
regression exactly.  A single-pass bucket prefetch eliminates the
iteration overhead while still warming the hash chain heads.

Larger batches (>10 keys) continue using the full state machine where
the multi-round interleaving provides net benefit.
Adds intra-command batched prefetch to msetGenericCommand. When
processing multiple keys, dict bucket pointers for the next batch of 16
keys are prefetched into L1 cache before executing setKey for each key.
This hides memory latency from scattered dict lookups in large MSET
commands.

The prefetch is advisory — correctness does not depend on it. If a
rehash occurs mid-batch (triggered by a prior setKey expanding the dict),
the prefetched bucket may be stale. The subsequent lookupKeyWriteWithLink
inside setKey re-derives the correct bucket.

Both the MSETNX existence-check pass and the write pass are batched.
Single-key MSET or empty-dict cases skip prefetch entirely.
Addresses three review threads on PR redis#15043:

1. Stale dict pointer (augmentcode + cursor bugbot, high/medium).
   In cluster mode, server.db[].keys is created with KVSTORE_FREE_EMPTY_DICTS.
   A prior batch's setKey -> expireIfNeeded may delete the last pre-existing
   key in the slot, triggering freeDictIfNeeded; the subsequent dbAddByLink
   then allocates a new dict. The cached dict* from before the batch would be
   a dangling pointer on the next msetPrefetchBatch call. Re-fetch the slot
   dict inside a new msetMaybePrefetchBatch helper that runs per batch.

2. Helper signature (sundb). Change msetPrefetchBatch to take `robj **argv`
   instead of `client *c`, making the contract explicit.

3. Duplication (sundb). Collapse the `if (use_prefetch) { batched } else
   { simple }` branches in both the MSETNX existence-check pass and the
   write pass into a single batched loop that conditionally calls the
   prefetch helper. `use_prefetch` now only gates the prefetch call, not
   the loop structure.

Add regression tests in tests/unit/type/string.tcl:
- MSET spanning multiple prefetch batches (16, 17, 32, 33, 40 keys)
- MSET overwriting expired keys across a batch boundary (exercises the
  expireIfNeeded path in the same code that would UAF under cluster mode)
… callback

Two review comments from Cursor Bugbot on dcb8aa8:

1. Missing `#undef MGET_LIGHT_PREFETCH_MAX` at the end of mgetCommand.
   MGET_BATCH was already undef'd but its sibling was not.

2. The `dictPrefetchKeys()` docstring example passed
   `prefetchGetObjectValuePtr` as the value-data callback, while the
   mgetCommand caller passes NULL on purpose (the callback's extra state
   machine work costs more than warming the value payload saves, since the
   following lookupKeyRead + addReplyBulk touch the payload immediately and
   the hardware prefetcher is already on it). Update the example to show
   the typical NULL form and document when passing a non-NULL callback
   would actually be worthwhile.
Decouple the prefetch state machine from kvobject. The state machine
previously knew about kvobj fields (current_kv, kvobjGetKey) and the
default value-data callback (prefetchGetObjectValuePtr) lived inside
memory_prefetch.c. That coupled a generic dict facility to one
specific dict's stored-key layout.

Two new optional callbacks on dictType:

  void *prefetchEntryKey(const dictEntry *de);
  void *prefetchEntryValue(const dictEntry *de);

The state machine renames the post-entry states to match:

  PREFETCH_KVOBJ   -> PREFETCH_ENTRY_KEY
  PREFETCH_VALDATA -> PREFETCH_ENTRY_VALUE

PREFETCH_ENTRY_KEY asks the dictType to bring the entry's key payload
into cache (returns the address to prefetch, or NULL if the entry
already carries everything keyCompare needs). PREFETCH_ENTRY_VALUE
runs the keyCompare and on a hit asks the dictType to prefetch the
value-side payload.

dbDictType registers both callbacks in server.c:

  - prefetchEntryKey returns dictGetKey(de) only when the kvobj is
    out-of-line (matches today's dictEntryIsKey check).
  - prefetchEntryValue returns kv->ptr for OBJ_STRING/RAW (the old
    prefetchGetObjectValuePtr behaviour, now where it belongs).
  - dbExpiresDictType only sets prefetchEntryKey (no values).

The PrefetchGetValueDataFunc typedef and the trailing get_val_data
parameter are removed from the dictPrefetchKeys() public API and from
the cross-command dictPrefetch() wrapper - the dict's own type now
drives this end-to-end. mgetCommand's call site updates accordingly.
@fcostaoliveira fcostaoliveira changed the title Alternative: dict-pure MGET/MSET prefetch via dictType callbacks Batched MGET/MSET dict prefetch with dictType-driven payload hints Apr 28, 2026
ctxPrefetchEntryValue was passing dictGetKey(entry) directly to
dictCompareKeys. For dbDictType the stored key is a kvobj — keyCompare
(dictSdsCompareKV) expects an sds, so it dereferenced kvobj+offset as if
it were an sds, reading garbage 'flags' and length fields and walking off
the heap.

The previous form of this state (PREFETCH_KVOBJ -> PREFETCH_VALDATA)
extracted the sds via kvobjGetKey before calling keyCompare. The
dict-pure equivalent is dictType->keyFromStoredKey, which dictFindLink
also uses (see dictFindLinkInternal -> dictStoredKey2Key).

Fix: apply keyFromStoredKey to dictGetKey(entry) before passing it to
dictCompareKeys when the dict provides the callback. dbDictType /
dbExpiresDictType set keyFromStoredKey = kvGetKey, so the conversion
yields the sds again; dicts without the callback (sds-keyed dicts) keep
the entry's stored key unchanged.

Reproduces under sanitizer (heap-buffer-overflow on sdslen reading
kvobj-as-sds) and SIGSEGV in dictSdsCompareKV when a hash chain has more
than one entry and the cross-command batch prefetch hits the compare
path. Single-entry chains hit the !dictGetNext shortcut and so escaped
local smoke testing.
@fcostaoliveira fcostaoliveira marked this pull request as ready for review April 29, 2026 08:06
Comment thread src/t_string.c
@augmentcode

augmentcode Bot commented Apr 29, 2026

Copy link
Copy Markdown
🤖 Augment PR Summary

Summary: This PR reduces multi-key lookup latency by overlapping dict-lookup memory accesses using batched prefetching.

Changes:

  • Extend dictType with optional prefetchEntryKey/prefetchEntryValue hooks to describe key/value payload prefetch targets.
  • Refactor memory_prefetch.c’s dict prefetch state machine to be kvobj-agnostic and driven only by dictType callbacks.
  • Add an intra-command API (dictPrefetchKeys()) so single multi-key commands can prefetch their own key batches.
  • Update cross-command batch prefetching to rely on dictType hooks and introduce PENDING_CMD_KEYS_PREFETCHED to skip redundant intra-command prefetch.
  • Rework MGET to process keys in batches (light bucket-only prefetch for small batches; full state machine for larger ones).
  • Rework MSET/MSETNX to prefetch buckets in 16-key batches and re-fetch the slot dict each batch to avoid stale pointers.
  • Add unit tests covering multi-batch MSET behavior and an expiry-related regression case across the batch boundary.

Technical Notes: Prefetching remains advisory and is conditionally disabled in some pipelined scenarios to avoid cache-bandwidth contention.

🤖 Was this summary useful? React with 👍 or 👎

@augmentcode augmentcode Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/t_string.c
/* Determine the dict for the first key's slot (MGET requires all keys
* in the same slot in cluster mode). */
int slot = server.cluster_enabled ? getKeySlot(c->argv[1]->ptr) : 0;
dict *d = kvstoreGetDict(c->db->keys, slot);

@augmentcode augmentcode Bot Apr 29, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/t_string.c:706: mgetCommand caches dict *d once and reuses it across multiple 16-key batches, but in cluster mode lookupKeyRead() can expire/delete the last key in the slot and KVSTORE_FREE_EMPTY_DICTS may free the slot dict, leaving d dangling for the next batch’s prefetch/hash operations. This is the same class of hazard you already call out and avoid in msetMaybePrefetchBatch by re-fetching the dict per batch.

Severity: high

Other Locations
  • src/memory_prefetch.c:433

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Comment thread src/dict.h Outdated
* key payload behind the entry is warm before keyCompare runs), or
* NULL if nothing extra is needed (e.g. the key is co-located with
* the entry).
* prefetchEntryValue: called only after a key match. Returns an

@augmentcode augmentcode Bot Apr 29, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/dict.h:148: The comment says prefetchEntryValue is “called only after a key match”, but memory_prefetch.c can enter the match path without comparing keys for the last entry in a bucket (the “assume hit” fast path). This contract mismatch could lead future dictType implementations to make unsafe assumptions inside prefetchEntryValue.

Severity: medium

Other Locations
  • src/memory_prefetch.c:265

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Comment thread src/memory_prefetch.c Outdated
* (typically ≤ 16–32) to avoid excessive stack usage.
* Zero-initialize so that keys marked PREFETCH_DONE by ctxInit
* (null/empty dict) don't carry indeterminate field values. */
KeyPrefetchInfo pf_info[nkeys];

@augmentcode augmentcode Bot Apr 29, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/memory_prefetch.c:462: dictPrefetchKeys() stack-allocates KeyPrefetchInfo pf_info[nkeys]; if a caller ever passes an unbounded nkeys (e.g. derived directly from argc), this can cause stack exhaustion and crash. Since this is now an exported API in memory_prefetch.h, it feels easy for a future call site to accidentally violate the “keep nkeys bounded” assumption.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Three fixes from the cursor and augmentcode bot reviews on the dict-pure
prefetch branch:

1. mgetCommand: re-fetch the slot dict per batch.

   The dict pointer was fetched once at the top and reused across all
   16-key batches. In cluster mode, db->keys is created with
   KVSTORE_FREE_EMPTY_DICTS, so a prior batch's lookupKeyRead ->
   expireIfNeeded path can empty the slot and free the dict; the next
   batch's dictGetHash(d, ...) would then UAF. Same hazard the MSET
   batch path already guards against via msetMaybePrefetchBatch.

   Mirror that pattern: kvstoreGetDict per batch, with an empty/null
   dict shortcut that still calls lookupKeyRead per key to preserve
   touch / stats / notification side-effects. Cost: one extra
   kvstoreGetDict per 16 keys.

2. dictType.prefetchEntryValue contract: the docstring claimed the
   callback is 'called only after a key match', but the state machine
   has a 'last entry in chain, not rehashing' shortcut that calls the
   callback without comparing keys. Update the doc to spell out the
   presumed-match semantics so future dictType authors don't write
   callbacks that assume a verified equal.

3. dictPrefetchKeys: cap nkeys with serverAssert + named constant
   (DICT_PREFETCH_KEYS_MAX = 64). KeyPrefetchInfo is stack-allocated
   via VLA and must not be sized off an attacker-controlled value;
   in-tree callers batch in groups of 16, the cap leaves headroom
   without risking stack exhaustion.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit d6d9ac9. Configure here.

Comment thread src/t_string.c Outdated
Comment thread src/server.c Outdated
Comment thread src/t_string.c Outdated
Comment thread src/t_string.c Outdated
@fcostaoliveira fcostaoliveira requested a review from ShooterIT May 5, 2026 10:51
@fcostaoliveira

Copy link
Copy Markdown
Collaborator Author

@ShooterIT — your follow-up refactor is in as 792c9ca1c. Net −85 LOC (+67 / −152). Big readability wins:

  • Single shared prefetchKeysBatch(c, slot, start, stride) helper kills the MGET/MSET duplication (stride=1 for MGET, stride=2 for MSET key/value pairs).
  • Adaptive batch sizing (if (batch >= PREFETCH_BATCH_SIZE*2) batch = PREFETCH_BATCH_SIZE; else take all remaining) generalizes the small-request fast path to also cover the trailing batch of large requests.
  • VLA lookups[nkeys] → fixed lookups[DICT_PREFETCH_MAX_SIZE] (no VLA, portable).
  • Constant rename DICT_PREFETCH_KEYS_MAXDICT_PREFETCH_MAX_SIZE (matches prefetch_batch_max_size config).
  • MSETNX SET loop skips prefetch (NX-check loop already warmed everything).
  • addCommandToBatch now sets PENDING_CMD_KEYS_PREFETCHED even on partial batches (was: only when ALL keys fit). This was the one semantic change worth validating since the prior comment about "−9.6 % regression on x86 with pipeline-10 when both paths run" left it ambiguous — turns out suppressing intra-command supplementation on partial batches doesn't regress mget-100B-100keys either (see table).

Validation — x86-aws-m7i.metal-24xl-2

PR 792c9ca1c vs unstable 2432f5527, alongside the previous head a46e18965 for refactor delta:

Test Base 2432f552 Prev a46e1896 PR 792c9ca1 Δ vs base Δ vs prev
5Mkeys-string-mget-100B-2keys 162 657.34 163 601.69 164 968.43 +1.42 % +0.84 %
5Mkeys-string-mget-100B-5keys 146 959.66 153 192.74 150 364.64 (mean of 3 dp: 147 804 / 152 065 / 151 223) −1.85 % −1.85 %
5Mkeys-string-mget-100B-10keys 125 275.66 133 124.41 133 586.86 +6.63 % +0.35 %
5Mkeys-string-mget-100B-20keys 90 897.79 106 756.88 107 037.88 +17.76 % +0.26 %
5Mkeys-string-mget-100B-30keys 73 006.87 87 653.67 88 796.06 +21.63 % +1.30 %
5Mkeys-string-mget-100B-100keys 31 149.71 42 347.75 42 723.75 +37.16 % +0.89 %
10Mkeys-load-string-mset-10-keys-with-10B-values 131 641.65 133 212.63 134 606.62 +2.25 % +1.05 %
10Mkeys-load-string-mset-10-keys-with-100B-values 127 918.11 129 232.20 132 016.12 +3.20 % +2.15 %
10Mkeys-load-string-mset-50-keys-with-10B-values 65 659.64 67 970.65 67 208.60 +2.36 % −1.12 %
10Mkeys-load-string-mset-50-keys-with-100B-values 60 718.20 63 604.58 63 409.87 +4.43 % −0.31 %

Refactor is performance-neutral vs the prev head (8 of 10 within ±1.5 %, two within ±2.2 %). All headline wins vs unstable preserved — mget-100B-100keys actually nudged up from +35.95 % → +37.16 %. The partial-batch PENDING_CMD_KEYS_PREFETCHED change you made in addCommandToBatch was the only behavioural concern worth re-checking, and it's clean.

@fcostaoliveira

Copy link
Copy Markdown
Collaborator Author

Two follow-up validation rounds on x86-aws-m7i.metal-24xl-2 against the same unstable baseline (now 7ecc04f59d — the current head) that surface deltas under topologies the prior table didn't cover.

oss-standalone-08-io-threads — io-threads safety check

The change in addCommandToBatch (commit 792c9ca1c) sets PENDING_CMD_KEYS_PREFETCHED even on partial cross-command batches. Worth confirming this doesn't regress io-threads workloads where multi-key commands could overflow the batch and lose the supplemental intra-command prefetch.

Test Baseline 7ecc04f59d (Ops/sec) PR 792c9ca1c (Ops/sec) Δ
5Mkeys-string-mget-100B-100keys-pipeline-10 41 218.99 60 581.10 +46.97 %
5Mkeys-string-mget-100B-30keys-pipeline-10 148 639.87 212 409.44 +42.90 %
1Mkeys-string-setget200c-512B-pipeline-10 2 414 365.79 2 594 377.61 +7.46 %
1Mkeys-string-set-get-short-expiration-512B 1 272 013.21 1 342 755.00 +5.56 %
3Mkeys-string-mixed-20-80-with-512B-values-pipeline-10-2000_conns 2 423 562.75 2 515 002.31 +3.77 %

The cold-tail concern was unfounded — eliminating supplemental intra-command prefetching on commands the batch already touched removes redundant cache-bandwidth contention. Multi-key MGET wins are larger with io-threads (+47 % / +43 %) than the no-iothreads version (+37 % on mget-100B-100keys at oss-standalone).

oss-standalone pipeline-10 — non-iothreads pipelined

Test Baseline 7ecc04f59d (Ops/sec) PR 792c9ca1c (Ops/sec) Δ
5Mkeys-string-mget-100B-2keys-pipeline-10 870 056.10 885 536.36 +1.78 %
5Mkeys-string-mget-100B-5keys-pipeline-10 460 668.12 500 265.21 +8.60 %
5Mkeys-string-mget-100B-30keys-pipeline-10 102 827.68 131 288.11 +27.68 %
5Mkeys-string-mget-100B-100keys-pipeline-10 56 911.76 59 489.88 +4.53 %

Coverage matrix (PR 792c9ca1c vs 7ecc04f59d, m7i-2)

Workload non-pipeline (oss-standalone) pipeline-10 (oss-standalone) pipeline-10 (oss-standalone-08-io-threads)
mget-100B-2keys +1.42 % +1.78 %
mget-100B-5keys +0.57 % (mean of 3 dp) +8.60 %
mget-100B-10keys +6.63 %
mget-100B-20keys +17.76 %
mget-100B-30keys +21.63 % +27.68 % +42.90 %
mget-100B-100keys +37.16 % +4.53 % +46.97 %
set-get-short-expiration-512B +5.56 %
setget200c-512B-pipeline-10 +7.46 %
mixed-20-80-512B-pipeline-10-2000_conns +3.77 %

No regression on any shape. The mget-100B-100keys-pipeline-10 +4.53 % being smaller than its non-pipeline +37.16 % is the expected interaction — at 100 keys the cross-command batch (~32 keys per prefetch_batch_max_size) already warms the head of the keyset before the intra-command path runs, leaving less DRAM latency to hide. Add io-threads (parallel reply chain + cross-command batch) and the wins compound to +47 %.

@ShooterIT ShooterIT left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally LGTM, small comments

Comment thread src/t_string.c
addReplyBulkCBuffer(c,(char*)str+start,end-start+1);
}
}
/* Batch size for intra-command key prefetching. */

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Batch size for intra-command key prefetching. */
/* Batch size for intra-command key prefetching. */

Comment on lines +277 to +282
# Regression test for dict-pointer staleness across batches
# (see src/t_string.c:msetMaybePrefetchBatch). When lookupKeyWrite in
# batch 1 expires a pre-existing key, under cluster mode the slot
# dict may be freed (KVSTORE_FREE_EMPTY_DICTS) and recreated
# mid-command; msetGenericCommand must re-fetch the slot dict per
# batch. This test exercises the same code path in standalone mode.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update comments or tests

Comment thread src/server.c Outdated
.dictMetadataBytes = setDictMetadataBytes,
};

/* dbDictType prefetch callbacks --------------------------------------------

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

------------------------ is not align with other function style, maybe remove them, or refine it

Suggested change
/* dbDictType prefetch callbacks --------------------------------------------
/* dbDictType prefetch callbacks.

btw, i think we should move them at the front of dictType, maybe L572

Comment thread src/t_string.c Outdated
addReplyArrayLen(c, numkeys);

/* MGET requires all keys in the same slot in cluster mode. */
int slot = server.cluster_enabled ? getKeySlot(c->argv[1]->ptr) : 0;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about use c->current_pending_cmd->slot, if it's not INVALID_CLUSTER_SLOT, we can use it directly.

- src/server.c: drop the '----' separator from the dbDictType prefetch
  callbacks comment block and move the two callbacks ahead of the
  dictType definitions section, near the related dictResizeAllowed
  helper, instead of sitting between setDictType and dbDictType.
- src/t_string.c: blank-line separation between getrangeCommand and the
  PREFETCH_BATCH_SIZE define for readability.
- tests/unit/type/string.tcl: update stale function reference in the
  'MSET overwrites expired keys across batch boundary' regression
  comment from msetMaybePrefetchBatch to the post-refactor
  prefetchKeysBatch helper.
@fcostaoliveira

Copy link
Copy Markdown
Collaborator Author

Thanks @ShooterIT — addressed all three in 517cb9b:

  1. src/t_string.c:685 — added the blank-line separation before the PREFETCH_BATCH_SIZE define.
  2. src/server.c:624 — dropped the ------ separator and moved the two prefetch callbacks ahead of the dictType section (right after dictResizeAllowed, before objectKeyPointerValueDictType), per your L572 suggestion.
  3. tests/unit/type/string.tcl:282 — updated the stale msetMaybePrefetchBatch reference to prefetchKeysBatch (post-refactor name).

Local runtest --single unit/type/string green, including the MSET overwrites expired keys across batch boundary regression test.

Avoid recomputing the cluster slot via getKeySlot() (CRC16 over the
first key) in mgetCommand and msetGenericCommand when the cross-command
batching path has already populated c->current_pending_cmd->slot.
Fall back to getKeySlot() when current_pending_cmd is NULL or the
cached slot is INVALID_CLUSTER_SLOT (cross-slot or no slot set).
Adds the cluster.h include for INVALID_CLUSTER_SLOT visibility.
@fcostaoliveira

Copy link
Copy Markdown
Collaborator Author

Thanks @sundb — good catch, addressed in 3d724a05c. Both mgetCommand and msetGenericCommand now reuse c->current_pending_cmd->slot when it's already populated by the cross-command batching path, falling back to getKeySlot() only when current_pending_cmd is NULL or the cached slot is INVALID_CLUSTER_SLOT (no-slot / cross-slot error case).

int slot = 0;
if (server.cluster_enabled) {
    slot = (c->current_pending_cmd &&
            c->current_pending_cmd->slot != INVALID_CLUSTER_SLOT)
        ? c->current_pending_cmd->slot
        : getKeySlot(c->argv[1]->ptr);
}

Local runtest --single unit/type/string green.

@fcostaoliveira fcostaoliveira requested review from ShooterIT and sundb May 8, 2026 11:23
@fcostaoliveira

fcostaoliveira commented May 8, 2026

Copy link
Copy Markdown
Collaborator Author

Latest commit 3d724a05c (sundb slot-reuse) — performance reconfirmed across multiple x86 + ARM runners.

Triggered a fresh round comparing PR head 3d724a05c against current unstable head 7cf63635f0 (which now includes the just-merged #15071 jemalloc-size-hint patch).

Throughput — m7i-2 (Sapphire Rapids x86, multi-dp, σ-disjoint where ≥ 2 dp/cell)

Test Topology unstable mean (n) PR mean (n) Δ
5Mkeys-string-mget-10B-100keys-pipeline-10 io-threads-8 26,304 (n=1) 49,831 (n=2) +89.44 %
5Mkeys-string-mget-10B-100keys-pipeline-10 standalone 22,953 (n=1) 42,053 (n=2) +83.21 %
5Mkeys-string-mget-10B-100keys standalone 22,136 (n=1) 39,472 (n=2) +78.31 %
5Mkeys-string-mget-100B-30keys-pipeline-10 io-threads-8 148,763 (n=2) 209,899 (n=2) +41.10 % σ↑
5Mkeys-string-mget-100B-100keys-pipeline-10 io-threads-8 43,505 (n=2) 60,563 (n=3) +39.21 % σ↑
5Mkeys-string-mget-100B-100keys standalone 31,084 (n=2) 42,688 (n=3) +37.33 % σ↑
10Mkeys-string-mget-80keys-100B-multitenant-tagged standalone 30,108 (n=2) 40,811 (n=3) +35.55 % σ↑
5Mkeys-string-mget-100B-30keys-pipeline-10 standalone 105,791 (n=2) 129,903 (n=2) +22.79 % σ↑
5Mkeys-string-mget-100B-30keys standalone 72,249 (n=2) 88,433 (n=2) +22.40 % σ↑
5Mkeys-string-mget-100B-20keys standalone 91,047 (n=2) 106,422 (n=2) +16.89 % σ↑
5Mkeys-string-mget-512B-5keys-pipeline-10 io-threads-8 815,187 (n=1) 935,741 (n=2) +14.79 %
5Mkeys-string-mget-100B-5keys-pipeline-10 standalone 464,503 (n=2) 503,364 (n=2) +8.37 % σ↑
5Mkeys-string-mget-100B-5keys-pipeline-10 io-threads-8 902,737 (n=2) 977,641 (n=2) +8.30 % σ↑
5Mkeys-string-mget-512B-5keys-pipeline-10 standalone 314,000 (n=1) 333,018 (n=2) +6.06 %
5Mkeys-string-mget-1KiB-5keys standalone 118,086 (n=1) 124,493 (n=2) +5.43 %
5Mkeys-string-mget-100B-100keys-pipeline-10 standalone 56,483 (n=2) 59,229 (n=3) +4.86 % σ↑
5Mkeys-string-mget-10B-100keys io-threads-8 30,950 (n=1) 32,049 (n=2) +3.55 %
5Mkeys-string-mget-100B-5keys standalone 145,255 (n=2) 150,094 (n=2) +3.33 % σ↑
5Mkeys-string-mget-512B-5keys standalone 132,357 (n=1) 136,500 (n=2) +3.13 %
10Mkeys-load-string-mset-10-keys-with-100B-values standalone 129,437 (n=2) 131,074 (n=3) +1.26 % σ↑
5Mkeys-string-mget-100B-2keys-pipeline-10 standalone 886,782 (n=2) 891,897 (n=2) +0.58 %
5Mkeys-string-mget-1KiB-5keys-pipeline-10 standalone 123,797 (n=1) 123,513 (n=2) -0.23 %
5Mkeys-string-mget-100B-2keys-pipeline-10 io-threads-8 1,610,943 (n=2) 1,578,659 (n=2) -2.00 %
5Mkeys-string-mget-100B-2keys standalone 168,012 (n=2) 164,302 (n=2) -2.21 % σ↓
5Mkeys-string-mget-1KiB-5keys-pipeline-10 io-threads-8 477,506 (n=1) 415,693 (n=2) -12.94 % (*)

σ↑ / σ↓ = σ-disjoint (PR min > BL max or vice versa) when both sides have ≥ 2 dp.

(*) The −12.94 % on mget-1KiB-5keys-pipeline-10 / io-threads-8 is the same outlier flagged on #14907 (also σ-confirmed there at −12.05 %). The test's YAML spec is missing the --pipeline 10 client argument that its 100B and 512B siblings have, so it actually runs unpipelined despite the name. Both PR and baseline run the same broken spec, so the delta is real, but the magnitude doesn't match the prefetch mechanism (its 5-key siblings at 100B and 512B are +1 to +15 %). Likely a spec hygiene issue separate from this PR — happy to fix the spec under a separate change.

Throughput — m8g-2 (Graviton4 ARM, multi-dp, σ-disjoint where ≥ 2 dp/cell)

Test Topology unstable mean (n) PR mean (n) Δ
5Mkeys-string-mget-10B-100keys-pipeline-10 io-threads-8 3,059 (n=1) 6,986 (n=2) +128.34 %
5Mkeys-string-mget-100B-100keys-pipeline-10 io-threads-8 6,261 (n=2) 9,188 (n=3) +46.76 % σ↑
5Mkeys-string-mget-10B-100keys standalone 21,106 (n=1) 29,716 (n=2) +40.79 %
5Mkeys-string-mget-10B-100keys-pipeline-10 standalone 22,608 (n=1) 31,728 (n=2) +40.34 %
5Mkeys-string-mget-100B-30keys-pipeline-10 io-threads-8 22,804 (n=2) 29,772 (n=2) +30.56 % σ↑
5Mkeys-string-mget-100B-5keys-pipeline-10 io-threads-8 115,835 (n=2) 126,741 (n=2) +9.42 % σ↑
5Mkeys-string-mget-512B-5keys-pipeline-10 io-threads-8 95,888 (n=1) 102,753 (n=2) +7.16 %
5Mkeys-string-mget-100B-2keys-pipeline-10 io-threads-8 602,821 (n=2) 635,721 (n=2) +5.46 % σ↑
10Mkeys-string-mget-80keys-100B-multitenant-tagged standalone 29,761 (n=2) 30,526 (n=3) +2.57 % σ↑
5Mkeys-string-mget-100B-30keys-pipeline-10 standalone 67,169 (n=2) 67,965 (n=2) +1.18 %
5Mkeys-string-mget-100B-100keys standalone 25,818 (n=2) 26,089 (n=3) +1.05 % σ↑
10Mkeys-load-string-mset-10-keys-with-100B-values standalone 109,132 (n=2) 109,808 (n=3) +0.62 % σ↑
5Mkeys-string-mget-512B-5keys-pipeline-10 standalone 170,550 (n=1) 171,282 (n=2) +0.43 %
5Mkeys-string-mget-100B-100keys-pipeline-10 standalone 26,244 (n=2) 26,266 (n=3) +0.08 %
5Mkeys-string-mget-100B-2keys-pipeline-10 standalone 701,637 (n=2) 701,509 (n=2) -0.02 %
5Mkeys-string-mget-100B-5keys-pipeline-10 standalone 351,784 (n=2) 351,625 (n=2) -0.05 %
5Mkeys-string-mget-512B-5keys standalone 129,069 (n=1) 128,976 (n=2) -0.07 %
5Mkeys-string-mget-1KiB-5keys-pipeline-10 standalone 113,092 (n=1) 112,987 (n=2) -0.09 %
5Mkeys-string-mget-100B-5keys standalone 139,258 (n=1) 139,089 (n=2) -0.12 %
5Mkeys-string-mget-100B-2keys standalone 160,049 (n=2) 159,567 (n=2) -0.30 % σ↓
5Mkeys-string-mget-1KiB-5keys-pipeline-10 io-threads-8 111,884 (n=1) 111,834 (n=2) -0.04 %
5Mkeys-string-mget-10B-100keys io-threads-8 18,356 (n=1) 18,251 (n=2) -0.57 %
5Mkeys-string-mget-100B-20keys standalone 84,033 (n=2) 83,318 (n=2) -0.85 % σ↓
5Mkeys-string-mget-1KiB-5keys standalone 114,255 (n=1) 112,518 (n=2) -1.52 %
5Mkeys-string-mget-100B-30keys standalone 69,012 (n=2) 67,441 (n=2) -2.28 % σ↓

ARM picture splits two ways: all the big wins land on ARM io-threads-8 (where IO-thread interleaving disrupts the HW prefetcher and the explicit dictPrefetchKeys warm-up does real work — 4 σ-confirmed io-threads wins +5 to +47 %), while ARM standalone is essentially flat to slightly negative on small-N MGET (Graviton4's wide HW prefetcher already absorbs the multi-key MGET case, so the explicit prefetch is overhead). The two big standalone wins on mget-10B-100keys{,-pipeline-10} are the exception — small values + many keys overwhelm even Graviton4's HW prefetcher.

Cross-check on m7i-profiler (single-dp, x86 SPR, with TMA)

Same 25-test set: 24 wins, 1 flat (−1.5 % MSET-load). Headlines reproduce: mget-10B-100keys-pipeline-10 io-threads +95.19 %, mget-100B-100keys-pipeline-10 io-threads +57.58 %, mget-100B-30keys-pipeline-10 io-threads +36.68 %.

TMA on the headline test

mget-100B-100keys-pipeline-10 / io-threads-8: L1_Bound −3.2 pp, Frontend_Bound −1.5 pp, Memory_Bound +3.5 pp (L3 +2.1 pp). Optimization signature: prefetch warming kvobj into L1 before keyCompare, fewer instructions per command, memory becomes the new ceiling at sharply higher throughput.

Notes

  • Sundb slot-reuse change in 3d724a05c is gated by server.cluster_enabled → on every benchmark above (all oss-standalone*) the new code path is functionally identical to the prior PR head; the result is a clean reproduction of the prior round's wins.
  • 2-key MGET shows a small consistent flat-to-slight-negative (−0.4 to −2.2 %), consistent with the optimization being an overhead-amortization play that needs ≥ 5 keys to win.
  • Across the 50 paired m7i-2 + m8g-2 cells: 17 σ-disjoint wins, 5 σ-disjoint flat-negatives (all ≤ −2.3 %, all ARM-standalone-small-N where Graviton4's HW prefetch dominates), 1 σ-disjoint outlier (mget-1KiB-5keys-pipeline-10 io-threads-8, spec issue noted above).

Comment thread src/t_string.c Outdated

@ShooterIT ShooterIT left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a style issue

Co-authored-by: Yuan Wang <wangyuancode@163.com>
@fcostaoliveira

Copy link
Copy Markdown
Collaborator Author

just a style issue

committed the suggestion. thank you @ShooterIT !

Comment thread src/t_string.c Outdated
Co-authored-by: Yuan Wang <wangyuancode@163.com>
@ShooterIT

Copy link
Copy Markdown
Member

@ShooterIT ShooterIT merged commit 62551a7 into redis:unstable May 11, 2026
18 checks passed
@ShooterIT ShooterIT added the release-notes indication that this issue needs to be mentioned in the release notes label May 11, 2026
@ShooterIT ShooterIT moved this from Todo to Done in Redis 8.8 May 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants