Fix random `__data__` generation by paulorsousa · Pull Request #326 · redis/memtier_benchmark

paulorsousa · 2025-12-05T10:57:21Z

This PR addresses two issues in --random-data generation:

1. Value buffer mutation wrap-around caused repeated patterns.
   The mutation index is designed to avoid generating a completely new random value on every request by mutating an existing buffer.
   However, once the index wrapped, the mutation pattern repeated, significantly reducing randomness.
2. The initial value buffer was not seeded per client, even when --distinct-client-seed was enabled.
   As a result, multiple clients generated identical initial buffers, again reducing randomness across workloads.

These issues reduced entropy and could distort benchmark results or the realism of workload simulations.

What this PR changes

Random value buffers are now:

- Fully regenerated when the mutation index wraps, eliminating cyclic patterns.
- Initialised only after per-client seed setup, ensuring each client receives its own independent initial buffer.
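
The mutate-then-regenerate behaviour described above can be illustrated with a minimal sketch. It is not the code from this PR: the class `MutatingValueBuffer`, its members, and the use of `std::mt19937_64` are assumptions chosen for illustration, whereas the PR itself mentions `gaussian_noise::get_random()` as the random source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a benchmark value buffer; the class and member
// names are illustrative and are not taken from memtier_benchmark.
class MutatingValueBuffer {
public:
    MutatingValueBuffer(std::size_t size, std::uint64_t seed)
        : m_rng(seed), m_buf(size), m_pos(0), m_refills(0) {
        assert(size > 0);
        refill();  // initial contents depend on the caller-supplied seed
    }

    // Produce the next value. The common case is a cheap one-byte mutation;
    // when the mutation index wraps, the whole buffer is regenerated instead
    // of replaying the same mutation cycle.
    const std::vector<unsigned char>& next() {
        ++m_buf[m_pos];  // unsigned char wraps from 255 back to 0
        if (++m_pos >= m_buf.size()) {
            m_pos = 0;
            refill();
        }
        return m_buf;
    }

    const std::vector<unsigned char>& current() const { return m_buf; }
    std::size_t refills() const { return m_refills; }

private:
    // Overwrite every byte with fresh pseudo-random data.
    void refill() {
        std::uniform_int_distribution<unsigned int> byte(0, 255);
        for (unsigned char& b : m_buf)
            b = static_cast<unsigned char>(byte(m_rng));
        ++m_refills;
    }

    std::mt19937_64 m_rng;
    std::vector<unsigned char> m_buf;
    std::size_t m_pos;
    std::size_t m_refills;
};
```

With a 36-byte buffer, this sketch pays for a full refill once every 36 calls to `next()` and a single byte increment otherwise, which is the trade-off between randomness and cost that the PR describes.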

Impact

Test: 100% SETs, 36-byte random values, 50 clients (each with a different seed), 10-second test time.

| Scenario | Values | Repetitions per value |
| --- | --- | --- |
| Before | 1.2M | ~150x each |
| After (this PR) | 1.2M | 1x each |

How to reproduce and measure

Start redis:
redis-server --save "" --io-threads 4
Monitor commands:
redis-cli monitor > executed-cmds.log
Start memtier:
./memtier_benchmark --ratio=1:0 --data-size=36 --random-data --clients=50 --distinct-client-seed --threads=1 --test-time 10 --hide-histogram
Stop redis-cli monitor with ctrl + c (or equivalent)
Count repetitions:
sed -n 's/.* "\(.*\)"$/\1/p' executed-cmds.log | sort | uniq -c | sort -nr | less
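
For anyone who prefers to do the counting step in code, the following is a rough C++ equivalent of the sed/sort/uniq pipeline. It is only a sketch: the function names are hypothetical, and it assumes each logged command line ends with the value as its last double-quoted argument, as in `redis-cli monitor` output for SET.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Count how often each final double-quoted argument appears in the input.
// Lines that do not end with a closing quote (for example the initial "OK")
// are skipped, mirroring what `sed -n ... p` would print.
inline std::unordered_map<std::string, std::size_t>
count_last_arguments(std::istream& in) {
    std::unordered_map<std::string, std::size_t> counts;
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() < 3 || line.back() != '"') continue;
        // Last occurrence of space + quote that starts before the final quote.
        const std::size_t open = line.rfind(" \"", line.size() - 3);
        if (open == std::string::npos) continue;
        assert(open + 2 <= line.size() - 1);
        ++counts[line.substr(open + 2, line.size() - open - 3)];
    }
    return counts;
}

// Convenience wrapper (hypothetical) for text that is already in memory.
inline std::unordered_map<std::string, std::size_t>
count_last_arguments_in(const std::string& text) {
    std::istringstream in(text);
    return count_last_arguments(in);
}
```

Opening `executed-cmds.log` with `std::ifstream` and passing it to `count_last_arguments` gives the per-value counts; any count above 1 indicates a repeated value.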

Impact on performance

No significant performance degradation was observed.

The previous implementation only incremented a single byte position in the value buffer, causing values to repeat after cycling through all bytes in all buffer positions. This change regenerates completely new random data when the mutation position wraps around, which gives a stronger guarantee of distinct random values throughout the benchmark run without a significant performance cost.

…ation

- Extract buffer filling logic into new fill_value_buffer() method
- Remove automatic buffer filling from alloc_value_buffer()
- Call fill_value_buffer() explicitly after set_random_seed() in client setup
- Simplify random data generation to use gaussian_noise::get_random()
- Remove /dev/urandom file descriptor (m_random_fd) and XOR logic
- Remove alloc_value_buffer(const char* copy_from) overload

This ensures random data is generated with the correct per-client seed rather than using the default seed during initial buffer allocation (leading to repeated values).
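
The ordering issue this commit addresses can be shown with a small sketch. The method names `alloc_value_buffer()`, `fill_value_buffer()`, and `set_random_seed()` come from the commit message, but the class `ObjectGeneratorSketch`, the helper `make_client_generator`, the signatures, and the `std::mt19937_64` generator are all assumptions made for illustration, not the project's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Illustrative only: a simplified generator with allocation and filling split.
class ObjectGeneratorSketch {
public:
    // Allocation only reserves zeroed space; it does not draw random bytes.
    void alloc_value_buffer(std::size_t size) {
        assert(size > 0);
        m_value_buffer.assign(size, 0);
    }

    void set_random_seed(std::uint64_t seed) { m_rng.seed(seed); }

    // Filling is a separate, explicit step so it can run after seeding.
    void fill_value_buffer() {
        if (m_value_buffer.empty()) return;  // nothing allocated yet
        std::uniform_int_distribution<unsigned int> byte(0, 255);
        for (unsigned char& b : m_value_buffer)
            b = static_cast<unsigned char>(byte(m_rng));
    }

    const std::vector<unsigned char>& value_buffer() const {
        return m_value_buffer;
    }

private:
    std::mt19937_64 m_rng;  // every instance starts from the same default seed
    std::vector<unsigned char> m_value_buffer;
};

// Hypothetical per-client setup in the order the commit describes:
// allocate, seed, then fill.
inline ObjectGeneratorSketch make_client_generator(std::size_t size,
                                                   std::uint64_t seed) {
    ObjectGeneratorSketch gen;
    gen.alloc_value_buffer(size);
    gen.set_random_seed(seed);
    gen.fill_value_buffer();
    return gen;
}
```

In this sketch, two generators filled before `set_random_seed()` is called end up with identical buffers because both still use the default seed, while generators built through `make_client_generator` with different seeds start from different contents.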

fcostaoliveira

LGTM. Thank you @paulorsousa !

* Improve random data generation to avoid value repetition

  The previous implementation only incremented a single byte position in the value buffer, causing values to repeat after cycling through all bytes in all buffer positions. This change regenerates completely new random data when the mutation position wraps around, which gives a stronger guarantee of distinct random values throughout the benchmark run without a significant performance cost.

* Separate buffer allocation from filling and fix random seed initialization

  - Extract buffer filling logic into new fill_value_buffer() method
  - Remove automatic buffer filling from alloc_value_buffer()
  - Call fill_value_buffer() explicitly after set_random_seed() in client setup
  - Simplify random data generation to use gaussian_noise::get_random()
  - Remove /dev/urandom file descriptor (m_random_fd) and XOR logic
  - Remove alloc_value_buffer(const char* copy_from) overload

  This ensures random data is generated with the correct per-client seed rather than using the default seed during initial buffer allocation (leading to repeated values).

* Add null check for value buffer in `fill_value_buffer` function
* Minor formatting cleanup in obj_gen.cpp

paulorsousa added 4 commits December 4, 2025 18:27

Add null check for value buffer in fill_value_buffer function

16e7e47

Minor formatting cleanup in obj_gen.cpp

ae71d3a

paulorsousa requested a review from fcostaoliveira December 5, 2025 10:57

fcostaoliveira added the bug label Dec 5, 2025

fcostaoliveira approved these changes Dec 5, 2025

View reviewed changes

fcostaoliveira merged commit 8985eb5 into master Dec 5, 2025
39 checks passed

fcostaoliveira deleted the fix/enhance-random-data-generation branch January 2, 2026 11:48

fcostaoliveira mentioned this pull request Feb 26, 2026

Prepare for 2.2.2 version. #349

Merged

2 tasks
