Merged
Conversation
The previous implementation only incremented a single byte position in the value buffer, causing values to repeat after cycling through all bytes in all buffer positions. This change regenerates completely new random data when the mutation position wraps around, giving more guarantees of random values throughout the benchmark run while preventing hurting performance too much.
…ation - Extract buffer filling logic into new fill_value_buffer() method - Remove automatic buffer filling from alloc_value_buffer() - Call fill_value_buffer() explicitly after set_random_seed() in client setup - Simplify random data generation to use gaussian_noise::get_random() - Remove /dev/urandom file descriptor (m_random_fd) and XOR logic - Remove alloc_value_buffer(const char* copy_from) overload This ensures random data is generated with the correct per-client seed rather than using the default seed during initial buffer allocation (leading to repeated values).
|
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
fcostaoliveira
approved these changes
Dec 5, 2025
Collaborator
fcostaoliveira
left a comment
There was a problem hiding this comment.
LGTM. Thank you @paulorsousa !
fcostaoliveira
pushed a commit
that referenced
this pull request
Jan 5, 2026
* Improve random data generation to avoid value repetition The previous implementation only incremented a single byte position in the value buffer, causing values to repeat after cycling through all bytes in all buffer positions. This change regenerates completely new random data when the mutation position wraps around, giving more guarantees of random values throughout the benchmark run while preventing hurting performance too much. * Separate buffer allocation from filling and fix random seed initialization - Extract buffer filling logic into new fill_value_buffer() method - Remove automatic buffer filling from alloc_value_buffer() - Call fill_value_buffer() explicitly after set_random_seed() in client setup - Simplify random data generation to use gaussian_noise::get_random() - Remove /dev/urandom file descriptor (m_random_fd) and XOR logic - Remove alloc_value_buffer(const char* copy_from) overload This ensures random data is generated with the correct per-client seed rather than using the default seed during initial buffer allocation (leading to repeated values). * Add null check for value buffer in `fill_value_buffer` function * Minor formatting cleanup in obj_gen.cpp
fcostaoliveira
pushed a commit
that referenced
this pull request
Jan 5, 2026
* Improve random data generation to avoid value repetition The previous implementation only incremented a single byte position in the value buffer, causing values to repeat after cycling through all bytes in all buffer positions. This change regenerates completely new random data when the mutation position wraps around, giving more guarantees of random values throughout the benchmark run while preventing hurting performance too much. * Separate buffer allocation from filling and fix random seed initialization - Extract buffer filling logic into new fill_value_buffer() method - Remove automatic buffer filling from alloc_value_buffer() - Call fill_value_buffer() explicitly after set_random_seed() in client setup - Simplify random data generation to use gaussian_noise::get_random() - Remove /dev/urandom file descriptor (m_random_fd) and XOR logic - Remove alloc_value_buffer(const char* copy_from) overload This ensures random data is generated with the correct per-client seed rather than using the default seed during initial buffer allocation (leading to repeated values). * Add null check for value buffer in `fill_value_buffer` function * Minor formatting cleanup in obj_gen.cpp
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR addresses two issues in
--random-datageneration:value buffer mutation wrap-around caused repeated patterns.
The mutation index is designed to avoid generating a completely new random value on every request by mutating an existing buffer.
However, once the index wrapped, the mutation pattern repeated, significantly reducing randomness.
Initial value buffer was not seeded per client, even when
--distinct-client-seedwas enabled.As a result, multiple clients generated identical initial buffers, again reducing randomness across workloads.
These issues reduced entropy and could distort benchmark results or the realism of workload simulations.
What this PR changes
Random value buffers are now:
Impact
Test: 100% SETs, 36 bytes random values, 50 clients (each with a different seed), 10 seconds test time.
How to reproduce and measure
redis-server --save "" --io-threads 4redis-cli monitor > executed-cmds.log./memtier_benchmark --ratio=1:0 --data-size=36 --random-data --clients=50 --distinct-client-seed --threads=1 --test-time 10 --hide-histogramredis-cli monitorwith ctrl + c (or equivalent)sed -n 's/.* "\(.*\)"$/\1/p' executed-cmds.log | sort | uniq -c | sort -nr | lessImpact on performance
No significant performance degradation was observed.