Shuffle an Array Using Modern C++ STL

Shuffling looks trivial until you ship a feature that quietly favors the first half of your data. I once pushed a playlist service where early tracks repeated because I seeded with the system clock and used rand(). Users noticed before monitoring did. Since then I’ve treated array shuffling as a first-class task: it deserves a real random engine, predictable APIs, and testable outcomes. In this longform guide I walk through shuffling arrays in C++ with the Standard Library as it stands in January 2026, explain why older approaches disappeared, and show patterns that keep randomness fair, debuggable, and fast.

Why shuffling still matters in 2026

Random order drives card games, load tests, A/B allocations, batch job sampling, data augmentation for ML, and even subtle UX touches like randomized onboarding tips. Every one of those needs two guarantees: each permutation is equally likely, and you can reproduce a run when debugging. The STL gives you that through std::shuffle and its ranges sibling std::ranges::shuffle. The older std::random_shuffle is gone because it depended on rand() and produced correlated sequences; compilers removed it when the C++17 standard took effect.

A quick tour of modern C++ randomness

C++11 introduced the header with two pieces you wire together:

  • Engines: deterministic number generators such as std::mt19937 or std::mt19937_64 that model UniformRandomBitGenerator (URBG).
  • Distributions: adapters that reshape engine output. std::shuffle hardcodes a uniform integer distribution internally, so you only supply the engine.

Engines must provide min(), max(), and operator(). Passing a plain function that returns an int will fail because it doesn’t meet the URBG concept. Wrapping a legacy source is possible but usually a smell—prefer the standard engines unless you have a vetted PRNG you must keep.

How std::shuffle works under the hood

The algorithm implements Durstenfeld’s in-place Fisher–Yates: starting from the end, pick a uniform integer in [0, i] and swap. That’s why complexity is linear and why the quality of the generator dictates fairness. Because the engine is injected, you decide whether the run is deterministic (fixed seed) or non-deterministic (std::random_device). The algorithm’s unbiased property rests entirely on the uniformity of that injected URBG.

#include 

#include

#include

#include

#include

int main() {

std::array ids{1,2,3,4,5,6,7};

// Non-deterministic seed when available; falls back to steady_clock.

std::random_device rd;

auto seed = rd.entropy() ? rd() : static_cast(

std::chrono::steadyclock::now().timesince_epoch().count());

std::mt19937 rng(seed);

std::shuffle(ids.begin(), ids.end(), rng);

for (int v : ids) std::cout << v << ' ';

std::cout << '\n';

}

Ranges-era shuffling: cleaner intent

C++20 added std::ranges::shuffle, which removes iterator boilerplate and accepts any random-access range. It pairs nicely with std::span for shuffling C-style arrays without copying.

#include 

#include

#include

#include

#include

void shuffle_span(std::span data, std::mt19937& rng) {

std::ranges::shuffle(data, rng);

}

int main() {

int raw[] {10, 20, 30, 40};

std::mt19937 rng(std::random_device{}());

shuffle_span(raw, rng);

for (int v : raw) std::cout << v << ' ';

std::cout << '\n';

}

Key benefit: the signature shuffle_span(std::span, std::mt19937&) makes it obvious the function mutates the caller’s memory, not a copy. The ranges version keeps the same probability guarantees as the classic overload.

What happened to std::random_shuffle

If you compile with -std=c++17 or newer, std::random_shuffle is missing. GCC, Clang, and MSVC aligned with the standard: the algorithm was deprecated in C++14 and removed in C++17 because it allowed an unspecified random source (often rand()), which is low quality and not thread-safe. Static analysis tools such as clang-tidy include modernize-replace-random-shuffle to nudge you toward std::shuffle.

Quick comparison

Aspect

std::shuffle

std::random_shuffle

— Standard status

Supported (C++11–now)

Removed in C++17 Randomness source

Your URBG (e.g., mt19937)

Implementation-defined (often rand) Determinism control

Full (choose seed)

Limited Ranges overload

Yes (std::ranges::shuffle)

No Lint support

Encouraged by modernize checks

Flagged for replacement

Seeding patterns that hold up in 2026

Seeding is where most fairness bugs originate. I keep four patterns handy:

1) Reproducible runs for tests: accept a seed argument and fall back to a fixed default.

std::mt19937 maketestrng(std::optional seed = std::nullopt) {

return seed ? std::mt19937(*seed) : std::mt19937(0x5bd1e995u);

}

2) High-entropy runtime seed: draw a full state from std::randomdevice via std::seedseq. Using one 32-bit word under-seeds the engine; filling the entire state improves quality.

std::mt19937 makeruntimerng() {

std::random_device rd;

std::seed_seq seq{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};

return std::mt19937(seq);

}

3) Stable seed for distributed jobs: derive from job metadata (user ID, day stamp) to ensure consistent bucketing across services.

4) Security-adjacent but non-crypto: if you only need unpredictability against casual inspection, combine multiple entropy sources (PID, monotonic clock ticks, hardware nonce) into a std::seed_seq. This is still not cryptographic—use OS CSPRNG APIs when you need that.

On modern Linux kernels the entropy pool refills quickly and std::randomdevice is backed by a non-blocking CSPRNG, so the “randomdevice hangs at boot” folklore is largely obsolete. Windows and macOS expose similarly non-blocking providers in their libstdc++/libc++ bindings.

Seed lifecycle tips

  • Seed once per component and reuse the engine. Re-seeding per call usually skews results and wastes CPU.
  • Log seeds for reproducibility; emit them alongside request IDs or test names.
  • Treat seed size seriously: std::mt19937 holds 624 words of state; supplying only one 32-bit seed leaves much of that state correlated. std::seed_seq solves this.

Performance notes without the buzzwords

  • Complexity is O(n); cache behavior matters more than engine choice for typical array sizes.
  • std::mt19937 is fast enough for UI work and batch shuffles up to millions of items. For tight micro-benchmarks, keep a single engine alive and pass it by reference; constructing per call is the common culprit.
  • std::mt19937_64 doubles state size; prefer it only if you explicitly need the 64-bit period properties.
  • Avoid re-seeding inside tight loops; seed once, reuse many times.
  • For trivially copyable data, shuffling an index vector and then viewing the data through that index avoids moving large objects.
  • On NUMA systems, consider shuffling per node then merging if the consumer is NUMA-aware; cross-node swaps can dominate runtime.

Microbenchmark skeleton

Use a lightweight harness to compare engines:

#include 

#include

#include

#include

static void BMShufflemt19937(benchmark::State& state) {

std::vector v(state.range(0));

std::iota(v.begin(), v.end(), 0);

std::mt19937 rng(123456u);

for (auto _ : state) {

std::ranges::shuffle(v, rng);

benchmark::ClobberMemory();

}

}

BENCHMARK(BMShufflemt19937)->Range(1<<10, 1<<20);

BENCHMARK_MAIN();

Focus on relative trends: whether reusing engines, moving to spans, or switching layouts gives meaningful wins for your workload.

Common mistakes I still see

  • Using rand(): low quality and shared global state, which breaks concurrency.
  • Seeding with time(0): one-second granularity means identical results for requests arriving within the same second.
  • Calling std::shuffle with std::defaultrandomengine without a fixed seed: different standard libraries seed that engine differently, so results vary across platforms.
  • Passing a function to std::shuffle instead of a URBG object: fails to compile because the algorithm queries min()/max().
  • Forgetting move semantics: if you shuffle containers of move-only types, ensure the type is permutable; otherwise the algorithm won’t compile.
  • Shuffling input ranges: std::ranges::shuffle requires random-access ranges; passing a singly linked list will be ill-formed.
  • Re-seeding inside loops: regenerating the engine every iteration destroys statistical quality and slows everything down.
  • Copying engines by value: copying mt19937 is fine but duplicating it mid-shuffle can lead to identical sequences if seeds are equal; pass by reference unless you truly want independent streams.

Real-world recipe: shuffling a playlist service

I often use this pattern for deterministic yet fair playback:

struct Shuffler {

explicit Shuffler(uint64t userid)

: rng(staticcast(user_id ^ 0x9e3779b97f4a7c15ULL)) {}

template

void operator()(Range&& r) {

staticassert(std::ranges::randomaccess_range);

std::ranges::shuffle(r, rng_);

}

private:

std::mt19937 rng_;

};

// Usage

std::vector trackids = loadtracks(user);

Shuffler(user.id)(track_ids);

This yields stable order per user (good for A/B tests) yet varies across users. When I need a new permutation, I rotate the seed with a play-count counter. If customer support asks for a reproduction, I log both the user id and the counter so I can rebuild the engine state.

Testing and observability

  • Snapshot tests: given a fixed seed, verify the exact permutation to guard against accidental engine changes.
  • Distribution checks: for fairness-sensitive paths (loot tables, ads) run chi-squared or KS tests offline to ensure uniformity across many trials.
  • Logging seeds: emit the seed along with request IDs so you can replay issues.
  • Linting: enable clang-tidy -checks=modernize-replace-random-shuffle to prevent regressions.
  • Property-based tests: assert invariants like permutation membership: the multiset of elements is unchanged after shuffling.

Example property test with Catch2

TEST_CASE("shuffle preserves multiset") {

std::vector data{1,2,3,4,5};

auto original = data;

std::mt19937 rng(1337);

std::ranges::shuffle(data, rng);

REQUIRE(std::is_permutation(data.begin(), data.end(), original.begin()));

}

When not to use these tools

If you need cryptographic unpredictability, the header is not enough. Use a vetted crypto library (platform CSPRNG APIs, libsodium) and perform the shuffle by sampling indices with those generators. For highly parallel GPU workloads, prefer domain-specific shuffle kernels to avoid host-device transfers. For streaming data too large to fit in memory, use reservoir sampling or partial Fisher–Yates instead of materializing the entire array.

New in the C++23–C++26 window

C++23 became official in 2024 and did not change std::shuffle, but it expanded ranges heavily and clarified borrowed_range rules. That matters because std::ranges::shuffle now works more smoothly with views that own or borrow data, reducing lifetime footguns. Papers in flight for C++26 refine range adaptors and execution policies; the shuffle contract is stable, so code you write today should carry forward. The same binaries you ship in 2026 will run unmodified on future standard-library implementations as long as you avoid removed legacy APIs.

C-style arrays and interop

std::ranges::shuffle plus std::span lets you mutate legacy buffers without copying. Remember that a span’s extent is compile-time for arrays and runtime for pointers; you still need the size when constructing a span from a pointer/length pair.

void shufflecbuffer(int* data, std::size_t n, std::mt19937& rng) {

std::span view{data, n};

std::ranges::shuffle(view, rng);

}

If the buffer originates from another language (Python C API, JNI), make sure ownership and lifetime are stable during the shuffle.

Exception safety and strong guarantees

std::shuffle provides the basic guarantee: if your element swap throws, the container may end up partially shuffled. For move-only types that can throw on move, consider shuffling indices instead of the objects themselves, or wrap swaps in std::swap specializations that are noexcept. For types with expensive moves, shuffling indices also reduces data movement.

Concurrency and multi-threading

Engines are not thread-safe. Give each thread its own engine, seeded from a std::seed_seq split across threads. Avoid sharing a single std::mt19937 with locks; the state is large and lock contention eclipses any benefit. A common pattern is to build a thread-local engine:

threadlocal std::mt19937 tlsrng = makeruntimerng();

void shufflethreadlocal(auto& vec) {

std::ranges::shuffle(vec, tls_rng);

}

If you must coordinate across nodes (e.g., distributed simulations), derive seeds from deterministic components (job id, rank, epoch) so that replay is possible. For forked processes, reseed after fork() to avoid identical child streams.

Embedded and size-constrained builds

std::mt19937 uses roughly 2.5 KB of state. On microcontrollers that hurts. Alternatives:

  • std::minstd_rand: tiny state, weaker quality but acceptable for UI or non-critical animations.
  • Custom XORSHIFT or PCG packaged as URBGs; ensure you expose min(), max(), and operator() and document the statistical quality.
  • If flash is limited, keep the engine state in static storage and reuse across modules to avoid multiple copies.

constexpr and compile-time shuffles

std::shuffle is not constexpr because engines aren’t. If you need a compile-time permutation, generate it with std::index_sequence or template metaprogramming and apply at compile time; treat runtime shuffling separately. Example: build a compile-time lookup table, then at runtime only reorder indices that reference it.

Custom URBGs for domain needs

You can adapt domain PRNGs by implementing the URBG interface. Example: a PCG32 wrapper.

class pcg32 {

public:

using resulttype = std::uint32t;

explicit pcg32(std::uint64t seed = 54u) { seed = seed + 0x853c49e6748fea9bull; step(); }

static constexpr result_type min() { return 0; }

static constexpr resulttype max() { return std::numericlimits::max(); }

resulttype operator()() { step(); return output; }

private:

std::uint64t state{};

resulttype output{};

void step() {

state = state * 6364136223846793005ULL + 1u;

auto xorshifted = staticcast<resulttype>(((state >> 18u) ^ state) >> 27u);

auto rot = state_ >> 59u;

output_ = (xorshifted >> rot) | (xorshifted << ((-rot) & 31));

}

};

Plug it into std::shuffle just like mt19937. Document the statistical properties so your future self knows whether it’s suitable for fairness-critical paths. For teams with regulatory requirements (gambling, ads), keep a short README beside the wrapper with links to the generator’s test suite results.

Observability in production

Log the seed and the engine type whenever a shuffle materially affects user experience. Example structured log:

{"event":"playlist_shuffle","seed":4123987,"engine":"mt19937","size":57}

With that, reproducing a user issue is deterministic: reconstruct the engine with the same seed and run the same shuffle. If privacy rules apply, hash user identifiers before logging but keep the seed clear.

Property-based testing

Use frameworks like RapidCheck, Hypothesis for C++, or Catch2 generators to assert invariants: the shuffled range is a permutation (multiset equality) and every element remains present. With a fixed seed you get deterministic failures; with randomized seeds you get broader exploration. Record failing seeds so reruns are trivial.

Handling partial shuffles

Sometimes you only need the first k random elements (e.g., sampling). Instead of shuffling the entire array, run Fisher–Yates for i from n-1 down to n-k, stopping early. std::shuffle doesn’t expose that knob, but a short custom loop does. It preserves uniformity over combinations of size k.

template

void shuffleprefix(RAIt first, RAIt last, std::sizet k, URBG& g) {

auto n = staticcast<std::sizet>(last - first);

for (std::size_t i = 0; i i; ++i) {

std::uniformintdistribution dist(i, n - 1);

std::swap(first[i], first[dist(g)]);

}

}

Great for sampling users for a survey or selecting a mini-batch for training without paying O(n) swaps.

Memory layout choices

For POD-heavy structures, consider shuffling indices while keeping data in SoA (structure of arrays) layout. That keeps cache lines hot and avoids moving fat objects. When the consumer can tolerate an indirection, a shuffled index view can be more performant than shuffling the records themselves. Example: keep std::vector stable and shuffle a std::vector of indices for randomized iteration.

Interoperability with ranges views

Views are lazy; shuffling requires ownership. Convert to a materialized container (std::vector, std::ranges::to) before shuffling, or ensure the view is a borrowed_range that refers to stable storage. C++23’s improvements to borrowed ranges make this less error-prone. If you pipe a view into std::ranges::shuffle and it fails to compile, materialize with auto vec = range | std::ranges::to(); then shuffle.

Benchmarking your shuffle

When optimizing, measure:

  • Time per shuffle at target sizes.
  • Cache misses (use perf stat -e cache-misses on Linux).
  • Whether reusing engines reduces allocations.
  • Effects of index-shuffle vs data-shuffle.

Keep benchmarks near the code (e.g., bench/ folder) and wire them into CI so regressions surface early.

Debugging tips

  • Print the first few outputs of your engine when seeds differ: divergent outputs confirm distinct streams.
  • If results look non-uniform, check for accidental modulo bias from custom code; std::shuffle already handles uniform integer distribution.
  • In tests, pin both the engine type and seed; swapping mt19937 for defaultrandomengine can change outputs and break snapshots.

API design patterns

  • Accept an engine reference in library functions instead of seeding internally; callers keep control over determinism.
  • Provide overloads that create a default engine for convenience, but document that output will differ across runs.
  • Name parameters explicitly: shuffle(items, rng) reads better than shuffle(items); when determinism matters.

Alternative approaches and when to use them

  • Reservoir sampling: when the full dataset doesn’t fit in memory or is a stream; produces a uniform sample without full shuffle.
  • Partial Fisher–Yates: when only k elements are needed; cheaper than full shuffle.
  • Sorting by random key: std::ranges::sort on random keys is tempting but biased if keys collide; avoid it unless you use a 128-bit key with collision checks.
  • Parallel shuffles: for very large arrays, split into blocks, shuffle independently, then shuffle the order of blocks; works if cross-block ordering need not be fully uniform. For exact global uniformity, use a parallel Fisher–Yates variant with care.

Security considerations

  • engines are not cryptographically secure. Avoid them for security tokens, lottery draws with monetary value, or adversarial settings.
  • For audit trails, log seed derivation logic and engine choice.
  • Avoid predictable seeds like timestamps or incremental counters when adversaries can observe multiple outcomes.

Language and toolchain notes (GCC/Clang/MSVC)

  • All major compilers ship std::ranges::shuffle since their C++20 libraries stabilized.
  • MSVC’s std::random_device is backed by BCrypt RNG; Clang/libc++ on macOS uses /dev/urandom; libstdc++ on Linux uses getrandom when available.
  • -fno-exceptions builds: std::shuffle is still usable, but your types must not throw during swap.
  • Sanitizers: -fsanitize=thread helps confirm you’re not sharing engines across threads.

Tooling: linters and formatters

  • clang-tidy check modernize-replace-random-shuffle catches legacy calls.
  • clang-format leaves shuffle calls alone; ensure you don’t accidentally format away important thread_local qualifiers.
  • Add a short comment near custom URBGs to satisfy security reviews.

Documentation and onboarding

When handing off code, document:

  • Engine choice and why (e.g., period length, compatibility).
  • Seeding scheme (entropy sources, reproducibility rules).
  • Testing strategy (snapshot seeds, property tests).
  • Performance expectations (target sizes and budgets).

New team members often assume randomness is “just call shuffle”; having a two-paragraph doc prevents accidental regressions.

Practical scenarios

1) Game loot tables: shuffle indices of loot then iterate; log seed per match for anti-cheat review.

2) Batching for ML: shuffle a vector of indices each epoch; seed from epoch number for deterministic resumption.

3) UI experiments: shuffle card tiles client-side with a per-session seed stored in local storage; send seed in telemetry to replay bugs.

4) Load shedding: shuffle a list of backend hosts before attempting connections to spread traffic evenly.

5) Data migration: shuffle row IDs before processing to reduce hotspotting on clustered indexes.

Extending to other containers

std::ranges::shuffle works for any random-access range: std::vector, std::array, std::deque (random-access iterators), and std::basic_string. For fixed-size std::array, prefer passing by reference to avoid copies. For std::deque, be aware of non-contiguous storage; cache behavior may differ but correctness holds.

Handling very large datasets

  • Shuffle indices, not records, to minimize movement.
  • Chunk-based shuffles: shuffle each chunk, then shuffle chunk order; acceptable when strict global uniformity isn’t required.
  • External-memory approaches: write blocks to disk with random order metadata, then stream.

Stable interfaces for libraries

If you publish a library function shuffleinplace, freeze its engine choice or expose the engine type in the signature to avoid silent behavioral changes. Users who snapshot outputs in tests will thank you.

Putting it all together: reusable utility

template <std::ranges::randomaccessrange Range,

class URBG = std::mt19937>

void shuffleinplace(Range&& r, URBG& g) {

std::ranges::shuffle(r, g);

}

template <std::ranges::randomaccessrange Range>

void shuffleinplace(Range&& r) {

static threadlocal std::mt19937 tlsrng = makeruntimerng();

std::ranges::shuffle(r, tls_rng);

}

The two-overload pattern offers deterministic control for callers who have an engine, and convenience for callers who don’t care. Document that the convenience overload will yield different results across runs.

Checklist before shipping

  • Using std::shuffle or std::ranges::shuffle, not deprecated APIs.
  • Engine seeded once, reused; seed logged when reproducibility matters.
  • Thread-local or per-thread engines for concurrent contexts.
  • Tests pin engine type and seed; property tests ensure permutation validity.
  • Performance measured at realistic sizes; no per-call re-seeding.
  • Documentation notes engine, seed strategy, and reproducibility story.

Closing thoughts

Shuffling isn’t glamorous, but bias sneaks in wherever randomness meets production. Modern C++ makes it straightforward to get right: pick a good engine, seed it thoughtfully, reuse it, and expose the seed for replay. Whether you’re randomizing a card deck, distributing load, or sampling data for a model, treating shuffle as a deliberate API keeps users from seeing patterns where none should exist. The STL tools are stable through the C++23–C++26 window, so the investment you make today will keep paying off. Now go delete that rand() call and sleep better.

Scroll to Top