Redis SCAN is a versatile command that enables powerful keyspace iteration. This comprehensive guide dives deep into SCAN internals, usage patterns, customization tips, and integration with other Redis features. Follow these best practices to expertly harness SCAN for scaling, administration, and analytics.

Diving Into SCAN Internals

Let's first understand what happens behind the scenes when SCAN executes:

  1. Key hash – Redis hashes every key name (using SipHash since Redis 4.0) to assign it to a bucket in the main dictionary's hash table.

  2. Hash table – Keys are stored unordered in a power-of-two-sized bucket array; the apparent "order" of SCAN results is simply bucket order.

  3. Cursor movement – The cursor encodes a bucket index. Each call visits a number of buckets guided by COUNT, advancing the cursor via reverse binary iteration so no bucket is skipped even if the table is resized mid-scan.

  4. Duplication – SCAN guarantees that a full iteration returns every key present for the entire scan, but rehashing can cause some keys to be returned more than once. Redis does not deduplicate across calls; the client must handle repeats.

So in essence, SCAN walks a cursor across the buckets of an internal hash table to incrementally return keys. The cursor value itself encodes scan progress across calls.
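The bucket-walking step can be sketched in Python. This is a simplified model of reverse binary iteration (not Redis source): the cursor's bits are reversed, incremented, and reversed back, which is what lets iteration survive table resizes.

```python
def reverse_bits(x: int, bits: int) -> int:
    """Reverse the low `bits` bits of x."""
    return int(format(x, f"0{bits}b")[::-1], 2)

def next_cursor(cursor: int, table_size: int) -> int:
    """Advance a SCAN-style cursor over a power-of-two-sized table."""
    bits = table_size.bit_length() - 1   # log2(table_size)
    r = reverse_bits(cursor, bits) + 1   # increment in reversed space
    if r >= table_size:
        return 0                         # wrapped around: iteration done
    return reverse_bits(r, bits)

# Walk every bucket of an 8-slot table exactly once
seen, c = [], 0
while True:
    seen.append(c)
    c = next_cursor(c, 8)
    if c == 0:
        break
print(seen)  # [0, 4, 2, 6, 1, 5, 3, 7] -- each bucket visited once
```

Note the visit order is the classic reverse-binary sequence rather than 0, 1, 2, …, which is why SCAN output looks shuffled.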

Knowing these internals helps tune SCAN behavior by targeting specific parameters.

Order of Results

Given bucket ordering, expect key results in a pseudorandom but stable order per database. Replaying the same cursor returns the same keys only if no keys were added, deleted, or rehashed between calls.

COUNT Limits

COUNT is a hint for how much work to do per call — roughly how many buckets to examine from the current cursor position — not an exact number of keys to return. Adapt this value to balance per-call latency against round-trip overhead.

Cursor Statelessness

Cursors hold no server-side state, so there is nothing to expire: the cursor value itself encodes the iteration position. An interrupted scan can be resumed at any time by passing the last returned cursor back to SCAN.
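Because all state lives in the cursor, the client-side iteration loop is trivial. A minimal sketch, using a stand-in `scan_fn` (a dict of fake server replies) in place of a real client call:

```python
def full_scan(scan_fn):
    """Drive a SCAN-style iteration to completion.

    scan_fn(cursor) -> (next_cursor, keys); the iteration is finished
    when the server hands back cursor 0.
    """
    cursor, keys = 0, []
    while True:
        cursor, batch = scan_fn(cursor)
        keys.extend(batch)
        if cursor == 0:
            return keys

# Fake server: three pages, then cursor 0 signals completion
pages = {0: (17, ["a", "b"]), 17: (42, ["c"]), 42: (0, ["d"])}
print(full_scan(pages.__getitem__))  # ['a', 'b', 'c', 'd']
```

The same shape works against a real client by substituting its SCAN call for `scan_fn`.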

Now let's analyze SCAN performance…

Analyzing SCAN Performance

SCAN reads directly from memory, so individual calls are fast. Still, a few tips help when scanning large keyspaces:

Client-Side

  • Pipeline the per-key follow-up commands (TYPE, TTL, DEL, …) to reduce round trips — SCAN calls themselves cannot be pipelined, since each needs the previous reply's cursor
  • Parallelize in Redis Cluster by running an independent SCAN against each node concurrently
  • Persist cursors to resume partial iterations after client crashes
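Cursor persistence from the last bullet can be sketched as follows. This is an illustrative pattern, again with a stand-in `scan_fn`; the checkpoint file path is arbitrary. Note a resumed scan may miss keys written while the client was down, per SCAN's guarantees.

```python
import json
import os
import tempfile

def resumable_scan(scan_fn, state_path):
    """Checkpoint the cursor after every page so a crashed run can resume."""
    cursor = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            cursor = json.load(f)["cursor"]   # resume from checkpoint
    collected = []
    while True:
        cursor, batch = scan_fn(cursor)
        collected.extend(batch)
        with open(state_path, "w") as f:
            json.dump({"cursor": cursor}, f)  # checkpoint this page
        if cursor == 0:
            os.remove(state_path)             # done: clear the checkpoint
            return collected

pages = {0: (17, ["a"]), 17: (0, ["b"])}
path = os.path.join(tempfile.gettempdir(), "scan-cursor.json")
if os.path.exists(path):
    os.remove(path)  # start fresh for this demo
print(resumable_scan(pages.__getitem__, path))  # ['a', 'b']
```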

Server-Side

  • Prefer UNLINK over DEL for scanned keys so memory reclamation happens on a background thread
  • Track memory overhead and allocator fragmentation via INFO memory (mem_fragmentation_ratio) during heavy scan jobs
  • Keep COUNT moderate: SCAN executes on the single main thread, so oversized COUNT values stall other commands

Since SCAN distributes work across calls, throughput is generally consistent even on huge databases. Latency of a single call depends mainly on COUNT and how densely populated the visited buckets are.

Now let's explore SCAN use cases…

Advanced Use Cases

So far we've covered basic keyspace iteration. Now let's tackle some advanced SCAN usage patterns.

Prefix Scans

Scan a logical subset of the keyspace by prefix using MATCH:

SCAN 0 MATCH 'users:100*' COUNT 1000

Note that MATCH is a filter, not a seek: the server still walks buckets in its usual order and discards non-matching keys just before replying, so many calls may return empty result arrays. It remains useful for carving a huge keyspace into prefix-based shards — here, every user id beginning with 100.
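The filter-not-seek semantics can be mimicked client-side with Python's `fnmatch` (an approximation — Redis glob syntax is similar but not identical to fnmatch):

```python
from fnmatch import fnmatchcase

# One page of keys as the server pulled them from the visited buckets
batch = ["users:1001", "users:2002", "users:1003", "sessions:9"]

# MATCH 'users:100*' semantics: glob filtering applied just before the reply
matched = [k for k in batch if fnmatchcase(k, "users:100*")]
print(matched)  # ['users:1001', 'users:1003']
```

The server does exactly this per page, which is why a highly selective pattern still costs a full keyspace walk.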

Cursor Opacity (No Point Seeks)

It may be tempting to seek directly to a position by hand-crafting a cursor:

SCAN 79695648597152144 COUNT 100

Avoid this: the cursor is an opaque bucket index, not a key hash, and the only valid inputs are 0 and cursor values previously returned by the server. Arbitrary cursors yield undefined, possibly incomplete iterations. To target specific keys, fetch them directly or filter with MATCH.

Read/Write Race Handling

During iteration, parallel writes can cause SCAN to miss newly added keys or return some keys more than once. The guarantee is narrower than a snapshot: keys present for the entire iteration are always returned, while keys created or deleted mid-scan may or may not appear. Some mitigation options:

  1. Deduplicate client-side by accumulating results into a set

  2. Re-check each key (e.g. with EXISTS or TYPE) before acting on it, since it may have been deleted after being returned

  3. If strict point-in-time semantics are required, scan a replica with replication paused, or process an RDB snapshot offline

So plan race handling as the application requires.
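Option 1 above is a one-line change to the iteration loop. A sketch with a stand-in `scan_fn` whose pages deliberately repeat a key:

```python
def scan_unique(scan_fn):
    """Accumulate SCAN results into a set so that duplicates a
    mid-scan rehash can legitimately produce are dropped."""
    seen = set()
    cursor = 0
    while True:
        cursor, batch = scan_fn(cursor)
        seen.update(batch)       # set union silently absorbs repeats
        if cursor == 0:
            return seen

# Fake server pages where 'b' is returned twice (as rehashing can cause)
pages = {0: (9, ["a", "b"]), 9: (0, ["b", "c"])}
print(sorted(scan_unique(pages.__getitem__)))  # ['a', 'b', 'c']
```

The trade-off is memory: the set holds every key name seen, which matters on very large keyspaces.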

Atomic Range Deletes

Scan keys matching a pattern (here event:*) and delete them via Lua — the whole script executes atomically, so no writes interleave:

local cursor = '0'
local deleted = 0
repeat
    local r = redis.call('SCAN', cursor, 'MATCH', 'event:*', 'COUNT', 100)
    cursor = r[1]
    if #r[2] > 0 then
        deleted = deleted + redis.call('DEL', unpack(r[2]))
    end
until cursor == '0'
return deleted

Deleting page by page keeps argument lists small (Lua's unpack has a fixed stack limit) and skips the DEL entirely when a page matches nothing. Beware that on a large keyspace the script blocks the server for its full duration; for long jobs, prefer client-side scanning with UNLINK.
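The client-side alternative mentioned above gives up atomicity but never blocks the server for long. A sketch of the loop shape, where `scan_fn`, `delete_fn`, and `should_delete` are hypothetical stand-ins for real client calls:

```python
def scan_and_delete(scan_fn, delete_fn, should_delete, batch_limit=500):
    """Delete matching keys page by page instead of collecting
    every key first, capping each delete call's argument count."""
    cursor, deleted = 0, 0
    while True:
        cursor, batch = scan_fn(cursor)
        doomed = [k for k in batch if should_delete(k)]
        for i in range(0, len(doomed), batch_limit):
            deleted += delete_fn(doomed[i:i + batch_limit])
        if cursor == 0:
            return deleted

# Fake store and fake server pages for demonstration
store = {"event:1": 1, "event:2": 2, "other": 3}
pages = {0: (5, ["event:1", "other"]), 5: (0, ["event:2"])}

def fake_del(keys):
    for k in keys:
        store.pop(k, None)
    return len(keys)

n = scan_and_delete(pages.__getitem__, fake_del,
                    lambda k: k.startswith("event:"))
print(n, store)  # 2 {'other': 3}
```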

So in addition to basic usage, SCAN enables ordered range scans, targeted seeks, race handling, atomic multi-key operations etc. Make sure to validate correctness in concurrent production environments.

Now let's look at customizing and parameterizing scans…

Customizing SCAN Parameters

We discussed the basics of COUNT and MATCH earlier. Beyond those, there are additional ways to customize and parameterize scans:

Dynamic COUNT

Adapt COUNT dynamically by wrapping the scan in Lua. A toy example that derives COUNT from the server clock:

local cursor = ARGV[1]
local count = (redis.call('TIME')[1] % 100) + 100
return redis.call('SCAN', cursor, 'COUNT', count)

Here COUNT varies between 100 and 199. In practice, derive it from a real signal such as recent call latency rather than the clock.
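A latency-driven variant is easy to express client-side. A minimal sketch — the target latency and clamping bounds are illustrative, not recommendations:

```python
def adapt_count(count, last_latency_ms, target_ms=5.0, lo=50, hi=5000):
    """Grow COUNT when the last call ran fast, shrink it when slow,
    clamped to [lo, hi]."""
    if last_latency_ms < target_ms / 2:
        return min(hi, count * 2)    # plenty of headroom: do more per call
    if last_latency_ms > target_ms:
        return max(lo, count // 2)   # too slow: back off
    return count

c = 100
c = adapt_count(c, 1.0)    # fast call  -> doubled to 200
c = adapt_count(c, 12.0)   # slow call  -> halved back to 100
print(c)  # 100
```

Feed the helper each call's measured latency and pass the result as the next COUNT.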

External Filters

SCAN itself only does glob pattern filtering via MATCH (plus, since Redis 6.0, data-type filtering via the TYPE option). For more complex filters — uniqueness, value inspection, TTL thresholds — post-process scan results client-side.

Scan Progress Info

Track scan progress by storing stats like total keys scanned, time elapsed etc. Calculate completion ratio from total database keys in INFO keyspace for progress bar.
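A completion-ratio helper can be sketched as below. The total would come from INFO keyspace (e.g. the db0 keys count); the ratio is approximate, since the keyspace can change mid-scan and SCAN may return duplicates:

```python
def scan_progress(keys_scanned: int, total_keys: int) -> float:
    """Approximate completion ratio for a progress bar.

    Clamped to 1.0 because duplicates and concurrent deletes can
    push keys_scanned past the snapshot total.
    """
    if total_keys <= 0:
        return 0.0
    return min(1.0, keys_scanned / total_keys)

print(f"{scan_progress(2500, 10000):.0%}")  # 25%
```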

Traffic Shaping

Introduce delays between pipelined SCAN calls to maintain steady bandwidth usage:

while more_results do
   scan()

   -- Traffic shaping
   sleep(10) 
end

Tune the sleep based on the target traffic profile.
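The pseudocode above can be made concrete in Python. A sketch that spaces successive calls a minimum interval apart (the stand-in `scan_fn` replaces a real client call; the interval value is illustrative):

```python
import time

def paced_calls(scan_fn, min_interval=0.05):
    """Run a full scan while keeping successive SCAN calls at least
    min_interval seconds apart, capping the load on the server."""
    cursor, results = 0, []
    while True:
        started = time.monotonic()
        cursor, batch = scan_fn(cursor)
        results.extend(batch)
        if cursor == 0:
            return results
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)  # traffic shaping

pages = {0: (3, ["a"]), 3: (0, ["b"])}
print(paced_calls(pages.__getitem__, min_interval=0.01))  # ['a', 'b']
```

Sleeping only for the remainder of the interval (rather than a fixed amount) keeps the pace steady even when call latency varies.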

So parameterize scans to suit evolving runtime needs beyond standard options.

Next, let's integrate SCAN with streams for building event pipelines.

Integrating SCAN with Streams

Redis streams act as persistent, append-only message logs supporting a range of use cases. SCAN integration unlocks streaming of scan progress events for long-running scans.

Here is one pattern for integration:

  1. Create a scoped progress stream: XADD scan-progress * ...

  2. Pipeline SCAN ops, with each call publishing a progress event to the stream:

     XADD scan-progress * cursor 72392 count 10000

  3. Consume the scan-progress stream to show status.
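The publishing side of this pattern can be sketched as below. `scan_fn` and `xadd_fn` are hypothetical stand-ins for real client calls, and the stream name mirrors the example above:

```python
def scan_with_progress(scan_fn, xadd_fn, stream="scan-progress"):
    """Publish one progress event per SCAN call, carrying the
    latest cursor and a running key count."""
    cursor, total = 0, 0
    while True:
        cursor, batch = scan_fn(cursor)
        total += len(batch)
        xadd_fn(stream, {"cursor": cursor, "scanned": total})
        if cursor == 0:
            return total

# Fake server pages and an in-memory stand-in for the stream
events = []
pages = {0: (7, ["a", "b"]), 7: (0, ["c"])}
scan_with_progress(pages.__getitem__, lambda s, fields: events.append(fields))
print(events[-1])  # {'cursor': 0, 'scanned': 3}
```

A consumer reading these events can restart a failed iteration from the last published cursor.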

Benefits include:

  • Visibility into scan position across long iterations
  • Failure handling by restarting iteration from last received cursor
  • Pause and resume by saving last consumed event offset

Streams' durable storage prevents loss of progress state during failures. Trim old progress entries (e.g. with XTRIM) once consumers have processed them to keep the stream bounded.

So streams integration provides visibility and tooling around long scans by persisting progress via events.

Alternatives to Redis SCAN

Despite its usefulness, sometimes alternatives better serve specific niche needs:

Lua Foreach

For smaller keyspaces, iterating via Lua avoids network overhead:

local keys = redis.call('KEYS', '*')
for i,k in ipairs(keys) do
    -- operate on k
end

But KEYS blocks the server for the whole call, so this is unsuitable for large keyspaces.

Read Replicas

Offload scans to read replicas to avoid competing with primary write traffic. However, this adds cost, and replica results can lag slightly behind the primary.

Dedicated Redis

For databases exceeding 500 million keys, having a dedicated Redis for scans can prevent interference with primary application traffic.

Keep the scan instance in sync with the primary via standard replication, or via Active-Active (CRDT-based) replication in Redis Enterprise.

So while SCAN hits the sweet spot for a majority of use cases, consider alternatives for niche situations.

Conclusion: SCAN Best Practices

Here is a summary of SCAN best practices based on everything we have covered:

Robustness

  1. Persist cursors and handle failures
  2. Validate COUNT values before passing them to SCAN
  3. Wrap iterations in retry and error handling for stability

Customization

  1. Tune COUNT intelligently
  2. Use MATCH filters judiciously
  3. Shape traffic between pipelined calls

Monitoring

  1. Collect latency and traffic stats
  2. Check memory overheads via INFO
  3. Integrate with streams for observability

Optimization

  1. Parallelize across cluster nodes, one scan per shard
  2. Pipeline batched follow-up commands
  3. Size COUNT to keep calls short on the single-threaded server

So that wraps up this comprehensive guide to mastering Redis SCAN. Follow these tips to gain expertise in leveraging this versatile command for scaling, administration and analytics.
