Redis SCAN is a versatile command that enables powerful keyspace iteration. This comprehensive guide dives deep into SCAN internals, usage patterns, customization tips, and integration with other Redis features. Follow these best practices to expertly harness SCAN for scaling, administration, and analytics.

Diving Into SCAN Internals

Let's first understand what happens behind the scenes when SCAN executes:

  1. Key hash – Redis hashes every key name (using SipHash since Redis 4.0) to assign it to a bucket in the main dictionary's hash table.

  2. Hash table – Keys are stored unordered in a power-of-two-sized bucket array; the apparent "order" of SCAN results is simply bucket order.

  3. Cursor movement – The cursor encodes a bucket index. Each call visits a number of buckets guided by COUNT, advancing the cursor via reverse binary iteration so no bucket is skipped even if the table is resized mid-scan.

  4. Duplication – SCAN guarantees that a full iteration returns every key present for the entire scan, but rehashing can cause some keys to be returned more than once. Redis does not deduplicate across calls; the client must handle repeats.

So in essence, SCAN walks a cursor across the buckets of an internal hash table to incrementally return keys. The cursor value itself encodes scan progress across calls.
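The bucket-walking step can be sketched in Python. This is a simplified model of reverse binary iteration (not Redis source): the cursor's bits are reversed, incremented, and reversed back, which is what lets iteration survive table resizes.

```python
def reverse_bits(x: int, bits: int) -> int:
    """Reverse the low `bits` bits of x."""
    return int(format(x, f"0{bits}b")[::-1], 2)

def next_cursor(cursor: int, table_size: int) -> int:
    """Advance a SCAN-style cursor over a power-of-two-sized table."""
    bits = table_size.bit_length() - 1   # log2(table_size)
    r = reverse_bits(cursor, bits) + 1   # increment in reversed space
    if r >= table_size:
        return 0                         # wrapped around: iteration done
    return reverse_bits(r, bits)

# Walk every bucket of an 8-slot table exactly once
seen, c = [], 0
while True:
    seen.append(c)
    c = next_cursor(c, 8)
    if c == 0:
        break
print(seen)  # [0, 4, 2, 6, 1, 5, 3, 7] -- each bucket visited once
```

Note the visit order is the classic reverse-binary sequence rather than 0, 1, 2, …, which is why SCAN output looks shuffled.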

Knowing these internals helps tune SCAN behavior by targeting specific parameters.

Order of Results

Given bucket ordering, expect key results in a pseudorandom but stable order per database. Replaying the same cursor returns the same keys only if no keys were added, deleted, or rehashed between calls.

COUNT Limits

COUNT is a hint for how much work to do per call — roughly how many buckets to examine from the current cursor position — not an exact number of keys to return. Adapt this value to balance per-call latency against round-trip overhead.

Cursor Statelessness

Cursors hold no server-side state, so there is nothing to expire: the cursor value itself encodes the iteration position. An interrupted scan can be resumed at any time by passing the last returned cursor back to SCAN.
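Because all state lives in the cursor, the client-side iteration loop is trivial. A minimal sketch, using a stand-in `scan_fn` (a dict of fake server replies) in place of a real client call:

```python
def full_scan(scan_fn):
    """Drive a SCAN-style iteration to completion.

    scan_fn(cursor) -> (next_cursor, keys); the iteration is finished
    when the server hands back cursor 0.
    """
    cursor, keys = 0, []
    while True:
        cursor, batch = scan_fn(cursor)
        keys.extend(batch)
        if cursor == 0:
            return keys

# Fake server: three pages, then cursor 0 signals completion
pages = {0: (17, ["a", "b"]), 17: (42, ["c"]), 42: (0, ["d"])}
print(full_scan(pages.__getitem__))  # ['a', 'b', 'c', 'd']
```

The same shape works against a real client by substituting its SCAN call for `scan_fn`.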

Now let's analyze SCAN performance…

Analyzing SCAN Performance

SCAN reads directly from memory, so individual calls are fast. Still, a few tips help when scanning large keyspaces:

Client-Side

  • Pipeline the per-key follow-up commands (TYPE, TTL, DEL, …) to reduce round trips — SCAN calls themselves cannot be pipelined, since each needs the previous reply's cursor
  • Parallelize in Redis Cluster by running an independent SCAN against each node concurrently
  • Persist cursors to resume partial iterations after client crashes
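Cursor persistence from the last bullet can be sketched as follows. This is an illustrative pattern, again with a stand-in `scan_fn`; the checkpoint file path is arbitrary. Note a resumed scan may miss keys written while the client was down, per SCAN's guarantees.

```python
import json
import os
import tempfile

def resumable_scan(scan_fn, state_path):
    """Checkpoint the cursor after every page so a crashed run can resume."""
    cursor = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            cursor = json.load(f)["cursor"]   # resume from checkpoint
    collected = []
    while True:
        cursor, batch = scan_fn(cursor)
        collected.extend(batch)
        with open(state_path, "w") as f:
            json.dump({"cursor": cursor}, f)  # checkpoint this page
        if cursor == 0:
            os.remove(state_path)             # done: clear the checkpoint
            return collected

pages = {0: (17, ["a"]), 17: (0, ["b"])}
path = os.path.join(tempfile.gettempdir(), "scan-cursor.json")
if os.path.exists(path):
    os.remove(path)  # start fresh for this demo
print(resumable_scan(pages.__getitem__, path))  # ['a', 'b']
```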

Server-Side

  • Prefer UNLINK over DEL for scanned keys so memory reclamation happens on a background thread
  • Track memory overhead and allocator fragmentation via INFO memory (mem_fragmentation_ratio) during heavy scan jobs
  • Keep COUNT moderate: SCAN executes on the single main thread, so oversized COUNT values stall other commands

Since SCAN distributes work across calls, throughput is generally consistent even on huge databases. Latency of a single call depends mainly on COUNT and how densely populated the visited buckets are.

Now let's explore SCAN use cases…

Advanced Use Cases

So far we've covered basic keyspace iteration. Now let's tackle some advanced SCAN usage patterns.

Prefix Scans

Scan a logical subset of the keyspace by prefix using MATCH:

SCAN 0 MATCH 'users:100*' COUNT 1000

Note that MATCH is a filter, not a seek: the server still walks buckets in its usual order and discards non-matching keys just before replying, so many calls may return empty result arrays. It remains useful for carving a huge keyspace into prefix-based shards — here, every user id beginning with 100.
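The filter-not-seek semantics can be mimicked client-side with Python's `fnmatch` (an approximation — Redis glob syntax is similar but not identical to fnmatch):

```python
from fnmatch import fnmatchcase

# One page of keys as the server pulled them from the visited buckets
batch = ["users:1001", "users:2002", "users:1003", "sessions:9"]

# MATCH 'users:100*' semantics: glob filtering applied just before the reply
matched = [k for k in batch if fnmatchcase(k, "users:100*")]
print(matched)  # ['users:1001', 'users:1003']
```

The server does exactly this per page, which is why a highly selective pattern still costs a full keyspace walk.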

Cursor Opacity (No Point Seeks)

It may be tempting to seek directly to a position by hand-crafting a cursor:

SCAN 79695648597152144 COUNT 100

Avoid this: the cursor is an opaque bucket index, not a key hash, and the only valid inputs are 0 and cursor values previously returned by the server. Arbitrary cursors yield undefined, possibly incomplete iterations. To target specific keys, fetch them directly or filter with MATCH.

Read/Write Race Handling

During iteration, parallel writes can cause SCAN to miss newly added keys or return some keys more than once. The guarantee is narrower than a snapshot: keys present for the entire iteration are always returned, while keys created or deleted mid-scan may or may not appear. Some mitigation options:

  1. Deduplicate client-side by accumulating results into a set

  2. Re-check each key (e.g. with EXISTS or TYPE) before acting on it, since it may have been deleted after being returned

  3. If strict point-in-time semantics are required, scan a replica with replication paused, or process an RDB snapshot offline

So plan race handling as the application requires.
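Option 1 above is a one-line change to the iteration loop. A sketch with a stand-in `scan_fn` whose pages deliberately repeat a key:

```python
def scan_unique(scan_fn):
    """Accumulate SCAN results into a set so that duplicates a
    mid-scan rehash can legitimately produce are dropped."""
    seen = set()
    cursor = 0
    while True:
        cursor, batch = scan_fn(cursor)
        seen.update(batch)       # set union silently absorbs repeats
        if cursor == 0:
            return seen

# Fake server pages where 'b' is returned twice (as rehashing can cause)
pages = {0: (9, ["a", "b"]), 9: (0, ["b", "c"])}
print(sorted(scan_unique(pages.__getitem__)))  # ['a', 'b', 'c']
```

The trade-off is memory: the set holds every key name seen, which matters on very large keyspaces.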

Atomic Range Deletes

Scan keys matching a pattern (here event:*) and delete them via Lua — the whole script executes atomically, so no writes interleave:

local cursor = '0'
local deleted = 0
repeat
    local r = redis.call('SCAN', cursor, 'MATCH', 'event:*', 'COUNT', 100)
    cursor = r[1]
    if #r[2] > 0 then
        deleted = deleted + redis.call('DEL', unpack(r[2]))
    end
until cursor == '0'
return deleted

Deleting page by page keeps argument lists small (Lua's unpack has a fixed stack limit) and skips the DEL entirely when a page matches nothing. Beware that on a large keyspace the script blocks the server for its full duration; for long jobs, prefer client-side scanning with UNLINK.
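The client-side alternative mentioned above gives up atomicity but never blocks the server for long. A sketch of the loop shape, where `scan_fn`, `delete_fn`, and `should_delete` are hypothetical stand-ins for real client calls:

```python
def scan_and_delete(scan_fn, delete_fn, should_delete, batch_limit=500):
    """Delete matching keys page by page instead of collecting
    every key first, capping each delete call's argument count."""
    cursor, deleted = 0, 0
    while True:
        cursor, batch = scan_fn(cursor)
        doomed = [k for k in batch if should_delete(k)]
        for i in range(0, len(doomed), batch_limit):
            deleted += delete_fn(doomed[i:i + batch_limit])
        if cursor == 0:
            return deleted

# Fake store and fake server pages for demonstration
store = {"event:1": 1, "event:2": 2, "other": 3}
pages = {0: (5, ["event:1", "other"]), 5: (0, ["event:2"])}

def fake_del(keys):
    for k in keys:
        store.pop(k, None)
    return len(keys)

n = scan_and_delete(pages.__getitem__, fake_del,
                    lambda k: k.startswith("event:"))
print(n, store)  # 2 {'other': 3}
```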

So in addition to basic usage, SCAN enables ordered range scans, targeted seeks, race handling, atomic multi-key operations etc. Make sure to validate correctness in concurrent production environments.

Now let's look at customizing and parameterizing scans…

Customizing SCAN Parameters

We discussed the basics of COUNT and MATCH earlier. Beyond those, there are additional ways to customize and parameterize scans:

Dynamic COUNT

Adapt COUNT dynamically by wrapping the scan in Lua. A toy example that derives COUNT from the server clock:

local cursor = ARGV[1]
local count = (redis.call('TIME')[1] % 100) + 100
return redis.call('SCAN', cursor, 'COUNT', count)

Here COUNT varies between 100 and 199. In practice, derive it from a real signal such as recent call latency rather than the clock.
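A latency-driven variant is easy to express client-side. A minimal sketch — the target latency and clamping bounds are illustrative, not recommendations:

```python
def adapt_count(count, last_latency_ms, target_ms=5.0, lo=50, hi=5000):
    """Grow COUNT when the last call ran fast, shrink it when slow,
    clamped to [lo, hi]."""
    if last_latency_ms < target_ms / 2:
        return min(hi, count * 2)    # plenty of headroom: do more per call
    if last_latency_ms > target_ms:
        return max(lo, count // 2)   # too slow: back off
    return count

c = 100
c = adapt_count(c, 1.0)    # fast call  -> doubled to 200
c = adapt_count(c, 12.0)   # slow call  -> halved back to 100
print(c)  # 100
```

Feed the helper each call's measured latency and pass the result as the next COUNT.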

External Filters

SCAN itself only does glob pattern filtering via MATCH (plus, since Redis 6.0, data-type filtering via the TYPE option). For more complex filters — uniqueness, value inspection, TTL thresholds — post-process scan results client-side.

Scan Progress Info

Track scan progress by storing stats like total keys scanned, time elapsed etc. Calculate completion ratio from total database keys in INFO keyspace for progress bar.
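A completion-ratio helper can be sketched as below. The total would come from INFO keyspace (e.g. the db0 keys count); the ratio is approximate, since the keyspace can change mid-scan and SCAN may return duplicates:

```python
def scan_progress(keys_scanned: int, total_keys: int) -> float:
    """Approximate completion ratio for a progress bar.

    Clamped to 1.0 because duplicates and concurrent deletes can
    push keys_scanned past the snapshot total.
    """
    if total_keys <= 0:
        return 0.0
    return min(1.0, keys_scanned / total_keys)

print(f"{scan_progress(2500, 10000):.0%}")  # 25%
```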

Traffic Shaping

Introduce delays between pipelined SCAN calls to maintain steady bandwidth usage:

while more_results do
   scan()

   -- Traffic shaping
   sleep(10) 
end

Tune the sleep based on the target traffic profile.
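The pseudocode above can be made concrete in Python. A sketch that spaces successive calls a minimum interval apart (the stand-in `scan_fn` replaces a real client call; the interval value is illustrative):

```python
import time

def paced_calls(scan_fn, min_interval=0.05):
    """Run a full scan while keeping successive SCAN calls at least
    min_interval seconds apart, capping the load on the server."""
    cursor, results = 0, []
    while True:
        started = time.monotonic()
        cursor, batch = scan_fn(cursor)
        results.extend(batch)
        if cursor == 0:
            return results
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)  # traffic shaping

pages = {0: (3, ["a"]), 3: (0, ["b"])}
print(paced_calls(pages.__getitem__, min_interval=0.01))  # ['a', 'b']
```

Sleeping only for the remainder of the interval (rather than a fixed amount) keeps the pace steady even when call latency varies.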

So parameterize scans to suit evolving runtime needs beyond standard options.

Next, let's integrate SCAN with streams for building event pipelines.

Integrating SCAN with Streams

Redis streams act as persistent, append-only message logs supporting a range of use cases. SCAN integration unlocks streaming of scan progress events for long-running scans.

Here is one pattern for integration:

  1. Create a scoped progress stream: XADD scan-progress * ...

  2. Pipeline SCAN ops, with each call publishing a progress event to the stream:

     XADD scan-progress * cursor 72392 count 10000

  3. Consume the scan-progress stream to show status.
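The publishing side of this pattern can be sketched as below. `scan_fn` and `xadd_fn` are hypothetical stand-ins for real client calls, and the stream name mirrors the example above:

```python
def scan_with_progress(scan_fn, xadd_fn, stream="scan-progress"):
    """Publish one progress event per SCAN call, carrying the
    latest cursor and a running key count."""
    cursor, total = 0, 0
    while True:
        cursor, batch = scan_fn(cursor)
        total += len(batch)
        xadd_fn(stream, {"cursor": cursor, "scanned": total})
        if cursor == 0:
            return total

# Fake server pages and an in-memory stand-in for the stream
events = []
pages = {0: (7, ["a", "b"]), 7: (0, ["c"])}
scan_with_progress(pages.__getitem__, lambda s, fields: events.append(fields))
print(events[-1])  # {'cursor': 0, 'scanned': 3}
```

A consumer reading these events can restart a failed iteration from the last published cursor.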

Benefits include:

  • Visibility into scan position across long iterations
  • Failure handling by restarting iteration from last received cursor
  • Pause and resume by saving last consumed event offset

Streams' durable storage prevents loss of progress state during failures. Trim old progress entries (e.g. with XTRIM) once consumers have processed them to keep the stream bounded.

So streams integration provides visibility and tooling around long scans by persisting progress via events.

Alternatives to Redis SCAN

Despite its usefulness, sometimes alternatives better serve specific niche needs:

Lua Foreach

For smaller keyspaces, iterating via Lua avoids network overhead:

local keys = redis.call('KEYS', '*')
for i,k in ipairs(keys) do
    -- operate on k
end

But KEYS blocks the server for the whole call, so this is unsuitable for large keyspaces.

Read Replicas

Offload scans to read replicas to avoid competing with primary write traffic. However, this adds cost, and replica results can lag slightly behind the primary.

Dedicated Redis

For databases exceeding 500 million keys, having a dedicated Redis for scans can prevent interference with primary application traffic.

Keep the scan instance in sync with the primary via standard replication, or via Active-Active (CRDT-based) replication in Redis Enterprise.

So while SCAN hits the sweet spot for a majority of use cases, consider alternatives for niche situations.

Conclusion: SCAN Best Practices

Here is a summary of SCAN best practices based on everything we have covered:

Robustness

  1. Persist cursors and handle failures
  2. Validate COUNT values before passing them to SCAN
  3. Wrap iterations in retry and error handling for stability

Customization

  1. Tune COUNT intelligently
  2. Use MATCH filters judiciously
  3. Shape traffic between pipelined calls

Monitoring

  1. Collect latency and traffic stats
  2. Check memory overheads via INFO
  3. Integrate with streams for observability

Optimization

  1. Parallelize across cluster nodes, one scan per shard
  2. Pipeline batched follow-up commands
  3. Size COUNT to keep calls short on the single-threaded server

So that wraps up this comprehensive guide to mastering Redis SCAN. Follow these tips to gain expertise in leveraging this versatile command for scaling, administration and analytics.
