Optimize stream ID comparison and endian conversion hot paths by fcostaoliveira · Pull Request #14480 · redis/redis

fcostaoliveira · 2025-10-28T11:30:22Z

The logic added in #14402 introduced overhead in XREADGROUP even when the new feature is not used.

This PR tries to mitigate it by removing unnecessary streamEncodeID() calls and redundant byte-swapping operations from the stream iterator hot path.
By comparing stream IDs directly in native-endian form, we eliminate repeated encoding and memcmp() calls that were responsible for a significant portion of total CPU time during stream iteration.
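
As a rough illustration of the idea (not the actual Redis code), the sketch below contrasts an encode-then-`memcmp()` comparison with a direct comparison of the native integer fields. The `stream_id` type and all function names here are hypothetical stand-ins, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stream ID: milliseconds plus a sequence number. */
typedef struct {
    uint64_t ms;
    uint64_t seq;
} stream_id;

/* Encode an ID as 16 big-endian bytes so that memcmp() orders IDs
 * numerically. Shifts are used so the result does not depend on the
 * byte order of the host. */
void encode_id(unsigned char buf[16], const stream_id *id) {
    for (int i = 0; i < 8; i++) {
        buf[i]     = (unsigned char)(id->ms  >> (56 - 8 * i));
        buf[8 + i] = (unsigned char)(id->seq >> (56 - 8 * i));
    }
}

/* Slower path: encode both IDs, then compare the byte strings. */
int compare_encoded(const stream_id *a, const stream_id *b) {
    unsigned char ba[16], bb[16];
    encode_id(ba, a);
    encode_id(bb, b);
    int c = memcmp(ba, bb, sizeof(ba));
    return (c > 0) - (c < 0);
}

/* Faster path: compare the native integers directly, with no encoding. */
int compare_native(const stream_id *a, const stream_id *b) {
    if (a->ms != b->ms) return a->ms > b->ms ? 1 : -1;
    if (a->seq != b->seq) return a->seq > b->seq ? 1 : -1;
    return 0;
}

/* Sanity check: both paths must agree on the ordering of two IDs. */
void check_paths_agree(const stream_id *a, const stream_id *b) {
    assert(compare_encoded(a, b) == compare_native(a, b));
}
```

Both functions return the same ordering; the native variant simply avoids two encodings and a `memcmp()` per comparison, which is the kind of per-entry work the description says was removed from the iterator.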

A sample VTune profile for streamEncodeID:

| Function Stack | CPU Time: Total | CPU Time: Self | Module | Function (Full) | Source File | Start Address |
| --- | --- | --- | --- | --- | --- | --- |
| streamEncodeID | 7.3% | 0.229s | redis-server | streamEncodeID | t_stream.c | 0x220d80 |
| ↳ intrev64 | 4.2% | 0s | redis-server | intrev64 | endianconv.c | 0x220d90 |

Additionally, endian conversion helpers are modernized to leverage compiler-provided intrinsics (__builtin_bswap*) for single-instruction byte-swaps on supported compilers.
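
A minimal sketch of that pattern, assuming a GCC- or Clang-compatible compiler for the intrinsic path. The macro name `DEMO_BSWAP64` and the helper functions are hypothetical and are not the names used in `src/endianconv.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: reverse the byte order with shifts and masks. */
uint64_t bswap64_portable(uint64_t v) {
    v = ((v & 0x00000000FFFFFFFFULL) << 32) | (v >> 32);
    v = ((v & 0x0000FFFF0000FFFFULL) << 16) |
        ((v >> 16) & 0x0000FFFF0000FFFFULL);
    v = ((v & 0x00FF00FF00FF00FFULL) << 8) |
        ((v >> 8) & 0x00FF00FF00FF00FFULL);
    return v;
}

/* Hypothetical macro: use the compiler builtin where available, since it
 * typically compiles to a single byte-swap instruction; otherwise fall
 * back to the portable version. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_BSWAP64(v) __builtin_bswap64(v)
#else
#define DEMO_BSWAP64(v) bswap64_portable(v)
#endif

/* Thin wrapper so the macro can be called like a normal function. */
uint64_t demo_bswap64(uint64_t v) {
    return DEMO_BSWAP64(v);
}

/* Sanity check: swapping twice is the identity, and both paths agree. */
void check_roundtrip(uint64_t v) {
    assert(DEMO_BSWAP64(DEMO_BSWAP64(v)) == v);
    assert(DEMO_BSWAP64(v) == bswap64_portable(v));
}
```

For example, `demo_bswap64(0x0102030405060708ULL)` yields `0x0807060504030201ULL` on either path.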

Improvements Table

Altogether it leads to a ~10% improvement compared to the latest unstable, as seen below:

| Test Case | Baseline /redis `d4307af` (median obs. +- std.dev) | Comparison filipecosta90/redis intrinsics.bswap (median obs. +- std.dev) | % change (higher-better) | Note |
| --- | --- | --- | --- | --- |
| memtier_benchmark-stream-10M-entries-xreadgroup-count-100 | 5739 | 6325 +- 4.3% (2 datapoints) | 10.2% | Comparison is Better |

sundb · 2025-10-29T06:43:41Z

@filipecosta90 is there a benchmark for it?

Co-authored-by: debing.sun <debing.sun@redis.com>

fcostaoliveira · 2025-11-05T12:40:05Z

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

Using platform named: x86-aws-m7i.metal-24xl for both baseline and comparison.

Using triggering environment: ci for both baseline and comparison.

In summary:

Detected a total of 1 stable tests between versions.
Detected a total of 1 improvements above the improvement water line (Comparison is Better).
- The median improvement (Comparison is Better) was 10.2%, with values ranging from 10.2% to 10.2%.
- Quartile distribution: P25=10.2%, P50=10.2%, P75=10.2%.

You can check a comparison in detail via the grafana link

Comparison between `d4307af` and intrinsics.bswap.

Time Period from 5 months ago. (environment used: oss-standalone)

By GROUP change csv:

command_group,min_change,q1_change,median_change,q3_change,max_change
stream,-0.157,2.434,5.025,7.616,10.208

By COMMAND change csv:

command,min_change,q1_change,median_change,q3_change,max_change

#### Improvements Table

| Test Case | Baseline /redis `d4307af` (median obs. +- std.dev) | Comparison filipecosta90/redis intrinsics.bswap (median obs. +- std.dev) | % change (higher-better) | Note |
| --- | --- | --- | --- | --- |
| memtier_benchmark-stream-10M-entries-xreadgroup-count-100 | 5739 | 6325 +- 4.3% (2 datapoints) | 10.2% | Comparison is Better |

Improvements test regexp names: memtier_benchmark-stream-10M-entries-xreadgroup-count-100

Full Results table:

| Test Case | Baseline /redis `d4307af` (median obs. +- std.dev) | Comparison filipecosta90/redis intrinsics.bswap (median obs. +- std.dev) | % change (higher-better) | Note |
| --- | --- | --- | --- | --- |
| memtier_benchmark-stream-10M-entries-xreadgroup-count-100 | 5739 | 6325 +- 4.3% (2 datapoints) | 10.2% | Comparison is Better |
| memtier_benchmark-stream-10M-entries-xreadgroup-count-100-noack | 37851 | 37791 +- 0.4% (3 datapoints) | -0.2% | No Change |

fcostaoliveira · 2025-11-05T12:40:58Z

> @filipecosta90 is there a benchmark for it?

Yes, I just added the official run in #14480 (comment).

TL;DR: ~10% bump on XREADGROUP.
I updated the main comment to include it.

sggeorgiev

Looks good.

…n overhead in stream propagation (#14516)

As seen in the following flamegraph, even after PR #14480, there is a lot of redundant work when propagating multiple XCLAIMs within an XREADGROUP. This PR refactors streamPropagateXCLAIM to add a new static inline variant, `streamPropagateXCLAIMCopyFree()`, which accepts pre-created `robj*` arguments. This enables reusing argument objects across multiple XCLAIM propagations, reducing repeated creation and destruction costs during high-throughput consumer group operations.
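
The reuse idea can be sketched as follows. This is only an illustration under simplifying assumptions: `demo_obj`, `propagate_naive`, `propagate_shared`, and `count_allocs` are hypothetical names, the argument list is much shorter than a real XCLAIM, and nothing is actually propagated. It only counts how many argument objects each approach creates.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap-allocated string object. */
typedef struct { char *buf; } demo_obj;

static int demo_allocs = 0; /* number of objects created so far */

demo_obj *demo_obj_new(const char *s) {
    size_t n = strlen(s) + 1;
    demo_obj *o = malloc(sizeof(*o));
    assert(o != NULL);
    o->buf = malloc(n);
    assert(o->buf != NULL);
    memcpy(o->buf, s, n);
    demo_allocs++;
    return o;
}

void demo_obj_free(demo_obj *o) {
    free(o->buf);
    free(o);
}

/* Naive variant: every call creates and destroys all of its arguments. */
void propagate_naive(const char *key, const char *group, const char *id) {
    demo_obj *argv[4] = { demo_obj_new("XCLAIM"), demo_obj_new(key),
                          demo_obj_new(group), demo_obj_new(id) };
    /* ... argv would be handed to the propagation layer here ... */
    for (int i = 0; i < 4; i++) demo_obj_free(argv[i]);
}

/* Copy-free variant: the caller owns the arguments that stay constant
 * across entries; only the per-entry ID object is created here. */
void propagate_shared(demo_obj *cmd, demo_obj *key, demo_obj *group,
                      const char *id) {
    demo_obj *argv[4] = { cmd, key, group, demo_obj_new(id) };
    /* ... argv would be handed to the propagation layer here ... */
    demo_obj_free(argv[3]);
}

/* Returns how many objects are created to propagate `entries` claims. */
int count_allocs(int entries, int reuse) {
    char id[32];
    demo_allocs = 0;
    if (reuse) {
        demo_obj *cmd = demo_obj_new("XCLAIM");
        demo_obj *key = demo_obj_new("mystream");
        demo_obj *group = demo_obj_new("mygroup");
        for (int i = 0; i < entries; i++) {
            snprintf(id, sizeof(id), "%d-0", i + 1);
            propagate_shared(cmd, key, group, id);
        }
        demo_obj_free(cmd);
        demo_obj_free(key);
        demo_obj_free(group);
    } else {
        for (int i = 0; i < entries; i++) {
            snprintf(id, sizeof(id), "%d-0", i + 1);
            propagate_naive("mystream", "mygroup", id);
        }
    }
    return demo_allocs;
}
```

In this toy model, propagating 100 entries creates 400 objects in the naive variant and 103 when the constant arguments are shared.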

This is the General Availability release of Redis 8.4 in Redis Open Source.

### Major changes compared to 8.2

- `DIGEST`, `DELEX`; `SET` extensions - atomic compare-and-set and compare-and-delete for string keys
- `MSETEX` - atomically set multiple string keys and update their expiration
- `XREADGROUP` - new `CLAIM` option for reading both idle pending and incoming stream entries
- `CLUSTER MIGRATION` - atomic slot migration
- `CLUSTER SLOT-STATS` - per-slot usage metrics: key count, CPU time, and network I/O
- Redis query engine: `FT.HYBRID` - hybrid search and fused scoring
- Redis query engine: I/O threading with performance boost for search and query commands (FT.*)
- I/O threading: substantial throughput increase (e.g. >30% for caching use cases (10% `SET`, 90% `GET`), 4 cores)
- JSON: substantial memory reduction for homogeneous arrays (up to 91%)

### Binary distributions

- Alpine and Debian Docker images - https://hub.docker.com/_/redis
- Install using snap - see https://github.com/redis/redis-snap
- Install using brew - see https://github.com/redis/homebrew-redis
- Install using RPM - see https://github.com/redis/redis-rpm
- Install using Debian APT - see https://github.com/redis/redis-debian

### Operating systems we test Redis 8.4 on

- Ubuntu 22.04 (Jammy Jellyfish), 24.04 (Noble Numbat)
- Rocky Linux 8.10, 9.5
- AlmaLinux 8.10, 9.5
- Debian 12 (Bookworm), Debian 13 (Trixie)
- macOS 13 (Ventura), 14 (Sonoma), 15 (Sequoia)

### Bug fixes (compared to 8.4-RC1)

- #14524 `XREADGROUP CLAIM` returns strings instead of integers
- #14529 Add variable key-spec flags to SET IF* and DELEX
- #P928 Potential memory leak (MOD-11484)
- #T1801, #T1805 macOS build failures (MOD-12293)
- #J1438 `JSON.NUMINCRBY` - wrong result on integer array with non-integer increment (MOD-12282)
- #J1437 Thread safety issue related to ASM and shared strings (MOD-12013)

### Performance and resource utilization improvements (compared to 8.4-RC1)

- #14480, #14516 Optimize `XREADGROUP`

### Known bugs and limitations

- When executing `FT.SEARCH`, `FT.AGGREGATE`, `FT.CURSOR`, `FT.HYBRID`, `TS.MGET`, `TS.MRANGE`, `TS.MREVRANGE` and `TS.QUERYINDEX` while an atomic slot migration process is in progress, the results may be partial or contain duplicates
- `FT.PROFILE`, `FT.EXPLAIN` and `FT.EXPLAINCLI` don’t contain the `FT.HYBRID` option
- Metrics from the `FT.HYBRID` command aren’t displayed in `FT.INFO` and `INFO`
- The options `EXPLAINSCORE`, `SHARD_K_RATIO`, `YIELD_DISTANCE_AS` and `WITHCURSOR` are not available with `FT.HYBRID`
- Post-filtering (after the `COMBINE` step) using FILTER is not available
- Currently the default response format considers only `key_id` and `score`; this may change to deliver the entire document content

This PR improves stream performance in the range iteration and reply generation paths, benefiting xadd, xrange, xrevrange, and xreadgroup.

- ull2string memcpy optimization
- streamID struct + streamCompareID
- streamID2string + reply path
- getClientType inline + cache locality

Inspired by the high-level description (not the code) of redis/redis#14480.

---------

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>

fcostaoliveira added 2 commits October 27, 2025 16:35

__builtin_bswap64

98ad145

streamIterator: decoded native-endian fields for fast numeric comparison

18e5a02

sundb added this to Redis 8.4 Oct 28, 2025

github-project-automation bot moved this to Todo in Redis 8.4 Oct 28, 2025

sundb reviewed Oct 29, 2025

collinfunk reviewed Oct 29, 2025

sundb moved this from Todo to In Review in Redis 8.4 Oct 30, 2025

fcostaoliveira added 3 commits November 5, 2025 10:29

Merge remote-tracking branch 'origin/unstable' into intrinsics.bswap

cd979aa

Fixes per PR review: simplified REDIS_BSWAP64 usage.

3fb562c

Use htonu64 in streamIteratorStart

7e7a734

sundb reviewed Nov 5, 2025

Update src/stream.h

d4e0428

Co-authored-by: debing.sun <debing.sun@redis.com>

fcostaoliveira requested review from sggeorgiev and sundb November 5, 2025 12:41

sggeorgiev approved these changes Nov 6, 2025

fcostaoliveira mentioned this pull request Nov 7, 2025

Introduce copy-free version streamPropagateXCLAIM to reduce allocation overhead in stream propagation #14516

Merged

sundb approved these changes Nov 7, 2025

sundb merged commit b9ad4f6 into redis:unstable Nov 7, 2025
19 checks passed

github-project-automation bot moved this from In Review to Done in Redis 8.4 Nov 7, 2025

sundb mentioned this pull request Nov 18, 2025

Redis 8.4.0 GA #14546

Merged

nesty92 mentioned this pull request Jan 4, 2026

Optimize streams range hot path valkey-io/valkey#3002

Merged

sundb merged 6 commits into redis:unstable from filipecosta90:intrinsics.bswap

4 participants
