[Vector sets] Endianess fix and speedup of data loading #14144

antirez · 2025-06-23T09:22:39Z

Hello, this is a patch that improves vector sets in two ways:

1. It makes the RDB format compatible with big endian machines: yeah, they are nonexistent nowadays, but it is still better to be correct. The behavior is unchanged on little endian systems. The patch only changes what happens on big endian systems, so that they load and emit the exact same format produced by little endian systems. The implementation was *already largely safe* except for one detail.
2. More importantly, this PR saves each node's worst link score / index in a backward compatible way, and also introduces versioning information for the serialized node encoding, which could be useful in the future. This information was not saved in the past because of a programming error (mine). Now that it is saved, there is no longer any need to compute the worst link info at runtime when loading data. This results in a speed improvement of about 30% when loading data from disk / RESTORE. Saving performance is unaffected.
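As a rough illustration of the second point, a loader can branch on the encoding version: newer payloads carry the worst link index and distance, while older ones fall back to a scan over the links. This is only a sketch under stated assumptions, not the code in this patch. The names `layer_t`, `NODE_ENC_V0`, `NODE_ENC_V1`, `recompute_worst` and `load_worst` are hypothetical, and the sketch assumes the per-link distances are already available in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version tags for the serialized node encoding. */
#define NODE_ENC_V0 0 /* Old encoding: worst link info is not stored. */
#define NODE_ENC_V1 1 /* New encoding: worst link info is stored. */

/* Hypothetical, simplified view of one layer of a node. */
typedef struct {
    uint32_t num_links;     /* Number of links in this layer. */
    const float *link_dist; /* Assumed: distance of each link. */
    uint32_t worst_idx;     /* Index of the farthest link. */
    float worst_distance;   /* Distance of the farthest link. */
} layer_t;

/* Slow path: scan every link to find the farthest one. */
static void recompute_worst(layer_t *l) {
    l->worst_idx = 0;
    l->worst_distance = 0;
    if (l->num_links == 0) return;
    l->worst_distance = l->link_dist[0];
    for (uint32_t i = 1; i < l->num_links; i++) {
        if (l->link_dist[i] > l->worst_distance) {
            l->worst_distance = l->link_dist[i];
            l->worst_idx = i;
        }
    }
}

/* Use the saved values when the encoding has them, otherwise fall
 * back to the scan, so that old payloads still load correctly. */
static void load_worst(layer_t *l, int version,
                       uint32_t saved_idx, float saved_dist) {
    assert(l != NULL);
    if (version >= NODE_ENC_V1) {
        l->worst_idx = saved_idx;
        l->worst_distance = saved_dist;
    } else {
        recompute_worst(l);
    }
}
```

With a layout like this, payloads written by an old version take the slow path and still load, while new payloads skip the scan entirely.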

The patch was tested with care to be sure that data produced by old vector set implementations is loaded without issues (that is, the backward compatibility was hand-tested). The new code is exercised by the persistence test already in the test suite, so no new test was added.

Big endian archs are nonexistent today, but it is better to fix this now in order to have correct code. Note that the only problem was how the floats were set at the same offsets in both little and big endian. Saving/loading the bits of the floats as an integer should be completely safe, since between big endian and little endian *both* integers and floats are byte-reversed: if a float corresponds to integer 328942 in little endian, loading it in big endian will result in the reversed byte pattern, which also matches what we want in the float.
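The idea of moving a float through its integer bit pattern can be sketched as follows. This is a minimal illustration, not the code in this patch. The helper names `float_to_le_bytes` and `float_from_le_bytes` are hypothetical, and the sketch assumes a 32 bit IEEE 754 `float` whose byte order matches the integer byte order of the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the bit pattern of a float as 4 bytes
 * in little endian order, regardless of the host byte order. */
static void float_to_le_bytes(float f, unsigned char buf[4]) {
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &f, sizeof(bits)); /* Reinterpret float as integer. */
    buf[0] = bits & 0xff;            /* Least significant byte first. */
    buf[1] = (bits >> 8) & 0xff;
    buf[2] = (bits >> 16) & 0xff;
    buf[3] = (bits >> 24) & 0xff;
}

/* Hypothetical helper: rebuild the float from 4 little endian bytes. */
static float float_from_le_bytes(const unsigned char buf[4]) {
    uint32_t bits = (uint32_t)buf[0] |
                    ((uint32_t)buf[1] << 8) |
                    ((uint32_t)buf[2] << 16) |
                    ((uint32_t)buf[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof(f));    /* Reinterpret integer as float. */
    return f;
}
```

Because the shifts operate on the integer value rather than on memory, the serialized bytes come out the same on both kinds of host. For example, `1.0f` has the bit pattern `0x3F800000`, so it is written as the bytes `00 00 80 3f`.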

snyk-io · 2025-06-23T09:22:55Z

🎉 Snyk checks have passed. No issues have been found so far.

✅ security/snyk check is complete. No issues have been found. (View Details)

✅ license/snyk check is complete. No issues have been found. (View Details)

lerman25

No comments

antirez · 2025-07-03T08:30:59Z

Thank you for reviewing :)

This is the General Availability release of Redis Open Source 8.2.

### Major changes compared to 8.0

- Streams - new commands: `XDELEX` and `XACKDEL`; extension to `XADD` and `XTRIM`
- Bitmap - `BITOP`: new operators: `DIFF`, `DIFF1`, `ANDOR`, and `ONE`
- Query Engine - new SVS-VAMANA vector index type which supports vector compression
- More than 15 performance and resource utilization improvements
- New metrics: per-slot usage metrics, key size distributions for basic data types, and more

### Binary distributions

- Alpine and Debian Docker images - https://hub.docker.com/_/redis
- Install using snap - see https://github.com/redis/redis-snap
- Install using brew - see https://github.com/redis/homebrew-redis
- Install using RPM - see https://github.com/redis/redis-rpm
- Install using Debian APT - see https://github.com/redis/redis-debian

### Operating systems we test Redis 8.2 on

- Ubuntu 22.04 (Jammy Jellyfish), 24.04 (Noble Numbat)
- Rocky Linux 8.10, 9.5
- AlmaLinux 8.10, 9.5
- Debian 12 (Bookworm)
- macOS 13 (Ventura), 14 (Sonoma), 15 (Sequoia)

### Security fixes (compared to 8.2-RC1)

- (CVE-2025-32023) Fix out-of-bounds write in `HyperLogLog` commands
- (CVE-2025-48367) Retry accepting other connections even if the accepted connection reports an error

### New Features (compared to 8.2-RC1)

- #14141 Keyspace notifications - new event types:
  - `OVERWRITTEN` - the value of a key is completely overwritten
  - `TYPE_CHANGED` - key type change

### Bug fixes (compared to 8.2-RC1)

- #14162 Crash when using evport with I/O threads
- #14163 `EVAL` crash when error table is empty
- #14144 Vector sets - RDB format is not compatible with big endian machines
- #14165 Endless client blocking for blocking commands
- #14164 Prevent `CLIENT UNBLOCK` from unblocking `CLIENT PAUSE`
- #14216 TTL was not removed by the `SET` command
- #14224 `HINCRBYFLOAT` removes field expiration on replica

### Performance and resource utilization improvements (compared to 8.2-RC1)

- #14200 Store iterators on stack instead of on heap
- #14144 Vector set - improve RDB loading / RESTORE speed by storing the worst link info
- #Q6430 More compression variants for the SVS-VAMANA vector index
- #Q6535 `SHARD_K_RATIO` parameter - favor network latency over accuracy for KNN vector query in a Redis cluster (unstable feature) (MOD-10359)

### Modules API

- #14051 `RedisModule_Get*`, `RedisModule_Set*` - allow modules to access Redis configurations
- #14114 `RM_UnsubscribeFromKeyspaceEvents` - unregister a module from specific keyspace notifications

antirez added 2 commits June 19, 2025 10:14

vector sets: serialize worst idx/distance.

1381ba1

lerman25 approved these changes Jul 2, 2025

sundb added the release-notes label (indication that this issue needs to be mentioned in the release notes) Jul 9, 2025

sundb added this to Redis 8.0 Backport and Redis 8.2 Jul 9, 2025

github-project-automation bot moved this to Todo in Redis 8.2 Jul 9, 2025

github-project-automation bot moved this to Todo in Redis 8.0 Backport Jul 9, 2025

sundb approved these changes Jul 10, 2025

sundb merged commit b5d5486 into redis:unstable Jul 10, 2025
18 checks passed

github-project-automation bot moved this from Todo to Done in Redis 8.2 Jul 10, 2025

sundb mentioned this pull request Aug 4, 2025

Redis 8.2.0 GA #14250

Merged

3 participants