[Vector sets] Endianness fix and speedup of data loading #14144
Merged
+80
−16
Big-endian architectures are non-existent today, but it is better to fix this now in order to have correct code. Note that the only problem was how the floats were set at the same offsets on both little-endian and big-endian systems. Saving/loading the bits of the floats as an integer should be completely safe, because between big endian and little endian *both* integers and floats are byte-reversed: if a float corresponds to the integer 328942 on little endian, loading that integer on big endian results in the reversed byte pattern, which is also the pattern we want in the float.
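The reasoning above can be sketched in C. This is a minimal illustration, not the code from the patch: the helper names (`float_to_bits`, `bits_to_float`, `store_u32_le`, `load_u32_le`) are hypothetical, and it assumes `float` is a 32-bit IEEE 754 value. The float's bits are copied into an integer, and the integer is written byte by byte in little-endian order, so the emitted bytes do not depend on the host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 value on the target platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: reinterpret the bits of a float as an integer.
 * memcpy avoids strict-aliasing problems. */
uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Hypothetical helper: the inverse of float_to_bits(). */
float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Write v least significant byte first. Shifts operate on the value,
 * not on memory, so the output is the same on any host. */
void store_u32_le(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read a 32-bit value stored least significant byte first. */
uint32_t load_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

Under these assumptions, `store_u32_le(buf, float_to_bits(1.0f))` produces the bytes `00 00 80 3f` on either kind of host, and `bits_to_float(load_u32_le(buf))` gives back `1.0f`.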
🎉 Snyk checks have passed. No issues have been found so far. ✅ security/snyk check is complete. No issues have been found. ✅ license/snyk check is complete. No issues have been found.
lerman25
approved these changes
Jul 2, 2025
Contributor
lerman25
left a comment
No comments
Contributor
Author
Thank you for reviewing :)
sundb
approved these changes
Jul 10, 2025
sundb
added a commit
that referenced
this pull request
Aug 4, 2025
This is the General Availability release of Redis Open Source 8.2.

### Major changes compared to 8.0

- Streams - new commands: `XDELEX` and `XACKDEL`; extension to `XADD` and `XTRIM`
- Bitmap - `BITOP`: new operators: `DIFF`, `DIFF1`, `ANDOR`, and `ONE`
- Query Engine - new SVS-VAMANA vector index type which supports vector compression
- More than 15 performance and resource utilization improvements
- New metrics: per-slot usage metrics, key size distributions for basic data types, and more

### Binary distributions

- Alpine and Debian Docker images - https://hub.docker.com/_/redis
- Install using snap - see https://github.com/redis/redis-snap
- Install using brew - see https://github.com/redis/homebrew-redis
- Install using RPM - see https://github.com/redis/redis-rpm
- Install using Debian APT - see https://github.com/redis/redis-debian

### Operating systems we test Redis 8.2 on

- Ubuntu 22.04 (Jammy Jellyfish), 24.04 (Noble Numbat)
- Rocky Linux 8.10, 9.5
- AlmaLinux 8.10, 9.5
- Debian 12 (Bookworm)
- macOS 13 (Ventura), 14 (Sonoma), 15 (Sequoia)

### Security fixes (compared to 8.2-RC1)

- (CVE-2025-32023) Fix out-of-bounds write in `HyperLogLog` commands
- (CVE-2025-48367) Retry accepting other connections even if the accepted connection reports an error

### New Features (compared to 8.2-RC1)

- #14141 Keyspace notifications - new event types:
  - `OVERWRITTEN` - the value of a key is completely overwritten
  - `TYPE_CHANGED` - key type change

### Bug fixes (compared to 8.2-RC1)

- #14162 Crash when using evport with I/O threads
- #14163 `EVAL` crash when error table is empty
- #14144 Vector sets - RDB format is not compatible with big endian machines
- #14165 Endless client blocking for blocking commands
- #14164 Prevent `CLIENT UNBLOCK` from unblocking `CLIENT PAUSE`
- #14216 TTL was not removed by the `SET` command
- #14224 `HINCRBYFLOAT` removes field expiration on replica

### Performance and resource utilization improvements (compared to 8.2-RC1)

- #14200 Store iterators on stack instead of on heap
- #14144 Vector set - improve RDB loading / RESTORE speed by storing the worst link info
- #Q6430 More compression variants for the SVS-VAMANA vector index
- #Q6535 `SHARD_K_RATIO` parameter - favor network latency over accuracy for KNN vector query in a Redis cluster (unstable feature) (MOD-10359)

### Modules API

- #14051 `RedisModule_Get*`, `RedisModule_Set*` - allow modules to access Redis configurations
- #14114 `RM_UnsubscribeFromKeyspaceEvents` - unregister a module from specific keyspace notifications
YaacovHazan
pushed a commit
to YaacovHazan/redis
that referenced
this pull request
Sep 29, 2025
Hello, this is a patch that improves vector sets in two ways:

1. It makes the RDB format compatible with big-endian machines. They are non-existent nowadays, but it is still better to be correct. The behavior on little-endian systems is unchanged; only big-endian systems change, so that they load and emit exactly the same format produced by little-endian systems. The implementation was *already largely safe* except for one detail.
2. More importantly, this PR saves each node's worst link score / index in a backward-compatible way, and also introduces versioning information for the serialized node encoding, which could be useful in the future. With this information, which was previously not saved because of a programming error (mine), there is no longer any need to compute the worst link info at runtime when loading data. This results in a speed improvement of about 30% when loading data from disk / RESTORE. Saving performance is unaffected.

The patch was carefully tested to make sure that data produced by old vector sets implementations loads without issues (that is, backward compatibility was hand-tested). The new code is exercised by the persistence test already in the test suite, so no new test was added.
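The backward-compatible loading described in the second point can be sketched as follows. This is only an illustration of the idea, not the actual serialization format: every name here (`demo_node`, `demo_serialized_meta`, `DEMO_ENC_V0`, `DEMO_ENC_V1`, `demo_load_worst`) is hypothetical, and it assumes the "worst" link is the one with the largest distance. A loader checks the encoding version; newer payloads carry the worst link info and it is copied as-is, while legacy payloads fall back to recomputing it.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_LINKS 8
#define DEMO_ENC_V0 0 /* Hypothetical legacy encoding: no worst link info. */
#define DEMO_ENC_V1 1 /* Hypothetical new encoding: worst link info saved. */

/* Hypothetical in-memory node: per-link distances plus cached worst link. */
typedef struct {
    uint32_t num_links;
    float link_dist[DEMO_MAX_LINKS];
    uint32_t worst_idx;
    float worst_dist;
} demo_node;

/* Hypothetical metadata decoded from the serialized node. */
typedef struct {
    uint32_t version;
    uint32_t worst_idx;
    float worst_dist;
} demo_serialized_meta;

/* Fallback for legacy data: scan all links to find the largest distance.
 * Here the distances are already available; a real loader may have to
 * do more work per link, which is the cost the new encoding avoids. */
void demo_recompute_worst(demo_node *n) {
    assert(n->num_links <= DEMO_MAX_LINKS);
    n->worst_idx = 0;
    n->worst_dist = 0;
    for (uint32_t i = 0; i < n->num_links; i++) {
        if (n->link_dist[i] > n->worst_dist) {
            n->worst_dist = n->link_dist[i];
            n->worst_idx = i;
        }
    }
}

/* Use the saved values when the encoding provides them, otherwise
 * recompute, so data written by older versions still loads. */
void demo_load_worst(demo_node *n, const demo_serialized_meta *m) {
    if (m->version >= DEMO_ENC_V1) {
        n->worst_idx = m->worst_idx;
        n->worst_dist = m->worst_dist;
    } else {
        demo_recompute_worst(n);
    }
}
```

The design point is that the version check keeps old payloads loadable while letting new payloads skip the per-node scan entirely.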