@antirez antirez commented Jun 23, 2025

Hello, this is a patch that improves vector sets in two ways:

  1. It makes the RDB format compatible with big endian machines: yeah, they are nonexistent nowadays, but still it is better to be correct. The behavior remains unchanged on little endian systems; it only changes what happens on big endian systems, so that they load and emit the exact same format produced by little endian ones. The implementation was already largely safe but for one detail.

  2. More importantly, this PR saves each node's worst link score / index in a backward compatible way, also introducing versioning information for the serialized node encoding, which could be useful in the future. With this information, which in the past was not saved because of a programming error (mine), there is no longer any need to compute the worst link info at runtime when loading data. This results in a speed improvement of about 30% when loading data from disk / RESTORE. The saving performance is unaffected.

The patch was tested with care to be sure that data produced by old vector sets implementations is loaded without issues (that is, the backward compatibility was hand-tested). The new code is tested by the persistence test already in the test suite, so no new test was added.
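
To make point 2 more concrete, here is a minimal sketch of how a loader can use a per-node encoding version to stay backward compatible. It is not the code in this PR: the `demo_node` struct, the `DEMO_ENC_*` constants and both functions are hypothetical names, and "worst link" is assumed here to mean the link with the largest distance.

```c
#include <stdint.h>

#define DEMO_MAX_LINKS 16
#define DEMO_ENC_V0 0 /* Hypothetical: old encoding, worst link info not saved. */
#define DEMO_ENC_V1 1 /* Hypothetical: worst link index and distance saved. */

/* Hypothetical, simplified node: just the fields needed for the example. */
typedef struct {
    uint32_t num_links;
    float link_dist[DEMO_MAX_LINKS]; /* Distance of each link. */
    uint32_t worst_idx;              /* Index of the worst link. */
    float worst_dist;                /* Distance of the worst link. */
} demo_node;

/* Scan all the links to find the worst one. This is the per-node work
 * that can be skipped at load time when the info is already stored. */
static void demo_compute_worst_link(demo_node *n) {
    n->worst_idx = 0;
    n->worst_dist = 0;
    for (uint32_t j = 0; j < n->num_links; j++) {
        if (j == 0 || n->link_dist[j] > n->worst_dist) {
            n->worst_idx = j;
            n->worst_dist = n->link_dist[j];
        }
    }
}

/* Complete the loading of a node: trust the saved values when the
 * encoding version says they are present, otherwise recompute them
 * like an old payload requires. */
void demo_finish_node_load(demo_node *n, uint32_t version,
                           uint32_t saved_idx, float saved_dist) {
    if (version >= DEMO_ENC_V1) {
        n->worst_idx = saved_idx;
        n->worst_dist = saved_dist;
    } else {
        demo_compute_worst_link(n);
    }
}
```

With this shape, old payloads keep loading through the slower path, while new payloads skip the scan entirely.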

antirez added 2 commits June 19, 2025 10:14
Big endian archs are nonexistent today, but better to fix it
now in order to have correct code. Note that the only problem
was how the floats were set at the same offsets in both little
and big endian. The fact of saving/loading the bits of the
floats as an integer should be completely safe, as in big endian
and little endian *both* integers and floats are reversed, so
if a float corresponds to integer 328942 in little endian, loading
it in big endian will result in the reversed bytes pattern that
also matches what we want in the float.
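
The reasoning above can be sketched in a few lines of C. This is an illustration and not the code of the patch: `save_float_le` and `load_float_le` are hypothetical names, and the sketch assumes a 4 byte IEEE 754 `float` stored with the same byte order as integers on the host, which is the property the commit message relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32 bit integer, without any
 * numeric conversion. Assumes a 4 byte IEEE 754 float. */
static uint32_t float_to_bits(float f) {
    uint32_t u;
    assert(sizeof(f) == sizeof(u));
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Inverse of float_to_bits(). */
static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Hypothetical helper: store the float in buf[0..3], least significant
 * byte first. Shifts work on the integer value, so the bytes written
 * are the same on little and big endian hosts. */
void save_float_le(unsigned char *buf, float f) {
    uint32_t u = float_to_bits(f);
    buf[0] = (unsigned char)(u & 0xff);
    buf[1] = (unsigned char)((u >> 8) & 0xff);
    buf[2] = (unsigned char)((u >> 16) & 0xff);
    buf[3] = (unsigned char)((u >> 24) & 0xff);
}

/* Hypothetical helper: rebuild the integer from the little endian
 * bytes, then reinterpret its bits as a float. */
float load_float_le(const unsigned char *buf) {
    uint32_t u = (uint32_t)buf[0] |
                 ((uint32_t)buf[1] << 8) |
                 ((uint32_t)buf[2] << 16) |
                 ((uint32_t)buf[3] << 24);
    return bits_to_float(u);
}
```

For instance, `save_float_le(buf, 1.0f)` writes the bytes `00 00 80 3f`, since the bit pattern of `1.0f` is `0x3f800000`, and `load_float_le(buf)` gives back `1.0f`.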

@lerman25 lerman25 left a comment

No comments

antirez commented Jul 3, 2025

> No comments

Thank you for reviewing :)

@sundb sundb added the release-notes label (indication that this issue needs to be mentioned in the release notes) Jul 9, 2025
@github-project-automation github-project-automation bot moved this to Todo in Redis 8.2 Jul 9, 2025
@sundb sundb merged commit b5d5486 into redis:unstable Jul 10, 2025
18 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Redis 8.2 Jul 10, 2025
@sundb sundb mentioned this pull request Aug 4, 2025
sundb added a commit that referenced this pull request Aug 4, 2025
This is the General Availability release of Redis Open Source 8.2.

### Major changes compared to 8.0

- Streams - new commands: `XDELEX` and `XACKDEL`; extension to `XADD`
and `XTRIM`
- Bitmap - `BITOP`: new operators: `DIFF`, `DIFF1`, `ANDOR`, and `ONE`
- Query Engine - new SVS-VAMANA vector index type which supports vector
compression
- More than 15 performance and resource utilization improvements
- New metrics: per-slot usage metrics, key size distributions for basic
data types, and more

### Binary distributions

- Alpine and Debian Docker images - https://hub.docker.com/_/redis
- Install using snap - see https://github.com/redis/redis-snap
- Install using brew - see https://github.com/redis/homebrew-redis
- Install using RPM - see https://github.com/redis/redis-rpm
- Install using Debian APT - see https://github.com/redis/redis-debian


### Operating systems we test Redis 8.2 on

- Ubuntu 22.04 (Jammy Jellyfish), 24.04 (Noble Numbat)
- Rocky Linux 8.10, 9.5
- AlmaLinux 8.10, 9.5
- Debian 12 (Bookworm)
- macOS 13 (Ventura), 14 (Sonoma), 15 (Sequoia)

### Security fixes (compared to 8.2-RC1)

- (CVE-2025-32023) Fix out-of-bounds write in `HyperLogLog` commands
- (CVE-2025-48367) Retry accepting other connections even if the
accepted connection reports an error

### New Features (compared to 8.2-RC1)

- #14141 Keyspace notifications - new event types:
  - `OVERWRITTEN` - the value of a key is completely overwritten
  - `TYPE_CHANGED` - key type change

### Bug fixes (compared to 8.2-RC1)

- #14162 Crash when using evport with I/O threads
- #14163 `EVAL` crash when error table is empty
- #14144 Vector sets - RDB format is not compatible with big endian
machines
- #14165 Endless client blocking for blocking commands
- #14164 Prevent `CLIENT UNBLOCK` from unblocking `CLIENT PAUSE`
- #14216 TTL was not removed by the `SET` command
- #14224 `HINCRBYFLOAT` removes field expiration on replica

### Performance and resource utilization improvements (compared to 8.2-RC1)

- #14200 Store iterators on stack instead of on heap
- #14144 Vector set - improve RDB loading / RESTORE speed by storing the
worst link info
- #Q6430 More compression variants for the SVS-VAMANA vector index
- #Q6535 `SHARD_K_RATIO` parameter - favor network latency over accuracy
for KNN vector query in a Redis cluster (unstable feature) (MOD-10359)

### Modules API

- #14051 `RedisModule_Get*`, `RedisModule_Set*` - allow modules to
access Redis configurations
- #14114 `RM_UnsubscribeFromKeyspaceEvents` - unregister a module from
specific keyspace notifications
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 29, 2025