Skip to content

Conversation

@sundb
Copy link
Collaborator

@sundb sundb commented Jun 12, 2024

Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.

  1. When the latest tombstone is before the first entry, the tombstone may stil be after the last id of consume group.
  2. When a tombstone is after the last id of consume group, the group's counter will be invalid, we should caculate the entries_read by using estimates.

@sundb sundb requested review from itamarhaber and oranagra June 12, 2024 07:20
@sundb sundb changed the title Fix incorrect lag firled in XINFO when tombstone is after the last_id of consume group Fix incorrect lag field in XINFO when tombstone is after the last_id of consume group Jun 12, 2024
@sancar
Copy link

sancar commented Jun 12, 2024

@sundb

I tried the following set of commands on your branch :

127.0.0.1:6379> XGROUP CREATE planets processing $ MKSTREAM
OK
127.0.0.1:6379> XADD planets 0-1 name Mercury
"0-1"
127.0.0.1:6379> XADD planets 0-2 name Venus
"0-2"
127.0.0.1:6379> XREADGROUP GROUP processing alice STREAMS planets >
1) 1) "planets"
   2) 1) 1) "0-1"
         2) 1) "name"
            2) "Mercury"
      2) 1) "0-2"
         2) 1) "name"
            2) "Venus"
127.0.0.1:6379> XADD planets 0-3 name Earth
"0-3"
127.0.0.1:6379> XADD planets 0-4 name Jupiter
"0-4"
127.0.0.1:6379> XDEL planets 0-1
(integer) 1
127.0.0.1:6379> XDEL planets 0-2
(integer) 1
127.0.0.1:6379> XDEL planets 0-3
(integer) 1
127.0.0.1:6379> XINFO GROUPS planets
1)  1) "name"
    2) "processing"
    3) "consumers"
    4) (integer) 1
    5) "pending"
    6) (integer) 2
    7) "last-delivered-id"
    8) "0-2"
    9) "entries-read"
   10) (integer) 2
   11) "lag"
   12) (integer) 1

The lag is actually correct. There is single item that this group needs to read, but the behavior is different than what is described in the spec. I was expecting to see Nil there by looking at the spec here -> https://redis.io/docs/latest/commands/xinfo-groups/

There are two special cases in which this mechanism is unable to report the lag:
One or more entries between the group's last-delivered-id and the stream's last-generated-id were deleted (with [XDEL](https://redis.io/docs/latest/commands/xdel/) or a trimming operation).

The second case seems to be fixed. The one that I directly deleted 0-3 without deleting 0-2 and 0-1.

@sundb sundb marked this pull request as ready for review June 13, 2024 04:57
@sancar
Copy link

sancar commented Jun 13, 2024

My tests seem to be all passing now.

@sundb
Copy link
Collaborator Author

sundb commented Jun 13, 2024

@sancar thanks, i'd like to see others' opinions.

…n last-delivered-id and stream's last-generated-id
Copy link
Member

@itamarhaber itamarhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been way too long since I looked at C code, let alone mine :P

Everything makes perfect sense - I totally missed this case, and the fix lgtm.

@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Jun 17, 2024
@sundb sundb merged commit 93fb83b into redis:unstable Jul 30, 2024
sundb added a commit that referenced this pull request Aug 16, 2024
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR #13338
@YaacovHazan YaacovHazan mentioned this pull request Sep 11, 2024
YaacovHazan added a commit that referenced this pull request Sep 12, 2024
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- #12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- #12395 Optimize `SCAN` command when matching data type

### Bug fixes

- #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- #13489 `ACL CAT` - return module commands
- #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- #13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- #13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- #13470 On `HDEL` of last field - update the global hash field
expiration data structure
- #13465 Cluster: Pass extensions to node if extension processing is
handled by it
- #13443 Cluster: Ensure validity of myself when loading cluster config
- #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- #13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- #13505 Optimize `STRING` datatype write commands
- #13499 Optimize `SMEMBERS` command
- #13494 Optimize `GEO*` commands reply
- #13490 Optimize `HELLO` command
- #13488 Optimize client query buffer
- #12395 Optimize `SCAN` command when matching data type
- #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- #13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- #13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- #13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- #13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
YaacovHazan pushed a commit that referenced this pull request Oct 30, 2024
…of consume group (#13338)

Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
YaacovHazan pushed a commit that referenced this pull request Oct 30, 2024
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR #13338
This was referenced Oct 30, 2024
YaacovHazan pushed a commit that referenced this pull request Nov 4, 2024
…of consume group (#13338)

Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
YaacovHazan pushed a commit that referenced this pull request Nov 4, 2024
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR #13338
This was referenced Nov 4, 2024
@YaacovHazan YaacovHazan mentioned this pull request Jan 6, 2025
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
…of consume group (#13338)

Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR #13338
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
…of consume group (#13338)

Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR #13338
@SF97
Copy link

SF97 commented Jan 7, 2025

Hello

Does this fix also resolve #11646 ?

Thank you :)

@sundb
Copy link
Collaborator Author

sundb commented Jan 8, 2025

@SF97 yes, related.

@YaacovHazan YaacovHazan moved this from Todo to Done in Redis 7.4 Backport Apr 21, 2025
@YaacovHazan YaacovHazan moved this from In Progress to Done in Redis 7.2 Backport Apr 21, 2025
@sundb sundb removed this from Redis 8.2 Aug 15, 2025
@sundb sundb added this to Redis 8.0 Aug 15, 2025
@sundb sundb moved this from Todo to Done in Redis 8.0 Aug 15, 2025
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
…of consume group (redis#13338)

Fix redis#13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR redis#13338
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- redis#12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- redis#12395 Optimize `SCAN` command when matching data type

### Bug fixes

- redis#13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- redis#13489 `ACL CAT` - return module commands
- redis#13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- redis#13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- redis#13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- redis#13470 On `HDEL` of last field - update the global hash field
expiration data structure
- redis#13465 Cluster: Pass extensions to node if extension processing is
handled by it
- redis#13443 Cluster: Ensure validity of myself when loading cluster config
- redis#13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- redis#13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- redis#13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- redis#13505 Optimize `STRING` datatype write commands
- redis#13499 Optimize `SMEMBERS` command
- redis#13494 Optimize `GEO*` commands reply
- redis#13490 Optimize `HELLO` command
- redis#13488 Optimize client query buffer
- redis#12395 Optimize `SCAN` command when matching data type
- redis#13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- redis#13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- redis#13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- redis#13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- redis#13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] Redis Streams XINFO Lag field

5 participants