Fix XTRIM/XADD with approx not deletes entries for DELREF/ACKED strategies#14623
Fix XTRIM/XADD with approx not deletes entries for DELREF/ACKED strategies#14623sundb merged 6 commits intoredis:unstablefrom
Conversation
|
Currently, when using XTRIM with approximate mode (~) and DELREF/ACKED deletion strategies, no messages are deleted. The code only deletes full nodes when the strategy is not DELREF/ACKED. This PR attempts to fix that issue. However, there appears to be a fundamental conflict: approximate mode is designed to delete entire nodes from the stream radix tree to speed up deletion. But DELREF/ACKED strategies require checking each node messages before deletion. This means approximate mode and DELREF/ACKED seem inherently incompatible—the former prioritizes speed by removing full nodes, while the latter requires granular inspection that defeats that optimization. |
|
Perhaps we should ignore the approximate mode when DELREF/ACKED is used (and clarify this in the documentation)? |
|
@LiorKogan agree with this way. |
|
@LiorKogan @sundb @sggeorgiev if it's not too late, maybe we should block the combo "~" + "DELREF/ACKED"? that way it'll be very clear to the user that if he wants DELREF/ACKED he can't use "~" |
|
@guybe7 - It is probably too late because we already released it in 8.2 in August. Anyway, ignoring the approximate flag causes no harm. |
|
OK, in that case let's merge this PR and add a note in the documentation that the approximate flag is ignored when using DELREF/ACKED strategies. |
## Summary This bus was introduced by #14623 Before PR #14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue #8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`.
I came upon this PR after running into the issue that it fixes, but I wonder about the above. I am not familiar with the internals of redis, but I would think that there are two parts in doing an ACKED delete:
It would seem that an approximate ACKED trim would still need to do 1, but could prevent having to do 2, which might still make it more efficient than a non-approximate ACKED trim (and more featureful than a KEEPREF delete, since AFAICS that will also delete items that are never seen by all consumer groups. Or am I misunderstanding this?). |
@matthijskooijman Maybe you have misunderstood. These two ways are the same. It's not about checking first and then deleting, but checking and deleting simultaneously. Moreover, the memory won't be moved; it's just set to the deletion flag. |
Ah, thanks for clarifying. So then there is some extra bookkeeping to know that all deletion flags in a node are set and the entire node can be deleted? And then in KEEPREF mode, approximate is useful because based on the MINID or MAXLEN you can just decide what nodes to delete based on their ID ranges, without having to look into the nodes themselves? What I am looking for is an efficient way to delete old items, but prevent deleting items that are not seen/processed yet. Apparently How would you achieve this then? Thinking on it, and looking at KEEPREF, would it make sense to run XTRIM ~ KEEPREF with MINID based on the oldest message not yet received by all consumer groups? This MINID could be determined by looking at the consumer group metadata only, without having to look at individual entries. This might end up deleting entries that were received but not ACKed yet, but IIUC then KEEPREF ensures thee entries are kept around in the pel list until they are ACKed, right? So this would effectively be the same as doing ~ ACKED? Is this a correct interpretation? If so, is there a way to do this automatically, like a special entry id to pass to MINID to mean "the highest ID seen by all consumer groups", or would I need to parse XINFO output for this? Maybe this is a bit off-topic for this PR, but hopefully you'll help me understand how this works :-) |
…egies (redis#14623) This bug was introduced by redis#14130 and found by guybe7 When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete strategies, if a node was eligible for removal but couldn't be removed directly (because consumer group references need to be checked), the code would incorrectly break out of the loop instead of continuing to process entries within the node. This fix allows the per-entry deletion logic to execute for eligible nodes when using non-KEEPREF strategies.
…egies (redis#14623) This bug was introduced by redis#14130 and found by guybe7 When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete strategies, if a node was eligible for removal but couldn't be removed directly (because consumer group references need to be checked), the code would incorrectly break out of the loop instead of continuing to process entries within the node. This fix allows the per-entry deletion logic to execute for eligible nodes when using non-KEEPREF strategies.
## Summary This bus was introduced by redis#14623 Before PR redis#14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue redis#8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`.
This bug was introduced by #14130 and found by @guybe7
When using XTRIM/XADD with approx mode (~) and DELREF/ACKED delete strategies, if a node was eligible for removal but couldn't be removed directly (because consumer group references need to be checked), the code would incorrectly break out of the loop instead of continuing to process entries within the node. This fix allows the per-entry deletion logic to execute for eligible nodes when using non-KEEPREF strategies.