XADD and XTRIM, Trim by MINID, and new LIMIT argument by guybe7 · Pull Request #8169 · redis/redis

guybe7 · 2020-12-09T17:52:47Z

This PR adds an additional trimming strategy to XADD and XTRIM named MINID (complements the existing MAXLEN).
It also adds a new LIMIT argument that allows incremental trimming by repeated calls (rather than all at once).

This provides the ability to trim all records older than a certain ID (which makes it possible for the user to trim by age too).
Example:
XTRIM mystream MINID ~ 1608540753 will trim entries with id < 1608540753, but might not trim all (because of the ~ modifier)

The purpose is to ease the use of streams. many users use streams as logs and the common case is wanting a log of the last X seconds rather than a log that contains maximum X entries (new MINID vs existing MAXLEN)

The new LIMIT modifier is only supported when the trim strategy uses ~. i.e. when the user asked for exact trimming, it all happens in one go (no possibility for incremental trimming).
However, when ~ is provided, we trim full rax nodes, up to the limit number of records.
The default limit is 100*stream_node_max_entries (used when LIMIT is not provided).
I.e. this is a behavior change (even if the existing MAXLEN strategy is used).
An explicit limit of 0 means unlimited (but note that it's not the default).

Other changes:

Refactor arg parsing code for XADD and XTRIM to use common code.

oranagra · 2020-12-16T11:50:51Z

WOW. that's a lot of code / changes.
while i start to review it, i wanna ask two things:

please update the top comment with a summary of what this PR brings. both a summary about the new features (syntax) and their purpose (justification).
for easier review (now and in the future), the PR, and even commit comment, better contain a short summary about the code changes. i.e. in case they contain a refactory, or anything else that can make either reviewing the changes easier. possibly just a short checklist of changes that can be used instead of bothering to read the code.

src/t_stream.c

zuiderkwast

Very nice!

It could be worth mentioning the refactoring of args parsing in the PR description, since it affects many lines but it's mostly moved code, so the change isn't actually that large.

Is MAXNODES meant to be an undocumented option? It seems to provide the equivalent of SCAN's COUNT but it's hard to know how many entries there are per node.

Most of my comments are about documentation. It would be good to document how to call XTRIM repeatedly until everything is trimmed, like how SCAN is called repeatedly until it returns 0.

Disclaimer: I'm not very familiar with Redis source code.

zuiderkwast · 2020-12-22T12:47:15Z

src/t_stream.c

Typo? With the approx option, the stream may contain entries with IDs < 'id'.

zuiderkwast · 2020-12-22T12:51:03Z

src/t_stream.c

Mention the effect and purpose of maxnodes in the documentation above the function?

zuiderkwast · 2020-12-22T13:28:58Z

src/t_stream.c

Nice refactoring (breaking out the XADD args parsing into a separate function and unifying it with the XTRIM args parsing). This might also make it easier to reuse code for implementing a stream module API in the future.

But please keep a comment about what i is here. It was declared like this:

int i = 2; /* This is the first argument position where we could find an option, or the ID. */

zuiderkwast · 2020-12-22T13:31:10Z

src/t_stream.c

Some documentation of this function would be nice, e.g. the meaning of the args and the return value.

Perhaps a matter of taste, but maybe return i+1 on success instead of return i? Then the return value points to the position after the args handled by this function, which might easier to document and understand. In xaddCommand(), it would be the position of the first field arg.

zuiderkwast · 2020-12-22T13:59:39Z

src/t_stream.c

Logic seems right (AFAIK). Just an idea of structuring: Can it be done like

check for MAXLEN/MINID/MAXAGE and set args->trim_strategy.

common code for "~" and "="

various checks (without using strcasecmp() again)

the problem is that i don't know where exactly the trim_strategy will be, because i could get NOMKSTREAM for example or *
if this function would handle only trim strategies (i.e. the first arg is always the trim strategy) then it would make more sense to parse like you suggested

I think you're right. Refactoring like this won't make it simpler.

zuiderkwast · 2020-12-22T14:04:39Z

src/t_stream.c

Needs to be documented.

zuiderkwast · 2020-12-22T14:10:49Z

src/t_stream.c

Also add MAXNODES? Even if it's undocumented, it should be documented here in the source code IMO.

zuiderkwast · 2020-12-22T14:39:33Z

src/t_stream.c

Also add MAXNODES here?

I assume MAXNODES refers to the number of radix tree nodes. Approximately how many entries are there per node?

Btw, adding a default value of 100 for MAXNODES affects the legacy MAXLEN option, so even if = is given, MAXLEN will not enforce the maximum length of the stream anymore. (Instead MAXLEN actually specifies the minimum length to keep when trimming.) I think the change is to the better since it limits the execution time in corner cases, but I think it needs to be documented.

The docs for MAXAGE and MINID should also have some note saying something like that the limit is not enforced (for performance reasons or whatever, even if MAXNODES isn't documented) even when = is given, but that the actual guarantee is that entries back to the given age or ID are kept when trimming.

guybe7 · 2020-12-23T08:34:28Z

@zuiderkwast thanks for reviewing, looks like you put some time into it so thank you! going through your comment now...

zuiderkwast

That was a fast follow-up! Good documentation!

The rest is a matter of taste, I suppose. I'm not a maintainer or anything, so don't take my comments for the truth.

zuiderkwast · 2020-12-23T09:50:05Z

src/t_stream.c

Missing end quote after the first word 'maxnodes

Period at end of sentense.

zuiderkwast · 2020-12-23T09:50:37Z

src/t_stream.c

the should -> that should

zuiderkwast · 2020-12-23T09:59:30Z

src/t_stream.c

When seeing them together like this, an assymentry becomes apparent: "MAX" in MAXNODES refers to what to delete, but "MAX" in MAXLEN and MAXAGE refers to what to keep.

Maybe MAXNODES can be renamed to something to avoid this confusion? Ideas: LIMIT, TRIMLIMIT, MAXTRIM, TRIMNODES.

zuiderkwast · 2020-12-23T10:10:41Z

src/t_stream.c

Oops, I didn't realize that ID is also after what's handled by streamParseAddOrTrimArguments()... So maybe I change my mind and instead I think streamParseAddOrTrimArguments() should just return i. The point of the previous comment was to return the position after the args handled by the function but I didn't realize that return i already did that, so I'm sorry for this review mistake!

If you agree, the code in xaddCommand() would be something like the below (like in the first version, just renaming i to idpos). What do you think?

int idpos = streamParseAddOrTrimArguments(c, &parsed_args, 1); if (idpos < 0) return; /* streamParseAddOrTrimArguments already replied. */ int field_pos = idpos+1;

guybe7 · 2020-12-23T12:32:11Z

@zuiderkwast pushed another commit, decided to go with LIMIT <entries> - the users shouldn't know what "radix tree nodes" means...

zuiderkwast

LGTM!

guybe7 · 2020-12-23T17:09:14Z

@redis/core-team and specifically @yossigo this PR introduces a new arg for trimming, LIMIT which limits the number of trimmed entries.
we already have the ~ modifier to limit deletions within a rax node, so another limiting mech could be confusing...

we have 3 options:

keep LIMIT, which has no meaning in case user asked for "exact" trimming (no ~). in case ~ was provided the trimming may stop even if there are still whole nodes to trim.
drop LIMIT, and if ~ was provided, limit the number of deleted entries by some constant
never limit the number of whole nodes that may be trimmed (behavior is unstable). this might bite us in case user decides to trim by ID and the stream is very long.

oranagra · 2020-12-23T20:15:49Z

some additional context for whoever's missing it (don't bother to read the code, it's not final yet, and includes some refactoring to reduce code duplication)
This PR adds two things:

an additional "TRIM_STRATEGY": adding MINID in addition to the existing MAXLEN (ignore MAXAGE strategy, we're probably gonna drop it).
a way to limit the amount of entries trimmed in one call (so the call won't choke for long), this is the topic of the question Guy posted.

We already have a LIMIT / COUNT argument for many other commands, which limits the amount of work one call does (like in SCAN).
But in these streams commands, Salvatore already added a way for the user to hint to redis if it wants an exact trimming (using =) or an approximate trimming (using ~) if that can be done more efficiently.
Behind the scenes, this ~ flag just determines if we delete records inside a listpack, or just whole listpacks, but the user doesn't necessarily knows that.

One option (#1) is to add the LIMIT argument, which can work on both approximate and exact trimming, and possibly have a default of 1000 for the approximate trimming, and a default of unlimited for the exact trimming. this may be an acceptable change of behavior compared to previous versions (since the user hinted he's not after an exact result), while still attempting to be save innocent users from long loops. but if we don't want that, we can decide to have a default of unlimited for both.

Second option (#2) is to avoid changing the command API (not add a LIMIT argument), and then, if the user wanted exact trimming, we could do just that (not limit on the count that's trimmed, and it can in theory block for long time), and if the user did ask for approximate trimming, we can use a hard coded limit of 1000 or so. this means that unlike the past, now repeated calls to the approximate trimming will incrementally do more work. but it also means that unlike the past a single call will not necessarily bring you close to your target.

Last option (#3) is to completely drop that LIMIT idea, and just let redis choke if the user attempts to trim a lot of entries at once.

zuiderkwast · 2020-12-26T16:25:56Z

I like #2 the most. Exactness or efficiency. There isn't really any need for anything in between.

If trimming is done regularly with ~, it will always catch up eventually.
If trimming is not done regularily on xadd, but instead e.g. on a daily basis, choking may be a problem even with MAXLEN, so adding anti-choking to ~ is a nice improvement. Users would need to update their jobs to call XTRIM repeatedly until it returns zero (or close to zero). If there is a risk of backward-compatibility problems, perhaps a configuration options or a compile-time option to tweak the trim limit would be a way forward.

MAXAGE as a simple short-hand for MINID can be convenient for users. They can put a fixed number in the command instead of doing a computation with current time each time.

Fix redis#6640 redis#4450 redis#5951

tests/tmp/.gitignore

src/t_stream.c

guybe7 · 2021-01-06T09:28:33Z

redis/redis-doc#1488

src/t_stream.c

madolson

API Looks fine to me. Did not look at the code.

Co-authored-by: Itamar Haber <itamar@redislabs.com>

yossigo

@guybe7 Sorry for responding late, this was lost in the noise. I didn't see any continuation of the discussion regarding the limit options, but I understand the explicit option was chosen. Did we document why?

I actually lean towards not exposing it, or at least define it as an abstract hint of the amount of work rather than anything more specific.

oranagra · 2021-01-07T14:33:53Z

@yossigo after a week with no response to our beg for feedback, we decided to proceed with what seemed best to us.
We chose option #1, have a LIMIT with decent defaults.
As far as i remember, the reason is that with decent defaults, it'll behave as best as we can hope for innocent users that did not provide LIMIT, and the behavior change it brings seems acceptable. but we still allow power users, with special needs to get control over it. (otherwise, someone that needs an immediate effect, may be out of options, or actually needs to write a script).

we can still re-open it for discussion, but let's make it a quick one.
do you have a suggested solution and a winning argument?

p.s. please note that the docs already have some reference to "macro nodes" and "radix tree". (if it helps our decision in some way)

guybe7 · 2021-01-07T14:35:33Z

worth mentioning the the LIMIT is given in number of entries, not in nodes

yossigo · 2021-01-07T15:07:49Z

@oranagra @guybe7 Would it make a big difference to define the LIMIT as an arbitrary work value and map it internally to something sensible? Like nodes, or some factor on the number of entries? This is how SCAN refers to COUNT and I think it's a good pattern to follow.

itamarhaber · 2021-01-07T16:07:34Z

I'm all for abstracting the macro nodes from the docs and API, especially since in the future we may have more considerations. Honestly, I was struggling with this sentence https://github.com/redis/redis-doc/pull/1488/files#diff-0acba95bcc02e956409a1e8c7cbfdd9cd1e06bc35be49375c794a4cc99251329R39

That said, SCAN's COUNT is a source of confusion for many a users, exactly because of its vagueness. It would have better been called EFFORT.

oranagra · 2021-01-07T16:19:20Z

i also don't like the reference to macro nodes, but the reason i brought it up here is just a distraction. (mainly so we don't carve it in stone, referring to it again).

the current LIMIT option actually refers to stream records, not macro nodes, and i think that's actually in line with what SCAN does.
both SCAN and XTRIM will stop whenever they feel like, and the LIMIT is just an raw advise, but it refers to something the use is aware of (keys or records), and not some vague 0..100 value or some other guideline for effort.

if anyone has any concrete suggestion, please share it.
or if you think going for option #2 or #3 above is desirable, please say so and explain why you prefer it.

personally i think 3 is too restrictive, and 2 is breaking the old behavior too much.
so i think we do need to add a parameter.. only question is which? the current one looks like a good compromise to me (which is why i promoted it).

so please try to come up with a concrete alternative.

yossigo

@oranagra Your last arguments make sense, agreed.

This PR adds another trimming strategy to XADD and XTRIM named MINID (complements the existing MAXLEN). It also adds a new LIMIT argument that allows incremental trimming by repeated calls (rather than all at once). This provides the ability to trim all records older than a certain ID (which makes it possible for the user to trim by age too). Example: XTRIM mystream MINID ~ 1608540753 will trim entries with id < 1608540753, but might not trim all (because of the ~ modifier) The purpose is to ease the use of streams. many users use streams as logs and the common case is wanting a log of the last X seconds rather than a log that contains maximum X entries (new MINID vs existing MAXLEN) The new LIMIT modifier is only supported when the trim strategy uses ~. i.e. when the user asked for exact trimming, it all happens in one go (no possibility for incremental trimming). However, when ~ is provided, we trim full rax nodes, up to the limit number of records. The default limit is 100*stream_node_max_entries (used when LIMIT is not provided). I.e. this is a behavior change (even if the existing MAXLEN strategy is used). An explicit limit of 0 means unlimited (but note that it's not the default). Other changes: Refactor arg parsing code for XADD and XTRIM to use common code.

## Summary This bus was introduced by #14623 Before PR #14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue #8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`.

## Summary This bus was introduced by redis#14623 Before PR redis#14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue redis#8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`.

guybe7 requested a review from oranagra December 9, 2020 17:52

guybe7 commented Dec 16, 2020

View reviewed changes

src/t_stream.c Outdated Show resolved Hide resolved

zuiderkwast reviewed Dec 22, 2020

View reviewed changes

guybe7 force-pushed the trim_by_id branch from e37520e to cda7677 Compare December 23, 2020 09:47

zuiderkwast reviewed Dec 23, 2020

View reviewed changes

guybe7 added 3 commits December 30, 2020 10:05

Streams: Trim by ID

cf639bb

Fix redis#6640 redis#4450 redis#5951

CR fixes

aeafe08

CR fixes 2

54c8a24

guybe7 force-pushed the trim_by_id branch 3 times, most recently from db91d33 to 64245e4 Compare December 30, 2020 17:34

oranagra reviewed Dec 30, 2020

View reviewed changes

tests/tmp/.gitignore Outdated Show resolved Hide resolved

src/t_stream.c Outdated Show resolved Hide resolved

CR fixes 3

b1bc56b

guybe7 force-pushed the trim_by_id branch from 64245e4 to b1bc56b Compare December 30, 2020 21:39

fix default limit without approx bug

0100ca0

oranagra previously approved these changes Dec 31, 2020

View reviewed changes

oranagra changed the title ~~Streams: Trim by ID~~ XADD and XTRIM, Trim by MINID, and new LIMIT argument Dec 31, 2020

add more tests

2634bdf

itamarhaber reviewed Jan 6, 2021

View reviewed changes

src/t_stream.c Outdated Show resolved Hide resolved

src/t_stream.c Outdated Show resolved Hide resolved

src/t_stream.c Show resolved Hide resolved

src/t_stream.c Outdated Show resolved Hide resolved

madolson previously approved these changes Jan 7, 2021

View reviewed changes

modify wording in syntax error responses for conflicting arguments

1037fc3

Co-authored-by: Itamar Haber <itamar@redislabs.com>

oranagra dismissed stale reviews from madolson and themself via 1037fc3 January 7, 2021 07:03

oranagra previously approved these changes Jan 7, 2021

View reviewed changes

hickup

55552f4

oranagra dismissed their stale review via 55552f4 January 7, 2021 07:10

oranagra approved these changes Jan 7, 2021

View reviewed changes

yossigo reviewed Jan 7, 2021

View reviewed changes

itamarhaber approved these changes Jan 7, 2021

View reviewed changes

yossigo approved these changes Jan 7, 2021

View reviewed changes

oranagra merged commit 814aad6 into redis:unstable Jan 8, 2021

This was referenced Jan 8, 2021

Streams feature request: MAXAGE optional argument for XADD #4450

Closed

Feature Request: Time to Live for Redis Stream Entries #5951

Closed

Redis 6.2 rc2 #8305

Merged

mp911de mentioned this pull request Jan 13, 2021

Add support for MINID trimming strategy and the LIMIT argument to XADD and XTRIM redis/lettuce#1582

Closed

riaan53 mentioned this pull request Jan 18, 2021

Add support for MINID trimming strategy and the LIMIT argument to XADD and XTRIM redis/go-redis#1605

Closed

Leoncanva mentioned this pull request Nov 14, 2022

XTrimArgs Should Allow Limit = 0 redis/lettuce#2250

Closed

kijanawoodard mentioned this pull request Jul 4, 2025

add minId overloads StackExchange/StackExchange.Redis#2842

Closed

sundb mentioned this pull request Jan 7, 2026

Fix UBSan error in stream trim when processing last entry #14669

Merged

Conversation

guybe7 commented Dec 9, 2020 • edited by oranagra Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oranagra commented Dec 16, 2020

Uh oh!

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guybe7 Dec 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guybe7 commented Dec 23, 2020

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guybe7 commented Dec 23, 2020

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

guybe7 commented Dec 23, 2020

Uh oh!

oranagra commented Dec 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zuiderkwast commented Dec 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guybe7 commented Jan 6, 2021

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

madolson left a comment

Choose a reason for hiding this comment

Uh oh!

yossigo left a comment

Choose a reason for hiding this comment

Uh oh!

oranagra commented Jan 7, 2021

Uh oh!

guybe7 commented Jan 7, 2021

Uh oh!

yossigo commented Jan 7, 2021

Uh oh!

itamarhaber commented Jan 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

guybe7 commented Dec 9, 2020 •

edited by oranagra

Loading

guybe7 Dec 23, 2020 •

edited

Loading

oranagra commented Dec 23, 2020 •

edited

Loading

zuiderkwast commented Dec 26, 2020 •

edited

Loading

itamarhaber commented Jan 7, 2021 •

edited

Loading