XADD and XTRIM, Trim by MINID, and new LIMIT argument#8169
XADD and XTRIM, Trim by MINID, and new LIMIT argument#8169oranagra merged 8 commits intoredis:unstablefrom
Conversation
|
WOW. that's a lot of code / changes.
|
zuiderkwast
left a comment
There was a problem hiding this comment.
Very nice!
It could be worth mentioning the refactoring of args parsing in the PR description, since it affects many lines but it's mostly moved code, so the change isn't actually that large.
Is MAXNODES meant to be an undocumented option? It seems to provide the equivalent of SCAN's COUNT but it's hard to know how many entries there are per node.
Most of my comments are about documentation. It would be good to document how to call XTRIM repeatedly until everything is trimmed, like how SCAN is called repeatedly until it returns 0.
Disclaimer: I'm not very familiar with Redis source code.
src/t_stream.c
Outdated
There was a problem hiding this comment.
Typo? With the approx option, the stream may contain entries with IDs < 'id'.
src/t_stream.c
Outdated
There was a problem hiding this comment.
Mention the effect and purpose of maxnodes in the documentation above the function?
src/t_stream.c
Outdated
There was a problem hiding this comment.
Nice refactoring (breaking out the XADD args parsing into a separate function and unifying it with the XTRIM args parsing). This might also make it easier to reuse code for implementing a stream module API in the future.
But please keep a comment about what i is here. It was declared like this:
int i = 2; /* This is the first argument position where we could
find an option, or the ID. */
src/t_stream.c
Outdated
There was a problem hiding this comment.
Some documentation of this function would be nice, e.g. the meaning of the args and the return value.
Perhaps a matter of taste, but maybe return i+1 on success instead of return i? Then the return value points to the position after the args handled by this function, which might easier to document and understand. In xaddCommand(), it would be the position of the first field arg.
src/t_stream.c
Outdated
There was a problem hiding this comment.
Logic seems right (AFAIK). Just an idea of structuring: Can it be done like
- check for MAXLEN/MINID/MAXAGE and set
args->trim_strategy. - common code for "~" and "="
- various checks (without using strcasecmp() again)
There was a problem hiding this comment.
the problem is that i don't know where exactly the trim_strategy will be, because i could get NOMKSTREAM for example or *
if this function would handle only trim strategies (i.e. the first arg is always the trim strategy) then it would make more sense to parse like you suggested
There was a problem hiding this comment.
I think you're right. Refactoring like this won't make it simpler.
src/t_stream.c
Outdated
There was a problem hiding this comment.
Needs to be documented.
src/t_stream.c
Outdated
There was a problem hiding this comment.
Also add MAXNODES? Even if it's undocumented, it should be documented here in the source code IMO.
src/t_stream.c
Outdated
There was a problem hiding this comment.
Also add MAXNODES here?
I assume MAXNODES refers to the number of radix tree nodes. Approximately how many entries are there per node?
Btw, adding a default value of 100 for MAXNODES affects the legacy MAXLEN option, so even if = is given, MAXLEN will not enforce the maximum length of the stream anymore. (Instead MAXLEN actually specifies the minimum length to keep when trimming.) I think the change is to the better since it limits the execution time in corner cases, but I think it needs to be documented.
The docs for MAXAGE and MINID should also have some note saying something like that the limit is not enforced (for performance reasons or whatever, even if MAXNODES isn't documented) even when = is given, but that the actual guarantee is that entries back to the given age or ID are kept when trimming.
|
@zuiderkwast thanks for reviewing, looks like you put some time into it so thank you! going through your comment now... |
zuiderkwast
left a comment
There was a problem hiding this comment.
That was a fast follow-up! Good documentation!
The rest is a matter of taste, I suppose. I'm not a maintainer or anything, so don't take my comments for the truth.
src/t_stream.c
Outdated
There was a problem hiding this comment.
- Missing end quote after the first word
'maxnodes - Period at end of sentense.
src/t_stream.c
Outdated
There was a problem hiding this comment.
the should -> that should
src/t_stream.c
Outdated
There was a problem hiding this comment.
When seeing them together like this, an assymentry becomes apparent: "MAX" in MAXNODES refers to what to delete, but "MAX" in MAXLEN and MAXAGE refers to what to keep.
Maybe MAXNODES can be renamed to something to avoid this confusion? Ideas: LIMIT, TRIMLIMIT, MAXTRIM, TRIMNODES.
src/t_stream.c
Outdated
There was a problem hiding this comment.
Oops, I didn't realize that ID is also after what's handled by streamParseAddOrTrimArguments()... So maybe I change my mind and instead I think streamParseAddOrTrimArguments() should just return i. The point of the previous comment was to return the position after the args handled by the function but I didn't realize that return i already did that, so I'm sorry for this review mistake!
If you agree, the code in xaddCommand() would be something like the below (like in the first version, just renaming i to idpos). What do you think?
int idpos = streamParseAddOrTrimArguments(c, &parsed_args, 1);
if (idpos < 0) return; /* streamParseAddOrTrimArguments already replied. */
int field_pos = idpos+1;|
@zuiderkwast pushed another commit, decided to go with |
|
@redis/core-team and specifically @yossigo this PR introduces a new arg for trimming, LIMIT which limits the number of trimmed entries. we have 3 options:
|
|
some additional context for whoever's missing it (don't bother to read the code, it's not final yet, and includes some refactoring to reduce code duplication)
We already have a LIMIT / COUNT argument for many other commands, which limits the amount of work one call does (like in SCAN). One option ( Second option ( Last option ( |
|
I like
MAXAGE as a simple short-hand for MINID can be convenient for users. They can put a fixed number in the command instead of doing a computation with current time each time. |
db91d33 to
64245e4
Compare
madolson
left a comment
There was a problem hiding this comment.
API Looks fine to me. Did not look at the code.
Co-authored-by: Itamar Haber <itamar@redislabs.com>
yossigo
left a comment
There was a problem hiding this comment.
@guybe7 Sorry for responding late, this was lost in the noise. I didn't see any continuation of the discussion regarding the limit options, but I understand the explicit option was chosen. Did we document why?
I actually lean towards not exposing it, or at least define it as an abstract hint of the amount of work rather than anything more specific.
|
@yossigo after a week with no response to our beg for feedback, we decided to proceed with what seemed best to us. we can still re-open it for discussion, but let's make it a quick one. p.s. please note that the docs already have some reference to "macro nodes" and "radix tree". (if it helps our decision in some way) |
|
worth mentioning the the LIMIT is given in number of entries, not in nodes |
|
I'm all for abstracting the macro nodes from the docs and API, especially since in the future we may have more considerations. Honestly, I was struggling with this sentence https://github.com/redis/redis-doc/pull/1488/files#diff-0acba95bcc02e956409a1e8c7cbfdd9cd1e06bc35be49375c794a4cc99251329R39 That said, |
|
i also don't like the reference to macro nodes, but the reason i brought it up here is just a distraction. (mainly so we don't carve it in stone, referring to it again). the current LIMIT option actually refers to stream records, not macro nodes, and i think that's actually in line with what SCAN does. if anyone has any concrete suggestion, please share it. personally i think 3 is too restrictive, and 2 is breaking the old behavior too much. so please try to come up with a concrete alternative. |
This PR adds another trimming strategy to XADD and XTRIM named MINID (complements the existing MAXLEN). It also adds a new LIMIT argument that allows incremental trimming by repeated calls (rather than all at once). This provides the ability to trim all records older than a certain ID (which makes it possible for the user to trim by age too). Example: XTRIM mystream MINID ~ 1608540753 will trim entries with id < 1608540753, but might not trim all (because of the ~ modifier) The purpose is to ease the use of streams. many users use streams as logs and the common case is wanting a log of the last X seconds rather than a log that contains maximum X entries (new MINID vs existing MAXLEN) The new LIMIT modifier is only supported when the trim strategy uses ~. i.e. when the user asked for exact trimming, it all happens in one go (no possibility for incremental trimming). However, when ~ is provided, we trim full rax nodes, up to the limit number of records. The default limit is 100*stream_node_max_entries (used when LIMIT is not provided). I.e. this is a behavior change (even if the existing MAXLEN strategy is used). An explicit limit of 0 means unlimited (but note that it's not the default). Other changes: Refactor arg parsing code for XADD and XTRIM to use common code.
## Summary This bus was introduced by #14623 Before PR #14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue #8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`.
## Summary This bus was introduced by redis#14623 Before PR redis#14623, when a stream node was going to be fully removed, we would just delete the whole node directly instead of iterating through and deleting each entry. Now, with the XTRIM/XADD flags, we have to iterate and delete entries one by one. However, the implementation in issue redis#8169 didn’t consider the case where all entries are removed, so `p` can end up being NULL. Fixes an UndefinedBehaviorSanitizer error in `streamTrim()` when marking the last entry in a listpack as deleted. The issue occurs when performing pointer arithmetic on a NULL pointer after `lpNext()` reaches the end of the listpack. ## Solution If p is NULL, we skip the delta calculation and the calculation of new `p`.
Fix #6640 #4450 #5951
This PR adds an additional trimming strategy to XADD and XTRIM named MINID (complements the existing MAXLEN).
It also adds a new LIMIT argument that allows incremental trimming by repeated calls (rather than all at once).
This provides the ability to trim all records older than a certain ID (which makes it possible for the user to trim by age too).
Example:
XTRIM mystream MINID ~ 1608540753will trim entries with id < 1608540753, but might not trim all (because of the~modifier)The purpose is to ease the use of streams. many users use streams as logs and the common case is wanting a log of the last X seconds rather than a log that contains maximum X entries (new MINID vs existing MAXLEN)
The new LIMIT modifier is only supported when the trim strategy uses
~. i.e. when the user asked for exact trimming, it all happens in one go (no possibility for incremental trimming).However, when
~is provided, we trim full rax nodes, up to the limit number of records.The default limit is
100*stream_node_max_entries(used when LIMIT is not provided).I.e. this is a behavior change (even if the existing MAXLEN strategy is used).
An explicit limit of 0 means unlimited (but note that it's not the default).
Other changes: