Skip to content

Conversation

@yangbodong22011
Copy link
Contributor

@yangbodong22011 yangbodong22011 commented Jan 7, 2021

New commands:
HRANDFIELD [<count> [WITHVALUES]]
ZRANDMEMBER [<count> [WITHSCORES]]
Algorithms are similar to the one in SRANDMEMBER.

Both return a simple bulk response when no arguments are given, and an array otherwise.
In case values/scores are requested, RESP2 returns a long array, and RESP3 a nested array.
note: in all 3 commands, the only option that also provides random order is the one with negative count.

Changes to SRANDMEMBER

  • Optimization when count is 1, we can use the more efficient algorithm of non-unique random
  • optimization: work with sds strings rather than robj

Other changes:

  • zzlGetScore: when zset needs to convert string to double, we use safer memcpy (in
    case the buffer is too small)
  • Solve a "bug" in SRANDMEMBER test: it intended to test a positive count (case 3 or
    case 4) and by accident used a negative count

@yangbodong22011 yangbodong22011 marked this pull request as draft January 7, 2021 12:41
@yangbodong22011
Copy link
Contributor Author

ziplist O(n) random algorithm is done, open it, if you have any comments, please comment.

@yangbodong22011 yangbodong22011 marked this pull request as ready for review January 12, 2021 03:32
@madolson
Copy link
Contributor

@yangbodong22011 Is this PR ready for review?

@yangbodong22011
Copy link
Contributor Author

@yangbodong22011 Is this PR ready for review?

yes

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yangbodong22011 thanks for the PR.

I made a quick first pass on the code, and added some comments.

for some comments i don't have final conclusions about some of them, we need to consult others and figure out what's the right thing to do.
(specifically talking about random order, and map response type for zset, and single bulk response when count is not provided)

In the original PR i commented that i think we may want to break the code into reusable parts, and use them in the 3 commands as shared code (that means modify the srandmemberCommand).
I think what i had in mind is that each both ziplist and dict will each have a function that implements all the logic of the 4 cases, and then Hash, Zset, and Set can just call the right one according to the encoding.
i'm not at all sure if this approach is better than what you implemented, just want to put it on the table again for re-consideration.

I Haven't reviewed the test code yet, but i think that all 4 cases (different code paths) must be covered by the (for both ziplist and non-ziplist encoding).

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yangbodong22011 i think i have all the answers.
there are quite a few changes to make but most of them are not a lot of work.
in addition as i asked earlier, i think we need to have tests to cover all the cases (different code paths).

I tried to reply to the original threads, but i see Github shows it below as new threads.
however, all the comments are also shown above in the original threads too.
so please don't look at the comment of this review (blow), instead us the ones above, or use the "files" view

* Add optional WITHSCORES and WITHVALUES
* Add variant without count argument
* Use array response instead of map
* Efficiency fixes in ziplistRandomPair and ziplistRandomPairs
* Fix type of empty reply
* Other cleanup and fixes
* ziplistRandomPairs returns sds (with length) rather than use strlen so
  they can contain null chars.
* solve leaks, uninitialized memory access, double free
it intended to test a positive count (case 3 or case 4) and by accident
used a negative count
@oranagra oranagra added approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository labels Jan 27, 2021
@oranagra
Copy link
Member

@redis/core-team please approve

itamarhaber
itamarhaber previously approved these changes Jan 27, 2021
yossigo
yossigo previously approved these changes Jan 27, 2021
@madolson
Copy link
Contributor

High level comment, we call hashes a collection of fields and values, not members and values. Shouldn't this be HRANDOMFIELD?

@oranagra
Copy link
Member

Interesting.. I guess you're right.
Sits well with the fact HLEN != SCARD
@itamarhaber WDYT?

@itamarhaber
Copy link
Member

Agreed, from the highest level, it should be called HRANDKEY following the HKEYS convention.

@oranagra
Copy link
Member

the doc is inconsistent. command arguments are saying field

HMSET key field value [field value ...]

while the one command name has keys

HKEYS key

also what about zset? should it rremain ZRANDMEMBER?

ZADD key [NX|XX] [GT|LT] [CH] [INCR] score member [score member ...]

@oranagra
Copy link
Member

i think that HKEYS is the exception.. everywhere else refers to them as fields.
so i vote for HRANDFIELD and ZRANDMEMBER

yossigo
yossigo previously approved these changes Jan 28, 2021
@oranagra oranagra changed the title Add hrandmember and zrandmember Add hrandfield and zrandmember Jan 28, 2021
@itamarhaber
Copy link
Member

I upvote HRANDFIELD

@itamarhaber
Copy link
Member

Docs ready for inspection at redis/redis-doc#1504

madolson
madolson previously approved these changes Jan 28, 2021
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking from me, looks pretty good

* when count is 1, we can use the more efficient algo of non-unique rand
* [hash|zset]TypeRandomElement will not create heap based sds, instead
  it returns a ziplisetEntry (even if source is a dict)
* later we can add the value from the ziplistEntry directly to the reply
  buffers, without extra heap alloc or memcpy
* CASE3 still create heap sds, and the reply loop was moved into it, so
  that the code of CASE4 can be optimized
* CASE4 doesn't allocate heap strings for the value, it only adds heap
  string for the key, and if not in the dup detection dict, it just
  replies directly.
* minor optimization to SRANDMEMBER: work with sds strings rather than robj
* when zset needs to convert string to double, we use safer memcpy (in
  case the buffer is too small)
* add HRANDFIELD and ZRANDMEMBER to the corrupt dump fuzzer
@oranagra oranagra dismissed stale reviews from madolson and yossigo via b145d25 January 28, 2021 22:49
@oranagra
Copy link
Member

@soloestoy @madolson please see last commit for some optimizations you requested and more.

note that it's possible to do even better some day.

  • CASE3 doesn't need to duplicate all the data, it can create a shallow copy.
  • CASE4 doesn't need to duplicate the keys, it can use references to the keys of the main object.

The challenge is with the mixed ziplist and dict encoding, either we split the implementation to handle each separately.
Or we can add the ziplist entry pointer (p) to ziplistEntry, and create a special type of dict with hash function and compare function that works on the ziplistEntry.
The compare function only needs to match the pointers, since they're a shallow copy of either the ziplist of dict of the original object).
The hash function may need to convert a long to string.

but anyway, all of that inefficiency is already present in SRANDMEMBER, and i'm not sure it's right to add all this complexity to redis for this kinda gain (still O(N)), on a command that's probably not used that much.

madolson
madolson previously approved these changes Jan 28, 2021
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this, as you said, always room to improve later.

@oranagra oranagra merged commit b9a0500 into redis:unstable Jan 29, 2021
@oranagra oranagra changed the title Add hrandfield and zrandmember Add HRANDFIELD and ZRANDMEMBER. improvements to SRANDMEMBER Jan 29, 2021
itamarhaber added a commit to redis/redis-doc that referenced this pull request Jan 29, 2021
@oranagra oranagra mentioned this pull request Jan 31, 2021
@zuiderkwast
Copy link
Contributor

zuiderkwast commented Feb 5, 2021

It seems this PR introduced the following compile warnings (GCC 10.2.1):

ziplist.c: In function ‘ziplistRandomPairs’:
ziplist.c:1534:16: warning: ‘vlval’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 1534 |     dest->lval = lval;
      |     ~~~~~~~~~~~^~~~~~
ziplist.c:1543:22: note: ‘vlval’ was declared here
 1543 |     long long klval, vlval;
      |                      ^~~~~
ziplist.c:1534:16: warning: ‘klval’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 1534 |     dest->lval = lval;
      |     ~~~~~~~~~~~^~~~~~
ziplist.c:1543:15: note: ‘klval’ was declared here
 1543 |     long long klval, vlval;
      |               ^~~~~
ziplist.c:1533:16: warning: ‘vlen’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 1533 |     dest->slen = len;
      |     ~~~~~~~~~~~^~~~~
ziplist.c:1542:24: note: ‘vlen’ was declared here
 1542 |     unsigned int klen, vlen;
      |                        ^~~~
ziplist.c:1533:16: warning: ‘klen’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 1533 |     dest->slen = len;
      |     ~~~~~~~~~~~^~~~~
ziplist.c:1542:18: note: ‘klen’ was declared here
 1542 |     unsigned int klen, vlen;
      |                  ^~~~
ziplist.c:1532:16: warning: ‘value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 1532 |     dest->sval = val;
      |     ~~~~~~~~~~~^~~~~
ziplist.c:1541:30: note: ‘value’ was declared here
 1541 |     unsigned char *p, *key, *value;
      |                              ^~~~~

@oranagra
Copy link
Member

oranagra commented Feb 5, 2021

solved in #8444

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021
New commands:
`HRANDFIELD [<count> [WITHVALUES]]`
`ZRANDMEMBER [<count> [WITHSCORES]]`
Algorithms are similar to the one in SRANDMEMBER.

Both return a simple bulk response when no arguments are given, and an array otherwise.
In case values/scores are requested, RESP2 returns a long array, and RESP3 a nested array.
note: in all 3 commands, the only option that also provides random order is the one with negative count.

Changes to SRANDMEMBER
* Optimization when count is 1, we can use the more efficient algorithm of non-unique random
* optimization: work with sds strings rather than robj

Other changes:
* zzlGetScore: when zset needs to convert string to double, we use safer memcpy (in
  case the buffer is too small)
* Solve a "bug" in SRANDMEMBER test: it intended to test a positive count (case 3 or
  case 4) and by accident used a negative count

Co-authored-by: xinluton <xinluton@qq.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants