Skip to content

Conversation

@oranagra
Copy link
Member

Sanitize dump payload: ziplist, listpack, zipmap, intset, stream

When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This let's us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.

We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.

configuration:

  • adding ACL flag skip-sanitize-payload
  • adding config sanitize-dump-payload [yes/no/clients]

For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by these sanitation, so i'm setting the default value to no,
but later on it should be set to clients by default.

changes:

  • changing rdbReportError not to exit in RESTORE command
  • adding a new stat to be able to later check if cluster MIGRATE isn't being
    slowed down by sanitation.

Fuzz tester and fixes for segfaults and leaks it exposed

The test creates keys with various encodings, DUMP them, corrupt the payload and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to trigger a crash.
It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them we print the commands that lead to that issue.

Changes in the code (other than the test):

  • Replace a few NPD (null pointer deference) flows and division by zero with an assertion, so that it doesn't fail the test. (since we set the server to use exit rather than abort on assertion).
  • Fix quite a lot of flows in rdb.c that could have lead to memory leaks in RESTORE command (since it now responds with an error rather than panic)
  • Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need to bother with faking a valid checksum
  • Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to run in the crash report (see comments in the code)

test suite infra improvements:

  • be able to run valgrind checks before the process terminates
  • rotate log files when restarting servers

@oranagra
Copy link
Member Author

Notes:

  1. The fuzz test is currently excluded from the normal CI tests since it may sometime still fail from time to time (finding new issues).
    Alternative is to let it run for a few hours/days until we see it comes back clean.
  2. The sanitation is currently configured to be disabled by default, this is because we don't yet have a way to skip them on MIGRATE that is done during cluster resharding.
    A solution for that (in another PR) is to have the cluster nodes exchange some secret between them on the cluster bus, which they'll use to authenticate before sending RESTORE. This would also mean they don't need to get the AUTH/AUTH2 argument of the MIGRATE command.

@redis/core-team please have a look and comment.

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started looking through it, I got through some of the data structures.

src/debug.c Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed sometimes we add testing stuff here, and other times you add them as configs. Is there a reason we don't always add them here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the configs that we want some people to actually use in production are put in CONFIG, and the ones we only add for the purpose of the test suite are put in DEBUG.
The two important differences IMHO are:

  1. config options should later be supported forever, we can't remove them. (e.g. no one can argue that we need to keep supporting SET-ACTIVE-EXPIRE forever)
  2. config options also work right on startup (even before redis starts loading an AOF). this is the reason we sometime put test-suite tunables as config rather than DEBUG (e.g. key-load-delay)

other considerations may be:

  • this specific config is dangerous, we don't want it to be enabled with config (we assume DEBUG is blocked in production clusters anyway).

src/listpack.c Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever called with deep = 1? Are you just keeping the code around in case someone else uses listpack?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in the future, when listpacks are used by other data types, they can use this method.
initially i used it to validate the listpacks for streams in rdb.c, but i later realized that i need to validate that the actual records in the listpack make sense in the context of a stream to avoid runtime crashes (added streamValidateListpackIntegrity), and i didn't want to run over all the records in the listpack twice, so i made streamValidateListpackIntegrity do both the validation of the listpack structure and the stream in one pass.

@oranagra
Copy link
Member Author

rebased, and added 8 new commits (by topic)

@oranagra
Copy link
Member Author

@redis/core-team i don't expect you all to review all the code changes, i'll do that with Yossi. but please post a message that approves the feature and it's interfaces. i.e. have a look at the top comment here and at redis.conf.

@oranagra
Copy link
Member Author

oranagra commented Dec 1, 2020

Some measurements i did that show degradation (positive is improvement).
small differences (up to 5%) are just jitters of my testing in most cases (specifically for LPUSH and HSET runs that were short, just to prepare for the other benchmarks), for some i attempted to take the best of 3 runs each of one minute (commands below) notably LPOS, LINDEX, HGET, HGETALL, XRANGE, etc.

Note that initially LPOS and HGET showed severe (-25%) degradation, and after some optimizations effort (last commit) i was able to re-gain the performance loss and even improve.

Obviously the biggest impact is on the (now optional) RESTORE that does full (deep) sanitization

before after degradation
List
PUSH 1848010.56 1758492.2 -4.84%
LPOS 5544.92 5406.69 -2.49%
LINDEX 350559.9 390361.21 11.35%
RANGE 406075.77 397959.9 -2.00%
POP 1883089.16 1727988.52 -8.24%
RESTORE shallow 75303.99 82091.14 9.01%
RESTORE deep 75303.99 14895.89 -80.22%
Hash
HSET
HGET 250775.43 296039.08 18.05%
HGETALL 13854.81 13165.33 -4.98%
RESTORE shallow 92859.49 97783.37 5.30%
RESTORE deep 89317.43 10936.4 -87.76%
Stream
XADD 705336.46 744797.71 5.59%
XRANGE 20371.9 19739.81 -3.10%
XDEL 196474.13 206776.83 5.24%
RESTORE shallow 409138.77 408498.63 -0.16%
RESTORE deep 409138.77 216977.94 -46.97%
listpack search 75684.98 75181.56 -0.67%
Details
memtier_benchmark --pipeline=100 --command="lindex list 1234" --hide-histogram --test-time 60 -x 3
--
memtier_benchmark --pipeline=100 --command="lrange list 100 120" --hide-histogram --test-time 60 -x 3
memtier_benchmark --pipeline=100 --command="lpop list" -n 100000 --hide-histogram
memtier_benchmark --pipeline=100 --command='restore list2 0 "\x0e\x0f\xc3@TU\x8f\x04\x8f\x15\x00\x00\x88 \x03\b\x96\x03\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xd3\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\xc3@r_\xfd\x04\xfd\x1f\x00\x00\xf6 \x03\bS\x05\x00\x04asdf\x06\xe0\xff\x05\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xff\a\xe1\xf1\a\x01f\xff\t\x00*j\x9e\x84\|\xf5\|\xb2" replace'  --hide-histogram  --test-time 60 -x 3
redis-cli config set sanitize-dump-payload yes
 
rm dump.rdb; src/redis-server --save "" &
memtier_benchmark --pipeline=100 --command="HSET hash __key__ asdf" --command-key-pattern=S -n 500 -c 1 -t 1 --hide-histogram
memtier_benchmark --pipeline=100 --command="HGET hash non" --hide-histogram --test-time 60 -x 3
memtier_benchmark --pipeline=100 --command="HGETALL hash" --hide-histogram --test-time 60 -x 3
memtier_benchmark --pipeline=100 --command='restore hash2 0 "\r\xc3I\xefd\xb9\x04\xb9$\x00\x00\xb2 \x03\x13\xe8\x03\x00\tmemtier-0\x0b\x04asdf\x06\xe0\x00\x10\x001\xe0\a\x10\x002\xe0\a\x10\x003\xe0\a\x10\x004\xe0\a\x10\x005\xe0\a\x10\x006\xe0\a\x10\x007\xe0\a\x10\x008\xe0\a\x10\x009\xa0\x10\x00\n\xc0\xa9\x0210\x0c\x80\xaa\xe0\x01\x11\x001\xe0\b\x11\x002\xe0\b\x11\x003\xe0\b\x11\x004\xe0\b\x11\x005\xe0\b\x11\x006\xe0\b\x11\x007\xe0\b\x11\x008\xe0\b\x11\x009\xe0\a\x11\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x002\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x003\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x004\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x005\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x006\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x007\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x008\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xe0\b\xb3\x009\xc0\xb3\x00\x0b\xe6\x01S\x010\r\x86T\xe0\x02\x12\x001\xe0\t\x12\x002\xe0\t\x12\x003\xe0\t\x12\x004\xe0\t\x12\x005\xe0\t\x12\x006\xe0\t\x12\x007\xe0\t\x12\x008\xe0\t\x12\x009\xe0\b\x12\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x001\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x002\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x003\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x004\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x005\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x006\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x007\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x008\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\t\xbd\x009\xe0\b\xbd\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x002\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x003\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xe7\tk\x004\xa7k\x01f\xff\t\x00\xe5\x03\a0\|\xfd\x9f\xa7" replace'  --hide-histogram  --test-time 60 -x 3
redis-cli config set sanitize-dump-payload yes
 
rm dump.rdb; src/redis-server --save "" &
memtier_benchmark --pipeline=100 --command="XADD mystream * name oran" -n 100000 --hide-histogram
memtier_benchmark --pipeline=100 --command="XREVRANGE mystream + - COUNT 100"  --hide-histogram   --test-time 60 -x 3
redis-cli xrange mystream -  + \|tail # look up the entry before the last
memtier_benchmark --pipeline=100 --command="XDEL mystream 1600275596103-98"  --hide-histogram    --test-time 60 -x 3
memtier_benchmark --pipeline=100 --command='restore mystream2 0 "\x0f\x01\x10\x00\x00\x01v\x1b6\xc2J\x00\x00\x00\x00\x00\x00\x00\x00\xc3A\x1bB\xd1\x14\xd1\x02\x00\x00\xff\x002\x01\x00\x01\x01\x01\x84name\x05\x00\x01\x02 \r\b\x00\x01\x84oran\x05\x04`\r \x1b\xe0\x02\r\x00\x02\xe0\x04\r\x00\x03\xe0\x04\r\x00\x04\xe0\x04\r\x00\x05\xe0\x02\r\x00\x01\xe0\x04S@\x00\xe0\x01S\x00\x01\xe0\x04S\x00\x01\xe0\x04S\x00\x01\xe0\x04S\x00\x01\xe0\x06S\x00\x06\xe0\x04E\x00\a\xe0\x04\r\x00\b\xe0\x04\r\x00\t\xe0\x04\r\x00\n\xe0\x04\r\x00\x0b\xe0\x04\r\x00\x0c\xe0\x04\r\x00\r\xe0\x04\r\x00\x0e\xe0\x04\r\x00\x0f\xe0\x04\r\x00\x10\xe0\x04\r\x00\x11\xe0\x04\r\x00\x12\xe0\x04\r\x00\x13\xe0\x04\r\x00\x14\xe0\x04\r\x00\x15\xe0\x04\r\x00\x16\xe0\x04\r\x00\x17\xe0\x04\r\x00\x18\xe0\x04\r\x00\x19\xe0\x04\r\x00\x1a\xe0\x04\r\x00\x1b\xe0\x04\r\x00\x1c\xe0\x04\r\x00\x1d\xe0\x04\r\x00\x1e\xe0\x04\r\x00\x1f\xe0\x04\r\x00 \xe0\x04\r\x00!\xe0\x04\r\x00\"\xe0\x04\r\x00#\xe0\x04\r\x00$\xe0\x04\r\x00%\xe0\x04\r\x00&\xe0\x04\r\x00\x27\xe0\x04\r\x00\(\xe0\x04\r\x00\)\xe0\x04\r\x00*\xe0\x04\r\x00+\xc0\r\x01\x01\xff2\x81\x00\x00\x01v\x1b6\xc2K+\x00\t\x00[_\x0c*\xd5\xd2\xf4\xc4" replace'  --hide-histogram  --test-time 60 -x 3
redis-cli config set sanitize-dump-payload yes
memtier_benchmark --command="XADD mystream3 * name oran" -c1 -t1 -n 99 --hide-histogram
redis-cli xinfo stream mystream3 # look up the last entry and +1 after it
memtier_benchmark --command="xrange mystream3  1606805984708-8 1606805984708-9"  --hide-histogram    --test-time 60 -x 3

@oranagra
Copy link
Member Author

oranagra commented Dec 2, 2020

pushed one final commit with code review fixes.
re-tested the HGETALL benchmark, the only reason it showed regression is flakiness. re-testing it showed improvement in many cases..
gonna do a rebase now and squash a few commits..

@oranagra
Copy link
Member Author

oranagra commented Dec 2, 2020

@yossigo i'm done re-ordering and squashing my commits.
please approve.

@oranagra oranagra requested a review from a team December 2, 2020 20:12
@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten labels Dec 3, 2020
itamarhaber
itamarhaber previously approved these changes Dec 4, 2020
src/acl.c Outdated
Comment on lines 94 to 95
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a trivial comment, but the only importance of the order of these flags is in ACLDescribeUser, which would make this come first. I'd rather have it towards the end :)

When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This let's us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.

We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.

configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]

For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by these sanitation, so i'm setting the default value to `no`,
but later on it should be set to `clients` by default.

changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
  slowed down by sanitation.
- improve stream rdb encoding test to include more types of stream metadata
- add test to cover various ziplist encoding entries (although it does
  look like the stress test above it is able to find some too
- add another test for ziplist encoding for hash with full sanitization
- add similar ziplist encoding tests for list
…it exposed

The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
 asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
  RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
when using --baseport to run two tests suite in parallel (different
folders), we need to also make sure the port used by the testsuite to
communicate with it's workers is unique. otherwise the attept to find a
free port connects to the other test suite and messes it.

maybe one day we need to attempt to bind, instead of connect when tring
to find a free port.
If RESTORE passes successfully with full sanitization, we can't affort
to crash later on assertion due to duplicate records in a hash when
converting it form ziplist to dict.
This means that when doing full sanitization, we must make sure there
are no duplicate records in any of the collections.
When RDB input attempts to make a huge memory allocation that fails,
RESTORE should fail gracefully rather than die with panic
First, if the ziplist header is surely inside the ziplist, do fast path
decoding rather than the careful one.

In that case, streamline the encoding if-else chain to be executed only
once, and the encoding validity tested at the end.

encourage inlining

likely / unlikely hints for speculative execution

Assertion used _exit(1) to tell the compiler that the code after them is
not reachable and get rid of warnings.

But in some cases assertions are placed inside tight loops, and any
piece of code in them can slow down execution (code cache and other
reasons), instead using either abort() or better yet, unreachable
builtin.
@oranagra oranagra merged commit e288430 into redis:unstable Dec 6, 2020
@oranagra oranagra deleted the payload_sanitation branch December 6, 2020 12:54
@oranagra oranagra mentioned this pull request Jan 13, 2021
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Sep 18, 2022
Starting from 6.2, after ACL SETUSER user reset, the user
will carry the sanitize-payload flag. I guess it was added
by mistake in redis#7807, reset means reverting to default.

Added tests to make sure it won't be broken in the future,
and updated the comment in ACLSetUser.

Fixes redis#11278
oranagra pushed a commit that referenced this pull request Sep 22, 2022
…11279)

Starting from 6.2, after ACL SETUSER user reset, the user
will carry the sanitize-payload flag. It was added in #7807,
and then ACL SETUSER reset is inconsistent with default
newly created user which missing sanitize-payload flag.

Same as `off` and `on` these two bits are mutually exclusive,
the default created user needs to have sanitize-payload flag.
Adds USER_FLAG_SANITIZE_PAYLOAD flag to ACLCreateUser.

Note that the bug don't have any real implications,
since the code in rdb.c (rdbLoadObject) checks for
`USER_FLAG_SANITIZE_PAYLOAD_SKIP`, so the fact that
`USER_FLAG_SANITIZE_PAYLOAD` is missing doesn't really matters.

Added tests to make sure it won't be broken in the future,
and updated the comment in ACLSetUser and redis.conf
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
…edis#11279)

Starting from 6.2, after ACL SETUSER user reset, the user
will carry the sanitize-payload flag. It was added in redis#7807,
and then ACL SETUSER reset is inconsistent with default
newly created user which missing sanitize-payload flag.

Same as `off` and `on` these two bits are mutually exclusive,
the default created user needs to have sanitize-payload flag.
Adds USER_FLAG_SANITIZE_PAYLOAD flag to ACLCreateUser.

Note that the bug don't have any real implications,
since the code in rdb.c (rdbLoadObject) checks for
`USER_FLAG_SANITIZE_PAYLOAD_SKIP`, so the fact that
`USER_FLAG_SANITIZE_PAYLOAD` is missing doesn't really matters.

Added tests to make sure it won't be broken in the future,
and updated the comment in ACLSetUser and redis.conf
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…edis#11279)

Starting from 6.2, after ACL SETUSER user reset, the user
will carry the sanitize-payload flag. It was added in redis#7807,
and then ACL SETUSER reset is inconsistent with default
newly created user which missing sanitize-payload flag.

Same as `off` and `on` these two bits are mutually exclusive,
the default created user needs to have sanitize-payload flag.
Adds USER_FLAG_SANITIZE_PAYLOAD flag to ACLCreateUser.

Note that the bug don't have any real implications,
since the code in rdb.c (rdbLoadObject) checks for
`USER_FLAG_SANITIZE_PAYLOAD_SKIP`, so the fact that
`USER_FLAG_SANITIZE_PAYLOAD` is missing doesn't really matters.

Added tests to make sure it won't be broken in the future,
and updated the comment in ACLSetUser and redis.conf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

None yet

4 participants