Description
I've been updating a set of Lua scripts we use, and in the process I've been able to reproducibly trigger malloc errors during my unit tests. This is on Mac OS X 10.7.5 using gcc 4.4.6. When I first encountered this I was using an older version, but the latest 2.6.12 release also exhibits the issue. The details of these versions:
Redis server v=2.5.7 sha=7c5d96d9:0 malloc=libc bits=64
Redis server v=2.6.12 sha=00000000:0 malloc=libc bits=64
I've sent a link to the core dump to Antirez, but I figured I should post the description here.
The backtrace:
#0 0x00007fff85080ce2 in __pthread_kill ()
#1 0x00007fff88d197d2 in pthread_kill ()
#2 0x00007fff88d0aa7a in abort ()
#3 0x00007fff88d2c4ac in szone_error ()
#4 0x00007fff88d2c4e8 in free_list_checksum_botch ()
#5 0x00007fff88d3353e in tiny_malloc_from_free_list ()
#6 0x00007fff88d3400e in szone_malloc_should_clear ()
#7 0x00007fff88d35972 in szone_realloc ()
#8 0x00007fff88d69243 in malloc_zone_realloc ()
#9 0x00007fff88d6a032 in realloc ()
#10 0x000000010000a08b in zrealloc (ptr=0x102800430, size=561) at zmalloc.c:159
#11 0x0000000100008fff in sdscatlen () at sds.c:107
#12 0x0000000100033016 in luaRedisGenericCommand () at scripting.c:299
#13 0x00000001000470f2 in luaD_precall ()
#14 0x000000010005278a in luaV_execute ()
#15 0x00000001000475ed in luaD_call ()
#16 0x0000000100046c57 in luaD_rawrunprotected ()
#17 0x0000000100046ccf in luaD_pcall ()
#18 0x00000001000421c4 in lua_pcall ()
#19 0x00000001000346db in evalGenericCommand () at scripting.c:872
#20 0x00000001000068ab in call () at redis.c:1589
#21 0x0000000100006dfb in processCommand () at redis.c:1764
#22 0x000000010001164c in processInputBuffer () at networking.c:1013
#23 0x000000010000f064 in readQueryFromClient (el=<value temporarily unavailable, due to optimizations>, fd=<value temporarily unavailable, due to optimizations>, privdata=<value temporarily unavailable, due to optimizations>, mask=<value temporarily unavailable, due to optimizations>) at networking.c:1076
#24 0x0000000100001845 in aeProcessEvents () at ae.c:382
#25 0x0000000100001b1b in aeMain (eventLoop=0x100082c98) at ae.c:425
#26 0x0000000100008b6e in main (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at redis.c:2711
And the register info:
rax 0x0 0
rbx 0x6 6
rcx 0x7fff5fbfecf8 140734799801592
rdx 0x0 0
rsi 0x6 6
rdi 0x1307 4871
rbp 0x7fff5fbfed20 0x7fff5fbfed20
rsp 0x7fff5fbfecf8 0x7fff5fbfecf8
r8 0x7fff74bb6fb8 140735151828920
r9 0x0 0
r10 0x7fff85080d0a 140735425285386
r11 0xffffff80002dad60 -549752820384
r12 0x1000c3000 4295766016
r13 0x1000f6000 4295974912
r14 0x7fff74bb9960 140735151839584
r15 0x1000f60c0 4295975104
rip 0x7fff85080ce2 0x7fff85080ce2 <__pthread_kill+10>
eflags 0x246 582
cs 0x7 7
ss 0x0 0
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
Like I said, I've been able to reproducibly trigger this behavior from the unit tests, though mysteriously I haven't been able to reproduce it any other way. For instance, I tried capturing the commands issued from the Lua script using MONITOR and then replaying them through redis-cli. That approach didn't work (I imagine either because the breaking command didn't make it through, or because the bug is specific to the Lua code).
The next thing I tried was replaying the EVALSHA requests I'd recorded, and I was still unable to reproduce the issue. Very mysterious, indeed. Lastly, I even tried pulling both the Lua code and its invocation out into a standalone Lua script, but again this proved less than fruitful.
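For anyone attempting the same MONITOR-and-replay approach, captured output can be turned back into argument lists with a small parser like the one below. This is a sketch, not part of redis-cli or redis-py: `parse_monitor_line` is a hypothetical helper, and it assumes the `timestamp [db client] "arg" "arg" ...` line format that 2.6-era servers print (commands issued from inside a script show `lua` as the client), decoding only the backslash escapes Redis uses when quoting arguments.

```python
import re

# One double-quoted argument, allowing backslash escapes inside it.
_ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')
# Prefix assumed for MONITOR lines: "<sec>.<usec> [<db> <client>] ".
_LINE_RE = re.compile(r'^\d+\.\d+ \[\d+ ([^\]]+)\] (.*)$')
# Single-character escapes Redis is assumed to emit when quoting arguments.
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "a": "\a", "b": "\b", '"': '"', "\\": "\\"}


def _unescape(s):
    """Decode \\n, \\", \\xHH and similar escapes inside one quoted argument."""
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch != "\\":
            out.append(ch)
            i += 1
        elif s[i + 1] == "x":
            # \xHH: a non-printable byte, kept as the code point 0-255.
            out.append(chr(int(s[i + 2:i + 4], 16)))
            i += 4
        else:
            out.append(_SIMPLE.get(s[i + 1], s[i + 1]))
            i += 2
    return "".join(out)


def parse_monitor_line(line):
    """Return (client, args) for one MONITOR line, or None for other lines such as "OK"."""
    m = _LINE_RE.match(line.strip())
    if not m:
        return None
    client, rest = m.groups()
    return client, [_unescape(a) for a in _ARG_RE.findall(rest)]
```

Filtering on `client == "lua"` keeps only the commands the script itself issued; with redis-py each one could then be replayed via `r.execute_command(*args)` (arguments containing `\xHH` bytes can be recovered with `.encode("latin-1")`). As noted above, though, replaying the individual commands did not reproduce this crash.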
I've found three separate chunks of code in my scripts that I can comment out to prevent the malloc error, though they seem completely unrelated and I suspect they aren't necessarily meaningful.
Until now, we've kept about a dozen discrete Lua scripts to implement the core functionality of this library, but we're trying to pull related pieces of code together into Lua classes to reduce repeated code across the scripts. It had been going well until I began moving the code from put.lua into the unified script; that is when the problems arose.
The repos in question are:
- The python bindings: qless-py
- The core lua scripts: qless-core
The core scripts are a submodule of the Python bindings, and the only dependency should be redis-py. With this checked out, I've been able to reproduce this reliably against a fresh, empty Redis instance by invoking:
# From within the qless-py repo
./test.py TestFail.test_complete_failed
It doesn't always break on the first invocation, but it usually does within the first few runs, and 10 consecutive runs have so far always triggered it.
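Since the crash only shows up within a handful of runs, a small loop can automate the "run it until it breaks" step. This is a hedged sketch: `first_failing_run` is a hypothetical helper, and it assumes the server abort surfaces as a non-zero exit status from the test command (for example, a connection error once redis-server has died).

```python
import subprocess
import sys


def first_failing_run(cmd, max_runs=10, cwd=None):
    """Run `cmd` up to `max_runs` times.

    Returns the 1-based number of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(cmd, cwd=cwd)
        # Log progress to stderr so it doesn't mix with the test output.
        print("run %d: exit status %d" % (run, result.returncode), file=sys.stderr)
        if result.returncode != 0:
            return run
    return None
```

From within the qless-py checkout, `first_failing_run(["./test.py", "TestFail.test_complete_failed"])` would stop at the first failing run, which is the point at which to grab the core dump.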