-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Added Errorstats info section to enable keeping track of the different errors that occur within Redis; Added failed_calls to commandstats #8217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@filipecosta90 thanks for this PR.
i added a few comments about the implementation, but maybe we should first discuss the key features here.
as we've established, there are generally two groups of errors, ones that are handled in processCommand and ones that happen inside the command code itself.
do we intend to cover only the first group? or also the second?
If we intend to extend this feature to match other types of errors, it's probably wrong to do an if-else chain of calls to strncmp, and then a lookup in a dict (or an enum lookup into an array).
maybe we should have a rax in which we can search by prefix, and implicitly support all type of errors.
i think it would be nice to examine this feature after adding a few more errors as an example (like OOM, WRONGTYPE, NOSCRIPT, LOADING, BUSY, BUSYKEY.
one thing we'll soon discover is that many of these are being used with a simple addReply (so we don't even have them on our radar).
|
Also related to Oran's top level comment, we don't always use "addReplyError*" for all of the errors, we need to go through and audit them. |
@oranagra @madolson, following what you've mentioned I've updated the code to use the Error Prefix ( The first word after the "-", up to the first space ) as the rax key (lowercased). In that manner we have the desired implicit support for all errors. Can you guys check the new commit. Sample output of errorstats now: |
|
@filipecosta90 looking at the example in your last post (didn't read the code yet), i think these two cases should fall into the standard not if we think it is important for redis to distinguish between them, then it's also important for client libraries, and we should change the code and promote them to have their own error code. |
d479b61 to
36e1317
Compare
@oranagra agree. In the latest code push I assume that if no I've also removed the error name from the error struct and I'm using the error prefix as is ( meaning no lowercase ). I believe all comments have been address. wdyt? |
36e1317 to
e8af019
Compare
src/server.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noting that this section is not included by default. i guess that's good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oranagra given this is so "cheap" to compute and potentially a very usefull info I've added it as default. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see the only section that's currently not part of the default is the commandstats, i don't suppose it's a matter of being cheap to generate (other parts are possibly more demanding), i think it's also about the verbosity and usefulness.
i suppose that in practice we'll never see here more than some 4 errors, but i'm not sure i'd say it's more useful than the commands stats.
let's leave this question open for later discussion.
… Added failed_calls to commandstats
9a90be5 to
e103af0
Compare
|
@filipecosta90 it looks like you added a bunch of commits, and a rebase/merge from unstable, and then squashed it all, and i can't seem to be able to review the changes easily. i'm ok with amending a commit and force pushing it (github let's me view the diffs), or to add more and more commits which will then be squashed when the PR is merged. |
|
@oranagra Did you look at the code where we are temporarily attaching a flag to a command? That doesn't seem like a very maintainable assumption to make. |
|
@madolson i agree. i'm still editing my CR post when i'm suggesting a better alternative. |
Properly throw errors for invalid replication stream and support redis#8217
…ommandstats (redis#8217) This Commit pushes forward the observability on overall error statistics and command statistics within redis-server: It extends INFO COMMANDSTATS to have - failed_calls in - so we can keep track of errors that happen from the command itself, broken by command. - rejected_calls - so we can keep track of errors that were triggered outside the commmand processing per se Adds a new section to INFO, named ERRORSTATS that enables keeping track of the different errors that occur within redis ( within processCommand and call ) based on the reply Error Prefix ( The first word after the "-", up to the first space ). This commit also fixes RM_ReplyWithError so that it can be correctly identified as an error reply.
dddae7721 Temp workaround for redis/redis-io#251 (#1646) 2fd52e72b Add ASKING command (#1642) b9155866a add bitfield_ro command (#1645) de7eeb2cc Update MONITOR command (#1641) df6e3889a Update Lua RESP3 reply format (#1619) be38fe6f1 Add INFO metrics {total,current}_active_defrag_time (#1643) 8ec59f994 Fix grammar in pipelining doc for easier consumption (#1024) 80cd34aff Format Ruby examples (#1136) 12f577959 Add new LMPOP and BLMPOP commands. (#1636) 660170158 Merge pull request #1637 from madolson/numpat-change d9410a6a1 Update clients.json (#1639) 8dbd962e6 update readme: small typo fix (#1634) 1f5754db4 Escape markdown in my_random_value (#1635) e94992395 Clarify wording around number of pubsub numpat 845a28d3e Notes on manual failover to a new (just upgraded) replica (#1633) 9a53d7038 Sentinel: rephrase replica-priority paragraph. (#1630) 94a0e4846 Add missing hyperlinks to different sections of the sentinel doc 3554566ce Add SORT_RO documentation (#1621) 46db91367 Update active Java clients (#1623) dfa8671d3 Sponsor name change (#1624) b7d1ecfae Avoid confusing use of term "pure function" 87ecbfca7 Add current_cow_peak field in INFO command (#1617) f9a75a4b0 Add doc for new options of expire commands (#1613) 2d26a5a9c adds new SINTERCARD and ZINTERCARD (#1610) af282ed4a Fix spelling of acknowledgment (#1618) 1edd7b68a Render lists properly in CLIENT PAUSE doc (#1601) e3d11f00b Fix inconsistent EVALSHA error output (#1616) a25c27b0f Add RedisIMS (If-Modified-Since) module (#1599) 228f8c692 Fix typo (#1606) 10a013653 Clarifies that timeout is in seconds. (#1604) 6ca156269 INFO details about fragmentation (#1608) 9877b6740 Fix line feed in acl cats 5b46fada0 add description of the ACL categories (#1609) a7a331b9d Add 'systemd' to wordlist (#1614) 2320aaeaf Better phrasing and fix typos in streams (#1607) b00eac4cf Add `process_supervised` description at INFO (#1612) 788189c65 Add info metrics total_eviction_exceeded_time and current_eviction_exceeded_time, redis#9031 5ddb3b26b update stars (#1611) 6a9c6310b RPOPLPUSH minor phrasing updates bfc5e2cb7 Add unit to TTL summary (be consistent with PTTL) (#1600) e722a583b Document the INFO field "sentinel_tilt_since_seconds" for Sentinel (#1594) 7282fadc4 GEOSEARCH, GEOSEARCHSTORE more detailed examples (#1597) 5739d443a Change args of BITPOS from [start] [end] to [start [end]] f0a260346 Merge pull request #1223 from redis/bit-group 9924218f0 Kicked off performance analysis guide - on-CPU profiling and tracing (#1437) a5ce7cec5 Cluster: reference redis-cli instead of redis-trib + minor fix (#1592) 593ef6677 SCAN: Address documentation typo on parallel iterations (#1595) 632c46aa5 Add client: redis-async (ocaml) #1593 c7143d636 Update the doc about maxmemory on replicas (#1457) d48c19234 Update time complexity of EXISTS, SPOP (#612) 706170b19 Fix the return of LPOP and RPOP with count + examples (#1591) 7b270c467 Improve wording around expire + transact (#1590) fe2ff16b9 Cleanup erlang clients that are not maintained 28e7d37d4 Add Bash client https://github.com/SomajitDey/redis-client (#1581) 7f7b61ca9 Add redis-interval-module 8cbe3cfc0 Adding ecredis - Erlang Redis Cluster client (#1586) 157443c74 Add Jedis.jl to clients.json as a new Julia client #1584 5d8afc49f Use 'nil' in LPOS command docs (#1571) f851b3593 Add note re TTL getting reset by getset 92b7099cf Rephrase description of XX and NX options of ZADD 16d577371 GEOSEARCH command: Changed wording in a few places. (#1568) d68271df1 Removed wrong space (#1567) ef1709469 Added SWI-Prolog client (#1407) 618cca8f0 Update SET documentation of the NX and GET flags (#1565) 13a022958 Document optionality of PFADD, EVAL, EVALSHA args cb3c9b5fb Merge pull request #1572 from sazzad16/patch-3 19db26df5 Merge pull request #1579 from huangzhw/cluster be82a6f22 Merge pull request #1560 from nmvk/evalro 64d5225df Update PGP key. (#1589) 586b10591 API documentation of EXPIRETIME and PEXPIRETIME commands (#1582) 0146a9e6d slowlog get command supports passing in -1 to get all logs. (#1585) 874d646ee Add missing flag nofailover in "cluster nodes" command output. 9fe1f36dd Add TairHash and TairString modules (#1577) 13bda277d Update client-side-caching.md (#1575) 2865b8e96 ZREMRANGEBYLEX : Replace "return" with "remove" (#1576) d4b50349f Fix typo (#1574) 23a9ab1b3 MEMORY USAGE command may return 'nil' af4e30177 add godis client (#1566) 6fca48493 Little tweak on INCR rate limiter 1 (#1516) 29fe06d83 Specifies an error is returned when xadd fails (#1495) df4eda7ce Add Red to tools.json (#1558) 0e39ef9ec Escapes Redis command names in bitfield's syntax b2fc5c53c Update bitfield.md (#1553) 3f34db95a SPOP with COUNT returns empty array (not nil) (#1559) a4fa6557a GEOSEARCHSTORE: Remove WITHCOORD, WITHDIST and WITHHASH (#1561) eea6e9d1c Apply suggestions from code review a24651db2 Update to Commands.json 328df19b3 Address PR comments e42581151 EVAL_RO and EVALSHA_RO documentation 42ccc962f update module api ref again (#1557) 71490b714 Update notifications documentation with module key type event notifications (#1548) 437b16e63 Update module api reference (#1556) 6e749f1fb Updated complexity of BRPOP and BLPOP commands. (#1554) 189ecd19e Fix sentinel failover delay to 2 * failover-timeout (#1555) 3ca7ad000 Add more details about configuration format (#1549) c79195847 Remove cli tag from XACK (#1552) ecb0ab336 Update persistence.md (#1550) 29f18a45d update module api doc for 6.2.2 (#1546) 4e2bdf592 Minor spelling / phrasing update. (#1544) 4953e008b Fix LIMIT index error in ZRANGEBYLEX command of indexes.md. (#1542) 48bb84e86 Tiny grammatical change. (#1543) 136bb9ea3 Add documentation for Lua behavior under low memory conditions. (#1533) 5295c398f Fix wrong order of geospatial tuple (#1541) 18e360252 Fix GEOPOS time complexity (#1540) 4514130e0 Update set.md dbd2b8a0f Stream command cleanups (#1537) 1f28de1ea Update modules.json (#1538) 7f42ec7a6 Adding a link to redicrypt module (#1536) 3493e18c3 Fix RESET command misleading typo and add CLIENT KILL LADDR. (#1535) ed0b3f694 Added Jedis as an example of client side partitioning (#1534) 400b13f9a Fix incorrect references to SYNC command. (#1531) 18ae06bb2 Adding Redis Interval Sets to the redis.io modules (#1532) 89b88883e Update sponsors.md 8792b6250 Specify STRALGO LCS syntax more accurately. (#1529) a61380b17 Client and tools cleanups (#1530) 64a0d67c3 Update xpending.md f1d54a691 module api changes for 6.2.1 (#1526) 7870ef7bb Persistence: Define acronyms RDB and AOF (#1524) 539dd6c06 Fix minor phrasing / grammar issue (#1525) e0528232f update module api ref (#1523) 8aa3e63c5 grammar changes (#1520) 2cfe75876 update INFO doc for current_fork_perc, current_save_keys_processed, c… (#1518) 08330f388 Fix typo (#1517) b8434ef82 Add a warning about Sentinel and hostnames. (#1513) 260ee0b73 small typo (#1515) b702dd26d admin.md: madvise for transparent_hugepage (#841) eda70b134 Updates tools.json (#1514) 78e0f7ac5 Cleans clients.json (#1512) 4aa0c7c1a Adds aspell dependency to PR action 252a588e4 clients.json: marked node-redis as inactive (#1511) ac7fcd41e Updates clients.json ecbe17343 Update clients.json (#1510) e46f14628 add ltoddy's rust redis client into doc (#1509) 89970b305 Add C# client FreeRedis (#1434) a2fefb86a Revamps release process description (#1491) 77d54696e improve GETDEL description (#1508) cac4c937c Update getex.md 3c81ab6ef Update hrandfield.md 99cf22d4c module up update for 6.2 RC3 (#1507) 956e1dd9e Updates for Sentinel in 6.2. (#1506) f1db48664 Documents the FAILOVER command (#1502) 0923d9e7e Update lazyfree-lazy-user-flush doc (#1493) 47c4105d6 Adds HRANDMEMBER and ZRANDMEMBER, revamps SRANDMEMBER (#1504) 90fccf9f6 Adding redis-py-cluster to the list of clients. (#1505) 3672323bd GETEX, GETDEL and SET PXAT/EXAT Documentation (#1501) ff1f4bf9a Fix: Removed tonumber from the Lua script (#1496) 0536b897b Apply module command cleanup from redis/8344 (#1499) 3a73cc2d6 Apply module command cleanup from redis/8344 (#1498) 5a336ba5e apply module command cleanup from redis/8344 (#1497) 7b40968c7 apply module command cleanup from redis/8344 (#1497) bd829f2e8 update module api formatting and missing functions (#1494) 6acf74c4e Update zrangestore.md 7b9874cee Adds complexity to SWAPDB (#1484) da92d05e9 update module api for 6.2 RC2 (#1492) 926bc64f1 Update set.md (#1490) 01af47f41 GEOSEARCH ANY argument (#1480) 70760fc30 docs for new MINID trim strategy (#1488) 1e8355fe0 Fixed consistency issue with history versions (#1489) 78a3753fd Documention for client pause (#1487) 70c16cffe Updated documentation about BCAST (#1486) 53ead8b58 update INFO doc for current_cow_size field (#1478) 361b855a2 New ZRANGESTORE and ZRANGE new args, deprecate others (#1482) f256d308b XAUTOCLAIM docs (#1475) 60b5f4bcb new styling guides (#1485) fd2f0fa52 XREADGROUP example and reply (#1469) 10b306f4c Update clients.json (#1477) 63966e54f hedis now supports cluster (#1483) 31c5e3e6c Rename Presto SQL to Trino (#1476) 40933ff8f Adds CH/NX/XX options to GEOADD (#1474) 090a5fe57 Added documentation regarding error statistics (redis/redis#8217) (#1472) d1a28e7b2 add command deprecatoin note for GEORADIUS (#1481) a55223bc7 Documents noargs HELLO variant (#1471) 9f43155fe add reauthenticate and deauthenticate to workdlist 54a440872 Update reset.md a8278c441 Update reset.md d60438a09 Documents CLIENT TRACKINGINFO (#1459) 2042e052c Adds count to L/RPOP (#1464) 33bc5ddcf Abide current HELLO subcommand (#1381) bc5c60596 Documents '--cluster-yes' and a couple of env vars (#1435) 0629c3a5c Clarify the usage of maxlen with a value of 0 (#1436) 616eab455 change port 7002 after redirection (#1468) 7dc7474bb Revert "Use a static example for client info" 056315147 Use a static example for client info 43f18c80b update module docs for 6.2 RC1 (#1466) 0ee00bbc9 Add GEOSEARCH/GEOSEARCHSTORE (#1465) ed8571b22 Documented redis#8132: Redis main thread cpu time, and scrape system time (#1455) 248dce42a Add description for total_forks in INFO STATS (#1461) f3caf86cf Copy edit intro (#1463) dbd555ab4 fix twitter account to g_korland (#1460) c92e61613 Documents XPENDING's exclusive ranges (#1453) b6251621d Add Pottery Redlock implementation (#1458) 7ec457eca Typo fix (#1456) 05247b6d0 Adds CLIENT INFO and CLIENT LIST ID (#1451) 940ce878d Clients: Add forks of C and Erlang clients and mark old as inactive (#1452) a65ca12ac Adds exclusive ranges to XRANGE/XREVRANGE (#1444) 0521a4dbf Documents ACL Pub/Sub (#1448) 871c84235 Adds names to blocks (#1449) 899df8d8a Adds nested arguments (#1443) 4772fa991 Add asynchronous Redlock PHP implementation (#1445) 9220d0235 Jedis is moved under Redis (#1447) e18dcb675 Fix XPENDING IDLE (#1446) e382212fa Add handy-redis to list of clients (#1416) 07a0cee32 Adds the COPY command (#1439) 7d664e59d Docs for XPENDING IDLE (#1442) ca01ef14d Add documentation for the new zdiff/zdiffstore commands (#1428) 76e46bff4 Stream intro doc cleanups (#1433) 7b6de45a4 add missing user information in client list (#1441) edbe192ba update client list command for including bcast and redir information (#1440) 751fe6501 Adds empty authors to Presto 895548224 Add Presto Redis connector (#1430) c62a515eb Adds NewLife.Redis C# client (#1431) 4787687dc Typo fix: entires -> entries (#1432) e3daa2f95 Updates INFO fields about diskless loading progress and maxclients (#1427) 08904d273 Adds the RESET command (#1426) 1b594c6cf Adds clarifications about Pub/Sub replies in a Redis Cluster (#1425) 7a03ea3ed Update latency-reset.md 8a0d21a22 Update cluster-flushslots.md a0a330b73 Updated CLIENT LIST and KILL doc to include laddr (#1424) 8d4bf9bc4 Update module api before releasing 6.0.9 (#1423) 8bb78b4c2 Some cleanups to streams doc (#1422) c387a8f0c Updates to Sentinel doc (#1417) e0fb54518 Renames bit group to bitmap to conform to internals 207e09f20 Moves some commands to a new 'bit' category git-subtree-dir: docs/redis-doc git-subtree-split: dddae77217aa8d0e0d1d00cd14993032156aa705
93a2afb33 Escapes Redis commands 099aae476 Escapes keywords that are also commands dddae7721 Temp workaround for redis/redis-io#251 (#1646) 2fd52e72b Add ASKING command (#1642) b9155866a add bitfield_ro command (#1645) de7eeb2cc Update MONITOR command (#1641) df6e3889a Update Lua RESP3 reply format (#1619) be38fe6f1 Add INFO metrics {total,current}_active_defrag_time (#1643) 8ec59f994 Fix grammar in pipelining doc for easier consumption (#1024) 80cd34aff Format Ruby examples (#1136) 12f577959 Add new LMPOP and BLMPOP commands. (#1636) 660170158 Merge pull request #1637 from madolson/numpat-change d9410a6a1 Update clients.json (#1639) 8dbd962e6 update readme: small typo fix (#1634) 1f5754db4 Escape markdown in my_random_value (#1635) e94992395 Clarify wording around number of pubsub numpat 845a28d3e Notes on manual failover to a new (just upgraded) replica (#1633) 9a53d7038 Sentinel: rephrase replica-priority paragraph. (#1630) 94a0e4846 Add missing hyperlinks to different sections of the sentinel doc 3554566ce Add SORT_RO documentation (#1621) 46db91367 Update active Java clients (#1623) dfa8671d3 Sponsor name change (#1624) b7d1ecfae Avoid confusing use of term "pure function" 87ecbfca7 Add current_cow_peak field in INFO command (#1617) f9a75a4b0 Add doc for new options of expire commands (#1613) 2d26a5a9c adds new SINTERCARD and ZINTERCARD (#1610) af282ed4a Fix spelling of acknowledgment (#1618) 1edd7b68a Render lists properly in CLIENT PAUSE doc (#1601) e3d11f00b Fix inconsistent EVALSHA error output (#1616) a25c27b0f Add RedisIMS (If-Modified-Since) module (#1599) 228f8c692 Fix typo (#1606) 10a013653 Clarifies that timeout is in seconds. (#1604) 6ca156269 INFO details about fragmentation (#1608) 9877b6740 Fix line feed in acl cats 5b46fada0 add description of the ACL categories (#1609) a7a331b9d Add 'systemd' to wordlist (#1614) 2320aaeaf Better phrasing and fix typos in streams (#1607) b00eac4cf Add `process_supervised` description at INFO (#1612) 788189c65 Add info metrics total_eviction_exceeded_time and current_eviction_exceeded_time, redis#9031 5ddb3b26b update stars (#1611) 6a9c6310b RPOPLPUSH minor phrasing updates bfc5e2cb7 Add unit to TTL summary (be consistent with PTTL) (#1600) e722a583b Document the INFO field "sentinel_tilt_since_seconds" for Sentinel (#1594) 7282fadc4 GEOSEARCH, GEOSEARCHSTORE more detailed examples (#1597) 5739d443a Change args of BITPOS from [start] [end] to [start [end]] f0a260346 Merge pull request #1223 from redis/bit-group 9924218f0 Kicked off performance analysis guide - on-CPU profiling and tracing (#1437) a5ce7cec5 Cluster: reference redis-cli instead of redis-trib + minor fix (#1592) 593ef6677 SCAN: Address documentation typo on parallel iterations (#1595) 632c46aa5 Add client: redis-async (ocaml) #1593 c7143d636 Update the doc about maxmemory on replicas (#1457) d48c19234 Update time complexity of EXISTS, SPOP (#612) 706170b19 Fix the return of LPOP and RPOP with count + examples (#1591) 7b270c467 Improve wording around expire + transact (#1590) fe2ff16b9 Cleanup erlang clients that are not maintained 28e7d37d4 Add Bash client https://github.com/SomajitDey/redis-client (#1581) 7f7b61ca9 Add redis-interval-module 8cbe3cfc0 Adding ecredis - Erlang Redis Cluster client (#1586) 157443c74 Add Jedis.jl to clients.json as a new Julia client #1584 5d8afc49f Use 'nil' in LPOS command docs (#1571) f851b3593 Add note re TTL getting reset by getset 92b7099cf Rephrase description of XX and NX options of ZADD 16d577371 GEOSEARCH command: Changed wording in a few places. (#1568) d68271df1 Removed wrong space (#1567) ef1709469 Added SWI-Prolog client (#1407) 618cca8f0 Update SET documentation of the NX and GET flags (#1565) 13a022958 Document optionality of PFADD, EVAL, EVALSHA args cb3c9b5fb Merge pull request #1572 from sazzad16/patch-3 19db26df5 Merge pull request #1579 from huangzhw/cluster be82a6f22 Merge pull request #1560 from nmvk/evalro 64d5225df Update PGP key. (#1589) 586b10591 API documentation of EXPIRETIME and PEXPIRETIME commands (#1582) 0146a9e6d slowlog get command supports passing in -1 to get all logs. (#1585) 874d646ee Add missing flag nofailover in "cluster nodes" command output. 9fe1f36dd Add TairHash and TairString modules (#1577) 13bda277d Update client-side-caching.md (#1575) 2865b8e96 ZREMRANGEBYLEX : Replace "return" with "remove" (#1576) d4b50349f Fix typo (#1574) 23a9ab1b3 MEMORY USAGE command may return 'nil' af4e30177 add godis client (#1566) 6fca48493 Little tweak on INCR rate limiter 1 (#1516) 29fe06d83 Specifies an error is returned when xadd fails (#1495) df4eda7ce Add Red to tools.json (#1558) 0e39ef9ec Escapes Redis command names in bitfield's syntax b2fc5c53c Update bitfield.md (#1553) 3f34db95a SPOP with COUNT returns empty array (not nil) (#1559) a4fa6557a GEOSEARCHSTORE: Remove WITHCOORD, WITHDIST and WITHHASH (#1561) eea6e9d1c Apply suggestions from code review a24651db2 Update to Commands.json 328df19b3 Address PR comments e42581151 EVAL_RO and EVALSHA_RO documentation 42ccc962f update module api ref again (#1557) 71490b714 Update notifications documentation with module key type event notifications (#1548) 437b16e63 Update module api reference (#1556) 6e749f1fb Updated complexity of BRPOP and BLPOP commands. (#1554) 189ecd19e Fix sentinel failover delay to 2 * failover-timeout (#1555) 3ca7ad000 Add more details about configuration format (#1549) c79195847 Remove cli tag from XACK (#1552) ecb0ab336 Update persistence.md (#1550) 29f18a45d update module api doc for 6.2.2 (#1546) 4e2bdf592 Minor spelling / phrasing update. (#1544) 4953e008b Fix LIMIT index error in ZRANGEBYLEX command of indexes.md. (#1542) 48bb84e86 Tiny grammatical change. (#1543) 136bb9ea3 Add documentation for Lua behavior under low memory conditions. (#1533) 5295c398f Fix wrong order of geospatial tuple (#1541) 18e360252 Fix GEOPOS time complexity (#1540) 4514130e0 Update set.md dbd2b8a0f Stream command cleanups (#1537) 1f28de1ea Update modules.json (#1538) 7f42ec7a6 Adding a link to redicrypt module (#1536) 3493e18c3 Fix RESET command misleading typo and add CLIENT KILL LADDR. (#1535) ed0b3f694 Added Jedis as an example of client side partitioning (#1534) 400b13f9a Fix incorrect references to SYNC command. (#1531) 18ae06bb2 Adding Redis Interval Sets to the redis.io modules (#1532) 89b88883e Update sponsors.md 8792b6250 Specify STRALGO LCS syntax more accurately. (#1529) a61380b17 Client and tools cleanups (#1530) 64a0d67c3 Update xpending.md f1d54a691 module api changes for 6.2.1 (#1526) 7870ef7bb Persistence: Define acronyms RDB and AOF (#1524) 539dd6c06 Fix minor phrasing / grammar issue (#1525) e0528232f update module api ref (#1523) 8aa3e63c5 grammar changes (#1520) 2cfe75876 update INFO doc for current_fork_perc, current_save_keys_processed, c… (#1518) 08330f388 Fix typo (#1517) b8434ef82 Add a warning about Sentinel and hostnames. (#1513) 260ee0b73 small typo (#1515) b702dd26d admin.md: madvise for transparent_hugepage (#841) eda70b134 Updates tools.json (#1514) 78e0f7ac5 Cleans clients.json (#1512) 4aa0c7c1a Adds aspell dependency to PR action 252a588e4 clients.json: marked node-redis as inactive (#1511) ac7fcd41e Updates clients.json ecbe17343 Update clients.json (#1510) e46f14628 add ltoddy's rust redis client into doc (#1509) 89970b305 Add C# client FreeRedis (#1434) a2fefb86a Revamps release process description (#1491) 77d54696e improve GETDEL description (#1508) cac4c937c Update getex.md 3c81ab6ef Update hrandfield.md 99cf22d4c module up update for 6.2 RC3 (#1507) 956e1dd9e Updates for Sentinel in 6.2. (#1506) f1db48664 Documents the FAILOVER command (#1502) 0923d9e7e Update lazyfree-lazy-user-flush doc (#1493) 47c4105d6 Adds HRANDMEMBER and ZRANDMEMBER, revamps SRANDMEMBER (#1504) 90fccf9f6 Adding redis-py-cluster to the list of clients. (#1505) 3672323bd GETEX, GETDEL and SET PXAT/EXAT Documentation (#1501) ff1f4bf9a Fix: Removed tonumber from the Lua script (#1496) 0536b897b Apply module command cleanup from redis/8344 (#1499) 3a73cc2d6 Apply module command cleanup from redis/8344 (#1498) 5a336ba5e apply module command cleanup from redis/8344 (#1497) 7b40968c7 apply module command cleanup from redis/8344 (#1497) bd829f2e8 update module api formatting and missing functions (#1494) 6acf74c4e Update zrangestore.md 7b9874cee Adds complexity to SWAPDB (#1484) da92d05e9 update module api for 6.2 RC2 (#1492) 926bc64f1 Update set.md (#1490) 01af47f41 GEOSEARCH ANY argument (#1480) 70760fc30 docs for new MINID trim strategy (#1488) 1e8355fe0 Fixed consistency issue with history versions (#1489) 78a3753fd Documention for client pause (#1487) 70c16cffe Updated documentation about BCAST (#1486) 53ead8b58 update INFO doc for current_cow_size field (#1478) 361b855a2 New ZRANGESTORE and ZRANGE new args, deprecate others (#1482) f256d308b XAUTOCLAIM docs (#1475) 60b5f4bcb new styling guides (#1485) fd2f0fa52 XREADGROUP example and reply (#1469) 10b306f4c Update clients.json (#1477) 63966e54f hedis now supports cluster (#1483) 31c5e3e6c Rename Presto SQL to Trino (#1476) 40933ff8f Adds CH/NX/XX options to GEOADD (#1474) 090a5fe57 Added documentation regarding error statistics (redis/redis#8217) (#1472) d1a28e7b2 add command deprecatoin note for GEORADIUS (#1481) a55223bc7 Documents noargs HELLO variant (#1471) 9f43155fe add reauthenticate and deauthenticate to workdlist 54a440872 Update reset.md a8278c441 Update reset.md d60438a09 Documents CLIENT TRACKINGINFO (#1459) 2042e052c Adds count to L/RPOP (#1464) 33bc5ddcf Abide current HELLO subcommand (#1381) bc5c60596 Documents '--cluster-yes' and a couple of env vars (#1435) 0629c3a5c Clarify the usage of maxlen with a value of 0 (#1436) 616eab455 change port 7002 after redirection (#1468) 7dc7474bb Revert "Use a static example for client info" 056315147 Use a static example for client info 43f18c80b update module docs for 6.2 RC1 (#1466) 0ee00bbc9 Add GEOSEARCH/GEOSEARCHSTORE (#1465) ed8571b22 Documented redis#8132: Redis main thread cpu time, and scrape system time (#1455) 248dce42a Add description for total_forks in INFO STATS (#1461) f3caf86cf Copy edit intro (#1463) dbd555ab4 fix twitter account to g_korland (#1460) c92e61613 Documents XPENDING's exclusive ranges (#1453) b6251621d Add Pottery Redlock implementation (#1458) 7ec457eca Typo fix (#1456) 05247b6d0 Adds CLIENT INFO and CLIENT LIST ID (#1451) 940ce878d Clients: Add forks of C and Erlang clients and mark old as inactive (#1452) a65ca12ac Adds exclusive ranges to XRANGE/XREVRANGE (#1444) 0521a4dbf Documents ACL Pub/Sub (#1448) 871c84235 Adds names to blocks (#1449) 899df8d8a Adds nested arguments (#1443) 4772fa991 Add asynchronous Redlock PHP implementation (#1445) 9220d0235 Jedis is moved under Redis (#1447) e18dcb675 Fix XPENDING IDLE (#1446) e382212fa Add handy-redis to list of clients (#1416) 07a0cee32 Adds the COPY command (#1439) 7d664e59d Docs for XPENDING IDLE (#1442) ca01ef14d Add documentation for the new zdiff/zdiffstore commands (#1428) 76e46bff4 Stream intro doc cleanups (#1433) 7b6de45a4 add missing user information in client list (#1441) edbe192ba update client list command for including bcast and redir information (#1440) 751fe6501 Adds empty authors to Presto 895548224 Add Presto Redis connector (#1430) c62a515eb Adds NewLife.Redis C# client (#1431) 4787687dc Typo fix: entires -> entries (#1432) e3daa2f95 Updates INFO fields about diskless loading progress and maxclients (#1427) 08904d273 Adds the RESET command (#1426) 1b594c6cf Adds clarifications about Pub/Sub replies in a Redis Cluster (#1425) 7a03ea3ed Update latency-reset.md 8a0d21a22 Update cluster-flushslots.md a0a330b73 Updated CLIENT LIST and KILL doc to include laddr (#1424) 8d4bf9bc4 Update module api before releasing 6.0.9 (#1423) 8bb78b4c2 Some cleanups to streams doc (#1422) c387a8f0c Updates to Sentinel doc (#1417) e0fb54518 Renames bit group to bitmap to conform to internals 207e09f20 Moves some commands to a new 'bit' category git-subtree-dir: docs/redis-doc git-subtree-split: 93a2afb336b3456699d755a7c3d7b2ab27298a9a
…12817) Fix #12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: #12817 (comment) * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: #7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually lead to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: #8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. Made RM_Reply*() family API thread-safe Related discussion: #12817 (comment) Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: #12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix #12785 - Introduced: Version: 4.0 Commit: 3fcf959 - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't make any race condiaions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, may cause data corruption. 1. Error reset `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to old value before entering freeClient. 2. corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer counts memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer resetting `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the mid of the process. * Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. Fixed free client args without GIL Related discussion: #12817 (comment) When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: #8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: #12817 (comment) - Introduced: Version: 4.0 Commit: 9b01b64 - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: #7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. ### Other issue 1. RM_Yield is not thread-safe, fixed via #12905. ### Summarize 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). ### About backpot to 7.2 1. The implement of (1) is not too satisfying, would like to get more eyes. 2. (2), (3) can be safely for backport 3. (4), (6) just modifying the module tests and updating the documentation, no need for a backpot. 4. (5) is harmless, no need for a backpot. --------- Co-authored-by: Oran Agra <oran@redislabs.com>
…edis#12817) Fix redis#12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: redis#12817 (comment) * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: redis#7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually lead to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: redis#8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. Made RM_Reply*() family API thread-safe Related discussion: redis#12817 (comment) Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: redis#12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix redis#12785 - Introduced: Version: 4.0 Commit: redis@3fcf959 - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't make any race condiaions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, may cause data corruption. 1. Error reset `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to old value before entering freeClient. 2. corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer counts memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer resetting `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the mid of the process. * Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. Fixed free client args without GIL Related discussion: redis#12817 (comment) When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: redis#8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: redis#12817 (comment) - Introduced: Version: 4.0 Commit: redis@9b01b64 - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: redis#7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. ### Other issue 1. RM_Yield is not thread-safe, fixed via redis#12905. ### Summarize 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). ### About backpot to 7.2 1. The implement of (1) is not too satisfying, would like to get more eyes. 2. (2), (3) can be safely for backport 3. (4), (6) just modifying the module tests and updating the documentation, no need for a backpot. 4. (5) is harmless, no need for a backpot. --------- Co-authored-by: Oran Agra <oran@redislabs.com>
|
We have encountered such a problem here. The user used such lua code: We can see Is there any way we can control this incorrect usage? |
|
i'm not sure there's anything we can do. |
yean, the only way i thought of is the same. in info errorstats, use raxSize and only part of the data is output to break out of the loop. Note that it may also potentially consume some memory (if it has been accumulated) Or set a parameter to limit the number of Errorstats we save |
|
if we do that, i'd rather stop collecting them, release the memory and permanently disable error stats. not just filter them at reporting time. |
|
yes, i think so too, just.. i can’t describe it very well. ok, what i think is similar to only saving up to 1000 errorstats, and deleting some of them when it exceeds, similar to our earliest repl_scriptcache_fifo, something like that but just some random ideas, just to throw the discussion out there first |
|
i don't think we'd want to delete some. i think we can detect a misuse and completely disable this feature preventing the damage it can cause, and losing the functionality. |
Users who abuse lua error_reply will generate a new error object on each error call, which can make server.errors get bigger and bigger. This will cause the server to block when calling INFO (we also return errorstats by default). To prevent the damage it can cause, when a misuse is detected, we will print a warning log and disable the errorstats to avoid adding more new errors. It can be re-enabled via CONFIG RESETSTAT. Because server.errors may be very large (it may be better now since we have the limit), config resetstat may block for a while. So in resetErrorTableStats, we will try to lazyfree server.errors. See the related discussion at the end of redis#8217.
#13141) Users who abuse lua error_reply will generate a new error object on each error call, which can make server.errors get bigger and bigger. This will cause the server to block when calling INFO (we also return errorstats by default). To prevent the damage it can cause, when a misuse is detected, we will print a warning log and disable the errorstats to avoid adding more new errors. It can be re-enabled via CONFIG RESETSTAT. Because server.errors may be very large (it may be better now since we have the limit), config resetstat may block for a while. So in resetErrorTableStats, we will try to lazyfree server.errors. See the related discussion at the end of #8217.
…edis#12817) Fix redis#12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: redis#12817 (comment) * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: redis#7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually lead to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: redis#8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. Made RM_Reply*() family API thread-safe Related discussion: redis#12817 (comment) Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: redis#12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix redis#12785 - Introduced: Version: 4.0 Commit: redis@3fcf959 - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't make any race condiaions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, may cause data corruption. 1. Error reset `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to old value before entering freeClient. 2. corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer counts memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer resetting `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the mid of the process. * Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. Fixed free client args without GIL Related discussion: redis#12817 (comment) When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: redis#8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: redis#12817 (comment) - Introduced: Version: 4.0 Commit: redis@9b01b64 - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: redis#7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. 1. RM_Yield is not thread-safe, fixed via redis#12905. 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). 1. The implement of (1) is not too satisfying, would like to get more eyes. 2. (2), (3) can be safely for backport 3. (4), (6) just modifying the module tests and updating the documentation, no need for a backpot. 4. (5) is harmless, no need for a backpot. --------- Co-authored-by: Oran Agra <oran@redislabs.com>
…edis#12817) Fix redis#12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: redis#12817 (comment) * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: redis#7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually lead to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: redis#8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. Made RM_Reply*() family API thread-safe Related discussion: redis#12817 (comment) Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: redis#12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix redis#12785 - Introduced: Version: 4.0 Commit: redis@3fcf959 - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't make any race condiaions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, may cause data corruption. 1. Error reset `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to old value before entering freeClient. 2. corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer counts memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer resetting `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the mid of the process. * Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. Fixed free client args without GIL Related discussion: redis#12817 (comment) When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: redis#8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: redis#12817 (comment) - Introduced: Version: 4.0 Commit: redis@9b01b64 - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: redis#7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. ### Other issue 1. RM_Yield is not thread-safe, fixed via redis#12905. ### Summarize 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). ### About backpot to 7.2 1. The implement of (1) is not too satisfying, would like to get more eyes. 2. (2), (3) can be safely for backport 3. (4), (6) just modifying the module tests and updating the documentation, no need for a backpot. 4. (5) is harmless, no need for a backpot. --------- Co-authored-by: Oran Agra <oran@redislabs.com>
…edis#12817) Fix redis#12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: redis#12817 (comment) * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: redis#7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually lead to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: redis#8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. Made RM_Reply*() family API thread-safe Related discussion: redis#12817 (comment) Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: redis#12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix redis#12785 - Introduced: Version: 4.0 Commit: redis@fa1b900 - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't make any race condiaions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, may cause data corruption. 1. Error reset `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to old value before entering freeClient. 2. corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer counts memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer resetting `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the mid of the process. * Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. Fixed free client args without GIL Related discussion: redis#12817 (comment) When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: redis#8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: redis#12817 (comment) - Introduced: Version: 4.0 Commit: redis@e96bac5 - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: redis#7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. ### Other issue 1. RM_Yield is not thread-safe, fixed via redis#12905. ### Summarize 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). ### About backpot to 7.2 1. The implement of (1) is not too satisfying, would like to get more eyes. 2. (2), (3) can be safely for backport 3. (4), (6) just modifying the module tests and updating the documentation, no need for a backpot. 4. (5) is harmless, no need for a backpot. --------- Co-authored-by: Oran Agra <oran@redislabs.com>
redis#13141) Users who abuse lua error_reply will generate a new error object on each error call, which can make server.errors get bigger and bigger. This will cause the server to block when calling INFO (we also return errorstats by default). To prevent the damage it can cause, when a misuse is detected, we will print a warning log and disable the errorstats to avoid adding more new errors. It can be re-enabled via CONFIG RESETSTAT. Because server.errors may be very large (it may be better now since we have the limit), config resetstat may block for a while. So in resetErrorTableStats, we will try to lazyfree server.errors. See the related discussion at the end of redis#8217.
this PR pushes forward the observability on overall error statistics and command statistics within redis-server:
It extends
INFO COMMANDSTATSto haveSample ( notice rejected_calls ):
Adds a new section to INFO, named
ERRORSTATSthat enables keeping track of the different errors that occur within redis ( within processCommand and call ) based uppong the reply Error Prefix ( The first word after the "-", up to the first space ).Sample:
This PR also fixes RM_ReplyWithError so that it can be correctly identified as an error reply.
To test the added features:
The added tests are divided as follow: