
Conversation

@ShooterIT
Member

I configured a small memory size for the "normal" class of client-output-buffer-limit to limit the memory used by clients, but I found that memory actually kept increasing when users executed the SMEMBERS command against a big key.

I could also read some replies when I sent a Redis command via telnet, even though the client output buffer size was already over the client-output-buffer-limit I had set.

I believe we shouldn't write replies into the output buffer of a client we intend to close as soon as possible; that avoids wasting memory. We should also remove such a client from the list of pending writes, so that we won't write(2) replies to it.

@antirez
Contributor

antirez commented May 8, 2020

Good fix @ShooterIT, thanks! However, for the second part, i.e. removing the client from the pending writes list, what about this instead?

diff --git a/src/networking.c b/src/networking.c
index 75c0c16b1..aa4967a20 100644
--- a/src/networking.c
+++ b/src/networking.c
@@ -1402,6 +1402,9 @@ int handleClientsWithPendingWrites(void) {
         c->flags &= ~CLIENT_PENDING_WRITE;
         listDelNode(server.clients_pending_write,ln);
 
+        /* Don't write to clients that are going to be closed anyway. */
+        if (c->flags & CLIENT_CLOSE_ASAP) continue;
+
         /* If a client is protected, don't do anything,
          * that may trigger write error or recreate handler. */
         if (c->flags & CLIENT_PROTECTED) continue;

@antirez antirez added this to the Redis 6.0.1 milestone May 8, 2020
@ShooterIT
Member Author

Hi @antirez, there must be some data in the output buffer if the client output buffer size exceeds client-output-buffer-limit, and we will send that buffered data to the client because it was added to the pending writes list.

My test is as follows. I set a small size for the "normal" class in client-output-buffer-limit, and generated a big list with 8717 items. The connection was closed when I executed the lrange mylist 0 -1 command from redis-cli, for overcoming the output buffer limit, and nothing was shown because redis-cli didn't receive an entire reply. But I did get some data when I used telnet, which always prints whatever it reads.

work@~/redis-new/6380$ redis-cli -p 6381 config get "*output*"
1) "client-output-buffer-limit"
2) "normal 1024 1024 60 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
work@~/redis-new/6380$ redis-cli -p 6380 llen mylist
(integer) 8717
work@~/redis-new/6380$ redis-cli -p 6380 lrange mylist 0 -1
Error: Server closed the connection
work@~/redis-new/6380$ telnet 127.0.0.1 6380
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
lrange mylist 0 -1
*8717
$5
20657
$4
1626
$5
...
981_19255
$10
7592_12539
$11s
18834_26080Connection closed by foreign host.

redis log

19358:M 08 May 2020 17:41:53.418 # Client id=1028 addr=127.0.0.1:44085 fd=8 name= age=2 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=20 qbuf-free=32748 obl=16376 oll=1 omem=20504 events=r cmd=lrange user=default scheduled to be closed ASAP for overcoming of output buffer limits.

I think it is unnecessary to send an incomplete reply to the client, and doing so means calling write(2), which is wasteful. So I believe we should remove the client from the pending writes list.

@oranagra
Member

I've rebased this PR and run the tests. I see it fails on the second test in tests/unit/moduleapi/auth.tcl.

@madolson looking at that test, I suppose it used to work due to the implicit reconnect done by the test framework. Is that intended?

        # Catch the I/O exception that was thrown when Redis
        # disconnected with us. 
        catch { [r ping] } e
        assert_match {*I/O*} $e

        # Check that a user change was registered
        assert_equal [r auth.changecount] 1

@ShooterIT can you look into it and check how this change affected that test?

@ShooterIT
Member Author

Firstly, I'm sorry I didn't read Salvatore's suggestion carefully. I think I should merge his suggestion, because there are also other clients that get the CLIENT_CLOSE_ASAP flag set.

As for the test error, I don't really know; I think you may be right.

@ShooterIT
Member Author

ShooterIT commented Aug 24, 2020

But I think we should also remove the client from the pending writes list, because Redis will still write replies to the client when threaded I/O is enabled if we only make the following change.

diff --git a/src/networking.c b/src/networking.c
index 75c0c16b1..aa4967a20 100644
--- a/src/networking.c
+++ b/src/networking.c
@@ -1402,6 +1402,9 @@ int handleClientsWithPendingWrites(void) {
         c->flags &= ~CLIENT_PENDING_WRITE;
         listDelNode(server.clients_pending_write,ln);
 
+        /* Don't write to clients that are going to be closed anyway. */
+        if (c->flags & CLIENT_CLOSE_ASAP) continue;
+
         /* If a client is protected, don't do anything,
          * that may trigger write error or recreate handler. */
         if (c->flags & CLIENT_PROTECTED) continue;

@ShooterIT
Member Author

ShooterIT commented Aug 24, 2020

Maybe we should also add a check in handleClientsWithPendingWritesUsingThreads, as follows:

diff --git a/src/networking.c b/src/networking.c
index 5271de2d8..ed1e1f7c5 100644
--- a/src/networking.c
+++ b/src/networking.c
@@ -3026,6 +3026,9 @@ int handleClientsWithPendingWritesUsingThreads(void) {
     while((ln = listNext(&li))) {
         client *c = listNodeValue(ln);
         c->flags &= ~CLIENT_PENDING_WRITE;
+        if (c->flags & CLIENT_CLOSE_ASAP) {
+            listDelNode(server.clients_pending_write, ln);
+            continue;
+        }
         int target_id = item_id % server.io_threads_num;
         listAddNodeTail(io_threads_list[target_id],c);
         item_id++;

@ShooterIT
Member Author

ShooterIT commented Aug 24, 2020

hi @oranagra
For the tests, I think we should catch errors when executing the auth.createmoduleuser command, because Redis will disconnect the client. https://github.com/redis/redis/blob/unstable/tests/modules/auth.c#L19

The following change makes the tests pass:

diff --git a/tests/unit/moduleapi/auth.tcl b/tests/unit/moduleapi/auth.tcl
index 04a90a496..3d8564625 100644
--- a/tests/unit/moduleapi/auth.tcl
+++ b/tests/unit/moduleapi/auth.tcl
@@ -19,11 +19,10 @@ start_server {tags {"modules"}} {
 
     test {De-authenticating clients is tracked and kills clients} {
         assert_equal [r auth.changecount] 0
-        r auth.createmoduleuser
 
         # Catch the I/O exception that was thrown when Redis
         # disconnected with us. 
-        catch { [r ping] } e
+        catch { [r auth.createmoduleuser] } e
         assert_match {*I/O*} $e
 
         # Check that a user change was registered

@ShooterIT
Member Author

Please also have a look if you have time @madolson

@madolson
Contributor

I think the problem is that for ACLs (and module ACLs) we should probably use the close-after-reply flag instead of close-async. The motivation here is that there were completed commands whose output we should return to the calling clients. So I think the test caught a good thing; I can make that fix.

@madolson
Contributor

The test specifically relies on that behavior, so I agree that we could try to catch the exception, but I think it's probably the wrong thing to do long term.

@oranagra
Member

oranagra commented Sep 1, 2020

@madolson so the next step is to keep the test as is, and change acl.c and module.c to use CLIENT_CLOSE_AFTER_REPLY instead of freeClientAsync (we need to create an interface in networking.c for that).
@ShooterIT care to try that?

@ShooterIT
Member Author

we should probably use the close after reply flag instead of the close async

I think that is not enough; we can't write anything to clients once the CLIENT_CLOSE_AFTER_REPLY flag is set.

@oranagra
Member

oranagra commented Sep 2, 2020

@ShooterIT you mean that _addReplyToBuffer etc. are rejecting the write, right?
Why do we need to write to the client after setting the flag?

@ShooterIT
Member Author

@oranagra Yes.
For the ACL DELUSER command:

    } else if (!strcasecmp(sub,"deluser") && c->argc >= 3) {
        int deleted = 0;
        for (int j = 2; j < c->argc; j++) {
            sds username = c->argv[j]->ptr;
            if (!strcmp(username,"default")) {
                addReplyError(c,"The 'default' user cannot be removed");
                return;
            }
        }

        for (int j = 2; j < c->argc; j++) {
            sds username = c->argv[j]->ptr;
            user *u;
            if (raxRemove(Users,(unsigned char*)username,
                          sdslen(username),
                          (void**)&u))
            {
L1:             ACLFreeUserAndKillClients(u);
                deleted++;
            }
        }
L2:     addReplyLongLong(c,deleted);

If the user we delete is also the user of the current client, we kill the current client asynchronously (L1), but we also want to be able to write the reply to it (L2). So setting the CLIENT_CLOSE_AFTER_REPLY flag is not enough, because we couldn't write anything afterwards.

For the module API tests, as @madolson said, RedisModule_FreeModuleUser calls ACLFreeUserAndKillClients, so the current client is killed and we can't send replies to it. With my modification we don't write anything to a client with the CLIENT_CLOSE_ASAP flag set, so my PR makes this test fail.

int Auth_CreateModuleUser(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);

    if (global) {
        RedisModule_FreeModuleUser(global);
    }

    global = RedisModule_CreateModuleUser("global");
    RedisModule_SetModuleUserACL(global, "allcommands");
    RedisModule_SetModuleUserACL(global, "allkeys");
    RedisModule_SetModuleUserACL(global, "on");

    return RedisModule_ReplyWithSimpleString(ctx, "OK");
}

@oranagra
Member

oranagra commented Sep 6, 2020

@ShooterIT if these are the only two problems (problematic corner cases), maybe we should just reorder things in them?
I.e. the code in ACL DELUSER can run the loop twice: first count and reply, then do the action.
And maybe the test module should do the same (first reply, then act).

The benefit of this PR is the immediate release of a possibly (probably) huge output buffer.
I.e. asyncCloseClientOnOutputBufferLimitReached isn't really reclaiming the memory, and keeps feeding what was already queued (which is a lot, and also useless) to the client.
Right?

@ShooterIT
Member Author

@oranagra Yes. My motivation for this PR is that we lose control of the output buffer size. In my case, our operations engineers found that Redis used more than 15GB of memory even though the maxmemory we set was only 10GB. Using pstack on the Redis pid, we found it was executing a complex command. We had set a 1GB client output buffer limit for clients, but it didn't take effect. So I think we shouldn't add replies to the reply list once the configured size is reached. What's more, it is unnecessary to send the buffered data to the client, because the reply is incomplete and calling write(2) is wasteful.

As for the problems, there may be more than two; the acl load command may also have this issue. But as far as I observed, the problem exists only in the ACL implementation. Maybe we need a new client flag :(

Here is an attempt, just a draft; maybe @madolson has a better idea.

--- a/src/acl.c
+++ b/src/acl.c
@@ -297,7 +297,10 @@ void ACLFreeUserAndKillClients(user *u) {
              * it in non authenticated mode. */
             c->user = DefaultUser;
             c->authenticated = 0;
-            freeClientAsync(c);
+            if (c == server.current_client)
+                c->flags |= CLIENT_RESOLVE_LATER;
+            else
+                freeClientAsync(c);
         }
--- a/src/server.c
+++ b/src/server.c
@@ -3367,6 +3367,14 @@ void call(client *c, int flags) {
     dirty = server.dirty-dirty;
     if (dirty < 0) dirty = 0;
 
+    if (!server.loading && c->flags & CLIENT_RESOLVE_LATER) {
+        c->flags &= ~CLIENT_RESOLVE_LATER;
+        if (clientHasPendingReplies(c))
+            c->flags |= CLIENT_CLOSE_AFTER_REPLY;
+        else
+            c->flags |= CLIENT_CLOSE_ASAP;
+    }
+

@oranagra
Member

@ShooterIT the solution you suggested last seems good.
How about renaming it to CLIENT_CLOSE_AFTER_COMMAND?
I.e. the difference is that CLIENT_CLOSE_AFTER_REPLY doesn't allow any further writes, while the new flag allows them until the current command is done.

I'm also ok with the alternative of not adding a new flag, and instead changing these two cases to reply first and act later:

diff --git a/src/acl.c b/src/acl.c
index 3194feb5b..bd0affa31 100644
--- a/src/acl.c
+++ b/src/acl.c
@@ -278,7 +278,7 @@ void ACLFreeUserAndKillClients(user *u) {
              * it in non authenticated mode. */
             c->user = DefaultUser;
             c->authenticated = 0;
-            freeClientAsync(c);
+            c->flags |= CLIENT_CLOSE_AFTER_REPLY;
         }
     }
     ACLFreeUser(u);
@@ -1673,6 +1673,16 @@ void aclCommand(client *c) {
             }
         }
 
+        /* we need to count and reply before deleting, since we may be deleting the current user */
+        for (int j = 2; j < c->argc; j++) {
+            sds username = c->argv[j]->ptr;
+            if (raxFind(Users,(unsigned char*)username,
+                          sdslen(username)))
+            {
+                deleted++;
+            }
+        }
+        addReplyLongLong(c,deleted);
         for (int j = 2; j < c->argc; j++) {
             sds username = c->argv[j]->ptr;
             user *u;
@@ -1681,10 +1691,8 @@ void aclCommand(client *c) {
                           (void**)&u))
             {
                 ACLFreeUserAndKillClients(u);
-                deleted++;
             }
         }
-        addReplyLongLong(c,deleted);
     } else if (!strcasecmp(sub,"getuser") && c->argc == 3) {
         user *u = ACLGetUserByName(c->argv[2]->ptr,sdslen(c->argv[2]->ptr));
         if (u == NULL) {
diff --git a/tests/modules/auth.c b/tests/modules/auth.c
index b13c10385..f769db876 100644
--- a/tests/modules/auth.c
+++ b/tests/modules/auth.c
@@ -15,6 +15,8 @@ int Auth_CreateModuleUser(RedisModuleCtx *ctx, RedisModuleString **argv, int arg
     REDISMODULE_NOT_USED(argv);
     REDISMODULE_NOT_USED(argc);
 
+    /* reply before closing since we might be deleting the current user. */
+    RedisModule_ReplyWithSimpleString(ctx, "OK");
     if (global) {
         RedisModule_FreeModuleUser(global);
     }
@@ -24,7 +26,7 @@ int Auth_CreateModuleUser(RedisModuleCtx *ctx, RedisModuleString **argv, int arg
     RedisModule_SetModuleUserACL(global, "allkeys");
     RedisModule_SetModuleUserACL(global, "on");
 
-    return RedisModule_ReplyWithSimpleString(ctx, "OK");
+    return REDISMODULE_OK;
 }
 
 int Auth_AuthModuleUser(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {

I've tested both approaches (my diff above and what you posted in your last message) and they seem to work.
The only things missing are some additional tests for coverage:

  • add a test that makes sure a client that was disconnected due to breaching the output buffer limit doesn't get to read any of it and the memory is released immediately (i.e. so that we can spot future regressions)
  • add a test for ACL DELUSER that deletes itself (seems uncovered right now)

What do you think?
@guybe7 can you cast your eyes over this as well?

@ShooterIT
Member Author

@oranagra I'm ok with CLIENT_CLOSE_AFTER_COMMAND; actually I also wanted to use this name (maybe a little long, but it is much clearer).

As for the problems, there may be more than two; the acl load command may also have this issue. But as far as I observed, the problem exists only in the ACL implementation. Maybe we need a new client flag :(

Just as I said above, we asked yangbodong22011 to add more test cases in #7765. After rebasing onto the newest unstable branch, it failed on the acl load and save test cases.

@oranagra
Member

OK, so there are more places that first close a client and then reply to it, and it's good that we have coverage for some of them.
Your suggestion of adding CLIENT_CLOSE_AFTER_COMMAND seems to be the winner; please add it to the PR (the tests will start passing).

and please add tests for:

  • test that makes sure a client that was disconnected due to breaching the output buffer limit doesn't get to read any of it and the memory is released immediately (i.e. so that we can spot future regressions)
  • test for ACL DELUSER that deletes itself (seems uncovered right now)

Sounds like a plan?

@ShooterIT
Member Author

Yes. Let me do it ASAP; this PR has taken too long.

@oranagra
Member

Yes. I really want to fix this (for fear of forgetting).

@ShooterIT
Member Author

thanks @oranagra @guybe7

  1. Regarding the check of server.loading: originally I wanted to add the flag just to solve our problem, but I also thought it might be used in other cases, since we don't want to close clients while loading the AOF. After more consideration, though, a client flagged CLOSE_AFTER_REPLY should be closed even inside MULTI/EXEC. It is confusing; I'm sorry, I should remove the check.

  2. I think I should cover this with a test; it is very important to keep transactions as atomic as possible.

  3. I think there are two tests you want me to add. For the first: we should reply nothing to the client even if Redis executed some commands of the pipeline before reaching the output limit. I agree with you; in most cases we set a client output limit precisely because memory is limited, and since we can't know how many complete replies the OS socket buffer can hold, we can't accurately tell users from which point in the pipeline to retry.
     The second is also what I want; I have long wanted an easy way to test this, since it shows the difference between the old version and my PR.

I like to pay attention to details. Let me add these tests; although in my opinion my PR doesn't introduce these problems, I really should add them before we move on.

@ShooterIT
Member Author

We may have missed one thing. Suppose we execute the acl deluser command in a transaction (MULTI/EXEC) and the deleted user is the one the client is using. In the current unstable version, Redis may write all the replies to the client, but with my PR Redis won't write anything further, so the user can't receive the full replies of the transaction. Although we shouldn't write replies to a client flagged CLIENT_CLOSE_ASAP, the current unstable behavior of writing as many replies as possible does seem more reasonable; then again, there is a chance Redis can't write all replies to the client anyway, since the OS socket buffer is limited.

I think we should add a check when setting the CLIENT_CLOSE_AFTER_REPLY flag:

@@ -3404,7 +3404,9 @@ void call(client *c, int flags) {
 
     /* After executing command, we will close the client after writing entire
      * reply if it is set 'CLIENT_CLOSE_AFTER_COMMAND' flag. */
-    if (!server.loading && c->flags & CLIENT_CLOSE_AFTER_COMMAND) {
+    if (!(c->flags & CLIENT_MULTI && c->cmd->proc != execCommand) &&
+        c->flags & CLIENT_CLOSE_AFTER_COMMAND)
+    {
         c->flags &= ~CLIENT_CLOSE_AFTER_COMMAND;
         c->flags |= CLIENT_CLOSE_AFTER_REPLY;
     }

What is your opinion? @oranagra @guybe7

@oranagra
Member

I'm not really worried about someone calling ACL DELUSER from MULTI and also overflowing the output buffer limit at the same time; it seems like an unlikely combination.

We should, however, consider that in the future other commands will use the new flag we added, so we should decide what to do in that case anyway.

But with the patch you offered above, the user could maybe abuse the limit inside MULTI and in effect be unlimited (output buffers growing beyond the limit), so I don't think we want to do that.

Let's consider for a moment what other types of command may use the new flag:
I suppose commands such as MGET / LRANGE / KEYS / HGETALL (i.e. ones that can generate a big or unpredictable response) will never use it.
Also, the only commands that will ever use it are ones that can kill clients, including the current client.

So should we respect the response of such commands inside MULTI-EXEC (while at the same time the output buffer is overflown)? I don't think so.
I think it is enough that the MULTI-EXEC runs to completion; the response in that case is not a concern.

@oranagra
Member

Ohh, I'm sorry; in your last post you mentioned the difference in behavior between unstable and this PR for a combination of MULTI-EXEC and ACL DELUSER without an output buffer overflow.
You said that before this PR the client would get the full MULTI-EXEC response, and with this PR it won't get any response, right?

Maybe we should consider the current behavior in unstable a bug?
I.e. what would happen if the user issued CLIENT KILL inside MULTI-EXEC? He would also get no response, right?

@ShooterIT
Member Author

you said that before this PR the client would get the full multi-exec response, and with this PR it won't get any response, right?

Yes, but even before this PR, if the total response size of the transaction exceeds the OS socket buffer size, the client may receive only part of the response, since we set CLIENT_CLOSE_ASAP for this client.

Maybe we should consider the current behavior in unstable a bug?
I.e. what would happen if the user issued CLIENT KILL inside MULTI-EXEC? He would also get no response, right?

Oh, maybe you are right. I hadn't noticed 'CLIENT KILL itself inside MULTI-EXEC'. I just took a look and ran a test: its behavior with this PR is like ACL DELUSER in MULTI-EXEC, because the client is flagged with CLIENT_CLOSE_AFTER_REPLY when it kills itself.

@oranagra
Member

OK. So: a few additional tests for coverage that are somewhat unrelated to this change (and removal of the !loading check), and we can merge it [after some 60 posts]?

@ShooterIT
Member Author

Yes, too many conversations. I added two tests. Maybe I missed something before. Actually, these two problems influence each other :(

@ShooterIT
Member Author

@oranagra
Before this commit, we would continue to add reply bulks to the reply buffer even after the client output buffer limit was enforced, so the used memory kept increasing over the configured limit. What's more, we shouldn't write any reply to a client with the 'CLIENT_CLOSE_ASAP' flag set, because that doesn't conform to the flag's definition, and we close all clients flagged with 'CLIENT_CLOSE_ASAP' in 'beforeSleep'.

Because of code execution order, we would previously write some replies to clients as soon as the writable event triggered, although in fact we might not be able to send all replies, since the OS socket buffer is limited. This unexpected behavior happened to make some commands work: for the ACL subcommands, if a client deletes the user it is using, we need to send the reply to the client and then close it, but previously we closed the client first and wrote the reply to the reply buffer second. We shouldn't do this, despite it working in most cases.

We add a 'CLIENT_CLOSE_AFTER_COMMAND' flag to mark clients. This flag means we will close the client after executing the command and sending the entire reply, so we can still write replies to the reply buffer during command execution, send them to the client, and close it later.

We also fix some implicit problems. If the client output buffer limit is enforced inside 'multi/exec', all commands are still executed completely by Redis, but the client will read no replies rather than partial replies. Even more, if a client executes 'ACL deluser' on its own user inside 'multi/exec', it will not read the replies after 'ACL deluser', just like when executing 'CLIENT KILL' on itself inside 'multi/exec'.

@oranagra
Member

Thanks for the summary (and all the work); I'll use that message when I squash-merge it.

@oranagra oranagra merged commit 57709c4 into redis:unstable Sep 24, 2020
@oranagra oranagra removed this from the Next minor backlog milestone Oct 19, 2020
@oranagra oranagra mentioned this pull request Oct 26, 2020
oranagra added a commit that referenced this pull request Oct 27, 2020
Before this commit, we would continue to add replies to the reply buffer even after the client
output buffer limit was reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn't write any reply to a client with the 'CLIENT_CLOSE_ASAP' flag set,
because that doesn't conform to the flag's definition, and we close all clients flagged with
'CLIENT_CLOSE_ASAP' in 'beforeSleep'.

Because of code execution order, we could previously write part of the replies to the socket
before disconnecting it, although in fact we might not be able to send the full replies,
since the OS socket buffer is limited. This unexpected behavior happened to make some commands work,
for instance ACL DELUSER: if the client deletes the current user, we need to send the reply to the
client and close the connection, but previously we closed the client first and wrote the reply to
the reply buffer second. We shouldn't do this, despite the fact that it works in most cases.

We add a 'CLIENT_CLOSE_AFTER_COMMAND' flag to mark clients. This flag means we will close the
client after executing the command and sending the entire reply, so that we can still write replies
to the reply buffer during command execution, send them to the client, and close it later.

We also fix some implicit problems. If the client output buffer limit is enforced inside 'multi/exec',
all commands are still executed completely by Redis, but the client will read no replies rather than
partial replies. Even more, if a client executes 'ACL deluser' on its own user inside 'multi/exec',
it will not read the replies after 'ACL deluser', just like when executing 'CLIENT KILL' on itself
inside 'multi/exec'.

We added some tests: for output buffer limit breach during multi-exec, and using a pipeline of
many small commands rather than one with a big response.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 57709c4)
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Nov 4, 2020
jschmieg pushed a commit to memKeyDB/memKeyDB that referenced this pull request Nov 6, 2020
oranagra added a commit that referenced this pull request Dec 8, 2020
Module blocked clients cache the response in a temporary client;
the reply list in that client would be affected by the recent fix
in #7202, but when the reply was later copied into the real client,
it bypassed all the checks for the output buffer limit, which
resulted in both responding to the client with a partial response
and not disconnecting it at all.
@ShooterIT ShooterIT deleted the close-client branch December 13, 2020 12:46
oranagra added a commit that referenced this pull request Jan 12, 2021
(cherry picked from commit 48efc25)
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021

Labels

state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten
