Skip to content

Conversation

@enjoy-binbin
Copy link
Contributor

@enjoy-binbin enjoy-binbin commented Jan 9, 2022

The following steps will crash redis-server:

[root]# cat crash
PSYNC replicationid -1
SLOWLOG GET
GET key
[root]# nc 127.0.0.1 6379 < crash

This one following #10020 and the crash was reported in #10076.
Add a new common function name logInvalidUseAndFreeClient
that will log the errors and free the client in async way.

Other changes about the output info:

  1. Cmd with a full name by using getFullCommandName, now it will print the right subcommand name like slowlog|get.
  2. Print the full client info by using catClientInfoString, the info is also valuable.

Now the log will like this:

31518:M 09 Jan 2022 18:35:19.849 # Replica generated a reply to command slowlog|get, disconnecting it: id=4 addr=127.0.0.1:45208 laddr=127.0.0.1:6379 fd=8 name= age=0 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=42 qbuf-free=20432 argv-mem=10 multi-mem=0 obl=0 oll=0 omem=0 tot-mem=40986 events=r cmd=slowlog|get user=default redir=-1 resp=2

The following steps will crash redis-server:
```
[root]# cat crash
PSYNC replicationid -1
SLOWLOG GET
GET key
[root]# nc 127.0.0.1 6379 < crash
```

This one following redis#10020 and the crash was reported in redis#10076.

Other changes about the output info:
1. Cmd with a full name by using `getFullCommandName`, now it will print the right subcommand name like `slowlog|get`.
2. Print the full client info by using `catClientInfoString`, the info is also valuable.:
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
@yoav-steinberg anything to comment?

@yoav-steinberg
Copy link
Contributor

yoav-steinberg commented Jan 9, 2022

Good catch @enjoy-binbin
LGTM, two things:

  1. The fact I missed addReplyDeferredLen() shows how bad this implementation is. Ideally we should have disconnected the client immediately after it sends the command because the command isn't marked with permission to be used from the replication link. In Make sure replicas don't write their own replies to the replication link #10020 We discussed this approach and decided it's too complex, but maybe now we can reconsider...
  2. If we now have a functions for logging and async freeing the client (logInvalidUseAndFreeClientAsync()), then perhaps we should use it in all the other places it's appropriate, for example:

    redis/src/networking.c

    Lines 979 to 982 in a84c964

    sds client = catClientInfoString(sdsempty(),dst);
    freeClientAsync(dst);
    serverLog(LL_WARNING,"Client %s scheduled to be closed ASAP for overcoming of output buffer limits.", client);
    sdsfree(client);

    redis/src/networking.c

    Lines 1693 to 1695 in a84c964

    serverLog(LL_VERBOSE,
    "Error writing to client: %s", connGetLastError(c->conn));
    freeClientAsync(c);

    ...

@oranagra
Copy link
Member

oranagra commented Jan 9, 2022

we can modify these other places, but i'd like to avoid changing the error message for the output buffer limit (maybe someone depends on it), and i'd like to avoid changing the verbosity level of the other one (may happen too often).
if we can't do that cleanly, i'm ok with keeping them as is.

@enjoy-binbin
Copy link
Contributor Author

  1. Yes, the approach seems ok, but as oran said, indeed too much work... (this is uncommon)

  2. I think we can take this one, although that looks like a good cleanup, like oran said, i think i will vote to keep it. (maybe in the future we can handle it better)

@oranagra oranagra merged commit a1ae260 into redis:unstable Jan 10, 2022
@enjoy-binbin enjoy-binbin deleted the fix_assertion_on_replica_cmd branch January 10, 2022 06:23
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Feb 11, 2022
Added regression tests for redis#10020 / redis#10081 / redis#10243.
The above PRs fixed some crashes due to an asserting,
see function `clientHasPendingReplies` (introduced in redis#9166).

This commit added some tests to cover the above scenario.
These tests will all fail in redis#9166, althought fixed not,
there is value in adding these tests to cover and verify
the changes. And it also can cover redis#8868 (verify the logs).

Other changes: reduces the wait time in `waitForBgsave` and
`waitForBgrewriteaof` from 1s to 50ms, which should reduce
the time for some tests.
oranagra pushed a commit that referenced this pull request Feb 13, 2022
Added regression tests for #10020 / #10081 / #10243.
The above PRs fixed some crashes due to an asserting,
see function `clientHasPendingReplies` (introduced in #9166).

This commit added some tests to cover the above scenario.
These tests will all fail in #9166, althought fixed not,
there is value in adding these tests to cover and verify
the changes. And it also can cover #8868 (verify the logs).

Other changes: 
1. Reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof`
from 1s to 50ms, which should reduce the time for some tests.
2. Improve the test infra to print context when `assert_match` fails.
3. Improve the test infra to print `$error` when `assert_error` fails.
```
Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[CRASH] ASSERTION FAILED and stack-buffer-overflow in networking.c:1026

4 participants