
Conversation

@ny0312 (Contributor) commented Nov 12, 2021

This PR introduces memory management for cluster bus link send buffers, to address the issues raised in #9688.

The interface changes, which need core-team consensus, are listed below (a usage sketch follows the list):

  • Introduced a new cluster-link-sendbuf-limit config that caps memory usage of cluster bus link send buffers.
    • Its default is 0 (i.e., unlimited).
  • Introduced a new CLUSTER LINKS command that displays current TCP links to/from peers as a RESP3 map.
  • Added mem_cluster_links field under the Memory section of INFO command output, which displays the overall memory usage by all current cluster links.
    • The value of mem_cluster_links is also deducted from used_memory_dataset, to better isolate the memory used by actual key data.
    • Memory usage of cluster links is still counted toward the maxmemory limit, so overconsumption by cluster links can still cause evictions or OOM errors (no behavior change).
  • Added total_cluster_links_buffer_limit_exceeded field under the CLUSTER INFO command output, which displays the accumulated count of cluster links freed due to cluster-link-sendbuf-limit.
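
For illustration, here is a minimal usage sketch of the new interfaces, written in the style of the cluster test suite (the R helper and the exact reply parsing are assumptions for the sketch, not verbatim from this PR):

# Sketch only: exercise the new config and introspection surfaces.
# Cap each cluster bus link send buffer at 1MB (default 0 means unlimited).
R 0 config set cluster-link-sendbuf-limit 1048576

# CLUSTER LINKS returns one entry per TCP link; with N peers a healthy node
# reports 2*N links (one outbound "to" and one inbound "from" per peer).
set links [R 0 cluster links]
puts "node 0 has [llength $links] cluster links"

# Overall memory held by cluster link buffers, reported under INFO memory.
regexp {mem_cluster_links:(\d+)} [R 0 info memory] -> mem_links
puts "mem_cluster_links: $mem_links bytes"

# Accumulated count of links freed for exceeding cluster-link-sendbuf-limit.
regexp {total_cluster_links_buffer_limit_exceeded:(\d+)} [R 0 cluster info] -> freed
puts "links freed over the limit so far: $freed"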

On an implementation level:

  • In the clusterNode struct, added an inbound_link field pointing to the inbound link accepted from that node. Before this change, inbound links were not associated with their corresponding clusterNode and were only loosely managed.

@ny0312 (Contributor, Author) commented Nov 12, 2021

This is a new PR that replaces the old PR: ny0312#1

There were some comments left on the old PR that are now addressed in this new PR.

@ny0312 ny0312 changed the title from "Introduce memory management on cluster link buffers #1" to "Introduce memory management on cluster link buffers" Nov 13, 2021
@madolson madolson added the release-notes (indication that this issue needs to be mentioned in the release notes) and state:major-decision (requires core team consensus) labels Nov 14, 2021
@madolson (Contributor) left a comment

Still some thoughts on the other PR, but this one seems OK to me. We also need a change in redis.conf to describe the parameter.

@madolson (Contributor) left a comment

LGTM otherwise.

@madolson madolson added the approval-needed (waiting for core team approval to be merged) and state:needs-code-doc (the PR requires documentation inside the redis repository) labels Nov 28, 2021
@madolson madolson requested a review from yossigo November 28, 2021 02:30
@madolson (Contributor) commented

@redis/core-team Please review; there are two core-team decisions. The first is the config to limit the maximum amount of outbound data, and the second is the command that describes the information for the cluster links.

@oranagra (Member) left a comment

I didn't review the majority of the code, just the parts related to the new interfaces.
I suppose the feature is valid, but I'd like to argue about the default value for the config.

@madolson (Contributor) commented Dec 7, 2021

@ny0312 The decision was to set the default to 0 (i.e., unlimited) for now, and we'll evaluate how it works. We'll document that the limit can be changed if needed.

@bashanyy commented Dec 7, 2021 via email

@ny0312 (Contributor, Author) commented Dec 14, 2021

Accompanying doc PR: redis/redis-doc#1710

Thanks.

@madolson (Contributor) left a comment

LGTM, @redis/core-team, back to y'all for approval.

@oranagra (Member) left a comment

One comment, otherwise LGTM.

@oranagra (Member) commented

The top comment of the PR is outdated too.

@oranagra (Member) left a comment

LGTM

@madolson (Contributor) commented

@ny0312 Sorry to ask more from you, but we moved to a new command definition system:

If you have any PR that modifies the command table, you'll need to rebase, modify the json files, and run the python script that generates commands.c (and commit the result too).
The same goes for other command metadata (arguments, response, etc.); it's all documented there now.

If you have any redis-doc PR that changes commands.json, make sure the change is reflected in the json files in the redis repo. The commands.json in redis-doc is now in "code" freeze (for now you can keep the change in both repos, but make sure they're identical); we'll soon delete the json file from redis-doc and take the data from the redis repo in some way.

You can transfer the contents of the doc PR over to this CR now.

@yossigo Can you approve the major changes if you have time, so we can get this merged?

@madolson (Contributor) commented

@oranagra Can you verify that commands.c and the json file were implemented correctly? They look right to me, but I haven't looked very closely. Besides that, I think this is ready to merge along with the doc PR.

@oranagra (Member) left a comment

LGTM

@madolson madolson merged commit 792afb4 into redis:unstable Dec 17, 2021
@sundb (Collaborator) commented Dec 17, 2021

The "Disconnect link when send buffer limit reached" test failed in my daily CI.
https://github.com/sundb/redis/runs/4560035116?check_suite_focus=true
https://github.com/sundb/redis/runs/4560034978?check_suite_focus=true

hwware pushed a commit to hwware/redis that referenced this pull request Dec 20, 2021
Introduce memory management on cluster link buffers:
 * Introduce a new `cluster-link-sendbuf-limit` config that caps memory usage of cluster bus link send buffers.
 * Introduce a new `CLUSTER LINKS` command that displays current TCP links to/from peers.
 * Introduce a new `mem_cluster_links` field under `INFO` command output, which displays the overall memory usage by all current cluster links.
 * Introduce a new `total_cluster_links_buffer_limit_exceeded` field under `CLUSTER INFO` command output, which displays the accumulated count of cluster links freed due to `cluster-link-sendbuf-limit`.
@oranagra (Member) commented

@ny0312 Can you please look into these failures?
See also:
https://github.com/redis/redis/runs/4654435937?check_suite_focus=true#step:10:1262

01:17:54> Each node has two links with each peer: FAILED: Expected 19*2 eq 37 (context: type eval line 11 cmd {assert {$num_peers*2 eq $num_links}} proc ::foreach_instance_id)

@madolson (Contributor) commented

@oranagra @ny0312 That issue specifically happens because the cluster tests do hard resets in between tests, and the nodes may not have re-established all links yet, especially in slower runs. I was looking into the other failures a bit last night and don't have any other suggested fixes at the moment.
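
One way the suite usually deals with this is to poll for the links to come back before asserting. A minimal sketch, assuming the framework's R/wait_for_condition helpers and that $num_peers is already known from the test topology (this is not the actual fix):

# Sketch: after a hard reset, wait until the node reports two links per peer
# (one inbound and one outbound) instead of asserting right away.
wait_for_condition 50 100 {
    [llength [R 0 cluster links]] == $num_peers * 2
} else {
    fail "node 0 did not re-establish all cluster links in time"
}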

@oranagra (Member) commented Jan 2, 2022

There was a failure in the FreeBSD CI in "Disconnect link when send buffer limit reached":
https://github.com/redis/redis/runs/4681750680?check_suite_focus=true#step:4:7614
This may be an issue with the CI itself (it's not stable), but there may also be a timing issue in the test.

@madolson (Contributor) commented Jan 2, 2022

I'll ping @ny0312 on our AWS Slack; I'm not sure he's noticed this.

@oranagra (Member) commented Jan 2, 2022

@madolson a few days ago he told me he's on vacation 8-)

@madolson (Contributor) commented Jan 2, 2022

I probably should have known that :D.

I was able to slowly validate that it is this commit that is causing both the crashes on the sanitizer runs and the buffer disconnect issues. I have no idea why, though; it might just be that we're dumping a lot of memory into the test, which doesn't seem to be done elsewhere in the cluster tests.

@ny0312 (Contributor, Author) commented Jan 3, 2022

I'm back. I will sync up with Maddy, investigate, and fix whatever is going on. Sorry, and thanks.

@oranagra (Member) commented Jan 9, 2022

Any news?
I must admit I didn't notice it fail lately, but I got this one now:
https://github.com/redis/redis/runs/4733591841?check_suite_focus=true

00:47:22> Disconnect link when send buffer limit reached: error writing "sock802fbc590": broken pipe

It could be just a random failure of the slow FreeBSD CI (we see broken pipes in random places).
Note that in the past we saw:

00:44:36> Disconnect link when send buffer limit reached: error writing "sock80264c2d0": connection reset by peer

as well as other types of errors on non-FreeBSD CI:

  • a hang triggering a SIGSEGV kill by the test framework on a sanitizer run.
  • the following failure on centos+tls:
01:17:54> Each node has two links with each peer: FAILED: Expected 19*2 eq 37 (context: type eval line 11 cmd {assert {$num_peers*2 eq $num_links}} proc ::foreach_instance_id)

@madolson madolson mentioned this pull request Jan 10, 2022
@ny0312 (Contributor, Author) commented Jan 14, 2022

Hey, sorry for the late response. I have been trying to reproduce this test failure, but couldn't.

This is the command I run (I copied it from https://github.com/redis/redis/runs/4654435988?check_suite_focus=true):

gmake || exit 1 ; if echo "" | grep -vq redis ; then ./runtest --verbose --timeout 2400 --no-latency --dump-logs  || exit 1 ; fi ; if echo "" | grep -vq modules ; then MAKE=gmake ./runtest-moduleapi --verbose --timeout 2400 --no-latency --dump-logs  || exit 1 ; fi ; if echo "" | grep -vq sentinel ; then ./runtest-sentinel  || exit 1 ; fi ; if echo "" | grep -vq cluster ; then ./runtest-cluster  || exit 1 ; fi ;

I've run it numerous times locally and numerous times on a FreeBSD 64-bit EC2 instance, all with the latest unstable branch, and I was never able to reproduce the failure.

From the error message, it looks like Redis crashed during my test. The crash must be flaky and non-deterministic.

Any ideas that could help reproduce it?

Another question - how do I preserve redis logs from these test runs?

@sundb (Collaborator) commented Jan 14, 2022

@ny0312 It happens every time on my local FreeBSD VM; I've only allocated 1 core and 256MB of RAM to it. Since I don't know much about clusters, there's no way for me to solve this problem directly, but maybe I can package the logs and send them to you.

I use the following to reproduce.

./runtest-cluster --single 24-links --dont-clean

@sundb (Collaborator) commented Jan 14, 2022

When memory is exhausted, this test is bound to fail. Here are some logs after enabling loglevel debug:

16316:M 14 Jan 2022 12:48:51.257 . I/O error reading from node link: connection closed
16316:M 14 Jan 2022 12:48:51.315 . I/O error reading from node link: connection closed
16316:M 14 Jan 2022 12:48:51.315 . I/O error reading from node link: connection closed
16316:M 14 Jan 2022 12:48:51.318 . Unable to connect to Cluster Node [127.0.0.1]:40000 -> accept: Resource temporarily unavailable
16316:M 14 Jan 2022 12:48:51.318 . Unable to connect to Cluster Node [127.0.0.1]:40002 -> accept: Resource temporarily unavailable
16316:M 14 Jan 2022 12:48:51.318 . Unable to connect to Cluster Node [127.0.0.1]:40001 -> accept: Resource temporarily unavailable
16314:M 14 Jan 2022 12:48:51.254 . I/O error reading from node link: Connection reset by peer
16314:M 14 Jan 2022 12:48:51.257 . I/O error reading from node link: connection closed

@ny0312 (Contributor, Author) commented Jan 14, 2022

Huh, looks like file descriptors are exhausted?

@sundb,

You mentioned you reproduced this on a FreeBSD VM. Could you please share exactly what your setup is? Which virtualization software?

In the meantime, I will try to simulate your setup on EC2 to reproduce it, i.e. I will provision an EC2 instance with FreeBSD, 1 core, and 256MB of memory.

Also, if you could share the full Redis logs for the failed test, that'd be great. (For example you could attach a file here, or use something like https://justpaste.it/)

Thanks a lot!

@sundb (Collaborator) commented Jan 14, 2022

@ny0312 justpaste.it can't upload files, so I created a repository to hold the logs (https://github.com/sundb/log).

  • system: FreeBSD 12 64-bit

  • vm: VMware

  • ulimit

-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          33554432
-s: stack size (kbytes)             524288
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       10240
-n: file descriptors                10240
-b: socket buffer size (bytes)      unlimited
-v: virtual memory size (kbytes)    unlimited
-p: pseudo-terminals                unlimited
-w: swap size (kbytes)              unlimited
-k: kqueues                         unlimited
-o: umtx shared locks               unlimited
  • change loglevel
diff --git a/tests/cluster/run.tcl b/tests/cluster/run.tcl
index c81d8f39d..ddf9986aa 100644
--- a/tests/cluster/run.tcl
+++ b/tests/cluster/run.tcl
@@ -17,6 +17,7 @@ proc main {} {
         "appendonly yes"
         "enable-protected-configs yes"
         "enable-debug-command yes"
+        "loglevel debug"
     }
     run_tests
     cleanup

@ny0312 (Contributor, Author) commented Jan 18, 2022

OK, I am able to reproduce the test failure on a 64-bit FreeBSD VM with 1 core and 512MB of RAM.

The failure is Redis getting OOM-killed by the kernel after running out of swap. In the test, I'm allowing cluster link buffers to grow up to 32MB, which proved to be too much for the FreeBSD test environment used by the daily runs.

I'm currently experimenting with a new cluster link buffer limit in my test. The challenge is that in order to trigger the condition under test, which is a cluster bus link being freed once it is over the limit, I need to fill up the cluster link buffer. But to fill up the link buffer, I first need to fill up the kernel-level TCP write buffer, whose size varies from OS to OS.

I will post a fix PR as soon as I can get a good stable buffer limit.

@oranagra (Member) commented

Note that in the "other" test suite we have the --large-memory flag, which enables us to run certain tests only on systems with a lot of memory (currently only manually).

Also, what about the test failures in the sanitizer runs, like this one:
https://github.com/redis/redis/runs/4829401183?check_suite_focus=true#step:9:630

Not sure if this one is related:
https://github.com/redis/redis/runs/4810931899?check_suite_focus=true#step:9:466

@ny0312 (Contributor, Author) commented Jan 19, 2022

also, what about the test failures in the sanitizer runs, like this one:
https://github.com/redis/redis/runs/4829401183?check_suite_focus=true#step:9:630

This one has a different root cause. In the test, I assume that as soon as I send a large PUBLISH command that fills up a cluster link, the link will be freed. But in reality the link only gets freed on the next clusterCron run, whenever that happens. My test is not accounting for this race condition. I will fix it too.
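
Roughly, the test needs to poll for the effect rather than assert right after the oversized PUBLISH. A hedged sketch of that idea, assuming the framework's R/wait_for_condition helpers (the actual fix is in #10157):

# Sketch: the over-limit link is reclaimed by clusterCron, not synchronously,
# so poll for the counter instead of checking it right after the PUBLISH.
# (In practice the receiving peer may also need to be paused, e.g. with
# DEBUG SLEEP, so the buffer is not drained before clusterCron runs.)
R 0 config set cluster-link-sendbuf-limit 1048576
R 0 publish __big__ [string repeat x 2097152]   ;# 2MB payload over a 1MB limit

set n 0
wait_for_condition 50 100 {
    [regexp {total_cluster_links_buffer_limit_exceeded:(\d+)} [R 0 cluster info] -> n] && $n > 0
} else {
    fail "cluster link was not freed after exceeding the send buffer limit"
}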

not sure if this one is related:
https://github.com/redis/redis/runs/4810931899?check_suite_focus=true#step:9:466

This link has two failures: test-sanitizer-address (clang) and test-freebsd.

test-sanitizer-address (clang) is not related to my test. test-freebsd is related to my test, and it should have the same OOM root cause.

@oranagra (Member) commented

I didn't look at the test, but I want to mention that there are other tests in which we struggled to fill output buffers and had to deal with the OS socket buffers swallowing our bytes.

I think these are in:
tests/unit/client-eviction.tcl
tests/unit/maxmemory.tcl
tests/integration/replication-buffer.tcl

IIRC, the best approach we had was to pause the destination process and then gradually write more and more data until we see it start piling up, then stop.
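
Adapted to cluster bus links, that approach might look roughly like this (the peer pause itself, e.g. via SIGSTOP, is left out, and the send-buffer-used field name is assumed from the PR description):

# Sketch of the gradual-fill approach: with the receiving peer already paused,
# push small chunks over the bus and stop as soon as the sender's link buffer
# visibly starts to accumulate.
set grew 0
for {set i 0} {$i < 1000 && !$grew} {incr i} {
    # Pubsub messages are forwarded over the cluster bus, so a PUBLISH on
    # node 0 ends up in its link send buffers toward the paused peer.
    R 0 publish __filler__ [string repeat x 16384]
    foreach link [R 0 cluster links] {
        if {[dict get $link send-buffer-used] > 0} { set grew 1 }
    }
}
assert {$grew}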

@ny0312 (Contributor, Author) commented Jan 20, 2022

@madolson @oranagra @sundb

PR to fix flaky test "Disconnect link when send buffer limit reached" - #10157

Sorry for all the inconvenience. And thanks for the help and advice.

@tezc (Collaborator) commented Jan 21, 2022

@ny0312
Test failed: https://github.com/redis/redis/runs/4890129923?check_suite_focus=true#step:9:627

Testing unit: 24-links.tcl
00:39:15> (init) Restart killed instances: OK
00:39:15> Cluster nodes are reachable: OK
00:39:15> Cluster nodes hard reset: OK
00:39:15> Cluster Join and auto-discovery test: OK
00:39:17> Before slots allocation, all nodes report cluster failure: OK
00:39:17> Create a cluster with two single-node shards: OK
00:39:21> Cluster should start ok: OK
00:39:21> Each node has two links with each peer: FAILED: Expected 19*2 eq 37 (context: type eval line 11 cmd {assert {$num_peers*2 eq $num_links}} proc ::foreach_instance_id)
(Jumping to next unit after error)

I don't know much about this test, but it might be related to timing. I increased "cluster-node-timeout" for something else on my branch, and I see this test fail there. Just FYI.

@ny0312 (Contributor, Author) commented Jan 21, 2022

@tezc Thanks for reporting the failure, and sorry for the inconvenience. I updated my PR (#10157) to fix this test as well.
