Description
Bug Report
Current Behavior
During troubleshooting of our production issues with Lettuce and Redis Cluster, we have discovered issues with re-connection of Pub/Sub subscriptions after network problems.
Lettuce does not send keep-alive packets on TCP connections dedicated to Pub/Sub subscriptions. Without keep-alives, in the rare case of a sudden connection loss to a Redis node, Lettuce cannot detect that the connection is no longer working. With the default OS configuration, it waits for hours until the OS closes the connection. In the meantime, all messages published to the channel are lost.
Input Code
Minimal code from Lettuce docs is enough to reproduce the issue.
RedisClusterClient clusterClient = RedisClusterClient.create(Arrays.asList(node1, node2, node3));
ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(Duration.ofSeconds(15))
        .enableAllAdaptiveRefreshTriggers()
        .build();
clusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(topologyRefreshOptions)
        .build());
StatefulRedisPubSubConnection<String, String> connection = clusterClient.connectPubSub();
connection.addListener(new RedisPubSubListener<String, String>() { ... } );
RedisPubSubCommands<String, String> sync = connection.sync();
sync.subscribe("broadcast");
To reproduce the issue:
- Start Redis Cluster.
- Connect to the cluster and subscribe to the channel using the code above.
- Find which server the client is connected to, using tcpdump or by checking with redis-cli PUBSUB CHANNELS *.
- Block all network traffic on that server using iptables (killing the Redis process is not enough: the OS will send FIN packets, and Lettuce will detect the problem and recover the subscription).
- Redis Cluster will recover by promoting one of the replicas to master.
- Lettuce will not detect that the connection is no longer working, and will not receive messages published to the channel. The unused connection will be closed by the OS after a couple of hours, and only then might Lettuce be able to fix the problem.
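The "couple of hours" in the last step is consistent with kernel keep-alive timing. As a rough sketch, assuming SO_KEEPALIVE is enabled on the socket and the stock Linux values (tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9; the report does not state the actual OS or values), the time before the kernel gives up on a silent peer can be estimated as follows. The class and method names are hypothetical and used for illustration only.

```java
// Hypothetical helper for illustration only; not part of Lettuce.
class KeepAliveTimeline {

    /**
     * Approximate seconds of silence before the kernel drops a dead connection:
     * the idle time before the first probe, plus one interval per unanswered probe.
     */
    static long detectionSeconds(long idleSeconds, long intervalSeconds, long probes) {
        return idleSeconds + intervalSeconds * probes;
    }

    public static void main(String[] args) {
        // Assumed stock Linux settings: 7200 s idle, 75 s interval, 9 probes.
        long linuxDefault = detectionSeconds(7200, 75, 9);
        System.out.println(linuxDefault + " s"); // prints "7875 s" (about 2 h 11 min)

        // Example of an aggressively tuned configuration.
        long tuned = detectionSeconds(15, 5, 3);
        System.out.println(tuned + " s"); // prints "30 s"
    }
}
```

Under these assumptions, every message published during that window of roughly two hours is lost to the subscriber.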
We were also able to reproduce the issue with Redis Standalone:
- Connect to Pub/Sub using Lettuce.
- Kill traffic on the master using iptables. Restart the VM running Redis and restore traffic.
- Lettuce does not detect the issue and keeps listening on a dead connection.
Expected behavior/code
Lettuce should be able to detect a broken connection and re-establish Pub/Sub subscriptions.
Environment
- Lettuce version(s): 5.3.4.RELEASE
- Redis version: 5.0.5
Possible Solution
We ran similar tests using redis-cli. The official client sends keep-alive packets every 15 seconds and is able to detect connection loss.
It would be best if Lettuce could send keep-alive packets on a Pub/Sub connection to detect network problems. That should enable Lettuce to fix Pub/Sub subscriptions.
Workarounds
We found a workaround by tuning OS parameters (tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes), but we would like to avoid changing OS parameters on every machine that uses Lettuce as a Redis client.
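For comparison, the JDK (11 and later) can set the same three timings per socket instead of system-wide, through jdk.net.ExtendedSocketOptions. The sketch below uses only plain JDK classes on a bare SocketChannel. It is not Lettuce API, and whether a given Lettuce version lets these options reach its underlying connections is not shown here. The class name, method name, and the 15/5/3 values are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper for illustration only; not part of Lettuce.
class PerSocketKeepAlive {

    /**
     * Enables SO_KEEPALIVE on the channel and, where the platform supports it,
     * overrides the keep-alive timings for this socket only.
     * Returns true if the per-socket timings were applied.
     */
    static boolean enable(SocketChannel channel, int idleSeconds, int intervalSeconds, int probes)
            throws IOException {
        channel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);

        Set<SocketOption<?>> supported = channel.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            return false; // only the system-wide timings are available here
        }
        channel.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
        channel.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
        channel.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Options can be set on an unconnected channel before connect().
        try (SocketChannel channel = SocketChannel.open()) {
            boolean tuned = enable(channel, 15, 5, 3);
            System.out.println("SO_KEEPALIVE=" + channel.getOption(StandardSocketOptions.SO_KEEPALIVE)
                    + ", per-socket timings applied=" + tuned);
        }
    }
}
```

With timings like these, a silently dropped connection would be reported by the kernel after roughly 15 + 5 * 3 = 30 seconds, without touching the machine-wide settings.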