feat: set default tcp_user_timeout to 5 seconds for replicas #9317
Merged
gbartolini merged 2 commits into main on Nov 26, 2025
Contributor
❗ By default, the pull request is configured to backport to all release branches.
The default value for TCP user timeout on standby
replication connections has changed from 0 (system default) to 5000ms
(5 seconds).
This change improves the default behavior of CloudNativePG installations
by ensuring standby instances can detect and recover from network issues
more quickly. Previously, when the network silently dropped packets,
standby instances could take up to 127 seconds (due to TCP SYN retries)
to detect a connection failure. With the new 5-second default, standby
instances will close unresponsive connections much faster and retry
connecting to the primary.
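The 127-second figure above can be reproduced with a little arithmetic. The sketch below assumes the usual Linux defaults (an initial retransmission timeout of 1 second and `net.ipv4.tcp_syn_retries = 6`, with the wait doubling after each unanswered SYN). The function name is hypothetical and only serves the illustration; hosts with tuned sysctls will give different numbers.

```python
def syn_retry_budget(syn_retries: int = 6, initial_rto: float = 1.0) -> float:
    """Seconds a connection attempt waits before giving up when every SYN is dropped.

    Assumes exponential backoff: the initial SYN waits `initial_rto` seconds,
    and each of the `syn_retries` retransmissions waits twice as long as the
    previous attempt.
    """
    # One wait for the initial SYN plus one per retransmission.
    return sum(initial_rto * 2 ** attempt for attempt in range(syn_retries + 1))


# 1 + 2 + 4 + 8 + 16 + 32 + 64 seconds with the assumed defaults
print(syn_retry_budget())  # 127.0
```

With the new default, a standby is expected to give up on an unresponsive connection after roughly 5 seconds instead of waiting out this whole budget.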
MIGRATION GUIDE FOR EXISTING INSTALLATIONS:
If you have an existing CloudNativePG installation where STANDBY_TCP_USER_TIMEOUT was not
explicitly set (defaulting to 0), and you want to preserve that behavior after upgrading,
you must now explicitly set STANDBY_TCP_USER_TIMEOUT to 0 in the cnpg-controller-manager-config
ConfigMap or Secret.
Example with ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cnpg-controller-manager-config
  namespace: cnpg-system
data:
  STANDBY_TCP_USER_TIMEOUT: "0"
Note: If you do NOT have STANDBY_TCP_USER_TIMEOUT explicitly configured, the new default
of 5 seconds will be automatically applied on your next operator upgrade or pod restart.
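The rule in the migration guide (an unset key now means 5000 ms, while an explicit "0" keeps the system default) can be sketched as follows. This is an illustrative model only, not the operator's actual code: `effective_standby_timeout_ms` and the constant are hypothetical names, and the value is assumed to be expressed in milliseconds, as the description suggests.

```python
from typing import Mapping

# New default described in this PR (assumed to be in milliseconds).
DEFAULT_STANDBY_TCP_USER_TIMEOUT_MS = 5000


def effective_standby_timeout_ms(config_data: Mapping[str, str]) -> int:
    """Return the timeout that would apply for the given ConfigMap `data`."""
    raw = config_data.get("STANDBY_TCP_USER_TIMEOUT")
    if raw is None:
        # Key not configured: the new 5-second default applies.
        return DEFAULT_STANDBY_TCP_USER_TIMEOUT_MS
    # Key configured: the explicit value wins; 0 means "use the system default".
    return int(raw)


print(effective_standby_timeout_ms({}))                                 # 5000
print(effective_standby_timeout_ms({"STANDBY_TCP_USER_TIMEOUT": "0"}))  # 0
```

The point of the sketch is that, after the upgrade, "not set" and "set to 0" are no longer equivalent, which is why the explicit entry is needed to keep the old behavior.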
For more details on TCP_USER_TIMEOUT and its behavior, see:
https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-TCP-USER-TIMEOUT
Closes #9229
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Member
Author
/test
Contributor
@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/19700876458
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
gbartolini
approved these changes
Nov 26, 2025
leonardoce
approved these changes
Nov 26, 2025
The default tcp_user_timeout for standby replication connections has been changed from the system default to 5000ms (5 seconds) for all replicas.
This new default enhances the robustness of CloudNativePG clusters by enabling standby instances to detect and recover from network issues more quickly. Previously, silent network drops could cause standbys to wait up to ~127 seconds (due to TCP SYN retries) before detecting a failure. With the new 5-second timeout, standbys will close unresponsive connections sooner and promptly retry connecting to the primary.
If this default does not meet your requirements, you can override it for all standbys managed by the operator using the STANDBY_TCP_USER_TIMEOUT configuration option.
PRESERVATION GUIDE FOR EXISTING INSTALLATIONS:
If you have an existing CloudNativePG installation where STANDBY_TCP_USER_TIMEOUT was not explicitly set (thus defaulting to 0), and you wish to preserve that behaviour after upgrading, you must now explicitly set it to 0.
Example using a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cnpg-controller-manager-config
  namespace: cnpg-system
data:
  STANDBY_TCP_USER_TIMEOUT: "0"
If the variable is not explicitly configured, the new default of 5 seconds will automatically apply after the next operator upgrade or pod restart.
For more information on tcp_user_timeout, see the PostgreSQL documentation:
https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-TCP-USER-TIMEOUT
Closes #9229