Skip to content

Add TCP keepalive warnings when disabled (issue #5212)#5228

Merged
renecannao merged 7 commits intov3.0from
v3.0-5212
Nov 28, 2025
Merged

Add TCP keepalive warnings when disabled (issue #5212)#5228
renecannao merged 7 commits intov3.0from
v3.0-5212

Conversation

@renecannao
Copy link
Contributor

Summary

Add warnings when mysql-use_tcp_keepalive or pgsql-use_tcp_keepalive is set to false to prevent connection drops behind network load balancers.

Issue #5212

When ProxySQL is deployed behind network load balancers, disabling TCP keepalive can cause unexpected connection drops. This implementation warns users about this unsafe configuration choice.

Key Changes

Core Functionality

  • Admin_FlushVariables.cpp: Add warnings in flush_mysql_variables___database_to_runtime() and flush_pgsql_variables___database_to_runtime()
  • Check TCP keepalive setting using get_variable_int() for proper boolean variable access
  • Warn users when TCP keepalive is disabled with actionable guidance
  • Comprehensive Doxygen documentation explaining the security implications

Testing

  • TAP test: reg_test_5212_tcp_keepalive_warnings-t.cpp
  • Verifies warnings appear for both MySQL and PostgreSQL when TCP keepalive disabled
  • Efficient log parsing to detect warning messages in ProxySQL log files

Technical Details

  • Uses GloMTH->get_variable_int("use_tcp_keepalive") for MySQL
  • Uses GloPTH->get_variable_int("use_tcp_keepalive") for PostgreSQL
  • Warnings trigger during LOAD MYSQL VARIABLES TO RUNTIME and LOAD PGSQL VARIABLES TO RUNTIME
  • Memory-efficient testing with queue-based log processing

Test Plan

  • Verify warning appears when MySQL TCP keepalive disabled
  • Verify warning appears when PostgreSQL TCP keepalive disabled
  • Confirm no false positives when TCP keepalive enabled

- Add warnings in flush_mysql_variables___database_to_runtime() when mysql-use_tcp_keepalive=false
- Add warnings in flush_pgsql_variables___database_to_runtime() when pgsql-use_tcp_keepalive=false
- Include comprehensive Doxygen documentation explaining why disabling TCP keepalive is unsafe
- Warn users about potential connection drops when ProxySQL is deployed behind network load balancers

When TCP keepalive is disabled:
- Load balancers may drop idle connections from connection pools
- NAT devices may remove connection state
- Cloud load balancers may terminate connections during idle periods
- Results in sudden connection failures and "connection reset" errors

Resolves: #5212
- Fix crash by using get_variable_int() instead of get_variable_string() for boolean use_tcp_keepalive variable
- use_tcp_keepalive is a boolean variable, not a string, so get_variable_int() returns 0/1 instead of a string
- Fix syntax errors by removing duplicate code and fixing brace structure
- Add comprehensive Doxygen documentation for both MySQL and PostgreSQL warnings

Resolves assertion failure: "Not existing variable: use_tcp_keepalive"
Resolves: #5212
…g_lines()

- Add TAP test mysql-reg_test_5212_tcp_keepalive_warnings-t.cpp to verify TCP keepalive warnings
- Add overload of get_matching_lines() with max_lines parameter to scan only last N lines
- Current implementation reads entire file into memory (inefficient) - needs optimization
- Test reports finding 4 MySQL warnings in 50 lines (impossible) - indicates bug
- Add debugging output to show lines read vs lines examined

WIP: get_matching_lines() optimization needed using queue instead of vector
WIP: Test logic debugging needed for impossible warning counts

Resolves: #5212
- Implement get_matching_lines_from_filename() with efficient queue-based processing
- Use standard C++ regex instead of RE2 for reliable pattern matching
- Add filename-based function to avoid stream sharing conflicts
- Remove problematic stream-based function with max_lines parameter
- Fix regex pattern to match actual log warning format
- Add comprehensive debugging output (disabled with #if 0)
- Test both MySQL and PostgreSQL TCP keepalive warnings
- Search only last 10 lines for efficiency and relevance

The test now correctly verifies that TCP keepalive warnings are logged
when mysql-use_tcp_keepalive or pgsql-use_tcp_keepalive is set to false.
- Rename mysql-reg_test_5212_tcp_keepalive_warnings-t.cpp to reg_test_5212_tcp_keepalive_warnings-t.cpp
- Test covers both MySQL and PostgreSQL TCP keepalive warnings, so mysql- prefix is misleading
@gemini-code-assist
Copy link

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances ProxySQL's robustness by implementing a proactive warning system for TCP keepalive settings. It addresses potential connection stability issues that arise when TCP keepalive is disabled in environments utilizing network load balancers, NAT, or cloud infrastructure. By issuing clear warnings during runtime variable loading, the change aims to guide users towards safer configurations, thereby preventing unexpected connection drops and improving overall system reliability.

Highlights

  • TCP Keepalive Warnings: Introduced warnings for both MySQL and PostgreSQL connections when use_tcp_keepalive is set to false, as this can lead to connection instability behind network load balancers, NAT firewalls, or in cloud environments.
  • Runtime Variable Checks: The warnings are triggered during LOAD MYSQL VARIABLES TO RUNTIME and LOAD PGSQL VARIABLES TO RUNTIME operations, ensuring users are alerted when applying potentially unsafe configurations.
  • Comprehensive Documentation: Detailed Doxygen documentation has been added to Admin_FlushVariables.cpp, explaining the security implications and providing recommendations for TCP keepalive settings.
  • New Test Utility: A new utility function, get_matching_lines_from_filename, was added to test/tap/tap/utils.cpp to efficiently scan the last N lines of a log file for specific regex patterns, which is used in the new test.
  • Dedicated Regression Test: A new TAP test, reg_test_5212_tcp_keepalive_warnings-t.cpp, was added to verify that the TCP keepalive warnings are correctly generated and logged for both MySQL and PostgreSQL when the feature is disabled.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a helpful warning for users who disable TCP keepalive, which can lead to connection issues. The implementation is straightforward and the Doxygen documentation is thorough. My review focuses on a couple of code safety improvements in the core logic and suggestions to make the new tests more robust and complete to prevent flakiness and ensure test isolation.

* behind load balancers or in cloud environments.
*/
// Check for TCP keepalive setting and warn if disabled
int mysql_use_tcp_keepalive = GloMTH->get_variable_int((char *)"use_tcp_keepalive");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The C-style cast (char *) on the string literal "use_tcp_keepalive" is unnecessary and potentially unsafe. The get_variable_int method accepts a const char*, so the cast can be safely removed.

int mysql_use_tcp_keepalive = GloMTH->get_variable_int("use_tcp_keepalive");

* behind load balancers or in cloud environments.
*/
// Check for TCP keepalive setting and warn if disabled
int pgsql_use_tcp_keepalive = GloPTH->get_variable_int((char *)"use_tcp_keepalive");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the MySQL check, the C-style cast (char *) on the string literal "use_tcp_keepalive" is unnecessary and can be removed, as get_variable_int takes a const char*.

int pgsql_use_tcp_keepalive = GloPTH->get_variable_int("use_tcp_keepalive");

Comment on lines +16 to +110
int main(int argc, char** argv) {
CommandLine cl;

// Plan for 6 tests
plan(6);

// Get connections
MYSQL* admin = mysql_init(NULL);
if (!admin) {
diag("Failed to initialize admin connection");
return exit_status();
}

if (!mysql_real_connect(admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
diag("Failed to connect to ProxySQL admin: %s", mysql_error(admin));
mysql_close(admin);
return exit_status();
}

// Get the log file path
const string log_path { get_env("REGULAR_INFRA_DATADIR") + "/proxysql.log" };

// Test 1: MySQL TCP keepalive warning
diag("Testing MySQL TCP keepalive warning when set to false");
{
// Set MySQL TCP keepalive to false
int query_err = mysql_query(admin, "SET mysql-use_tcp_keepalive='false'");
ok(query_err == 0, "SET mysql-use_tcp_keepalive='false' should succeed");
if (query_err != 0) {
diag("Error setting mysql-use_tcp_keepalive: %s", mysql_error(admin));
mysql_close(admin);
return exit_status();
}

// Load MySQL variables to runtime to trigger warning
query_err = mysql_query(admin, "LOAD MYSQL VARIABLES TO RUNTIME");
ok(query_err == 0, "LOAD MYSQL VARIABLES TO RUNTIME should succeed");
if (query_err != 0) {
diag("Error loading MySQL variables: %s", mysql_error(admin));
mysql_close(admin);
return exit_status();
}

// Wait a bit for the warning to be written to log
usleep(200000); // 200ms

// Check for the warning in the log - scan only last 10 lines using filename-based function
const string warning_regex { ".*WARNING.*mysql-use_tcp_keepalive is set to false.*" };
const auto& [match_count, warning_lines] = get_matching_lines_from_filename(log_path, warning_regex, true, 10);

// Scanning only last 10 lines ensures we're looking at recent log entries
ok(match_count > 0, "MySQL TCP keepalive warning should appear in log when set to false");
if (match_count == 0) {
diag("Expected MySQL TCP keepalive warning not found in last 10 lines of log");
}
}

// Test 2: PostgreSQL TCP keepalive warning
diag("Testing PostgreSQL TCP keepalive warning when set to false");
{
// Set PostgreSQL TCP keepalive to false
int query_err = mysql_query(admin, "SET pgsql-use_tcp_keepalive='false'");
ok(query_err == 0, "SET pgsql-use_tcp_keepalive='false' should succeed");
if (query_err != 0) {
diag("Error setting pgsql-use_tcp_keepalive: %s", mysql_error(admin));
mysql_close(admin);
return exit_status();
}

// Load PgSQL variables to runtime to trigger warning
query_err = mysql_query(admin, "LOAD PGSQL VARIABLES TO RUNTIME");
ok(query_err == 0, "LOAD PGSQL VARIABLES TO RUNTIME should succeed");
if (query_err != 0) {
diag("Error loading PgSQL variables: %s", mysql_error(admin));
mysql_close(admin);
return exit_status();
}

// Wait a bit for the warning to be written to log
usleep(500000); // 500ms

// Check for the warning in the log - scan only last 10 lines using filename-based function
const string warning_regex { ".*WARNING.*pgsql-use_tcp_keepalive is set to false.*" };
const auto& [match_count, warning_lines] = get_matching_lines_from_filename(log_path, warning_regex, true, 10);

// Scanning only last 10 lines ensures we're looking at recent log entries
ok(match_count > 0, "PostgreSQL TCP keepalive warning should appear in log when set to false");
if (match_count == 0) {
diag("Expected PostgreSQL TCP keepalive warning not found in last 10 lines of log");
}
}

mysql_close(admin);
return exit_status();
} No newline at end of file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This test is a good start, but has a few areas for improvement to make it more robust and complete:

  1. Flakiness: The test uses fixed usleep() delays to wait for log messages to be written. This can be unreliable in different environments. A more robust approach would be to poll the log file in a loop until the expected message is found or a timeout is reached.
  2. Test Isolation: The test modifies global ProxySQL variables (mysql-use_tcp_keepalive and pgsql-use_tcp_keepalive) but doesn't restore them to their original values. This can interfere with other tests running in the same suite. It's good practice to save the original values at the beginning and restore them in a cleanup step at the end of the test.
  3. Missing Test Case: The pull request description mentions verifying "no false positives when TCP keepalive enabled". This is an important negative test case that is currently missing. You should add tests that set ...-use_tcp_keepalive to true, load variables, and assert that the warning message does not appear in the logs.

Comment on lines +1661 to +1682
std::regex regex;
try {
regex = std::regex(s_regex);
} catch (const std::regex_error& e) {
diag("get_matching_lines_from_filename ERROR: Invalid regex '%s': %s", s_regex.c_str(), e.what());
return { 0, found_matches };
}

// Process the recent lines from the queue
for (const string& line : recent_lines) {
std::smatch match;

if (get_matches) {
if (std::regex_search(line, match, regex)) {
found_matches.push_back({ static_cast<fstream::pos_type>(0), line, match.str() });
}
} else {
if (std::regex_search(line, regex)) {
found_matches.push_back({ static_cast<fstream::pos_type>(0), line, "" });
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The new function get_matching_lines_from_filename uses std::regex, while the existing function get_matching_lines and other parts of the project appear to prefer re2::RE2. For consistency, and to leverage the performance and safety (ReDoS protection) benefits of re2, it would be better to use re2::RE2 here as well.

	re2::RE2 regex(s_regex);
	if (!regex.ok()) {
		diag("get_matching_lines_from_filename ERROR: Invalid regex '%s': %s", s_regex.c_str(), regex.error().c_str());
		return { 0, found_matches };
	}

	// Process the recent lines from the queue
	for (const string& line : recent_lines) {
		re2::StringPiece match;

		if (get_matches) {
			if (re2::RE2::PartialMatch(line, regex, &match)) {
				found_matches.push_back({ static_cast<fstream::pos_type>(0), line, match.ToString() });
			}
		} else {
			if (re2::RE2::PartialMatch(line, regex)) {
				found_matches.push_back({ static_cast<fstream::pos_type>(0), line, "" });
			}
		}
	}

…lename()

- Document purpose, parameters, return values, and behavior
- Include practical example with TCP keepalive warnings
- Explain memory-efficient queue-based approach
- Note stream sharing benefits and I/O limitations
- Provide clear contract for function usage and expectations
@renecannao
Copy link
Contributor Author

retest this please

Added reg_test_5212_tcp_keepalive_warnings-t to the default test group
to ensure the new TCP keepalive warnings test is included in automated testing.
@sonarqubecloud
Copy link

@renecannao renecannao merged commit 2771433 into v3.0 Nov 28, 2025
142 of 154 checks passed
@noizu noizu added this to the Release 3.0.4 milestone Nov 28, 2025
@noizu noizu added type/bug Something isn't working correctly area/connection-pool area/routing Query routing and load balancing labels Nov 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/connection-pool area/routing Query routing and load balancing type/bug Something isn't working correctly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants