Add TCP keepalive warnings when disabled (issue #5212)#5228
Conversation
- Add warnings in flush_mysql_variables___database_to_runtime() when mysql-use_tcp_keepalive=false - Add warnings in flush_pgsql_variables___database_to_runtime() when pgsql-use_tcp_keepalive=false - Include comprehensive Doxygen documentation explaining why disabling TCP keepalive is unsafe - Warn users about potential connection drops when ProxySQL is deployed behind network load balancers When TCP keepalive is disabled: - Load balancers may drop idle connections from connection pools - NAT devices may remove connection state - Cloud load balancers may terminate connections during idle periods - Results in sudden connection failures and "connection reset" errors Resolves: #5212
- Fix crash by using get_variable_int() instead of get_variable_string() for boolean use_tcp_keepalive variable - use_tcp_keepalive is a boolean variable, not a string, so get_variable_int() returns 0/1 instead of a string - Fix syntax errors by removing duplicate code and fixing brace structure - Add comprehensive Doxygen documentation for both MySQL and PostgreSQL warnings Resolves assertion failure: "Not existing variable: use_tcp_keepalive" Resolves: #5212
…g_lines() - Add TAP test mysql-reg_test_5212_tcp_keepalive_warnings-t.cpp to verify TCP keepalive warnings - Add overload of get_matching_lines() with max_lines parameter to scan only last N lines - Current implementation reads entire file into memory (inefficient) - needs optimization - Test reports finding 4 MySQL warnings in 50 lines (impossible) - indicates bug - Add debugging output to show lines read vs lines examined WIP: get_matching_lines() optimization needed using queue instead of vector WIP: Test logic debugging needed for impossible warning counts Resolves: #5212
- Implement get_matching_lines_from_filename() with efficient queue-based processing - Use standard C++ regex instead of RE2 for reliable pattern matching - Add filename-based function to avoid stream sharing conflicts - Remove problematic stream-based function with max_lines parameter - Fix regex pattern to match actual log warning format - Add comprehensive debugging output (disabled with #if 0) - Test both MySQL and PostgreSQL TCP keepalive warnings - Search only last 10 lines for efficiency and relevance The test now correctly verifies that TCP keepalive warnings are logged when mysql-use_tcp_keepalive or pgsql-use_tcp_keepalive is set to false.
- Rename mysql-reg_test_5212_tcp_keepalive_warnings-t.cpp to reg_test_5212_tcp_keepalive_warnings-t.cpp - Test covers both MySQL and PostgreSQL TCP keepalive warnings, so mysql- prefix is misleading
Summary of ChangesHello @renecannao, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances ProxySQL's robustness by implementing a proactive warning system for TCP keepalive settings. It addresses potential connection stability issues that arise when TCP keepalive is disabled in environments utilizing network load balancers, NAT, or cloud infrastructure. By issuing clear warnings during runtime variable loading, the change aims to guide users towards safer configurations, thereby preventing unexpected connection drops and improving overall system reliability. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a helpful warning for users who disable TCP keepalive, which can lead to connection issues. The implementation is straightforward and the Doxygen documentation is thorough. My review focuses on a couple of code safety improvements in the core logic and suggestions to make the new tests more robust and complete to prevent flakiness and ensure test isolation.
| * behind load balancers or in cloud environments. | ||
| */ | ||
| // Check for TCP keepalive setting and warn if disabled | ||
| int mysql_use_tcp_keepalive = GloMTH->get_variable_int((char *)"use_tcp_keepalive"); |
There was a problem hiding this comment.
| * behind load balancers or in cloud environments. | ||
| */ | ||
| // Check for TCP keepalive setting and warn if disabled | ||
| int pgsql_use_tcp_keepalive = GloPTH->get_variable_int((char *)"use_tcp_keepalive"); |
| int main(int argc, char** argv) { | ||
| CommandLine cl; | ||
|
|
||
| // Plan for 6 tests | ||
| plan(6); | ||
|
|
||
| // Get connections | ||
| MYSQL* admin = mysql_init(NULL); | ||
| if (!admin) { | ||
| diag("Failed to initialize admin connection"); | ||
| return exit_status(); | ||
| } | ||
|
|
||
| if (!mysql_real_connect(admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) { | ||
| diag("Failed to connect to ProxySQL admin: %s", mysql_error(admin)); | ||
| mysql_close(admin); | ||
| return exit_status(); | ||
| } | ||
|
|
||
| // Get the log file path | ||
| const string log_path { get_env("REGULAR_INFRA_DATADIR") + "/proxysql.log" }; | ||
|
|
||
| // Test 1: MySQL TCP keepalive warning | ||
| diag("Testing MySQL TCP keepalive warning when set to false"); | ||
| { | ||
| // Set MySQL TCP keepalive to false | ||
| int query_err = mysql_query(admin, "SET mysql-use_tcp_keepalive='false'"); | ||
| ok(query_err == 0, "SET mysql-use_tcp_keepalive='false' should succeed"); | ||
| if (query_err != 0) { | ||
| diag("Error setting mysql-use_tcp_keepalive: %s", mysql_error(admin)); | ||
| mysql_close(admin); | ||
| return exit_status(); | ||
| } | ||
|
|
||
| // Load MySQL variables to runtime to trigger warning | ||
| query_err = mysql_query(admin, "LOAD MYSQL VARIABLES TO RUNTIME"); | ||
| ok(query_err == 0, "LOAD MYSQL VARIABLES TO RUNTIME should succeed"); | ||
| if (query_err != 0) { | ||
| diag("Error loading MySQL variables: %s", mysql_error(admin)); | ||
| mysql_close(admin); | ||
| return exit_status(); | ||
| } | ||
|
|
||
| // Wait a bit for the warning to be written to log | ||
| usleep(200000); // 200ms | ||
|
|
||
| // Check for the warning in the log - scan only last 10 lines using filename-based function | ||
| const string warning_regex { ".*WARNING.*mysql-use_tcp_keepalive is set to false.*" }; | ||
| const auto& [match_count, warning_lines] = get_matching_lines_from_filename(log_path, warning_regex, true, 10); | ||
|
|
||
| // Scanning only last 10 lines ensures we're looking at recent log entries | ||
| ok(match_count > 0, "MySQL TCP keepalive warning should appear in log when set to false"); | ||
| if (match_count == 0) { | ||
| diag("Expected MySQL TCP keepalive warning not found in last 10 lines of log"); | ||
| } | ||
| } | ||
|
|
||
| // Test 2: PostgreSQL TCP keepalive warning | ||
| diag("Testing PostgreSQL TCP keepalive warning when set to false"); | ||
| { | ||
| // Set PostgreSQL TCP keepalive to false | ||
| int query_err = mysql_query(admin, "SET pgsql-use_tcp_keepalive='false'"); | ||
| ok(query_err == 0, "SET pgsql-use_tcp_keepalive='false' should succeed"); | ||
| if (query_err != 0) { | ||
| diag("Error setting pgsql-use_tcp_keepalive: %s", mysql_error(admin)); | ||
| mysql_close(admin); | ||
| return exit_status(); | ||
| } | ||
|
|
||
| // Load PgSQL variables to runtime to trigger warning | ||
| query_err = mysql_query(admin, "LOAD PGSQL VARIABLES TO RUNTIME"); | ||
| ok(query_err == 0, "LOAD PGSQL VARIABLES TO RUNTIME should succeed"); | ||
| if (query_err != 0) { | ||
| diag("Error loading PgSQL variables: %s", mysql_error(admin)); | ||
| mysql_close(admin); | ||
| return exit_status(); | ||
| } | ||
|
|
||
| // Wait a bit for the warning to be written to log | ||
| usleep(500000); // 500ms | ||
|
|
||
| // Check for the warning in the log - scan only last 10 lines using filename-based function | ||
| const string warning_regex { ".*WARNING.*pgsql-use_tcp_keepalive is set to false.*" }; | ||
| const auto& [match_count, warning_lines] = get_matching_lines_from_filename(log_path, warning_regex, true, 10); | ||
|
|
||
| // Scanning only last 10 lines ensures we're looking at recent log entries | ||
| ok(match_count > 0, "PostgreSQL TCP keepalive warning should appear in log when set to false"); | ||
| if (match_count == 0) { | ||
| diag("Expected PostgreSQL TCP keepalive warning not found in last 10 lines of log"); | ||
| } | ||
| } | ||
|
|
||
| mysql_close(admin); | ||
| return exit_status(); | ||
| } No newline at end of file |
There was a problem hiding this comment.
This test is a good start, but has a few areas for improvement to make it more robust and complete:
- Flakiness: The test uses fixed
usleep()delays to wait for log messages to be written. This can be unreliable in different environments. A more robust approach would be to poll the log file in a loop until the expected message is found or a timeout is reached. - Test Isolation: The test modifies global ProxySQL variables (
mysql-use_tcp_keepaliveandpgsql-use_tcp_keepalive) but doesn't restore them to their original values. This can interfere with other tests running in the same suite. It's good practice to save the original values at the beginning and restore them in a cleanup step at the end of the test. - Missing Test Case: The pull request description mentions verifying "no false positives when TCP keepalive enabled". This is an important negative test case that is currently missing. You should add tests that set
...-use_tcp_keepalivetotrue, load variables, and assert that the warning message does not appear in the logs.
| std::regex regex; | ||
| try { | ||
| regex = std::regex(s_regex); | ||
| } catch (const std::regex_error& e) { | ||
| diag("get_matching_lines_from_filename ERROR: Invalid regex '%s': %s", s_regex.c_str(), e.what()); | ||
| return { 0, found_matches }; | ||
| } | ||
|
|
||
| // Process the recent lines from the queue | ||
| for (const string& line : recent_lines) { | ||
| std::smatch match; | ||
|
|
||
| if (get_matches) { | ||
| if (std::regex_search(line, match, regex)) { | ||
| found_matches.push_back({ static_cast<fstream::pos_type>(0), line, match.str() }); | ||
| } | ||
| } else { | ||
| if (std::regex_search(line, regex)) { | ||
| found_matches.push_back({ static_cast<fstream::pos_type>(0), line, "" }); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The new function get_matching_lines_from_filename uses std::regex, while the existing function get_matching_lines and other parts of the project appear to prefer re2::RE2. For consistency, and to leverage the performance and safety (ReDoS protection) benefits of re2, it would be better to use re2::RE2 here as well.
re2::RE2 regex(s_regex);
if (!regex.ok()) {
diag("get_matching_lines_from_filename ERROR: Invalid regex '%s': %s", s_regex.c_str(), regex.error().c_str());
return { 0, found_matches };
}
// Process the recent lines from the queue
for (const string& line : recent_lines) {
re2::StringPiece match;
if (get_matches) {
if (re2::RE2::PartialMatch(line, regex, &match)) {
found_matches.push_back({ static_cast<fstream::pos_type>(0), line, match.ToString() });
}
} else {
if (re2::RE2::PartialMatch(line, regex)) {
found_matches.push_back({ static_cast<fstream::pos_type>(0), line, "" });
}
}
}…lename() - Document purpose, parameters, return values, and behavior - Include practical example with TCP keepalive warnings - Explain memory-efficient queue-based approach - Note stream sharing benefits and I/O limitations - Provide clear contract for function usage and expectations
|
retest this please |
Added reg_test_5212_tcp_keepalive_warnings-t to the default test group to ensure the new TCP keepalive warnings test is included in automated testing.
|



Summary
Add warnings when
mysql-use_tcp_keepaliveorpgsql-use_tcp_keepaliveis set tofalseto prevent connection drops behind network load balancers.Issue #5212
When ProxySQL is deployed behind network load balancers, disabling TCP keepalive can cause unexpected connection drops. This implementation warns users about this unsafe configuration choice.
Key Changes
Core Functionality
flush_mysql_variables___database_to_runtime()andflush_pgsql_variables___database_to_runtime()get_variable_int()for proper boolean variable accessTesting
reg_test_5212_tcp_keepalive_warnings-t.cppTechnical Details
GloMTH->get_variable_int("use_tcp_keepalive")for MySQLGloPTH->get_variable_int("use_tcp_keepalive")for PostgreSQLLOAD MYSQL VARIABLES TO RUNTIMEandLOAD PGSQL VARIABLES TO RUNTIMETest Plan