Fix critical bugs in close_all_non_term_fd() for fork/exec safety#5276
Fix critical bugs in close_all_non_term_fd() for fork/exec safety#5276renecannao merged 2 commits intov3.0from
Conversation
This commit fixes two critical bugs in close_all_non_term_fd() that caused undefined behavior and potential deadlocks when called after fork() before execve() in multi-threaded programs. Bug 1: Self-Referential Directory FD Closure ---------------------------------------------- When iterating through /proc/self/fd, opendir() creates a file descriptor for the directory stream. This fd appears in the enumeration while we're iterating, and if we close it, readdir() operates on a corrupted DIR* stream, causing undefined behavior, crashes, or missed file descriptors. Fix: Use dirfd() to obtain the directory's fd and explicitly skip closing it. Bug 2: Heap Allocation After fork() in Multi-Threaded Programs ---------------------------------------------------------------- In multi-threaded programs, when fork() is called while other threads hold malloc locks, the child process inherits a "frozen" state where those locks remain locked (the owning threads don't exist in the child). Any heap allocation (malloc/free/new/delete) in the child before execve() can deadlock. The original code used: - std::stol(std::string(dir->d_name)) - creates a temporary std::string - std::find() - may allocate internally Fix: Replace with heap-allocation-free alternatives: - atoi(dir->d_name) instead of std::stol(std::string(...)) - Simple C loops instead of std::find() Additional Improvements ----------------------- 1. Added close_range() syscall support (Linux 5.9+) with runtime detection - O(1) atomic operation, most efficient method - Only used when excludeFDs is empty (closes all fds >= 3) - Falls back to /proc/self/fd iteration when excludeFDs is non-empty 2. Added extensive doxygen documentation covering: - Security implications (preventing fd leaks to child processes) - Resource management (preventing fd exhaustion) - Deadlock prevention in multi-threaded fork() contexts - Implementation details (three strategies: close_range, /proc/self/fd, rlimit) - fork() safety design considerations - Example usage and portability notes 3. Added required includes: dirent.h, sys/syscall.h, linux/close_range.h Workflow Safety --------------- The function is now safe to use in the common fork() -> close_all_non_term_fd() -> execve() workflow, even in multi-threaded programs. Files Modified -------------- - lib/proxysql_utils.cpp
Summary of ChangesHello @renecannao, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness and safety of the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request provides crucial fixes for close_all_non_term_fd() to ensure its safety when used after fork() in multi-threaded applications. The changes to avoid heap allocation and prevent self-closure of the directory file descriptor are excellent. The addition of close_range() support is a great performance enhancement. I've identified a few critical issues that undermine the goal of being allocation-free and could lead to hangs or silent failures. Addressing these will make the function truly robust and safe.
lib/proxysql_utils.cpp
Outdated
| * @param excludeFDs Vector of file descriptors to preserve (in addition to 0, 1, 2) | ||
| * @return void | ||
| */ | ||
| void close_all_non_term_fd(std::vector<int> excludeFDs) { |
There was a problem hiding this comment.
The function takes excludeFDs by value, which causes a copy of the std::vector. This copy operation involves heap allocation. A primary goal of this pull request is to make this function safe to call after fork() in a multi-threaded program by avoiding all heap allocations to prevent potential deadlocks. Passing the vector by value undermines this effort and re-introduces the risk of deadlocks.
void close_all_non_term_fd(const std::vector<int>& excludeFDs) {
lib/proxysql_utils.cpp
Outdated
| for (unsigned int fd = 3; fd < nlimit.rlim_cur; fd++) { | ||
| if (std::find(excludeFDs.begin(), excludeFDs.end(), fd) == excludeFDs.end()) { | ||
| // Check if fd is in exclusion list | ||
| bool exclude = false; | ||
| for (size_t i = 0; i < excludeFDs.size(); i++) { | ||
| if (excludeFDs[i] == (int)fd) { | ||
| exclude = true; | ||
| break; | ||
| } | ||
| } | ||
| if (!exclude) { | ||
| close(fd); | ||
| } | ||
| } |
There was a problem hiding this comment.
The loop for (unsigned int fd = 3; fd < nlimit.rlim_cur; fd++) is dangerous. The type of nlimit.rlim_cur is rlim_t, which on 64-bit Linux is unsigned long long. The loop variable fd is unsigned int. If rlim_cur is set to a value larger than UINT_MAX (e.g., via ulimit -n), this loop will become an infinite loop, causing the process to hang.
Additionally, file descriptors are of type int, so the loop should not iterate beyond INT_MAX.
To fix this, the loop variable should be of type rlim_t, and you should ensure the file descriptor value does not exceed INT_MAX before using it.
for (rlim_t i = 3; i < nlimit.rlim_cur; ++i) {
if (i > INT_MAX) {
break;
}
int fd = static_cast<int>(i);
// Check if fd is in exclusion list
bool exclude = false;
for (size_t j = 0; j < excludeFDs.size(); j++) {
if (excludeFDs[j] == fd) {
exclude = true;
break;
}
}
if (!exclude) {
close(fd);
}
}| static int close_range_available = -1; // -1 = unknown, 0 = not available, 1 = available | ||
| if (close_range_available == -1) { | ||
| // First call: check if close_range is available | ||
| long ret = syscall(__NR_close_range, 3, ~0U, 0); | ||
| close_range_available = (ret == 0 || errno != ENOSYS) ? 1 : 0; | ||
| } | ||
| if (close_range_available == 1) { | ||
| // close_range is available, use it to close all fds >= 3 | ||
| syscall(__NR_close_range, 3, ~0U, 0); | ||
| return; | ||
| } |
There was a problem hiding this comment.
The current logic for detecting and using the close_range syscall has two issues:
- On the first successful call, it executes
syscall(__NR_close_range, ...)twice, which is inefficient. - More critically, if the
syscallfails with a transient error (e.g.,EINTR),close_range_availableis incorrectly set to1. The subsequentsyscallwill also likely fail, but its return value is ignored. The function then returns, having failed to close the file descriptors and without falling back to the other methods. This can leave sensitive file descriptors open.
A safer approach is to attempt the syscall and only cache its availability if it succeeds. If it fails for any reason, you should fall back to the other methods for the current call.
static int close_range_available = -1; // -1 = unknown, 0 = not available, 1 = available
if (close_range_available == 1) {
syscall(__NR_close_range, 3, ~0U, 0);
return;
}
if (close_range_available == -1) {
if (syscall(__NR_close_range, 3, ~0U, 0) == 0) {
close_range_available = 1;
return;
}
if (errno == ENOSYS) {
close_range_available = 0;
}
}This commit addresses critical issues identified in PR #5276 by gemini-code-assist's code review, which could undermine the goal of being allocation-free and cause hangs or silent failures. Bug 1: Vector Passed by Value (Critical) ------------------------------------------ The function took std::vector<int> excludeFDs by value, causing heap allocation during the copy operation. This undermines the PR's goal of avoiding heap allocations after fork() to prevent deadlocks in multi-threaded programs. Fix: Change to pass by const reference to avoid heap allocation. void close_all_non_term_fd(const std::vector<int>& excludeFDs) Bug 2: Infinite Loop Risk (Critical) ------------------------------------ The loop used unsigned int for the variable while comparing against rlim_t (unsigned long long). If rlim_cur exceeded UINT_MAX, this would create an infinite loop. Fix: Use rlim_t type for the loop variable and cap at INT_MAX. for (rlim_t fd_rlim = 3; fd_rlim < nlimit.rlim_cur && fd_rlim <= INT_MAX; fd_rlim++) Bug 3: close_range() Detection Logic (High) ------------------------------------------ The original detection logic had two problems: 1. Executed close_range syscall twice on first successful call 2. Incorrectly cached availability on transient failures (EINTR), leaving file descriptors open without fallback Fix: Reordered logic to only cache on success, allow retry on transient failures. Only cache as "not available" on ENOSYS. For other errors (EBADF, EINVAL, etc.), don't cache - might be transient. Files Modified -------------- - include/proxysql_utils.h - lib/proxysql_utils.cpp
|
|
retest this please |



Summary
This PR fixes two critical bugs in
close_all_non_term_fd()that caused undefined behavior and potential deadlocks when called afterfork()beforeexecve()in multi-threaded programs.Fixes #5275
Bug 1: Self-Referential Directory FD Closure
When iterating through
/proc/self/fd,opendir()creates a file descriptor for the directory stream. This fd appears in the enumeration while we're iterating, and if we close it,readdir()operates on a corrupted DIR* stream.Fix: Use
dirfd()to obtain the directory's fd and explicitly skip closing it.Bug 2: Heap Allocation After fork() in Multi-Threaded Programs
In multi-threaded programs, when
fork()is called while other threads hold malloc locks, the child process inherits a "frozen" state where those locks remain locked. Any heap allocation in the child beforeexecve()can deadlock.Fix: Replace
std::stol(std::string(...))withatoi()andstd::find()with simple loops to avoid heap allocation.Additional Improvements
close_range() syscall support (Linux 5.9+) with runtime detection:
excludeFDsis empty/proc/self/fditeration whenexcludeFDsis non-emptyExtensive doxygen documentation covering security, resource management, deadlock prevention, and fork() safety considerations
Files Modified
lib/proxysql_utils.cppTest Plan