Skip to content

Fix critical bugs in close_all_non_term_fd() for fork/exec safety#5276

Merged
renecannao merged 2 commits intov3.0from
v3.0_fork
Jan 9, 2026
Merged

Fix critical bugs in close_all_non_term_fd() for fork/exec safety#5276
renecannao merged 2 commits intov3.0from
v3.0_fork

Conversation

@renecannao
Copy link
Contributor

Summary

This PR fixes two critical bugs in close_all_non_term_fd() that caused undefined behavior and potential deadlocks when called after fork() before execve() in multi-threaded programs.

Fixes #5275


Bug 1: Self-Referential Directory FD Closure

When iterating through /proc/self/fd, opendir() creates a file descriptor for the directory stream. This fd appears in the enumeration while we're iterating, and if we close it, readdir() operates on a corrupted DIR* stream.

Fix: Use dirfd() to obtain the directory's fd and explicitly skip closing it.


Bug 2: Heap Allocation After fork() in Multi-Threaded Programs

In multi-threaded programs, when fork() is called while other threads hold malloc locks, the child process inherits a "frozen" state where those locks remain locked. Any heap allocation in the child before execve() can deadlock.

Fix: Replace std::stol(std::string(...)) with atoi() and std::find() with simple loops to avoid heap allocation.


Additional Improvements

  1. close_range() syscall support (Linux 5.9+) with runtime detection:

    • O(1) atomic operation, most efficient method
    • Only used when excludeFDs is empty
    • Falls back to /proc/self/fd iteration when excludeFDs is non-empty
  2. Extensive doxygen documentation covering security, resource management, deadlock prevention, and fork() safety considerations


Files Modified

  • lib/proxysql_utils.cpp

Test Plan

  • Build completes successfully
  • Function is now safe to use in the fork() → close_all_non_term_fd() → execve() workflow
  • No heap allocations occur in the child process between fork() and execve()

This commit fixes two critical bugs in close_all_non_term_fd() that caused
undefined behavior and potential deadlocks when called after fork() before
execve() in multi-threaded programs.

Bug 1: Self-Referential Directory FD Closure
----------------------------------------------
When iterating through /proc/self/fd, opendir() creates a file descriptor
for the directory stream. This fd appears in the enumeration while we're
iterating, and if we close it, readdir() operates on a corrupted DIR*
stream, causing undefined behavior, crashes, or missed file descriptors.

Fix: Use dirfd() to obtain the directory's fd and explicitly skip closing it.

Bug 2: Heap Allocation After fork() in Multi-Threaded Programs
----------------------------------------------------------------
In multi-threaded programs, when fork() is called while other threads hold
malloc locks, the child process inherits a "frozen" state where those locks
remain locked (the owning threads don't exist in the child). Any heap
allocation (malloc/free/new/delete) in the child before execve() can deadlock.

The original code used:
- std::stol(std::string(dir->d_name)) - creates a temporary std::string
- std::find() - may allocate internally

Fix: Replace with heap-allocation-free alternatives:
- atoi(dir->d_name) instead of std::stol(std::string(...))
- Simple C loops instead of std::find()

Additional Improvements
-----------------------
1. Added close_range() syscall support (Linux 5.9+) with runtime detection
   - O(1) atomic operation, most efficient method
   - Only used when excludeFDs is empty (closes all fds >= 3)
   - Falls back to /proc/self/fd iteration when excludeFDs is non-empty

2. Added extensive doxygen documentation covering:
   - Security implications (preventing fd leaks to child processes)
   - Resource management (preventing fd exhaustion)
   - Deadlock prevention in multi-threaded fork() contexts
   - Implementation details (three strategies: close_range, /proc/self/fd, rlimit)
   - fork() safety design considerations
   - Example usage and portability notes

3. Added required includes: dirent.h, sys/syscall.h, linux/close_range.h

Workflow Safety
---------------
The function is now safe to use in the common fork() -> close_all_non_term_fd()
-> execve() workflow, even in multi-threaded programs.

Files Modified
--------------
- lib/proxysql_utils.cpp
@gemini-code-assist
Copy link

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and safety of the close_all_non_term_fd() function, particularly in multi-threaded environments where it is called between fork() and execve(). The changes ensure proper resource cleanup and prevent potential deadlocks or undefined behavior by carefully managing file descriptors and avoiding problematic operations in the child process.

Highlights

  • Self-Referential Directory FD Closure Fix: Addressed a bug where iterating through /proc/self/fd could lead to closing the directory's own file descriptor, causing undefined behavior. The fix involves using dirfd() to identify and explicitly skip closing this specific file descriptor.
  • Heap Allocation Avoidance in Forked Child: Resolved a critical issue in multi-threaded programs where heap allocations after fork() but before execve() could cause deadlocks due to inherited malloc locks. This was fixed by replacing std::stol(std::string(...)) with atoi() and std::find() with simple loops, eliminating heap allocations in the critical path.
  • close_range() Syscall Support: Implemented support for the close_range() syscall (Linux 5.9+) for efficient, O(1) atomic closing of file descriptors. This method is used when the exclusion list is empty and falls back to /proc/self/fd iteration otherwise, with runtime detection for availability.
  • Extensive Doxygen Documentation: Added comprehensive Doxygen documentation to the close_all_non_term_fd() function, detailing its security implications, resource management benefits, deadlock prevention mechanisms, and important design considerations for fork() safety.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request provides crucial fixes for close_all_non_term_fd() to ensure its safety when used after fork() in multi-threaded applications. The changes to avoid heap allocation and prevent self-closure of the directory file descriptor are excellent. The addition of close_range() support is a great performance enhancement. I've identified a few critical issues that undermine the goal of being allocation-free and could lead to hangs or silent failures. Addressing these will make the function truly robust and safe.

* @param excludeFDs Vector of file descriptors to preserve (in addition to 0, 1, 2)
* @return void
*/
void close_all_non_term_fd(std::vector<int> excludeFDs) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function takes excludeFDs by value, which causes a copy of the std::vector. This copy operation involves heap allocation. A primary goal of this pull request is to make this function safe to call after fork() in a multi-threaded program by avoiding all heap allocations to prevent potential deadlocks. Passing the vector by value undermines this effort and re-introduces the risk of deadlocks.

void close_all_non_term_fd(const std::vector<int>& excludeFDs) {

Comment on lines 645 to 657
for (unsigned int fd = 3; fd < nlimit.rlim_cur; fd++) {
if (std::find(excludeFDs.begin(), excludeFDs.end(), fd) == excludeFDs.end()) {
// Check if fd is in exclusion list
bool exclude = false;
for (size_t i = 0; i < excludeFDs.size(); i++) {
if (excludeFDs[i] == (int)fd) {
exclude = true;
break;
}
}
if (!exclude) {
close(fd);
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The loop for (unsigned int fd = 3; fd < nlimit.rlim_cur; fd++) is dangerous. The type of nlimit.rlim_cur is rlim_t, which on 64-bit Linux is unsigned long long. The loop variable fd is unsigned int. If rlim_cur is set to a value larger than UINT_MAX (e.g., via ulimit -n), this loop will become an infinite loop, causing the process to hang.

Additionally, file descriptors are of type int, so the loop should not iterate beyond INT_MAX.

To fix this, the loop variable should be of type rlim_t, and you should ensure the file descriptor value does not exceed INT_MAX before using it.

			for (rlim_t i = 3; i < nlimit.rlim_cur; ++i) {
				if (i > INT_MAX) {
					break;
				}
				int fd = static_cast<int>(i);
				// Check if fd is in exclusion list
				bool exclude = false;
				for (size_t j = 0; j < excludeFDs.size(); j++) {
					if (excludeFDs[j] == fd) {
						exclude = true;
						break;
					}
				}
				if (!exclude) {
					close(fd);
				}
			}

Comment on lines +596 to +606
static int close_range_available = -1; // -1 = unknown, 0 = not available, 1 = available
if (close_range_available == -1) {
// First call: check if close_range is available
long ret = syscall(__NR_close_range, 3, ~0U, 0);
close_range_available = (ret == 0 || errno != ENOSYS) ? 1 : 0;
}
if (close_range_available == 1) {
// close_range is available, use it to close all fds >= 3
syscall(__NR_close_range, 3, ~0U, 0);
return;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current logic for detecting and using the close_range syscall has two issues:

  1. On the first successful call, it executes syscall(__NR_close_range, ...) twice, which is inefficient.
  2. More critically, if the syscall fails with a transient error (e.g., EINTR), close_range_available is incorrectly set to 1. The subsequent syscall will also likely fail, but its return value is ignored. The function then returns, having failed to close the file descriptors and without falling back to the other methods. This can leave sensitive file descriptors open.

A safer approach is to attempt the syscall and only cache its availability if it succeeds. If it fails for any reason, you should fall back to the other methods for the current call.

                static int close_range_available = -1;  // -1 = unknown, 0 = not available, 1 = available
		if (close_range_available == 1) {
			syscall(__NR_close_range, 3, ~0U, 0);
			return;
		}
		if (close_range_available == -1) {
			if (syscall(__NR_close_range, 3, ~0U, 0) == 0) {
				close_range_available = 1;
				return;
			}
			if (errno == ENOSYS) {
				close_range_available = 0;
			}
		}

This commit addresses critical issues identified in PR #5276 by
gemini-code-assist's code review, which could undermine the goal of
being allocation-free and cause hangs or silent failures.

Bug 1: Vector Passed by Value (Critical)
------------------------------------------
The function took std::vector<int> excludeFDs by value, causing heap
allocation during the copy operation. This undermines the PR's goal of
avoiding heap allocations after fork() to prevent deadlocks in
multi-threaded programs.

Fix: Change to pass by const reference to avoid heap allocation.
  void close_all_non_term_fd(const std::vector<int>& excludeFDs)

Bug 2: Infinite Loop Risk (Critical)
------------------------------------
The loop used unsigned int for the variable while comparing against
rlim_t (unsigned long long). If rlim_cur exceeded UINT_MAX, this would
create an infinite loop.

Fix: Use rlim_t type for the loop variable and cap at INT_MAX.
  for (rlim_t fd_rlim = 3; fd_rlim < nlimit.rlim_cur && fd_rlim <= INT_MAX; fd_rlim++)

Bug 3: close_range() Detection Logic (High)
------------------------------------------
The original detection logic had two problems:
1. Executed close_range syscall twice on first successful call
2. Incorrectly cached availability on transient failures (EINTR),
   leaving file descriptors open without fallback

Fix: Reordered logic to only cache on success, allow retry on
transient failures. Only cache as "not available" on ENOSYS.
For other errors (EBADF, EINVAL, etc.), don't cache - might be transient.

Files Modified
--------------
- include/proxysql_utils.h
- lib/proxysql_utils.cpp
@renecannao renecannao added this to the Release 3.0.5 milestone Jan 7, 2026
@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 7, 2026

@renecannao
Copy link
Contributor Author

retest this please

@renecannao renecannao merged commit e6cbdca into v3.0 Jan 9, 2026
14 of 153 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix critical bugs in close_all_non_term_fd() for fork/exec safety

1 participant