Skip to content

fix: concurrency bugs - mutex handling, thread spawning, and resource management (5 bugs)#2883

Merged
4 commits merged into
Hmbown:mainfrom
HUQIANTAO:fix/concurrency-bugs
Jun 8, 2026
Merged

fix: concurrency bugs - mutex handling, thread spawning, and resource management (5 bugs)#2883
4 commits merged into
Hmbown:mainfrom
HUQIANTAO:fix/concurrency-bugs

Conversation

@HUQIANTAO

@HUQIANTAO HUQIANTAO commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes 5 concurrency and async runtime bugs that could cause cascading panics, thread exhaustion, or Windows compilation failures.

Bugs Fixed

Mutex Poisoning

  1. Mutex lock().unwrap() cascading crash (mcp_server.rs:384,434)

    • If any thread panics while holding the Mutex, all subsequent lock().unwrap() calls panic, crashing the MCP server.
    • Fix: Use unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks.
  2. cost_status Mutex poison data loss (cost_status.rs:28-58)

    • A panic while holding the lock silently loses all subsequent cost tracking data.
    • Fix: Use unwrap_or_else(|e| e.into_inner()) to recover.

Thread Management

  1. std::thread::spawn in async code (hooks.rs:1055)

    • Each hook invocation creates an unbounded OS thread, bypassing tokio's scheduler.
    • Fix: Replace with tokio::task::spawn_blocking to respect thread pool limits.
  2. Dropped JoinHandle in SSE loop (mcp.rs:647)

    • JoinHandle immediately dropped, making it impossible to detect SSE loop termination.
    • Fix: Store JoinHandle in SseTransport struct.

Cross-Platform Safety

  1. std::os::unix::io::FromRawFd in tracing writer (runtime_log.rs:162)
    • Platform-specific unsafe code fails to compile on Windows.
    • Fix: Use Box<dyn std::io::Write + Send> return type for cross-platform dynamic dispatch. Falls back to stderr as last resort.

… management

1. Fix Mutex lock().unwrap() in MCP server (mcp_server.rs:384,434)
   - Use unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks
   - Previously, a single panic while holding the lock would cascade to all threads

2. Fix std::thread::spawn in async code (hooks.rs:1055)
   - Replace std::thread::spawn with tokio::task::spawn_blocking
   - Respects tokio's thread pool limits instead of creating unbounded OS threads
   - Fire-and-forget hook execution now properly managed by tokio runtime

3. Fix dropped JoinHandle in SSE loop (mcp.rs:647)
   - Store the JoinHandle in SseTransport struct
   - Enables detection of SSE loop termination
   - Prevents silent connection loss without structured error reporting

4. Fix std::sync::Mutex poison handling in cost_status (cost_status.rs:28-58)
   - Use unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks
   - Previously, a panic while holding the lock silently lost all subsequent cost data
   - Cost tracking now survives mutex poisoning

5. Fix .expect() in tracing writer (runtime_log.rs:162)
   - Replace expect() with fallback chain: try_clone -> reopen file -> stderr
   - Prevents panicking inside tracing subscriber on fd exhaustion
   - Previously, EMFILE during logging would crash the application

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown

Thanks @HUQIANTAO for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves robustness across several TUI modules by recovering from poisoned locks, utilizing Tokio's blocking task pool instead of detached threads, and storing the SSE task join handle. Additionally, it implements a fallback mechanism in the logging system to handle file descriptor exhaustion. The reviewer correctly identified a critical compilation issue on Windows due to the use of platform-specific Unix APIs (FromRawFd) in the logging fallback, and provided an elegant, cross-platform solution using a boxed writer.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread crates/tui/src/runtime_log.rs Outdated
Comment on lines 160 to 175
.with_writer(move || {
subscriber_file
.try_clone()
.expect("clone log file handle for tracing writer")
// Clone the file handle for each write. If clone fails (fd exhaustion),
// fall back to reopening the same path, or ultimately stderr.
subscriber_file.try_clone().unwrap_or_else(|e| {
tracing::warn!("Failed to clone log file handle: {e}, reopening");
std::fs::OpenOptions::new()
.create(true)
.append(true)
.open(&log_path_clone)
.unwrap_or_else(|_| {
// Last resort: wrap stderr as a File to prevent panic.
use std::os::unix::io::FromRawFd;
unsafe { std::fs::File::from_raw_fd(2) }
})
})
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The use of std::os::unix::io::FromRawFd will cause a compilation failure on Windows platforms because the std::os::unix module is not available there. To make this code fully cross-platform and safe (avoiding unsafe blocks), you can change the closure's return type to Box<dyn std::io::Write + Send>. This allows returning either a std::fs::File or std::io::stderr() dynamically without platform-specific unsafe code.

            .with_writer(move || -> Box<dyn std::io::Write + Send> {
                // Clone the file handle for each write. If clone fails (fd exhaustion),
                // fall back to reopening the same path, or ultimately stderr.
                match subscriber_file.try_clone() {
                    Ok(f) => Box::new(f),
                    Err(e) => {
                        tracing::warn!("Failed to clone log file handle: {e}, reopening");
                        match std::fs::OpenOptions::new()
                            .create(true)
                            .append(true)
                            .open(&log_path_clone)
                        {
                            Ok(f) => Box::new(f),
                            Err(_) => Box::new(std::io::stderr()),
                        }
                    }
                }
            })

Replace platform-specific std::os::unix::io::FromRawFd with
Box<dyn std::io::Write + Send> return type. This compiles on
Windows, macOS, and Linux without unsafe code.

The closure now returns a boxed writer that is either:
- The cloned file handle (success case)
- A reopened file handle (clone failed)
- stderr (last resort, prevents panic)

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

The test at line 4768 was missing the new sse_task field added to
SseTransport. Add a dummy tokio::spawn task for the test.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

tokio::task::spawn_blocking requires a running tokio runtime, which
breaks tests that call hook functions outside a tokio context. Since
hooks are fire-and-forget (no JoinHandle needed), std::thread::spawn
is the correct choice.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Hmbown added a commit that referenced this pull request Jun 7, 2026
@Hmbown Hmbown closed this pull request by merging all changes into Hmbown:main in 3d503a0 Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants