Skip to content

[BUG] Fix io_context lifecycle management issues #400

Description

@kcenon

Summary

Fix io_context lifecycle management issues causing test failures and potential resource leaks, as tracked in Issue #315 and #348.

Current State

Multiple TODO comments reference lifecycle management problems:

// integration_tests/failures/error_handling_test.cpp:58
// TODO: Fix root cause in io_context lifecycle management (Issue #348)

// integration_tests/scenarios/connection_lifecycle_test.cpp:109
// TODO: Fix root cause in io_context lifecycle management (Issue #348)

// integration_tests/scenarios/connection_lifecycle_test.cpp:224
// TODO: Fix root cause in io_context lifecycle management (Issue #315)

Problem Analysis

Symptoms

  1. Race conditions during shutdown
  2. Use-after-free in async callbacks
  3. Hanging tests due to io_context not stopping properly
  4. Resource leaks when connections are destroyed before io_context

Root Causes

  1. Shared io_context lifetime: Multiple objects sharing an io_context without clear ownership
  2. Callback safety: Callbacks executed after object destruction
  3. Work guard management: Improper work_guard lifetime leading to premature io_context::run() exit
  4. Async operation cancellation: Outstanding operations not cancelled before destruction

Proposed Solution

1. Clear Ownership Model

class connection_context {
public:
    explicit connection_context()
        : io_context_(std::make_unique<asio::io_context>())
        , work_guard_(std::make_unique<work_guard_t>(io_context_->get_executor()))
    {}
    
    ~connection_context() {
        shutdown();
    }
    
    void shutdown() {
        // 1. Stop accepting new work
        work_guard_.reset();
        
        // 2. Cancel all outstanding operations
        io_context_->stop();
        
        // 3. Wait for io thread to complete
        if (io_thread_.joinable()) {
            io_thread_.join();
        }
    }
    
private:
    std::unique_ptr<asio::io_context> io_context_;
    std::unique_ptr<work_guard_t> work_guard_;
    std::thread io_thread_;
};

2. Safe Callback Pattern

template<typename Handler>
auto make_safe_handler(std::weak_ptr<connection> weak_conn, Handler&& handler) {
    return [weak_conn, handler = std::forward<Handler>(handler)]
           (auto&&... args) {
        if (auto conn = weak_conn.lock()) {
            handler(std::forward<decltype(args)>(args)...);
        }
        // Silently drop callback if connection is gone
    };
}

3. Graceful Shutdown Sequence

auto connection::stop() -> VoidResult {
    // 1. Set stopping flag
    stopping_.store(true);
    
    // 2. Cancel all pending operations
    socket_->cancel();
    
    // 3. Close socket
    socket_->close();
    
    // 4. Wait for callbacks to drain
    drain_pending_callbacks();
    
    // 5. Stop io_context
    context_->shutdown();
    
    return common::ok();
}

Tasks

  • Audit all io_context usage patterns
  • Implement connection_context wrapper class
  • Convert callbacks to use weak_ptr pattern
  • Add graceful shutdown sequence to all connection types
  • Fix error_handling_test.cpp test cases
  • Fix connection_lifecycle_test.cpp test cases
  • Add stress test for rapid connect/disconnect cycles
  • Document io_context ownership model

Acceptance Criteria

  • All lifecycle tests pass without TODO workarounds
  • No TSAN warnings in lifecycle tests
  • No ASAN warnings for use-after-free
  • Rapid connect/disconnect stress test passes
  • Clean shutdown with no resource leaks

Files to Modify

  • src/core/messaging_client.cpp
  • src/core/messaging_server.cpp
  • src/internal/tcp_socket.cpp
  • src/internal/websocket_socket.cpp
  • integration_tests/failures/error_handling_test.cpp
  • integration_tests/scenarios/connection_lifecycle_test.cpp

Related

Metadata

Metadata

Assignees

Labels

asyncAsynchronous operationsbugSomething isn't workingpriority:highHigh priority issue

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions