Summary
Fix io_context lifecycle management issues that cause test failures and potential resource leaks, as tracked in Issues #315 and #348.
Current State
Multiple TODO comments reference lifecycle management problems:
// integration_tests/failures/error_handling_test.cpp:58
// TODO: Fix root cause in io_context lifecycle management (Issue #348)
// integration_tests/scenarios/connection_lifecycle_test.cpp:109
// TODO: Fix root cause in io_context lifecycle management (Issue #348)
// integration_tests/scenarios/connection_lifecycle_test.cpp:224
// TODO: Fix root cause in io_context lifecycle management (Issue #315)
Problem Analysis
Symptoms
- Race conditions during shutdown
- Use-after-free in async callbacks
- Hanging tests due to io_context not stopping properly
- Resource leaks when connections are destroyed before io_context
Root Causes
- Shared io_context lifetime: Multiple objects sharing an io_context without clear ownership
- Callback safety: Callbacks executed after object destruction
- Work guard management: Improper work_guard lifetime leading to premature io_context::run() exit
- Async operation cancellation: Outstanding operations not cancelled before destruction
Proposed Solution
1. Clear Ownership Model
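#include <asio.hpp>
#include <memory>
#include <thread>

class connection_context {
public:
    using work_guard_t = asio::executor_work_guard<asio::io_context::executor_type>;

    connection_context()
        : io_context_(std::make_unique<asio::io_context>())
        , work_guard_(std::make_unique<work_guard_t>(io_context_->get_executor()))
        // Assumption: the io thread starts here; the original sketch never started it
        , io_thread_([this] { io_context_->run(); })
    {}

    ~connection_context() {
        shutdown();
    }

    connection_context(const connection_context&) = delete;
    connection_context& operator=(const connection_context&) = delete;

    // Safe to call more than once. Must not be called from the io thread,
    // because a thread cannot join itself.
    void shutdown() {
        // 1. Release the work guard so run() may return once it is out of work
        work_guard_.reset();
        // 2. Stop the io_context: run() returns as soon as possible, and
        //    handlers that have not run yet are abandoned, not invoked
        io_context_->stop();
        // 3. Wait for the io thread to complete
        if (io_thread_.joinable()) {
            io_thread_.join();
        }
    }

private:
    // Declaration order matters: io_thread_ is initialized last, after the
    // io_context and work guard it uses
    std::unique_ptr<asio::io_context> io_context_;
    std::unique_ptr<work_guard_t> work_guard_;
    std::thread io_thread_;
};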
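The ownership and shutdown ordering above can be tried without Asio. The following is a minimal sketch under stated assumptions: `owned_loop` and `demo_owned_loop` are hypothetical names, and a hand-rolled task queue stands in for `asio::io_context`. Unlike `io_context::stop()`, this stand-in runs tasks that are already queued before the thread exits, so it illustrates only the "one owner, join in the destructor" part of the design.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for connection_context (standard library only).
class owned_loop {
public:
    owned_loop() : thread_([this] { run(); }) {}
    ~owned_loop() { shutdown(); }

    owned_loop(const owned_loop&) = delete;
    owned_loop& operator=(const owned_loop&) = delete;

    // Queue a task; returns false once shutdown has begun.
    bool post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopping_) return false;
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
        return true;
    }

    // Refuse new work, let the loop finish queued tasks, then join.
    // Must not be called from inside a task (a thread cannot join itself).
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // only reachable once stopping_ is set
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();  // run the task without holding the queue lock
            task();
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so it starts after the other members exist
};

// Posts five tasks, shuts down, and returns how many tasks ran.
int demo_owned_loop() {
    int count = 0;
    owned_loop loop;
    for (int i = 0; i < 5; ++i) loop.post([&count] { ++count; });
    loop.shutdown();  // joins the worker, so reading count below is race-free
    bool accepted = loop.post([&count] { ++count; });
    assert(!accepted);  // work posted after shutdown is rejected
    return count;
}
```

Because the owner joins its thread before any member is destroyed, no task can run against a half-destroyed object, which is the property the wrapper class is meant to provide.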
2. Safe Callback Pattern
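// Requires <memory> and <utility>.
template <typename Handler>
auto make_safe_handler(std::weak_ptr<connection> weak_conn, Handler&& handler) {
    // mutable so that handlers with a non-const operator() can be invoked
    return [weak_conn = std::move(weak_conn), handler = std::forward<Handler>(handler)]
           (auto&&... args) mutable {
        // Holding the shared_ptr keeps the connection alive for the whole call
        if (auto conn = weak_conn.lock()) {
            handler(std::forward<decltype(args)>(args)...);
        }
        // Silently drop the callback if the connection is gone
    };
}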
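A short usage sketch may help show what the wrapper buys. Here `connection` is a hypothetical stand-in struct rather than the project's class, `demo_safe_handler` is an illustrative name, and `make_safe_handler` is repeated so the block compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the project's connection class.
struct connection {
    int bytes_seen = 0;
};

template <typename Handler>
auto make_safe_handler(std::weak_ptr<connection> weak_conn, Handler&& handler) {
    return [weak_conn = std::move(weak_conn),
            handler = std::forward<Handler>(handler)](auto&&... args) mutable {
        if (auto conn = weak_conn.lock()) {
            handler(std::forward<decltype(args)>(args)...);
        }
    };
}

// Returns how many times the wrapped handler actually ran.
int demo_safe_handler() {
    int calls = 0;
    auto conn = std::make_shared<connection>();
    connection* raw = conn.get();  // what an unguarded callback would capture

    auto on_read = make_safe_handler(conn, [raw, &calls](int n) {
        raw->bytes_seen += n;  // only safe while the connection is alive
        ++calls;
    });

    on_read(16);   // connection alive: handler runs
    conn.reset();  // last owner gone: connection is destroyed
    on_read(32);   // weak_ptr expired: call is dropped, raw is never touched
    return calls;
}
```

The second call would be a use-after-free without the wrapper; with it, the callback is skipped because `weak_conn.lock()` returns an empty pointer.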
3. Graceful Shutdown Sequence
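auto connection::stop() -> VoidResult {
    // 1. Set the stopping flag; exchange() makes a repeated stop() a no-op
    if (stopping_.exchange(true)) {
        return common::ok();
    }
    // 2. Cancel all pending operations
    socket_->cancel();
    // 3. Close the socket
    socket_->close();
    // 4. Wait for callbacks to drain; the io_context must still be running
    //    here so that the cancelled operations' handlers can be delivered
    drain_pending_callbacks();
    // 5. Stop the io_context
    context_->shutdown();
    return common::ok();
}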
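The issue does not say how `drain_pending_callbacks()` works. One possible shape, as a sketch only, is a counter of in-flight callbacks guarded by a mutex and a condition variable. The names `callback_tracker`, `begin`, `end`, `drain`, and `demo_drain` are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper that drain_pending_callbacks() could delegate to.
class callback_tracker {
public:
    // Call when an async operation is started.
    void begin() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++pending_;
    }

    // Call at the end of the operation's completion handler.
    void end() {
        std::lock_guard<std::mutex> lock(mutex_);
        --pending_;
        // Notify while holding the lock so a waiter cannot destroy the
        // tracker between the decrement and the notification.
        cv_.notify_all();
    }

    // Block until every started operation has completed.
    // Must not be called from inside a tracked callback.
    void drain() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int pending_ = 0;
};

// Returns the number of callbacks observed as complete after drain().
int demo_drain() {
    callback_tracker tracker;
    int completed = 0;

    tracker.begin();  // registered before the worker exists, so drain() cannot pass early
    std::thread worker([&] {
        ++completed;    // the "callback" body
        tracker.end();  // publishes the write above via the mutex
    });

    tracker.drain();  // returns only after the callback has finished
    int seen = completed;
    worker.join();
    return seen;
}
```

In this shape the wait must happen off the io thread, since a handler that waits for itself to finish would never return.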
Tasks
Acceptance Criteria
Files to Modify
src/core/messaging_client.cpp
src/core/messaging_server.cpp
src/internal/tcp_socket.cpp
src/internal/websocket_socket.cpp
integration_tests/failures/error_handling_test.cpp
integration_tests/scenarios/connection_lifecycle_test.cpp
Related