perf(tcp_socket): add std::span receive callback to remove per-read allocations #316

Description

@kcenon

Parent Issue

Part of #315 (TCP receive std::span callback migration)

Summary

Introduce a std::span<const uint8_t> receive callback API for tcp_socket so the receive hot path can run without per-read std::vector allocations/copies.

Background

Current tcp_socket::do_read() behavior:

  • Always constructs std::vector<uint8_t> chunk(...) from read_buffer_ (src/internal/tcp_socket.cpp:116)
  • Uses callback_mutex_ (std::lock_guard) on every read to copy callbacks (src/internal/tcp_socket.cpp:120)

This limits scalability at high TPS: every read pays for a heap allocation and a copy of the received bytes, plus a mutex acquisition to copy the callbacks.

Proposed Solution

1) Add a view-based receive callback (preferred fast path)

Add a new registration API (name can be adjusted, but avoid overload ambiguity):

  • set_receive_callback_view(std::function<void(std::span<const uint8_t>)> cb)

Callback contract:

  • The span is valid only until the callback returns.
  • Callback implementations must not store the span or capture it across async boundaries; if the bytes are needed later, copy them before returning.

2) Keep legacy vector callback for compatibility (slow path)

Keep existing:

  • set_receive_callback(std::function<void(const std::vector<uint8_t>&)> cb)

Dispatch rule:

  • If a view-callback is set, invoke it with std::span and do not allocate/copy into std::vector.
  • Otherwise, fall back to the legacy vector callback and preserve current semantics.

3) Remove callback mutex from the receive hot path

Replace per-read mutex locking with lock-free callback access, e.g.:

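  • Store callbacks as std::shared_ptr<cb_t> and use std::atomic_load/std::atomic_store. Note that these free-function overloads are deprecated in C++20 in favor of std::atomic<std::shared_ptr<cb_t>>, and neither form is guaranteed to be lock-free, so verify the behavior on the target toolchain.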

4) Preserve buffer lifetime safety

Keep the read loop structure such that the next async_read_some() is scheduled only after the callback returns (span lifetime remains simple and safe).

Files to Update

  • src/internal/tcp_socket.cpp
  • src/internal/tcp_socket.h

Acceptance Criteria

  • tcp_socket exposes set_receive_callback_view(std::function<void(std::span<const uint8_t>)> cb)
  • When the view callback is used, do_read() performs no per-read std::vector allocation/copy
  • Receive hot path does not take callback_mutex_
  • Legacy vector callback continues to work unchanged
  • Unit/integration tests pass

Test Plan

  • Build and run:
    • ctest --output-on-failure (or CI equivalent)
  • Add/adjust unit tests if needed to validate:
    • view callback invoked with correct size
    • legacy callback still works

Notes

This issue intentionally does not add allocation-count benchmarks; track that separately under a dedicated perf/test issue.

Labels

async (Asynchronous operations), enhancement (New feature or request), performance (Performance improvements), priority:high (High priority issue), refactoring (Code refactoring and improvements)
