Summary
Document the NUMA-aware thread pool and work-stealing subsystem including topology detection, NUMA-local scheduling, and cross-node work stealing.
Parent Issue
Part of: [EPIC] docs: Address documentation gaps across all ecosystem systems (kcenon/common_system#325)
Background (Why)
thread_system includes NUMA-aware scheduling capabilities that optimize performance on multi-socket systems. This advanced feature is critical for high-performance computing workloads, but it currently has no documentation.
Source files:
include/kcenon/thread/stealing/numa_work_stealer.h — NUMA-aware work stealing
include/kcenon/thread/stealing/numa_topology.h — NUMA topology detection
include/kcenon/thread/core/numa_thread_pool.h — NUMA-aware thread pool
Scope (What)
1. NUMA Overview
- What NUMA is and why it matters for thread pools
- Performance impact of NUMA-unaware scheduling
- When NUMA optimization is beneficial vs unnecessary
2. Topology Detection (numa_topology.h)
- How the system detects NUMA topology at runtime
- Fallback behavior on non-NUMA systems (single socket)
- API for querying detected topology
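To make the detection and fallback bullets concrete, here is a minimal sketch of one way runtime detection could work. It assumes a Linux host, where each NUMA node exposes its CPUs as a range list (for example "0-3,8-11") in /sys/devices/system/node/nodeN/cpulist. The names numa_node_info, parse_cpulist, and single_node_fallback are hypothetical and are not taken from numa_topology.h; the documentation should describe whatever that header actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one detected node (not the real API).
struct numa_node_info {
    int id;
    std::vector<int> cpus;
};

// Expands a Linux cpulist string such as "0-3,8-11" into CPU ids.
std::vector<int> parse_cpulist(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ',')) {
        // Skip empty tokens, e.g. a lone trailing newline.
        if (token.find_first_of("0123456789") == std::string::npos) continue;
        const std::size_t dash = token.find('-');
        const int first = std::stoi(token.substr(0, dash));
        const int last = (dash == std::string::npos)
                             ? first
                             : std::stoi(token.substr(dash + 1));
        assert(first <= last);
        for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
    }
    return cpus;
}

// Fallback for non-NUMA systems: one node that owns every CPU.
// std::thread::hardware_concurrency() may return 0, so treat 0 as 1.
std::vector<numa_node_info> single_node_fallback(unsigned hardware_threads) {
    const unsigned count = hardware_threads == 0 ? 1 : hardware_threads;
    numa_node_info node{0, {}};
    for (unsigned cpu = 0; cpu < count; ++cpu) {
        node.cpus.push_back(static_cast<int>(cpu));
    }
    return {node};
}
```

A caller would try the sysfs path first and use single_node_fallback(std::thread::hardware_concurrency()) when no node directories exist, so the rest of the pool can treat a single-socket machine as a one-node topology.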
3. NUMA Thread Pool (numa_thread_pool.h)
- How threads are pinned to NUMA nodes
- Job affinity and NUMA-local queue management
- Configuration for NUMA-aware scheduling
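The pinning bullet can be illustrated with a placement plan: given the CPUs of each node and a threads-per-node setting, decide which node and CPU every worker belongs to. The sketch below is an assumption about how such a plan might look, not the behavior of numa_thread_pool.h; worker_placement and plan_workers are hypothetical names. On Linux, the actual pinning step would typically be a call such as pthread_setaffinity_np for each planned worker, which is left out here to keep the example portable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: which node and CPU a worker should be pinned to.
struct worker_placement {
    std::size_t worker;
    std::size_t node;
    int cpu;
};

// Assigns threads_per_node workers to each node, cycling through that
// node's CPUs so that oversubscribed workers still stay node-local.
std::vector<worker_placement> plan_workers(
    const std::vector<std::vector<int>>& node_cpus,
    std::size_t threads_per_node) {
    std::vector<worker_placement> plan;
    for (std::size_t node = 0; node < node_cpus.size(); ++node) {
        const std::vector<int>& cpus = node_cpus[node];
        assert(!cpus.empty());
        for (std::size_t i = 0; i < threads_per_node; ++i) {
            const std::size_t worker = plan.size();
            plan.push_back({worker, node, cpus[i % cpus.size()]});
        }
    }
    return plan;
}
```

With two nodes owning CPUs 0-3 and 4-7 and two threads per node, workers 0 and 1 land on node 0 and workers 2 and 3 on node 1. Keeping the plan separate from the pinning call makes the configuration easy to document and test without NUMA hardware.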
4. Work Stealing (numa_work_stealer.h)
- Work stealing algorithm across NUMA nodes
- Locality-aware stealing (prefer same-node theft)
- Cross-node stealing penalty and thresholds
- Fairness guarantees
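The locality-aware stealing and threshold bullets can be sketched as a victim-selection function: an idle worker first looks for the busiest queue on its own node, and only considers a remote node when that node's backlog reaches a threshold. This is an illustrative policy under stated assumptions, not the algorithm in numa_work_stealer.h; pick_victim and cross_node_threshold are hypothetical names, and the sketch does not address fairness.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the worker to steal from, or std::nullopt if stealing is not
// worthwhile. Same-node victims always win over remote ones; remote
// victims qualify only with at least cross_node_threshold queued jobs.
std::optional<std::size_t> pick_victim(
    std::size_t thief,
    const std::vector<std::size_t>& worker_node,
    const std::vector<std::size_t>& queue_depth,
    std::size_t cross_node_threshold) {
    assert(worker_node.size() == queue_depth.size());
    assert(thief < worker_node.size());

    std::optional<std::size_t> local;
    std::optional<std::size_t> remote;
    for (std::size_t w = 0; w < worker_node.size(); ++w) {
        if (w == thief || queue_depth[w] == 0) continue;
        const bool same_node = worker_node[w] == worker_node[thief];
        if (!same_node && queue_depth[w] < cross_node_threshold) continue;
        std::optional<std::size_t>& best = same_node ? local : remote;
        // Within each class, prefer the deepest queue.
        if (!best || queue_depth[w] > queue_depth[*best]) best = w;
    }
    return local ? local : remote;
}
```

For example, with workers 0 and 1 on node 0 and workers 2 and 3 on node 1, an idle worker 0 steals from worker 1 whenever worker 1 has any queued job, even if worker 2 has a much deeper queue. The threshold expresses the cross-node penalty: a shallow remote queue is not worth the remote memory traffic.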
5. Usage Examples
// NUMA-aware thread pool setup
auto pool = numa_thread_pool::create()
    .detect_topology()
    .threads_per_node(4)
    .stealing_policy(prefer_local)
    .build();

// Submit NUMA-affine work
pool.submit_on_node(0, [] { /* NUMA node 0 work */ });
6. Performance Tuning
- Benchmarks: NUMA-aware vs NUMA-unaware on multi-socket
- Optimal threads-per-node ratios
- When cross-node stealing helps vs hurts
Acceptance Criteria