[Task] docs: Document diagnostics and metrics backends #532

Description

@kcenon

Summary

Document the thread pool diagnostics system and pluggable metrics backends, including bottleneck detection, health monitoring, and export to Prometheus/JSON/Logging.

Parent Issue

Part of: [EPIC] docs: Address documentation gaps across all ecosystem systems (kcenon/common_system#325)

Background (Why)

thread_system includes a comprehensive diagnostics and metrics subsystem with:

  • Thread pool diagnostics with bottleneck detection
  • Health status monitoring
  • Pluggable metrics backends (Prometheus, JSON, Logging)
  • Latency histograms and sliding window counters

None of this is documented, so users have no practical way to discover or use these observability features.

Source files — Diagnostics (include/kcenon/thread/diagnostics/):

  • thread_pool_diagnostics.h — Main diagnostics orchestrator
  • bottleneck_report.h — Bottleneck detection and reporting
  • health_status.h — Pool health status types
  • execution_event.h — Execution event tracking
  • job_info.h — Job metadata
  • thread_info.h — Thread metadata

Source files — Metrics (include/kcenon/thread/metrics/):

  • metrics_backend.h — Backend interface
  • metrics_base.h — Base metric types
  • thread_pool_metrics.h — Pool-specific metrics
  • enhanced_metrics.h — Extended metric types
  • latency_histogram.h — Latency distribution tracking
  • sliding_window_counter.h — Time-windowed counting
  • metrics_service.h — Metrics collection service

Scope (What)

1. Diagnostics System

Thread Pool Diagnostics (thread_pool_diagnostics.h)

  • How to enable diagnostics on a thread pool
  • Available diagnostic data points
  • Real-time vs periodic diagnostics

Bottleneck Detection (bottleneck_report.h)

  • Bottleneck detection algorithm
  • Report format and interpretation
  • Automated recommendations
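
To make the three bullets above concrete, here is a minimal sketch of a threshold-based detector that produces a finding plus a recommendation. It is not the algorithm in bottleneck_report.h: `pool_snapshot`, `bottleneck_kind`, `bottleneck_finding`, `detect_bottleneck`, and all thresholds are hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical snapshot of pool state. Field names are illustrative and are
// not taken from bottleneck_report.h.
struct pool_snapshot {
    std::size_t queue_depth;     // jobs waiting in the queue
    std::size_t queue_capacity;  // maximum queue size, must be > 0
    std::size_t busy_workers;    // workers currently running a job
    std::size_t total_workers;   // workers in the pool, must be > 0
};

enum class bottleneck_kind { none, workers_saturated, dispatch_stalled };

struct bottleneck_finding {
    bottleneck_kind kind;
    std::string recommendation;
};

// Classifies a snapshot using fixed, assumed thresholds.
inline bottleneck_finding detect_bottleneck(const pool_snapshot& s) {
    assert(s.queue_capacity > 0 && s.total_workers > 0);
    const double queue_fill = static_cast<double>(s.queue_depth) /
                              static_cast<double>(s.queue_capacity);
    const double utilization = static_cast<double>(s.busy_workers) /
                               static_cast<double>(s.total_workers);

    constexpr double backlog_threshold = 0.5;  // queue at least half full
    constexpr double busy_threshold = 0.9;     // nearly all workers busy
    constexpr double idle_threshold = 0.5;     // fewer than half busy

    // Backlog while every worker is busy: the pool is undersized for the load.
    if (queue_fill >= backlog_threshold && utilization >= busy_threshold) {
        return {bottleneck_kind::workers_saturated,
                "Increase worker count or reduce per-job cost."};
    }
    // Backlog while workers sit idle: jobs are not reaching the workers.
    if (queue_fill >= backlog_threshold && utilization < idle_threshold) {
        return {bottleneck_kind::dispatch_stalled,
                "Check for lock contention or blocked workers."};
    }
    return {bottleneck_kind::none, "No action needed."};
}
```

A report interpretation guide could walk through each `bottleneck_kind` the real header defines in the same way: what condition triggers it and what action it suggests.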

Health Monitoring (health_status.h)

  • Health status levels and transitions
  • Health check configuration
  • Alerting integration points
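
One way the documentation could illustrate status levels, transitions, and an alerting hook is sketched below. The three levels, the failure-rate thresholds, and the names `health_level`, `health_thresholds`, and `health_monitor` are assumptions made for this example and may not match health_status.h.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical status levels; the real header may define different ones.
enum class health_level { healthy, degraded, unhealthy };

// Assumed configuration: failure-rate thresholds in the range [0, 1].
struct health_thresholds {
    double degraded_failure_rate = 0.05;
    double unhealthy_failure_rate = 0.25;
};

class health_monitor {
public:
    using transition_callback =
        std::function<void(health_level from, health_level to)>;

    explicit health_monitor(health_thresholds t = health_thresholds{})
        : thresholds_(t) {
        assert(t.degraded_failure_rate <= t.unhealthy_failure_rate);
    }

    // Alerting integration point: invoked only when the level changes.
    void on_transition(transition_callback cb) { callback_ = std::move(cb); }

    // Re-evaluates the level from the latest observed failure rate.
    health_level update(double failure_rate) {
        health_level next = health_level::healthy;
        if (failure_rate >= thresholds_.unhealthy_failure_rate) {
            next = health_level::unhealthy;
        } else if (failure_rate >= thresholds_.degraded_failure_rate) {
            next = health_level::degraded;
        }
        if (next != current_) {
            const health_level previous = current_;
            current_ = next;
            if (callback_) callback_(previous, next);
        }
        return current_;
    }

    health_level current() const { return current_; }

private:
    health_thresholds thresholds_;
    health_level current_ = health_level::healthy;
    transition_callback callback_;
};
```

Firing the callback only on a change of level keeps alert volume proportional to state changes rather than to the check interval.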

2. Metrics Framework

Metrics Backend Interface (metrics_backend.h)

  • Backend plugin interface
  • How to implement custom backends
  • Backend lifecycle management
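
The custom-backend section would likely benefit from a small end-to-end example. The sketch below shows the general shape of a plugin interface with start/export/stop lifecycle hooks and one custom implementation. Every name in it (`metric_sample`, `metrics_backend_sketch`, `in_memory_backend`, `metrics_publisher`) is hypothetical; the actual interface in metrics_backend.h should replace it once documented.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sample type: one named value per export.
struct metric_sample {
    std::string name;
    double value;
};

// Hypothetical backend interface, not the one declared in metrics_backend.h.
class metrics_backend_sketch {
public:
    virtual ~metrics_backend_sketch() = default;
    virtual void start() {}  // acquire resources (sockets, files)
    virtual void export_samples(const std::vector<metric_sample>& samples) = 0;
    virtual void stop() {}   // flush and release resources
};

// Example custom backend: keeps the most recent value of each metric.
class in_memory_backend : public metrics_backend_sketch {
public:
    void export_samples(const std::vector<metric_sample>& samples) override {
        for (const auto& s : samples) latest_[s.name] = s.value;
    }

    // Precondition: the metric has been exported at least once.
    double latest(const std::string& name) const {
        const auto it = latest_.find(name);
        assert(it != latest_.end());
        return it->second;
    }

private:
    std::map<std::string, double> latest_;
};

// Owns the backends and drives their lifecycle.
class metrics_publisher {
public:
    ~metrics_publisher() {
        for (auto& b : backends_) b->stop();
    }

    void add_backend(std::unique_ptr<metrics_backend_sketch> backend) {
        backend->start();
        backends_.push_back(std::move(backend));
    }

    void publish(const std::vector<metric_sample>& samples) {
        for (auto& b : backends_) b->export_samples(samples);
    }

private:
    std::vector<std::unique_ptr<metrics_backend_sketch>> backends_;
};
```

In this sketch the publisher owns each backend, so `start()` and `stop()` are each called exactly once per backend without the caller tracking them.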

Built-in Backends

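| Backend    | Output        | Format      | Use Case              |
|------------|---------------|-------------|-----------------------|
| Prometheus | HTTP endpoint | OpenMetrics | Production monitoring |
| JSON       | File/stdout   | JSON        | Development/testing   |
| Logging    | Logger system | Text        | Debug/audit           |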

Core Metrics (thread_pool_metrics.h, enhanced_metrics.h)

  • Available metrics (queue depth, throughput, utilization, etc.)
  • Metric naming conventions
  • Label/tag support
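
For the naming and label conventions, it may help to show what one sample looks like on the wire. The sketch below renders a sample line in the Prometheus text exposition style, `name{label="value"} value`, escaping backslash, double quote, and newline in label values. The helper names `escape_label_value` and `format_sample` and the metric names used with them are hypothetical, not taken from thread_system.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Escapes backslash, double quote, and newline inside a label value.
inline std::string escape_label_value(const std::string& value) {
    std::string out;
    for (const char c : value) {
        if (c == '\\') {
            out += "\\\\";
        } else if (c == '"') {
            out += "\\\"";
        } else if (c == '\n') {
            out += "\\n";
        } else {
            out += c;
        }
    }
    return out;
}

// Formats one sample line. Labels are emitted in key order because
// std::map iterates its keys in sorted order.
inline std::string format_sample(const std::string& name,
                                 const std::map<std::string, std::string>& labels,
                                 double value) {
    assert(!name.empty());
    std::ostringstream os;
    os << name;
    if (!labels.empty()) {
        os << '{';
        bool first = true;
        for (const auto& kv : labels) {
            if (!first) os << ',';
            os << kv.first << "=\"" << escape_label_value(kv.second) << '"';
            first = false;
        }
        os << '}';
    }
    os << ' ' << value;
    return os.str();
}
```

A naming-convention section could then state rules such as a common prefix per subsystem and a `_total` suffix for counters, with each rule backed by an example line like these.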

Latency Tracking (latency_histogram.h)

  • Histogram bucket configuration
  • Percentile calculation (P50, P95, P99)
  • Sliding window behavior
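
Bucket configuration and percentile calculation are easier to explain with a worked example. The sketch below is a fixed-bucket histogram that reports a percentile as the upper bound of the bucket containing that rank, with no interpolation. The class name `latency_histogram_sketch` and its methods are hypothetical, and windowing is left out; the real latency_histogram.h may use a different bucket layout and estimation method.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

class latency_histogram_sketch {
public:
    // upper_bounds_ms: inclusive bucket upper bounds, sorted ascending.
    // One extra overflow bucket catches values above the last bound.
    explicit latency_histogram_sketch(std::vector<double> upper_bounds_ms)
        : bounds_(std::move(upper_bounds_ms)), counts_(bounds_.size() + 1, 0) {
        assert(!bounds_.empty());
        assert(std::is_sorted(bounds_.begin(), bounds_.end()));
    }

    void record(double latency_ms) {
        // Index of the first bucket whose upper bound is >= latency_ms;
        // equals bounds_.size() (the overflow bucket) if none qualifies.
        const auto it =
            std::lower_bound(bounds_.begin(), bounds_.end(), latency_ms);
        ++counts_[static_cast<std::size_t>(it - bounds_.begin())];
        ++total_;
    }

    // Upper bound of the bucket holding the p-th percentile, 0 < p <= 100.
    // Returns infinity when the rank falls in the overflow bucket.
    double percentile(double p) const {
        assert(p > 0.0 && p <= 100.0 && total_ > 0);
        const auto rank = static_cast<std::uint64_t>(
            std::ceil(p * static_cast<double>(total_) / 100.0));
        std::uint64_t seen = 0;
        for (std::size_t i = 0; i < bounds_.size(); ++i) {
            seen += counts_[i];
            if (seen >= rank) return bounds_[i];
        }
        return std::numeric_limits<double>::infinity();
    }

    std::uint64_t total() const { return total_; }

private:
    std::vector<double> bounds_;         // declared first: counts_ depends on it
    std::vector<std::uint64_t> counts_;  // one count per bucket plus overflow
    std::uint64_t total_ = 0;
};
```

Because the result is a bucket bound, the resolution of P50/P95/P99 is limited by the bucket layout, which is why the bucket configuration deserves its own documentation section.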

3. Usage Examples

#include <chrono>

// Needed for the 5s duration literal (C++14 or later).
// The designated initializers below require C++20.
using namespace std::chrono_literals;

// Enable diagnostics
auto pool = thread_pool::create()
    .with_diagnostics(diagnostics_config{
        .enable_bottleneck_detection = true,
        .health_check_interval = 5s
    })
    .with_metrics(prometheus_backend{"0.0.0.0:9090"})
    .build();

// Query diagnostics
auto report = pool.diagnostics().bottleneck_report();
auto health = pool.diagnostics().health_status();

Acceptance Criteria

  • All diagnostics types documented (6 files)
  • All metrics types documented (7 files)
  • Metrics backend interface documented for custom implementations
  • Built-in backends (Prometheus/JSON/Logging) documented with setup examples
  • Bottleneck report interpretation guide
  • Complete usage example with diagnostics + metrics

Metadata

Labels

  • documentation: Improvements or additions to documentation
  • priority:medium: Medium priority issue
