Summary
Add histogram metric support to capture latency distributions with percentiles (p50, p95, p99) for network operations.
Current State
- metric_reporter provides gauge and counter metrics
- Latency is reported as a single value (report_latency(double ms))
- No percentile calculations
- No distribution visibility
Current limitation:
// Only reports latest value, no distribution
static void report_latency(double ms);
Why Histograms?
- Average latency hides outliers
- P99 latency is critical for SLA compliance
- Distributions reveal performance patterns
- Essential for capacity planning
Proposed Implementation
1. Histogram Class
class histogram {
public:
    struct config {
        std::vector<double> bucket_boundaries;  // Explicit buckets
        // OR
        double min_value = 0.0;
        double max_value = 10000.0;  // 10 seconds
        size_t bucket_count = 20;    // Exponential buckets
    };

    explicit histogram(config cfg);

    // Record a value
    void record(double value);

    // Get statistics
    [[nodiscard]] auto count() const -> uint64_t;
    [[nodiscard]] auto sum() const -> double;
    [[nodiscard]] auto min() const -> double;
    [[nodiscard]] auto max() const -> double;
    [[nodiscard]] auto mean() const -> double;

    // Percentiles
    [[nodiscard]] auto percentile(double p) const -> double;
    [[nodiscard]] auto p50() const -> double { return percentile(0.50); }
    [[nodiscard]] auto p95() const -> double { return percentile(0.95); }
    [[nodiscard]] auto p99() const -> double { return percentile(0.99); }
    [[nodiscard]] auto p999() const -> double { return percentile(0.999); }

    // Get all bucket counts
    [[nodiscard]] auto buckets() const
        -> std::vector<std::pair<double, uint64_t>>;

    // Reset statistics
    void reset();

private:
    std::vector<double> boundaries_;
    std::vector<std::atomic<uint64_t>> bucket_counts_;
    std::atomic<uint64_t> count_{0};
    std::atomic<double> sum_{0.0};
    std::atomic<double> min_{std::numeric_limits<double>::max()};
    std::atomic<double> max_{std::numeric_limits<double>::lowest()};
    mutable std::mutex mutex_;
};
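The declaration above does not say how percentile(double p) turns bucket counts into a value. One common approach is to walk the cumulative counts and interpolate linearly inside the bucket that contains the target rank. The following is a minimal sketch of that approach, not the proposed implementation: estimate_percentile is a hypothetical free function, and it assumes that each boundary is an inclusive upper bound, that the lowest bucket starts at 0, and that values above the last boundary are clamped to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the proposed API): estimate the p-th
// quantile (0 < p <= 1) from per-bucket counts. boundaries[i] is the inclusive
// upper bound of bucket i; counts has one extra trailing entry for the
// overflow bucket (values above the last boundary).
double estimate_percentile(const std::vector<double>& boundaries,
                           const std::vector<std::uint64_t>& counts,
                           double p) {
    assert(!boundaries.empty());
    assert(counts.size() == boundaries.size() + 1);

    std::uint64_t total = 0;
    for (auto c : counts) total += c;
    if (total == 0) return 0.0;  // no samples recorded

    const double target = p * static_cast<double>(total);  // rank to reach
    std::uint64_t cumulative = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (counts[i] == 0) continue;
        const std::uint64_t before = cumulative;
        cumulative += counts[i];
        if (static_cast<double>(cumulative) >= target) {
            // Overflow bucket has no upper bound: clamp to the last boundary.
            if (i == boundaries.size()) return boundaries.back();
            // Assumption: the first bucket starts at 0.
            const double lower = (i == 0) ? 0.0 : boundaries[i - 1];
            const double upper = boundaries[i];
            const double fraction = (target - static_cast<double>(before)) /
                                    static_cast<double>(counts[i]);
            return lower + (upper - lower) * fraction;
        }
    }
    return boundaries.back();
}
```

With this scheme the accuracy of a percentile is limited by the width of the bucket it falls into, which is one reason the choice of bucket boundaries matters.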
2. Sliding Window Histogram
class sliding_histogram {
public:
    struct config {
        histogram::config hist_config;
        std::chrono::seconds window_duration{60};
        size_t bucket_count_per_window = 6;  // 10-second buckets
    };

    explicit sliding_histogram(config cfg);

    void record(double value);

    // Get statistics for current window
    [[nodiscard]] auto p50() const -> double;
    [[nodiscard]] auto p95() const -> double;
    [[nodiscard]] auto p99() const -> double;

private:
    struct time_bucket {
        histogram hist;
        std::chrono::steady_clock::time_point start_time;
    };

    std::deque<time_bucket> buckets_;
    config config_;
    mutable std::mutex mutex_;  // mutable so the const accessors can lock it
};
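The rotation logic behind the time buckets is not spelled out above. As a rough, single-threaded sketch of one way it could work: keep a deque of time slices, start a new slice once the newest one is older than the slice length, and drop slices whose start time is a full window in the past. Everything here is an assumption for illustration: sliding_counts is a hypothetical class that stores plain bucket counts instead of a histogram, and the current time is passed in explicitly so the behaviour is easy to test.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical, single-threaded sketch of window rotation (no locking).
class sliding_counts {
public:
    using clock_type = std::chrono::steady_clock;

    sliding_counts(std::vector<double> boundaries,
                   std::chrono::seconds window, std::size_t slices)
        : boundaries_(std::move(boundaries)),
          slice_len_(std::chrono::duration_cast<clock_type::duration>(window) /
                     static_cast<clock_type::rep>(slices)),
          max_slices_(slices) {
        assert(slices > 0);
    }

    // Record a value observed at time `now`.
    void record(double value, clock_type::time_point now) {
        evict_expired(now);
        if (slices_.empty() || now - slices_.back().start >= slice_len_) {
            slices_.push_back(
                {now, std::vector<std::uint64_t>(boundaries_.size() + 1, 0)});
        }
        // First boundary >= value; end() maps to the overflow bucket.
        auto it = std::lower_bound(boundaries_.begin(), boundaries_.end(), value);
        ++slices_.back().counts[static_cast<std::size_t>(it - boundaries_.begin())];
    }

    // Sum the bucket counts of all slices still inside the window.
    std::vector<std::uint64_t> merged(clock_type::time_point now) {
        evict_expired(now);
        std::vector<std::uint64_t> total(boundaries_.size() + 1, 0);
        for (const auto& s : slices_)
            for (std::size_t i = 0; i < total.size(); ++i) total[i] += s.counts[i];
        return total;
    }

private:
    struct slice {
        clock_type::time_point start;
        std::vector<std::uint64_t> counts;
    };

    // Drop slices that started a full window (or more) before `now`.
    void evict_expired(clock_type::time_point now) {
        const auto window = slice_len_ * static_cast<clock_type::rep>(max_slices_);
        while (!slices_.empty() && now - slices_.front().start >= window)
            slices_.pop_front();
    }

    std::vector<double> boundaries_;
    clock_type::duration slice_len_;
    std::size_t max_slices_;
    std::deque<slice> slices_;
};
```

Because whole slices are evicted at once, the effective window length varies by up to one slice length; more slices per window reduce that error at the cost of more memory and merge work.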
3. Integration with Metrics System
namespace metric_names {
    // Histogram metrics
    constexpr const char* LATENCY_HISTOGRAM = "network.latency.histogram";
    constexpr const char* CONNECTION_TIME_HISTOGRAM =
        "network.connection_time.histogram";
    constexpr const char* REQUEST_DURATION_HISTOGRAM =
        "network.request_duration.histogram";
}

class metric_reporter {
public:
    // Existing methods...

    // NEW: Histogram methods
    static void record_latency(double ms);
    static void record_connection_time(double ms);
    static void record_request_duration(double ms);

    // Get histogram statistics
    static auto get_latency_p50() -> double;
    static auto get_latency_p95() -> double;
    static auto get_latency_p99() -> double;

    // Get all histograms for export
    static auto get_all_histograms()
        -> std::map<std::string, histogram_snapshot>;

private:
    static histogram& latency_histogram();
    static histogram& connection_time_histogram();
    static histogram& request_duration_histogram();
};
4. Histogram Snapshot for Export
struct histogram_snapshot {
    uint64_t count;
    double sum;
    double min;
    double max;
    std::map<double, double> percentiles;              // percentile -> value
    std::vector<std::pair<double, uint64_t>> buckets;  // boundary -> count

    // Serialize for export
    auto to_prometheus() const -> std::string;
    auto to_json() const -> std::string;
};
5. Usage Examples
// Record connection latency
auto start = std::chrono::steady_clock::now();
client->connect(host, port);
auto elapsed = std::chrono::steady_clock::now() - start;
metric_reporter::record_connection_time(
    std::chrono::duration<double, std::milli>(elapsed).count());

// Get percentiles for alerting
if (metric_reporter::get_latency_p99() > 100.0) {
    log_warning("P99 latency exceeds 100ms threshold");
}

// Export for Prometheus
auto snapshot = metric_reporter::get_all_histograms();
for (const auto& [name, hist] : snapshot) {
    std::cout << hist.to_prometheus() << "\n";
}
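The manual start/stop timing in the first example could be wrapped in a small RAII helper so that the measurement is recorded on every exit path. The sketch below is only an illustration: scoped_timer is a hypothetical class, not part of this proposal, and it takes the recording function as a callback so that it does not depend on metric_reporter.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical RAII helper: measures the lifetime of the object and passes
// the elapsed time in milliseconds to `sink` when it is destroyed.
class scoped_timer {
public:
    explicit scoped_timer(std::function<void(double)> sink)
        : sink_(std::move(sink)), start_(std::chrono::steady_clock::now()) {}

    scoped_timer(const scoped_timer&) = delete;
    scoped_timer& operator=(const scoped_timer&) = delete;

    ~scoped_timer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        if (sink_) {
            sink_(std::chrono::duration<double, std::milli>(elapsed).count());
        }
    }

private:
    std::function<void(double)> sink_;
    std::chrono::steady_clock::time_point start_;
};
```

Assuming the static methods proposed above, a call site could then read `scoped_timer t(&metric_reporter::record_connection_time);` followed by the operation being timed. The callback runs inside a destructor, so it should not throw.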
6. Prometheus Format Output
# HELP network_latency_histogram Network operation latency in milliseconds
# TYPE network_latency_histogram histogram
network_latency_histogram_bucket{le="1"} 100
network_latency_histogram_bucket{le="5"} 250
network_latency_histogram_bucket{le="10"} 400
network_latency_histogram_bucket{le="25"} 480
network_latency_histogram_bucket{le="50"} 495
network_latency_histogram_bucket{le="100"} 499
network_latency_histogram_bucket{le="+Inf"} 500
network_latency_histogram_sum 12500
network_latency_histogram_count 500
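Note that the bucket values in this format are cumulative: each le bucket counts every observation less than or equal to its bound, and the +Inf bucket equals the total count. A serializer therefore has to keep a running total if the snapshot stores per-bucket counts. The sketch below shows that step under stated assumptions: render_prometheus is a hypothetical free function standing in for histogram_snapshot::to_prometheus(), the input pairs are assumed to hold per-bucket (not cumulative) counts, and the overflow count is passed separately.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for histogram_snapshot::to_prometheus().
// `buckets` holds (upper bound, per-bucket count) pairs in ascending order;
// `overflow` counts values above the last bound.
std::string render_prometheus(
    const std::string& name,
    const std::vector<std::pair<double, std::uint64_t>>& buckets,
    std::uint64_t overflow, double sum) {
    std::ostringstream out;
    out << "# TYPE " << name << " histogram\n";

    std::uint64_t cumulative = 0;
    for (const auto& [le, n] : buckets) {
        cumulative += n;  // convert per-bucket counts to cumulative counts
        out << name << "_bucket{le=\"" << le << "\"} " << cumulative << '\n';
    }

    cumulative += overflow;  // +Inf bucket covers every observation
    out << name << "_bucket{le=\"+Inf\"} " << cumulative << '\n';
    out << name << "_sum " << sum << '\n';
    out << name << "_count " << cumulative << '\n';
    return out.str();
}
```

The sketch relies on default stream formatting, which keeps six significant digits, so a real exporter would likely need explicit number formatting for large sums.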
Default Bucket Boundaries
Optimized for network latencies:
// Milliseconds: sub-ms to 10 seconds
{0.1, 0.5, 1, 2, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000}
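The histogram config also allows exponential buckets derived from min_value, max_value and bucket_count instead of an explicit list. Geometric spacing needs a strictly positive starting point, so the default min_value of 0.0 cannot be used as the first boundary directly. The sketch below sidesteps that by taking a positive first boundary as input; exponential_boundaries is a hypothetical helper and this treatment of the lower end is an assumption, not something the proposal specifies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: `count` upper bounds spaced geometrically from `first`
// to `last` (both included). Requires 0 < first < last and count >= 2.
std::vector<double> exponential_boundaries(double first, double last,
                                           std::size_t count) {
    assert(first > 0.0 && last > first && count >= 2);

    // Constant ratio between neighbouring bounds.
    const double ratio =
        std::pow(last / first, 1.0 / static_cast<double>(count - 1));

    std::vector<double> bounds;
    bounds.reserve(count);
    double value = first;
    for (std::size_t i = 0; i + 1 < count; ++i) {
        bounds.push_back(value);
        value *= ratio;
    }
    bounds.push_back(last);  // pin the final bound to avoid rounding drift
    return bounds;
}
```

For example, exponential_boundaries(0.1, 10000.0, 20) would give 20 bounds covering the same sub-millisecond to 10 second range as the explicit defaults, with a constant ratio between neighbours.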
Tasks
Acceptance Criteria
Files to Create/Modify
- New: include/kcenon/network/metrics/histogram.h
- New: include/kcenon/network/metrics/sliding_histogram.h
- New: src/metrics/histogram.cpp
- Modify: include/kcenon/network/metrics/network_metrics.h
- Modify: src/metrics/network_metrics.cpp
- New: tests/unit/test_histogram.cpp
Related