feat: add Prometheus per-query metrics for midnight data sources [PM-22100] #904
LGLO
left a comment
This PR does not do what the ticket asks for.
Problem Statement
Current state:
The Midnight node queries Cardano's DBSync database multiple times per block and per epoch, but only a single aggregate timing histogram (data_source_method_time_elapsed) is exposed via Prometheus.
It is impossible to determine which specific DBSync query is slow — the histogram combines all queries into one bucket set.
What is desired by the ticket:
Each DBSync query has its own labelled Prometheus timing histogram, enabling per-query latency comparison between preview and mainnet.
In other words, the ticket is not about copying the Partner Chains solution wholesale, but about getting the same kind of insight that Partner Chains provides. Metrics should be recorded inside the methods, at the level of individual SQL queries.
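To illustrate the difference the review describes, here is a minimal sketch of why a single aggregate series hides a slow query while per-label series expose it. It uses only the standard library as a stand-in for a labelled Prometheus histogram; the `PerQueryTimings` type, its methods, and the `query_a`/`query_b` labels are hypothetical and do not come from this PR.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Stand-in for a labelled histogram: one series of observations per query name.
/// (Hypothetical type; a real implementation would use a Prometheus HistogramVec.)
#[derive(Default)]
struct PerQueryTimings {
    series: HashMap<&'static str, Vec<Duration>>,
}

impl PerQueryTimings {
    /// Record one observation under the given query label.
    fn observe(&mut self, query_name: &'static str, elapsed: Duration) {
        self.series.entry(query_name).or_default().push(elapsed);
    }

    /// Mean latency for a single query label, if it has any observations.
    fn mean(&self, query_name: &str) -> Option<Duration> {
        let obs = self.series.get(query_name)?;
        if obs.is_empty() {
            return None;
        }
        let total: Duration = obs.iter().sum();
        Some(total / obs.len() as u32)
    }

    /// Mean over all labels combined: what one aggregate series would report.
    fn aggregate_mean(&self) -> Option<Duration> {
        let all: Vec<Duration> = self.series.values().flatten().copied().collect();
        if all.is_empty() {
            return None;
        }
        let total: Duration = all.iter().sum();
        Some(total / all.len() as u32)
    }
}
```

With three fast observations of `query_a` and one slow observation of `query_b`, `aggregate_mean` lands somewhere in between and cannot say which query is responsible, whereas `mean("query_b")` points at it directly.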
Understood. Will re-evaluate the changes needed.
Introduce MidnightDataSourceMetrics with public accessor methods, replacing the upstream McFollowerMetrics whose accessors are crate-private in partner-chains v1.8.1. The local observed_async_trait! macro now records call counts and timing histograms for all six midnight-specific data source methods (candidates, cnight observation, federated authority observation). JIRA: PM-22100 Made-with: Cursor
* update scanner action to latest version
Signed-off-by: Giles Cope <gilescope@gmail.com>
* update scanner action to latest version
Signed-off-by: Giles Cope <gilescope@gmail.com>
* ci: rename workflow name from build to scan
Signed-off-by: Giles Cope <gilescope@gmail.com>
---------
Signed-off-by: Giles Cope <gilescope@gmail.com>
Made-with: Cursor
…rces
Add individual Prometheus timing histograms for each SQL query inside midnight-node data source methods. This provides per-query latency visibility alongside the existing method-level timing, enabling precise identification of slow DBSync queries on mainnet.
13 sub-query timers added across 3 data sources:
- cNight observation: 5 queries (block lookup + 4 concurrent UTXO queries)
- Federated authority: 3 queries (block lookup + 2 governance UTXOs)
- Candidates: 5 queries across 3 methods
Ref: PM-22100
Made-with: Cursor
Replace 13 inline timer patterns with a shared start_sub_query_timer() helper and SubQueryTimer RAII guard in the metrics module. Each call site reduces from 3 lines to 1. Made-with: Cursor
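For readers unfamiliar with the pattern, a rough sketch of what an RAII timer guard of this kind can look like follows. The names `start_sub_query_timer` and `SubQueryTimer` come from the commit message, but the signatures, the `TimingSink` type, and the `get_block_by_hash` function are assumptions made for illustration; the actual code records into a Prometheus histogram rather than an in-memory map.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Instant;

/// Stand-in sink for observations, keyed by query name.
/// (Hypothetical; the PR records into a Prometheus histogram instead.)
#[derive(Clone, Default)]
struct TimingSink {
    observations: Arc<Mutex<HashMap<&'static str, Vec<f64>>>>,
}

impl TimingSink {
    /// Number of observations recorded so far for one query name.
    fn count(&self, query_name: &str) -> usize {
        self.observations.lock().unwrap().get(query_name).map_or(0, Vec::len)
    }
}

/// Guard that records the elapsed seconds when it goes out of scope.
struct SubQueryTimer {
    sink: TimingSink,
    query_name: &'static str,
    started: Instant,
}

impl Drop for SubQueryTimer {
    fn drop(&mut self) {
        let secs = self.started.elapsed().as_secs_f64();
        self.sink
            .observations
            .lock()
            .unwrap()
            .entry(self.query_name)
            .or_default()
            .push(secs);
    }
}

/// One-line call site helper (assumed signature, not the PR's).
fn start_sub_query_timer(sink: &TimingSink, query_name: &'static str) -> SubQueryTimer {
    SubQueryTimer { sink: sink.clone(), query_name, started: Instant::now() }
}

/// Hypothetical data source method containing a single timed query.
fn get_block_by_hash(sink: &TimingSink) -> Result<u64, String> {
    // Bind to a named variable: `let _ = ...` would drop the guard immediately.
    let _timer = start_sub_query_timer(sink, "example_get_block_by_hash");
    // ... the SQL query would run here; early returns via `?` are still timed ...
    Ok(42)
}
```

The appeal of the guard is that the observation is recorded on every exit path, including early returns on error, without any explicit stop call at the call site.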
… only Remove the observed_async_trait! macro timing (method-level) since per-SQL-query timing provides more useful granularity. Simplify MidnightDataSourceMetrics to histogram-only (remove unused call counter). Rename metric to midnight_data_source_query_time_elapsed with query_name label for clarity. Made-with: Cursor
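As background on why a histogram alone can be enough: in the Prometheus data model a histogram exposes cumulative `_bucket` series together with `_sum` and `_count`, so the number of calls can be read from the histogram itself. The sketch below is a simplified standard-library model of that behaviour, not the `prometheus` crate's implementation; the `Histogram` type and its bucket bounds are illustrative assumptions.

```rust
/// Minimal cumulative-bucket histogram modelled on the Prometheus data model.
/// (Illustrative only; not the prometheus crate's type.)
struct Histogram {
    upper_bounds: Vec<f64>,  // bucket upper bounds in seconds, ascending
    bucket_counts: Vec<u64>, // cumulative count of observations <= each bound
    count: u64,              // total number of observations (the `_count` series)
    sum: f64,                // sum of all observed values (the `_sum` series)
}

impl Histogram {
    fn new(upper_bounds: Vec<f64>) -> Self {
        let n = upper_bounds.len();
        Histogram { upper_bounds, bucket_counts: vec![0; n], count: 0, sum: 0.0 }
    }

    /// Record one observation: bump every bucket whose bound covers it,
    /// then update the running count and sum.
    fn observe(&mut self, seconds: f64) {
        for (bound, c) in self.upper_bounds.iter().zip(self.bucket_counts.iter_mut()) {
            if seconds <= *bound {
                *c += 1;
            }
        }
        self.count += 1;
        self.sum += seconds;
    }
}
```

Every call to `observe` increments `count`, which is why keeping a separate call counter next to the histogram would duplicate information.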
Made-with: Cursor
Made-with: Cursor
753c8e7 to ad33938
Made-with: Cursor
…22100] (#904)
* feat: add Prometheus per-query metrics to midnight data sources
Introduce MidnightDataSourceMetrics with public accessor methods, replacing the upstream McFollowerMetrics whose accessors are crate-private in partner-chains v1.8.1. The local observed_async_trait! macro now records call counts and timing histograms for all six midnight-specific data source methods (candidates, cnight observation, federated authority observation).
JIRA: PM-22100
Made-with: Cursor
* ci: normalise scan job name (#823)
* update scanner action to latest version
Signed-off-by: Giles Cope <gilescope@gmail.com>
* update scanner action to latest version
Signed-off-by: Giles Cope <gilescope@gmail.com>
* ci: rename workflow name from build to scan
Signed-off-by: Giles Cope <gilescope@gmail.com>
---------
Signed-off-by: Giles Cope <gilescope@gmail.com>
* chore: add change file for PM-22100 dbsync query metrics
Made-with: Cursor
* feat: add sub-query SQL-level Prometheus timing for midnight data sources
Add individual Prometheus timing histograms for each SQL query inside midnight-node data source methods. This provides per-query latency visibility alongside the existing method-level timing, enabling precise identification of slow DBSync queries on mainnet.
13 sub-query timers added across 3 data sources:
- cNight observation: 5 queries (block lookup + 4 concurrent UTXO queries)
- Federated authority: 3 queries (block lookup + 2 governance UTXOs)
- Candidates: 5 queries across 3 methods
Ref: PM-22100
Made-with: Cursor
* refactor: extract sub-query timer helper to reduce repetition
Replace 13 inline timer patterns with a shared start_sub_query_timer() helper and SubQueryTimer RAII guard in the metrics module. Each call site reduces from 3 lines to 1.
Made-with: Cursor
* refactor: remove method-level timing, keep SQL-level sub-query timers only
Remove the observed_async_trait! macro timing (method-level) since per-SQL-query timing provides more useful granularity. Simplify MidnightDataSourceMetrics to histogram-only (remove unused call counter). Rename metric to midnight_data_source_query_time_elapsed with query_name label for clarity.
Made-with: Cursor
* style: apply rustfmt to candidates data source
Made-with: Cursor
* chore: update changes file to reflect SQL-level sub-query timing
Made-with: Cursor
---------
Signed-off-by: Giles Cope <gilescope@gmail.com>
Co-authored-by: Squirrel <giles.cope@shielded.io>
Signed-off-by: Mike Clay <mike.clay@shielded.io>








Summary
Add per-query Prometheus timing histograms for each SQL query executed by midnight-node's three data sources, enabling per-query latency comparison between Cardano preview and mainnet environments.
Motivation
The Midnight node queries Cardano DBSync multiple times during each block production cycle. The partner-chains data sources already record per-method Prometheus metrics, but midnight-node defines three additional data sources — cNight observation, federated authority observation, and a candidates wrapper — that have zero Prometheus visibility for their individual SQL queries.
Without per-query timing data, the team cannot diagnose the 6-second query latencies observed on mainnet DBSync or verify that index optimizations are effective.
Changes
- MidnightDataSourceMetrics struct with a single HistogramVec (midnight_data_source_query_time_elapsed{query_name="..."}) and RAII timer helper
- cnight_get_block_by_hash, cnight_get_registrations, cnight_get_deregistrations, cnight_get_asset_creates, cnight_get_asset_spends
- fedauth_get_block_by_hash, fedauth_get_council_utxo, fedauth_get_technical_committee_utxo
- candidates_get_token_utxo_for_epoch, candidates_get_latest_block_for_epoch, candidates_get_utxos_for_address, candidates_get_stake_distribution, candidates_get_epoch_nonce
- McFollowerMetrics to MidnightDataSourceMetrics
- service.rs registers midnight metrics; main_chain_follower.rs routes both McFollowerMetrics (partner-chains) and MidnightDataSourceMetrics (midnight-specific) to the appropriate data sources
Design Decisions
- midnight_data_source_query_* is distinct from partner-chains' partner_chains_data_source_method_*, avoiding label confusion.
- _count suffixes; a separate CounterVec would be redundant.
Submission Checklist
Fork Strategy
TODO before merging