Slow TX Observability, Per-TX Latency Metric, and Timeout-Interrupt Diagnostics#2134
Slow TX Observability, Per-TX Latency Metric, and Timeout-Interrupt Diagnostics#2134
Conversation
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
Code review1 issue found. Checked for bugs and CLAUDE.md compliance. Bug: Off-by-one in
|
|
@claude Do a review again, see if you find any other issues |
|
Claude finished @lucca30's task in 2m 7s —— View job Code Review (Re-review)
1 issue found, 2 minor observations. Checked for bugs, races, security, and CLAUDE.md compliance. Issue: Weak ordering guarantee between
|
|
IMO, the per tx latency isn't much useful as it would just be a another metric. It doesn't particularly tell use which txs are slower vs which are faster. I don't see how this can help us. |
@kamuikatsurgi |
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (69.23%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #2134 +/- ##
===========================================
+ Coverage 51.44% 51.48% +0.04%
===========================================
Files 881 882 +1
Lines 153870 154004 +134
===========================================
+ Hits 79157 79290 +133
+ Misses 69542 69538 -4
- Partials 5171 5176 +5
... and 23 files with indirect coverage changes
🚀 New features to boost your workflow:
|
|
@claude Do an analysis for the new changes I did. Checking if any bug, error or sec breach was introduced. |
|
Claude finished @lucca30's task in 1m 13s —— View job Analysis of New Changes
0 bugs, 0 security issues found. Checked for bugs, data races, security breaches, and correctness issues across all 6 changed files. What was checked1.
2. Tracker integration in
3. Window ticker in
4.
5. Interrupt diagnostic context (
6.
7.
8. Removed flag test (
9. Benchmark test (
10. Security check
Previous review findings status
The implementation is solid. No bugs, errors, or security issues introduced. |
manav2401
left a comment
There was a problem hiding this comment.
LGTM, couple of small nits.
|


Description
Summary
This branch adds miner observability and diagnostics in three areas:
It also adds dedicated unit tests and benchmark coverage for the new tracker.
Behavior Changes
1) Slow tx tracking (windowed top-K)
K=10) with:O(1)fast reject when not in top-KO(log K)insert/replace when qualifyingworker.IsRunning()).2) Timeout interruption diagnostics
On block-build timeout interruption, logs now include richer context:
flagSetAtflagToAbortDelayflagToTxInterruptDelay(when observed)txHash,txIndex,sender,txElapsed)Also includes dedicated handling/logging for
vm.ErrInterruptin tx execution flow.3) Per-tx duration metric
worker/txApplyDurationmetric.ExpDecaySample(8192, 0.015)) to preserve tail visibility at high throughput.Logging Format Helper
common.PrettyTime(UTC format:MM-DD|HH:MM:SS.mmm) for consistent log time rendering.common/format_test.go.Test Changes
Benchmark Impact
Benchmark command:
Representative run (Apple M4 Pro):
Baseline/200tx:204.0 ns/opTrackerMostlyNoInsert/200tx:446.7 ns/opTrackerMixedInsert/200tx:853.0 ns/opBaseline/800tx:794.0 ns/opTrackerMostlyNoInsert/800tx:1328 ns/opTrackerMixedInsert/800tx:2492 ns/opDelta vs baseline (impact view)
Assuming 200 tx insertions per block:
446.7 - 204.0 = +242.7 ns/block(about+1.21 ns/tx)853.0 - 204.0 = +649.0 ns/block(about+3.25 ns/tx)Assuming 800 tx insertions per block:
1328 - 794 = +534 ns/block(about+0.67 ns/tx)2492 - 794 = +1698 ns/block(about+2.12 ns/tx)Allocation profile in tracker modes:
480 B/op,10 allocs/op(bounded byK=10, not by tx volume)Conclusion:
Validation Performed
go test ./commongo test ./internal/cli/servergo test ./miner -run TestSlowTxgo test ./miner -run '^$' -bench BenchmarkSlowTxTopWindowTrackerImpact -benchmemChanges
Breaking changes
Please complete this section if any breaking changes have been made, otherwise delete it
Nodes audience
In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)
Checklist
Cross repository changes
Testing
Manual tests
Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it
Additional comments
Please post additional comments in this section if you have them, otherwise delete it