feat: add ProgressReporter abstraction#9146
Conversation
| @@ -0,0 +1,109 @@ | |||
| # ProgressLogger Log Samples | |||
There was a problem hiding this comment.
I mean, put it in the pr descrpition, not committing it.
There was a problem hiding this comment.
I've committed it for the sake of you to review it and tell what you think, and after I was gonna delete it
There was a problem hiding this comment.
But now gonna do as you wish
|
|
||
| public void VisitTree(in OldStyleTrieVisitContext _, in ValueHash256 rootHash) | ||
| { | ||
| if (_progressLogger is not null) |
There was a problem hiding this comment.
Dont make it optional then. In test, there is LimboLog.
|
|
||
| // For now, we'll use a conservative estimate based on typical state sizes | ||
| // This can be improved with more sophisticated estimation logic | ||
| return 10_000_000; // Conservative estimate for mainnet state |
There was a problem hiding this comment.
Do it properly, or don't at all.
|
@asdacap @LukaszRozmej do we still consider merging it? |
|
@claude review |
|
Claude finished @LukaszRozmej's task in 4m 57s —— View job PR Review:
|
| # | Finding | Location |
|---|---|---|
| H1 | ProgressLogger hardcodes "Blk/s" unit — every log line will show Blk/s instead of nodes/sec or accounts/sec, contradicting the PR description samples. Use SetFormat() with a custom formatter on each logger instance. |
CopyTreeVisitor.cs:100, TrieStatsCollector.cs, SupplyVerifier.cs |
| H2 | Diagnostic regression in TrieStatsCollector — the original Warn-level log emitted every million nodes included Missing CODE / STATE / STORAGE counts. Replacing it with _progressLogger.LogProgress() silently drops this health signal during trie traversal. Operators monitoring logs for broken state will miss it. |
TrieStatsCollector.cs:143 |
Medium
| # | Finding | Location |
|---|---|---|
| M1 | Progress bar always shows 100% when total = 0 — Math.Clamp(current / (float)Math.Max(0,1), 0, 1) clamps to 1 immediately after the first node. All in-progress lines show [████████████████████] 100%, which is meaningless. |
CopyTreeVisitor.cs:69 |
| M2 | ProgressLogger not thread-safe under parallel traversal — CurrentValue, LastValue, and LastMeasurement are plain fields with no Interlocked / volatile protection. Full pruning runs with MaxDegreeOfParallelism > 1, so speed calculations will be racy. |
CopyTreeVisitor.cs:100 |
| M3 | Dead code — _totalNodesToProcess (always 0), _sourceNodeStorage (set but never read), and EstimateNodesInState() (private, never called) in CopyTreeVisitor should all be removed. |
CopyTreeVisitor.cs:28 |
| M4 | Double Reset on shared ProgressLogger — FullPruner.CopyTrie and CopyTreeVisitor.VisitTree both call Reset(0,0) on the same instance, overwriting the start timestamp. Ownership of Reset should live in exactly one place. |
FullPruner.cs:270 |
Low
LogProgress(string state)method inCopyTreeVisitoris now dead code (never called) — should be removed.SupplyVerifier.VisitLeafdoesn't call_progressLogger.Update, making the node count incomplete. The removed per-branch/per-leaf balance logs also eliminated useful intermediate diagnostics.- Indentation inconsistency in
StateSyncFeedTestsBase.cs:259. - Explanatory comments throughout describe what the code does rather than why (violates project coding-style guidelines).
Not mergeable — H1 means the feature's core value (readable progress output) is broken as-is (wrong units), and H2 is a diagnostic regression in a path used for trie health monitoring.
Two fixes from review feedback: 1. Visitor tests had been switched from LimboNoErrorLogger.Instance (ILogger) to LimboLogs.Instance when the ctor signature went ILogger → ILogManager. LimboLogs's LimboTraceLogger silently drops errors; LimboNoErrorLogger throws on Error() as a test fail-fast guard. NoErrorLimboLogs is the ILogManager equivalent that preserves the throw-on-error invariant. 2. DbBlocksLoader / StartupBlockTreeFixer instances in tests and in Nethermind.PerfTest were not `using`-disposed. Pre-existing oversight (both classes were already IDisposable), but matters more now that each owns a System.Timers.Timer via ProgressReporter.
|
@claude review |
|
Claude finished @LukaszRozmej's task in 4m 50s —— View job PR Review:
|
| # | Severity | Finding |
|---|---|---|
| L1 | Low | ProgressReporter.Dispose() calls MarkEnd() but not LogProgress() — the final "100% done" progress bar line is never emitted. Prior code at TotalDifficultyFixMigration explicitly called both. One-line fix in Dispose(). |
| L2 | Low | Lost fixedBlocks summary in TotalDifficultyFixMigration — "Ended TotalDifficultyFixMigration." replaced "Ended … Fixed N blocks." Mild observability regression for fleet operators. |
Neither blocks merge. The abstraction is correct and all previous issues are addressed.
| if (Interlocked.Exchange(ref _disposed, 1) != 0) return; | ||
| _timer.Stop(); | ||
| _timer.Dispose(); | ||
| _progressLogger.MarkEnd(); |
There was a problem hiding this comment.
Low – Missing final LogProgress() after MarkEnd()
The old TotalDifficultyFixMigration explicitly called MarkEnd() then LogProgress() to emit a final "100% complete" progress line. ProgressReporter.Dispose() stops the timer and marks end, but never fires a final log. The dedup state in LogProgress() tracks (CurrentValue, TargetValue, CurrentQueued, _skipped), so if the last timer tick fired before the final Update() call (e.g. operation completes faster than 1 s, or timer last fired at 99.5%), no completion line is emitted.
MarkEnd() only sets UtcEndTime; it doesn't trigger logging itself. Adding one call here makes the "done" state always appear in the log:
| _progressLogger.MarkEnd(); | |
| _timer.Stop(); | |
| _timer.Dispose(); | |
| _progressLogger.MarkEnd(); | |
| _progressLogger.LogProgress(); |
| reporter.Update(blockNumber - startingBlock + 1); | ||
| } | ||
|
|
||
| if (_logger.IsInfo) _logger.Info("Ended TotalDifficultyFixMigration."); |
There was a problem hiding this comment.
Low – Lost fixedBlocks summary count
The old completion log included the number of blocks that were actually corrected:
"Ended TotalDifficultyFixMigration. Fixed {fixedBlocks} blocks."
The Warn per-block lines are still emitted during the run, but operators monitoring a fleet can no longer grep the final summary to know whether any corrections were made at all, without scanning potentially thousands of Warn lines. Consider restoring the counter:
| if (_logger.IsInfo) _logger.Info("Ended TotalDifficultyFixMigration."); | |
| if (_logger.IsInfo) _logger.Info($"Ended TotalDifficultyFixMigration. Fixed {fixedBlocks} blocks."); |
(and re-add long fixedBlocks = 0; before the loop, incremented where shouldPersist is set)
Tests can now write `using IBlockTreeVisitor fixer = new StartupBlockTreeFixer(...)`
and call fixer.VisitBlock(...) directly without a `((IBlockTreeVisitor)fixer)` cast
(VisitBlock and friends are explicit-interface impls on the production visitors).
All three production implementors already implemented IDisposable; relationship
is just being codified. The default-method `void IDisposable.Dispose() { }` keeps
the test-only TestBlockTreeVisitor (BlockTreeTests.cs) source-compatible without
forcing an empty Dispose on it.
Redundant ", IDisposable" on the three production visitor declarations removed.
L1: ProgressReporter.Dispose now emits a final LogProgress after MarkEnd so the closing "100% / done" line is logged (matches the explicit pattern used by BloomMigration/ReceiptMigration before the refactor). L2: Restore "Fixed N entries." suffix on TotalDifficultyFixMigration's end log. Renamed counter from `fixedBlocks` to `fixedEntries` because the counter increments per BlockInfo (a single block height can have multiple BlockInfos for non-canonical forks, so the previous name could exceed the block range).
|
@claude re-review |
Closes the last two non-custom sites under NethermindEth#8504: - Era1.EraImporter, Era1.EraExporter, EraE.EraImporter, EraE.EraExporter: drop the manual `ProgressLogger.Reset + modulo `Update`/`LogProgress` pattern (the latter caused log silence under cold-disk imports) in favour of `using ProgressReporter`. Update is now called every iteration; the timer drives the 1s wall-clock cadence. - Init.Snapshot.SnapshotDownloader: replace the bespoke Init.Snapshot.ProgressTracker (custom ITimer + AddProgress) with `ProgressReporter` + a per-instance byte-size formatter via `progress.Logger.SetFormat(...)`. Deletes ProgressTracker.cs. Drops the now-unused ITimerFactory ctor parameter. `ProgressReporter.Logger.SetFormat` is the documented escape hatch for sites that need a custom presentation while keeping the timer + disposal lifecycle for free. Leaves SnapSync.ProgressTracker alone; its multi-phase architecture is non-trivial to migrate and the issue says nothing about it.
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 4m 32s —— View job Re-review:
|
| Finding | Status |
|---|---|
L1 – missing final LogProgress() after MarkEnd() in Dispose() |
Resolved — ProgressReporter.Dispose() now calls both MarkEnd() + LogProgress() |
L2 – lost fixedBlocks summary count |
Resolved — "Fixed {fixedEntries} entries." log line is still emitted |
flcl42 – disposal leaks in DbBlocksLoaderTests |
Resolved — all four test instances now use using |
LukaszRozmej nits – timer init order / timer.Stop() |
Resolved — superseded by ProgressReporter which encapsulates the pattern correctly |
New findings
No Critical, High, or Medium issues.
| # | Severity | Location | Finding |
|---|---|---|---|
| N1 | Nit | IBlockTreeVisitor.cs:16 |
Double blank line after the default Dispose() body — one blank line is enough (inline comment posted) |
Design notes (no action required)
ProgressLoggerconcurrent writes in Era paths:EraExporterandEraImportercallprogress.Update(Interlocked.Increment(ref totalProcessed))from parallel epoch tasks.ProgressLogger.Updateassigns plain auto-properties without synchronization — on 64-bit targetslongassignment is atomic, so speed displays may be transiently stale but no corruption occurs. Acceptable for observability-only logging.SnapshotDownloadertimer semantics: OldProgressTrackerusedAutoReset = false(re-arm after each tick), new code uses a periodic 5s timer.LogProgress()deduplication suppresses output when the value hasn't changed, so stalled-download behavior is equivalent. The new approach is cleaner.ReceiptsVerificationVisitor.Dispose()isvirtual:MissingReceiptsFixVisitordoesn't add disposable resources and doesn't overrideDispose(), so the base implementation correctly disposes_progress. Thevirtualmodifier leaves room for subclasses to extend if needed.
Summary
All previous findings are resolved. The abstraction is clean, the disposal lifetimes are correct throughout, and the IBlockTreeVisitor : IDisposable extension is backward-compatible via the no-op default. One formatting nit in IBlockTreeVisitor.cs is the only remaining item.
Mergeable ✓
|
Claude finished @LukaszRozmej's task in 3m 40s —— View job Re-review:
|
| Finding | Status |
|---|---|
H1 – hardcoded "Blk/s" in CopyTreeVisitor/TrieStatsCollector |
Resolved — reverted to master in prior round |
| H2 – diagnostic regression in TrieStatsCollector | Resolved — reverted to master in prior round |
| M1–M4 – progress bar / thread safety / dead code / double Reset | Resolved — reverted to master in prior round |
| L1 – no final LogProgress on Dispose | Resolved in 6e0e175 |
| L2 – lost fixedBlocks summary | Resolved in 6e0e175 |
Summary
No Critical, High, or Medium findings. All previous Low findings addressed. The ProgressReporter abstraction is correct, the four application sites are clean, and the new Era + Snapshot migrations follow the same pattern correctly. Mergeable.
| { | ||
| void IDisposable.Dispose() { } | ||
|
|
||
|
|
There was a problem hiding this comment.
Nit – extra blank line
Two blank lines here; one is enough.
|
@claude check the validity of the following review 🟠 WARNING — two real issues, both rooted in the same gap W1 — Dispose() doesn't drain in-flight timer callbacks; final "done" log can be overwritten by a stale tick ProgressReporter.cs: System.Timers.Timer.Stop() is documented as not draining in-flight Elapsed callbacks — a tick already queued on the thread pool can land
Two observable bad outcomes:
Fix: use Timer.Dispose(WaitHandle) and block on it: W2 — ProgressLogger field state is unsynchronized; Update() and timer-fired LogProgress() race ProgressLogger.Update(long) writes CurrentValue, UtcStartTime, UtcEndTime. LogProgress() reads those and _lastReportState. None of it is Concrete torn-state path: Update() sets UtcEndTime = null first; a tick between that and the CurrentValue = value assignment computes the This is technically pre-existing in ProgressLogger, but the old ProgressTracker used Interlocked.Add + AutoReset=false which kept the timer Fix: a lock in ProgressReporter around both update and log paths is sufficient (uncontended at 1 Hz): 🟡 SUGGESTION S1 — Functional regression on unknown-size snapshot downloads |
|
Claude finished @svlachakis's task in 3m 25s —— View job Review validity check
W1 — Dispose() timer drainVerdict: Valid in theory, materially overstated in practice.
But the claimed "bad outcomes" don't hold up under the dedup logic in Case A — stale tick fires before Case B — stale tick fires between Case C — stale tick fires after the final The reviewer's claim that "the final '100% done' line is followed by a stale partial-progress line" requires the stale tick to fire after The W2 — Unsynchronized ProgressLogger fieldsVerdict: Valid concern, but it's pre-existing and explicitly accepted.
The old However, the consequences are bounded:
This was explicitly reviewed and accepted in round 3: "speed displays may be transiently stale but no corruption occurs. Acceptable for observability-only logging." The Severity: Low (pre-existing in S1 — SnapshotDownloader unknown-total regressionVerdict: Invalid. The code correctly handles the unknown-total case. The reviewer claims the percentage bar pegs at 100% when // SnapshotDownloader.cs:73-74
using ProgressReporter progress = new("Snapshot download", logManager, totalSize ?? 0, ProgressInterval);
progress.Logger.SetFormat(FormatBytes(totalSize)); // ← overrides DefaultFormatter entirely
private static Func<ProgressLogger, string> FormatBytes(long? totalBytes) =>
totalBytes is null
? static logger => $"Snapshot download {HumanReadableSize(logger.CurrentValue)}" // no percentage
: logger => $"Snapshot download {HumanReadableSize(logger.CurrentValue)} out of {HumanReadableSize(totalBytes.Value)}";When Summary
None of these findings warrant blocking the PR. W1 and W2 are valid observations worth a follow-up improvement but carry no correctness or user-data risk — this is observability-only code. S1 is factually incorrect. |
Addresses external review (svlachakis): W1 — System.Timers.Timer.Stop() does not drain in-flight Elapsed callbacks; a tick queued before Stop() can land after Dispose finishes and produce a stale progress line after the final "done" line. W2 — ProgressLogger fields (CurrentValue, UtcStartTime, UtcEndTime, _lastReportState) are plain auto-properties with no synchronization. The old Snapshot ProgressTracker used AutoReset=false (single-shot re-armed callbacks) which serialized ticks; ProgressReporter's periodic timer doesn't, so Update on the work thread and LogProgress on a thread-pool tick truly race in Era1/EraE export paths and elsewhere. A single lock on the wrapped ProgressLogger fixes both: Update and the timer callback can't interleave, and Dispose acquires the same lock before MarkEnd + LogProgress, blocking on any in-flight tick. Late callbacks that fire after Dispose dedup against the terminal _lastReportState. Locking on _progressLogger itself (we own it exclusively) avoids an extra allocation per reporter. Also drops the stray blank line after the default Dispose body in IBlockTreeVisitor (cosmetic nit from claude bot review).
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 2m 14s —— View job Re-review:
|
| Finding | Status |
|---|---|
H1 – hardcoded "Blk/s" in CopyTreeVisitor |
Resolved — files reverted to master in prior round |
| H2 – diagnostic regression in TrieStatsCollector | Resolved — file reverted to master in prior round |
| M1–M4 – progress bar / thread safety / dead code / double Reset | Resolved — files reverted to master in prior round |
L1 – no final LogProgress() on Dispose |
Resolved in 6e0e175 |
L2 – lost fixedBlocks summary |
Resolved in 6e0e175 |
N1 – extra blank line in IBlockTreeVisitor.cs |
Resolved in e841f46 |
W1 – Timer.Stop() doesn't drain in-flight callbacks |
Resolved in e841f46 (see below) |
W2 – unsynchronized ProgressLogger fields |
Resolved in e841f46 (see below) |
| S1 – unknown-total snapshot percentage regression | Was Invalid — SetFormat(FormatBytes(null)) suppresses the bar |
W1 + W2 Fix Analysis
The fix uses a single lock (_progressLogger) across three entry points:
public void Update(long value)
=> lock (_progressLogger) _progressLogger.Update(value);
private void OnElapsed(object? sender, ElapsedEventArgs e)
=> lock (_progressLogger) _progressLogger.LogProgress();
public void Dispose()
{
if (Interlocked.Exchange(ref _disposed, 1) != 0) return;
_timer.Stop();
_timer.Dispose();
lock (_progressLogger)
{
_progressLogger.MarkEnd();
_progressLogger.LogProgress();
}
}W2 (unsynchronized fields): Update and OnElapsed can no longer interleave, so ProgressLogger's plain auto-property fields (CurrentValue, UtcStartTime, UtcEndTime, _lastReportState) are protected from concurrent writes. ✓
W1 (timer drain): After _timer.Stop() + _timer.Dispose() no new callbacks are queued, but a callback already enqueued in the thread pool can still run. When it does, it blocks on lock (_progressLogger) until Dispose() releases it. At that point _lastReportState is already set to the terminal state, so LogProgress()'s dedup suppresses any output. ✓
Lock object choice: Locking on _progressLogger itself (exclusively owned by this instance) avoids an extra allocation and is idiomatic for owned, non-public resources. ✓
Double-dispose guard: Interlocked.Exchange(ref _disposed, 1) prevents two concurrent Dispose() calls from both emitting a "done" line. ✓
Contention profile: The timer fires at 1 Hz; Update() is called at block-processing cadence. The lock is held for a single field write (Update) or one log-format pass (LogProgress). Contention is negligible in practice. ✓
Logger property exposure: progress.Logger.SetFormat(...) is called at construction time (before the timer fires at any call site), so direct Logger access introduces no concurrent-access risk. ✓
New Findings
None. No Critical, High, or Medium findings.
Summary
All previous findings are resolved. The W1/W2 fix is correct: the lock serialises Update vs. timer ticks, and the lock inside Dispose() ensures any in-flight tick completes before the terminal "done" line is emitted. The IBlockTreeVisitor cosmetic nit is also fixed.
Mergeable ✓
Drop the restate-the-code remarks paragraph and the inline Dispose comment. The single remaining remark covers the only non-obvious WHY: the lock is for the Update/LogProgress race + Timer.Stop drain gap.
Closes #8504.
Changes
Introduces
Nethermind.Core.ProgressReporter— anIDisposablethat wraps aProgressLoggerwith a 1-second timer for periodicLogProgressand guaranteesMarkEndon disposal. Replaces the recurringReset→new Timer(...)→try/finally MarkEndpattern with oneusingline.Applied to four sites that were either ad-hoc or had no progress logging:
TotalDifficultyFixMigration— previously had no progress logging.DbBlocksLoader,StartupBlockTreeFixer,ReceiptsVerificationVisitor— three block-tree visitors that emitted ad-hoc modulo-NInfostrings; now standardised onProgressLogger's percentage + bar + speed format.Side effects:
DbBlocksLoaderandStartupBlockTreeFixerconstructor signatures change fromILoggertoILogManager(required to constructProgressLogger). Callers inReviewBlockTree,Nethermind.PerfTest, and tests updated.ReceiptsVerificationVisitorbecomesIDisposable;ReceiptFixMigrationupdated tousingit.LimboNoErrorLogger.InstanceasILoggerswitched to the equivalentILogManager—NoErrorLimboLogs.Instance(preserves the throw-on-error fail-fast guard;LimboLogs.Instancesilently drops errors and would have weakened the invariant).DbBlocksLoaderTests,StartupTreeFixerTests, andNethermind.PerfTestweren'tusing-disposed despite the classes already beingIDisposable.Scope note
The Full Pruning, Trie Verification, and Analytics work originally proposed in this PR was superseded on master:
Nethermind.Analyticswas removed in Remove Nethermind.Analytics #9683.CopyTreeVisitorandTrieStatsCollectornow useVisitorProgressTracker, a path-based estimator that produces a percentage even when total node count is unknown — strictly more useful than wiringProgressLoggerdirectly.BlockingVerifyTriereports progress transitively throughTrieStatsCollector.Types of changes
Testing
Progress logging is observability-only. Behavioural coverage stays with the existing
TotalDifficultyFixMigrationTest,DbBlocksLoaderTests, andStartupTreeFixerTests. FullNethermind.slnxbuilds clean underEnforceCodeStyleInBuild=true.