Skip to content

blockstm: replace O(n²) UpdateDeps with inverted-index DepsBuilder#2094

Merged
kamuikatsurgi merged 9 commits intodevelopfrom
kamui/blockstm-perf
Mar 3, 2026
Merged

blockstm: replace O(n²) UpdateDeps with inverted-index DepsBuilder#2094
kamuikatsurgi merged 9 commits intodevelopfrom
kamui/blockstm-perf

Conversation

@kamuikatsurgi
Copy link
Copy Markdown
Member

The block producer computes a transaction dependency DAG during sequential execution and embeds it in the block header. Importers use it to schedule parallel BlockSTM execution without speculative re-execution overhead.

The old algorithm (UpdateDeps) scanned every preceding transaction's full write-set for each new transaction: O(n²) per block. At 300 tx/block, this was the dominant cost in the miner, confirmed by CPU profiling.

Algorithm

DepsBuilder replaces it with two data structures:

Inverted indexlastWriter map[Key]int: maps each state key to the index of the most recent transaction that wrote it. For a new tx at index i, a single pass over its read-set resolves all direct dependencies in O(R), where R is the read-set size. Only the latest writer matters; earlier writers are already covered by transitivity.

Word-parallel bitsetsdirectDeps []TxBitset and reachable []TxBitset: each bitset is []uint64 of width (N+63)/64. Transitive reduction runs in O(N/64) per tx using two bitset operations:

reachable[i] =reachable[j] for j in directDeps[i]     // transitive closure
directDeps[i] &^= reachable[i]    // remove redundant edges

Total complexity: O(n·R) vs O(n²·W) previously, where R = avg read-set size and W = avg write-set size. Default bitset width is 512; ensureCapacity doubles on overflow and widens all existing bitsets atomically before appending new ones — maintaining the invariant that all bitsets have identical len(words) at all times.

The output format (map[int]map[int]bool) is unchanged, full backwards compatibility with old importers.

Correctness fixes

ValidatorBytes preservation — If DepsBuilder errored, the original code skipped rlp.DecodeBytes and re-encoded a zero-value BlockExtraData, silently dropping ValidatorBytes. These are consensus-critical on Cancun sprint transitions. Fix: always decode the existing extra data before overwriting TxDependency.

delayFlag off-by-one (pre-existing) — mvReadMapList[i-1] checked the wrong transaction's reads when deciding whether to include the DAG hint. The last transaction's coinbase/burnt-contract balance reads were never checked. If only the last tx read these balances, the hint was incorrectly embedded. Fixed to mvReadMapList[i].

Backwards compatibility

Verified on a devnet with 1 new validator node and 3 older RPC nodes. After 1,352+
blocks:

  • Zero consensus errors on all nodes
  • All three checkpoints logged on old nodes: decode, dependency verification, parallel
    execution completion

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.23%. Comparing base (82b9c74) to head (ab3e24d).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
miner/worker.go 28.20% 28 Missing ⚠️

❌ Your patch status has failed because the patch coverage (80.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2094      +/-   ##
===========================================
+ Coverage    51.17%   51.23%   +0.05%     
===========================================
  Files          878      878              
  Lines       152499   152602     +103     
===========================================
+ Hits         78041    78179     +138     
+ Misses       69350    69321      -29     
+ Partials      5108     5102       -6     
Files with missing lines Coverage Δ
core/blockstm/dag.go 71.03% <100.00%> (+25.93%) ⬆️
miner/worker.go 68.46% <28.20%> (-0.82%) ⬇️

... and 14 files with indirect coverage changes

Files with missing lines Coverage Δ
core/blockstm/dag.go 71.03% <100.00%> (+25.93%) ⬆️
miner/worker.go 68.46% <28.20%> (-0.82%) ⬇️

... and 14 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kamuikatsurgi
Copy link
Copy Markdown
Member Author

@claude, review this PR focusing on performance and security. Also, make sure no regression or bug is introduced.

@claude
Copy link
Copy Markdown

claude bot commented Mar 2, 2026

Claude finished @kamuikatsurgi's task in 1m 49s —— View job


PR Review: BlockSTM O(n²) → O(n·R) Performance Optimization

  • Read and analyze the changed files
  • Review core/blockstm/dag.go for performance and correctness
  • Review core/blockstm/dag_test.go for test coverage
  • Review miner/worker.go for security and regression issues
  • Compile findings and provide detailed feedback

Summary

This PR replaces the O(n²) UpdateDeps algorithm with an O(n·R) DepsBuilder using an inverted index and word-parallel bitsets. The implementation is well-designed and correct. I found no security vulnerabilities or regressions. The bug fixes for ValidatorBytes preservation and the delayFlag off-by-one are both valid and important.


Performance Analysis ✅

Algorithm Improvement

The new DepsBuilder achieves significant performance gains through:

  1. Inverted Index (lastWriter map[Key]int) - O(1) lookup for the latest writer of any key, eliminating the O(n·W) scan per transaction in the old algorithm.

  2. Word-Parallel Bitsets - Transitive reduction runs in O(N/64) using SIMD-friendly operations (Or, AndNot), compared to O(n) map operations previously.

  3. Memory Efficiency - TxBitset uses []uint64 (8 bytes per 64 transactions) vs nested map[int]bool (40+ bytes per dependency entry).

Complexity: O(n·R) vs O(n²·W) where R = avg read-set size, W = avg write-set size.

Bitset Implementation (dag.go:243-292)

The bitset operations are correct:

  • Set/Test use proper bit indexing with i/64 for word index and i%64 for bit position
  • ForEach uses bits.TrailingZeros64 for efficient iteration (hardware-accelerated on modern CPUs)
  • word &= word - 1 clears the lowest set bit correctly

Correctness Analysis ✅

DepsBuilder Algorithm (dag.go:345-390)

The core algorithm is correct:

// For each read, find the latest writer (inverted index lookup)
for _, rd := range readList {
    if writer, ok := db.lastWriter[rd.Path]; ok && writer < index {
        db.directDeps[index].Set(writer)
    }
}

// Compute transitive closure from direct dependencies
db.directDeps[index].ForEach(func(j int) {
    db.reachable[index].Or(&db.reachable[j])
})

// Transitive reduction: remove redundant edges
db.directDeps[index].AndNot(&db.reachable[index])

// Update reachability for future transactions
db.reachable[index].Or(&db.directDeps[index])

The key insight is that reachable[j] is complete for all j < index since transactions are processed sequentially. This allows the reduction in a single pass.

Dynamic Capacity Growth (dag.go:325-343)

The ensureCapacity function correctly handles blocks with >512 transactions:

  • Doubles width until it can hold the needed index
  • Extends all existing bitsets atomically to maintain the invariant that all bitsets have identical len(words)

Error Handling

Non-sequential indices are properly rejected (dag.go:353-356), and the error is sticky - subsequent calls return the same error, and GetDeps() returns nil.


Bug Fixes ✅

1. ValidatorBytes Preservation (worker.go:1382-1385)

Before: If len(env.mvReadMapList) == 0, the code skipped rlp.DecodeBytes and re-encoded a zero-value BlockExtraData, silently dropping ValidatorBytes.

After: rlp.DecodeBytes is always called before overwriting TxDependency:

// Always decode header extra data before overwriting TxDependency.
if err := rlp.DecodeBytes(env.header.Extra[...], &blockExtraData); err != nil {
    return err
}

This is a consensus-critical fix for Cancun sprint transitions where ValidatorBytes must be preserved.

2. delayFlag Off-by-One (worker.go:1398)

Before: reads := env.mvReadMapList[i-1] - checked the previous transaction's reads, not the current one.

After: reads := env.mvReadMapList[i] - correctly checks transaction i's reads.

This fix ensures the last transaction's coinbase/burnt-contract balance reads are properly detected.


Test Coverage ✅

The test suite is comprehensive:

Test Coverage
TestBitset_* All bitset operations (Set, Test, Or, AndNot, ForEach, ToSlice)
TestDepsBuilder_NoDeps Independent transactions
TestDepsBuilder_LinearChain Chain dependencies
TestDepsBuilder_TransitiveReduction Redundant edge removal
TestDepsBuilder_DiamondDeps Fan-out/fan-in topology
TestDepsBuilder_LatestWriterWins Overwrite semantics
TestDepsBuilder_DynamicGrow Capacity expansion beyond 512
TestDepsBuilder_NonSequentialReturnsError Error handling
TestDepsBuilder_CorrectDAG 20 randomized trials verifying correctness
TestDepsBuilder_EquivalenceWithUpdateDeps Comparison with old algorithm

The TestDepsBuilder_EquivalenceWithUpdateDeps test explicitly documents the behavioral difference on "blind overwrites" (a tx writes a key without reading it first) - this is acceptable because DepsBuilder is more precise in this case.


Potential Minor Improvements

1. Consider Pre-allocating Inner Maps

In GetDeps() (dag.go:402), the inner map is created without capacity hint:

inner := make(map[int]bool)

Since the number of dependencies is known from the bitset's popcount, consider:

inner := make(map[int]bool, bits.OnesCount64(db.directDeps[i].words[0])) // for small blocks

Impact: Minor allocation reduction; not critical.

2. Document Channel Timeout

The 1-second timeout in worker.go:1337 is reasonable but could use a comment explaining why this duration was chosen.


Security Analysis ✅

  • No integer overflow risks: Bitset indices are bounded by transaction count (max ~1000 per block)
  • No race conditions: DepsBuilder runs in a single goroutine; channel communication is properly synchronized with sync.Once and sync.WaitGroup
  • No panic paths: Non-sequential index returns error rather than panicking; bitset operations don't access out-of-bounds indices due to ensureCapacity
  • No memory exhaustion: Bitset width grows by doubling (logarithmic memory growth)

Verdict

Approve

The implementation is correct, well-tested, and achieves the stated performance goals. The bug fixes address real issues that could cause consensus failures. The behavioral difference on blind overwrites is documented and actually represents more precise dependency tracking.
|

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Mar 3, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
11 Security Hotspots

See analysis details on SonarQube Cloud

@kamuikatsurgi kamuikatsurgi merged commit aaaa392 into develop Mar 3, 2026
20 of 22 checks passed
@kamuikatsurgi kamuikatsurgi deleted the kamui/blockstm-perf branch March 3, 2026 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants