Problem
The aggregator merge schedule is not fully deterministic across nodes. Two nodes at the same step can have different file layouts due to:
- Preverified legacy accommodation: getMergeLimit() in snap_repo.go adjusts merge limits based on preverified files, which differ across binary releases
- Cross-domain coordination: commitment domain must wait for accounts/storage to merge first. Interrupted merges leave divergent intermediate states
- Two merge code paths: aggregator.go (old, active) vs forkable_agg.go/snap_repo.go (new) with different merge logic
Why this matters
Decentralized snapshot distribution (#19660) requires nodes to compare what files they have. If the merge schedule is deterministic, two honest nodes at the same step produce identical files with identical torrent hashes — enabling direct hash comparison and threshold consensus (#19658).
If non-deterministic, we need structural range-coverage comparison instead of hash comparison, which is significantly more complex and weakens the trust model.
Current merge schedule
The bit-manipulation schedule (endStep & -endStep, i.e. the lowest set bit of endStep gives the span of the file ending at endStep) is inherently deterministic:
Step 8: [0-8)
Step 12: [0-8) [8-12)
Step 16: [0-16)
Step 24: [0-16) [16-24)
Step 32: [0-32)
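The listing above can be reproduced with a few lines of Go. This is a minimal sketch, not the aggregator's actual code: `Range` and `canonicalLayout` are hypothetical names introduced here, and the sketch assumes no maxSpan cap, no minimum file size and no preverified-file adjustment.

```go
package main

import "fmt"

// Range is a half-open step range [From, To). Hypothetical type for illustration.
type Range struct{ From, To uint64 }

// canonicalLayout walks backwards from step. The file ending at `end`
// spans end & -end steps, which is the lowest set bit of end.
func canonicalLayout(step uint64) []Range {
	var out []Range
	for end := step; end > 0; {
		span := end & -end // lowest set bit (unsigned negation wraps in Go)
		out = append([]Range{{end - span, end}}, out...)
		end -= span
	}
	return out
}

func main() {
	for _, step := range []uint64{8, 12, 16, 24, 32} {
		fmt.Println("Step", step, canonicalLayout(step))
	}
}
```

Under these assumptions the layout depends only on the step number, which is what makes the unperturbed schedule deterministic.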
But it's perturbed by:
- maxSpan cap from stepsInFrozenFile (configurable via erigondb.toml)
- getMergeLimit() returning larger sizes when preverified files exist
- Commitment domain holding merges until accounts/storage catch up
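To illustrate the first perturbation, here is a rough sketch of how a span cap alone changes the layout for the same step. It assumes the cap simply clamps the span of any single file and that it is a power of two; `cappedLayout` and its `maxSpan` parameter are hypothetical and do not mirror the real configuration plumbing.

```go
package main

import "fmt"

// Range is a half-open step range [From, To). Hypothetical type for illustration.
type Range struct{ From, To uint64 }

// cappedLayout applies the lowest-set-bit schedule, but no file may span
// more than maxSpan steps. Assumption: maxSpan is a power of two.
func cappedLayout(step, maxSpan uint64) []Range {
	var out []Range
	for end := step; end > 0; {
		span := end & -end
		if span > maxSpan {
			span = maxSpan
		}
		out = append([]Range{{end - span, end}}, out...)
		end -= span
	}
	return out
}

func main() {
	fmt.Println(cappedLayout(4232, 4096)) // [{0 4096} {4096 4224} {4224 4232}]
	fmt.Println(cappedLayout(4232, 2048)) // [{0 2048} {2048 4096} {4096 4224} {4224 4232}]
}
```

Two nodes at the same step but with different caps end up with different file boundaries, so any deterministic layout has to be a function of the cap as well as the step.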
Example of divergence
Node A (clean run):
v1.0-accounts.0-4096.kv (deep merge)
v1.0-accounts.4096-4224.kv (128 steps)
v1.0-accounts.4224-4232.kv (8 steps)
Node B (interrupted merge, restarted):
v1.0-accounts.0-2048.kv (partial deep merge)
v1.0-accounts.2048-4096.kv
v1.0-accounts.4096-4224.kv
v1.0-accounts.4224-4232.kv
Both cover steps 0-4232 but with different file boundaries and torrent hashes.
Proposed resolution
Make the merge schedule purely deterministic: given (step, MergeStages config), the expected file layout is computable. On startup, identify missing merges and execute them to converge to the canonical layout.
This would mean:
- All nodes at the same step produce identical files
- chain.toml can use hash comparison (simple, torrent-compatible)
- Threshold consensus works (same hash = agreement)
- UCAN can sign (chain, step) tuples, since the layout is implied
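The "identify missing merges" step could look roughly like the sketch below. It is an illustration only: `Range`, `Merge`, `canonicalLayout` and `missingMerges` are hypothetical names, the canonical layout here ignores the maxSpan cap, getMergeLimit() and cross-domain ordering, and the input files are assumed to be sorted and gap-free.

```go
package main

import "fmt"

// Range is a half-open step range [From, To). Hypothetical type for illustration.
type Range struct{ From, To uint64 }

// Merge describes one merge needed to reach the canonical layout.
type Merge struct {
	Target Range
	Inputs []Range
}

// canonicalLayout is the uncapped lowest-set-bit schedule.
func canonicalLayout(step uint64) []Range {
	var out []Range
	for end := step; end > 0; {
		span := end & -end
		out = append([]Range{{end - span, end}}, out...)
		end -= span
	}
	return out
}

// missingMerges groups the existing files (sorted by From) under each
// canonical range and reports the ranges still split across several files.
func missingMerges(have []Range, step uint64) ([]Merge, error) {
	var plan []Merge
	i := 0
	for _, target := range canonicalLayout(step) {
		var inputs []Range
		next := target.From
		for i < len(have) && have[i].From == next && have[i].To <= target.To {
			inputs = append(inputs, have[i])
			next = have[i].To
			i++
		}
		if next != target.To {
			return nil, fmt.Errorf("files do not tile %d-%d", target.From, target.To)
		}
		if len(inputs) > 1 {
			plan = append(plan, Merge{target, inputs})
		}
	}
	return plan, nil
}

func main() {
	nodeB := []Range{{0, 2048}, {2048, 4096}, {4096, 4224}, {4224, 4232}}
	plan, err := missingMerges(nodeB, 4232)
	fmt.Println(plan, err)
}
```

For Node B from the divergence example, the plan contains a single merge of 0-2048 and 2048-4096 into 0-4096, after which its layout matches Node A.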
Alternative
Accept non-determinism and use structural range-coverage comparison in chain.toml. Files are compared by step range, not hash. More complex, weaker trust model.
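For comparison, a structural range-coverage check might look like the sketch below. The names `Range`, `coverage`, `sameCoverage` and `sameLayout` are hypothetical and nothing here reflects an existing chain.toml format; the sketch only shows that coverage equality is a weaker statement than layout equality.

```go
package main

import "fmt"

// Range is a half-open step range [From, To). Hypothetical type for illustration.
type Range struct{ From, To uint64 }

// coverage returns the overall range covered by files (sorted by From)
// and false if the list is empty or has a gap or overlap.
func coverage(files []Range) (Range, bool) {
	if len(files) == 0 {
		return Range{}, false
	}
	for i := 1; i < len(files); i++ {
		if files[i].From != files[i-1].To {
			return Range{}, false
		}
	}
	return Range{files[0].From, files[len(files)-1].To}, true
}

// sameCoverage is the structural comparison: boundaries may differ.
func sameCoverage(a, b []Range) bool {
	ca, okA := coverage(a)
	cb, okB := coverage(b)
	return okA && okB && ca == cb
}

// sameLayout is what hash comparison effectively requires: identical boundaries.
func sameLayout(a, b []Range) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	nodeA := []Range{{0, 4096}, {4096, 4224}, {4224, 4232}}
	nodeB := []Range{{0, 2048}, {2048, 4096}, {4096, 4224}, {4224, 4232}}
	fmt.Println(sameCoverage(nodeA, nodeB), sameLayout(nodeA, nodeB)) // true false
}
```

The coverage check says nothing about file contents, which is one way to see why this alternative gives a weaker trust model than comparing hashes of identical files.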
Blocking
This decision blocks the chain.toml V2 format design (#19660) — the format differs depending on whether we can assume deterministic layouts.
Related issues