
core/rawdb: added filesystem based witness storage (with --witness.filestore flag)#2098

Merged
pratikspatil024 merged 8 commits into develop from psp-pos-3480
Mar 5, 2026

Conversation


@pratikspatil024 pratikspatil024 commented Mar 2, 2026

Description

Witness payloads are stored once per block (~every 2s) and can exceed 10MB. Storing them in Pebble increases write amplification, flush/compaction pressure, and contention on a resource shared by all chain data. Moving witness blobs to the filesystem reduces DB I/O while preserving existing behavior.

  • Introduces a WitnessStore interface that abstracts witness blob storage, with two backends: database (existing behavior) and filesystem (via --witness.filestore flag)
  • The filesystem backend writes one file per block using atomic temp+rename, 2-level hex directory sharding, and keeps size metadata in Pebble for P2P queries
  • Reads fall back to Pebble transparently, enabling seamless migration without downtime. Old witnesses are naturally pruned away within 6400 blocks

Witness file path layout

<chaindata>/witnesses/<hex[0:2]>/<hex[2:4]>/0x<block-hash>.rlp
Witnesses are stored under a witnesses/ subdirectory within chaindata to keep them co-located with chain data while isolated from Pebble's internal files. Two-level hex directory sharding (ab/cd/) distributes files across up to 65,536 buckets, avoiding the filesystem performance degradation caused by large flat directories. With ~6400 files retained at any time, each bucket holds fewer than one file on average. The block hash is used as the filename since it is the primary key for all witness operations, and the .rlp extension indicates the encoding format.

Transition behaviour

  • Pebble to Filesystem (enable --witness.filestore): New witnesses go to the filesystem. Old witnesses in Pebble remain accessible via the read fallback and are naturally pruned within 6400 blocks. No manual intervention needed.
  • Filesystem to Pebble (disable --witness.filestore): New witnesses go to Pebble. Filesystem witnesses become orphaned, that is, the DB backend does not read or delete them. A warning is logged at startup: Witness directory exists but --witness.filestore is disabled. Orphaned files can be removed manually with rm -rf <chaindata>/witnesses/.
  • Filesystem to Pebble to Filesystem (round-trip): When switching back to filesystem mode, a small number of orphaned files from the first filesystem period may persist. The pruner's cursor advanced during the Pebble period and will not revisit those earlier block heights. These files are harmless and can be ignored or removed manually.
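The read fallback that makes the Pebble-to-filesystem transition seamless can be sketched as follows. This is illustrative only: a map stands in for Pebble, and all names are hypothetical rather than the PR's identifiers:

```go
package main

import (
	"fmt"
	"os"
)

// readWitness tries the filesystem first and falls back to the key-value
// store, mirroring the transition behaviour described above. The map stands
// in for Pebble; real code would query the database instead.
func readWitness(path string, kv map[string][]byte, key string) []byte {
	if data, err := os.ReadFile(path); err == nil {
		return data // written after --witness.filestore was enabled
	}
	return kv[key] // written before the switch; pruned within 6400 blocks
}

func main() {
	kv := map[string][]byte{"0xabc": []byte("old-witness")}
	// No file exists at this path, so the read falls back to the store.
	fmt.Printf("%s\n", readWitness("/nonexistent/witness.rlp", kv, "0xabc"))
}
```

Because the fallback is on the read path only, no migration step is needed when the flag is flipped on: old entries simply age out as the pruner advances.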

Benchmark results

Write (the main operation this optimizes)

| Size | DB (Pebble on disk) | FS | FS advantage |
| --- | --- | --- | --- |
| 100KB | ~1ms, 3.3KB alloc | ~200μs, 2.5KB alloc | ~5x faster, similar allocs |
| 1MB | ~12ms, 2.3MB alloc | ~320μs, 2.5KB alloc | ~38x faster, ~900x less memory |
| 5MB | ~48ms, 10.9MB alloc | ~1.6ms, 2.5KB alloc | ~30x faster, ~4400x less memory |

The DB backend's allocations scale with payload size due to Pebble's internal memtable copies and compaction buffers, while latency is dominated by flush/compaction overhead. The FS backend maintains a constant ~2.5KB allocation regardless of payload size.

Read

| Size | DB (Pebble on disk) | FS |
| --- | --- | --- |
| 100KB | ~3.8μs | ~29μs |
| 1MB | ~120μs | ~122μs |
| 5MB | ~200μs | ~279μs |

At realistic witness sizes (1–10MB), read performance is nearly equivalent between the two backends. Hot reads are served by the in-memory witnessCache LRU and rarely hit disk.

Delete

| Backend | Latency |
| --- | --- |
| DB (Pebble on disk) | ~2μs |
| FS | ~100μs |

FS deletes are slower (syscall vs tombstone marker), but pruning runs in the background every 120s on old blocks and is not latency-sensitive.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • In case link the PR here:
  • This PR requires changes to matic-cli
    • In case link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it

@pratikspatil024 pratikspatil024 requested review from cffls and lucca30 March 2, 2026 12:28

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 97.95918% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.39%. Comparing base (46e2631) to head (79a3058).
⚠️ Report is 15 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| core/rawdb/database.go | 87.50% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@             Coverage Diff             @@
##           develop    #2098      +/-   ##
===========================================
+ Coverage    50.87%   51.39%   +0.52%     
===========================================
  Files          878      881       +3     
  Lines       152469   152802     +333     
===========================================
+ Hits         77564    78533     +969     
+ Misses       69809    69169     -640     
- Partials      5096     5100       +4     
| Files with missing lines | Coverage Δ |
| --- | --- |
| core/blockchain.go | 62.20% <100.00%> (-0.06%) ⬇️ |
| core/blockchain_reader.go | 42.28% <100.00%> (ø) |
| core/rawdb/pruner_strategy_wit.go | 90.00% <100.00%> (+12.00%) ⬆️ |
| core/rawdb/witness_store.go | 100.00% <100.00%> (ø) |
| core/rawdb/witness_store_db.go | 100.00% <100.00%> (ø) |
| core/rawdb/witness_store_fs.go | 100.00% <100.00%> (ø) |
| eth/api_backend.go | 26.56% <100.00%> (ø) |
| eth/backend.go | 52.20% <100.00%> (+0.06%) ⬆️ |
| eth/ethconfig/config.go | 78.94% <ø> (ø) |
| internal/cli/server/config.go | 65.07% <100.00%> (+1.35%) ⬆️ |

... and 4 more

... and 54 files with indirect coverage changes




@lucca30 lucca30 left a comment


lgtm


@cffls cffls left a comment


lgtm!

@pratikspatil024

@claude - review this PR.


claude bot commented Mar 4, 2026

Claude finished @pratikspatil024's task in 1m 37s


PR Review: Filesystem-based witness storage

  • Gather context and understand the PR changes
  • Review the WitnessStore interface design
  • Review the database backend implementation
  • Review the filesystem backend implementation
  • Review integration points and configuration
  • Review test coverage
  • Check for security and performance issues
  • Provide comprehensive review feedback

Summary

This PR introduces a well-designed abstraction for witness storage with two backends: the existing Pebble database and a new filesystem-based store. The implementation is clean, well-tested, and addresses a legitimate performance concern with storing large witness blobs in Pebble.

Strengths

1. Clean Interface Design (witness_store.go:12-18)

The WitnessStore interface is minimal and appropriate:

type WitnessStore interface {
    ReadWitness(hash common.Hash) []byte
    WriteWitness(hash common.Hash, witness []byte)
    HasWitness(hash common.Hash) bool
    DeleteWitness(hash common.Hash)
    Close() error
}

2. Seamless Migration (witness_store_fs.go:33-43, 71-79)

The fallback-to-Pebble logic in ReadWitness and HasWitness enables transparent migration without downtime. Old witnesses are naturally pruned within 6400 blocks.

3. Atomic Writes (witness_store_fs.go:45-69)

Using temp files with atomic rename prevents partial writes on crashes:

tmpPath := finalPath + ".tmp"
os.WriteFile(tmpPath, witness, 0644)
os.Rename(tmpPath, finalPath)

4. 2-Level Sharding (witness_store.go:21-31)

The hex-based directory sharding (e.g., ab/cd/0xabcd....rlp) prevents filesystem performance issues with too many files in a single directory.

5. Excellent Test Coverage

The test suite is comprehensive, covering:

  • CRUD operations
  • Concurrent access
  • Fallback to Pebble
  • Orphaned temp file cleanup
  • Permission errors
  • Edge cases (non-existent directories, nil witness store)
  • log.Crit paths via subprocess tests

6. Benchmark Tests (witness_store_bench_test.go)

Good update to use real Pebble on disk instead of memory DB, making the benchmark results more realistic and validating the PR's claims.

Suggestions

1. Minor: Consider os.O_SYNC for Critical Data (optional)

At witness_store_fs.go:54, os.WriteFile doesn't guarantee durability until the next fsync. For critical blockchain data, you might consider using explicit os.OpenFile with os.O_SYNC or calling file.Sync() before close. However, since witnesses are re-fetchable from peers and the atomic rename already prevents corruption, this is likely acceptable.

2. Consider Documenting Size Key Inconsistency

The FS backend uses binary.BigEndian.PutUint64 for size encoding (witness_store_fs.go:64), while the DB backend uses encodeBlockNumber (accessors_state.go:327). Both produce the same result, but documenting this alignment would help future maintainers.

3. SonarQube Security Hotspots

SonarQube flagged 7 security hotspots. These are likely related to:

  • File operations with user-controlled paths (directory creation)
  • Permission settings (0755, 0644)

These appear to be false positives since:

  • The witness directory path is derived from the data directory, not user input
  • The file permissions are appropriate for blockchain data

Potential Issues

1. Coverage Gap in database.go

Codecov shows missing coverage at core/rawdb/database.go. Based on the report, this is likely the orphan warning path:

if opts.WitnessStoreDir != "" {
    if _, err := os.Stat(opts.WitnessStoreDir); err == nil {
        log.Warn("Witness directory exists but --witness.filestore is disabled...")
    }
}

This is a non-critical warning path.

2. Config Coverage Gap

The partial coverage at internal/cli/server/config.go:1608 appears to be the assignment n.WitnessFileStore = c.Witness.FileStore, which is difficult to test in isolation.

Questions for the Author

  1. Delete during migration: When deleting a witness that exists only in Pebble (pre-migration), DeleteWitness calls os.Remove on a non-existent file, which logs nothing due to the os.IsNotExist check. Then it cleans Pebble. Is this the intended behavior? (Looks correct, just confirming.)

  2. Enabled by default: The flag is now enabled by default (DefaultConfig().Witness.FileStore = true). Is there documentation or a changelog entry planned to inform operators about this change?

Verdict

Approve - This is a well-implemented feature that addresses a real performance concern. The code is clean, well-documented, and thoroughly tested. The migration path is thoughtful, and the benchmarks validate the claimed improvements.

Minor suggestions above are optional improvements that don't block merging.



sonarqubecloud bot commented Mar 5, 2026

Quality Gate failed

Failed conditions
7 Security Hotspots

See analysis details on SonarQube Cloud

@pratikspatil024 pratikspatil024 changed the title from "[WIP] core/rawdb: added filesystem based witness storage (with --witness.filestore flag)" to "core/rawdb: added filesystem based witness storage (with --witness.filestore flag)" Mar 5, 2026
@pratikspatil024 pratikspatil024 merged commit d10837a into develop Mar 5, 2026
17 of 19 checks passed
@pratikspatil024 pratikspatil024 deleted the psp-pos-3480 branch March 5, 2026 09:01
