fix: correct RocksDB MaxOpenFiles formula to divide by DB count by kamilchodola · Pull Request #10736 · NethermindEth/nethermind

kamilchodola · 2026-03-06T09:38:43Z

Summary

Fix RocksDB MaxOpenFiles auto-limit formula to properly divide by estimated database count (~15)
The OS file descriptor limit is shared across all databases, but MaxOpenFiles is set per database

Problem

PR #9504 introduced auto-detection of the system open file limit to prevent macOS crashes during sync. The formula perDbLimit = 80% × systemLimit was applied per database, but with ~15 databases the combined limit was 15 × 0.8 × limit = 12× the OS limit. This:

Didn't actually prevent the macOS crash it was designed to fix
Caused a ~43% P90 block processing regression by setting an unnecessarily high per-DB limit that still forced RocksDB into LRU file handle caching mode

Fix

Divide by estimated DB count: perDbLimit = max(128, systemLimit × 0.8 / 15)

System limit	Before (per DB)	Before (combined)	After (per DB)	After (combined)
256 (macOS)	256	3840 ❌	128	1920 ✅
1024 (low Linux)	819	12285 ❌	128	1920 ✅
65536 (server)	52428	786420 ❌	3495	52425 ✅

Changes

Bug fix (non-breaking change which fixes an issue)

Test plan

Updated RocksDbConfigFactoryTests — all 11 tests pass
Benchmark validation: run realblocks payload set to verify P90 recovery

The auto-limit logic (PR #9504) set MaxOpenFiles to 80% of the system ulimit per database, but with ~15 databases the combined limit was 12x the OS total. This didn't prevent the macOS crash it was designed to fix, and caused a ~43% P90 block processing regression on Linux by forcing RocksDB to use an LRU file handle cache instead of keeping all SST handles open. Fix: properly divide by the estimated database count (~15) so the combined file handle usage stays within the OS limit.

asdacap · 2026-03-06T09:42:28Z

Dont reduce the limit dammit. And dont make it unlimited.

asdacap · 2026-03-06T09:44:14Z

You cant divide it by num of DB. Some db take up a lot of file, others dont. Thats why per db it is 80%.

LukaszRozmej · 2026-03-06T10:02:58Z

You cant divide it by num of DB. Some db take up a lot of file, others dont. Thats why per db it is 80%.

should we tweak it more explicitly for some dbs? I expect performance here is mostly about state db.

Copilot

Pull request overview

Adjusts the auto-derived RocksDB MaxOpenFiles value to account for the fact that the OS file-descriptor limit is shared across multiple DB instances, aligning the configuration with the original goal of preventing FD exhaustion (notably on macOS).

Changes:

Update RocksDbConfigFactory to divide the 80%-of-system-limit budget by an estimated DB count (~15) when auto-setting MaxOpenFiles.
Update RocksDbConfigFactoryTests to validate the new per-DB limit calculation for low and high system limits.
Adjust the minimum enforced MaxOpenFiles expectation in tests.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`src/Nethermind/Nethermind.Db.Rocks/Config/RocksDbConfigFactory.cs`	Changes the auto-limit formula to divide the FD budget by an estimated number of DBs.
`src/Nethermind/Nethermind.Db.Test/RocksDbConfigFactoryTests.cs`	Updates/extends unit tests to reflect the revised `MaxOpenFiles` calculation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-06T12:47:57Z

+                int perDbLimit = Math.Max(128, (int)(systemLimit * 0.8 / EstimatedDbCount));



The new formula can still choose a per-DB MaxOpenFiles (128) that exceeds the computed per-DB budget (systemLimit * 0.8 / EstimatedDbCount). For low system limits this makes the total across ~15 DBs far above the intended 80% process FD budget, which undermines the crash-prevention goal and makes the preceding comment about reserving 20% inaccurate. Consider clamping the final value so that (perDbLimit * EstimatedDbCount) does not exceed systemLimit * 0.8 (and/or logging a warning when the system limit is too low to satisfy the chosen minimum).

Suggested change

int perDbLimit = Math.Max(128, (int)(systemLimit * 0.8 / EstimatedDbCount));

const int MinPerDbLimit = 128;

double totalBudget = systemLimit * 0.8;

int budgetPerDb = (int)(totalBudget / EstimatedDbCount);

int perDbLimit;

if (budgetPerDb < MinPerDbLimit)

{

// Clamp to the per-DB budget so that EstimatedDbCount * perDbLimit does not exceed 80% of the system limit.

perDbLimit = budgetPerDb;

if (_logger.IsWarn)

{

_logger.Warn($"System open files limit {systemLimit} is too low to allocate the desired minimum of {MinPerDbLimit} MaxOpenFiles per database across {EstimatedDbCount} databases. Using reduced per-database limit of {perDbLimit} to stay within 80% of the process budget.");

}

}

else

{

// Use the budget-based per-DB limit; this keeps total usage within ~80% of the system limit.

perDbLimit = budgetPerDb;

}

Copilot · 2026-03-06T12:47:58Z

+        // Very low system limit should still give minimum of 128
+        var factory = new RocksDbConfigFactory(dbConfig, new PruningConfig(), new TestHardwareInfo(0, 100), LimboLogs.Instance);
        IRocksDbConfig config = factory.GetForDatabase("State0", null);
-        // With system limit of 1000, would be 1000 * 0.8 = 800
-        config.MaxOpenFiles.Should().Be(800);
+        // With system limit of 100: max(128, 100 * 0.8 / 15) = max(128, 5) = 128
+        config.MaxOpenFiles.Should().Be(128);


This test asserts that when the detected system FD limit is 100, MaxOpenFiles is set to 128. That means the auto-derived value can exceed the OS-reported limit, which contradicts the intent of auto-limiting and can make the log message misleading. If the implementation is updated to clamp the per-DB limit to the available budget, update this test to assert the clamped value (or adjust the scenario to a realistic system limit where the minimum does not exceed the OS limit).

Setting any finite MaxOpenFiles forces RocksDB into LRU table cache mode, adding overhead on every read. The original PR #9504 applied 80% of ulimit per-DB unconditionally, causing ~4% AVG / ~6% P90 regression on Linux servers with high ulimits (1048576). Only cap MaxOpenFiles when the system limit is below 10,000 (macOS default 256, restricted Docker 1024). On high-limit systems (typical Linux servers), leave unlimited for best performance. This preserves the original crash prevention for low-limit systems while eliminating the performance regression on production servers.

github-actions · 2026-03-06T19:30:22Z

EXPB Benchmark Comparison

Run: View workflow run

superblocks

Scenario: nethermind-halfpath-superblocks-fix-rocksdb-max-open-files-delay0s

Metric	PR	Master (cached)	Delta PR vs Master
AVG (ms)	1445.357000	1435.671900	+0.67%
MEDIAN (ms)	1130.620000	1117.195000	+1.20%
P90 (ms)	2519.62	2502.35	+0.69%
P95 (ms)	2650.64	2601.30	+1.90%
P99 (ms)	4024.15	3878.72	+3.75%
MIN (ms)	751.67	751.50	+0.02%
MAX (ms)	4324.64	4529.10	-4.51%

realblocks

Scenario: nethermind-halfpath-realblocks-fix-rocksdb-max-open-files-delay0s

Metric	PR	Master (cached)	Delta PR vs Master
AVG (ms)	40.607081	41.825554	-2.91%
MEDIAN (ms)	23.770000	24.900000	-4.54%
P90 (ms)	109.32	104.82	+4.29%
P95 (ms)	143.12	143.15	-0.02%
P99 (ms)	208.92	203.41	+2.71%
MIN (ms)	0.52	0.55	-5.45%
MAX (ms)	4109.01	4747.22	-13.44%

kamilchodola · 2026-04-21T09:52:14Z

@claude can you do review of this PR once more to check if it can introduce any regresison on RocksDB Get method on Ubuntu 24?

claude · 2026-04-21T09:52:31Z

Claude finished @kamilchodola's task in 2m 45s —— View job

Review: RocksDB MaxOpenFiles Fix — Ubuntu 24 Regression Check

Short answer: No, this PR does not introduce a regression on RocksDB Get on Ubuntu 24. For typical server deployments it should improve Get performance.

How `MaxOpenFiles` affects RocksDB Get

When MaxOpenFiles is a finite value, RocksDB wraps all SST file handles in an LRU table cache. Every Get that touches a file whose handle was evicted must reopen that file, adding latency. When MaxOpenFiles is unlimited (null), RocksDB keeps all SST handles permanently open — no eviction overhead on the read path.

Ubuntu 24 scenarios

Deployment	Typical soft `RLIMIT_NOFILE`	Before PR (via #9504)	After this PR	Get impact
User process (default)	1 024	`max(128, 819) = 819`	`1 024 < 10 000` → same 819	unchanged
systemd service (`LimitNOFILE=65536`)	65 536	`max(128, 52 428) = 52 428`	`65 536 ≥ 10 000` → unlimited	✅ improved
Docker container (default)	1 048 576	`max(128, 838 860) = 838 860`	≥ 10 000 → unlimited	✅ improved

On Ubuntu 24, the most common Nethermind deployment mode is a systemd service with LimitNOFILE far above the 10 000 threshold. This PR changes that path from a large-but-finite limit to unlimited — removing the LRU table-cache overhead on every Get.

The benchmark results confirm this:

realblocks AVG −2.91%, MEDIAN −4.54% — improvement
realblocks P90 +4.29% — minor; within noise range for a stochastic workload

Issues found

Low — Pre-existing: RLIMIT_NOFILE = 7 is wrong for macOS
src/Nethermind/Nethermind.Core/HardwareInfo.cs:35

The code uses the hardcoded constant 7 for both Linux and macOS, with the comment "Same value for both macOS and Linux." This is incorrect: Linux uses 7 ✓ but macOS uses 8. On macOS, getrlimit(7, ...) actually queries RLIMIT_NPROC (max processes), returning the process limit instead of the FD limit. This may cause the macOS crash-prevention path introduced in #9504 to read a wrong limit. Ubuntu 24 is unaffected (Linux correctly uses 7). This is a pre-existing bug not introduced by this PR, but worth tracking.

Low — Non-atomic initialization guard
src/Nethermind/Nethermind.Db.Rocks/Config/RocksDbConfigFactory.cs:20

private bool _maxOpenFilesInitialized;

_maxOpenFilesInitialized is read/written without any synchronization primitive (volatile, Interlocked, or lock). If GetForDatabase is called concurrently from multiple threads during initialization, two threads could both observe false and both set dbConfig.MaxOpenFiles. Since they'd compute the same value, this is a benign race in practice — but it contradicts the intent of "only once" and could cause confusing debug logs if the log line fires twice. On Ubuntu 24 this is not a performance or correctness regression.

Low — Threshold edge (10 000) silently switches to unlimited
If a deployment has exactly LimitNOFILE=10000 (or anything from 10 000 to ~65 000), this PR switches from a finite per-DB limit to unlimited with no log message. The Debug log fires but only when MaxOpenFiles is left at null. If those DBs collectively open many SST files on a 10 001-FD system they could exhaust the budget. RocksDB handles open() failures gracefully via its table cache, so no crash is expected — but silent FD exhaustion could slow down Gets unexpectedly. Consider logging the decision at Info level for this case too.

Verdict

No Critical, High, or Medium findings
The PR does not introduce a RocksDB Get regression on Ubuntu 24
Server deployments (Ubuntu 24 + systemd) will likely see a measurable Get improvement due to the unlimited table cache handles
The three Low findings above are worth tracking but do not affect correctness or performance on Ubuntu 24

github-actions Bot added bug fix + reliability database rocksdb labels Mar 6, 2026

kamilchodola force-pushed the fix/rocksdb-max-open-files branch from 9bb3d5a to 181c224 Compare March 6, 2026 09:42

kamilchodola changed the title ~~fix: only restrict RocksDB MaxOpenFiles on low system limits~~ fix: correct RocksDB MaxOpenFiles formula to divide by DB count Mar 6, 2026

LukaszRozmej requested a review from Copilot March 6, 2026 12:44

Copilot started reviewing on behalf of LukaszRozmej March 6, 2026 12:44 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

kamilchodola marked this pull request as ready for review March 6, 2026 19:13

kamilchodola requested review from LukaszRozmej, asdacap and damian-orzechowski as code owners March 6, 2026 19:13

Merge branch 'master' into fix/rocksdb-max-open-files

6b9b351

kamilchodola added the reproducible-benchmark label Mar 6, 2026

LukaszRozmej approved these changes Mar 7, 2026

View reviewed changes

LukaszRozmej added 3 commits March 7, 2026 02:28

deduplicate tests

21ecd90

even better tests

3f4c9f7

rename

4989697

asdacap approved these changes Mar 7, 2026

View reviewed changes

benaadams approved these changes Mar 9, 2026

View reviewed changes

benaadams merged commit 221335a into master Mar 9, 2026
111 checks passed

benaadams deleted the fix/rocksdb-max-open-files branch March 9, 2026 23:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: correct RocksDB MaxOpenFiles formula to divide by DB count#10736

fix: correct RocksDB MaxOpenFiles formula to divide by DB count#10736
benaadams merged 6 commits into
masterfrom
fix/rocksdb-max-open-files

kamilchodola commented Mar 6, 2026 •

edited

Loading

Uh oh!

asdacap commented Mar 6, 2026

Uh oh!

asdacap commented Mar 6, 2026

Uh oh!

LukaszRozmej commented Mar 6, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 6, 2026

Uh oh!

Copilot AI Mar 6, 2026

Uh oh!

github-actions Bot commented Mar 6, 2026

Uh oh!

Uh oh!

kamilchodola commented Apr 21, 2026

Uh oh!

claude Bot commented Apr 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		int perDbLimit = Math.Max(128, (int)(systemLimit * 0.8 / EstimatedDbCount));

-                int perDbLimit = Math.Max(128, (int)(systemLimit * 0.8 / EstimatedDbCount));
+                const int MinPerDbLimit = 128;
+                double totalBudget = systemLimit * 0.8;
+                int budgetPerDb = (int)(totalBudget / EstimatedDbCount);
+                int perDbLimit;
+                if (budgetPerDb < MinPerDbLimit)
+                {
+                    // Clamp to the per-DB budget so that EstimatedDbCount * perDbLimit does not exceed 80% of the system limit.
+                    perDbLimit = budgetPerDb;
+                    if (_logger.IsWarn)
+                    {
+                        _logger.Warn($"System open files limit {systemLimit} is too low to allocate the desired minimum of {MinPerDbLimit} MaxOpenFiles per database across {EstimatedDbCount} databases. Using reduced per-database limit of {perDbLimit} to stay within 80% of the process budget.");
+                    }
+                }
+                else
+                {
+                    // Use the budget-based per-DB limit; this keeps total usage within ~80% of the system limit.
+                    perDbLimit = budgetPerDb;
+                }

Conversation

kamilchodola commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Fix

Changes

Test plan

Uh oh!

asdacap commented Mar 6, 2026

Uh oh!

asdacap commented Mar 6, 2026

Uh oh!

LukaszRozmej commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Mar 6, 2026

EXPB Benchmark Comparison

superblocks

realblocks

Uh oh!

Uh oh!

kamilchodola commented Apr 21, 2026

Uh oh!

claude Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: RocksDB MaxOpenFiles Fix — Ubuntu 24 Regression Check

How MaxOpenFiles affects RocksDB Get

Ubuntu 24 scenarios

Issues found

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

kamilchodola commented Mar 6, 2026 •

edited

Loading

LukaszRozmej commented Mar 6, 2026 •

edited

Loading

claude Bot commented Apr 21, 2026 •

edited

Loading

How `MaxOpenFiles` affects RocksDB Get