[WIP] Automatically lower max open file limit to prevent crashes #9504

Merged
LukaszRozmej merged 6 commits into master from copilot/automatically-lower-max-open-file on Oct 22, 2025
Conversation

Copilot AI commented Oct 19, 2025

Contributor
  • Analyze the codebase to understand database initialization and MaxOpenFiles configuration
  • Add system open file limit detection capability to IHardwareInfo/HardwareInfo
  • Modify RocksDbConfigFactory to automatically set MaxOpenFiles when not configured
  • Calculate safe per-database limit based on system limit, number of databases, and safety margin
  • Add appropriate logging to inform users of auto-adjustment
  • Write tests for the new functionality
  • Build and test the changes
  • Simplify implementation by directly mutating DbConfig instead of using wrapper class
  • Add flag to ensure mutation only happens once when GetForDatabase is called multiple times
  • Fix whitespace formatting
  • Adjust margin calculation to use 80% instead of dividing by 20
  • Test the solution manually to validate behavior
  • Request code review
  • Run CodeQL security checker
Original prompt

This section details the original issue you should resolve.

<issue_title>Automatically lower max open file</issue_title>
<issue_description>Is your feature request related to a problem? Please describe.

  • On macOS especially, Nethermind with default settings will probably crash during syncing because it hits the open file limit.
  • This can be mitigated with something like --Db.MaxOpenFiles 3000.
  • The cause is hard to diagnose because the error message is "Out of memory".

Describe the solution you'd like

  • Automatically detect the system open file limit and set a reasonably lower MaxOpenFiles value. Remember that MaxOpenFiles applies to each database separately, so the combined total may still exceed the system limit.

Describe alternatives you've considered

  • Set a default lower limit, but that will definitely slow down the node.
    </issue_description>
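As a rough illustration of the detection step the issue asks for, the sketch below reads the per-process open file limit on a Unix-like system. It uses Python's standard `resource` module, not the C# code in this PR; the function name `detect_open_file_limit` is hypothetical, and using the soft limit rather than the hard limit is an assumption.

```python
import resource
from typing import Optional


def detect_open_file_limit() -> Optional[int]:
    """Return the soft RLIMIT_NOFILE of this process, or None if it is unlimited.

    Hypothetical helper for illustration; Unix-only because `resource` is Unix-only.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft


if __name__ == "__main__":
    # Prints the detected limit for the current process (value depends on the host).
    print(detect_open_file_limit())
```

On most shells this value matches what `ulimit -n` reports for the same session.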

<agent_instructions>I dont know, figure it out. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes #9503


Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
@asdacap

asdacap commented Oct 20, 2025

Contributor

Although intercepting the RocksDB config is the proper approach, it adds a whole extra file. That is fine in a plugin, but for core code we should keep the number of files down. Consider mutating DbConfig directly instead.

@asdacap

asdacap commented Oct 20, 2025

Contributor

@copilot do try to simplify.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

Copilot AI commented Oct 20, 2025

Contributor Author

Although intercepting the rocksdb config is proper, it adds a whole other file, which is fine if its in a plugin, but for core code, we can reduce the num of file. Consider mutating DbConfig directl...

Simplified in commit 92f6e7d by directly mutating DbConfig.MaxOpenFiles instead of using a wrapper class. Removed MaxOpenFilesAdjustedRocksdbConfig.cs file.

@copilot do try to simplify.

Simplified in 9d16b9d - now directly mutating DbConfig.MaxOpenFiles instead of using a wrapper class. Removed the MaxOpenFilesAdjustedRocksdbConfig.cs file entirely.

Copilot AI requested a review from asdacap October 20, 2025 01:08
@asdacap

asdacap commented Oct 20, 2025

Contributor

@copilot GetForDatabase is called multiple times. Can you make it so that the mutation is done only once?

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

Copilot AI commented Oct 20, 2025

Contributor Author

@copilot GetForDatabase is called multiple times. Can you make it so that the mutation is done only once?

Fixed in commit a1f4c8e by adding _maxOpenFilesInitialized flag to ensure the mutation and logging only happen once, even though GetForDatabase is called multiple times for different databases.
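The once-only guard described here can be sketched as follows. This is a minimal Python illustration, not the C# code from the PR: `DbConfig`, `ConfigFactory`, `get_for_database`, and `auto_limit` are hypothetical stand-ins, and the lock is an added assumption (the thread does not say whether the real flag is guarded for concurrent callers).

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbConfig:
    # Hypothetical stand-in for the shared config; None means "not configured by the user".
    max_open_files: Optional[int] = None


class ConfigFactory:
    """Hypothetical factory whose get_for_database is called once per database."""

    def __init__(self, config: DbConfig, auto_limit: int) -> None:
        self._config = config
        self._auto_limit = auto_limit      # value computed elsewhere from the system limit
        self._initialized = False          # plays the role of the initialization flag
        self._lock = threading.Lock()      # assumption: guard against concurrent callers
        self.adjustments = 0               # how many times the mutation actually ran

    def get_for_database(self, name: str) -> DbConfig:
        with self._lock:
            if not self._initialized:
                self._initialized = True
                # Only fill in a value when the user did not configure one.
                if self._config.max_open_files is None:
                    self._config.max_open_files = self._auto_limit
                    self.adjustments += 1
        return self._config


factory = ConfigFactory(DbConfig(), auto_limit=819)
for db_name in ("state", "blocks", "receipts"):
    factory.get_for_database(db_name)
```

However many databases request their config, the shared value is written (and would be logged) a single time.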

@asdacap

asdacap commented Oct 21, 2025

Contributor

@copilot fix formatting

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

Copilot AI commented Oct 21, 2025

Copy link
Copy Markdown
Contributor Author

@copilot fix formatting

Fixed in 8b3a2d1 - removed trailing whitespace on line 22.

@asdacap

asdacap commented Oct 21, 2025

Contributor

Tested: the limit is detected correctly on NixOS.

@asdacap

asdacap commented Oct 21, 2025

Contributor

@copilot reduce the margin. Instead of dividing by 20, multiply by 80%.

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

Copilot AI commented Oct 21, 2025

Contributor Author

@copilot reduce the margin. Instead of dividing by 20, multiply by 80%.

Changed in 5c4f2a9 - now using systemLimit * 0.8 instead of systemLimit / 20. Updated tests and comments accordingly.
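To make the size of this change concrete, the sketch below compares the two expressions named in the reply, systemLimit / 20 and systemLimit * 0.8. The function names and the sample limits are illustrative, and rounding down to an integer is an assumption about how the real code truncates.

```python
def limit_divided_by_20(system_limit: int) -> int:
    """Earlier expression from the thread: systemLimit / 20 (rounded down, an assumption)."""
    return system_limit // 20


def limit_times_80_percent(system_limit: int) -> int:
    """Later expression from the thread: systemLimit * 0.8, in integer arithmetic."""
    return system_limit * 8 // 10


if __name__ == "__main__":
    # Sample limits chosen for illustration only.
    for system_limit in (256, 1024, 65536):
        print(system_limit, limit_divided_by_20(system_limit), limit_times_80_percent(system_limit))
```

The new value is 16 times larger than the old one for the same system limit, since 0.8 / 0.05 = 16.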

@asdacap asdacap marked this pull request as ready for review October 21, 2025 23:32
@LukaszRozmej LukaszRozmej merged commit f591cf4 into master Oct 22, 2025
80 checks passed
@LukaszRozmej LukaszRozmej deleted the copilot/automatically-lower-max-open-file branch October 22, 2025 21:37
kamilchodola added a commit that referenced this pull request Oct 27, 2025
* Fix delegation in eth_simulate (#9490)

* Test

* Test

* Try fix

* Delegation in GetCachedCodeInfo

* Do not trace delegate call

* Fix known-failing-tests.txt

* Fix suggestions

* Remove passing eth_getBlockByNumber hive tests (#9462)

Removed known failing tests related to eth_getBlockByNumber.

* fix: Use correct Docker Hub secrets across workflows (#9495)

* fix: Use correct Docker Hub secrets across workflows

* Update release workflow

---------

Co-authored-by: Ruben Buniatyan <rubo@users.noreply.github.com>

* Fix batched trie visitor missed storage (#9496)

* Move static/trusted nodes file to the data dir (#9477)

* Update Dockerfiles (#9497)

Co-authored-by: rubo <rubo@users.noreply.github.com>

* Fix: Blocktest exception handling logic  (#9491)

* Fix: Blocktest validation bypass and exception handling logic

This commit addresses two critical issues in blockchain test validation:

1. Validation Bypass Vulnerability:
   - Removed the `!test.SealEngineUsed ||` condition that allowed blocks
     to skip validation entirely when using NoProof seal engine
   - Now all blocks undergo proper consensus rule validation regardless
     of seal engine type (seal validation itself remains conditional)

2. Inverted Exception Logic:
   - Fixed inverted null checks on `ExpectedException` in both validation
     failure and exception handling paths
   - Added explicit validation pass/fail checks to catch blocks that
     unexpectedly pass when they should fail
   - Improved error messages to include actual validation error details
   - Added explanatory comments documenting expected behavior

The validation framework now correctly:
- Validates all blocks through the consensus rule validator
- Fails tests when blocks unexpectedly pass validation
- Fails tests when blocks unexpectedly fail validation
- Properly handles expected failures via both validation and exceptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* ExpectException asserts simplification

* fixes

* Remove SealEngineType and SealEngineUsed

* simplify asserts

* Fix expectsException

---------

Co-authored-by: Bhargava Shastry <bhargava.shastry@ethereum.org>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Alexey Osipov <me@flcl.me>

* Update OP Superchain chains (#9500)

Co-authored-by: emlautarom1 <emlautarom1@users.noreply.github.com>

* Improve Hive RPC compat. (#9489)

* initial

* format

* apply comments

* update description

* remove leading zeros from some places

* update known-failing-hive-tests.txt after running the hive

* Update send blobs tool (#9472)

* Update send blobs docs and fix an option

* Fix dockerfile

* Make AuthorizationListForRpc.JsonConverter public for attribute-based activation (#9506)

* Add CancelAfter to Maintain_correct_pointers_for_beacon_sync_in_archi… (#9507)

* Add CancelAfter to Maintain_correct_pointers_for_beacon_sync_in_archive_sync

* add Retry

* try fix combines_contract_and_local_data_correctly

* Disable osaka chiado hardfork (#9486)

disable osaka chiado hardfork

Co-authored-by: Marc Harvey-Hill <10379486+Marchhill@users.noreply.github.com>

* Feature/xdc block sealer (#9505)

* xdc block sealer

* Test

* Format

* Use correct Bmi2 intrinsic support check (#9510)

* Fix nonce handling in eth_simulate (#9499)

* Do not load nonce from state

* Prepare state before assembling body

* Emulate nonce overflow

* known-failing-tests

* Fix suggestions

* test

* Fix

* MixHash & types & remove mod

* fxi

* Replace GitHub token (#9515)

* Fix fast sync settings workflow (#9519)

* Auto-update fast sync settings (#9522)

Co-authored-by: rubo <rubo@users.noreply.github.com>

* fix/crypto-random-secure-rng (#9513)

* Update CryptoRandom.cs

* Create CryptoRandomTests.cs

* Update src/Nethermind/Nethermind.Crypto/CryptoRandom.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Apply suggestion from @LukaszRozmej

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Slight logging change (#9520)

* Remove overseer tests (#9527)

* fix: CompositeTxTracer aggregates IsTracingLogs and gates ReportLog correctly (#9511)

* Log index (preparation) (#9481)

* Preparation for log-index

* PR cleanup

* Move Merge to separate interface

* PR feedback

# Conflicts:
#	src/Nethermind/Nethermind.Db/LogIndex/LogIndexStorage.cs

* Receipts events renaming

# Conflicts:
#	src/Nethermind/Nethermind.Facade/Find/LogIndexBuilder.cs

* PR feedback

* Formatting

* Code cleanup

* Code cleanup

* Fix DB config validation

* Use sorted view instead of iterator

# Conflicts:
#	src/Nethermind/Nethermind.Db/LogIndex/LogIndexStorage.cs

* Do not publicly expose iterator

* Code cleanup

* Revert changes to DB config reading

* PR feedback

* PR feedback

* PR feedback

# Conflicts:
#	src/Nethermind/Nethermind.Db.Rocks/Config/DbConfig.cs
#	src/Nethermind/Nethermind.Db.Rocks/Config/IDbConfig.cs

* PR feedback

* Persist genesis state after genesis loader to prevent unclean shutdown issues (#9536)

* Initial plan

* Add state persistence after genesis loading

- Call CommitTree(0) after successful genesis processing to force persist state to disk
- Add comprehensive tests for GenesisLoader state persistence behavior
- Ensures genesis state is available on restart after unclean shutdown

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Fix code formatting for GenesisLoaderTests

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Use IWorldStateManager.FlushCache() for genesis state persistence

- Changed GenesisLoader constructor to take IWorldStateManager instead of IWorldState
- Call FlushCache(CancellationToken.None) after successful genesis processing
- Removed test file as requested
- This uses the proper abstraction for cache flushing as suggested by reviewer

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Inject worldstate and flush after scope exit

* Add comprehensive unit tests for GenesisLoader

- Test successful genesis loading triggers FlushCache
- Test timeout scenario does not trigger FlushCache
- Test invalid block scenario does not trigger FlushCache
- Test FlushCache is called after scope exit
- All 4 new tests pass, total 24 tests in Consensus.Test

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
Co-authored-by: Amirul Ashraf <asdacap@gmail.com>

* Fix Leading Zero Issue for taiko (#9538)

* fix leading zero issue

* add new test

* use Hash256 instead of ValueHash256

* Fix incorrect BlockRangeUpdateMessage.EarliestBlock in some edge cases (#9542)

* Add support for block producer based on global world state instance. (#9388)

* Add support for block producer based on global world state instance.

* Register ProducedBlockSuggester

* Add config. Suggest produced block and update main chain.

* Added test

* Small refactor

* Remove Holesky (#9525)

Co-authored-by: Ruben Buniatyan <rubo@users.noreply.github.com>

* Retry requesting pooled transactions (#9128)

* PoC of delayed sending if requested

* Add retry cache

* Fix logs and multiple requests expiration

* Add tests

* Rename

* Add more tests

* Improve

* Simplify dependencies

* Improve

* Rename

* no requestoor draft

* Use messages

* Remove the requestoor

* Fix tests

* Fix tests

* More fixes

* Substitute

* Disconnect peers with invalid txs

* Update src/Nethermind/Nethermind.Network.Contract/Messages/IResourceRequestMessage.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Compress code

* Try pooled set

* Handle as eth66

* Fix review

* Fix tests

* No need in syncing handling

* Remove redundant dispose

* Review

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* fix culture-dependent metric name generation (#9543)

Update PrometheusPushGatewayMetricsReporter.cs

* Fix: Include exception details in SimpleConsoleLogger.Error() (#9544)

* Update SimpleConsoleLogger.cs

* Update src/Nethermind/Nethermind.Logging/SimpleConsoleLogger.cs

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Perf/Add ArrayPoolListRef to avoid some allocations (#9537)

* Add ArrayPoolListRef

* Apply usages ArrayPoolListRef when possible (no need for it to go on heap)

* Pass ArrayPoolListRef via in

* fix

* fixes

* small improvements

* small fix in test

* fix

* fix

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Update src/Nethermind/Nethermind.Core/Collections/ArrayPoolListRef.cs

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Remove unused custom array pool

* fix build

* AddMarkUnknownHashes to ArrayPoolListRef

---------

Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Fix merge gone wrong in #9537

* [WIP] Automatically lower max open file limit to prevent crashes (#9504)

* Initial plan

* Add automatic MaxOpenFiles detection and adjustment

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Simplify MaxOpenFiles adjustment by mutating DbConfig directly

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Ensure MaxOpenFiles mutation happens only once with initialization flag

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Fix whitespace formatting

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Change MaxOpenFiles calculation to use 80% of system limit

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Fix IndexOutOfRangeException on missing or mismatched receipts in BlockValidator (#9534)

* Initial plan

* Fix index out of range exception on missing/mismatched receipts

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Address PR feedback: move receipt count check and fix tests

Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>

* Handle static/trusted nodes old location (#9545)

* Improve build version handling (#9547)

* XDC  :  Add Header and Block Stores  (#9528)

* implement XdcBlockStore

* Add XdcBlockStore and XdcHeaderStore

* ws fix

* Apply suggested changes

* primary constructor

---------

Co-authored-by: ak88 <anders.holmbjerg@hotmail.com>

* proper fix for key property in storage proof for rpc-compat (#9551)

proper fix for key property in storage proof. revert change to ValueHash256Converter.cs to always return leading zeros.

* Xdc timeout handler (#9475)

* implement pool for timeouts and votes

* implement timeout handler

* use XdcPool for collecting timeouts

* implement checks before timeout handling and refactors

* update tc manager in tests

* expose OnReceiveTimeout method in interface for tc manager

* bit of optimization

* format

---------

Co-authored-by: ak88 <anders.holmbjerg@hotmail.com>

* Xdc Pool for timeouts and votes (#9521)

* implement pool for timeouts and votes

* remove unnecessary array allocation

* merged

---------

Co-authored-by: ak88 <anders.holmbjerg@hotmail.com>
Co-authored-by: ak88 <anders@nethermind.io>

* XDC : EpochSwitchInfo Manager (#9299)

* initial draft implementation of EpochSwitchInfoManager

* refactor and fix build issues

* added some tests

* refactor to use IXdcReleaseSpec

* fix build issue

* refactors and tests and fixes

* refactors and fixes and more tests

* fixes, add missing field, refactor tests

* refactor tests to be more predictable and consistent

* ws fixes

* fix issues

* refactors and applying suggeted changes

* fix test

* apply Math.Max suggestion

* refactors and suggetions

* remove redundent method

* fix failing tests

* refactor Spec moq-ing

* cleanup and ws fixes

* cleanup

* cleanup

* modify calls to epochSwitch mgr methods

---------

Co-authored-by: ak88 <anders.holmbjerg@hotmail.com>
Co-authored-by: Carmen Irene Cabrera Rodríguez <49727740+cicr99@users.noreply.github.com>
Co-authored-by: cicr99 <kr1000a@gmail.com>

* Update CODEOWNERS with project owners (#9552)

* Update CODEOWNERS with project owners

* Update .github/CODEOWNERS

* Update .github/CODEOWNERS

* Update .github/CODEOWNERS

* Update CODEOWNERS

* Update .github/CODEOWNERS

* Update .github/CODEOWNERS

* Update .github/CODEOWNERS

---------

Co-authored-by: Alexey Osipov <me@flcl.me>
Co-authored-by: Ahmad Bitar <33181301+smartprogrammer93@users.noreply.github.com>

* bump: nethermind version on props to 1.36.0 (#9556)

bump: nethermind version on props

* Changes for Arbitrum mainnet (#9473)

* test changes for mainnet

* fixing comments

* remove unused method

* rollback PersistentStorageProvider.cs - changes not needed

* remove logs

* remove unnecessary changes

* remove unnecessary changes

* virtual isGenesis on BlockHeader.cs

* empty line

* more places with hardcoded 0 for block or header number

* small change

* commit PR suggestion

* XDC block producer (#9512)

* block producer

* format

* test

* block production test

* format

* use the constant

* use timestamp from attributes

* default for timestamp

* name

* merged

* merge fix

* Validate sizes and types (#9546)

* PoC of delayed sending if requested

* Add retry cache

* Fix logs and multiple requests expiration

* Add tests

* Rename

* Add more tests

* Improve

* Simplify dependencies

* Improve

* Rename

* no requestoor draft

* Use messages

* Remove the requestoor

* Fix tests

* Fix tests

* More fixes

* Substitute

* Disconnect peers with invalid txs

* Update src/Nethermind/Nethermind.Network.Contract/Messages/IResourceRequestMessage.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Compress code

* Try pooled set

* Handle as eth66

* Fix review

* Fix tests

* No need in syncing handling

* Validate sizes and types

* Add a test

* Fix test

* Add tests

* Fix test

* Code style

* Rollback

* Fix tests

* Move to proper class

* Rollback that rollback

* Mark not invalid txs as received

* Move

* Fix dispose

* Fix

* Moar

* Clean up cache

* Using

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Use trace instead of warn (#9563)

* use trace

* if

* Validate sizes and types (#9564)

Fix logs

* XDC : Votes Manager  (#9296)

* push draft implementation  of QC manager and other related components

* intial draft implementation of Votes manager

* arg null exception

* fixes

* refactor and fixes

* refactor

* refactor

* Change signatures

* bit of refactor

* cleanup

* merged master

* Test

* format

* fixes

* format

* format

* persist when committing QC

* merge conflicts

* merge fixes

* votepool type

* concurrent vote pool

* added log

* isigner

* comment

* comments

* implement initial vote filtering

* refactor vote manager

* implement XdcPool for votes and timeouts

* ensure valid votes before processing qc

* format

* implement pool for timeouts and votes

* add tests for vote handling

* fix errors after merge

* format

* fix for QC manager

* format

* refactors and add tests

---------

Co-authored-by: ak88 <anders.holmbjerg@hotmail.com>
Co-authored-by: cicr99 <kr1000a@gmail.com>
Co-authored-by: Carmen Irene Cabrera Rodríguez <49727740+cicr99@users.noreply.github.com>

* Update OP Superchain chains (#9568)

Co-authored-by: emlautarom1 <emlautarom1@users.noreply.github.com>

* Auto-update fast sync settings (#9567)

Co-authored-by: rubo <rubo@users.noreply.github.com>

* Fix simulate errors (#9565)

* test

* Error codes

* Adjust errors

* More error codes

* Fix build

* Fix tests

* known failing tests

* Move simulate error codes

* Don't recache if from cache

* Improve ClockCache fast path

* Use smaller keys for Block and Header caches

* Pass via in

* Add number cache

* formatting

* sp

---------

Co-authored-by: Nikita Mescheryakov <root@nikitam.io>
Co-authored-by: Marek Moraczyński <marekm2504@gmail.com>
Co-authored-by: Mario Apra <mariotapra@gmail.com>
Co-authored-by: Ruben Buniatyan <rubo@users.noreply.github.com>
Co-authored-by: Amirul Ashraf <asdacap@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Bhargava Shastry <bhargava.shastry@ethereum.org>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Alexey Osipov <me@flcl.me>
Co-authored-by: core-repository-dispatch-app[bot] <173070810+core-repository-dispatch-app[bot]@users.noreply.github.com>
Co-authored-by: emlautarom1 <emlautarom1@users.noreply.github.com>
Co-authored-by: Ahmad Bitar <33181301+smartprogrammer93@users.noreply.github.com>
Co-authored-by: Galoretka <galoretochka@gmail.com>
Co-authored-by: Marc <Marchhill@users.noreply.github.com>
Co-authored-by: Marc Harvey-Hill <10379486+Marchhill@users.noreply.github.com>
Co-authored-by: ak88 <anders@nethermind.io>
Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>
Co-authored-by: sashaodessa <140454972+sashaodessa@users.noreply.github.com>
Co-authored-by: Forostovec <ilonaforostovec22@gmail.com>
Co-authored-by: Alex <alexb5dh@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: asdacap <1841324+asdacap@users.noreply.github.com>
Co-authored-by: Diptanshu Kakwani <dipkakwani@gmail.com>
Co-authored-by: Damian Orzechowski <114909782+damian-orzechowski@users.noreply.github.com>
Co-authored-by: Marcos Antonio Maceo <35319980+stdevMac@users.noreply.github.com>
Co-authored-by: viktorking7 <140458814+viktorking7@users.noreply.github.com>
Co-authored-by: Alvarez <140459501+prestoalvarez@users.noreply.github.com>
Co-authored-by: Ayman Bouchareb <Ayman.bouchareb@outlook.fr>
Co-authored-by: ak88 <anders.holmbjerg@hotmail.com>
Co-authored-by: Carmen Irene Cabrera Rodríguez <49727740+cicr99@users.noreply.github.com>
Co-authored-by: cicr99 <kr1000a@gmail.com>
Co-authored-by: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
kamilchodola added a commit that referenced this pull request Mar 6, 2026
The original auto-limit logic (PR #9504) set MaxOpenFiles to 80% of
the system ulimit per database, but with ~15 databases the combined
limit far exceeded the OS total. This didn't prevent the macOS crash
it was designed to fix, and caused a ~43% P90 block processing
regression on Linux servers by forcing RocksDB to use an LRU file
handle cache instead of keeping all SST handles open.

Fix: only apply the restriction when the system limit is genuinely
low (< 16384, typical of macOS), and properly divide by the estimated
database count. High-limit systems (Linux servers) keep unlimited
RocksDB file handles for optimal read performance.
kamilchodola added a commit that referenced this pull request Mar 6, 2026
The auto-limit logic (PR #9504) set MaxOpenFiles to 80% of the system
ulimit per database, but with ~15 databases the combined limit was
12x the OS total. This didn't prevent the macOS crash it was designed
to fix, and caused a ~43% P90 block processing regression on Linux
by forcing RocksDB to use an LRU file handle cache instead of keeping
all SST handles open.

Fix: properly divide by the estimated database count (~15) so the
combined file handle usage stays within the OS limit.
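The arithmetic behind this commit message can be sketched as below. The database count of 15 is the estimate quoted in the message; the helper names are hypothetical, and keeping the 80% factor when dividing is an assumption, since the message does not spell out the exact formula.

```python
DB_COUNT = 15  # "~15 databases" is an estimate taken from the commit message


def combined_cap_per_db_80_percent(system_limit: int, db_count: int = DB_COUNT) -> int:
    """Worst-case total handles when every database may use 80% of the system limit."""
    per_db = system_limit * 8 // 10
    return per_db * db_count


def per_db_cap_divided(system_limit: int, db_count: int = DB_COUNT) -> int:
    """Assumed fix: share 80% of the system limit across all databases."""
    return max(1, system_limit * 8 // 10 // db_count)


if __name__ == "__main__":
    system_limit = 1024  # illustrative low limit
    print(combined_cap_per_db_80_percent(system_limit))   # roughly 12x the system limit
    print(per_db_cap_divided(system_limit) * DB_COUNT)    # stays below the system limit
```

With 15 databases each allowed 80% of the limit, the combined ceiling is 15 × 0.8 = 12 times the system limit, which is where the "12x" figure comes from.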
kamilchodola added a commit that referenced this pull request Mar 6, 2026
Reverts the MaxOpenFiles auto-adjustment logic from #9504 to test
whether it's the cause of P90 regression. Keeps SST skip checks
and IHardwareInfo.MaxOpenFilesLimit property intact.
kamilchodola added a commit that referenced this pull request Mar 6, 2026
Completely reverts the MaxOpenFiles auto-adjustment from #9504:
- Remove MaxOpenFilesLimit from IHardwareInfo interface
- Remove getrlimit P/Invoke and detection logic from HardwareInfo
- Remove MaxOpenFilesLimit from TestHardwareInfo
- Remove MaxOpenFiles auto-limit logic from RocksDbConfigFactory
- Remove related tests

Keeps SST skip checks and other unrelated changes intact.
kamilchodola added a commit that referenced this pull request Mar 6, 2026
Setting any finite MaxOpenFiles forces RocksDB into LRU table cache
mode, adding overhead on every read. The original PR #9504 applied
80% of ulimit per-DB unconditionally, causing ~4% AVG / ~6% P90
regression on Linux servers with high ulimits (1048576).

Only cap MaxOpenFiles when the system limit is below 10,000 (macOS
default 256, restricted Docker 1024). On high-limit systems (typical
Linux servers), leave unlimited for best performance. This preserves
the original crash prevention for low-limit systems while eliminating
the performance regression on production servers.
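The threshold rule described in this message can be sketched as follows. The 10,000 cutoff comes from the message; `choose_max_open_files` is a hypothetical name, the low-limit formula (80% of the limit shared across an estimated 15 databases) is an assumption combined from the earlier messages, and -1 is the RocksDB `max_open_files` value that means all files are kept open.

```python
from typing import Optional

LOW_LIMIT_THRESHOLD = 10_000  # cutoff quoted in the commit message
UNLIMITED = -1                # RocksDB max_open_files = -1 keeps all files open


def choose_max_open_files(system_limit: Optional[int], db_count: int = 15) -> int:
    """Cap only on low-limit systems; leave high-limit systems unlimited.

    `system_limit` of None means the OS reported no limit.
    The capped value below is an assumed formula, not the one from the PR.
    """
    if system_limit is None or system_limit >= LOW_LIMIT_THRESHOLD:
        return UNLIMITED
    return max(1, system_limit * 8 // 10 // db_count)


if __name__ == "__main__":
    # Limits mentioned in the commit message: macOS default, restricted Docker, Linux server.
    for limit in (256, 1024, 1048576):
        print(limit, choose_max_open_files(limit))
```

Returning the unlimited sentinel on high-limit hosts avoids the LRU table cache path that the message identifies as the source of the regression.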
benaadams pushed a commit that referenced this pull request Mar 9, 2026
* fix: correct RocksDB MaxOpenFiles formula to divide by DB count

The auto-limit logic (PR #9504) set MaxOpenFiles to 80% of the system
ulimit per database, but with ~15 databases the combined limit was
12x the OS total. This didn't prevent the macOS crash it was designed
to fix, and caused a ~43% P90 block processing regression on Linux
by forcing RocksDB to use an LRU file handle cache instead of keeping
all SST handles open.

Fix: properly divide by the estimated database count (~15) so the
combined file handle usage stays within the OS limit.

* fix: only cap MaxOpenFiles on low-ulimit systems

Setting any finite MaxOpenFiles forces RocksDB into LRU table cache
mode, adding overhead on every read. The original PR #9504 applied
80% of ulimit per-DB unconditionally, causing ~4% AVG / ~6% P90
regression on Linux servers with high ulimits (1048576).

Only cap MaxOpenFiles when the system limit is below 10,000 (macOS
default 256, restricted Docker 1024). On high-limit systems (typical
Linux servers), leave unlimited for best performance. This preserves
the original crash prevention for low-limit systems while eliminating
the performance regression on production servers.

* deduplicate tests

* even better tests

* rename

---------

Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Development

Successfully merging this pull request may close these issues.

Automatically lower max open file

3 participants