
perf: Remove unnecessary caches #5439

Merged
bthomee merged 41 commits into develop from bthomee/disable-cache
Feb 6, 2026

Conversation

@bthomee
Collaborator

@bthomee bthomee commented May 20, 2025

High Level Overview of Change

This change removes the cache in DatabaseNodeImp and simplifies the caching logic in SHAMapStoreImp.

Context of Change

The codebase contains many caches, and this PR is one of several that refactor the codebase with the goal of improving performance, such as by reducing memory use, to support future account, trust line, and transaction growth. In this case, as NuDB and RocksDB already use caches internally, additional caches in the code add little value and may even be unnecessary. A number of performance analyses demonstrated improved performance, and more extensive performance analyses will be performed before this change is released.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

@bthomee bthomee added the QE test required label (RippleX QE Team must look at this PR) May 20, 2025
@bthomee bthomee marked this pull request as ready for review May 20, 2025 14:15
@bthomee bthomee requested review from a team and vlntb May 20, 2025 14:15
@codecov

codecov bot commented May 20, 2025

Codecov Report

❌ Patch coverage is 29.62963% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.9%. Comparing base (25d7c2c) to head (32d1414).
⚠️ Report is 1 commit behind head on develop.

Files with missing lines Patch % Lines
src/libxrpl/nodestore/DatabaseNodeImp.cpp 20.8% 19 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           develop   #5439   +/-   ##
=======================================
  Coverage     79.9%   79.9%           
=======================================
  Files          840     840           
  Lines        65549   65483   -66     
  Branches      7266    7250   -16     
=======================================
- Hits         52352   52318   -34     
+ Misses       13197   13165   -32     
Files with missing lines Coverage Δ
include/xrpl/nodestore/Database.h 69.2% <ø> (ø)
include/xrpl/nodestore/detail/DatabaseNodeImp.h 72.7% <ø> (-3.7%) ⬇️
...nclude/xrpl/nodestore/detail/DatabaseRotatingImp.h 66.7% <ø> (ø)
src/libxrpl/nodestore/DatabaseRotatingImp.cpp 63.9% <ø> (+0.8%) ⬆️
src/xrpld/app/main/Application.cpp 70.9% <ø> (+0.1%) ⬆️
src/xrpld/app/misc/SHAMapStoreImp.cpp 76.2% <100.0%> (+0.4%) ⬆️
src/xrpld/app/misc/SHAMapStoreImp.h 95.7% <ø> (ø)
src/libxrpl/nodestore/DatabaseNodeImp.cpp 36.6% <20.8%> (+2.5%) ⬆️

... and 6 files with indirect coverage changes



@PeterChen13579
Contributor

Just looping in @kuznetsss before approving. I don't believe we do any node-level DB caching in Clio, right? So no changes should be needed on our side?

@kuznetsss
Contributor

As I understand it, this is rippled's internal database, so no changes are required on Clio's side.

@bthomee bthomee added this to the 2.6.0 (Q3 2025) milestone May 27, 2025
@bthomee bthomee removed this from the 2.6.0 (Q3 2025) milestone Aug 13, 2025
@sublimator
Contributor

I was looking into memory usage a few weeks ago too and noticed some oddities with the node store cache. Sharing my notes.


So regarding the NodeStore cache, like you, I'm of the opinion that it is actually largely pointless.

  • Peer messages use the SHAMap as source of truth and use a custom wire encoding
  • There's already a TreeNodeCache for SHAMap nodes that you may want to have lying around

Why was it ever there!? Well, it seems like back in the day, for some reason the TreeNodeCache was actually keyed by an id representing a node's position in the tree. This was also before NuDB. Think LevelDB and SQLite. So you had the tree node cache often missing, and a slow k/v store, so I'm guessing it was a backup cache.

It's kind of crazy: we read from the node store, then create a DecodedBlob from decompressed data, only to immediately make a copy of that, and then cache it "because reasons".

Nowadays? Shrug. It seems you can disable it by setting cache_age=0 and cache_size=0 in the [node_db] configuration.
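
As a concrete illustration, those two settings would go in the [node_db] stanza of rippled.cfg. A minimal sketch, where the type and path values are just placeholders; only the two cache keys matter here:

```ini
[node_db]
type=NuDB
path=./db/nudb
# Setting both of these to 0 leaves the NodeObject cache unconstructed
cache_size=0
cache_age=0
```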


Attached are my Claude notes from when I did the investigation.


NodeObject Cache Investigation

Summary of Findings

After investigating the NodeObject cache in rippled/xahaud, it appears to serve no clear purpose in the current architecture. Evidence suggests it might be a remnant from an earlier design that's now just adding overhead to node fetch operations. But without access to original design discussions, we can't be completely certain.

Historical Investigation

A Theory: TreeNodeCache Was Originally Keyed by NodeID

Looking at the July 2013 codebase (commit c645901), there's interesting evidence that might explain why the NodeObject cache exists. The code suggests there may have been a fundamental issue with how the TreeNodeCache was implemented.

What the 2013 Code Shows:

  1. The mTNByID Map (ripple_SHAMap.h:225):

    boost::unordered_map<SHAMapNode, SHAMapTreeNode::pointer> mTNByID;

    The map name itself - "Tree Nodes By ID" - suggests it was keyed by NodeID, not hash.

  2. The HashedObjectStore Cache (ripple_HashedObjectStore.h):

    class HashedObjectStore
    {
    public:
        HashedObjectStore (int cacheSize, int cacheAge);
        
        HashedObject::pointer retrieve (uint256 const& hash)
        {
            if (mLevelDB)
                return retrieveLevelDB (hash);
            return retrieveSQLite (hash);
        }
    
    private:
        TaggedCache<uint256, HashedObject, UptimeTimerAdapter> mCache;
        KeyCache <uint256, UptimeTimerAdapter> mNegativeCache;

    The HashedObjectStore had its own cache keyed by hash (uint256), separate from the TreeNodeCache.

  3. How getNode() Worked (ripple_SHAMap.cpp:204):

    SHAMapTreeNode::pointer node = checkCacheNode(id);  // Lookup by NodeID
    if (node) {
    #ifdef DEBUG
        if (node->getNodeHash() != hash)  // Then verify hash matches
            throw std::runtime_error("invalid node");
    #endif

    It seems to look up by NodeID first, then check if the hash matches - which could be problematic.

  4. The Fallback Path (ripple_SHAMap.cpp:820):

    SHAMapTreeNode::pointer SHAMap::fetchNodeExternalNT(const SHAMapNode& id, uint256 const& hash)
    {
        // When mTNByID misses...
        HashedObject::pointer obj(theApp->getHashedObjectStore().retrieve(hash));

    When the ID-based lookup failed, it fell back to a hash-based lookup in the HashedObjectStore.

  5. How HashedObjectStore Retrieved Objects (ripple_HashedObjectStore.cpp):

    HashedObject::pointer HashedObjectStore::retrieveLevelDB (uint256 const& hash)
    {
        HashedObject::pointer obj = mCache.fetch (hash);
        
        if (obj || mNegativeCache.isPresent (hash) || !theApp || !theApp->getHashNodeLDB ())
            return obj;
        
        // ... fetch from LevelDB if not in cache
        obj = LLRetrieve (hash, theApp->getHashNodeLDB ());
        
        if (!obj)
        {
            mNegativeCache.add (hash);
            return obj;
        }
        
        mCache.canonicalize (hash, obj);
        return obj;
    }

    The HashedObjectStore maintained its own cache using the hash as the key, which would succeed even when mTNByID missed.

This could have created an interesting problem:

  • The same content (same hash) might exist at different positions (different IDs) in different trees
  • An ID-based cache would miss when looking for that content at a different position
  • A hash-based cache would find it

Could NodeObject Cache Have Been a Workaround?

If the above theory is correct, the NodeObject cache (keyed by hash) might have been added to catch what the TreeNodeCache (keyed by ID) missed. This would explain why there were two caches doing seemingly similar things.

It's worth noting that in 2013, the backend databases (SQLite and LevelDB) would have been considerably slower than modern options like NuDB or RocksDB. Given these performance characteristics, having a secondary cache to avoid repeated database lookups when the TreeNodeCache missed would have made more sense.

What the 2014 Documentation Says

By July 2014, the team seemed aware of these issues. From the SHAMap README (commit ea44497):

"When we navigate the tree (say, like SHAMap::walkTo()) we currently ask each node for information that we could determine locally... The next refactor should remove all calls to SHAMapTreeNode::GetID()... Then we can change the SHAMap::mTNByID member to be mTNByHash."

This suggests they wanted to move away from ID-based lookups entirely.

Current State

Today's code uses direct pointer traversal:

std::shared_ptr<SHAMapTreeNode> node = parent->getChild(branch);
if (node || !backed_)
    return node;

It seems like the architecture has evolved significantly, but the NodeObject cache remains - possibly as a remnant of the old workaround.

Current Caching Layers

Looking at the current code, there appear to be three caching layers:

1. TreeNodeCache

  • Caches fully parsed tree nodes
  • Seems useful for historical ledger queries
  • Uses the modern hash-based lookup

2. NodeObject Cache

  • Caches raw serialized bytes WITHOUT the 9-byte prefix (the prefix is stripped during the copy)
  • Unclear what benefit it provides today
  • Forces an extra copy operation on every fetch to remove the prefix

3. In-Memory SHAMap

  • Direct pointer tree structure for current ledgers
  • Handles most operations without any cache lookups

Observed Inefficiencies

The Copy Problem

Every node fetch from disk appears to do an unnecessary copy:

// In DecodedBlob::createObject()
Blob data(m_objectData, m_objectData + m_dataBytes);  // COPY!

This happens because:

  • The decompressed data includes a 9-byte prefix
  • NodeObject doesn't want the prefix
  • So it copies everything except the first 9 bytes

Where NodeObject Cache Doesn't Seem to Help

  1. Current Ledger Operations: Use direct pointers, no cache needed
  2. Historical Queries: TreeNodeCache already has the parsed nodes
  3. P2P Syncing: Uses its own wire format
  4. Startup: Cache is empty, so it's just overhead
  5. RPC Queries: TreeNodeCache handles these better

Questions and Uncertainties

  • Was the NodeObject cache really a workaround for the ID-based TreeNodeCache? The evidence suggests it, but we can't be certain.
  • Are there any edge cases where the NodeObject cache provides value that we're missing?
  • Why wasn't it removed when the architecture changed? Fear of breaking something? Or does it serve some purpose we haven't identified?

Potential Impact of Removal

If the NodeObject cache truly serves no purpose, removing it could:

  • Eliminate the unnecessary copy on every disk read
  • Reduce memory usage
  • Simplify the code
  • Possibly improve performance

But without comprehensive testing, it's hard to be certain there aren't hidden dependencies.

Disabling the Cache

Interestingly, the NodeObject cache is already optional in the codebase. The cache initialization logic in DatabaseNodeImp.h shows how it can be disabled:

// Lines 46-76 from DatabaseNodeImp.h
std::optional<int> cacheSize, cacheAge;

if (config.exists("cache_size"))
{
    cacheSize = get<int>(config, "cache_size");
    if (cacheSize.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_size");
    }
}

if (config.exists("cache_age"))
{
    cacheAge = get<int>(config, "cache_age");
    if (cacheAge.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_age");
    }
}

if (cacheSize != 0 || cacheAge != 0)
{
    cache_ = std::make_shared<TaggedCache<uint256, NodeObject>>(
        "DatabaseNodeImp",
        cacheSize.value_or(0),
        std::chrono::minutes(cacheAge.value_or(0)),
        stopwatch(),
        j);
}

The key line is if (cacheSize != 0 || cacheAge != 0) - the cache is only created if at least one of these values is non-zero. Setting both to 0 leaves cache_ as nullptr.

Throughout DatabaseNodeImp.cpp, the code consistently checks if cache_ exists before using it:

// Line 55: storing objects
if (cache_ && !skipCache)

// Line 71: fetching objects  
if (cache_)

// Line 98: conditional fetch
cache_ ? cache_->fetch(hash) : nullptr

This means the cache can be disabled through configuration by setting both cache_age and cache_size to 0. When disabled:

  • All cache operations are skipped
  • The code falls back to direct backend access
  • No performance penalty from the unnecessary copy operation

This provides an easy way to test whether the cache provides any real benefit - just disable it and measure the impact.

Additional Problems: P2P Cache Pollution

Another major issue with the NodeObject cache is that peer requests pollute it with one-off data. When peers request nodes via TMGetObjectByHash, the code fetches from NodeStore, which adds the data to the cache even though it's quite possibly never going to be used again, especially if it's for a historical ledger.

This means if you have the cache configured, it will continuously grow with useless data every time peers request nodes from you. The cache fills with garbage that has near-zero probability of reuse, evicting any potentially useful entries. It's essentially uncontrolled memory growth triggered by external peer requests. See .ai-docs/nodeobject-cache-p2p-fix.md for a proposed solution.

Recommendations for Further Investigation

  1. Test with cache_size=0 and cache_age=0 to disable the cache completely
  2. Run benchmarks comparing performance with and without the cache
  3. Monitor memory usage and query latency in both configurations
  4. Look for any unexpected behavior when cache is disabled
  5. Investigate P2P request patterns to quantify cache pollution

The fact that the cache is already optional suggests the developers recognized it might not always be necessary. Testing with it disabled would be a low-risk way to validate whether it's truly vestigial.

@bthomee
Collaborator Author

bthomee commented Aug 14, 2025

@sublimator I loved the historical context and the results from your deep dive - the amount of detail is very much appreciated!

As for your recommendations, our old performance analysis platform didn't let me modify the rippled.cfg, hence I created this PR in which I completely removed the code so I could evaluate its memory and cpu usage. It did show memory improvements at a slight increase in CPU. However, as our platform turned out to have some shortcomings (for one, it measured system memory rather than process memory, while it was being run on a machine in the cloud that can be shared with an arbitrary number of other services), we have been working on a new platform. This new platform is just about ready, so I'll be rerunning the experiment there to ensure there indeed performance improvements without any ill side-effects.

@sublimator
Contributor

@bthomee

You're welcome, I assumed you could recontextualize/summarize it! Those were my Claude notes.

netOPs_ = &app_.getOPs();
ledgerMaster_ = &app_.getLedgerMaster();
fullBelowCache_ = &(*app_.getNodeFamily().getFullBelowCache());
treeNodeCache_ = &(*app_.getNodeFamily().getTreeNodeCache());
Contributor


@bthomee

It seems to be used by the SHAMap code itself. The tree node cache is more than a simple cache and is used for canonicalizing nodes. I was just deep diving into rippled's SHAMap recently, and then I recalled this PR.

   void SHAMap::canonicalize(SHAMapHash const& hash, std::shared_ptr<SHAMapTreeNode>& node) const
   {
       assert(backed_);
       assert(node->cowid() == 0);  // Only canonical nodes can be cached
       assert(node->getHash() == hash);
   
       f_.getTreeNodeCache(ledgerSeq_)
           ->canonicalize_replace_client(hash.as_uint256(), node);
   }

It's what is used to make sure there's only one instance of a given node when structurally sharing trees.

All you're doing here is removing the SHAMapStoreImp reference, but maybe that clear() is actually useful? I'm not sure when/why/how clearCaches() is being called, but it might actually be doing something beneficial to keep memory usage down there.

Contributor


Oh, wait, sorry it's calling freshenCache/freshenCaches

Contributor


    template <class CacheInstance>
    bool
    freshenCache(CacheInstance& cache)
    {
        std::uint64_t check = 0;

        for (auto const& key : cache.getKeys())
        {
            dbRotating_->fetchNodeObject(
                key, 0, NodeStore::FetchType::synchronous, true); // <- duplicate = true
            if (!(++check % checkHealthInterval_) && healthWait() == stopping)
                return true;
        }

        return false;
    }

By my reading, freshenCache is actually forcing any memory-resident nodes to be written with duplicate=true to the new backend during rotation. It's simply using the cache keys to determine which map nodes are in memory, rather than to "freshen" that particular cache.

See:
https://github.com/XRPLF/rippled/blob/cb52c9af001316a906665a10283f6f5dd271ebb3/src/xrpld/nodestore/detail/DatabaseRotatingImp.cpp#L132C10-L189

Given you can't actually remove the tree node cache, as it's relied upon for canonicalizing nodes, the question becomes whether you want the new backend to have those keys.

Contributor

@vlntb vlntb Sep 16, 2025


Great research, @sublimator. I followed your notes and agree excluding (i.e. shrinking/zero-sizing) treeNodeCache_ can hurt rotation performance. The freshenCache pass uses the cache’s keys to warm the new backend—if the cache is empty, we lose that warm-start and we’ll see more cold misses immediately after rotation.

We should keep freshening the cache using the tree node cache, but we don’t need to keep it as a class member variable (dropping the member pointer doesn’t save memory—just fetch at the call site and pass *ptr).

The other realisation from your comment is that we’re currently trying to make the memory footprint of rippled smaller, which may lead to increased internal latency / decreased throughput—similar to the behaviour we see around DB rotation.

Collaborator Author


Thanks for the great feedback and discussion - I restored the freshenCaches.

Contributor


Just saw these comments, thanks :)

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.



Comment on lines +73 to 80
    auto results = backend_->fetchBatch(batch).first;
    for (size_t i = 0; i < results.size(); ++i)
    {
        auto nObj = std::move(results[i]);
        size_t index = indexMap[cacheMisses[i]];
        auto const& hash = hashes[index];

        if (nObj)
        {
            // Ensure all threads get the same object
            if (cache_)
                cache_->canonicalize_replace_client(hash, nObj);
        }
        else
        {
            JLOG(j_.error()) << "fetchBatch - "
                             << "record not found in db or cache. hash = "
                             << strHex(hash);
            if (cache_)
            {
                auto notFound = NodeObject::createObject(hotDUMMY, {}, hash);
                cache_->canonicalize_replace_client(hash, notFound);
                if (notFound->getType() != hotDUMMY)
                    nObj = std::move(notFound);
            }
        }
    }

Copilot AI Feb 3, 2026


fetchBatch now returns backend_->fetchBatch(batch).first directly. This can return a vector whose size does not match hashes.size() (e.g., the NullBackend implementation returns an empty pair), which is a behavioral change from the prior code that always returned hashes.size() entries. Consider enforcing/normalizing the result size (and/or handling a non-ok status) so callers can safely assume positional correspondence with the input hashes.

Collaborator Author


Added an assert to confirm. However, implementations in MemoryFactory, NuDBFactory, and RocksDBFactory show that the same number of output objects are created as input hashes are provided.

Collaborator


Added an assert to confirm. However, implementations in MemoryFactory, NuDBFactory, and RocksDBFactory show that the same number of output objects are created as input hashes are provided.

I think @copilot is wrong here. The original code did a move assignment, which updates the size() of results to match whatever was returned. The initial size is essentially ignored.

You can see what I'm talking about with this unit test case snippet:

        std::vector<std::string> a{std::size_t{1024}};
        std::vector<std::string> b{std::size_t{2}};
        std::vector<std::string> c{std::size_t{2048}};
        BEAST_EXPECT(a.size() == 1024);
        BEAST_EXPECT(b.size() == 2);
        BEAST_EXPECT(c.size() == 2048);
        a = b;
        BEAST_EXPECT(a.size() == 2);
        BEAST_EXPECT(b.size() == 2);
        BEAST_EXPECT(c.size() == 2048);
        c = std::move(b);
        BEAST_EXPECT(a.size() == 2);
        BEAST_EXPECT(b.size() == 0);
        BEAST_EXPECT(c.size() == 2);

I'd go so far as to say that the assert is wrong, because as it said, NullBackend intentionally returns an empty vector.

Collaborator


I just looked back again at the original code, and noticed that it guaranteed that results had the same size as hashes by creating it first, then copying the individual items over by index after canonicalizing. (I also realized that there was no bounds checking, so that if the backend_ returned a larger array, it would try to write past the end of results.)

To preserve that behavior, I suggest adding results.resize(hashes.size()); after the vector is created.

Collaborator Author


Done.

Note that there's an implicit assumption that results[i] refers to the same element as hashes[i] (both in the new and old code). If the backend gives us back more or fewer results than input hashes, then perhaps it would be useful to raise a bigger stink about it - if the number of results is different, then how can we trust they are even in the right order?

Collaborator


Note that there's an implicit assumption that results[i] refers to the same element as hashes[i] (both in the new and old code). If the backend gives us back more or fewer results than input hashes, then perhaps it would be useful to raise a bigger stink about it - if the number of results is different, then how can we trust they are even in the right order?

That's a good question, in both the old and new code. Perhaps have an assertion before resizing that results.size() == hashes.size() || results.empty(). If it's empty, then none of the items were found. Or do you think it should be a bigger stink like a throw or LogicError?

Collaborator Author


I added the assertion. I prefer to play it safe and find the error in the logs, rather than have all nodes on the network crash if we missed an edge case.

Contributor

Copilot AI commented Feb 3, 2026

@ximinez I've opened a new pull request, #6323, to work on those changes. Once the pull request is ready, I'll request review from you.

Collaborator

@ximinez ximinez left a comment


I left some comments on existing threads

Copilot AI review requested due to automatic review settings February 3, 2026 22:29
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.



Copilot AI review requested due to automatic review settings February 4, 2026 19:01
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.



Collaborator

@ximinez ximinez left a comment


One question, but either way this looks good.

// Also clear the FullBelowCache so its generation counter is bumped.
// This prevents stale "full below" markers from persisting across
// backend rotation/online deletion and interfering with SHAMap sync.
app_.getNodeFamily().getFullBelowCache()->clear();
Collaborator


Should you check that getFullBelowCache() doesn't return nullptr? I don't think the original implementation did.

Collaborator Author

@bthomee bthomee Feb 5, 2026


I see that the full below cache is created in the NodeFamily constructor, which in turn is created in the constructor of ApplicationImp, and that is called via make_Application in Main.cpp.

In no place do I see where the full below cache could become invalidated, since the only functions that can reach the cache are in the application, and the node family that holds the cache will only release it once the application is destroyed.

(If we should check for nullptr then there are many other places where it can go wrong too - see the freshenCaches function below.)

@bthomee bthomee added and then removed the QE test required label (RippleX QE Team must look at this PR) Feb 5, 2026
@bthomee bthomee merged commit 677758b into develop Feb 6, 2026
27 checks passed
@bthomee bthomee deleted the bthomee/disable-cache branch February 6, 2026 14:42
gregtatcam added a commit to gregtatcam/rippled that referenced this pull request Mar 7, 2026
* chore: Set ColumnLimit to 120 in clang-format (XRPLF#6288)

This change updates the ColumnLimit from 80 to 120, and applies clang-format to reformat the code.

* chore: Remove unnecessary `boost::system` requirement from conanfile (XRPLF#6290)

* chore: Add cmake-format pre-commit hook (XRPLF#6279)

This change adds `cmake-format` as a pre-commit hook. The style file closely matches that in Clio, and they will be made to be equivalent over time. For now, some files have been excluded, as those need some manual adjustments, which will be done in future changes.

* chore: Format all cmake files without comments (XRPLF#6294)

* ci: Update hashes of XRPLF/actions (XRPLF#6316)

This updates the hashes of all XRPLF/actions to their latest versions.

* chore: Add upper-case match for ARM64 in CompilationEnv (XRPLF#6315)

* fix: Restore config changes that broke standalone mode (XRPLF#6301)

When support was added for `xrpld.cfg` in addition to `rippled.cfg` in XRPLF#6098, as part of an effort to rename occurrences of ripple(d) to xrpl(d), the clearing and creation of the data directory were modified for what, at the time, seemed to result in an equivalent code flow. This has turned out to not be true, which is why this change restores two modifications to `Config.cpp` that currently break running the binary in standalone mode.

* docs: Update API changelog, add APIv2+APIv3 version documentation (XRPLF#6308)

This change cleans up the `API-CHANGELOG.md` file. It moves the version-specific documentation to other files and fleshes out the changelog with all the API-related changes in each version.

* chore: Add .zed editor config directory to .gitignore (XRPLF#6317)

This change adds the project configuration directory to `.gitignore` for the `zed` editor. 

As per the [documentation](https://zed.dev/docs/remote-development?highlight=.zed#zed-settings), the project configuration files are stored in the `.zed` directory at the project root dir.

* fix: Deletes expired NFToken offers from ledger (XRPLF#5707)

This change introduces the `fixExpiredNFTokenOfferRemoval` amendment that allows expired offers to pass through `preclaim()` and be deleted in `doApply()`, following the same pattern used for expired credentials.

* refactor: Add ServiceRegistry to help modularization (XRPLF#6222)

Currently we're passing the `Application` object around, whereby the `Application` class acts more like a service registry that gives other classes access to other services. In order to allow modularization, we should replace `Application` with a service registry class so that modules depending on `Application` for other services can be moved easily. This change adds the `ServiceRegistry` class.

* chore: Remove unity builds (XRPLF#6300)

Unity builds were intended to speed up builds, by bundling multiple files into compilation units. However, now that ccache is available on all platforms, there is no need for unity builds anymore, as ccache stores compiled individual build objects for reuse. This change therefore removes the ability to make unity builds.

* refactor: Replace include guards by '#pragma once' (XRPLF#6322)

This change replaces all include guards in the `src/` and `include/` directories by `#pragma once`.

* chore: Remove unnecessary script (XRPLF#6326)

* chore: Update secp256k1 and openssl (XRPLF#6327)

* fix typo in LendingHelpers unit-test (XRPLF#6215)

* fix: Increment sequence when accepting new manifests (XRPLF#6059)

The `ManifestCache::applyManifest` function was returning early without incrementing `seq_`. `OverlayImpl` uses this sequence to identify/invalidate a cached `TMManifests` message, which is exchanged with peers on connection. Depending on network size, startup sequencing, and topology, this can cause syncing issues. This change therefore increments `seq_` when a new manifest is accepted.

* refactor: Update secp256k1 to 0.7.1 (XRPLF#6331)

The latest secp256k1 release, 0.7.1, contains bug fixes that we may benefit from, see https://github.com/bitcoin-core/secp256k1/blob/master/CHANGELOG.md.

* chore: Restore unity builds (XRPLF#6328)

In certain cases, such as when modifying headers used by many compilation units, performing a unity build is slower than when performing a regular build with `ccache` enabled. There is also a benefit to a unity build in that it can detect things such as macro redefinitions within the group of files that are compiled together as a unit. This change therefore restores the ability to perform unity builds. However, instead of running every configuration with and without unity enabled, it is now only enabled for a single configuration to maintain lower computational use.

As part of restoring the code, it became clear that currently two configurations have coverage enabled: since the check doesn't specifically target Debian Bookworm, it also applies to Debian Trixie. This has been fixed in this change as well.

* perf: Remove unnecessary caches (XRPLF#5439)

This change removes the cache in `DatabaseNodeImp` and simplifies the caching logic in `SHAMapStoreImp`. As NuDB and RocksDB already maintain caches internally, the additional caches in the code add little value or may even be unnecessary, as preliminary performance analyses also confirmed.

* chore: Remove CODEOWNERS (XRPLF#6337)

* test: Add file and line location to Env (XRPLF#6276)

This change uses `std::source_location` to output the file and line location of the call that triggered a failed transaction.

* refactor: Fix spelling issues in tests (XRPLF#6199)

This change removes the `src/tests` exception from the `cspell` config and fixes all the issues that arise as a result. No functionality/test change.

* refactor: Change main thread name to `xrpld-main` (XRPLF#6336)

This change builds on the thread-renaming PR (XRPLF#6212), by renaming the main thread name to reduce ambiguity in performance monitoring tools.

* fix: Update invariant checks for Permissioned Domains (XRPLF#6134)

* refactor: Modularize WalletDB and Manifest (XRPLF#6223)

This change modularizes `WalletDB` and `Manifest`. Note that the wallet DB has nothing to do with account wallets; it stores node configuration, which is why it depends on the manifest code.

* refactor: Modularize RelationalDB (XRPLF#6224)

The rdb module was not properly designed, which is fixed in this change. The module had three classes:
1) The abstract class `RelationalDB`.
2) The abstract class `SQLiteDatabase`, which inherited from `RelationalDB` and added some pure virtual methods.
3) The concrete class `SQLiteDatabaseImp`, which inherited from `SQLiteDatabase` and implemented all methods.

The updated code simplifies this as follows:
* The `SQLiteDatabaseImp` has become `SQLiteDatabase`, and
* The former `SQLiteDatabase` has merged with `RelationalDatabase`.

* chore: Fix `gcov` lib coverage build failure on macOS (XRPLF#6350)

For coverage builds, we try to link against the `gcov` library (specific to the environment). But since macOS doesn't have this library (nor the coverage tools to generate reports), the coverage builds on that platform were failing at the link step.

We actually don't need to explicitly force this linking, as the `CodeCoverage` file already has correct detection logic (currently on lines 177-193), which is invoked when the `--coverage` flag is provided:
* AppleClang: Uses `xcrun -f llvm-cov` to set `GCOV_TOOL="llvm-cov gcov"`.
* Clang: Finds `llvm-cov` to set `GCOV_TOOL="llvm-cov gcov"`.
* GCC: Finds `gcov` to set `GCOV_TOOL="gcov"`.
The `GCOV_TOOL` is then passed to `gcovr` on line 416, so the correct tool is used for processing coverage data.

This change therefore removes the `gcov` suffix from lines 473 and 475 in the `CodeCoverage.cmake` file.
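
The per-compiler detection described above can be sketched in CMake as follows (variable names here are illustrative, not copied from `CodeCoverage.cmake`):

```cmake
# Sketch of the detection logic: pick the right gcov-compatible tool
# per compiler, then hand it to gcovr. No explicit link against a gcov
# library is needed for this to work.
if(CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang")
  execute_process(COMMAND xcrun -f llvm-cov
                  OUTPUT_VARIABLE LLVM_COV
                  OUTPUT_STRIP_TRAILING_WHITESPACE)
  set(GCOV_TOOL "${LLVM_COV} gcov")
elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  find_program(LLVM_COV llvm-cov)
  set(GCOV_TOOL "${LLVM_COV} gcov")
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
  find_program(GCOV_BIN gcov)
  set(GCOV_TOOL "${GCOV_BIN}")
endif()
# GCOV_TOOL is later passed to gcovr for processing coverage data.
```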

* refactor: Modularize the NetworkOPs interface (XRPLF#6225)

This change moves the NetworkOPs interface into `libxrpl`, leaving its implementation in `xrpld`.

* chore: Fix minor issues in comments (XRPLF#6346)

* refactor: Modularize `HashRouter`, `Conditions`, and `OrderBookDB` (XRPLF#6226)

This change modularizes additional components by moving code to `libxrpl`.

* chore: Update clang-format to 21.1.8 (XRPLF#6352)

* refactor: Decouple app/tx from `Application` and `Config` (XRPLF#6227)

This change decouples app/tx from `Application` and `Config` to clear the way to moving transactors to `libxrpl`.

* refactor: Modularize app/tx (XRPLF#6228)

* ci: Add clang tidy workflow to ci (XRPLF#6369)

* chore: Set cmake-format width to 100 (XRPLF#6386)

* chore: Set clang-format width to 100 in config file (XRPLF#6387)

* chore: Apply clang-format width 100 (XRPLF#6387)

* Fix tautological assertion (XRPLF#6393)

* ci: Add dependabot config (XRPLF#6379)

* ci: Build docs in PRs and in private repos (XRPLF#6400)

* ci: [DEPENDABOT] bump codecov/codecov-action from 5.4.3 to 5.5.2 (XRPLF#6398)

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5.4.3 to 5.5.2.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@18283e0...671740a)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: 5.5.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix merge conflicts, format, spelling,
replace all include guards by #pragma once

* Fix non-unity build

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Ayaz Salikhov <mathbunnyru@users.noreply.github.com>
Co-authored-by: Ed Hennis <ed@ripple.com>
Co-authored-by: Bart <bthomee@users.noreply.github.com>
Co-authored-by: Mayukha Vadari <mvadari@ripple.com>
Co-authored-by: Vito Tumas <5780819+Tapanito@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jingchen <a1q123456@users.noreply.github.com>
Co-authored-by: Niq Dudfield <ndudfield@gmail.com>
Co-authored-by: Valentin Balaschenko <13349202+vlntb@users.noreply.github.com>
Co-authored-by: Olek <115580134+oleks-rip@users.noreply.github.com>
Co-authored-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Co-authored-by: nuxtreact <nuxtreact@outlook.com>
Co-authored-by: Sergey Kuznetsov <skuznetsov@ripple.com>
Co-authored-by: Ayaz Salikhov <asalikhov@ripple.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@bthomee (Collaborator, Author) commented Mar 20, 2026:

/ai-review

@xrplf-ai-reviewer bot left a comment:

Two null pointer dereference risks identified in cache operations.

Review by Claude Opus 4.6 · Prompt: V12

```cpp
// Also clear the FullBelowCache so its generation counter is bumped.
// This prevents stale "full below" markers from persisting across
// backend rotation/online deletion and interfering with SHAMap sync.
app_.getNodeFamily().getFullBelowCache()->clear();
```
Null pointer risk: Guard cache access before clear() operation:

```cpp
if (freshenCache(*treeNodeCache_))
    return;
if (freshenCache(app_.getMasterTransaction().getCache()))
if (freshenCache(*app_.getNodeFamily().getTreeNodeCache()))
```
Null pointer dereference risk in cache access - add safety check:


Labels: QE test required (RippleX QE Team must look at this PR), Triaged (Issue/PR has been triaged for viability, liveliness, etc.)
