Skip to content

Fix invariant checks for Permissioned Domains#6134

Merged
bthomee merged 15 commits intoXRPLF:developfrom
oleks-rip:pd_inv
Feb 10, 2026
Merged

Fix invariant checks for Permissioned Domains#6134
bthomee merged 15 commits intoXRPLF:developfrom
oleks-rip:pd_inv

Conversation

@oleks-rip
Copy link
Copy Markdown
Collaborator

High Level Overview of Change

Fixed typo in invariant checks, found by Immunefi

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

@oleks-rip oleks-rip requested a review from Tapanito December 10, 2025 16:40
@codecov
Copy link
Copy Markdown

codecov bot commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.9%. Comparing base (db2734c) to head (eeb990f).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop   #6134     +/-   ##
=========================================
- Coverage     79.9%   79.9%   -0.0%     
=========================================
  Files          840     840             
  Lines        65483   65514     +31     
  Branches      7251    7256      +5     
=========================================
+ Hits         52332   52343     +11     
- Misses       13151   13171     +20     
Files with missing lines Coverage Δ
src/xrpld/app/tx/detail/InvariantCheck.cpp 92.9% <100.0%> (+0.2%) ⬆️
src/xrpld/app/tx/detail/InvariantCheck.h 100.0% <ø> (ø)

... and 11 files with indirect coverage changes

Impacted file tree graph

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mvadari
Copy link
Copy Markdown
Collaborator

mvadari commented Dec 11, 2025

Does this need an amendment?

@oleks-rip
Copy link
Copy Markdown
Collaborator Author

oleks-rip commented Dec 11, 2025

Does this need an amendment?

No it didn't change anything. I can't imagine situation where before is different from after (except where before is null)

@oleks-rip oleks-rip added the Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. label Dec 11, 2025
Copy link
Copy Markdown
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This invariant has a few problems in general:

  1. sleStatus_ is an optional. It should be a vector, in case more than one permissioned domain is affected by a transaction. If that is supposed to be impossible, then that can be one of the first things that finalize checks.
    if (selStatus_.size() > 1) { JLOG...; return false; }
    
  2. The conditions in finalize are only checked for successful PermissionedDomainSet transactions. If no other transaction should be able to modify a permisioned domain object, then it should check that. That probably means adding a check for PermissionedDomainDelete, too. If it's not one of those two types, or not successful, and selStatus_ is not empty, finalize should fail. If it is one of those two types, and selStatus_.size() != 1, finalized should also fail. Maybe add an _isDeleted flag to SleStatus and check that it's false for Set, and true for Delete.
  3. Because you're checking the integrity of the object, and not checking any of the changes between before and after, you probably should ignore before entirely. If it's messed up, there's nothing you can do about it, and if after is correct, then you've fixed whatever problem might have existed.

All that said, that's not what this PR is about, though maybe it should be. The changes above would obviously require an amendment.

As for what the PR is about, I can't convince myself that it's impossible for this change to result in a different ledger. The main reason is that if, somehow, a bad object had ever been written to the ledger, this check will now catch it, whereas it wouldn't have caught it before the change. That will result in the transaction succeeding on some nodes, and failing on others.

@ximinez ximinez removed the Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. label Dec 12, 2025
@oleks-rip
Copy link
Copy Markdown
Collaborator Author

oleks-rip commented Dec 12, 2025

@ximinez

  1. sleStatus_ is an optional. It should be a vector, in case more than one permissioned domain is affected

modified and added check for size() > 1 (which is actually impossible as there is no such transaction)

  1. Because you're checking the integrity of the object, and not checking any of the changes between before and after

removed check for "before"

  1. The conditions in finalize are only checked for successful PermissionedDomainSet transactions.
    2.1 If no other transaction should be able to modify a permisioned domain object, then it should check that

I don't think it is possible. Invariant checks are all called for every tx, and I don't see a way to figure out that this particular PermissiondDomain sle object is modified by this particular tx. We just assumed that if there is "after" then it was PermissionedDomainSet

2.2 That probably means adding a check for PermissionedDomainDelete

Don't think that we need to add check for PermissionedDomainDelete. Do we actually care if the delete incorrect object ?

I can't convince myself that it's impossible for this change to result in a different ledger. The main reason is that if, somehow, a bad object had ever been written to the ledger, this check will now catch it, whereas it wouldn't have caught it before the change. That will result in the transaction succeeding on some nodes, and failing on others.

This is not true. Currently (without this PR) invariants check only "after" object for both cases ("before" and "after"), and only for PermissionedDomainSet transaction. Which mean "after" always will be present (no crash) and verified. And "before" check is not so important (as you mention early, and I agree). Also this fix is related to verification logic, not to the creation/modification logic.
So I don't see a way to create invalid object (remember that "after" was check always). Neither I see the behavior change after this fix, so not need for amendment.

Correct me if I'm wrong somewhere.

@oleks-rip oleks-rip force-pushed the pd_inv branch 3 times, most recently from e586813 to 34ea901 Compare December 12, 2025 20:15
@oleks-rip oleks-rip requested a review from ximinez December 12, 2025 20:30
@ximinez
Copy link
Copy Markdown
Collaborator

ximinez commented Dec 13, 2025

  1. sleStatus_ is an optional. It should be a vector, in case more than one permissioned domain is affected

modified and added check for size() > 1 (which is actually impossible as there is no such transaction)

EVERY failure case in an Invariant should be impossible. The intention is less "double-check that the transaction did the right thing", and more "if something goes extremely wrong, or an attacker finds an exploit, what are some reasonable checks we can do to stop it?"

That's why invariant unit tests have to modify the open ledger directly. The things they're testing for are, as far as we know, impossible to do with a transaction.

  1. Because you're checking the integrity of the object, and not checking any of the changes between before and after

removed check for "before"

👍

  1. The conditions in finalize are only checked for successful PermissionedDomainSet transactions.
    2.1 If no other transaction should be able to modify a permisioned domain object, then it should check that

I don't think it is possible. Invariant checks are all called for every tx, and I don't see a way to figure out that this particular PermissiondDomain sle object is modified by this particular tx. We just assumed that if there is "after" then it was PermissionedDomainSet

2.2 That probably means adding a check for PermissionedDomainDelete

Don't think that we need to add check for PermissionedDomainDelete. Do we actually care if the delete incorrect object ?

We don't care if the object is malformed if it's deleted, but what if it isn't deleted? What if it's deleted incorrectly?

Basic outline of how to check all of the above conditions in finalize() :

auto check = [...I think you can leave the lambda unchanged...]
if (view.rules.enabled(fixPermissionedDomainInvariant))
{
    if (result != tesSUCCESS && sleStatus_.size() == 0)
        // Nothing was changed by a failed tx. All good.
        return true;

    if (sleStatus_.size() > 1)
    {
        JLOG(j.fatal()) << "Invariant failed: transaction affected more than 1 "
                           "entry."; 
        return false;
    }

    switch (tx.getTxnType())
    {
        case ttPERMISSIONED_DOMAIN_SET:
        {
            if (sleStatus_.size() == 0)
            {
                 [Assuming that PDSet _must_ change an object, log and return false, otherwise return true]
                 JLOG(j.fatal()) << "PermissionedDomainSet should only affect one domain object";
                 return false;
            }
            auto const& sleStatus = sleStatus_[0];
            if(sleStatus.isDelete) [Requires adding isDelete to the struct and seting it in `visitEntry()`]
            {
                 JLOG(j.fatal()) << "Domain object deleted by PermissionedDomainSet";
                 return false;
            }
            return check(sleStatus, j);
        }
        case ttPERMISSIONED_DOMAIN_DELETE:
        {
            if(sleStatus_.size() == 0)
            {
                 [Assuming that PDDelete _must_ delete an object, log and return false, otherwise return true]
                 JLOG(j.fatal()) << "No domain objects affected by PermissionedDomainDelete";
                 return false;
            }
            if(!sleStatus_[0].isDelete)	[Requires adding isDelete to the struct and seting it in `visitEntry()`]
            {
                 JLOG(j.fatal()) << "Domain object modified, but not deleted by PermissionedDomainDelete";
                 return false;
            }
            return true;
        }
        default:
        {
            if (sleStatus_.size() != 0)
            {
                JLOG(j.fatal()) << sleStatus_.size() << " domain object(s) affected by an unauthorized transaction.";
                return false;
            }
            return true;
        }
    }
}
else
{
    // The pre-amendment behavior
    if (tx.getTxnType() != ttPERMISSIONED_DOMAIN_SET || result != tesSUCCESS)
        return true;

    return !sleStatus_.empty() ? check(sleStatus_[0], j) : true;
}

I can't convince myself that it's impossible for this change to result in a different ledger. The main reason is that if, somehow, a bad object had ever been written to the ledger, this check will now catch it, whereas it wouldn't have caught it before the change. That will result in the transaction succeeding on some nodes, and failing on others.

This is not true. Currently (without this PR) invariants check only "after" object for both cases ("before" and "after"), and only for PermissionedDomainSet transaction. Which mean "after" always will be present (no crash) and verified. And "before" check is not so important (as you mention early, and I agree).

Ironically, I realized that removing the before check does not require an amendment for exactly this reason.

Also this fix is related to verification logic, not to the creation/modification logic. So I don't see a way to create invalid object (remember that "after" was check always). Neither I see the behavior change after this fix, so not need for amendment.

"I don't see a way to create invalid object" is exactly what Invariants are for. You may not see a way to create an invalid object, or two objects, or to delete an object that shouldn't be deleted, but that doesn't mean that there isn't one.

Back to what I said earlier, the intention of Invariants is to catch those impossible situations that end up happening despite all our best efforts. Maybe you made a mistake. Maybe there's some subtle exploit that nobody can predict. Or maybe an inexperienced developer makes a change five years from now without understanding all the implications and consequences.

The questions to ask yourself are

  • "How could I have screwed up?"
  • "How can someone else screw this up?"
  • "If someone does screw this up, what are the things that are most important to prevent?"

For example, let's assume your implementation of the transactors is perfect. Great. Down the road, there's a new feature that has nothing to do with those transactions. Product, design, or the developer decides it would be cool to automatically create a Permissioned Domain automatically as part of the new HeyThisIsCool transaction. The engineer screws up. They implement it without referring to the current transactors, or using any of the helpers that you've written. The credentials are unsorted, or have duplicates, or there are too many of them, or whatever.

Oh, and they don't look at the Invariants either. Maybe they figure "Hey, if the invariants don't complain, then it must be ok."

As currently written, the Invariants won't complain.

And to add icing to the cake, the code review doesn't catch it. And testing maybe doesn't even know it exists, or they don't thoroughly exercise all the possible edge cases.

"Oh, that can't happen..."

Despite a very good developer implementing the feature, and a boatload of time spent on the code reviews, including onr by me, that issue was discovered and reported by external contributors (including a former Rippler, but we can't count on that). I don't remember the exact timeline, but I think it was before QA got their hands on it, Even so, I don't think they would have caught it, because they're not in the habit of checking the entire object tree for stray objects. (Maybe they should?)

@ximinez
Copy link
Copy Markdown
Collaborator

ximinez commented Dec 22, 2025

image

Please do not rewrite history on active PRs in this repo. It makes life harder for reviewers. Instead, add your changes in new commits, and when updating from the base branch, do a merge instead of a rebase.

(If there are special circumstances that require it, please document what you did and why.)

Copy link
Copy Markdown
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so much better and safer! Thank you for making the changes. I left a couple of suggestions, but they're all pretty minor, and easy to fix.

@vvysokikh1 vvysokikh1 added the Blocked on requested changes Reviewers have requested changes which must be addressed or responded to label Jan 2, 2026
@oleks-rip oleks-rip requested a review from ximinez January 5, 2026 22:02
@Tapanito Tapanito self-requested a review January 14, 2026 13:28
Copy link
Copy Markdown
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the absurdly long delay getting back to this. Thanks for making the changes.

@oleks-rip oleks-rip added Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. and removed Blocked on requested changes Reviewers have requested changes which must be addressed or responded to labels Jan 29, 2026
@bthomee bthomee merged commit e11f619 into XRPLF:develop Feb 10, 2026
26 checks passed
gregtatcam added a commit to gregtatcam/rippled that referenced this pull request Mar 7, 2026
* chore: Set ColumnLimit to 120 in clang-format (XRPLF#6288)

This change updates the ColumnLimit from 80 to 120, and applies clang-format to reformat the code.

* chore: Remove unnecessary `boost::system` requirement from conanfile (XRPLF#6290)

* chore: Add cmake-format pre-commit hook (XRPLF#6279)

This change adds `cmake-format` as. a pre-commit hook. The style file closely matches that in Clio, and they will be made to be equivalent over time. For now, some files have been excluded, as those need some manual adjustments, which will be done in future changes.

* chore: Format all cmake files without comments (XRPLF#6294)

* ci: Update hashes of XRPLF/actions (XRPLF#6316)

This updates the hashes of all XRPLF/actions to their latest versions.

* chore: Add upper-case match for ARM64 in CompilationEnv (XRPLF#6315)

* fix: Restore config changes that broke standalone mode (XRPLF#6301)

When support was added for `xrpld.cfg` in addition to `rippled.cfg` in XRPLF#6098, as part of an effort to rename occurrences of ripple(d) to xrpl(d), the clearing and creation of the data directory were modified for what, at the time, seemed to result in an equivalent code flow. This has turned out to not be true, which is why this change restores two modifications to `Config.cpp` that currently break running the binary in standalone mode.

* docs: Update API changelog, add APIv2+APIv3 version documentation (XRPLF#6308)

This change cleans up the `API-CHANGELOG.md` file. It moves the version-specific documentation to other files and fleshes out the changelog with all the API-related changes in each version.

* chore: Add .zed editor config directory to .gitignore (XRPLF#6317)

This change adds the project configuration directory to `.gitignore` for the `zed` editor. 

As per the [documentation](https://zed.dev/docs/remote-development?highlight=.zed#zed-settings), the project configuration files are stored in the `.zed` directory at the project root dir.

* fix: Deletes expired NFToken offers from ledger (XRPLF#5707)

This change introduces the `fixExpiredNFTokenOfferRemoval` amendment that allows expired offers to pass through `preclaim()` and be deleted in `doApply()`, following the same pattern used for expired credentials.

* refactor: Add ServiceRegistry to help modularization (XRPLF#6222)

Currently we're passing the `Application` object around, whereby the `Application` class acts more like a service registry that gives other classes access to other services. In order to allow modularization, we should replace `Application` with a service registry class so that modules depending on `Application` for other services can be moved easily. This change adds the `ServiceRegistry` class.

* chore: Remove unity builds (XRPLF#6300)

Unity builds were intended to speed up builds, by bundling multiple files into compilation units. However, now that ccache is available on all platforms, there is no need for unity builds anymore, as ccache stores compiled individual build objects for reuse. This change therefore removes the ability to make unity builds.

* refactor: Replace include guards by '#pragma once' (XRPLF#6322)

This change replaces all include guards in the `src/` and `include/` directories by `#pragma once`.

* chore: Remove unnecessary script (XRPLF#6326)

* chore: Update secp256k1 and openssl (XRPLF#6327)

* fix typo in LendingHelpers unit-test (XRPLF#6215)

* fix: Increment sequence when accepting new manifests (XRPLF#6059)

The `ManifestCache::applyManifest` function was returning early without incrementing `seq_`. `OverlayImpl `uses this sequence to identify/invalidate a cached `TMManifests` message, which is exchanged with peers on connection. Depending on network size, startup sequencing, and topology, this can cause syncing issues. This change therefore increments `seq_` when a new manifest is accepted.

* refactor: Update secp256k1 to 0.7.1 (XRPLF#6331)

The latest secp256k1 release, 0.7.1, contains bug fixes that we may benefit from, see https://github.com/bitcoin-core/secp256k1/blob/master/CHANGELOG.md.

* chore: Restore unity builds (XRPLF#6328)

In certain cases, such as when modifying headers used by many compilation units, performing a unity build is slower than when performing a regular build with `ccache` enabled. There is also a benefit to a unity build in that it can detect things such as macro redefinitions within the group of files that are compiled together as a unit. This change therefore restores the ability to perform unity builds. However, instead of running every configuration with and without unity enabled, it is now only enabled for a single configuration to maintain lower computational use.

As part of restoring the code, it became clear that currently two configurations have coverage enabled, since the check doesn't focus specifically on Debian Bookworm so it also applies to Debian Trixie. This has been fixed too in this change.

* perf: Remove unnecessary caches (XRPLF#5439)

This change removes the cache in `DatabaseNodeImp` and simplifies the caching logic in `SHAMapStoreImp`. As NuDB and RocksDB internally already use caches, additional caches in the code are not very valuable or may even be unnecessary, as also confirmed during preliminary performance analyses.

* chore: Remove CODEOWNERS (XRPLF#6337)

* test: Add file and line location to Env (XRPLF#6276)

This change uses `std::source_location` to output the file and line location of the call that triggered a failed transaction.

* refactor: Fix spelling issues in tests (XRPLF#6199)

This change removes the `src/tests` exception from the `cspell` config and fixes all the issues that arise as a result. No functionality/test change.

* refactor: Change main thread name to `xrpld-main` (XRPLF#6336)

This change builds on the thread-renaming PR (XRPLF#6212), by renaming the main thread name to reduce ambiguity in performance monitoring tools.

* fix: Update invariant checks for Permissioned Domains (XRPLF#6134)

* refactor: Modularize WalletDB and Manifest (XRPLF#6223)

This change modularizes the `WalletDB` and `Manifest`. Note that the wallet db has nothing to do with account wallets and it stores node configuration, which is why it depends on the manifest code.

* refactor: Modularize RelationalDB (XRPLF#6224)

The rdb module was not properly designed, which is fixed in this change. The module had three classes:
1) The abstract class `RelationalDB`.
2) The abstract class `SQLiteDatabase`, which inherited from `RelationalDB` and added some pure virtual methods.
3) The concrete class `SQLiteDatabaseImp`, which inherited from `SQLiteDatabase` and implemented all methods.

The updated code simplifies this as follows:
* The `SQLiteDatabaseImp` has become `SQLiteDatabase`, and
* The former `SQLiteDatabase `has merged with `RelationalDatabase`.

* chore: Fix `gcov` lib coverage build failure on macOS (XRPLF#6350)

For coverage builds, we try to link against the `gcov` library (specific to the environment). But as macOS doesn't have this library and thus doesn't have the coverage tools to generate reports, the coverage builds on that platform were failing on linking.

We actually don't need to explicitly force this linking, as the `CodeCoverage` file already has correct detection logic (currently on lines 177-193), which is invoked when the `--coverage` flag is provided:
* AppleClang: Uses `xcrun -f llvm-cov` to set `GCOV_TOOL="llvm-cov gcov"`.
* Clang: Finds `llvm-cov` to set `GCOV_TOOL="llvm-cov gcov"`.
* GCC: Finds `gcov` to set `GCOV_TOOL="gcov"`.
The `GCOV_TOOL` is then passed to `gcovr` on line 416, so the correct tool is used for processing coverage data.

This change therefore removes the `gcov` suffix from lines 473 and 475 in the `CodeCoverage.cmake` file.

* refactor: Modularize the NetworkOPs interface (XRPLF#6225)

This change moves the NetworkOPs interface into `libxrpl` and it leaves its implementation in `xrpld`.

* chore: Fix minor issues in comments (XRPLF#6346)

* refactor: Modularize `HashRouter`, `Conditions`, and `OrderBookDB` (XRPLF#6226)

This change modularizes additional components by moving code to `libxrpl`.

* chore: Update clang-format to 21.1.8 (XRPLF#6352)

* refactor: Decouple app/tx from `Application` and `Config` (XRPLF#6227)

This change decouples app/tx from `Application` and `Config` to clear the way to moving transactors to `libxrpl`.

* refactor: Modularize app/tx (XRPLF#6228)

* ci: Add clang tidy workflow to ci (XRPLF#6369)

* chore: Set cmake-format width to 100 (XRPLF#6386)

* chore: Set clang-format width to 100 in config file (XRPLF#6387)

* chore: Apply clang-format width 100 (XRPLF#6387)

* Fix tautological assertion (XRPLF#6393)

* ci: Add dependabot config (XRPLF#6379)

* ci: Build docs in PRs and in private repos (XRPLF#6400)

* ci: [DEPENDABOT] bump codecov/codecov-action from 5.4.3 to 5.5.2 (XRPLF#6398)

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5.4.3 to 5.5.2.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@18283e0...671740a)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: 5.5.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix merge conflicts, format, spelling,
replace all include guards by #pragma once

* Fix non-unity build

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Ayaz Salikhov <mathbunnyru@users.noreply.github.com>
Co-authored-by: Ed Hennis <ed@ripple.com>
Co-authored-by: Bart <bthomee@users.noreply.github.com>
Co-authored-by: Mayukha Vadari <mvadari@ripple.com>
Co-authored-by: Vito Tumas <5780819+Tapanito@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jingchen <a1q123456@users.noreply.github.com>
Co-authored-by: Niq Dudfield <ndudfield@gmail.com>
Co-authored-by: Valentin Balaschenko <13349202+vlntb@users.noreply.github.com>
Co-authored-by: Olek <115580134+oleks-rip@users.noreply.github.com>
Co-authored-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Co-authored-by: nuxtreact <nuxtreact@outlook.com>
Co-authored-by: Sergey Kuznetsov <skuznetsov@ripple.com>
Co-authored-by: Ayaz Salikhov <asalikhov@ripple.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants