fix: reject invalid transactions at mempool entry with dry-run apply #620

Merged
ozgb merged 2 commits into main from ozgb-fix-valiation-discrepancy
Feb 6, 2026
Conversation

@ozgb ozgb commented Feb 5, 2026

Overview

Identified by Christos: after #608 was merged, transactions could get stuck in the mempool and never enter a block.

Problem

Transactions that pass ZK proof/signature validation (well_formed()) but whose guaranteed part would fail against the current ledger state get stuck in the mempool indefinitely.

The root cause is a validation gap between pool admission and block authoring:

  1. do_validate_transaction (pool entry/revalidation) only checks well_formed() — ZK proofs and signatures — but not whether the guaranteed part can actually be applied to the current ledger state
  2. get_verified_transaction writes Ok(()) to the soft cache after well_formed() passes — so pool revalidation always sees a cached success, even though pre_dispatch keeps rejecting the tx
  3. Result: transactions cycle indefinitely: "ready" → pre_dispatch reject → soft cache hit → still "ready" → repeat
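The cycle described above can be reproduced with a toy model. This is a minimal sketch, not the node's code: `Tx`, `PoolBeforeFix`, and `stuck_rounds` are hypothetical names, and the two boolean fields stand in for the real `well_formed()` check and the guaranteed-part apply check.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a transaction; only the two properties
/// relevant to the bug are modelled.
struct Tx {
    id: u64,
    well_formed: bool,      // ZK proofs and signatures verify
    applies_to_state: bool, // guaranteed part succeeds against current ledger state
}

/// Simplified model of the pre-fix pool: admission consults well_formed()
/// only, and every success is written to the soft cache.
struct PoolBeforeFix {
    soft_cache: HashSet<u64>,
}

impl PoolBeforeFix {
    fn new() -> Self {
        Self { soft_cache: HashSet::new() }
    }

    /// Models pool entry/revalidation before the fix.
    fn validate(&mut self, tx: &Tx) -> bool {
        if self.soft_cache.contains(&tx.id) {
            return true; // soft cache hit: nothing is re-checked
        }
        if tx.well_formed {
            self.soft_cache.insert(tx.id);
            true
        } else {
            false
        }
    }

    /// Models pre_dispatch: the stricter check used at block authoring.
    fn pre_dispatch(&self, tx: &Tx) -> bool {
        tx.well_formed && tx.applies_to_state
    }
}

/// Counts the rounds in which the tx is "ready" in the pool
/// but rejected by pre_dispatch.
fn stuck_rounds(tx: Tx, rounds: usize) -> usize {
    let mut pool = PoolBeforeFix::new();
    let mut stuck = 0;
    for _ in 0..rounds {
        let ready = pool.validate(&tx);
        let included = ready && pool.pre_dispatch(&tx);
        if ready && !included {
            stuck += 1;
        }
    }
    stuck
}
```

In this model a transaction that is well formed but cannot be applied is stuck in every round, however many rounds are simulated, which mirrors the "ready" → reject → cache hit loop.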

Solution

Three targeted changes in ledger/src/versions/common/mod.rs, all focused on making do_validate_transaction the single owner of the soft cache:

  1. Remove soft cache write from get_verified_transaction — the soft cache should only be written by do_validate_transaction, which has full context (including the apply result) to decide what to cache

  2. Add dry-run apply() in do_validate_transaction — after well_formed() passes, dry-run the transaction against current ledger state. Only cache Ok(()) on success (Success or PartialSuccess). Failures are not cached, so the tx will be fully re-checked on next revalidation

  3. Invalidate soft cache on entering do_validate_guaranteed_execution — always evict the soft cache entry when entering pre_dispatch, regardless of outcome. This forces the pool to fully re-validate the transaction after any block authoring attempt
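The three changes can be sketched together as follows. This is an illustrative model under stated assumptions, not the contents of ledger/src/versions/common/mod.rs: the function names follow the description above, but `Tx`, `Ledger`, `Validator`, `dry_run_apply`, and all signatures are hypothetical, and treating PartialSuccess as acceptable in pre_dispatch is an assumption.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, Debug, PartialEq)]
enum ApplyResult {
    Success,
    PartialSuccess,
    Failure,
}

struct Tx {
    id: u64,
    well_formed: bool,
}

/// Toy ledger (hypothetical): maps a tx id to the outcome a dry-run
/// apply would have against the current state. Unknown ids fail.
struct Ledger {
    outcomes: HashMap<u64, ApplyResult>,
}

impl Ledger {
    fn with_outcomes(pairs: &[(u64, ApplyResult)]) -> Self {
        Self { outcomes: pairs.iter().copied().collect() }
    }

    fn dry_run_apply(&self, tx: &Tx) -> ApplyResult {
        self.outcomes.get(&tx.id).copied().unwrap_or(ApplyResult::Failure)
    }
}

struct Validator {
    soft_cache: HashSet<u64>,
}

impl Validator {
    fn new() -> Self {
        Self { soft_cache: HashSet::new() }
    }

    /// Change 1: proof/signature check only, with no soft cache write.
    fn get_verified_transaction(&self, tx: &Tx) -> bool {
        tx.well_formed
    }

    /// Change 2: after well_formed() passes, dry-run apply and cache
    /// only Success or PartialSuccess. Failures are never cached.
    fn do_validate_transaction(&mut self, tx: &Tx, ledger: &Ledger) -> bool {
        if self.soft_cache.contains(&tx.id) {
            return true;
        }
        if !self.get_verified_transaction(tx) {
            return false;
        }
        match ledger.dry_run_apply(tx) {
            ApplyResult::Success | ApplyResult::PartialSuccess => {
                self.soft_cache.insert(tx.id);
                true
            }
            ApplyResult::Failure => false,
        }
    }

    /// Change 3: evict the soft cache entry on entry, regardless of outcome.
    /// Assumption: PartialSuccess is accepted by the strict check.
    fn do_validate_guaranteed_execution(&mut self, tx: &Tx, ledger: &Ledger) -> bool {
        self.soft_cache.remove(&tx.id);
        tx.well_formed && ledger.dry_run_apply(tx) != ApplyResult::Failure
    }
}
```

Keeping the cache write in a single function means there is exactly one place that decides what counts as a cacheable success, which is the "single owner" property the changes aim for.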

Flow after fix

  1. Tx enters pool → do_validate_transaction: well_formed() + apply() → if both pass, cache Ok(()) in soft cache
  2. Pool revalidation → soft cache hit → Ok(()) → stays "ready"
  3. Block author tries to include tx → do_validate_guaranteed_execution (pre_dispatch):
     • Immediately invalidates soft cache entry (regardless of outcome)
     • Does strict validation (apply check with real block context)
  4. Pool revalidation after block → soft cache miss → full re-check (well_formed() + apply() against current state)
     • If tx still valid: re-cached, stays in pool
     • If tx now invalid: rejected, removed from pool
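The flow above can be traced end to end with a small simulation. Again this is a sketch under assumptions: `Node`, `PoolEvent`, and `lifecycle` are hypothetical names, the ledger is reduced to a set of tx ids whose guaranteed part currently applies, and the ledger change is modelled as happening just before the block authoring attempt.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum PoolEvent {
    Cached,   // full check passed, Ok(()) written to the soft cache
    CacheHit, // soft cache hit, tx stays "ready"
    Rejected, // full check failed, tx leaves the pool
}

struct Tx {
    id: u64,
    well_formed: bool,
}

struct Node {
    soft_cache: HashSet<u64>,
    /// Hypothetical stand-in for ledger state: ids whose guaranteed
    /// part would currently apply.
    applicable: HashSet<u64>,
}

impl Node {
    /// Pool entry and revalidation (steps 1, 2 and 4).
    fn pool_validate(&mut self, tx: &Tx) -> PoolEvent {
        if self.soft_cache.contains(&tx.id) {
            return PoolEvent::CacheHit;
        }
        if tx.well_formed && self.applicable.contains(&tx.id) {
            self.soft_cache.insert(tx.id);
            PoolEvent::Cached
        } else {
            PoolEvent::Rejected
        }
    }

    /// Block authoring attempt (step 3): evict first, then check strictly.
    fn pre_dispatch(&mut self, tx: &Tx) -> bool {
        self.soft_cache.remove(&tx.id);
        tx.well_formed && self.applicable.contains(&tx.id)
    }
}

/// Runs steps 1 to 4 for one tx and records what the pool observes.
fn lifecycle(still_valid: bool) -> Vec<PoolEvent> {
    let tx = Tx { id: 7, well_formed: true };
    let mut node = Node {
        soft_cache: HashSet::new(),
        applicable: HashSet::from([7]),
    };
    let mut events = Vec::new();
    events.push(node.pool_validate(&tx)); // step 1: enters pool
    events.push(node.pool_validate(&tx)); // step 2: revalidation
    if !still_valid {
        node.applicable.remove(&7); // ledger state moved on
    }
    // Step 3: the outcome does not matter for the cache; eviction is unconditional.
    let _ = node.pre_dispatch(&tx);
    events.push(node.pool_validate(&tx)); // step 4: cache miss, full re-check
    events
}
```

With `still_valid` set to `false`, the final event is `Rejected` rather than `CacheHit`, which is the behaviour the fix is meant to produce.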

🗹 TODO before merging

  • Ready

📌 Submission Checklist

  • Changes are backward-compatible (or flagged if breaking)
  • Pull request description explains why the change is needed
  • Self-reviewed the diff
  • I have included a change file, or skipped for this reason:
  • If the changes introduce a new feature, I have bumped the node minor version
  • Update documentation (if relevant)
  • Updated AGENTS.md if build commands, architecture, or workflows changed
  • No new todos introduced

🧪 Testing Evidence

Please describe any additional testing aside from CI:

  • Additional tests are provided (if possible)

🔱 Fork Strategy

  • Node Runtime Update
  • Node Client Update
  • Other:
  • N/A

Links

Transactions that pass well_formed() (ZK proofs/signatures) but fail
apply() (guaranteed execution) were getting stuck in the mempool
indefinitely. The soft cache was caching Ok(()) after well_formed()
alone, so pool revalidation always saw success even though pre_dispatch
kept rejecting the tx.

Three changes to do_validate_transaction flow:
- Remove soft cache write from get_verified_transaction so that
  do_validate_transaction is the sole owner of the soft cache
- Add dry-run apply() in do_validate_transaction after well_formed(),
  only caching successes (failures are not cached)
- Invalidate soft cache entry at start of do_validate_guaranteed_execution
  to force full re-validation after block authoring attempts
@ozgb ozgb requested a review from a team as a code owner February 5, 2026 23:06
@CLAassistant

CLAassistant commented Feb 5, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions

github-actions Bot commented Feb 5, 2026


KICS version: v2.1.16

Category    Results
CRITICAL    0
HIGH        0
MEDIUM      96
LOW         12
INFO        83
TRACE       0
TOTAL       191

Metric                       Values
Files scanned                30
Files parsed                 30
Files failed to scan         0
Total executed queries       73
Queries failed to execute    0
Execution time               7

@ozgb ozgb enabled auto-merge February 5, 2026 23:11

@justinfrevert justinfrevert left a comment

If it resolves the stuck tx issue for now, seems fine. We might want to come back and find a way to not run apply for every tx though.

@ozgb ozgb added this pull request to the merge queue Feb 6, 2026
Merged via the queue into main with commit b424c6c Feb 6, 2026
46 of 47 checks passed
@ozgb ozgb deleted the ozgb-fix-valiation-discrepancy branch February 6, 2026 01:56
gilescope pushed a commit that referenced this pull request Apr 8, 2026
* add multi-sig operations to integration tests

* test: remove multisig conftest

removed:
- multisig conftest not needed because it was handled in `set_governance_to_multisig` fixture

* test(fix): xdist_group name

* test(fix): restore governance to single key after module is finished

* test: skip multisig if not configured

* test: remove unnecessary docstring

* test: fix incorrect cli parameters

---------

Co-authored-by: Radosław Sporny <404@rspo.dev>
m2ux added a commit that referenced this pull request Apr 23, 2026
* add multi-sig operations to integration tests

* test: remove multisig conftest

removed:
- multisig conftest not needed because it was handled in `set_governance_to_multisig` fixture

* test(fix): xdist_group name

* test(fix): restore governance to single key after module is finished

* test: skip multisig if not configured

* test: remove unnecessary docstring

* test: fix incorrect cli parameters

---------

Co-authored-by: Radosław Sporny <404@rspo.dev>
Signed-off-by: Mike Clay <mike.clay@shielded.io>