Skip to content

Conversation

@stickies-v
Copy link
Contributor

Mempool currently considers 4 limits regarding ancestor and descendant count and size, which get passed around between functions quite a bit. This PR uses CTxMemPool::Limits introduced in #25290 to simplify those signatures and callsites.

The purpose of this PR is to improve readability and maintenance, without behaviour change.

As noted in the first commit "refactor: mempool: change MemPoolLimits members to uint", we currently have an underflow issue where a user could pass a negative -limitancestorsize, which is eventually cast to an unsigned integer. This behaviour already exists. Because it's orthogonal and to minimize scope, I think this should be fixed in a separate PR.

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK to using the Limits struct instead of a list of ints. I think this is more readable, especially the static NoLimits when we just want ancestors calculated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the change from signed to unsigned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the commit message. Relevant part used to be:

These int64_t members later
get cast into a size_t in CTxMemPool::CTxMemPool() anyway

Is now:

These limits represent counts and sizes that should never be
negative. Currently, the int64_t members later on in the call stack
get cast into a size_t (without bounds checking) in 
CTxMemPool::CTxMemPool() anyway, so this type change does not
introduce any new underflow risks - it just makes them
unsigned earlier on in the process to minimize back-and-forth
type conversion.

Does that resolve it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to the code base so my question may look silly. Why the type of ancestor and descendant count are 64 bit? Isn't 32 bit more than sufficient? Although, I know that's not in the scope of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the int64_t members later on in the call stack
get cast into a size_t

Yeah so we have virtual size implemented as size_t, unsigned ints, and signed ints in various places in the codebase, and it would be good to switch to 1 type consistently. I don't think there's a PR to do all of them, but #23962 is one.

My current thinking is to use signed ints instead of size_t because we're doing arithmetic with these limits (see CalculateAncestorsAndCheckLimits, #23962 (comment)). So I always use int64_t to hold virtual sizes. Not sure where we're going long term 🤷, but that's why I don't really agree with switching to unsigned.

Why the type of ancestor and descendant count are 64 bit? Isn't 32 bit more than sufficient?

There's never a need for more than 32 bits, but switching partially would cause integer truncation somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your reasoning and the one from the linked comment, but I'm not sure it makes sense here. A lot of the comparisons we do are with other size_t's, e.g. from std::set::size(). So, we'd either have to unsafely cast them into an int64_t (probably fine, but... not ideal?) or implement a function that does it with bounds checking etc.

I'm not sure that's preferable? See e.g. the below diff, with some unsafe static_cast as well as implicit conversion (e.g. CheckPackageLimits() passing package.size() to CalculateAncestorsAndCheckLimits())

git diff
diff --git a/src/kernel/mempool_limits.h b/src/kernel/mempool_limits.h
index fd0979c6c..584af5679 100644
--- a/src/kernel/mempool_limits.h
+++ b/src/kernel/mempool_limits.h
@@ -17,20 +17,20 @@ namespace kernel {
  */
 struct MemPoolLimits {
     //! The maximum allowed number of transactions in a package including the entry and its ancestors.
-    uint64_t ancestor_count{DEFAULT_ANCESTOR_LIMIT};
+    int64_t ancestor_count{DEFAULT_ANCESTOR_LIMIT};
     //! The maximum allowed size in virtual bytes of an entry and its ancestors within a package.
-    uint64_t ancestor_size_vbytes{DEFAULT_ANCESTOR_SIZE_LIMIT_KVB * 1'000};
+    int64_t ancestor_size_vbytes{DEFAULT_ANCESTOR_SIZE_LIMIT_KVB * 1'000};
     //! The maximum allowed number of transactions in a package including the entry and its descendants.
-    uint64_t descendant_count{DEFAULT_DESCENDANT_LIMIT};
+    int64_t descendant_count{DEFAULT_DESCENDANT_LIMIT};
     //! The maximum allowed size in virtual bytes of an entry and its descendants within a package.
-    uint64_t descendant_size_vbytes{DEFAULT_DESCENDANT_SIZE_LIMIT_KVB * 1'000};
+    int64_t descendant_size_vbytes{DEFAULT_DESCENDANT_SIZE_LIMIT_KVB * 1'000};
 
     /**
      * @return MemPoolLimits with all the limits set to the maximum
      */
     static MemPoolLimits NoLimits()
     {
-        uint64_t no_limit{std::numeric_limits<uint64_t>::max()};
+        int64_t no_limit{std::numeric_limits<int64_t>::max()};
         return {no_limit, no_limit, no_limit, no_limit};
     }
 
diff --git a/src/txmempool.cpp b/src/txmempool.cpp
index 3097473ac..ee5ffff7b 100644
--- a/src/txmempool.cpp
+++ b/src/txmempool.cpp
@@ -183,14 +183,14 @@ void CTxMemPool::UpdateTransactionsFromBlock(const std::vector<uint256>& vHashes
     }
 }
 
-bool CTxMemPool::CalculateAncestorsAndCheckLimits(size_t entry_size,
-                                                  size_t entry_count,
+bool CTxMemPool::CalculateAncestorsAndCheckLimits(int64_t entry_size,
+                                                  int64_t entry_count,
                                                   setEntries& setAncestors,
                                                   CTxMemPoolEntry::Parents& staged_ancestors,
                                                   const Limits& limits,
                                                   std::string &errString) const
 {
-    size_t totalSizeWithAncestors = entry_size;
+    int64_t totalSizeWithAncestors = entry_size;
 
     while (!staged_ancestors.empty()) {
         const CTxMemPoolEntry& stage = staged_ancestors.begin()->get();
@@ -241,7 +241,7 @@ bool CTxMemPool::CheckPackageLimits(const Package& package,
             std::optional<txiter> piter = GetIter(input.prevout.hash);
             if (piter) {
                 staged_ancestors.insert(**piter);
-                if (staged_ancestors.size() + package.size() > limits.ancestor_count) {
+                if (static_cast<int64_t>(staged_ancestors.size() + package.size()) > limits.ancestor_count) {
                     errString = strprintf("too many unconfirmed parents [limit: %u]", limits.ancestor_count);
                     return false;
                 }
@@ -277,7 +277,7 @@ bool CTxMemPool::CalculateMemPoolAncestors(const CTxMemPoolEntry &entry,
             std::optional<txiter> piter = GetIter(tx.vin[i].prevout.hash);
             if (piter) {
                 staged_ancestors.insert(**piter);
-                if (staged_ancestors.size() + 1 > limits.ancestor_count) {
+                if (static_cast<int64_t>(staged_ancestors.size() + 1) > limits.ancestor_count) {
                     errString = strprintf("too many unconfirmed parents [limit: %u]", limits.ancestor_count);
                     return false;
                 }

Copy link
Contributor Author

@stickies-v stickies-v Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glozow:

So I always use int64_t to hold virtual sizes. Not sure where we're going long term 🤷, but that's why I don't really agree with switching to unsigned.

I went through all the places where any of the 4 members of MemPoolLimits are accessed. In all of them, except for one (see below), they are immediately cast to a uint. This happens explicitly e.g. here, or implicitly by passing it to a fn/ctor/variable that expects a uint or size_t (e.g. here or here or here).

The only place where we actually use the signedness of any of the members is here:

int64_t descendant_limit_bytes = mempool_opts.limits.descendant_size_vbytes * 40;

I don't think the implicit conversion introduced by this PR is problematic.

For that reason, I would argue that updating the MemPoolLimits members to be uint64_t instead of int64_t (which is necessary to avoid compiler warnings that we're comparing uint with int in all of the places outlined the diff here) is - except for the one case - not a regression, and is the most straightforward implementation for the goal of this PR. I currently have no objection to what is being proposed in #23962, but I feel like changing these members and all subsequent call sites to signed integers is orthogonal and would unnecessarily introduce complexity and controversy for this PR. As such, my preference would be to not do it here.

What do you think? If you agree with making MemPoolLimits members uint64_t, I will do a force push that removes the now unneccessary uint casts such as here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, thank you very much for going through the usage! I agree it seems appropriate to defer to a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3611c2b, approach NACK.

I do not agree that switching to unsigned type is an improvement. Especially for variables involved in arithmetic and comparison operations.

For example, see https://www.aristeia.com/Papers/C++ReportColumns/sep95.pdf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback hebasto. Even though the MemPoolLimits members are almost always cast to an unsigned int so I think it'd be clearer to have everything be in the type in which it's actually used, I understand your concern that you don't want to regress the interface to avoid future PRs building on an unsigned MemPoolLimits interface.

As such, I think the most straightforward way to move this PR forward and preserve as much review time as possible is to remove the commit that made the MemPoolLimits members unsigned and introduce static_cast<uint64_t> wherever necessary. This way the interface remains signed and behaviour remains unchanged (including arithmetic with unsigned integers) to how it was before this PR.

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@fanquake
Copy link
Member

Concept ACK

@stickies-v stickies-v force-pushed the mempool-simplify-fn-signatures branch from c5e5952 to 9bf8bce Compare September 16, 2022 15:33
Copy link
Contributor Author

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Force pushed to address review feedback and remove a leftover whitespace. Thank you for the quick feedback everyone!

Main changes:

  • Introduce CPFPCarveOutLimits() helper function in mempool_limits.h. Currently not used by anyone else, but a bit tidier.
  • pass Limits by reference instead of value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the commit message. Relevant part used to be:

These int64_t members later
get cast into a size_t in CTxMemPool::CTxMemPool() anyway

Is now:

These limits represent counts and sizes that should never be
negative. Currently, the int64_t members later on in the call stack
get cast into a size_t (without bounds checking) in 
CTxMemPool::CTxMemPool() anyway, so this type change does not
introduce any new underflow risks - it just makes them
unsigned earlier on in the process to minimize back-and-forth
type conversion.

Does that resolve it?

@aureleoules
Copy link
Contributor

you commited compile_commands.json

@stickies-v stickies-v force-pushed the mempool-simplify-fn-signatures branch from 9bf8bce to c17d468 Compare September 16, 2022 15:38
@stickies-v
Copy link
Contributor Author

stickies-v commented Sep 16, 2022

you commited compile_commands.json

whoops sorry, fixed and added that to global gitignore

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 17, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25038 (policy: nVersion=3 and Package RBF by glozow)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Riahiamirreza
Copy link
Contributor

As far as I could understand the PR Concept ACK. I think this PR is relatively easy to understand and putting the good-first-review label on it can make sense IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to the code base so my question may look silly. Why the type of ancestor and descendant count are 64 bit? Isn't 32 bit more than sufficient? Although, I know that's not in the scope of this PR.

@stickies-v stickies-v force-pushed the mempool-simplify-fn-signatures branch from c17d468 to 55b8e6f Compare September 20, 2022 12:35
@stickies-v
Copy link
Contributor Author

Force pushed to fix the failing ./test/functional/mempool_package_onemore.py test. In the last refactoring (c5e5952 -> c17d468) I should have passed m_limits when calling CPFPCarveOutLimits() because it was modified earlier in PreChecks() already. Also removed the argument-less overload of CPFPCarveOutLimits() since it's now not used anywhere anymore.

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code review ACK, just 1-2 nits

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, thank you very much for going through the usage! I agree it seems appropriate to defer to a later PR.

@stickies-v stickies-v force-pushed the mempool-simplify-fn-signatures branch from 55b8e6f to ae3a5f3 Compare September 28, 2022 15:39
Copy link
Contributor Author

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Force pushed to incorporate glozow's feedback - thank you for the review!

  • using copy constructor instead of designated initializer
  • removed the carved-out CPFPCarveOutLimits() function (~revert to previous version)

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK ae3a5f3

@glozow glozow requested a review from aureleoules October 4, 2022 08:33
Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK ae3a5f3

@hebasto
Copy link
Member

hebasto commented Oct 4, 2022

Concept ACK.

@stickies-v stickies-v force-pushed the mempool-simplify-fn-signatures branch from ae3a5f3 to 11b6df3 Compare October 4, 2022 13:57
@stickies-v
Copy link
Contributor Author

Force pushed to address hebasto's concerns regarding making the MemPoolLimits interface unsigned, which goes counter to the direction we want to take.

Note: this introduces potential for follow-up improvements in CTxMemPool::CalculateAncestorsAndCheckLimits() where some of the static_casts can be removed by updating the function signature to use int64_t instead of size_t when CTxMemPoolEntry::GetCountWithDescendants() and CTxMemPoolEntry::GetSizeWithDescendants() are updated to return an int64_t. I'd prefer to minimize the

@stickies-v stickies-v force-pushed the mempool-simplify-fn-signatures branch from 11b6df3 to d0bfe68 Compare October 4, 2022 15:25
@stickies-v
Copy link
Contributor Author

Force pushed to rebase to fix failing Win64 native CI (unrelated to PR).

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK d0bfe6869855dcf112a7da640e2e8ad648f82bbd.
Since my last review:

  • rolled back from uint64_t to int64_t for descendant_count, descendant_size_vbytes and no_limit.
  • added static casts to compare these values against uint64_t types.

I also verified the values being checked against are of type size_t or uint64_t.

@hebasto
Copy link
Member

hebasto commented Oct 5, 2022

Approach ACK d0bfe6869855dcf112a7da640e2e8ad648f82bbd.

There are quite a few places in the codebase that require us to
construct a CTxMemPool without limits on ancestors and descendants.
This helper function allows us to get rid of all that duplication.
Simplifies function signatures by removing repetition of all the
ancestor/descendant limits,  and increases readability by being
more verbose by naming the limits, while still reducing the LoC.
The (100, 1000000, 1000, 1000000) limits are arbitrarily high and
don't restrict anything, they are just meant to calculate ancestors
properly. Using NoLimits() makes this intent more clear and simplifies
the code.
@stickies-v stickies-v force-pushed the mempool-simplify-fn-signatures branch from d0bfe68 to 33b12e5 Compare October 5, 2022 12:10
@stickies-v
Copy link
Contributor Author

Force pushed to address hebasto's review feedback - thank you!

  • Split out changes to test/mempool_tests.cpp into separate commit
  • Added commit with improvements to Doxygen comments
  • Made MemPoolLimits::NoLimits() constexpr

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 33b12e5, I have reviewed the code and it looks OK, I agree it can be merged.

return false;
} else if (totalSizeWithAncestors > limitAncestorSize) {
errString = strprintf("exceeds ancestor size limit [limit: %u]", limitAncestorSize);
} else if (totalSizeWithAncestors > static_cast<uint64_t>(limits.ancestor_size_vbytes)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
} else if (totalSizeWithAncestors > static_cast<uint64_t>(limits.ancestor_size_vbytes)) {
} else if (totalSizeWithAncestors > static_cast<size_t>(limits.ancestor_size_vbytes)) {

I beg a pardon for being pedantic :) Feel free to ignore this nit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a refactor PR, I'll leave this as is. limitAncestorSize was a uint64_t before, so better to change this in a future PR imo. I'll leave this comment open for visibility.

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK 33b12e5

@glozow glozow merged commit d33c589 into bitcoin:master Oct 9, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 9, 2022
@bitcoin bitcoin locked and limited conversation to collaborators Oct 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants