Skip to content

Pull-mode flooding#3479

Merged
latobarita merged 11 commits intostellar:masterfrom
hidenori-shinohara:pull-mode-decent-perf
Sep 13, 2022
Merged

Pull-mode flooding#3479
latobarita merged 11 commits intostellar:masterfrom
hidenori-shinohara:pull-mode-decent-perf

Conversation

@hidenori-shinohara
Copy link
Contributor

Description

Resolves #3384.

Based on #2432. Some differences in the design are:

  • Implementation of flow control for adverts and demands
  • Use of SHA-256 to hash txns.
  • Round-robin when creating demands

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@marta-lokhova marta-lokhova self-requested a review July 7, 2022 22:04
Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a preliminary pass, thanks for putting this together! Couple of questions, and I think we need to move the code around to appropriate modules.

Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the updates! I did another round, and I think there are a few things that can be causing instabilities you're seeing

@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 4 times, most recently from 98adf39 to c238aff Compare July 25, 2022 14:35
@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 3 times, most recently from fd66a5b to d8965a6 Compare July 26, 2022 21:18
@hidenori-shinohara hidenori-shinohara marked this pull request as ready for review August 1, 2022 18:24
@hidenori-shinohara hidenori-shinohara changed the title Draft: Pull-mode flooding Pull-mode flooding Aug 1, 2022
Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good to me, thank you! Just a few minor things. At this point I think @MonsieurNicolas should take a look.

I have one remaining question regarding tx queue flooding that I'd like to get input on. Right now because we have a timer for adverts and a timer for tx flooding, we may end up in an awkward situation where advert flooding is delayed by FLOOD_TX_PERIOD_MS + FLOOD_ADVERT_PERIOD_MS in the worst case scenario, which doesn't seem ideal, but I'm not sure how big of a problem it is. The reason is that we wanted tx advertising to be self-contained in overlay. Otherwise, tx queue needed to trigger advert flushing manually, which would violate abstraction barriers. Touching the tx queue timer itself didn't seem like an option as well, since there's non-trivial functionality to order txs there.

@marta-lokhova
Copy link
Contributor

a few more things:

  • we're missing the outbound queue drop metrics in this PR. I made the change for testing yesterday, feel free to just grab this commit: marta-lokhova@7bca3c3
  • based on what we discussed, I think doing pure round-robin is not ideal for the demand logic. Specifically, I'm thinking about the current network conditions: we have a limit of 1000 operations per ledger, with 40 ops broadcasted every 200 ms. This means that nodes with many connections (which is very common on the network) may end up sending a lot of small demands to its peers. This defeats the purpose of batching, and can get inefficient. To avoid this, I think it's better to iterate over all shuffled peers, and for each peer either fill the demand completely or until it has no more hashes. This way we have proper batching regardless of the number of connections, and load balancing due to peer randomization.

@hidenori-shinohara
Copy link
Contributor Author

hidenori-shinohara commented Aug 3, 2022

I updated the demand function so it'll take advantage of both shuffling and batching with a simpler implementation. @marta-lokhova let me know what you think

@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 2 times, most recently from bc35de7 to 4b2c8b1 Compare August 3, 2022 17:52
Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed most of the code. Looks pretty good. Comments are mostly around cleanup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am getting an error trying to pull your branch because of the commit referenced by the protocol-next xdr.

fatal: Fetched in submodule path 'src/protocol-next/xdr', but it did not contain c59b1a13a587aa5aaf3bed97c01a5de559ff6650. Direct fetching of that commit failed.

I think we should just merge the xdr changes in master in xdr next when we're ready to merge this PR, that way we can point to a stable commit hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to open a PR in xdr vNext repo and bump version submodule in this PR after the other one is merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a PR stellar/stellar-xdr#24 (not ready to be merged yet)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge that other PR (ahead of merging this change) -> 1 action item in that PR for you @hidenori-shinohara

@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 4 times, most recently from a7848ba to bdded33 Compare August 31, 2022 20:22
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you made it signed instead of just following the pattern above: you still need to add asserts in the code base, and now you also introduce a bunch of warnings because int and size_t don't mix well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting warning: int64->double conversion (need explicit cast to avoid it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you used long here, but now the warning is int64 -> long; as you're now using bigDivide you can just use int64 here.
(same problem in getMaxAdvertSize)

Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks very close!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

condition is wrong (wrong boundary), and I'm still getting warnings here because you made mDemandQueueTxHashCount an uint32 and the compiler does not like += size_t. We can remove this assert and use size_t instead (pushed a commit for you with those changes as we've been bouncing over warnings that you cannot see)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still getting warnings because you're using uint32 and doing things like u32 -= u64. I am pushing an update to your branch. there is no need to use u32 when u64 is perfectly fine here (and we don't need to worry about overflow)

@hidenori-shinohara hidenori-shinohara force-pushed the pull-mode-decent-perf branch 3 times, most recently from 80c9c47 to 01ed73e Compare September 2, 2022 21:03
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

demans was moved from, so this metric always reports 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed it for CLOG_TRACE above when I was looking at logs, but I completely missed this. Thank you! Fixed it.

@marta-lokhova
Copy link
Contributor

r+ 88da68c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

transaction queues: pull mode

5 participants