
[EXPERIMENT] pull-mode flooding#2432

Closed
graydon wants to merge 3 commits into stellar:master from graydon:better-flooding

Conversation

Contributor

graydon commented Feb 11, 2020

Pull-mode flooding

High-level explanation of changes:

The change aims to reduce the amount of redundant flood-type traffic (that is, a message received by a peer along multiple connections). It does so in a very conventional fashion: switching from "eager" to "lazy" flooding.

Specifically, flood-type traffic is no longer propagated to neighbours as soon as it's received; rather, it's advertised using much smaller, denser batches of short (64-bit SipHash-2-4) keyed hash codes. Advertisements are sent fairly often, but a slight buffering delay allows a decent number of them to be combined into a single message. When an advertisement is received, the receiver immediately responds with a demand for all the hashes it hasn't yet seen, which the sender then immediately sends using the existing message types. Since advertisements are typically less than a tenth the size of the messages they advertise, this reduces the total traffic accordingly.
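The advertise/demand exchange above can be sketched roughly as follows (Python used for brevity; the class and handler names here are illustrative, not the PR's actual C++ API):

```python
# Minimal sketch of the lazy-flood advert/demand exchange (illustrative names,
# not the PR's actual code).

class Peer:
    def __init__(self, name):
        self.name = name
        self.seen = {}            # short hash -> full message
        self.pending_advert = []  # outgoing advert codes, flushed in batches

    def receive_message(self, short_hash, msg):
        self.seen[short_hash] = msg
        self.pending_advert.append(short_hash)  # advertise onward, lazily

    def on_flood_advert(self, sender, hashes):
        # Demand only the hashes we haven't seen yet.
        missing = [h for h in hashes if h not in self.seen]
        if missing:
            sender.on_flood_demand(self, missing)

    def on_flood_demand(self, demander, hashes):
        # Fulfil the demand with the full messages, via the existing types.
        for h in hashes:
            if h in self.seen:
                demander.receive_message(h, self.seen[h])

a, b = Peer("a"), Peer("b")
a.receive_message(0x1234, "tx-payload")
b.on_flood_advert(a, a.pending_advert)  # b demands 0x1234, a sends it
print(b.seen[0x1234])                   # prints "tx-payload"
```

The bandwidth win comes from the common case where `missing` is empty: a peer that already has the message costs only its 8-byte advert code, not a full retransmission.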

This change only modifies the flow of data between the floodgate and the IO queues -- data we've already committed to informing our neighbours about. It is thus fairly narrow and late in the full set of system queueing disciplines, and does not change most decisions around tx-queue throttling, surge-pricing, SCP rebroadcasting, etc.

It also only modifies flood traffic -- fetch traffic is unaffected.

It does add 2 WAN RTTs to flood traffic latency, and adds two possible attack vectors (that I am aware of):

  • a simple/dumb cache-poisoning vector where an attacker advertises a message but then doesn't provide it when it's demanded. This is mitigated by making demands short-lived with a timeout, and allowing peers to race when re-advertising a hash whose pending demand has timed out.
  • a fancier cryptographic cache-poisoning vector where an attacker could send a message that collides with a target hash, inhibiting its peer from even demanding it from anyone (i.e. it thinks it already has message X when it really has Y, but hash(X) == hash(Y)). The hashes are keyed by LCL so this only works if an attacker can both collide a SipHash-2-4 in 5 seconds and get their packet accepted before the target packet, so the odds seem tolerable to me, but it's worth considering (there are other options here including just using SHA256s, as I did in an earlier iteration, but the SipHash variant uses 1/4 as much bandwidth, and this is all about reducing bandwidth.)
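The per-ledger keying can be illustrated like so (not the PR's code: Python's stdlib exposes no keyed SipHash, so keyed BLAKE2b stands in for SipHash-2-4; the point is the key rotation, not the hash choice):

```python
import hashlib

# Illustration of per-ledger keyed 64-bit short hashes. The PR keys
# SipHash-2-4 with the low 128 bits of the LCL hash; keyed BLAKE2b is
# substituted here purely for demonstration.

def short_hash(msg: bytes, lcl_hash: bytes) -> bytes:
    key = lcl_hash[:16]  # low 128 bits of the last-closed-ledger hash
    return hashlib.blake2b(msg, key=key, digest_size=8).digest()

msg = b"some flood message"
h1 = short_hash(msg, b"\x01" * 32)  # ledger N
h2 = short_hash(msg, b"\x02" * 32)  # ledger N+1: same message, new code
assert h1 != h2
```

Because the key changes every ledger, a collision precomputed against one key is useless once the ledger closes, which is what confines the attacker to the ~5-second window described above.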

The change can coexist with existing (eager) connections that do not yet support lazy flooding, and in fact can decide message-by-message which mode to use, so one way to further mitigate both the cache-poisoning vectors and any potential latency problems is to vary the entire flooding strategy stochastically (e.g. send 75% of traffic lazily and 25% eagerly). My plan is to do this and make it a config option, because it's a simple dial to understand and also allows the operator to just turn it off if it's misbehaving.
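The stochastic dial amounts to a single per-message coin flip (sketch only; the config name is hypothetical, not one the PR defines):

```python
import random

# LAZY_FLOOD_FRACTION is a hypothetical operator-tunable config knob:
# 1.0 = always lazy, 0.0 = lazy flooding off entirely.
LAZY_FLOOD_FRACTION = 0.75

def choose_mode(rng=random.random):
    # Per-message decision: lazy (advertise/demand) or eager (send now).
    return "lazy" if rng() < LAZY_FLOOD_FRACTION else "eager"
```

Setting the fraction to 0.0 recovers today's behaviour exactly, which is what makes this a safe escape hatch for operators.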

A more sophisticated strategy could opt to set a per-link policy to construct (e.g.) an eager spanning tree with a backup lazy mesh (as in Plumtree) but this has potentially more complex/fragile dynamics so I am deferring it for now.

As a "prior art" / comparing-notes aside: this is similar to Bitcoin Core's "compact block" / low-bandwidth mode BIP-152 except they use even smaller 6-byte truncated SipHash-2-4s (again keyed with the previous block hash).

Detailed explanation of changes:

  • The overlay protocol gets 2 new message types:

    • FLOOD_ADVERT: a list of short hashes available to be sent
    • FLOOD_DEMAND: a list of short hashes requested to be sent
  • Each Peer object gets 2 new fields:

    • A "pending" FLOOD_ADVERT, which accumulates outgoing 64-bit advert-codes (until it's ready to be flushed)
    • A timer
  • The Peer flushes its pending advert (sending it to its remote peer) on the earliest of 3 conditions:

    • The IO loop finishes processing an incoming (large) read buffer from the remote
    • The timer expires (every 100ms)
    • A threshold of pending advert-codes is reached (1024)
  • Floodgate::FloodRecord gets 2 new fields:

    • a short hash of the message keyed by the low 128 bits of the LCL to use when in sync
    • a short hash of the message keyed by all zeroes to use when out of sync
  • The FloodGate gets 2 new maps:

    • An index from short hashes to the existing FloodRecords in mFloodMap
    • A map from demanded (but not yet received) short hashes to the ledger number at which they were demanded, used to suppress duplicate simultaneous demands while still allowing a re-demand if a demand goes unfulfilled.
  • The implementations of FloodGate::{addRecord,broadcast} are partly merged into a new FloodGate::insert method, and broadcast is modified to push an advert into a Peer's pending FLOOD_ADVERT when possible rather than sending a full eager message.

  • FloodGate and OverlayManager get 2 new methods:

    • demandMissing, called in response to receiving FLOOD_ADVERT. Sends a FLOOD_DEMAND for all the not-yet-known messages in the advert.
    • fulfillDemand, called in response to receiving a FLOOD_DEMAND. Sends all the messages that were demanded.
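The demand-suppression map above, together with the timeout that mitigates the advertise-then-withhold attack, can be sketched as follows (illustrative names and a hypothetical ledger-based TTL, not the PR's actual code):

```python
# Sketch of the FloodGate's demand-suppression map. Pending demands are
# stamped with the ledger number at which they were issued; a demand older
# than DEMAND_TTL_LEDGERS may be re-issued to another peer.

DEMAND_TTL_LEDGERS = 2  # hypothetical timeout, measured in ledgers

class FloodGate:
    def __init__(self):
        self.known = set()         # short hashes of messages we already hold
        self.pending_demands = {}  # short hash -> ledger it was demanded at

    def demand_missing(self, advert, current_ledger):
        """Return the subset of advertised hashes worth demanding now."""
        to_demand = []
        for h in advert:
            if h in self.known:
                continue  # already have the full message
            issued = self.pending_demands.get(h)
            if issued is not None and current_ledger - issued < DEMAND_TTL_LEDGERS:
                continue  # a live demand is already outstanding elsewhere
            self.pending_demands[h] = current_ledger
            to_demand.append(h)
        return to_demand

fg = FloodGate()
assert fg.demand_missing([1, 2], current_ledger=10) == [1, 2]
assert fg.demand_missing([1, 2], current_ledger=10) == []  # suppressed
assert fg.demand_missing([1], current_ledger=13) == [1]    # timed out, re-demand
```

A withholding attacker can thus delay delivery of a message by at most one TTL before some other advertising peer is asked for it.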

Contributor Author

graydon commented Feb 17, 2020

Likely improvement: flush adverts to all peers after a given buffer-read rather than just the peer the read finished on.

graydon force-pushed the better-flooding branch 3 times, most recently from df69ba1 to 90a1bb5 on July 13, 2021
@MonsieurNicolas
Contributor

this was replaced by #3479
