
Flow control in bytes#3607

Closed
marta-lokhova wants to merge 9 commits into stellar:master from marta-lokhova:flow_control_bytes_NOV

Conversation

@marta-lokhova
Contributor

@marta-lokhova marta-lokhova commented Nov 22, 2022

Updates to switch flow control capacity accounting to bytes instead of message counts. This allows us to support larger transactions (i.e. Soroban) better.

@marta-lokhova marta-lokhova force-pushed the flow_control_bytes_NOV branch 2 times, most recently from 6cbe869 to 8acb946 Compare November 23, 2022 17:55
@marta-lokhova marta-lokhova marked this pull request as ready for review November 29, 2022 23:29
@marta-lokhova marta-lokhova force-pushed the flow_control_bytes_NOV branch 3 times, most recently from d4e98d5 to 33c8262 Compare December 16, 2022 18:55
Contributor

will need to update the Visual Studio project file before merging

Contributor

Why do we need mFlowControlMessages->getOutboundCapacity() > 0 ?

Contributor Author

We don't; message outbound capacity is guaranteed to always be positive. (I initially added this check when I made outbound capacity int64 instead of uint64, but forgot to remove it.) Removed it.

Contributor

We should not allow going beyond capacity, i.e. the code should be

    auto capacity = getMsgResourceCount(msg);
    releaseAssert(mOutboundCapacity > capacity);
    mOutboundCapacity -= capacity;

Contributor

or follow tryLock semantics like in lockLocalCapacity

Contributor

this should be cleaned up as part of this refactor (I am not really sure why it's done like this in the first place):

  • lockLocalCapacity should be called tryLockLocalCapacity
  • we should not update capacity if we can't lock
  • capacity should never get negative (this points to not having a clean definition of what we're trying to enforce)

Contributor Author

Addressing both this and the other comment around tryLock: I don't believe the proposed approach is correct. I intentionally implemented it this way, because right now messages take up arbitrary capacity that the receiver does not know ahead of time.

Consider a simple scenario: the receiver sets its batch size to 100 bytes, and asks the sender for 100 bytes worth of traffic. The sender has the next SCP message to send of size 300 bytes. Unless we allow capacity to temporarily go negative, the system deadlocks. To unblock it, we allow going past capacity for up to (message_size - 1).

Note that we wouldn't need to do this if we implemented flow control based on uniformly sized batches (no need to go negative if we know max batch size ahead of time), but I'm not doing it in this iteration because (a) benefits of this approach are unclear and (b) seems like it'll add more complexity to the code since we'll need to deconstruct/reconstruct messages, reconcile when to request more data based on complete messages vs batches received, etc
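The "capacity may temporarily go negative" scheme described above can be sketched as follows. This is a hypothetical illustration, not stellar-core's actual FlowControl class: sending is allowed whenever capacity is strictly positive, so capacity can dip to at most -(message_size - 1) before sending blocks, which is what unblocks the 100-byte-capacity / 300-byte-message scenario.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of sender-side flood capacity accounting (illustrative names).
struct OutboundCapacitySketch
{
    int64_t mCapacity; // signed on purpose: may go temporarily negative

    bool canSend() const { return mCapacity > 0; }

    // Returns true if the message was "sent" and its size consumed.
    bool trySend(int64_t msgSizeBytes)
    {
        if (!canSend())
        {
            return false;
        }
        mCapacity -= msgSizeBytes; // may overshoot past zero
        return true;
    }

    // SEND_MORE from the peer releases capacity back.
    void onSendMore(int64_t numBytes) { mCapacity += numBytes; }
};
```

With 100 bytes of capacity, a 300-byte SCP message still goes out (capacity drops to -200), after which sending blocks until the peer releases capacity via SEND_MORE.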

Contributor

I am not sure I understand the deadlock comment, that sounds more like a limitation of the design that I imagine we can overcome? In your example, we could just wait to receive more "send more" messages until there is more than 300?
Accounting in bytes or in terms of messages (where we already send multiple messages until we use up capacity) should behave the same way: if a peer sends us too much data, that means it's not respecting flow control, right?

The reason I flagged this in the first place is that if you have 100 bytes of capacity left, we allow receiving a 100KB message (or whatever the max message size is), but we would block after receiving a small (200-byte) transaction, which means we'd allow ~2x the configured quota for large messages.

Contributor Author

@marta-lokhova marta-lokhova Dec 22, 2022

I am not sure I understand the deadlock comment, that sounds more like a limitation of the design that I imagine we can overcome? In your example, we could just wait to receive more "send more" messages until there is more than 300?

Send more is triggered when capacity is released. This means that the node needs to receive 100 bytes' worth of data before requesting more. If it instead requests more data arbitrarily, that defeats the purpose of flow control. I'm not sure I understand what you're proposing, but I think it's the same thing as the current implementation (except we shouldn't send arbitrary "send more"): in the end, the node receives 300 bytes when it only has 100 bytes of capacity to spare. This limitation comes from the fact that we transmit full messages, and we don't know their sizes ahead of time.

Contributor Author

I looked into this further after our chat, and even sketched the change, but still found several issues:

  • We can't apply "capacity is always positive" reasoning to reading capacity as we don't know what non-flood traffic a peer is sending us and it's not controlled with SEND_MORE messages (we only define max message size for flood traffic, like transactions).
  • I still find it hard to reason about the proposal and the relationship between capacity and batch size - specifically, I can't tell if this logic will actually unblock nodes in all possible situations, with all possible combinations of messages in the queue. I find it easier to reason about the "capacity goes negative" approach, as we allow at most one more message to be sent to unblock the node.
  • With this change, we're re-purposing the batch flag as now there are strict requirements for it vs capacity. I don't really like that we're taking a flag that was only meant to be there to improve perf and make it part of correctness requirements. I think this will also cause confusion for the operators if they decide to change their configs. I think it'll be easier to communicate that whatever capacity they pick implies up to (capacity + max_message_size - 1) max capacity.

Let me know what you think, but I'd like to re-visit the currently implemented approach, as it seems easier to reason about

Contributor

We can't apply "capacity is always positive" reasoning to reading capacity as we don't know what non-flood traffic a peer is sending us and it's not controlled with SEND_MORE messages (we only define max message size for flood traffic, like transactions).

Oh I see, I missed this - what is the point of the global in-bytes reading capacity?
I think the reason we want to track in bytes is to avoid network congestion, and this does not contribute to that.
This one is more for DoS protection then, right? If so, we should just have enough capacity for the largest message possible with some buffer on top - so something in the order of MAX_MESSAGE_SIZE*5 (right now MAX_MESSAGE_SIZE is 16MB, btw).

I still find it hard to reason about the proposal and the relationship between capacity and batch size - specifically, I can't tell if this logic will actually unblock nodes in all possible situations, with all possible combinations of messages in the queue. I find it easier to reason about the "capacity goes negative" approach, as we allow at most one more message to be sent to unblock the node.

We went over this offline already.

I actually find it harder to reason when we go negative because we have implicit limits.
In the current version for example, you have PEER_READING_CAPACITY_BYTES of about 100KB but you can read 16MB of data.

Capacity & limits should be explicit so that we can properly reason about resource utilization; right now, based on the settings, it looks like we would only use about 100KB of memory per connection, but this is way off.

Same thing applies to what is "on the wire": when modeling based on bandwidth numbers, you really want to configure based on the "real" numbers.

With this change, we're re-purposing the batch flag as now there are strict requirements for it vs capacity. I don't really like that we're taking a flag that was only meant to be there to improve perf and make it part of correctness requirements. I think this will also cause confusion for the operators if they decide to change their configs. I think it'll be easier to communicate that whatever capacity they pick implies up to (capacity + max_message_size - 1) max capacity.

Those settings were always "advanced" in a way, because what we really want to control with them is max latency. I don't think making additional checks on startup to ensure that people don't do bad things is that special.

I agree that we need to make configuration as simple as possible. We went back and forth on how to configure batch size in the original flow control PR (if you remember we had things expressed as a fraction of the capacity at some point).

This is related to my previous comment on what is easier to reason about for people, in the context of controlling latency, I think that expressing that in terms of true bytes makes sense (because if people know the bandwidth they have, they can do the math on how to allocate it).
We could look into better ways to express those settings (but I don't think anybody is really going to try to configure those things).


I think I am not following the discussion in this thread. Can you briefly explain the steps in the 300K-message example with the capacity-going-negative approach?


We are implementing traffic shaping, just in a different way: I think of this as a sort of bandwidth limit on top of the message count. Message size can be arbitrary, and we do not split messages across multiple iterations. For example, suppose capacity is 100 and we have already received two messages of 40 and 50 bytes, and the next message will be 30 bytes. The receiver does not know that size beforehand but still needs to ask for more because there are 10 bytes of capacity available; hence we accept the 30-byte message, capacity goes to -20, and we stop asking for more messages.
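The worked example above (capacity 100, then messages of 40, 50, and 30 bytes) can be traced with a minimal receiver-side sketch. Names are illustrative, not stellar-core's actual types: the receiver keeps requesting data while any capacity remains, and only stops once a message pushes it to zero or below.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of receiver-side capacity accounting (illustrative names).
struct ReceiverCapacitySketch
{
    int64_t mAvailable; // signed: the last accepted message may overshoot

    // Receiver keeps asking for more data while any capacity remains.
    bool wantsMore() const { return mAvailable > 0; }

    // Message sizes are unknown ahead of time; accept and deduct in full.
    void onMessage(int64_t sizeBytes) { mAvailable -= sizeBytes; }
};
```

Running the example: after 40 and 50 bytes, 10 bytes of capacity remain, so the receiver still asks for more; the 30-byte message then drives capacity to -20 and the requests stop.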


I believe this PR serves its purpose for now. As a future improvement, we could introduce a burst size in addition to the capacity size, which may represent this better.

Contributor Author

I made some changes that should hopefully help unblock this thread:

  • Capacity is now always positive, but there's a requirement for capacity vs maximum flood message size (what we discussed offline)
  • Byte accounting does not apply to reading capacity (for now). This is for simplicity, as currently maximum message size is really big, so I'd like to think about the best way to protect against DoS with large messages. We can still move this change forward, and add reading capacity incrementally (it's a purely internal change, so can be added separately without a version bump)
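The revised scheme in the first bullet can be sketched as follows, under stated assumptions: the names and the 300-byte constant are hypothetical, not stellar-core's actual config fields. Capacity never goes negative, which is safe only if the configured capacity is at least the maximum flood message size, so that requirement is checked up front.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Illustrative constant; the real maximum flood message size differs.
constexpr int64_t MAX_FLOOD_MSG_SIZE_BYTES = 300;

// Sketch of the "capacity is always positive" design (illustrative names).
struct NonNegativeCapacitySketch
{
    int64_t mCapacity;

    explicit NonNegativeCapacitySketch(int64_t configured)
        : mCapacity(configured)
    {
        // Enforced at startup: any single flood message must fit.
        if (configured < MAX_FLOOD_MSG_SIZE_BYTES)
        {
            throw std::invalid_argument("capacity below max flood msg size");
        }
    }

    // tryLock-style semantics: either the full message fits, or nothing
    // is consumed and the caller must wait for SEND_MORE.
    bool tryLock(int64_t msgSizeBytes)
    {
        if (msgSizeBytes > mCapacity)
        {
            return false;
        }
        mCapacity -= msgSizeBytes;
        return true;
    }
};
```

Because capacity always covers at least one maximum-size flood message, a failed `tryLock` can never deadlock the connection: the peer's next SEND_MORE restores enough room.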

Contributor

I am having a hard time understanding how we'd set this limit:

  • is "10 MB" too high or too low?
  • other limit is expressed relative to ledger size, which is easy to understand
  • there is a lower bound that comes from business requirements
  • the limit itself is trying to control for latency? something else?

Right now that limit seems too high, but I don't fully understand what we're trying to do yet and if a simple limit like this is enough.

Contributor Author

The limit here is related to memory DoS:

  • Not sure how we can reason about ledger size here, because we don't know transaction size distribution per ledger. 1000 ops of 300 bytes is very different from 1000 ops of 128kb.
  • I'm assuming transactions of arbitrary sizes can end up in this queue. Because the variance is too big, instead I decided to put a limit that is high enough in a sense that it'll still likely allow most/all transactions, but in some degenerate cases we don't blow up our memory (large txs add up fast with the current limit of 1000 - 128kb*1000=128MB per just one peer).
  • I'm not sure it's too high - we're still enforcing 1000 txs per queue limit, this limit is just to prevent the queue from growing too big.
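The back-of-the-envelope arithmetic in the second bullet can be written out explicitly (constant names here are illustrative): a 1000-message queue limit times a 128 KB worst-case transaction is about 128 MB queued for a single peer, which is what the byte-based cap is meant to prevent.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case per-peer queue memory without a byte-based cap
// (illustrative constants matching the numbers in the comment above).
constexpr int64_t QUEUE_MSG_LIMIT = 1000;
constexpr int64_t WORST_CASE_TX_BYTES = 128 * 1024; // 128 KB
constexpr int64_t WORST_CASE_QUEUE_BYTES =
    QUEUE_MSG_LIMIT * WORST_CASE_TX_BYTES; // ~128 MB per peer
```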

Contributor

My concern here is that this limit is fine for protecting against out of memory DoS, but it may still trigger a lot of badness: I think what we discussed in a different context was maybe moving to latency based queue policies that would avoid creating "spiral of death" situations. 10MB seems like a lot of data to move especially if there are a lot of connections competing against each other (100 peers type of situation -> 1GB of stuff queued up to be sent)

Contributor Author

I'd like to move away from reasoning that ~100 peers is the norm. We have work in flight that actively tries to reduce this number by a lot. Even without it, a more realistic number based on pubnet data is ~50-60. That being said, I agree that we may still create too much overhead.

What I was thinking we could do is use the configs for "max classic ops" vs "max Soroban ops" per ledger to calculate this limit. The reason I didn't do it is because we don't have those configs yet, and we don't even know what the values would be. Hence the static limit for now. I think we can change it in the next iteration once we have a better idea about what the distribution is going to be like. In the meantime, I'll reduce the limit to something reasonable, like 3MB.

Contributor

nit: FLOW_CONTROL_SEND_MORE_BATCH_SIZE_BYTES bytes?

@marta-lokhova marta-lokhova force-pushed the flow_control_bytes_NOV branch 2 times, most recently from 3eb57ae to d7fdd1b Compare January 6, 2023 19:25
Contributor

forgot to update example config

Contributor

but see the main convo, we may want to pick different parameters exposed in config (and derive those) if this is not easy to configure


@mbsdf mbsdf self-requested a review January 19, 2023 21:31

are we planning to remove these constants in the future?


+951


+1


what is the reason to change it from uint to int?


Can these have negative values? Or do we want to keep it same as all others, int instead of uint?

@marta-lokhova marta-lokhova force-pushed the flow_control_bytes_NOV branch from d7fdd1b to b2f9455 Compare January 27, 2023 23:43
@marta-lokhova marta-lokhova force-pushed the flow_control_bytes_NOV branch from b2f9455 to dee35dc Compare January 30, 2023 20:23
@marta-lokhova marta-lokhova force-pushed the flow_control_bytes_NOV branch from f1ff8b7 to 42089cd Compare January 31, 2023 20:37
// Additional number of bytes to add to every payment operation in loadgen
std::vector<uint32> LOADGEN_OP_SIZE_FOR_TESTING;
std::vector<uint32> LOADGEN_OP_SIZE_DISTRIBUTION_FOR_TESTING;


+1

// processes `FLOW_CONTROL_SEND_MORE_BATCH_SIZE_BYTES` bytes
uint32_t FLOW_CONTROL_SEND_MORE_BATCH_SIZE_BYTES;

// For testing only, remove later

+1

mCapacity.mFloodCapacity -= msgResources;
if (mCapacity.mFloodCapacity == 0)
{
CLOG_DEBUG(Overlay, "No flood capacity for peer {}",

msg is accepted (returns true) but prints "No flood capacity" - do you mean no flood capacity is left anymore, for the debug log?

{
if (mCapacity.mFloodCapacity == 0)
{
CLOG_DEBUG(Overlay, "Got flood capacity for peer {} ({})",

+1 on adding to both total capacity and flood capacity, and on the debug log for when this will make flood capacity >= 0

ZoneScoped;
releaseAssert(msg.type() == SEND_MORE || msg.type() == SEND_MORE_EXTENDED);
auto numMessages = Peer::getNumMessages(msg);
if (!hasOutboundCapacity(msg) && numMessages != 0)

do we use hasOutboundCapacity() somewhere else?

CLOG_DEBUG(Overlay, "Got outbound capacity for peer {}",
mApp.getConfig().toShortString(mNodeID));
}
mOutboundCapacity += msg.sendMoreExtendedMessage().numBytes;

Can you add a few lines of comments (or a brief description in PR summary) regarding: sendMoreExtendedMessage + releaseOutbound() both in FlowControlCapacityBytes and FlowControlCapacity

@mbsdf mbsdf left a comment

LGTM so far

@marta-lokhova
Contributor Author

Closing this PR: the change has been split into #3737 and #3686


4 participants