implement a new configuration for read fan-out delay #7929
"read_fan_out_delay_ms": {
  "description": "Define number of milliseconds to wait before attempting to read from another replica. This setting can help to reduce latency spikes in case of occasional slow replicas. Default is 0, which means, that no additional request is sent.",
  "type": "integer",
  "format": "uint64",
  "minimum": 0,
  "nullable": true
},
This means we cannot configure a delay of 0ms. Is that desired behavior?
How about using -1 as default to disable fan-out?
A 0ms fan-out delay doesn't make sense. It is essentially the same as just using read-fan-out-factor.
The comment gave me the impression that setting 0ms would completely disable fan-out.
Looking at the code reveals that it doesn't. It only disables fan-out based on delay; the fan-out triggered on a busy shard still applies.
I suggest updating the comment to make it a bit clearer.
I've updated it to the following:
Default is 0, which means delayed fan out request is disabled.
If you don't like it we can drop the commit 👍
timvisee left a comment
Experimented with some artificial delays. Works perfectly 👌
* implement a new configuration for read fan-out delay
* upd schema
* Update read_fan_out_delay_ms comment

---------

Co-authored-by: timvisee <tim@visee.me>
There are cases where we might want to smooth out tail latencies of read requests.
One typical example is indexation happening on one replica but not on another.
If we have an estimate of how long a certain request should take, we can configure
`read-fan-out-delay` to fire another request to another replica, so that one slow replica does not slow down the whole service.

Unlike `read-fan-out-factor`, this configuration doesn't always require executing both requests, so it is more resource-friendly, at the cost of a baked-in minimal delay.
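
To illustrate the trade-off described above, here is a minimal Python sketch of a delayed fan-out read. It is not the implementation from this PR: `read_with_delayed_fan_out` and `make_replica` are hypothetical names, replicas are modelled as plain callables, and the separate fan-out triggered by a busy shard is not modelled. The sketch only shows the idea: wait `delay_ms` for the first replica, send a second request only if it is still pending, and treat `delay_ms == 0` as "delayed fan-out disabled".

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def read_with_delayed_fan_out(replicas, delay_ms, pool):
    """Read from replicas[0]; if it is still pending after delay_ms, also ask replicas[1].

    Returns (result, number_of_requests_sent).
    Assumption for this sketch: delay_ms == 0 disables the delayed request.
    """
    primary = pool.submit(replicas[0])
    if delay_ms == 0 or len(replicas) < 2:
        return primary.result(), 1

    # Give the first replica a head start of delay_ms.
    done, _ = wait([primary], timeout=delay_ms / 1000)
    if done:
        return primary.result(), 1

    # The first replica is slow: fire a second request and take whichever finishes first.
    backup = pool.submit(replicas[1])
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    return next(iter(done)).result(), 2


def make_replica(name, latency_s):
    """Hypothetical replica: sleeps for latency_s, then returns its own name."""
    def read():
        time.sleep(latency_s)
        return name
    return read


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        healthy = [make_replica("a", 0.01), make_replica("b", 0.01)]
        slow_first = [make_replica("a", 0.6), make_replica("b", 0.01)]
        print(read_with_delayed_fan_out(healthy, 100, pool))
        print(read_with_delayed_fan_out(slow_first, 100, pool))
        print(read_with_delayed_fan_out(slow_first, 0, pool))
```

In this sketch a healthy first replica answers within the delay, so only one request is sent. When the first replica is slow, the second request is sent after the delay and its answer is used, which is why the delay acts as a lower bound on the latency of the hedged path.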