Multipath support with non-zero length Connection IDs#1310
Multipath support with non-zero length Connection IDs#1310qdeconinck wants to merge 18 commits intocloudflare:masterfrom
Conversation
qlog/src/events/quic.rs
Outdated
| }, | ||
|
|
||
| PathAbandon { | ||
| identifier_type: u8, |
There was a problem hiding this comment.
in the absence of a qlog spec, I'd expect identifier_type to be u64. That makes it consistent with varint type in the frame definition.
qlog/src/events/quic.rs
Outdated
| }, | ||
|
|
||
| PathStatus { | ||
| identifier_type: u8, |
There was a problem hiding this comment.
in the absence of a qlog spec, I'd expect identifier_type to be u64. That makes it consistent with varint type in the frame definition.
895e259 to
896295e
Compare
be47100 to
3edcdd9
Compare
c730e2d to
ad2e520
Compare
b892a23 to
30a9dff
Compare
d2dda3d to
d677228
Compare
|
Thanks for your work on implementing MPQUIC! I played around with this code and have the suspicion that some packets are not sent on the path that they should be sent on. My setup: I created a mininet topology to test multipath support with the quiche-server and quiche-client applications. I added My tests showed that, on the server, This is however not true for The quiche logs hint that packets, that are assigned to the packet number space of one path, are sometimes sent on the other path: Is this behavior expected or is there actually something going wrong? Attachments:
|
|
@hendrikcech Nice to see you are experimenting with the code :) It seems that But at server side: So It actually seems that |
|
@qdeconinck Thanks for taking a look! I can confirm that these changes to quiche-server resolve the problem. |
apps/src/client.rs
Outdated
| let migrate_socket = if args.perform_migration { | ||
| let mut socket = | ||
| mio::net::UdpSocket::bind(bind_addr.parse().unwrap()).unwrap(); | ||
| let mut addrs = Vec::new(); |
There was a problem hiding this comment.
I think now would be a good opportunity to move the socket construction code into its own method.
There was a problem hiding this comment.
Also, per my comment about the args, it might help to think of the possible failure scenarios with user provided addresses and add some sanity checking to report problems rather then it just letting it fail as a timeout etc
apps/src/client.rs
Outdated
| /// Generate a new pair of Source Connection ID and reset token. | ||
| fn generate_cid_and_reset_token<T: SecureRandom>( | ||
| rng: &T, | ||
| ) -> (quiche::ConnectionId<'static>, u128) { | ||
| let mut scid = [0; quiche::MAX_CONN_ID_LEN]; | ||
| rng.fill(&mut scid).unwrap(); | ||
| let scid = scid.to_vec().into(); | ||
| let mut reset_token = [0; 16]; | ||
| rng.fill(&mut reset_token).unwrap(); | ||
| let reset_token = u128::from_be_bytes(reset_token); | ||
| (scid, reset_token) | ||
| } |
There was a problem hiding this comment.
this seems to be identical to the one in common.rs. Can't we just use that existing one?
There was a problem hiding this comment.
Oh, this seems to be code I forgot to clean up, well spotted!
apps/src/args.rs
Outdated
| --enable-active-migration Enable active connection migration. | ||
| --perform-migration Perform connection migration on another source port. | ||
| --multipath Enable multipath support. | ||
| -A --address ADDR ... Additional client addresses to use. |
There was a problem hiding this comment.
This description is kind of confusing.
Prior to this change, the client would look at the Server IP and pick an "any" socket using an IP family that matched the server's i.e. 0.0.0.0:0 or [::]:0 depending on v4 or v6 respectively
With this change, there aren't additional address that are used on top of the old behaviour, but instead they are just the addresses that would be used. This opens a few new failure scenarios that could catch users out. For example, if a server only returns an A record and the user provided a v6 client address, then the connection would fail.
Among the most confusing type of failure is where packets go to a black hole and the connection fails after a timeout.
We might want to tweak how the socket code handling for this option works, and make the description a bit clearer about what it does and how it might fail.
There was a problem hiding this comment.
You're right, the current description does not suit the actual behavior of this option. As a first step, the initial description could be rewritten as "Specify source addresses to be used instead of the unspecified address" (or "instead of letting the OS decides",...).
For the family address mismatch that may arise, I can indeed rework the code to remove all the specified addresses that do not match the family of the server one. In the case none is remaining, the code can hence fallback to the original behavior, i.e., use "0.0.0.0:0" or "[::]:0".
There still remains the issue that non-routable addresses may be provided, e.g., fe80::/16 or other. I'm not sure we can do much here, so maybe update the description as "Specify source addresses to be used instead of the unspecified address, non-routable addresses will lead to connectivity issues" (maybe a bit long, but not sure how to make it shorter without loosing information).
|
I made a simple client/server example and tested with this multipath patch. I noticed there might be a flaw in the ACKMP sending logic. My code is based on the one in quiche/apps. The client opens 2 paths against the server and writes data on a single QUIC stream. It turns out that the server sometimes stops sending ACKMPs which causes a client-side idle timeout and subsequent termination of the connection. This problem can be mitigated if the send() call in the server event loop is replaced with a send_on_path() call that iterates over all available paths, just like the client example (in the multipath branch) does. The send_single() function sends an ACKMP only if there is an unacked packet receive don the same path as specified in the function argument, while the send() function just selects a "best" path and no other paths are tried even if the selected path doesn't yield a packet data. I noticed the client is retransmitting STREAM packets before a timeout but it doesn't seem to help. I haven't checked on which path those retransmits are happening. |
| } | ||
|
|
||
| if args.perform_migration && | ||
| if conn_args.multipath && |
There was a problem hiding this comment.
When the number of addresses to use is greater than conn.available_dcids() some addresses are ignored without giving any information. I think a warning could be added to show that this happens and it can be solved with the --max-active-cids parameter
There was a problem hiding this comment.
It might be indeed interesting to add a warning message indicating so, good point!
| @@ -116,6 +101,7 @@ pub fn connect( | |||
| config.set_initial_max_streams_uni(conn_args.max_streams_uni); | |||
| config.set_disable_active_migration(!conn_args.enable_active_migration); | |||
| config.set_active_connection_id_limit(conn_args.max_active_cids); | |||
There was a problem hiding this comment.
Similar to my previous comment, I don't think it makes sense for the client to set the active_connection_id_limit to less than the number of paths it intends to use, so the configuration could be:
config.set_active_connection_id_limit(std::cmp::max( conn_args.max_active_cids, args.addrs.len().try_into().unwrap()));
There was a problem hiding this comment.
I'm not sure if we want the client to include such "magic" without proper documentation, but I can wait for other opinions. Also, the comparison should be made against addrs.len() and not args.addrs.len(), as some provided addresses might be of different families than the contacted server address.
|
@toshiiw This is strange, as the |
|
I tested with the following version. I also tested with the newest code on the multipath branch but still saw premature shutdowns. |
quiche/src/lib.rs
Outdated
| } | ||
|
|
||
| let mut consider_standby = false; | ||
| let dgrams_to_emit = self.dgram_max_writable_len().is_some(); |
There was a problem hiding this comment.
IIUC this always returns true even if self.dgram_send_queue is empty.
| // When using multiple packet number spaces, let's force ACK_MP sending | ||
| // on their corresponding paths. | ||
| if self.is_multipath_enabled() { | ||
| if let Some(pid) = |
There was a problem hiding this comment.
...so, this code isn't executed.
|
@toshiiw Oooh good catch indeed! If you configure your connection to enable datagrams on the connection, I can indeed reproduce the issue! I will push a fix with the adapted multipath test now; replacing |
ead77e6 to
f1ba4a6
Compare
- packet number space map - spaced packet number - `PathEvent::Closed` now includes error code and reason - function refactoring in lib.rs and frame.rs
This commit introduces the following features.
- Leave to the application to choose the multipath extensions or not
- Build on the `PathEvent`s to let the application decide which paths to use
- Provide default reasonable behavior into `quiche` while letting the
application provide optimised behavior
== Requesting the Multipath feature
The application can request multipath through the `Config` structure using a
specific API,`set_multipath()`.
```rust
config.set_multipath(true);
```
The boolean value determines whether the underlying connection negotiates the
multipath extension. Once the handshake succeeded, the application can check
whether the connection has the multipath feature enabled using the
`is_multipath_enabled()` API on the `Connection` structure.
== Path management
The API requires the application to specify on which paths/4-tuples it wants
to send non-probing packets. Paths must first be validated before using them.
This is automatically done for servers, and client must use `probe_path()`.
Once the path is validated, the application decides whether it wants active
usage through the `set_active()` API. It provides a single API entrypoint to
request its usage or not. For active path usage, it can use the following.
```rust
if let Some(PathEvent::Validated(local, peer)) = conn.path_event_next() {
conn.set_active(local, peer, true).unwrap();
}
```
Then, the path will then be considered to use non-probing packets.
On the other hand, if for some reason the application wants to temporarily
stop sending non-probing frames on a given path, it can do the following.
```rust
conn.set_active(local, peer, false).unwrap();
```
Note that in such state, quiche still replies to PATH_CHALLENGEs observed on
that path.
Finally, the multipath design allows a QUIC endpoint to close/abandon a
given path along with an error code and error message, without altering the
connection's operations as long as another path is available.
```rust
conn.abandon_path(local, peer, 42, "Some error message".into()).unwrap();
```
-- Retrocompatibility note
- The `Closed` variant of `PathEvent` is now a 4-tuple that, in addition to
the local and peer `SocketAddr`, also contains an `u64` error code and a
`Vec<u8>` reason message.
- There is a new variant of `PathEvent`: `PeerPathStatus` reports to the
application that the peer advertised some status for a given 4-tuple.
- There are two new `Error` variants: `UnavailablePath` and
`MultiPathViolation`.
-- Note
Currently this API is only available when multipath feature is enabled over
the session (i.e., `conn.is_multipath_enabled()` returns `true`). If the
extension is not enabled, `set_active()` and `abandon_path()` return an
`Error::InvalidState`. Actually, this API might sound "double usage" along
with the `migrate()` API (as there is no real "connection migration" with
multipath). Should we just keep the `set_active()` or similarly named API
and include the `migrate()` functionality in `set_active()`? Actually, an
client application without the multipath feature could just migrate using
`set_active(local, peer, true)`, setting the previous path in unused mode
under the hood.
== Scheduling sent packets
Similarly to the connection migration PR, there are two ways to control how
`quiche` schedules packets on paths.
The first consists in letting `quiche` handles this by itself. The
application simply uses the `send()` method. In the current master code,
`quiche` automatically handles path validation processing thanks to the
internal `get_send_path_id()` `Connection` method. The multipath patch
extends this method and iterates over all active paths following the lowest
latency path having its congestion window open heuristic (a reasonable
default in multipath protocols).
```rust
loop {
let (write, send_info) = match conn.send(&mut out) {
Ok(v) => v,
Err(quiche::Error::Done) => break,
Err(e) => {
conn.close(false, 0x1, b"fail").ok();
break;
},
};
// write to socket bound to `send_info.from`
}
```
The second option is to let the application choose on which path it wants
to send the next packet. The application can iterate over the available paths
and their corresponding statistics using `path_stats()` and schedules packets
using `send_on_path()` method. This can be useful when the use case requires
some specific scheduling strategies. See `apps/src/client.rs` for an example
of such application-driven scheduling.
And fix clippy
Credits to @toshiiw for finding the issue.
Through the addition of the `find_scid_seq()` method on `Connection`.
That was not doing much.
|
Hello everyone, I'm new to QUIC, and I'm starting my master's thesis entitled "Enhancing the Performance of a Single QUIC Connection with Multi-Path QUIC." While conducting measurements, I've noticed that the Multi-Path extension is experiencing correctness issues in my setup. I'm using loop-back addresses on a single host. Below is a script that reproduces the issue. The server has one endpoint, and the client has two endpoints. Observations:
I haven't dug deeply into the issue because I'm uncertain whether it's due to a misconfiguration on my part or if it's a bug in the actual source code. Thank you in advance for your time. #!/bin/bash
# Code partlty inspired by https://github.com/tumi8/quic-10g-paper
# Variables
QUICHE_REPO="https://github.com/qdeconinck/quiche.git"
QUICHE_COMMIT="d87332018d84fb7c429ad2ed34cbfdc6ee9477c8"
RUST_PLATFORM="x86_64-unknown-linux-gnu"
FILE_SIZE=8G
NB_RUNS=10
RED='\033[0;31m'
RESET='\033[0m'
echo_red() {
echo -e "${RED}$1${RESET}"
}
get_unused_port(){
local port
port=$(shuf -i 2000-65000 -n 1)
while netstat -atn | grep -q ":$port "; do
port=$(shuf -i 2000-65000 -n 1)
done
echo "$port"
}
clone_mp_quiche() {
if [ ! -d "./quiche" ]; then
git clone --recursive "$QUICHE_REPO"
cd quiche || exit
git checkout "$QUICHE_COMMIT"
RUSTFLAGS='-C target-cpu=native' cargo build --release
cd ..
fi
if [ ! -f "./quiche-client" ]; then
cp "quiche/target/release/quiche-client" .
fi
if [ ! -f "./quiche-server" ]; then
cp "quiche/target/release/quiche-server" .
fi
}
setup_rust() {
# Rust
if ! rustc --version 1>/dev/null 2>&1; then
curl --proto '=https' --tlsv1.2 -sSf -o /tmp/rustup-init.sh https://sh.rustup.rs
chmod +x /tmp/rustup-init.sh
/tmp/rustup-init.sh -q -y --default-host "$RUST_PLATFORM" --default-toolchain stable --profile default
source "$HOME/.cargo/env"
else
echo "Rust is already installed"
fi
}
setup_environment() {
mkdir -p "$(pwd)/www" "$(pwd)/responses" "$(pwd)/logs"
fallocate -l ${FILE_SIZE} "$(pwd)/www/${FILE_SIZE}B_file"
}
iteration_loop() {
for iter in $(seq 1 ${NB_RUNS}); do
echo "Testing Multi-Path QUIC correctness - Iteration $iter"
server_port=$(get_unused_port)
client_port_1=$(get_unused_port)
client_port_2=$(get_unused_port)
# Run server
env RUST_LOG=info ./quiche-server \
--listen 127.0.0.1:${server_port} \
--root "$(pwd)/www/" \
--key "$(pwd)/quiche/apps/src/bin/cert.key" \
--cert "$(pwd)/quiche/apps/src/bin/cert.crt" \
--multipath \
1>"$(pwd)/logs/server_${iter}.log" 2>&1 &
server_pid=$!
# Run client
env RUST_LOG=info ./quiche-client \
--no-verify "https://127.0.0.1:${server_port}/${FILE_SIZE}B_file" \
--dump-responses "$(pwd)/responses/" \
-A 127.0.0.1:${client_port_1} \
-A 127.0.0.1:${client_port_2} \
--multipath \
1>"$(pwd)/logs/client_${iter}.log" 2>&1
error_code=$?
sleep 1
kill -9 "$server_pid" 1>/dev/null 2>&1
if [ $error_code -ne 0 ]; then
echo_red "Error Client: $error_code"
exit 1
fi
# Check if files are the same
diff -q "$(pwd)/www/${FILE_SIZE}B_file" "$(pwd)/responses/${FILE_SIZE}B_file"
if [ $? -ne 0 ]; then
echo_red "Error: files are not the same"
exit 1
fi
done
}
main() {
# Version
setup_rust
[ $? -ne 0 ] && { echo_red "Error setting up rust"; exit 1; }
clone_mp_quiche
[ $? -ne 0 ] && { echo_red "Error cloning quiche"; exit 1; }
setup_environment
[ $? -ne 0 ] && { echo_red "Error setting up environment"; exit 1; }
iteration_loop
}
main |
This commits introduces the following features.
PathEvents to let the application decide which paths to usequichewhile letting theapplication provide optimised behavior
== Requesting the Multipath feature
The application can request multipath through the
Configstructure using aspecific API,
set_multipath().The boolean value determines whether the underlying connection negotiates the
multipath extension. Once the handshake succeeded, the application can check
whether the connection has the multipath feature enabled using the
is_multipath_enabled()API on theConnectionstructure.== Path management
The API requires the application to specify on which paths/4-tuples it wants
to send non-probing packets. Paths must first be validated before using them.
This is automatically done for servers, and client must use
probe_path().Once the path is validated, the application decides whether it wants active
usage through the
set_active()API. It provides a single API entrypoint torequest its usage or not. For active path usage, it can use the following.
Then, the path will then be considered to use non-probing packets.
On the other hand, if for some reason the application wants to temporarily
stop sending non-probing frames on a given path, it can do the following.
Note that in such state, quiche still replies to PATH_CHALLENGEs observed on
that path.
Finally, the multipath design allows a QUIC endpoint to close/abandon a
given path along with an error code and error message, without altering the
connection's operations as long as another path is available.
-- Retrocompatibility note
Closedvariant ofPathEventis now a 4-tuple that, in addition tothe local and peer
SocketAddr, also contains anu64error code and aVec<u8>reason message.PathEvent:PeerPathStatusreports to theapplication that the peer advertised some status for a given 4-tuple.
Errorvariants:UnavailablePathandMultiPathViolation.-- Note
Currently this API is only available when multipath feature is enabled over
the session (i.e.,
conn.is_multipath_enabled()returnstrue). If theextension is not enabled,
set_active()andabandon_path()return anError::InvalidState. Actually, this API might sound "double usage" alongwith the
migrate()API (as there is no real "connection migration" withmultipath). Should we just keep the
set_active()or similarly named APIand include the
migrate()functionality inset_active()? Actually, anclient application without the multipath feature could just migrate using
set_active(local, peer, true), setting the previous path in unused modeunder the hood.
== Scheduling sent packets
Similarly to the connection migration PR, there are two ways to control how
quicheschedules packets on paths.The first consists in letting
quichehandles this by itself. Theapplication simply uses the
send()method. In the current master code,quicheautomatically handles path validation processing thanks to theinternal
get_send_path_id()Connectionmethod. The multipath patchextends this method and iterates over all active paths following the lowest
latency path having its congestion window open heuristic (a reasonable
default in multipath protocols).
The second option is to let the application choose on which path it wants
to send the next packet. The application can iterate over the available paths
and their corresponding statistics using
path_stats()and schedules packetsusing
send_on_path()method. This can be useful when the use case requiressome specific scheduling strategies. See
apps/src/client.rsfor an exampleof such application-driven scheduling.