Multipath support with non-zero length Connection IDs by qdeconinck · Pull Request #1310 · cloudflare/quiche

qdeconinck · 2022-09-05T08:17:31Z

This commit introduces the following features:

- Let the application choose whether or not to use the multipath extension
- Build on the PathEvents to let the application decide which paths to use
- Provide reasonable default behavior in quiche while letting the
  application provide optimised behavior

== Requesting the Multipath feature

The application can request multipath through the Config structure using a
dedicated API, set_multipath().

config.set_multipath(true);

The boolean value determines whether the underlying connection negotiates the
multipath extension. Once the handshake has completed, the application can
check whether the connection has the multipath feature enabled using the
is_multipath_enabled() API on the Connection structure.

== Path management

The API requires the application to specify on which paths/4-tuples it wants
to send non-probing packets. Paths must be validated before they are used.
Servers do this automatically, while clients must call probe_path().
Once a path is validated, the application decides whether to use it actively
through the set_active() API, a single entry point to enable or disable
its usage. To activate a path, it can use the following.

if let Some(PathEvent::Validated(local, peer)) = conn.path_event_next() {
    conn.set_active(local, peer, true).unwrap();
}

The path will then be considered for sending non-probing packets.

On the other hand, if for some reason the application wants to temporarily
stop sending non-probing frames on a given path, it can do the following.

conn.set_active(local, peer, false).unwrap();

Note that in this state, quiche still replies to PATH_CHALLENGEs observed on
that path.

Finally, the multipath design allows a QUIC endpoint to close/abandon a
given path along with an error code and error message, without altering the
connection's operations as long as another path is available.

conn.abandon_path(local, peer, 42, "Some error message".into()).unwrap();

-- Retrocompatibility note

- The Closed variant of PathEvent is now a 4-tuple that, in addition to
  the local and peer SocketAddr, also contains a u64 error code and a
  Vec<u8> reason message.
- There is a new variant of PathEvent: PeerPathStatus reports to the
  application that the peer advertised some status for a given 4-tuple.
- There are two new Error variants: UnavailablePath and
  MultiPathViolation.
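As a rough illustration of what the widened Closed variant means for application code, here is a minimal sketch. The `PathEvent` enum below is a local stand-in that only mirrors the shapes described in this note (it is not the quiche type), and `describe()` is a hypothetical helper.

```rust
use std::net::SocketAddr;

/// Local stand-in for two `PathEvent` variants mentioned in the text.
/// This is NOT the quiche type; it only mirrors the shapes described above.
enum PathEvent {
    Validated(SocketAddr, SocketAddr),
    Closed(SocketAddr, SocketAddr, u64, Vec<u8>),
}

/// Hypothetical helper that renders an event as a log line. The reason is
/// decoded lossily because it is a byte string and need not be valid UTF-8.
fn describe(event: &PathEvent) -> String {
    match event {
        PathEvent::Validated(local, peer) => {
            format!("path {local} -> {peer} validated")
        }
        PathEvent::Closed(local, peer, code, reason) => format!(
            "path {local} -> {peer} closed: code={code} reason={}",
            String::from_utf8_lossy(reason)
        ),
    }
}

fn main() {
    let local: SocketAddr = "10.0.2.1:8002".parse().unwrap();
    let peer: SocketAddr = "10.0.1.2:4433".parse().unwrap();

    let events = vec![
        PathEvent::Validated(local, peer),
        PathEvent::Closed(local, peer, 42, b"Some error message".to_vec()),
    ];
    for ev in &events {
        println!("{}", describe(ev));
    }
}
```

Existing code that matched `Closed` with two fields would need to bind or ignore the two extra fields after this change.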

-- Note

Currently this API is only available when the multipath feature is enabled on
the session (i.e., conn.is_multipath_enabled() returns true). If the
extension is not enabled, set_active() and abandon_path() return
Error::InvalidState. This API arguably overlaps with the migrate() API
(as there is no real "connection migration" with multipath). Should we
just keep set_active() (or a similarly named API) and fold the migrate()
functionality into it? A client application without the multipath feature
could then migrate using set_active(local, peer, true), which would put
the previous path in unused mode under the hood.

== Scheduling sent packets

Similarly to the connection migration PR, there are two ways to control how
quiche schedules packets on paths.

The first is to let quiche handle this by itself. The application simply
uses the send() method. In the current master code, quiche automatically
handles path validation processing thanks to the internal
get_send_path_id() Connection method. The multipath patch extends this
method to iterate over all active paths using the "lowest-latency path
with an open congestion window" heuristic (a reasonable default in
multipath protocols).

loop {
    let (write, send_info) = match conn.send(&mut out) {
        Ok(v) => v,

        Err(quiche::Error::Done) => break,

        Err(e) => {
            conn.close(false, 0x1, b"fail").ok();
            break;
        },
    };
    // write to socket bound to `send_info.from`
}
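The heuristic described above (pick the lowest-latency active path whose congestion window is still open) can be sketched as follows. This is a self-contained illustration and not the code of get_send_path_id(); the `PathInfo` struct, its field names, and `pick_path()` are assumptions made for the example.

```rust
use std::time::Duration;

/// Hypothetical per-path snapshot; field names are illustrative only.
#[derive(Debug, Clone)]
struct PathInfo {
    id: usize,
    active: bool,
    rtt: Duration,
    cwnd: usize,
    bytes_in_flight: usize,
}

/// Returns the id of the active path with the lowest RTT whose congestion
/// window still has room, or `None` if every active path is blocked.
fn pick_path(paths: &[PathInfo]) -> Option<usize> {
    paths
        .iter()
        .filter(|p| p.active && p.bytes_in_flight < p.cwnd)
        .min_by_key(|p| p.rtt)
        .map(|p| p.id)
}

fn main() {
    let ms = Duration::from_millis;
    let paths = vec![
        // Fastest path, but its congestion window is full.
        PathInfo { id: 0, active: true, rtt: ms(20), cwnd: 13_500, bytes_in_flight: 13_500 },
        // Slower path with room left in its window.
        PathInfo { id: 1, active: true, rtt: ms(50), cwnd: 13_500, bytes_in_flight: 1_350 },
        // Validated but not set active by the application.
        PathInfo { id: 2, active: false, rtt: ms(30), cwnd: 13_500, bytes_in_flight: 0 },
    ];
    println!("selected path: {:?}", pick_path(&paths));
}
```

When the fastest path is congestion-limited, the selection falls through to the next-fastest active path, which is what lets a second path absorb traffic.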

The second option is to let the application choose on which path it wants
to send the next packet. The application can iterate over the available paths
and their corresponding statistics using path_stats() and schedule packets
using the send_on_path() method. This can be useful when the use case
requires some specific scheduling strategies. See apps/src/client.rs for an
example of such application-driven scheduling.
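One possible application-driven strategy is plain round-robin over the validated paths. The sketch below only models the path selection; the `RoundRobin` type is hypothetical, and how the chosen pair is then handed to send_on_path() is deliberately left out, since that depends on the exact API signature.

```rust
use std::net::SocketAddr;

/// Hypothetical round-robin scheduler over validated (local, peer) pairs.
struct RoundRobin {
    paths: Vec<(SocketAddr, SocketAddr)>,
    next: usize,
}

impl RoundRobin {
    fn new(paths: Vec<(SocketAddr, SocketAddr)>) -> Self {
        Self { paths, next: 0 }
    }

    /// Returns the 4-tuple to try next, cycling through all known paths.
    /// Returns `None` when no path is registered.
    fn next_path(&mut self) -> Option<(SocketAddr, SocketAddr)> {
        if self.paths.is_empty() {
            return None;
        }
        let chosen = self.paths[self.next];
        self.next = (self.next + 1) % self.paths.len();
        Some(chosen)
    }
}

fn main() {
    let path1 = ("10.0.1.1:8001".parse().unwrap(), "10.0.1.2:4433".parse().unwrap());
    let path2 = ("10.0.2.1:8002".parse().unwrap(), "10.0.2.2:4433".parse().unwrap());
    let mut scheduler = RoundRobin::new(vec![path1, path2]);

    // An application would attempt one send on each selected path in turn.
    for _ in 0..4 {
        if let Some((local, peer)) = scheduler.next_path() {
            println!("next packet: {local} -> {peer}");
        }
    }
}
```

A real scheduler would probably also skip paths on which the transport reports nothing to send, rather than strictly alternating.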

LPardue · 2022-09-05T10:50:36Z

qlog/src/events/quic.rs

+    },
+
+    PathAbandon {
+        identifier_type: u8,


in the absence of a qlog spec, I'd expect identifier_type to be u64. That makes it consistent with varint type in the frame definition.

LPardue · 2022-09-05T10:51:03Z

qlog/src/events/quic.rs

+    },
+
+    PathStatus {
+        identifier_type: u8,


in the absence of a qlog spec, I'd expect identifier_type to be u64. That makes it consistent with varint type in the frame definition.
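For context on the varint remark: QUIC variable-length integers (RFC 9000, Section 16) can carry values up to 2^62 - 1, so a u8 field cannot represent every value the wire format allows. The helper below is a hypothetical illustration of the encoding lengths, not code from this PR.

```rust
/// Number of bytes a QUIC variable-length integer needs for `v`, following
/// the ranges in RFC 9000, Section 16. Returns `None` above 2^62 - 1.
fn varint_len(v: u64) -> Option<usize> {
    match v {
        0..=63 => Some(1),
        64..=16_383 => Some(2),
        16_384..=1_073_741_823 => Some(4),
        1_073_741_824..=4_611_686_018_427_387_903 => Some(8),
        _ => None,
    }
}

fn main() {
    // 256 is a valid varint value but does not fit in a u8.
    for v in [0u64, 63, 64, 256, 16_384, 1 << 30, (1 << 62) - 1, 1 << 62] {
        println!("{v}: {:?}", varint_len(v));
    }
}
```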

hendrikcech · 2022-11-16T17:44:06Z

Thanks for your work on implementing MPQUIC! I played around with this code and have the suspicion that some packets are not sent on the path that they should be sent on.

My setup: I created a mininet topology to test multipath support with the quiche-server and quiche-client applications.

                Path 1
         /--- s1 --- s2 ---/
     10.0.1.1           10.0.1.2
 Client h1          Server h2
     10.0.2.1           10.0.2.2 
         \--- s3 --- s4 ---/
                Path 2

I added space_id and path_id to the qlog PacketHeader that is used in the send_single and recv_single functions. From my understanding, space_id refers to the packet number space of the sent/received packet while path_id refers to the network path that the packet will be sent over / was received from. Since we have one packet number space per path, these two values should be equal.

My tests showed that, on the server, path_id always equals space_id (values are either 0 or 1). On the client, space_id and path_id are also equal in all transport:packet_sent events.

This is however not true for transport:packet_received events on the client: here, space_id != path_id in about 3% of cases (e.g., 1300 of 40000 received packets). I was not yet able to confirm if packets are actually sent on a different network path than space_id indicates in those cases (Wireshark fails to decode the QUIC packets).
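A check of this kind can be sketched as below, assuming the (space_id, path_id) pairs have already been extracted from the qlog events. The function names are hypothetical and the qlog parsing itself is not shown.

```rust
/// Counts events whose packet number space id differs from the path id.
/// Each tuple is (space_id, path_id), following the naming used above.
fn count_mismatches(events: &[(u64, u64)]) -> usize {
    events
        .iter()
        .filter(|(space_id, path_id)| space_id != path_id)
        .count()
}

/// Share of mismatching events in percent, or `None` for an empty log.
fn mismatch_percent(events: &[(u64, u64)]) -> Option<f64> {
    if events.is_empty() {
        return None;
    }
    Some(100.0 * count_mismatches(events) as f64 / events.len() as f64)
}

fn main() {
    // Synthetic data: 40000 received packets, 1300 of them with
    // space_id != path_id (the order of magnitude reported above).
    let mut events = vec![(0u64, 0u64); 38_700];
    events.extend(vec![(0u64, 1u64); 1_300]);

    println!("mismatches: {}", count_mismatches(&events));
    if let Some(pct) = mismatch_percent(&events) {
        println!("share: {pct:.2}%");
    }
}
```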

The quiche logs hint that packets assigned to the packet number space of one path are sometimes sent on the other path:

[...]
[2022-11-16T14:19:49.964226504Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 0 now see SCID with seq num 1
[2022-11-16T14:19:49.974246915Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 0 now see SCID with seq num 0
[2022-11-16T14:19:49.985896148Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 0 now see SCID with seq num 1
[...]

Is this behavior expected or is there actually something going wrong?

Attachments:

0001-Add-path_id-and-space_id-to-qlog.patch.txt: the qlog code additions to log path_id and space_id. I used the latest commit (d6772280) from this PR.
logs.zip: qlog and quiche logs of a multipath transmission

qdeconinck · 2022-11-21T07:58:34Z

@hendrikcech Nice to see you are experimenting with the code :)

It seems that quiche behaves correctly, but the server code does not send the packet on the right path. From the client logs,

[2022-11-16T14:19:43.721770238Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx pkt Short dcid=18f4cf5ad9df2fb7f8cb280b8ea06fafa2e45d3c key_phase=false len=1329 pn=0 src:10.0.1.2:4433 dst:10.0.2.1:8002
[2022-11-16T14:19:43.721878453Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 1 now see SCID with seq num 1
[2022-11-16T14:19:43.721925418Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm PATH_RESPONSE data=[02, e6, 51, c4, 58, 32, 39, 45]
[2022-11-16T14:19:43.721993077Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm PATH_CHALLENGE data=[26, 6b, ab, b8, 54, 64, 49, 0b]
[2022-11-16T14:19:43.722450331Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm PADDING len=1294
[2022-11-16T14:19:43.722732114Z TRACE quiche_apps::client] 10.0.2.1:8002: processed 1350 bytes
[2022-11-16T14:19:43.722774926Z TRACE quiche_apps::client] 10.0.2.1:8002: got 49 bytes
[2022-11-16T14:19:43.722840885Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx pkt Short dcid=14d656e811af235b787dc1ba7bc22a72524c0d80 key_phase=false len=28 pn=1 src:10.0.1.2:4433 dst:10.0.2.1:8002
[2022-11-16T14:19:43.722927964Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 peer reused CID 14d656e811af235b787dc1ba7bc22a72524c0d80 from path 0 on path 1
[2022-11-16T14:19:43.722971351Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 1 now see SCID with seq num 0
[2022-11-16T14:19:43.723004292Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm DATA_BLOCKED limit=10000000
[2022-11-16T14:19:43.723037715Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm STREAM id=7 off=0 len=1 fin=false

But at server side:

[2022-11-16T14:19:43.668765632Z INFO  quiche_server] 327765e95cde0abdb523c6de9e6b2d292c0ced18 Seen new path (10.0.1.2:4433, 10.0.2.1:8002)
[2022-11-16T14:19:43.668816125Z TRACE quiche_server] recv() would block
[2022-11-16T14:19:43.669135491Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx pkt Short dcid=18f4cf5ad9df2fb7f8cb280b8ea06fafa2e45d3c key_phase=false len=1312 pn=0 src:10.0.1.2:4433 dst:10.0.2.1:8002
[2022-11-16T14:19:43.669183980Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm PATH_RESPONSE data=[02, e6, 51, c4, 58, 32, 39, 45]
[2022-11-16T14:19:43.669228104Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm PATH_CHALLENGE data=[26, 6b, ab, b8, 54, 64, 49, 0b]
[2022-11-16T14:19:43.669267438Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm PADDING len=1294
[2022-11-16T14:19:43.669409946Z TRACE quiche::recovery] 327765e95cde0abdb523c6de9e6b2d292c0ced18 timer=1.023455331s latest_rtt=0ns srtt=None min_rtt=0ns rttvar=166.5ms loss_time=[None, None, None] loss_probes=[0, 0, 0] cwnd=13500 ssthresh=18446744073709551615 bytes_in_flight=1350 app_limited=true congestion_recovery_start_time=None Rate { delivered: 0, delivered_time: Instant { tv_sec: 228741, tv_nsec: 941993262 }, first_sent_time: Instant { tv_sec: 228741, tv_nsec: 941993262 }, end_of_app_limited: SpacedPktNum(0, 0), last_sent_packet: SpacedPktNum(1, 0), largest_acked: SpacedPktNum(0, 0), rate_sample: RateSample { delivery_rate: 0, is_app_limited: false, interval: 0ns, delivered: 0, prior_delivered: 0, prior_time: None, send_elapsed: 0ns, ack_elapsed: 0ns, rtt: 0ns } } pacer=Pacer { enabled: true, capacity: 13500, used: 0, rate: 0, last_update: Instant { tv_sec: 228741, tv_nsec: 941993262 }, next_time: Instant { tv_sec: 228741, tv_nsec: 941993262 }, max_datagram_size: 1350, last_packet_size: None, iv: 0ns } hystart=window_end=Some(SpacedPktNum(1, 0)) last_round_min_rtt=18446744073709551615.999999999s current_round_min_rtt=18446744073709551615.999999999s css_baseline_min_rtt=18446744073709551615.999999999s rtt_sample_count=0 css_start_time=None css_round_count=0 cubic={ k=0 w_max=0 } 
[2022-11-16T14:19:43.669725202Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx pkt Short dcid=14d656e811af235b787dc1ba7bc22a72524c0d80 key_phase=false len=11 pn=1 src:10.0.1.2:4433 dst:10.0.1.1:8001
[2022-11-16T14:19:43.669773886Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm DATA_BLOCKED limit=10000000
[2022-11-16T14:19:43.669803595Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm STREAM id=7 off=0 len=1 fin=false
[2022-11-16T14:19:43.669936399Z TRACE quiche::recovery] 327765e95cde0abdb523c6de9e6b2d292c0ced18 timer=218.846885ms latest_rtt=104.273399ms srtt=Some(104.518737ms) min_rtt=104.273399ms rttvar=22.416993ms loss_time=[None, None, None] loss_probes=[0, 0, 0] cwnd=13500 ssthresh=18446744073709551615 bytes_in_flight=49 app_limited=true congestion_recovery_start_time=None Rate { delivered: 2194, delivered_time: Instant { tv_sec: 228741, tv_nsec: 936736141 }, first_sent_time: Instant { tv_sec: 228741, tv_nsec: 936736141 }, end_of_app_limited: SpacedPktNum(0, 1), last_sent_packet: SpacedPktNum(0, 1), largest_acked: SpacedPktNum(0, 1), rate_sample: RateSample { delivery_rate: 4958, is_app_limited: true, interval: 104.273399ms, delivered: 517, prior_delivered: 1677, prior_time: Some(Instant { tv_sec: 228741, tv_nsec: 832462742 }), send_elapsed: 0ns, ack_elapsed: 104.273399ms, rtt: 104.273399ms } } pacer=Pacer { enabled: true, capacity: 13500, used: 0, rate: 161454, last_update: Instant { tv_sec: 228741, tv_nsec: 936736141 }, next_time: Instant { tv_sec: 228741, tv_nsec: 936736141 }, max_datagram_size: 1350, last_packet_size: Some(0), iv: 0ns } hystart=window_end=Some(SpacedPktNum(0, 0)) last_round_min_rtt=18446744073709551615.999999999s current_round_min_rtt=18446744073709551615.999999999s css_baseline_min_rtt=18446744073709551615.999999999s rtt_sample_count=0 css_start_time=None css_round_count=0 cubic={ k=0 w_max=0 } 
[2022-11-16T14:19:43.670211822Z TRACE quiche_server] 327765e95cde0abdb523c6de9e6b2d292c0ced18 written 1399 bytes

So quiche on the server side does indicate that the second packet should be sent on the original path src:10.0.1.2:4433 dst:10.0.1.1:8001, but for some reason the client observes that the packet carrying the STREAM frame is received on the same path as the PATH_CHALLENGE one.

It actually seems that quiche-server batches the conn.send() into a large buffer and calls send_to once the buffer is full (or if there is no additional packet to be sent), but forgets to check whether packets should be sent on different 4-tuples (only the 4-tuple of the first packet in the batch is considered). I can quickly look at fixing this behaviour of quiche-server in a commit.
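The batching issue described above can be illustrated with a small sketch: packets are only coalesced into one write while their 4-tuple stays the same, and a new batch starts as soon as it changes. This is not the quiche-server fix itself; `Packet`, `Batch`, and `batch_by_path()` are hypothetical names for the illustration.

```rust
use std::net::SocketAddr;

/// One packet produced by the transport, tagged with its 4-tuple.
struct Packet {
    from: SocketAddr,
    to: SocketAddr,
    data: Vec<u8>,
}

/// A run of consecutive packets that share a 4-tuple and can therefore be
/// written with a single send call.
#[derive(Debug, PartialEq)]
struct Batch {
    from: SocketAddr,
    to: SocketAddr,
    payload: Vec<u8>,
}

/// Coalesces consecutive packets, starting a new batch whenever the
/// (from, to) pair differs from that of the batch being filled.
fn batch_by_path(packets: Vec<Packet>) -> Vec<Batch> {
    let mut batches: Vec<Batch> = Vec::new();
    for pkt in packets {
        let same_path = batches
            .last()
            .map_or(false, |b| b.from == pkt.from && b.to == pkt.to);
        if same_path {
            if let Some(last) = batches.last_mut() {
                last.payload.extend_from_slice(&pkt.data);
            }
        } else {
            batches.push(Batch { from: pkt.from, to: pkt.to, payload: pkt.data });
        }
    }
    batches
}

fn main() {
    let server: SocketAddr = "10.0.1.2:4433".parse().unwrap();
    // Mirrors the logs above: a 1350-byte packet for the new path followed
    // by a 49-byte packet for the original path.
    let packets = vec![
        Packet { from: server, to: "10.0.2.1:8002".parse().unwrap(), data: vec![0; 1350] },
        Packet { from: server, to: "10.0.1.1:8001".parse().unwrap(), data: vec![0; 49] },
    ];
    for b in batch_by_path(packets) {
        println!("{} -> {}: {} bytes", b.from, b.to, b.payload.len());
    }
}
```

With this grouping, the two packets from the logs would go out as two separate writes instead of a single 1399-byte write addressed to the first packet's destination.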

hendrikcech · 2022-11-21T17:11:06Z

@qdeconinck Thanks for taking a look! I can confirm that these changes to quiche-server resolve the problem.

LPardue · 2022-09-06T13:18:26Z

apps/src/client.rs

-    let migrate_socket = if args.perform_migration {
-        let mut socket =
-            mio::net::UdpSocket::bind(bind_addr.parse().unwrap()).unwrap();
+    let mut addrs = Vec::new();


I think now would be a good opportunity to move the socket construction code into its own method.

Also, per my comment about the args, it might help to think of the possible failure scenarios with user-provided addresses and add some sanity checking to report problems, rather than just letting it fail with a timeout, etc.

LPardue · 2022-09-06T13:27:38Z

apps/src/client.rs

+/// Generate a new pair of Source Connection ID and reset token.
+fn generate_cid_and_reset_token<T: SecureRandom>(
+    rng: &T,
+) -> (quiche::ConnectionId<'static>, u128) {
+    let mut scid = [0; quiche::MAX_CONN_ID_LEN];
+    rng.fill(&mut scid).unwrap();
+    let scid = scid.to_vec().into();
+    let mut reset_token = [0; 16];
+    rng.fill(&mut reset_token).unwrap();
+    let reset_token = u128::from_be_bytes(reset_token);
+    (scid, reset_token)
+}


this seems to be identical to the one in common.rs. Can't we just use that existing one?

Oh, this seems to be code I forgot to clean up, well spotted!

LPardue · 2023-01-23T16:27:48Z

apps/src/args.rs

  --enable-active-migration   Enable active connection migration.
  --perform-migration      Perform connection migration on another source port.
+  --multipath              Enable multipath support.
+  -A --address ADDR ...    Additional client addresses to use.


This description is kind of confusing.

Prior to this change, the client would look at the Server IP and pick an "any" socket using an IP family that matched the server's i.e. 0.0.0.0:0 or [::]:0 depending on v4 or v6 respectively

With this change, the provided addresses are not used in addition to the old behaviour; instead, they are the only addresses used. This opens a few new failure scenarios that could catch users out. For example, if a server only returns an A record and the user provided a v6 client address, then the connection would fail.

Among the most confusing type of failure is where packets go to a black hole and the connection fails after a timeout.

We might want to tweak how the socket code handling for this option works, and make the description a bit clearer about what it does and how it might fail.

You're right, the current description does not suit the actual behavior of this option. As a first step, the initial description could be rewritten as "Specify source addresses to be used instead of the unspecified address" (or "instead of letting the OS decide",...).

For the address family mismatch that may arise, I can indeed rework the code to remove all the specified addresses that do not match the family of the server one. If none remains, the code can then fall back to the original behavior, i.e., use "0.0.0.0:0" or "[::]:0".

There still remains the issue that non-routable addresses may be provided, e.g., fe80::/16 or other. I'm not sure we can do much here, so maybe update the description as "Specify source addresses to be used instead of the unspecified address, non-routable addresses will lead to connectivity issues" (maybe a bit long, but not sure how to make it shorter without losing information).
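The family filtering with fallback proposed above could look roughly like this. It is a sketch under the assumption that the server address has already been resolved; `select_local_addrs()` is a hypothetical helper, not a function from the quiche apps.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

/// Keeps only the requested local addresses whose family matches the
/// server's. If none remains, falls back to the unspecified address of
/// the server's family with port 0, i.e. "0.0.0.0:0" or "[::]:0".
fn select_local_addrs(server: SocketAddr, requested: &[SocketAddr]) -> Vec<SocketAddr> {
    let mut usable: Vec<SocketAddr> = requested
        .iter()
        .copied()
        .filter(|addr| addr.is_ipv4() == server.is_ipv4())
        .collect();

    if usable.is_empty() {
        let any = if server.is_ipv4() {
            IpAddr::V4(Ipv4Addr::UNSPECIFIED)
        } else {
            IpAddr::V6(Ipv6Addr::UNSPECIFIED)
        };
        usable.push(SocketAddr::new(any, 0));
    }
    usable
}

fn main() {
    let server: SocketAddr = "10.0.1.2:4433".parse().unwrap();
    let requested: Vec<SocketAddr> = vec![
        "[fe80::1]:8001".parse().unwrap(),
        "10.0.1.1:8001".parse().unwrap(),
    ];
    for addr in select_local_addrs(server, &requested) {
        println!("will bind {addr}");
    }
}
```

Note that this only addresses the family mismatch; a non-routable address of the right family would still pass the filter.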

toshiiw · 2023-02-01T04:26:33Z

I made a simple client/server example and tested with this multipath patch. I noticed there might be a flaw in the ACKMP sending logic.

My code is based on the one in quiche/apps. The client opens 2 paths against the server and writes data on a single QUIC stream. It turns out that the server sometimes stops sending ACKMPs which causes a client-side idle timeout and subsequent termination of the connection. This problem can be mitigated if the send() call in the server event loop is replaced with a send_on_path() call that iterates over all available paths, just like the client example (in the multipath branch) does.

The send_single() function sends an ACKMP only if there is an unacked packet received on the same path as specified in the function argument, while the send() function just selects a "best" path and no other paths are tried even if the selected path doesn't yield any packet data.
I think the problem lies here.

I noticed the client is retransmitting STREAM packets before a timeout but it doesn't seem to help. I haven't checked on which path those retransmits are happening.

FrancoLiberali · 2023-02-02T17:20:24Z

apps/src/client.rs

        }

-        if args.perform_migration &&
+        if conn_args.multipath &&


When the number of addresses to use is greater than conn.available_dcids(), some addresses are silently ignored. I think a warning could be added to show that this happens and that it can be solved with the --max-active-cids parameter.

It might be indeed interesting to add a warning message indicating so, good point!

FrancoLiberali · 2023-02-02T17:28:07Z

apps/src/client.rs

@@ -116,6 +101,7 @@ pub fn connect(
    config.set_initial_max_streams_uni(conn_args.max_streams_uni);
    config.set_disable_active_migration(!conn_args.enable_active_migration);
    config.set_active_connection_id_limit(conn_args.max_active_cids);


Similar to my previous comment, I don't think it makes sense for the client to set the active_connection_id_limit to less than the number of paths it intends to use, so the configuration could be:
config.set_active_connection_id_limit(std::cmp::max( conn_args.max_active_cids, args.addrs.len().try_into().unwrap()));

I'm not sure if we want the client to include such "magic" without proper documentation, but I can wait for other opinions. Also, the comparison should be made against addrs.len() and not args.addrs.len(), as some provided addresses might be of different families than the contacted server address.
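A sketch of the adjustment under discussion, taking into account the remark that the count should be the number of addresses left after family filtering. `effective_cid_limit()` is a hypothetical helper; whether the client should apply such an adjustment at all is the open question above.

```rust
/// Returns the connection ID limit to configure: the user-provided value,
/// raised if needed so that every usable local address can get its own
/// connection ID. `usable_addrs` is the number of addresses remaining
/// after filtering by address family.
fn effective_cid_limit(max_active_cids: u64, usable_addrs: usize) -> u64 {
    // Plain widening cast: usize is at most 64 bits on supported targets.
    max_active_cids.max(usable_addrs as u64)
}

fn main() {
    // User asked for a limit of 2 but wants to use 3 local addresses.
    println!("limit: {}", effective_cid_limit(2, 3));
    // User-provided limit already covers the addresses.
    println!("limit: {}", effective_cid_limit(8, 3));
}
```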

qdeconinck · 2023-02-06T15:10:59Z

@toshiiw This is strange, as the get_send_path_id() method should address this point. Could you indicate on which commit you are based, and provide logs describing the behavior? Feel free to contact me offline if preferred.

toshiiw · 2023-02-07T06:02:58Z

I tested with the following version.
Attached are sqlog files with packet namespace ids.
Data is sent from a client to a server (unidirectional).

commit d67722801b043a9b82c04fa70bf1e3240492ee23 (HEAD -> multipath)
Author: Quentin De Coninck <quentin_d@apple.com>
Date:   Mon Oct 10 17:29:15 2022 +0200

sqlog.tar.gz

I also tested with the newest code on the multipath branch but still saw premature shutdowns.

toshiiw · 2023-02-09T08:54:25Z

quiche/src/lib.rs

        }

+        let mut consider_standby = false;
+        let dgrams_to_emit = self.dgram_max_writable_len().is_some();


IIUC this always returns true even if self.dgram_send_queue is empty.

toshiiw · 2023-02-09T08:55:02Z

quiche/src/lib.rs

+        // When using multiple packet number spaces, let's force ACK_MP sending
+        // on their corresponding paths.
+        if self.is_multipath_enabled() {
+            if let Some(pid) =


...so, this code isn't executed.

qdeconinck · 2023-02-09T09:14:51Z

@toshiiw Oooh good catch indeed! If you configure your connection to enable datagrams on the connection, I can indeed reproduce the issue! I will push a fix with the adapted multipath test now; replacing let dgrams_to_emit = self.dgram_max_writable_len().is_some(); with let dgrams_to_emit = self.dgram_send_queue.has_pending(); indeed solves the issue.
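The difference between the two checks can be shown with a toy model. The `DgramQueue` type below is a stand-in written for this illustration and is not the quiche implementation; it assumes, as discussed above, that the capacity-style check returns a value whenever datagrams are enabled, regardless of whether anything is queued.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a datagram send queue; not the quiche implementation.
struct DgramQueue {
    enabled: bool,
    max_len: usize,
    queue: VecDeque<Vec<u8>>,
}

impl DgramQueue {
    /// Capacity-style check: answers "how large a datagram could be
    /// written?" and is `Some` whenever datagrams are enabled.
    fn max_writable_len(&self) -> Option<usize> {
        if self.enabled { Some(self.max_len) } else { None }
    }

    /// Pending-work check: answers "is there a datagram waiting to go out?"
    fn has_pending(&self) -> bool {
        !self.queue.is_empty()
    }
}

fn main() {
    let mut q = DgramQueue { enabled: true, max_len: 1200, queue: VecDeque::new() };

    // Empty queue: the capacity check still reports "something to emit",
    // while the pending check does not.
    println!("capacity check: {}", q.max_writable_len().is_some());
    println!("pending check:  {}", q.has_pending());

    q.queue.push_back(b"hello".to_vec());
    println!("pending check after push: {}", q.has_pending());
}
```

Using the capacity check as a proxy for "there are datagrams to emit" makes that condition true on every call once datagrams are enabled, which matches the behavior reported in the review comments.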

qdeconinck · 2023-04-27T15:39:23Z

@ghedo Refactoring changes are isolated in #1493.




vanyingenzi · 2023-10-31T15:49:58Z

Hello everyone,

I'm new to QUIC, and I'm starting my master's thesis entitled "Enhancing the Performance of a Single QUIC Connection with Multi-Path QUIC."

While conducting measurements, I've noticed that the Multi-Path extension is experiencing correctness issues in my setup. I'm using loop-back addresses on a single host. Below is a script that reproduces the issue. The server has one endpoint, and the client has two endpoints.

Observations:

The server validates the paths and sends traffic on both of them, but the transfer ends prematurely on the client side.
The client terminates with a timeout error due to the idle timeout, ~~where the duration of the timeout equals the time of the transfer~~.
The occurrence of this error is non-deterministic, varying from one run to another. However, it appears to happen more frequently with larger files. I tested with 1GB and 8GB only.

I haven't dug deeply into the issue because I'm uncertain whether it's due to a misconfiguration on my part or if it's a bug in the actual source code.

Thank you in advance for your time.

#!/bin/bash

# Code partly inspired by https://github.com/tumi8/quic-10g-paper

# Variables
QUICHE_REPO="https://github.com/qdeconinck/quiche.git"
QUICHE_COMMIT="d87332018d84fb7c429ad2ed34cbfdc6ee9477c8"
RUST_PLATFORM="x86_64-unknown-linux-gnu"
FILE_SIZE=8G
NB_RUNS=10

RED='\033[0;31m'
RESET='\033[0m'

echo_red() {
    echo -e "${RED}$1${RESET}"
}

get_unused_port(){
    local port
    port=$(shuf -i 2000-65000 -n 1)
    while netstat -atn | grep -q ":$port "; do
        port=$(shuf -i 2000-65000 -n 1)
    done
    echo "$port"
}

clone_mp_quiche() {
    if [ ! -d "./quiche" ]; then
        git clone --recursive "$QUICHE_REPO"
        cd quiche || exit
        git checkout "$QUICHE_COMMIT"
        RUSTFLAGS='-C target-cpu=native' cargo build --release
        cd ..
    fi
    if [ ! -f "./quiche-client" ]; then
        cp "quiche/target/release/quiche-client" .
    fi
    if [ ! -f "./quiche-server" ]; then
        cp "quiche/target/release/quiche-server" .
    fi
}

setup_rust() {
    # Rust
    if ! rustc --version 1>/dev/null 2>&1; then
        curl --proto '=https' --tlsv1.2 -sSf -o /tmp/rustup-init.sh https://sh.rustup.rs
        chmod +x /tmp/rustup-init.sh
        /tmp/rustup-init.sh -q -y --default-host "$RUST_PLATFORM" --default-toolchain stable --profile default
        source "$HOME/.cargo/env"
    else 
        echo "Rust is already installed"
    fi
}

setup_environment() {
    mkdir -p "$(pwd)/www" "$(pwd)/responses" "$(pwd)/logs"
    fallocate -l ${FILE_SIZE} "$(pwd)/www/${FILE_SIZE}B_file"
}

iteration_loop() {
    for iter in $(seq 1 ${NB_RUNS}); do
        echo "Testing Multi-Path QUIC correctness - Iteration $iter"
        
        server_port=$(get_unused_port)
        client_port_1=$(get_unused_port)
        client_port_2=$(get_unused_port)

        # Run server
        env RUST_LOG=info ./quiche-server \
            --listen 127.0.0.1:${server_port} \
            --root "$(pwd)/www/" \
            --key "$(pwd)/quiche/apps/src/bin/cert.key" \
            --cert "$(pwd)/quiche/apps/src/bin/cert.crt" \
            --multipath \
            1>"$(pwd)/logs/server_${iter}.log" 2>&1 &
        server_pid=$!

        # Run client
        env RUST_LOG=info ./quiche-client \
            --no-verify "https://127.0.0.1:${server_port}/${FILE_SIZE}B_file" \
            --dump-responses "$(pwd)/responses/" \
            -A 127.0.0.1:${client_port_1} \
            -A 127.0.0.1:${client_port_2} \
            --multipath \
            1>"$(pwd)/logs/client_${iter}.log" 2>&1
        error_code=$?

        sleep 1
        
        kill -9 "$server_pid" 1>/dev/null 2>&1
        if [ $error_code -ne 0 ]; then
            echo_red "Error Client: $error_code"
            exit 1
        fi

        # Check if files are the same
        diff -q "$(pwd)/www/${FILE_SIZE}B_file" "$(pwd)/responses/${FILE_SIZE}B_file"
        if [ $? -ne 0 ]; then
            echo_red "Error: files are not the same"
            exit 1
        fi
    done
}

main() {
    # Version
    setup_rust
    [ $? -ne 0 ] && { echo_red "Error setting up rust"; exit 1; }
    clone_mp_quiche
    [ $? -ne 0 ] && { echo_red "Error cloning quiche"; exit 1; }
    setup_environment
    [ $? -ne 0 ] && { echo_red "Error setting up environment"; exit 1; }
    iteration_loop
}

main

logs.zip

qdeconinck requested a review from a team as a code owner September 5, 2022 08:17

LPardue reviewed Sep 5, 2022


qdeconinck force-pushed the multipath branch from 895e259 to 896295e Compare September 8, 2022 15:23

qdeconinck force-pushed the multipath branch from be47100 to 3edcdd9 Compare September 22, 2022 13:25

qdeconinck force-pushed the multipath branch 2 times, most recently from c730e2d to ad2e520 Compare October 10, 2022 15:48

qdeconinck force-pushed the multipath branch 2 times, most recently from b892a23 to 30a9dff Compare October 26, 2022 06:18

qdeconinck mentioned this pull request Oct 26, 2022

refactor Epoch to be an enum instead of usize consts #1359

Merged

qdeconinck force-pushed the multipath branch from 30a9dff to 7d6960f Compare October 27, 2022 10:52

ehaydenr mentioned this pull request Nov 9, 2022

quic: API for app to elicit ACK from peer #1361

Merged

qdeconinck force-pushed the multipath branch 3 times, most recently from d2dda3d to d677228 Compare November 15, 2022 15:06

qdeconinck mentioned this pull request Dec 1, 2022

Any implementations of multipath quic? quicwg/multipath#153

Closed

LPardue reviewed Jan 23, 2023

qdeconinck force-pushed the multipath branch from 01b61a8 to fcdd84b Compare January 27, 2023 17:08

FrancoLiberali reviewed Feb 2, 2023

toshiiw reviewed Feb 9, 2023

qdeconinck force-pushed the multipath branch from ea9e8f8 to 7b23706 Compare April 27, 2023 15:32

ghedo force-pushed the multipath branch from 7b23706 to 69adb24 Compare June 2, 2023 14:04

qdeconinck force-pushed the multipath branch from 69adb24 to 3f923ec Compare June 13, 2023 07:45

qdeconinck force-pushed the multipath branch from 3f923ec to 3801d9f Compare June 27, 2023 07:17

qdeconinck force-pushed the multipath branch from 3801d9f to 18959f7 Compare July 14, 2023 12:12

qdeconinck force-pushed the multipath branch from 6eac61f to d544b81 Compare July 22, 2023 18:31

qdeconinck force-pushed the multipath branch 2 times, most recently from ead77e6 to f1ba4a6 Compare September 25, 2023 09:18

qdeconinck and others added 17 commits October 27, 2023 16:40

refactoring to prepare multipath support without side effects

1e9f0e7

- packet number space map
- spaced packet number
- `PathEvent::Closed` now includes error code and reason
- function refactoring in lib.rs and frame.rs

quiche-server: ensure burst sending on same path

baf017d

apps: make the client's A flag less confusing

346218e

And fix clippy

apps: add warning if --max-active-cids is too low

08630f9

clippy fixes

3289324

fix non-sending of ACK_MP when datagrams are enabled

e18eab0

Credits to @toshiiw for finding the issue.

fix comment

5148542

tests: support multipath in decode_pkt

ee598c8

Through the addition of the `find_scid_seq()` method on `Connection`.

no need for public API find_scid_seq()

f6d2c86

remove unused function since rebase

8904dff

move support to draft-ietf-quic-multipath-04

bcb50d2

add temporary simultaneous support of 04 and 05 drafts

26bebad

also update applications to be able to run old multipath

c6986a0

add some additional flags for the multipath interop @ IETF117 hackathon

0699d99

remove support for draft-ietf-quic-multipath-04

2aa5abd

move support to draft-ietf-quic-multipath-06

6a53adb

qdeconinck force-pushed the multipath branch from f1ba4a6 to 6a53adb Compare October 27, 2023 15:22

apps: remove dangling --multipath-old flag

d873320

That was not doing much.

matttbe mentioned this pull request Apr 8, 2024

socket: add MPTCP support curl/curl#13278

Closed

Conversation

qdeconinck commented Sep 5, 2022

LPardue Sep 5, 2022

hendrikcech commented Nov 16, 2022

qdeconinck commented Nov 21, 2022

hendrikcech commented Nov 21, 2022

toshiiw commented Feb 1, 2023

qdeconinck commented Feb 6, 2023

toshiiw commented Feb 7, 2023

qdeconinck commented Feb 9, 2023

qdeconinck commented Apr 27, 2023

vanyingenzi commented Oct 31, 2023

9 participants