udp: add ECN, PMTUD, pktinfo, and GSO/GRO extensions to uv_udp_t#5116
udp: add ECN, PMTUD, pktinfo, and GSO/GRO extensions to uv_udp_t#5116jasnell wants to merge 2 commits into
Conversation
|
I just glanced at the API in uv.h, some thoughts:
|
|
I did consider adding to the existing udp handle but did not want to risk an abi breaking change, even with the reserved fields. If that's not as much of a risk tho, I can switch it. I also considered targeting V2 but as this is intended to support quic in node sooner rather than later, and since that is still on 1.x, that's where I wanted to start. |
f7b269a to
e821f58
Compare
|
Like @saghul, I don't think this needs a new handle type. Also, with a 4500 line diff, you shouldn't realistically expect anyone to review this in depth anytime soon. We're unpaid volunteers and there's funner stuff to do. |
Ok, I can update it to use the existing |
|
Ok, after chatting a bit with @santigimeno at the node.js collab summit in london this week about this and the updated plans to try to get v2 across the line, I think it does make sense to target v2 here. So I'll move this back to draft and will work on rebasing it both on v2 and using the |
|
Thanks. Will take a look when it's ready. |
a3a054f to
664a25a
Compare
|
@santigimeno @saghul ... ok, the implementation has been updated to (a) use the existing |
|
It's still over 3,100 lines of.... I don't want to use the phrase "AI slop" but I can't think of anything more appropriate. I won't speak for the others but I myself have a severe distaste for reviewing such code: a code monkey can now throw out in 30 seconds what takes me 1 hour to review. Worse, I end up having it to maintain it afterwards. Long story short: I'm not going to invest my own free time in this, and neither should other maintainers. If this is something you're doing for CloudFlare or OpenJS, then let them get in touch for a support contract. |
|
@bnoordhuis ... I appreciate your input. You've already indicated that you would not consider reviewing this PR to be "fun" and have indicated that you've "moved past caring" about what Node.js needs from libuv. Given that this is in support of a significant new Node.js feature, I wouldn't expect you personally to invest any amount of time on reviewing this PR or maintaining the changes. Implementing QUIC for Node.js is a long-time personal project that has absolutely nothing to do with Cloudflare and is not something I'm doing in any official capacity on behalf of the OpenJS foundation. I'm more than happy to help maintain these changes long term. I appreciate that @santigimeno has indicated to me that they intend to review the changes. @santigimeno, if there's anything I can do to help make the review process easier please do let me know. Roughly 50% of the LOC changed are implementation, which is reasonable split between the Unix and Windows portions (roughly 800-900 lines each). Over half the PR is tests and docs. If it would make the review easier I can split into multiple commits but I've tried to keep everything together. @saghul and @santigimeno ... one concern here worth noting.... if you look at the additional flags added in uv_common.h (lines 117-121)... they fully exhaust the remaining space in the 32-bit |
|
FWIW, the CI-docs failure (https://github.com/libuv/libuv/actions/runs/25630392082/job/75232922375?pr=5116) appear to be unrelated to this change. #5136 should fix a couple of the broken links. Not sure what's up the third one. |
Extends the existing uv_udp_t handle with per-packet metadata delivery and advanced UDP features, without introducing a new handle type. uv_udp_recv_start2() is an enhanced receive path that delivers metadata through a uv_udp_recv_t struct: ECN codepoint, local destination address, interface index, and GRO segment stride. Features are enabled at bind time via new UV_UDP_* flags: RECVECN IP_RECVTOS / IPV6_RECVTCLASS; uv_udp_set_ecn() for send PMTUD IP_MTU_DISCOVER / IP_DONTFRAG; uv_udp_set_pmtud() RECVPKTINFO IP_PKTINFO / IPV6_RECVPKTINFO (IP_RECVDSTADDR on BSD) GRO UDP GRO with transparent segment splitting (Linux 5.0+) GRO_RAW UDP GRO without splitting; app splits by segment_size TXTIME kernel packet pacing via SO_TXTIME (Linux 4.19+) uv_udp_try_send_batch() sends batches with optional UDP_SEGMENT GSO (Linux 4.18+) and per-message SCM_TXTIME timestamps. uv_udp_getmtu() queries the cached path MTU (Linux and Windows). uv_udp_set_cpu_affinity() sets SO_INCOMING_CPU (Linux 3.9+). uv_udp_configure() enables features post-bind. The enhanced recv callback is stored in the existing recv_cb field (same-sized function pointer) with a UV_HANDLE_UDP_RECV2 flag controlling dispatch. The uv_udp_t struct layout is unchanged. On Windows, WSARecvMsg is used when RECVECN or RECVPKTINFO is requested; otherwise falls back to WSARecvFrom. Where a bind flag is unsupported on the current platform it is silently ignored.
santigimeno
left a comment
There was a problem hiding this comment.
First pass focused on API shape and unix implementation only.
@jasnell yes, in principle, splitting the PR on single-feature commits would be nice.
| /* Indicates data from the Linux MSG_ERRQUEUE error queue. */ | ||
| UV_UDP_RECV_LINUX_RECVERR = 32768 |
There was a problem hiding this comment.
There's already UV_UDP_LINUX_RECVERR. Never used either
| /* Indicates data from the Linux MSG_ERRQUEUE error queue. | ||
| * Set in uv_udp_recv_t.flags when the received message came from the | ||
| * socket error queue (IP_RECVERR / IPV6_RECVERR). */ | ||
| UV_UDP_RECV_LINUX_RECVERR = 32768 |
| * - GSO (``uv_udp_try_send_batch``) | ||
| - Yes (4.18+) | ||
| - no-op | ||
| - no-op | ||
| - no-op | ||
| - no-op | ||
| * - GRO (``UV_UDP_GRO`` / ``UV_UDP_GRO_RAW``) | ||
| - Yes (5.0+) | ||
| - no-op | ||
| - no-op | ||
| - no-op | ||
| - no-op | ||
| * - SO_TXTIME (``UV_UDP_TXTIME``) | ||
| - Yes (4.19+) | ||
| - no-op | ||
| - no-op | ||
| - no-op | ||
| - no-op | ||
| * - SO_INCOMING_CPU (``uv_udp_set_cpu_affinity``) | ||
| - Yes (3.9+) | ||
| - ENOTSUP | ||
| - ENOTSUP | ||
| - ENOTSUP | ||
| - ENOTSUP |
There was a problem hiding this comment.
We usually don't support features supported on a single platform.
Most of these (the ones related to the transmission) seem something that could be implemented easily on userland, right?
GRO would require more thought: maybe a way to expose the metadata coming via ancillary data.
There was a problem hiding this comment.
well, I was hoping to come back in a follow up and add an approximated implementation for other runtimes that would make the API operational even if it needed to fallback to a not-actually-GSO/GRO impl under the covers.
The UDP_TXTIME is a kernel pacing feature. Not possible to implement at the user-space level with the same performance.
The cpu_affinity one is certainly not as critical.
| .. note:: | ||
| On **macOS**, ``IPV6_RECVPKTINFO`` is not available. IPv6 pktinfo | ||
| (``UV_UDP_RECVPKTINFO``) is silently ignored; ``recv->local.ss_family`` | ||
| will be 0. |
There was a problem hiding this comment.
I think you can use it if we define -D__APPLE_USE_RFC_3542. See for example: python/cpython#19526
There was a problem hiding this comment.
Also, is nodejs/node-v0.x-archive#6589 (comment) still a problem in macos? I assume not anymore?
There was a problem hiding this comment.
Oh! I'd completely missed that APPLE_USE_RFC_3542 was a thing. Nice.
| ``UV_UDP_RECVPKTINFO``. | ||
|
|
||
| :returns: 0 on success, ``UV_EBADF`` if no socket exists, ``UV_EINVAL`` | ||
| for unrecognized flags. |
There was a problem hiding this comment.
In which cases this function would be used?
For now I would probably drop this new method from the PR. It can be added, if we consider it useful, in a follow up PR. Some problems I see now: increases API surface. Some the options already have a public API added in this same PR. Also it sets a default uv_udp_pmtud value.
| #elif defined(IP_RECVDSTADDR) | ||
| setsockopt(fd, IPPROTO_IP, IP_RECVDSTADDR, &yes, sizeof(yes)); | ||
| #if defined(IP_RECVIF) | ||
| setsockopt(fd, IPPROTO_IP, IP_RECVIF, &yes, sizeof(yes)); |
There was a problem hiding this comment.
Why nesting these 2 options? Aren't they independent? Or enforcing that IP_RECVIF is only set when IP_RECVDSTADDR is intentional?
There was a problem hiding this comment.
That was the intention but it's worth revisiting if it's appropriate.
| if (flags & UV_UDP_PMTUD) { | ||
| /* Default to PROBE mode. */ | ||
| uv__udp_set_pmtud_opt(fd, addr->sa_family, 2); | ||
| } |
There was a problem hiding this comment.
Return error if uv__udp_set_pmtud_opt() fails?
| if (flags & UV_UDP_RECVECN) { | ||
| uv__udp_set_recvecn(fd, addr->sa_family); | ||
| handle->flags |= UV_HANDLE_UDP_RECVECN; | ||
| } |
There was a problem hiding this comment.
check return value and don't set the flag on error.
| if (flags & UV_UDP_RECVMMSG) | ||
| handle->flags |= UV_HANDLE_UDP_RECVMMSG; |
| static int uv__udp_set_recvecn(int fd, sa_family_t family) { | ||
| int yes = 1; | ||
| #if defined(IP_RECVTOS) | ||
| if (family == AF_INET) | ||
| setsockopt(fd, IPPROTO_IP, IP_RECVTOS, &yes, sizeof(yes)); | ||
| #endif | ||
| #if defined(IPV6_RECVTCLASS) | ||
| if (family == AF_INET6) | ||
| setsockopt(fd, IPPROTO_IPV6, IPV6_RECVTCLASS, &yes, sizeof(yes)); | ||
| #endif | ||
| return 0; | ||
| } | ||
|
|
There was a problem hiding this comment.
Should we handle setsockopt() errors here?
Also returning UV_ENOTSUP if no option set?
|
@santigimeno ... really appreciate the initial review. I'll take another update pass hopefully by this time next week |
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6 PR-URL: #63267 Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6 PR-URL: #63267 Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Use a uv_check_t on BindingData to process outbound pending packet send, and use TrySend for actually sending packets when possible. Results in an 8% improvement in req/s and ~24% improvement in p95 latency. Also sets us up better for future improvements in libuv if the changes proposed in libuv/libuv#5116 are accepted. Signed-off-by: James M Snell <jasnell@gmail.com> Assisted-by: Opencode:Opus 4.6 PR-URL: nodejs#63267 Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Extends the existing uv_udp_t handle with per-packet metadata delivery
and advanced UDP features, without introducing a new handle type.
uv_udp_recv_start2() is an enhanced receive path that delivers metadata
through a uv_udp_recv_t struct: ECN codepoint, local destination
address, interface index, and GRO segment stride. Features are enabled
at bind time via new UV_UDP_* flags:
RECVECN IP_RECVTOS / IPV6_RECVTCLASS; uv_udp_set_ecn() for send
PMTUD IP_MTU_DISCOVER / IP_DONTFRAG; uv_udp_set_pmtud()
RECVPKTINFO IP_PKTINFO / IPV6_RECVPKTINFO (IP_RECVDSTADDR on BSD)
GRO UDP GRO with transparent segment splitting (Linux 5.0+)
GRO_RAW UDP GRO without splitting; app splits by segment_size
TXTIME kernel packet pacing via SO_TXTIME (Linux 4.19+)
uv_udp_try_send_batch() sends batches with optional UDP_SEGMENT GSO
(Linux 4.18+) and per-message SCM_TXTIME timestamps.
uv_udp_getmtu() queries the cached path MTU (Linux and Windows).
uv_udp_set_cpu_affinity() sets SO_INCOMING_CPU (Linux 3.9+).
uv_udp_configure() enables features post-bind.
The enhanced recv callback is stored in the existing recv_cb field
(same-sized function pointer) with a UV_HANDLE_UDP_RECV2 flag
controlling dispatch. The uv_udp_t struct layout is unchanged.
On Windows, WSARecvMsg is used when RECVECN or RECVPKTINFO is
requested; otherwise falls back to WSARecvFrom. Where a bind flag
is unsupported on the current platform it is silently ignored.
Some discussion of the motivation behind this: https://www.jasnell.me/posts/libuv-udp-extensions
Fixes: #3077