feat(relay): expose circuit-v2 relay-service resource knobs #1031
Merged
EdgeVPN previously never enabled libp2p's circuit-v2 relay *service* (only the relay *client* via DefaultEnableRelay). That meant publicly reachable cluster peers refused to carry relayed traffic for NAT-traversed peers that failed to DCUtR hole-punch (QEMU slirp, CGNAT, double-NAT).
This change unconditionally enables libp2p.EnableRelayService and exposes the relayv2.Resources tunables as CLI flags / env vars so operators can widen the small libp2p defaults (128KB/2min/16 circuits/2048 buffer) when they need cluster peers to relay bulk transfers (e.g. model files for distributed inference):
--relay-service-max-data         EDGEVPN_RELAY_MAX_DATA         1 GiB
--relay-service-max-duration     EDGEVPN_RELAY_MAX_DURATION     30m
--relay-service-max-circuits     EDGEVPN_RELAY_MAX_CIRCUITS     64
--relay-service-reservation-ttl  EDGEVPN_RELAY_RESERVATION_TTL  1h
--relay-service-buffer-size      EDGEVPN_RELAY_BUFFER_SIZE      64 KiB
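For readers who want to see how the knobs above could resolve into a single settings value, here is a minimal, stdlib-only sketch. It assumes the third column lists the values EdgeVPN applies when a knob is unset, and that the size knobs accept plain byte/count integers; the PR text does not confirm either. `RelayResources`, `fromEnv` and the helper names are hypothetical stand-ins, not EdgeVPN's or libp2p's actual types.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// RelayResources is a hypothetical stand-in for the subset of
// relayv2.Resources that the flags above tune.
type RelayResources struct {
	MaxData        int64         // bytes carried per relayed circuit
	MaxDuration    time.Duration // lifetime of one relayed circuit
	MaxCircuits    int64         // circuit limit
	ReservationTTL time.Duration // validity of a reservation
	BufferSize     int64         // relay copy buffer, in bytes
}

// defaultRelayResources returns the values from the table above
// (assumed to be the defaults when nothing is overridden).
func defaultRelayResources() RelayResources {
	return RelayResources{
		MaxData:        1 << 30, // 1 GiB
		MaxDuration:    30 * time.Minute,
		MaxCircuits:    64,
		ReservationTTL: time.Hour,
		BufferSize:     64 << 10, // 64 KiB
	}
}

// intEnv returns def when key is unset, otherwise the parsed integer.
func intEnv(lookup func(string) (string, bool), key string, def int64) (int64, error) {
	v, ok := lookup(key)
	if !ok {
		return def, nil
	}
	n, err := strconv.ParseInt(v, 10, 64)
	if err != nil {
		return def, fmt.Errorf("%s: %w", key, err)
	}
	return n, nil
}

// durEnv returns def when key is unset, otherwise the parsed duration.
func durEnv(lookup func(string) (string, bool), key string, def time.Duration) (time.Duration, error) {
	v, ok := lookup(key)
	if !ok {
		return def, nil
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return def, fmt.Errorf("%s: %w", key, err)
	}
	return d, nil
}

// fromEnv overlays any environment overrides on top of the defaults.
func fromEnv(lookup func(string) (string, bool)) (RelayResources, error) {
	r := defaultRelayResources()
	var err error
	if r.MaxData, err = intEnv(lookup, "EDGEVPN_RELAY_MAX_DATA", r.MaxData); err != nil {
		return r, err
	}
	if r.MaxDuration, err = durEnv(lookup, "EDGEVPN_RELAY_MAX_DURATION", r.MaxDuration); err != nil {
		return r, err
	}
	if r.MaxCircuits, err = intEnv(lookup, "EDGEVPN_RELAY_MAX_CIRCUITS", r.MaxCircuits); err != nil {
		return r, err
	}
	if r.ReservationTTL, err = durEnv(lookup, "EDGEVPN_RELAY_RESERVATION_TTL", r.ReservationTTL); err != nil {
		return r, err
	}
	if r.BufferSize, err = intEnv(lookup, "EDGEVPN_RELAY_BUFFER_SIZE", r.BufferSize); err != nil {
		return r, err
	}
	return r, nil
}

func main() {
	r, err := fromEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid relay setting:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```

Passing the lookup function in (rather than calling os.LookupEnv directly) keeps the overlay logic testable without touching the process environment.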
…fering
When operators don't want a node to act as a circuit-v2 relay for
others (resource-constrained edge nodes, untrusted environments,
deployments where only a few designated nodes should relay), set
--relay-service=false / EDGEVPN_RELAY_SERVICE=false / programmatic
Connection.RelayService.Disabled=true. The node still runs as a relay
client (can reserve slots on OTHER relays via AutoRelay) — only the
incoming-reservation service is skipped.
The struct field is named Disabled (not Enabled) so the Go zero value
preserves the prior "always offer relay service" behaviour for
programmatic callers constructing &config.Config{} directly.
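The zero-value argument can be illustrated with a small sketch. The struct layout below only mirrors the Connection.RelayService.Disabled path named in this commit message; everything else (the type names and the `relayServiceOffered` helper) is hypothetical and not taken from the EdgeVPN source.

```go
package main

import "fmt"

// Hypothetical mirror of the config path Connection.RelayService.Disabled.
type RelayService struct {
	Disabled bool // zero value (false) means "offer the relay service"
}

type Connection struct {
	RelayService RelayService
}

type Config struct {
	Connection Connection
}

// relayServiceOffered reports whether a node built from c would offer
// the relay service. Because the field is negative ("Disabled"), an
// untouched Config keeps the always-on behaviour.
func relayServiceOffered(c *Config) bool {
	return !c.Connection.RelayService.Disabled
}

func main() {
	legacy := &Config{} // caller written before the flag existed
	fmt.Println(relayServiceOffered(legacy)) // true

	optOut := &Config{Connection: Connection{RelayService: RelayService{Disabled: true}}}
	fmt.Println(relayServiceOffered(optOut)) // false
}
```

Had the field been named Enabled, the same `&Config{}` literal would silently turn the service off for existing programmatic callers.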
Adds TestRelayServiceDisabledSkipsLibp2pOption which asserts that
ToOpts produces strictly fewer node options with Disabled=true (the
libp2p.EnableRelayService wrapper disappears) and that both variants
still produce a constructible Node.
A follow-up will add a NetworkOnly mode (relay-service ACL gated on
ledger membership) so cluster relays don't service random internet
peers that found us via DHT.
Assisted-by: Claude:claude-opus-4-7
ae06b87 to
7cc5c3a
mudler added a commit that referenced this pull request May 31, 2026
Follow-up to #1031. With the relay service offered by every edgevpn node, anyone on the public DHT who finds us can reserve a slot and get us to carry their traffic. That's fine for a permissive overlay, but in many deployments, especially private clusters, only network members should be allowed to use us as a relay.
Adds a relayv2.ACLFilter (NetworkOnlyACL) that consults the local ledger's alive bucket. The cadence is driven by a NetworkService that periodically snapshots the bucket into the ACL via an atomic.Pointer swap; AllowReserve checks reduce to a constant-time map lookup.
Bootstrap window: A peer joining for the first time is not yet in our alive bucket (it needs to join gossipsub to write to it). If we strict-gated from t=0 a new peer could deadlock trying to reserve its way in. The ACL therefore allows ALL reservations until the first successful alive-bucket snapshot, then switches to strict mode. In practice the window is at most one refresh tick (default 30s), and the alive service starts gossiping its own host ID immediately on startup so the first refresh almost always finds at least the local node.
AllowConnect is left permissive (return true): we gate the reservation step, not in-flight relayed sessions, so a peer's existing tunnel doesn't get yanked if the alive bucket briefly flickers.
Knobs:
--relay-service-network-only / EDGEVPN_RELAY_SERVICE_NETWORK_ONLY (bool, default false; opt-in for now until operators have a chance to verify the bootstrap behaviour against their topology).
--relay-service-acl-refresh / EDGEVPN_RELAY_SERVICE_ACL_REFRESH (duration, default 30s; should be ≤ the alive-service announce interval).
Tests:
- TestNetworkOnlyACLBootstrapWindowAllowsAll: open until first Members
- TestNetworkOnlyACLMembersGates: strict mode rejects strangers
- TestNetworkOnlyACLAllowConnect: Connect always permitted
- TestNetworkOnlyACLMembersIsDefensivelyCopied: caller-map mutation after Members() does not race the ACL's strict-mode reads
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
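The ACL state machine described in this commit message (open until the first Members call, strict afterwards, snapshot swapped atomically, caller map copied) can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `PeerID` is a plain string standing in for libp2p's peer ID type, and the method signatures are simplified relative to the real relayv2.ACLFilter interface, which also receives multiaddrs.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// PeerID stands in for libp2p's peer ID type so the sketch stays stdlib-only.
type PeerID string

// NetworkOnlyACL gates reservations on a membership snapshot.
type NetworkOnlyACL struct {
	// members stays nil until the first snapshot arrives; a nil pointer
	// is what marks the bootstrap window.
	members atomic.Pointer[map[PeerID]struct{}]
}

// Members publishes a new snapshot. The map is copied so that later
// mutation by the caller cannot race readers of the stored snapshot.
func (a *NetworkOnlyACL) Members(set map[PeerID]struct{}) {
	cp := make(map[PeerID]struct{}, len(set))
	for id := range set {
		cp[id] = struct{}{}
	}
	a.members.Store(&cp)
}

// AllowReserve admits everyone during the bootstrap window and only
// snapshot members afterwards (one atomic load plus one map lookup).
func (a *NetworkOnlyACL) AllowReserve(p PeerID) bool {
	snap := a.members.Load()
	if snap == nil {
		return true // bootstrap window: no snapshot yet
	}
	_, ok := (*snap)[p]
	return ok
}

// AllowConnect stays permissive: only the reservation step is gated.
func (a *NetworkOnlyACL) AllowConnect(src, dest PeerID) bool {
	return true
}

func main() {
	acl := &NetworkOnlyACL{}
	fmt.Println(acl.AllowReserve("stranger")) // true: bootstrap window

	acl.Members(map[PeerID]struct{}{"member": {}})
	fmt.Println(acl.AllowReserve("member"))   // true: in the snapshot
	fmt.Println(acl.AllowReserve("stranger")) // false: strict mode
}
```

Storing a pointer to an immutable copy means readers never take a lock; a refresh simply replaces the pointer, and in-flight lookups finish against the old snapshot.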
mudler added a commit that referenced this pull request May 31, 2026
Follow-up to #1031. With the relay service offered by every edgevpn node, anyone on the public DHT who finds us can reserve a slot and get us to carry their traffic. In a private cluster only network members should be able to use us as a relay.
Adds a relayv2.ACLFilter (NetworkOnlyACL) that consults the local ledger's alive bucket. A NetworkService periodically snapshots the bucket into the ACL via an atomic.Pointer swap; AllowReserve checks reduce to a constant-time map lookup.
Bootstrap window: A peer joining for the first time is not yet in our alive bucket (it needs to join gossipsub to write to it). If we strict-gated from t=0 a new peer could deadlock trying to reserve its way in. The ACL therefore allows ALL reservations until the first successful alive-bucket snapshot, then switches to strict mode.
AllowConnect is left permissive (return true): we gate the reservation step, not in-flight relayed sessions, so a peer's existing tunnel doesn't get yanked if the alive bucket briefly flickers.
Knobs:
--relay-service-network-only / EDGEVPN_RELAY_SERVICE_NETWORK_ONLY: bool, default TRUE (secure by default). Pass =false to open the relay to all peers.
--relay-service-acl-refresh / EDGEVPN_RELAY_SERVICE_ACL_REFRESH: duration, default 30s; should be <= the alive-service announce interval so churn is reflected within a couple of ticks.
Tests (Ginkgo, package config_test):
- bootstrap window admits any peer until the first Members call
- strict mode admits members listed in the set
- strict mode rejects non-members
- AllowConnect stays permissive regardless of membership
- Members defensively copies the caller's map (caller mutation after handover must not race readers)
The ACL lives in pkg/config alongside the existing relay-service plumbing so pkg/node doesn't grow a relayv2 dependency. Wired via the existing node.WithNetworkService pattern (same shape as the alive service itself).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
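The periodic refresh that feeds the ACL might look roughly like the loop below. It is a stdlib-only sketch: `refreshACL`, `startRefresh`, and the snapshot/apply callbacks are hypothetical names, not the node.WithNetworkService API. Skipping empty snapshots is also an assumption about what "first successful snapshot" means; the commit message does not spell that out.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// refreshACL calls snapshot on every tick and hands non-empty results to
// apply until ctx is cancelled. Empty snapshots are skipped (assumption)
// so the ACL does not leave its bootstrap window with zero members.
func refreshACL(ctx context.Context, every time.Duration, snapshot func() []string, apply func([]string)) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if m := snapshot(); len(m) > 0 {
				apply(m)
			}
		}
	}
}

// startRefresh runs the loop in a goroutine and returns a stop function.
func startRefresh(every time.Duration, snapshot func() []string, apply func([]string)) (stop func()) {
	ctx, cancel := context.WithCancel(context.Background())
	go refreshACL(ctx, every, snapshot, apply)
	return cancel
}

func main() {
	applied := make(chan []string, 1)
	stop := startRefresh(10*time.Millisecond,
		func() []string { return []string{"peer-a"} }, // stand-in for reading the alive bucket
		func(m []string) {
			select {
			case applied <- m: // keep only the first result for the demo
			default:
			}
		})
	fmt.Println(<-applied) // [peer-a]
	stop()
}
```

With a 30s interval in place of the demo's 10ms, a membership change would reach the ACL within about one tick, which is why the refresh interval is recommended to be no longer than the alive-service announce interval.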
mudler added a commit that referenced this pull request May 31, 2026
#1032)
* feat(relay): NetworkOnly ACL: gate reservations on cluster membership
Follow-up to #1031. With the relay service offered by every edgevpn node, anyone on the public DHT who finds us can reserve a slot and get us to carry their traffic. In a private cluster only network members should be able to use us as a relay.
Adds a relayv2.ACLFilter (NetworkOnlyACL) that consults the local ledger's alive bucket. A NetworkService periodically snapshots the bucket into the ACL via an atomic.Pointer swap; AllowReserve checks reduce to a constant-time map lookup.
Bootstrap window: A peer joining for the first time is not yet in our alive bucket (it needs to join gossipsub to write to it). If we strict-gated from t=0 a new peer could deadlock trying to reserve its way in. The ACL therefore allows ALL reservations until the first successful alive-bucket snapshot, then switches to strict mode.
AllowConnect is left permissive (return true): we gate the reservation step, not in-flight relayed sessions, so a peer's existing tunnel doesn't get yanked if the alive bucket briefly flickers.
Knobs:
--relay-service-network-only / EDGEVPN_RELAY_SERVICE_NETWORK_ONLY: bool, default TRUE (secure by default). Pass =false to open the relay to all peers.
--relay-service-acl-refresh / EDGEVPN_RELAY_SERVICE_ACL_REFRESH: duration, default 30s; should be <= the alive-service announce interval so churn is reflected within a couple of ticks.
Tests (Ginkgo, package config_test):
- bootstrap window admits any peer until the first Members call
- strict mode admits members listed in the set
- strict mode rejects non-members
- AllowConnect stays permissive regardless of membership
- Members defensively copies the caller's map
The ACL lives in pkg/config alongside the existing relay-service plumbing so pkg/node doesn't grow a relayv2 dependency. Wired via the existing node.WithNetworkService pattern.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(relay-acl): e2e reservation handshake against a real libp2p relay
The state-machine specs cover the ACL in isolation. These put it inside a real circuit-v2 relay and exercise the actual reservation handshake, the contract the NetworkService relies on in production.
Four scenarios:
- bootstrap window: fresh ACL accepts any peer (client.Reserve returns a valid voucher)
- strict + member: voucher binds to the right peer ID
- strict + stranger: libp2p surfaces the relay's refusal as "reservation error: status: PERMISSION_DENIED reason: reservation failed"; this proves the ACL ran on the real handshake
- membership flip: pre-membership denied, then Members(set) including the joiner is called, then the very next reservation attempt succeeds (mimics the alive-bucket watcher's behaviour)
Uses libp2p.ForceReachabilityPublic() in the relay host so the relay service actually advertises itself; without it AutoNAT may refuse to register /hop and the e2e test can't reach the ACL code path.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>