feat(relay): NetworkOnly ACL — gate reservations on cluster membership #1032
Merged
Follow-up to #1031. With the relay service offered by every edgevpn node, anyone on the public DHT who finds us can reserve a slot and get us to carry their traffic. In a private cluster only network members should be able to use us as a relay.

Adds a relayv2.ACLFilter (NetworkOnlyACL) that consults the local ledger's alive bucket. A NetworkService periodically snapshots the bucket into the ACL via an atomic.Pointer swap; AllowReserve checks reduce to a constant-time map lookup.

Bootstrap window: a peer joining for the first time is not yet in our alive bucket (it needs to join gossipsub to write to it). If we strict-gated from t=0, a new peer could deadlock trying to reserve its way in. The ACL therefore allows ALL reservations until the first successful alive-bucket snapshot, then switches to strict mode.

AllowConnect is left permissive (return true): we gate the reservation step, not in-flight relayed sessions, so a peer's existing tunnel doesn't get yanked if the alive bucket briefly flickers.

Knobs:
- --relay-service-network-only / EDGEVPN_RELAY_SERVICE_NETWORK_ONLY: bool, default TRUE (secure by default). Pass =false to open the relay to all peers.
- --relay-service-acl-refresh / EDGEVPN_RELAY_SERVICE_ACL_REFRESH: duration, default 30s. Should be <= the alive-service announce interval so churn is reflected within a couple of ticks.

Tests (Ginkgo, package config_test):
- bootstrap window admits any peer until the first Members call
- strict mode admits members listed in the set
- strict mode rejects non-members
- AllowConnect stays permissive regardless of membership
- Members defensively copies the caller's map

The ACL lives in pkg/config alongside the existing relay-service plumbing so pkg/node doesn't grow a relayv2 dependency. Wired via the existing node.WithNetworkService pattern.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
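A minimal sketch of the ACL state machine described above, using only the standard library. It is not the PR's code: `PeerID` is a stand-in for libp2p's `peer.ID`, and the method signatures are trimmed (go-libp2p's circuit-v2 `ACLFilter` also passes multiaddrs). A nil snapshot pointer represents the bootstrap window.

```go
package main

import (
	"sync/atomic"
)

// PeerID stands in for libp2p's peer.ID so the sketch builds without
// go-libp2p (assumption: the real filter implements relayv2's ACLFilter).
type PeerID string

// NetworkOnlyACL admits reservations only from peers in the latest
// membership snapshot. A nil snapshot means no successful snapshot has
// been taken yet, i.e. the bootstrap window.
type NetworkOnlyACL struct {
	members atomic.Pointer[map[PeerID]struct{}]
}

// Members installs a new snapshot. It copies the caller's map so that
// later mutations by the caller cannot leak into the ACL.
func (a *NetworkOnlyACL) Members(set map[PeerID]struct{}) {
	snapshot := make(map[PeerID]struct{}, len(set))
	for p := range set {
		snapshot[p] = struct{}{}
	}
	a.members.Store(&snapshot)
}

// AllowReserve admits everyone during the bootstrap window and only
// snapshot members afterwards; the strict check is one map lookup.
func (a *NetworkOnlyACL) AllowReserve(p PeerID) bool {
	m := a.members.Load()
	if m == nil {
		return true // bootstrap window: no snapshot yet
	}
	_, ok := (*m)[p]
	return ok
}

// AllowConnect stays permissive: gating happens at reservation time, so an
// established relayed session survives a brief membership flicker.
func (a *NetworkOnlyACL) AllowConnect(src, dst PeerID) bool {
	return true
}
```

The zero value starts in the bootstrap window; the first `Members` call moves it to strict mode permanently, because nothing in the sketch ever stores a nil pointer again.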
The state-machine specs cover the ACL in isolation. These put it inside a real circuit-v2 relay and exercise the actual reservation handshake, which is the contract the NetworkService relies on in production. Four scenarios:
- bootstrap window: a fresh ACL accepts any peer (client.Reserve returns a valid voucher)
- strict + member: the voucher binds to the right peer ID
- strict + stranger: libp2p surfaces the relay's refusal as "reservation error: status: PERMISSION_DENIED reason: reservation failed", which proves the ACL ran on the real handshake
- membership flip: the pre-membership reservation is denied, then Members(set) is called with a set that includes the joiner, and the very next reservation attempt succeeds (mimicking the alive-bucket watcher's behaviour)

Uses libp2p.ForceReachabilityPublic() in the relay host so the relay service actually advertises itself; without it AutoNAT may refuse to register /hop and the e2e test can't reach the ACL code path.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Follow-up to #1031. With the relay service offered by every edgevpn node, anyone on the public DHT who finds us can reserve a slot and get us to carry their traffic. That's fine for a permissive overlay, but in many deployments — especially private clusters — only network members should be allowed to use us as a relay.
Adds a relayv2.ACLFilter (NetworkOnlyACL) that consults the local ledger's alive bucket. The cadence is driven by a NetworkService that periodically snapshots the bucket into the ACL via an atomic.Pointer swap; AllowReserve checks reduce to a constant-time map lookup.
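The atomic.Pointer swap is a copy-on-write pattern: the refresher builds a fresh map and publishes it with one atomic store, and readers on the reservation path load a pointer without taking a lock, so they never wait on the refresher. A generic sketch of that pattern follows; the `Snapshot` type and the `hammer` helper are illustrative and not from the PR. Running it with `go run -race` checks that concurrent readers and a publishing writer don't race.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Snapshot holds an immutable value that is replaced wholesale.
// Writers never mutate a published value; they publish a new one.
type Snapshot[T any] struct{ p atomic.Pointer[T] }

// Load returns the current value, or nil before the first Publish.
func (s *Snapshot[T]) Load() *T { return s.p.Load() }

// Publish atomically replaces the current value.
func (s *Snapshot[T]) Publish(v T) { s.p.Store(&v) }

// hammer runs lock-free readers alongside a writer that publishes n fresh
// membership maps, and returns how many reader lookups found "self".
func hammer(n, readers int) int64 {
	var snap Snapshot[map[string]struct{}]
	var hits atomic.Int64
	var wg sync.WaitGroup
	for r := 0; r < readers; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < n; i++ {
				if m := snap.Load(); m != nil {
					if _, ok := (*m)["self"]; ok {
						hits.Add(1)
					}
				}
			}
		}()
	}
	for i := 0; i < n; i++ {
		// A brand-new map per publish: readers never see a map being written.
		snap.Publish(map[string]struct{}{"self": {}})
	}
	wg.Wait()
	return hits.Load()
}
```

An RWMutex-guarded map would also be correct, but every AllowReserve call would then contend on the lock whenever a refresh is in progress.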
Bootstrap window:
A peer joining for the first time is not yet in our alive bucket
(it needs to join gossipsub to write to it). If we strict-gated
from t=0 a new peer could deadlock trying to reserve its way in.
The ACL therefore allows ALL reservations until the first
successful alive-bucket snapshot, then switches to strict mode.
In practice the window is at most one refresh tick (default 30s),
and the alive service starts gossiping its own host ID immediately
on startup so the first refresh almost always finds at least the
local node.
AllowConnect is left permissive (return true): we gate the reservation step, not in-flight relayed sessions, so a peer's existing tunnel doesn't get yanked if the alive bucket briefly flickers.
Knobs:
- --relay-service-network-only / EDGEVPN_RELAY_SERVICE_NETWORK_ONLY (bool, default false; opt-in for now until operators have had a chance to verify the bootstrap behaviour against their topology).
- --relay-service-acl-refresh / EDGEVPN_RELAY_SERVICE_ACL_REFRESH (duration, default 30s; should be ≤ the alive-service announce interval).
Tests:
Assisted-by: Claude:claude-opus-4-7